Kite Avro SDK: Merge Can Create 'type' Lists with Default Not as First Element #446

Open
jake-greene opened this issue May 5, 2016 · 3 comments

Comments

@jake-greene
jake-greene commented May 5, 2016

From the Avro spec:

Note that when a default value is specified for a record field whose type is a union, the type of the default value must match the first element of the union. Thus, for unions containing "null", the "null" is usually listed first, since the default value of such unions is typically null.
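
To make the quoted rule concrete, here is a minimal sketch in plain Scala with no Avro dependency. `UnionField` and `defaultMatchesFirstBranch` are hypothetical names used only for illustration; they are not part of Avro or Kite. A union is modelled as a list of type names, and the default is modelled by the name of its type.

```scala
// Hypothetical model for illustration only; not Avro's or Kite's API.
// branches: the union's type names in order; defaultType: the type of the default, if any.
final case class UnionField(name: String, branches: List[String], defaultType: Option[String])

// The spec's rule: when a default is present, its type must equal the first union branch.
// A field without a default trivially satisfies the rule.
def defaultMatchesFirstBranch(f: UnionField): Boolean =
  f.defaultType.forall(t => f.branches.headOption.contains(t))

// "type":["string","null"], "default":null violates the rule.
val bad = UnionField("key", List("string", "null"), Some("null"))

// "type":["null","string"], "default":null satisfies it.
val good = UnionField("key", List("null", "string"), Some("null"))
```

Under this model, `defaultMatchesFirstBranch(bad)` is false and `defaultMatchesFirstBranch(good)` is true.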

However, merging two record schemas does not always enforce this rule. For example, a merge can produce "type":["string","null"], "default":null, which should be "type":["null","string"], "default":null. I can reproduce this bug with the following:

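  1. A JSON string with a null value for key X
  2. A JSON string with a non-null value for key X
  3. A JSON string with no entry for X (does not exist, DNE)
  4. Schemas inferred from each of the sample JSON strings
  5. The schemas merged in either of the following orders: merge(non-null, merge(null, dne)) or merge(non-null, merge(dne, null))

The session below assumes that java.io.{ByteArrayInputStream, InputStream} and Kite's org.kitesdk.data.spi.{JsonUtil, SchemaUtil} have been imported:
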
scala> val nul = """{"key":null}"""
nul: String = {"key":null}

scala> val dne = """{"other":3}"""
dne: String = {"other":3}

scala> val str = """{"key":"hello"}"""
str: String = {"key":"hello"}

scala> def stream(s: String): InputStream = new ByteArrayInputStream(s.getBytes("UTF-8"))
stream: (s: String)java.io.InputStream

scala> val nulSchema = JsonUtil.inferSchema(stream(nul), "com.example", 1)
nulSchema: org.apache.avro.Schema = {"type":"record","name":"example","namespace":"com","fields":[{"name":"key","type":"null","doc":"Type inferred from 'null'"}]}

scala> val dneSchema = JsonUtil.inferSchema(stream(dne), "com.example", 1)
dneSchema: org.apache.avro.Schema = {"type":"record","name":"example","namespace":"com","fields":[{"name":"other","type":"int","doc":"Type inferred from '3'"}]}

scala> val nPlusDne = SchemaUtil.merge(dneSchema, nulSchema)
nPlusDne: org.apache.avro.Schema = {"type":"record","name":"example","namespace":"com","fields":[{"name":"other","type":["null","int"],"doc":"Type inferred from '3'","default":null},{"name":"key","type":"null","doc":"Type inferred from 'null'","default":null}]}

scala> val strSchema = JsonUtil.inferSchema(stream(str), "com.example", 1)
strSchema: org.apache.avro.Schema = {"type":"record","name":"example","namespace":"com","fields":[{"name":"key","type":"string","doc":"Type inferred from '\"hello\"'"}]}

scala> val merged = SchemaUtil.merge(strSchema, nPlusDne)
[WARNING] Avro: Invalid default for field key: null not a ["string","null"]
merged: org.apache.avro.Schema = {"type":"record","name":"example","namespace":"com","fields":[{"name":"key","type":["string","null"],"doc":"Type inferred from '\"hello\"'","default":null},{"name":"other","type":["null","int"],"doc":"Type inferred from '3'","default":null}]}

The final merge produces "type":["string","null"], "default":null, even though the type of the default value (null) must match the first element of the union. Avro itself flags this in the warning above.
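
One possible fix, sketched here in plain Scala under stated assumptions, is to move the branch that matches the default's type to the front of the union after merging. `reorderForDefault` is a hypothetical helper, not an existing Kite or Avro function, and the union is modelled as a list of type names rather than real `Schema` objects.

```scala
// Hypothetical helper for illustration only; not part of Kite or Avro.
// Moves the branch matching the default's type to the front of the union,
// keeping the remaining branches in their original order.
// If no branch matches, the list is returned unchanged.
def reorderForDefault(branches: List[String], defaultType: String): List[String] =
  if (branches.contains(defaultType)) defaultType :: branches.filterNot(_ == defaultType)
  else branches

// The union from the merged schema above, with a null default.
val fixed = reorderForDefault(List("string", "null"), "null")
// fixed == List("null", "string")
```

A union that already lists the default's type first comes back unchanged, so a step like this could run after every merge without affecting schemas that are already valid.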

@mkwhitacre
Contributor

@jake-greene, issues for Kite are usually tracked on the project's JIRA instance [1].

[1] - https://issues.cloudera.org/projects/KITE

@jake-greene
Author

Thank you, @mkwhitacre. Should I create an issue there and close this one?

@mkwhitacre
Contributor

That would probably be good, so this issue doesn't get forgotten.
