Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using other string types besides UTF8.class #399

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

oleg-smith
Copy link

No description provided.

@oleg-smith
Copy link
Author

@FelixGV could you please review

@radai-rosenblatt
Copy link
Contributor

i see this adding "defaultStringClass", but where is it used? is it intendd to be used by subclasses? a followup PR?

@oleg-smith
Copy link
Author

oleg-smith commented Oct 20, 2022

it's intended to be used in client's code like this

FastSpecificDatumReader<T> reader = new FastSpecificDatumReader<>(
            writerSchema,
            readerSchema,
            new FastSerdeCache(String.class)
        );

@gaojieliu
Copy link
Collaborator

@oleg-smith
Can this be done by adding string annotations to the reader schema instead of updating the serde since fast-avro is trying to align with the standard avro implementation?

@oleg-smith
Copy link
Author

@gaojieliu can you bring an example of how to instruct the generator to use String. class this way?

@gaojieliu
Copy link
Collaborator

@oleg-smith
If you want to use Java String type for the regular string type, you could put additional annotation like this way:

 "type": {
                "type": "string",
                "avro.java.string": "String"
            }

If you want to use Java String type as the deserialized Map key:

"type": {
        "type": "map",
        "values": "long",
        "java-key-class": "java.lang.String", // required by specific deserializer
        "avro.java.string": "String" // required by generic deserializer
      },

Of coz, you need to use Avro-1.7+ to take advantage of these annotations.

@oleg-smith
Copy link
Author

@gaojieliu I meant how to do it from Java code? I don't have access to schema file

@oleg-smith
Copy link
Author

and how to do it for all String fields at once?

@gaojieliu
Copy link
Collaborator

@oleg-smith
The fast-avro class is generated from the schema string/file, so how come you don't have access to the schema file?
Inside the application, before generating fast classes, I think you could traverse the schema file to add annotation to all the string types, so that we could keep the fast-avro logic align with the vanilla Avro.
Maybe we could build some utility for this in fast-avro to add annotations to all the string types of one schema.

@radai-rosenblatt
What is your thought on this?

@oleg-smith
Copy link
Author

how come - it comes from the external jar as already generated Avro schema

@oleg-smith
Copy link
Author

oleg-smith commented Nov 9, 2022

also, is there a way to make deserializer generation synchronous? it's a bit of undeterministic now

@gaojieliu
Copy link
Collaborator

@oleg-smith
https://github.com/linkedin/avro-util/blob/master/avro-fastserde/src/main/java/com/linkedin/avro/fastserde/FastSerdeCache.java#L355
Here is the method to generate a fast specific class synchronously.

Regarding supporting specific CharSequence impl override in fast-avro, I think a utility to add right string annotation could be the right way to go since the string annotated schema can be used by vanilla Avro as well in case fast-avro has a bug, you still could fall back to vanilla Avro with the same outcome.

@oleg-smith
Copy link
Author

Can this method be used in FastSerdeCache?

@gaojieliu
Copy link
Collaborator

yeah, it is a public method in FastSerdeCache.

@radai-rosenblatt
Copy link
Contributor

a config would still be needed - because some folks would like to ignore hints on schemas even if you plan on supporting those hints

@oleg-smith
Copy link
Author

@gaojieliu I mean is there a way to build it and put it into the cache synchronously?

looks like currently it's only async

@radai-rosenblatt
Copy link
Contributor

to clarify what i posted above: even if fast-avro respected "logical types" (which those string hints are), some users (me) would like to be able to override them to get consistent behaviour across all schemas in my codebase at runtime.

so i think making this a config option to fast-avro at runtime (as originally suggested) is a better approach, and definitely needs to happen before fast-avro starts respecting logical types :-)

also - this means that an implementation is required, which is not part of this PR?

@gaojieliu
Copy link
Collaborator

I see.
Such method doesn't exist today, and feel free to add them in FastSerdeCache and it should be fairly straightforward.

@FelixGV
Copy link
Collaborator

FelixGV commented Nov 9, 2022

Regarding supporting specific CharSequence impl override in fast-avro, I think a utility to add right string annotation could be the right way to go since the string annotated schema can be used by vanilla Avro as well in case fast-avro has a bug, you still could fall back to vanilla Avro with the same outcome.

By this, do you mean that we would provide a "schema processing" utility, where the input is a schema containing string fields, and the output would be the same schema where all string fields were overridden to be of a type defined by the user?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants