-
-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CBORGenerator.Feature.LENIENT_UTF_ENCODING
for lenient handling of Unicode surrogate pairs on writing
#222
Conversation
fc5aaa5
to
df54c36
Compare
cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORGenerator.java
Show resolved
Hide resolved
cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORGenerator.java
Show resolved
Hide resolved
cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORGenerator.java
Outdated
Show resolved
Hide resolved
Ok, first of all, thank you for the contribution! I am bit concerned about the fact that some code is trying to output invalid Unicode content, essentially, but as long as that is documented and user has to explicitly enable said feature I think that is fine. I added some small notes in PR itself, but 2 bigger questions:
|
* the output buffer, but not all characters are single-byte (ASCII) | ||
* characters. | ||
*/ | ||
private final int _shortUTF8Encode2(char[] str, int i, int end, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious as to why this was removed? Or did it just get moved and diff is confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been removed because this code was duplicated: _shortUTF8Encode2
and _encode2
were basically the same with the difference that one was taking a String
as an argument and the other one was taking a char[]
. Since this code contains some really non trivial logic I thought it would be better to not duplicate it.
cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORGenerator.java
Show resolved
Hide resolved
Ok fair enough I will base the PR on 2.12 |
e73a98e
to
39109c6
Compare
39109c6
to
b35e164
Compare
If enabled, the generator will output the Unicode Replacement Character for invalid unicode sequence (invalid surrogate chars in the Java String) instead of failing with an IllegalArgumentException
b35e164
to
5760c70
Compare
@guillaumebort Apologies for slow follow-up here: now back to getting this merge for 2.12. https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf and usually easiest to print, fill & sign, scan/photo, email to |
Thanks! I need to handle that with the legal team at my company and I do it ASAP. |
Sounds good -- we have both individual CLA that I linked earlier (and used by most contributors), as well as Corporate CLA (CCLA), at https://github.com/FasterXML/jackson/blob/master/contributor-agreement-corporate.txt if that makes more sense. |
Should be ok now, my company (Datadog) sent a a signed copy of the Corporate CLA over. |
@guillaumebort For some reason I don't see one yet? I assume it'd be sent to |
Received the CLA. |
CBORGenerator.Feature.LENIENT_UTF_ENCODING
for lenient handling of Unicode surrogate pairs on writing
Ended up merging this manually, hence closing PR, feature is in, tests, will be included in |
If enabled, the generator will output the Unicode Replacement Character for invalid unicode sequence (invalid surrogate chars in the Java String) instead of failing with an IllegalArgumentException.
Also this PR remove the code duplication between
_shortUTF8Encode2
and_encode2
.