-
-
Notifications
You must be signed in to change notification settings - Fork 797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UTF-8 BOM not accounted for in JsonLocation.getByteOffset() #533
Comments
JsonLocation.getByteOffset()
Thank you for reporting this and doing troubleshooting. Sounds like a bug. |
UTF8StreamJsonParser tracks read pointer (offset) and bytes processed separately and uses those to generate JsonLocation. When the byte payload starts with a UTF BOM, ByteSourceJsonBootstrapper processes a few bytes ahead of the parser, moves/increases the offset and passes the newly computed offset to the parser without telling it some bytes have been pre-processed. With this change, the number of bytes pre-processed for encoding detection is passed to the parser. JsonLocation instances returned by the parser now point to the correct byte offset when payload has a BOM. Issue: FasterXML#533
UTF8StreamJsonParser tracks read pointer (offset) and bytes processed separately and uses those to generate JsonLocation. When the byte payload starts with a UTF BOM, ByteSourceJsonBootstrapper processes a few bytes ahead of the parser, moves/increases the offset and passes the newly computed offset to the parser without telling it some bytes have been pre-processed. With this change, the number of bytes pre-processed for encoding detection is passed to the parser. JsonLocation instances returned by the parser now point to the correct byte offset when payload has a BOM. Issue: FasterXML#533
I submitted a PR a couple of weeks ago. Please let me know of any feedback or necessary changes. |
UTF8StreamJsonParser tracks read pointer (offset) and bytes processed separately and uses those to generate JsonLocation. When the byte payload starts with a UTF BOM, ByteSourceJsonBootstrapper processes a few bytes ahead of the parser, moves/increases the offset and passes the newly computed offset to the parser without telling it some bytes have been pre-processed. With this change, the number of bytes pre-processed for encoding detection is passed to the parser. JsonLocation instances returned by the parser now point to the correct byte offset when payload has a BOM. Issue: FasterXML#533
UTF8StreamJsonParser tracks read pointer (offset) and bytes processed separately and uses those to generate JsonLocation. When the byte payload starts with a UTF BOM, ByteSourceJsonBootstrapper processes a few bytes ahead of the parser, moves/increases the offset and passes the newly computed offset to the parser without telling it some bytes have been pre-processed. With this change, the number of bytes pre-processed for encoding detection is passed to the parser. JsonLocation instances returned by the parser now point to the correct byte offset when payload has a BOM. Issue: FasterXML#533
Looks good. I can merge this in The only thing we now need is the Contributor License Agreement, found from https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf Thank you in advance! |
Just submitted contributor agreement. |
@fabienrenaud I think I'd rather backport it only in 2.10 (had to do that manually). But 2.10.0 should be out within month. |
Version: Jackson 2.9.8
parser.getCurrentLocation().getByteOffset()
returns the wrong byte offset for the underlying byte array when the payload start with a BOM.The json parser processes well such json payloads but the
JsonLocation
it returns ignores the offset introduced by the BOM.Full standalone repro:
Output:
You can see the result for the BOM payload gets shifted to the left by exactly 3 bytes while other payloads with or without padding characters are handled as expected.
The text was updated successfully, but these errors were encountered: