WARNING: problems with charset recognition (b'\x1b') from parsing Iptc.Envelope.CharacterSet incorrectly #32

nealmcb · 2021-04-02T03:47:10Z

This warning message

WARNING: problems with charset recognition', b'\x1b'

shows up for me and a variety of folks as documented at https://stackoverflow.com/questions/50407738/python-disable-iptcinfo-warning, and as previously documented here with a sample image (but abandoned for some reason) at #23.

I think the code is parsing the value of Iptc.Envelope.CharacterSet wrong. The manner of specifying a character set that I observe, and that is used in the example image at #23, is a string of control characters, e.g. the three characters ESC % G for UTF-8, as discussed at https://en.wikipedia.org/wiki/ISO/IEC_2022 and as documented in the standard:

1:90 Coded Character Set
Optional, not repeatable, up to 32 octets, consisting of one or
more control functions used for the announcement, invocation or
designation of coded character sets. The control functions follow
the ISO 2022 standard and may consist of the escape control
character and one or more graphic characters.

$ exiv2 -p a charset-example.jpg |egrep 'Iptc|Xmp'|cat -vt
Iptc.Envelope.ModelVersion                   Short       1  4
Iptc.Envelope.CharacterSet                   String      3  ^[%G
Iptc.Application2.RecordVersion              Short       1  4
Iptc.Application2.Caption                    String     16  a seller of eggs
Xmp.xmp.ModifyDate                           XmpText    25  2012-08-13T08:55:36-05:00
Xmp.xmp.CreatorTool                          XmpText    37  Windows Photo Editor 10.0.10011.16384
Xmp.dc.description                           LangAlt     1  lang="x-default" a seller of eggs
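To make the ISO 2022 reading concrete, here is a minimal sketch of treating the 1:90 value as an escape sequence and mapping it to a Python codec. It is not code from iptcinfo3: the names `ISO2022_TO_CODEC`, `codec_from_charset_value` and `decode_dataset` are hypothetical, the table contains only the ESC % G (UTF-8) sequence discussed above, and the `latin-1` fallback is an assumption rather than something the IPTC standard mandates.

```python
# Hypothetical helpers for illustration; not part of iptcinfo3.
ESC = b"\x1b"

# Only the sequence discussed in this issue is listed. Other ISO 2022
# sequences would need to be added after checking the standard.
ISO2022_TO_CODEC = {
    ESC + b"%G": "utf-8",
}


def codec_from_charset_value(value):
    """Return a Python codec name for a 1:90 value, or None if unknown."""
    return ISO2022_TO_CODEC.get(value)


def decode_dataset(raw, charset_value, fallback="latin-1"):
    """Decode a dataset's bytes using the announced charset.

    The fallback codec is an assumption made for this sketch.
    """
    codec = codec_from_charset_value(charset_value) or fallback
    return raw.decode(codec, errors="replace")


caption = decode_dataset(b"a seller of eggs", b"\x1b%G")
```

With a lookup like this, an unrecognized sequence yields `None` and the caller can decide whether to warn, rather than the value being interpreted as a number.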

But iptcinfo3.py is trying to unpack the three characters as an unsigned short in network byte order (!H) and failing (perhaps because a short is 2 bytes, not 3?)

https://github.com/jamesacampbell/iptcinfo3/blob/a9cea6cb1981e4ad29cf317d44419e4fd45c2170/iptcinfo3.py#L802-L810
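The suspected failure can be reproduced outside the library with the standard `struct` module. This is a standalone sketch, not the code at the link above: `try_unpack_short` is a hypothetical helper, and the input is the three-byte value reported by exiv2 for the sample image.

```python
import struct

# The three-byte CharacterSet value from the sample image: ESC % G
charset_value = b"\x1b%G"


def try_unpack_short(data):
    """Unpack data as one big-endian unsigned short, or return None."""
    try:
        return struct.unpack("!H", data)[0]
    except struct.error:
        # "!H" requires a buffer of exactly 2 bytes
        return None


result_full = try_unpack_short(charset_value)        # 3 bytes: fails
result_two = try_unpack_short(charset_value[:2])     # 2 bytes: succeeds
first_byte = charset_value[0]                        # ESC as an integer

print(result_full, result_two, first_byte)
```

The three-byte input cannot be unpacked with `!H`, and the first byte on its own is 27, which matches the `\x1b` shown in the warning.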

It then throws a confusing warning suggesting that the character set is \x1b, which is decimal 27. I initially went on a wild goose chase: character set 27 is not even listed in the IPTC spec I looked at, though it is listed in the official list of IANA Character Sets, where it is identified as ISO-10646-UTF-1 (27): "Universal Transfer Format (1), this is the multibyte encoding, that subsets ASCII-7". But again, as far as I can see, the IPTC standard doesn't indicate character sets via simple binary numbers, but via old-school ISO 2022 escape sequences.

FWIW, the tool that created my metadata was Xmp.xmp.CreatorTool: Windows Photo Editor 10.0.10011.16384.


james-see · 2021-10-30T13:57:46Z

@nealmcb I finally have some time to look at this. This makes sense to me. What do you propose the code change be? I'd happily review a PR.

james-see · 2023-09-04T02:31:26Z

Not working on this unless sponsored.

nealmcb · 2023-09-04T15:58:23Z

Thank you @james-see for your work on this nice library! And thank you for the previous invitation to submit a PR.

I think it would be useful to keep issues like this (which are not "completed") open even though you don't plan to work on them yourself. Using the help wanted label would make that clear, and allow you to filter them out when prioritizing things.

james-see · 2023-09-04T17:13:37Z

@nealmcb thank you for the thoughtfulness here. I did not think of that. I am reopening the ones that need work and adding that label based on this suggestion. Cheers!

nealmcb mentioned this issue Apr 2, 2021

n/a #23

Closed

nealmcb changed the title ~~WARNING: problems with charset recognition', b'\x1b' from parsing Iptc.Envelope.CharacterSet incorrectly~~ WARNING: problems with charset recognition (b'\x1b') from parsing Iptc.Envelope.CharacterSet incorrectly Apr 2, 2021

james-see self-assigned this Oct 30, 2021

james-see added this to the iptcinfo3 next version milestone Oct 30, 2021

james-see closed this as completed Sep 4, 2023

james-see reopened this Sep 4, 2023

james-see added the help wanted label Sep 4, 2023

github-project-automation bot added this to Clean up the issues backlog Aug 26, 2024

github-project-automation bot moved this to Done in Clean up the issues backlog Aug 26, 2024
