In [1]: from tatsu import compile
   ...:
   ...: parser = compile(r"""
   ...:     start = text ;
   ...:     text = { sentence }* ;
   ...:     sentence = { word }+ { punctuation }* ;
   ...:     word = /\w+/ ;
   ...:     punctuation = /[?!.,;\"']/ ;
   ...: """)
   ...:
In [2]: parser.parse("Hello world!")
Out[2]: [(['Hello', 'world'], ['!'])]
In [3]: parser.parse(b"Hello world!")
Out[3]: [(['b'], ["'"]), (['Hello', 'world'], ['!', "'"])]
I hoped to be able to parse bytes instead of str because I am trying to use TatSu to parse a file format that uses ASCII for its data structure but can also contain strings in another encoding. That encoding is specified in the content of the file, so I should call bytes.decode() on these parts only after parsing which encoding is used.
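One possible workaround while bytes input is unsupported, sketched under assumptions: decode the whole file as latin-1 before parsing, since latin-1 maps every byte value 0..255 to the code point with the same value and so loses nothing. Each parsed fragment can then be turned back into its original bytes and decoded with the encoding the file declares. The helper names and the `str.partition` call standing in for the parser are hypothetical, not part of TatSu.

```python
def bytes_to_text(data: bytes) -> str:
    """Map each byte to the code point of the same value (lossless for 0..255)."""
    return data.decode("latin-1")


def text_to_bytes(fragment: str) -> bytes:
    """Recover the original bytes from a fragment produced by bytes_to_text()."""
    return fragment.encode("latin-1")


# Hypothetical file content: an ASCII keyword followed by a UTF-8 payload.
raw = b"NAME " + "caf\u00e9".encode("utf-8")

text = bytes_to_text(raw)

# Stand-in for the real parser: split the keyword from the payload.
keyword, _, payload = text.partition(" ")

# Only now, knowing the declared encoding, decode the payload properly.
name = text_to_bytes(payload).decode("utf-8")
```

One caveat with this approach: patterns such as `/\w+/` are matched against the latin-1 characters, not the original bytes, so a grammar used this way would probably need explicit character ranges for non-ASCII data.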
I kind of expected to get a TypeError or similar here, but the actual behaviour is more interesting. It seems that the TatSu parser casts the input data to str, which for bytes yields the bytes.__repr__() return value instead of the contents.
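The effect can be reproduced without TatSu. This is a minimal sketch that assumes the parser really does call `str()` on its input, which I have not confirmed in the TatSu source. The `re.findall` call is only a rough stand-in for the `word` and `punctuation` rules of the grammar above.

```python
import re

data = b"Hello world!"

# str() on a bytes object returns its repr, including the b prefix and quotes.
as_text = str(data)
print(as_text)

# Rough stand-in for the grammar: words or single punctuation characters.
tokens = re.findall(r"\w+|[?!.,;\"']", as_text)
print(tokens)
```

The tokens are the `b` prefix, the opening quote, the two words, the exclamation mark and the closing quote, which lines up with the two "sentences" in Out[3].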
Would it be possible to accept bytes as data for parsing? If not, I guess there should be some kind of type check on the input instead of a blind cast to str, which leads to parsing the Python representation of an incompatible type instead of the data itself.
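The kind of type check I have in mind could look roughly like this. It is only a sketch: `ensure_text` is a hypothetical helper, not an existing TatSu function, and where such a check would belong inside the library is an open question.

```python
def ensure_text(data):
    """Return data unchanged if it is a str, otherwise fail loudly.

    Hypothetical guard: rejects bytes and other types instead of
    silently converting them with str().
    """
    if isinstance(data, str):
        return data
    raise TypeError(
        f"parser input must be str, got {type(data).__name__}"
    )


# A str passes through untouched.
checked = ensure_text("Hello world!")

# bytes are rejected instead of being parsed as their repr.
try:
    ensure_text(b"Hello world!")
except TypeError as exc:
    error_message = str(exc)
```

Until something like this exists in the library, the same guard can be applied in user code before calling `parser.parse()`.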