Parsing bytes doesn't work #346

Open
gjask opened this issue Oct 15, 2024 · 1 comment

gjask commented Oct 15, 2024

In [1]: from tatsu import compile
   ...: 
   ...: parser = compile(r"""
   ...:     start       = text ;
   ...:     text        = { sentence }* ;
   ...:     sentence    = { word }+ { punctuation }* ;
   ...:     word        = /\w+/ ;
   ...:     punctuation = /[?!.,;\"']/ ;
   ...: """)
   ...: 

In [2]: parser.parse("Hello world!")
Out[2]: [(['Hello', 'world'], ['!'])]

In [3]: parser.parse(b"Hello world!")
Out[3]: [(['b'], ["'"]), (['Hello', 'world'], ['!', "'"])]

I hoped to be able to parse bytes instead of str because I am trying to use TatSu to parse a file format that uses ASCII for its data structure but can also contain strings in another encoding. That encoding is specified in the content of the file, so I can only bytes.decode() these parts after parsing which encoding is used.

I kind of expected to get a TypeError or similar here, but the actual behaviour is more interesting. It seems that the TatSu parser casts the input data to str, which yields the bytes.__repr__() return value instead of the contents.
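
The stray `b` and `'` tokens in Out[3] match what Python's built-in `str()` returns for a bytes object. A quick check that does not involve TatSu at all (that TatSu calls `str()` on its input is an assumption drawn from the output above, not something verified in its source):

```python
data = b"Hello world!"

# str() on bytes without an encoding argument returns the repr,
# including the b prefix and the quotes, not the decoded contents.
as_text = str(data)
print(as_text)  # b'Hello world!'

# Decoding explicitly yields the actual contents.
decoded = data.decode("ascii")
print(decoded)  # Hello world!
```

Parsing the first string with the grammar above would give a sentence `b` + `'` followed by `Hello world` + `!'`, which is exactly the shape of Out[3].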

Would it be possible to accept bytes as input for parsing? If not, I guess there should be some kind of type check on the input instead of a blind cast to str, which leads to parsing the Python representation of incompatible types instead of the data itself.
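
As a rough sketch of what such a check could look like on the caller's side, the helper below rejects anything that is not str or bytes and decodes bytes as latin-1 before the text is handed to the parser. The names `to_parser_text` and `recover_bytes` are hypothetical and not part of TatSu. latin-1 is chosen only because it maps every byte value 0-255 to the code point with the same number, so the original bytes of any parsed fragment can be recovered and decoded later with the encoding named in the file.

```python
def to_parser_text(data):
    """Hypothetical helper, not part of TatSu: normalise parser input to str."""
    if isinstance(data, str):
        return data
    if isinstance(data, (bytes, bytearray)):
        # latin-1 maps each byte 0-255 to the code point of the same value,
        # so decoding never fails and can be reversed exactly.
        return bytes(data).decode("latin-1")
    raise TypeError(f"expected str or bytes, got {type(data).__name__}")


def recover_bytes(fragment):
    """Turn a parsed str fragment back into the original bytes."""
    return fragment.encode("latin-1")


raw = "Hello wörld!".encode("utf-8")
text = to_parser_text(raw)           # this str would be passed to parser.parse()
assert recover_bytes(text) == raw    # the round trip is lossless
print(recover_bytes(text).decode("utf-8"))  # Hello wörld!
```

One caveat with this approach: patterns such as `\w` would then match against latin-1 code points rather than the real characters, so a grammar relying on it would probably need explicit ASCII character classes for the structural parts.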

Collaborator

apalala commented Oct 18, 2024

Let me think about this...

It may be solved by adding str() somewhere, but that would be a patch.

If someone could provide a pull request to deal with this, it would be ideal.
