argparse's FileType argument causes UnicodeDecodeError on CLI parsing of .msg files #304

Open
jeremybmerrill opened this issue Dec 30, 2022 · 0 comments

Describe the bug
When parsing .msg files in Python 3.9, I get UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte.

To Reproduce
Steps to reproduce the behavior:

  1. Install msg_parser under Python 3.9.
  2. Run msg_parser -i path_to_my_msgfile.msg -e .
  3. Observe the UnicodeDecodeError.

Expected behavior
No error!

Desktop:

  • OS: Mac OS X
  • Python: 3.9.13
  • Version: msg_parser @ d16260d

Additional context

I was able to fix the problem by removing the line at https://github.com/vikramarsid/msg_parser/blob/master/msg_parser/cli.py#L40; after that, everything worked fine. Evidently, argparse's FileType opens the file in text mode (decoded as UTF-8 on my system), but a .msg file is binary, not UTF-8 text. The problem can also be fixed by changing that line to open the file in binary mode: type=FileType(mode="rb")
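For illustration, here is a minimal sketch of the difference between the two modes. It is not msg_parser's actual code: `build_parser`, `read_header`, and the `demo` program name are hypothetical, and the sketch assumes the file starts with the OLE2 compound-file signature (`D0 CF 11 E0 ...`), which matches the `0xd0` byte at position 0 in the error above. The text-mode encoding is pinned to UTF-8 only to mirror the default on my system.

```python
import argparse
import os
import tempfile

# Signature bytes at the start of an OLE2 compound file (the container .msg uses).
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def build_parser(mode):
    """Hypothetical stand-in for the CLI parser; only the -i option matters here."""
    # Binary mode must not be given an encoding; text mode is pinned to UTF-8
    # (an assumption that mirrors the platform default in the report).
    encoding = None if "b" in mode else "utf-8"
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("-i", "--input", type=argparse.FileType(mode, encoding=encoding))
    return parser


def read_header(mode, path):
    """Parse ``-i path`` with the given FileType mode and read the first 8 units."""
    args = build_parser(mode).parse_args(["-i", path])
    with args.input as handle:
        return handle.read(8)


if __name__ == "__main__":
    # Write a small fake file that merely begins with the OLE2 signature.
    fd, path = tempfile.mkstemp(suffix=".msg")
    with os.fdopen(fd, "wb") as fake_msg:
        fake_msg.write(OLE_MAGIC + b"\x00" * 16)
    try:
        print(read_header("rb", path))  # raw bytes, no decoding attempted
        try:
            read_header("r", path)
        except UnicodeDecodeError as exc:
            print("text mode failed:", exc)
    finally:
        os.remove(path)
```

In this sketch the text-mode handle only fails once something reads from it, so where exactly the error surfaces in the real CLI depends on what the code does with the opened file.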

Happy to submit a PR, but I have no way to test whether the type=FileType() line is meant to do something in particular. As with #303, I cannot submit any test files because all of my .msg files are confidential.
