UnicodeDecodeError in Python3 #61

Open
JarryShaw opened this issue Nov 30, 2018 · 3 comments

Comments

@JarryShaw
Contributor

This is an issue with thrift (a dependency of this library); an issue has already been filed with that project.

Environment:

  • Operating System: Windows 10 Pro (Simplified Chinese)
  • Python Interpreter: Python 3.6.6
  • osquery Version: 3.3.0
  • osquery-python Version: 3.0.5

When querying, a UnicodeDecodeError is raised from thrift.compat.binary_to_str with the error message "'utf-8' codec can't decode byte 0xc3 in position 0: invalid continuation byte". This happens because the bin_val parameter is encoded as "gbk", not UTF-8.
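To illustrate the failure mode, here is a minimal sketch that does not involve thrift at all. The sample string is an assumption chosen for illustration, not data from the failing query; it only shows that GBK-encoded bytes are generally not valid UTF-8.

```python
# Illustration only: the sample text is an assumption, not output from osquery.
sample = "名称"
raw = sample.encode("gbk")  # bytes as a GBK-locale system might hand them over

try:
    raw.decode("utf-8")  # roughly what a UTF-8-only decoder would attempt
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

print(utf8_error)          # the UTF-8 decode failure
print(raw.decode("gbk"))   # decoding with the matching codec recovers the text
```

Decoding with the codec that actually produced the bytes works, which is why the error points at an encoding mismatch rather than corrupted data.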

Maybe try patching the thrift source code and shipping it as a vendored package in the distribution? (just as pipenv and other projects do)

@theopolis
Member

@JarryShaw, did you have a chance to follow up on the comments on the Thrift bug report?

@JarryShaw
Contributor Author

JarryShaw commented Aug 10, 2019

It's been quite a long time, and I've recently been trying to reproduce the issue. Btw, I just found two other issues 🤦‍♂ I'll make a pull request for one of them.

@JarryShaw
Contributor Author

JarryShaw commented Aug 10, 2019

Also, FYI, you can find the failed query at THRIFT-4677.

It appears to be linked to an internal Windows issue. Some of the Chinese text is encoded as UTF-8, such as in os_version, whilst some of it is encoded with the system legacy encoding (cp936/gbk/gb2312 in my case), for example in scheduled_tasks.
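Given the mixed encodings described above, one possible workaround is to try UTF-8 first and fall back to the legacy codec. The sketch below is only a heuristic under that assumption: `decode_with_fallback` is a hypothetical helper, not part of thrift or osquery-python, and the sample strings are made up for illustration.

```python
def decode_with_fallback(data, fallback="gbk"):
    """Hypothetical helper: strict UTF-8 first, then a legacy codec.

    The fallback codec is an assumption about the system locale
    (cp936/gbk in the case reported here).
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Heuristic: bytes that are not valid UTF-8 are assumed to be legacy-encoded.
        # errors="replace" keeps the call from raising on bytes neither codec accepts.
        return data.decode(fallback, errors="replace")


# Made-up sample values standing in for the two kinds of table output.
utf8_value = decode_with_fallback("操作系统".encode("utf-8"))
legacy_value = decode_with_fallback("计划任务".encode("gbk"))
print(utf8_value, legacy_value)
```

This cannot distinguish legacy bytes that happen to also be valid UTF-8, so it is a stopgap rather than a fix for the underlying inconsistency.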

Also, according to James, a Thrift contributor, "Thrift only handles strings as UTF8 internally." Maybe this is an issue with osquery's internal data schema, or a design flaw in Thrift.
