Error with Unicode encoding in Python 3 #283

Open
wzyboy opened this issue Feb 25, 2015 · 0 comments
wzyboy commented Feb 25, 2015

When running twitter-archiver under Python 3, non-ASCII characters in tweets are incorrectly saved as bytes literals (b'...') instead of as text:

$ head -1 archive.txt
510676XXXXXX3200 b'2014-09-13 14:28:05 CST <XXXXXX>  \xe4\xb8\x96\xe7\x95\x8c\xe6\x98\xaf\xe4\xbb\x8e 1970-01-01 \xe5\xbc\x80\xe5\xa7\x8b\xe7\x9a\x84\xe2\x80\xa6'
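A minimal reproduction of the symptom, assuming (the offending line is not quoted here) that the archiver encodes the tweet text to UTF-8 and then formats the resulting bytes into a str. Under Python 3, `%s` calls `str()` on a bytes object, which yields its repr, so the `b'...'` prefix and `\xNN` escapes end up in the archive. Under Python 2 the same expression produced raw UTF-8 bytes, which is why the problem only shows up on Python 3. The tweet text is the report's sample line decoded from its escapes.

```python
# Hypothetical reproduction (Python 3): assumes the archiver formats
# already-encoded bytes into a str line before writing it.
tweet_id = "510676XXXXXX3200"  # masked id, copied from the report
text = "2014-09-13 14:28:05 CST <XXXXXX>  世界是从 1970-01-01 开始的…"

encoded = text.encode("utf-8")        # str -> bytes
line = "%s %s" % (tweet_id, encoded)  # str() of bytes is its repr: b'...'
print(line)
```

The printed line has the same shape as the one reported by `head -1 archive.txt`.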

@rerox points out that this is caused by this line. When running under Python 3, the file should be opened in 'wb' mode instead of 'w'. A better approach is to use codecs.
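A minimal sketch of both fixes, assuming the archive is one `id text` line per tweet (inferred from the sample above). `write_archive_text` and `write_archive_binary` are hypothetical names, not twitter-archiver functions. The comment recommends `codecs`; this sketch swaps in `io.open`, which also takes an `encoding` argument, is available on Python 2.6+ and 3, and is the same object as the built-in `open` on Python 3.

```python
import io
import os
import tempfile


def write_archive_text(path, tweets):
    """Text mode with an explicit encoding (hypothetical helper).

    The tweet text stays unicode/str; the file object encodes it to UTF-8.
    """
    with io.open(path, "w", encoding="utf-8") as f:
        for tweet_id, text in sorted(tweets.items()):
            f.write(u"%s %s\n" % (tweet_id, text))


def write_archive_binary(path, tweets):
    """The 'wb' variant (hypothetical helper).

    Build the whole line as text first, then encode it once and write bytes,
    so no bytes object is ever formatted into a str.
    """
    with open(path, "wb") as f:
        for tweet_id, text in sorted(tweets.items()):
            f.write((u"%s %s\n" % (tweet_id, text)).encode("utf-8"))


if __name__ == "__main__":
    tweets = {"510676XXXXXX3200": u"世界是从 1970-01-01 开始的…"}
    path = os.path.join(tempfile.mkdtemp(), "archive.txt")
    write_archive_text(path, tweets)
    with io.open(path, encoding="utf-8") as f:
        print(f.readline())
```

Both variants keep the text as unicode until the file boundary. Text mode is usually the simpler choice because encoding happens in one place; the 'wb' variant works too, but every write must remember to encode.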

RouxRC added the CLI label on Feb 26, 2015