-
-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correct handling of Unicode in py2 #14
Comments
This fixes the lacking unicode support in the default python csv module, by replacing it with the unicodecsv module instead of adding potential buggy code. Fixes ickc#14.
A quick look into the unicodecsv's README seems to suggest backport.csv is a better fit here. I mostly use python 3 (and code in python 3 while pasteurized into python 2). So a backport of csv to python 2 seems better. |
I'm thinking if we should avoid the extra dependency for Python 3 users. Do you know a quick and clean way to do so? |
I don't have any preferences for which module. I just ended up not being able to load in my .csv files when they contained Danish language instead of English, which I had primarily used this filter for in the past. Well we could obviously do a conditional import of the different modules, but that doesn't take care of the dependency in I thought the py3 csv module had the same issues as the py2, but it seems that it was only an issue in py3.0, and it was fixed in 3.1. It seems that the backport.csv module is a pure python implementation, meaning it is slow (according to the author), However I don't know how bit of a deal this is in practice. I could try and make a proposal using this and the io.open() as changes instead? |
Dealing with this in setup.py actually seems quite easy. Environment Markers (PEP 508) is designed for this. However there seems to be some fuzz about old versions of setuptools. Instead of specifying the environment markers in install_requires, it should be set in extra_requires as this is supported since setuptools 18 (e.g., here) It seems you need version 20.6.8 (May 2016) for support to be fully functional in install_requires. |
See reenberg@7e3fa94 for an initial test at using environment markers. It works like a charm. And I don't think it is unreasonable to depend on a fairly new version of setuptools. |
@ickc And with the added backports.csv module instead for py2 (reenberg@bb13cd2). It Actually simplified the code a bit, as there wasn't a need for differentiating between io.BytesIO and io.StringIO any more. However the tests seems to fail on python 2, as some of the parameters is unicode instead of str:
However this is an issue in the master branch as well, so doesn't seem related to what i have changed. See full log: pantable.test.txt Should I make a PR for this, or do you see anything that needs changing? It has minimum impact on py3 as requested. |
Actually I can see that something changed, since the test_read_data now also fails for Is there a particular reason why this is done so? Can there be any valid ways to actually sneak a bool or anything else than a string into this variable? It comes from the yaml meta-data block. |
Also see #21 |
I'm not a py3 coder, so this is regarding py2.
When trying to include a .csv file which is utf-8 encoded, the filter fails miserably as panflute expects unicode data (it calls
text.encode('utf-8')
on line 338 of panflute/tools.py)You are using the built in
csv
module, which clearly states in the docs, that "The csv module doesn’t directly support reading and writing Unicode [...]". Aka, you should convert the read data to unicode before returning it asraw_table_list
inread_data()
at line 197 of pantable.py.Or perhaps more preferrable, just use the unicodecsv module, which claims to be a "drop-in replacement for Python 2’s csv module which supports unicode strings without a hassle."
I tested unicodecsv in a local checkout, and it seems to work flawlessly. I just added
import unicodecsv as csv
, not caring about anything than py2 :)The text was updated successfully, but these errors were encountered: