Correct handling of Unicode in py2 #14

reenberg · 2017-07-19T12:06:33Z

I'm not a py3 coder, so this is regarding py2.

When trying to include a .csv file which is utf-8 encoded, the filter fails because panflute expects unicode data (it calls text.encode('utf-8') on line 338 of panflute/tools.py).

You are using the built-in csv module, whose docs clearly state that "The csv module doesn’t directly support reading and writing Unicode [...]". In other words, you should convert the read data to unicode before returning it as raw_table_list in read_data() at line 197 of pantable.py.
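To illustrate the conversion suggested above, here is a minimal sketch. The helper name decode_rows and the sample data are made up for illustration and are not part of pantable; it only assumes that the py2 csv module hands back utf-8 encoded byte strings.

```python
def decode_rows(rows, encoding="utf-8"):
    """Return a copy of rows with every byte-string cell decoded to text.

    Hypothetical helper: cells that are already text are left untouched.
    """
    return [
        [cell.decode(encoding) if isinstance(cell, bytes) else cell
         for cell in row]
        for row in rows
    ]


# Roughly what the py2 csv module yields for a utf-8 encoded file.
raw_table_list = [[b"navn", b"by"], [b"S\xc3\xb8ren", b"K\xc3\xb8benhavn"]]
decoded = decode_rows(raw_table_list)
```

After this step every cell is unicode text, so a later text.encode('utf-8') call should no longer choke on non-ASCII bytes.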

Or, perhaps preferably, just use the unicodecsv module, which claims to be a "drop-in replacement for Python 2’s csv module which supports unicode strings without a hassle."

I tested unicodecsv in a local checkout, and it seems to work flawlessly. I just added import unicodecsv as csv, not caring about anything other than py2 :)

This fixes the lack of unicode support in the default Python csv module by replacing it with the unicodecsv module instead of adding potentially buggy code. Fixes ickc#14.

ickc · 2017-08-01T05:02:59Z

A quick look at unicodecsv's README seems to suggest that backports.csv is a better fit here. I mostly use Python 3 (I code in Python 3 and the code is pasteurized into Python 2), so a backport of csv to Python 2 seems better.

ickc · 2017-08-01T05:05:51Z

I'm wondering whether we should avoid the extra dependency for Python 3 users. Do you know a quick and clean way to do so?

reenberg · 2017-08-01T11:15:16Z

I don't have a preference for either module. I just ended up not being able to load my .csv files when they contained Danish text instead of English, which is what I had primarily used this filter for in the past.

Well, we could obviously do a conditional import of the different modules, but that doesn't take care of the dependency in setup.py. I don't have a solution off the top of my head for doing conditional dependencies in there.

I thought the py3 csv module had the same issues as the py2, but it seems that it was only an issue in py3.0, and it was fixed in 3.1.

It seems that the backports.csv module is a pure Python implementation, meaning it is slow (according to the author). However, I don't know how big a deal this is in practice.

I could try to make a proposal using this and io.open() as the changes instead?
I'm also trying to figure out how to deal with setup.py. Alternatively, we could just try to import it: if it works, great; if not, just use the default csv module. This way, if the user installs the module herself, it will work, and the setup.py file wouldn't pollute py3 installations.
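The "try to import it, otherwise fall back" idea could look roughly like the sketch below. It assumes that backports.csv exposes the same reader interface as the standard csv module; the sample data is made up.

```python
import io

try:
    # Only available if the user installed backports.csv herself.
    from backports import csv
except ImportError:
    # Fall back to the standard library module.
    import csv

# Text input parses the same way with either module on Python 3.
sample = io.StringIO(u"navn,by\nS\u00f8ren,K\u00f8benhavn\n")
rows = list(csv.reader(sample))
```

On py2 the fallback branch would still have the original unicode problem, so this only helps users who actually have the backport installed.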

reenberg · 2017-08-01T11:45:56Z

Dealing with this in setup.py actually seems quite easy. Environment markers (PEP 508) are designed for this.

However, there seems to be some fuss about old versions of setuptools. Instead of specifying the environment markers in install_requires, they should be set in extras_require, as this has been supported since setuptools 18 (e.g., here).

It seems you need version 20.6.8 (May 2016) for support to be fully functional in install_requires.
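As a rough sketch of the two spellings discussed above, the hypothetical helper below builds the keyword arguments one might pass to setuptools.setup(). The function name is made up, and the exact marker and package list are assumptions for illustration, not the contents of pantable's setup.py.

```python
PY2_ONLY = 'python_version < "3"'  # PEP 508 environment marker


def dependency_kwargs(old_setuptools=True):
    """Return setup() keyword arguments for a py2-only dependency.

    Hypothetical helper, not taken from pantable.
    """
    if old_setuptools:
        # An extras_require key that starts with ':' carries only a marker,
        # so the dependency is installed whenever the marker matches.
        return {"extras_require": {":" + PY2_ONLY: ["backports.csv"]}}
    # Newer setuptools also accepts the marker inline in install_requires.
    return {"install_requires": ["backports.csv; " + PY2_ONLY]}


old_style = dependency_kwargs(old_setuptools=True)
new_style = dependency_kwargs(old_setuptools=False)
```

Either dictionary would be merged into the setup(...) call, so Python 3 installations never pull in the backport.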

reenberg · 2017-08-01T14:39:36Z

See reenberg@7e3fa94 for an initial test of using environment markers. It works like a charm, and I don't think it is unreasonable to depend on a fairly new version of setuptools.

reenberg · 2017-08-01T23:06:27Z

@ickc And here it is with the backports.csv module added for py2 instead (reenberg@bb13cd2).

It actually simplified the code a bit, as there was no longer a need to differentiate between io.BytesIO and io.StringIO.
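A sketch of why the distinction disappears: once the csv reader works on text, both an included file and inline table text can be handled as unicode. The function read_rows below is a hypothetical stand-in, not pantable's read_data, and the plain import csv assumes Python 3 (on py2 it would be the backport).

```python
import csv  # on py2 this would be `from backports import csv` (assumption)
import io
import os
import tempfile


def read_rows(include=None, text=u""):
    """Read CSV rows from a utf-8 file if given, else from inline text."""
    if include:
        # io.open decodes to text on both py2 and py3.
        with io.open(include, encoding="utf-8", newline="") as f:
            return list(csv.reader(f))
    return list(csv.reader(io.StringIO(text)))


# Small demonstration with a temporary utf-8 file.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with io.open(path, "w", encoding="utf-8", newline="") as f:
    f.write(u"navn,by\nS\u00f8ren,K\u00f8benhavn\n")

from_file = read_rows(include=path)
from_text = read_rows(text=u"a,b\n1,2\n")
```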

However, the tests seem to fail on Python 2, as some of the parameters are unicode instead of str:

- [...] TableRow(TableCell(Para(Math(E=mc^2; format=u'InlineMath'))) [...]
?
+ [...] TableRow(TableCell(Para(Math(E=mc^2; format='InlineMath'))) [...]

However, this is an issue in the master branch as well, so it doesn't seem related to what I have changed.

See full log: pantable.test.txt

Should I make a PR for this, or do you see anything that needs changing? It has minimal impact on py3, as requested.

reenberg · 2017-08-01T23:51:30Z

Actually, I can see that something did change: test_read_data now also fails on assert read_data(True, '') is None in py3, which it doesn't on master, because I removed the str() call inside io.open(str(include)).

Is there a particular reason why it is done this way? Is there any valid way to actually sneak a bool, or anything other than a string, into this variable? It comes from the yaml meta-data block.
You would have to pass something other than a string here, for example a list, but calling str() on that would still yield something that is not useful.
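To make the point concrete, the sketch below shows what str() produces for a few non-string values, next to a stricter check. The helper is_usable_include is hypothetical and not part of pantable.

```python
# Values that could, in principle, come out of a yaml meta-data block.
candidates = [True, ["a.csv", "b.csv"], None, "data.csv"]

# str() never fails, but the result is rarely a meaningful file name.
as_paths = [str(value) for value in candidates]


def is_usable_include(value):
    """Hypothetical check: accept only non-empty strings as file names."""
    return isinstance(value, str) and bool(value)


usable = [value for value in candidates if is_usable_include(value)]
```

Only "data.csv" survives the stricter check; the other values become names like "True" or "['a.csv', 'b.csv']" once passed through str().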

ickc · 2017-08-02T20:40:40Z

Also see #21

ickc · 2017-08-13T04:54:17Z

@reenberg, please check whether pantable v0.11 in #25 fixes your problem. Thanks.

reenberg mentioned this issue Aug 1, 2017

Replaced the csv module with unicodecsv #15

Closed

ickc mentioned this issue Aug 2, 2017

Use another CSV parser? #21

Closed

ickc added the CSV parser label Aug 2, 2017

ickc mentioned this issue Aug 13, 2017

Add unicode support for py2 #25

Merged

ickc closed this as completed Aug 13, 2017
