joinp --infer-len=0 data type issue #2405

Open
jqnatividad opened this issue Jan 3, 2025 Discussed in #2402 · 5 comments

Comments

@jqnatividad
Collaborator

Discussed in #2402

Originally posted by datatraveller1 January 2, 2025
With the qsv joinp --infer-len=0 command, some data types of the output file (-o joined.csv) seem to get changed.
E.g. there is a column (not part of the join columns) PT with original values 0101. The output column PT values get changed to 101. However, I don't want the output column content to be changed.
@jqnatividad If you need more information, I'll write an issue report tomorrow or this weekend at the latest. Thank you!

@jqnatividad
Collaborator Author

jqnatividad commented Jan 3, 2025

Thanks for the report @datatraveller1.

Can you add some additional context and a minimal reproducible example?

v2.0.0 release is imminent, so it might be fixed by upstream changes in Polars.

I also fine-tuned the usage text to show the interplay between the --infer-len and --cache-schema options.

382c6e0

jqnatividad added a commit that referenced this issue Jan 3, 2025
…sables Polars schema inferencing/caching

see #2405
@datatraveller1

Hi @jqnatividad,

ok, here is a little example:

file a.csv

art_no,PG,PT
1,104,0101
2,104,0101
3,104,0101

file b.csv

art_no,PG2,PT2
1,105,0101
3,105,0101
4,105,0101

command:
qsv joinp --infer-len=0 art_no a.csv art_no b.csv -o joined.csv

result joined.csv:

art_no,PG,PT,PG2,PT2
1,104,101,105,101
3,104,101,105,101

I want PT and PT2 to keep their original value 0101 instead of being changed to 101.

@datatraveller1

BTW, without --infer-len=0 (qsv joinp art_no a.csv art_no b.csv -o joined.csv), the result is the same.

@datatraveller1

datatraveller1 commented Jan 6, 2025

BTW, with join instead of joinp (qsv join art_no a.csv art_no b.csv -o joined.csv), the result looks correct:

art_no,PG,PT,art_no,PG2,PT2
1,104,0101,1,105,0101
3,104,0101,3,105,0101
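
The output above is what one would expect from a join that treats every field as plain text and never infers a type. The following is a minimal Python sketch of that idea using only the standard library. It is not qsv's implementation; `inner_join_text` is a hypothetical helper written for illustration, and the two CSV strings are copied from the example files a.csv and b.csv.

```python
import csv
import io

# Contents of a.csv and b.csv from the example above.
A_CSV = "art_no,PG,PT\n1,104,0101\n2,104,0101\n3,104,0101\n"
B_CSV = "art_no,PG2,PT2\n1,105,0101\n3,105,0101\n4,105,0101\n"


def inner_join_text(left_text, right_text, key):
    """Inner-join two CSV strings on `key`, keeping every field as a string."""
    left = list(csv.reader(io.StringIO(left_text)))
    right = list(csv.reader(io.StringIO(right_text)))
    lkey = left[0].index(key)
    rkey = right[0].index(key)

    # Index the right-hand rows by their key value (still a string).
    index = {}
    for row in right[1:]:
        index.setdefault(row[rkey], []).append(row)

    # Emit the combined header, then one row per matching pair.
    out = [left[0] + right[0]]
    for row in left[1:]:
        for match in index.get(row[lkey], []):
            out.append(row + match)
    return out


joined = inner_join_text(A_CSV, B_CSV, "art_no")
for row in joined:
    print(",".join(row))
```

Because no value is ever converted to a number, 0101 passes through unchanged in both PT and PT2.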

@jqnatividad
Collaborator Author

@datatraveller1, thanks for providing the reproducible test.

I can confirm I can replicate your report, and further, that it's still the same behavior in the 2.0.0 pre-release (https://github.com/dathere/qsv/releases/tag/2.0.0).

I think it might be an issue on the Polars end. My hypothesis is that it's inferring the column as an integer and stripping the leading zero, but I'm still in the middle of verifying it.
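
To illustrate the hypothesis (not to confirm it), here is a small Python sketch of how integer inference followed by writing the value back as text would drop a leading zero. It uses no Polars code; `infer_and_cast` is a hypothetical stand-in for whatever schema inference actually happens, and its "all values parse as integers" rule is an assumption made for the example.

```python
def infer_and_cast(values):
    """Cast a column to int if every value parses as an integer, else keep strings.

    Hypothetical stand-in for schema inference; not Polars' actual logic.
    """
    try:
        return [int(v) for v in values]
    except ValueError:
        return list(values)


# The PT column from a.csv, as read from the file (text).
pt_column = ["0101", "0101", "0101"]

# Step 1: inference decides the column is an integer column.
cast = infer_and_cast(pt_column)

# Step 2: writing the CSV serializes the integers back to text.
written = [str(v) for v in cast]
print(written)
```

Once the value is held as the integer 101, the original text form 0101 is no longer recoverable at write time, which would explain why the output matches regardless of the inference length if the column is still being typed as an integer.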
