csv-parser does not parse a big CSV file correctly; after ~95K rows it begins merging all rows into a single JSON object. #207
Comments
I'm seeing something like this too with the authors dump on https://openlibrary.org/developers/dumps. However, I don't think it's failing after N rows. Rather, there seems to be a bug with quote/end-of-line detection: it will produce a row that contains hundreds of concatenated rows in the final column, go back to parsing rows correctly, and then emit another long concatenated row many more times, back and forth. This code will demonstrate the issue on https://openlibrary.org/data/ol_dump_authors_latest.txt.gz (0.4 GB):
This code will reveal many problem rows that accidentally concatenate following rows into the final column.
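The demonstration code itself was not preserved in this copy of the thread. As a rough stand-in, a post-parse check along the following lines can flag such rows; the helper name and the newline threshold are my own, not part of csv-parser's API:

```javascript
// Hypothetical detector: given rows already parsed into plain objects,
// flag any row whose final column looks like it swallowed neighbouring
// records, i.e. it contains several embedded newlines.
function findSuspectRows(rows, maxEmbeddedNewlines = 2) {
  const suspects = [];
  rows.forEach((row, index) => {
    const values = Object.values(row);
    const last = String(values[values.length - 1] ?? '');
    const newlines = (last.match(/\n/g) || []).length;
    if (newlines > maxEmbeddedNewlines) {
      suspects.push({ index, newlines });
    }
  });
  return suspects;
}
```

One would collect rows from the parser's `data` events into an array (or run the check per row inside the event handler) and inspect any indices this returns.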
I notice that it happens on any row that has an escaped quote.
I also hit this bug, somewhere around line 2.7M in the following data set: https://ridb.recreation.gov/downloads/reservations2022.zip. Switching to papaparse worked on the same file.
Operating System: CentOS 7
Node Version: 16.13.0
csv-parser Version: v3.0.0
Expected Behavior
167K rows parsed
Actual Behavior
~95K rows parsed
How Do We Reproduce?
https://edbq.xyz/test/Freight3.csv