Cannot read gz files larger than 4 GB #1553

Open
johan-gson opened this issue Aug 24, 2024 · 0 comments

Hi,

It seems that read_tsv cannot fully read gzip-compressed files whose uncompressed size exceeds 4 GB. I did some calculations, and it appears to stop after roughly 4 GB of uncompressed data, so the last ~150k lines were not read. I suspect there is an unsigned 32-bit integer somewhere in the code, but of course I cannot be sure. This was on Windows, and I think the problem affects any huge gzipped file.

It didn't crash or raise an error; it just returned too few lines. If I decompressed the file first, I could read all of it without a problem, and I don't think I ran out of memory either. It is hard to provide an example because I cannot share the file, but I would guess any large enough file will reproduce it.
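One candidate explanation, not confirmed anywhere in this issue: the gzip format (RFC 1952) stores the uncompressed size in a 4-byte trailer field, ISIZE, only modulo 2^32. A reader that relied on that field, or on any unsigned 32-bit byte counter, would stop near 4 GiB without raising an error. The Python sketch below is only a reproduction aid under that assumption. It writes a synthetic gzipped TSV, streams it to get the true uncompressed size and line count, and reads the ISIZE trailer for comparison. The helper names `write_tsv_gz`, `stream_stats` and `gzip_isize` are hypothetical and not part of read_tsv or any R package.

```python
import gzip
import os
import struct

# 2**32 bytes = 4 GiB. An unsigned 32-bit counter cannot represent this
# value and wraps back to 0 when it is reached.
U32_RANGE = 2 ** 32


def write_tsv_gz(path, n_rows):
    """Hypothetical helper: write a gzipped TSV with a header and n_rows rows."""
    # newline="" keeps "\n" line endings on Windows instead of "\r\n".
    with gzip.open(path, "wt", newline="") as f:
        f.write("id\tvalue\n")
        for i in range(n_rows):
            f.write(f"{i}\t{i * 2}\n")


def stream_stats(path, chunk_size=1 << 20):
    """Decompress the whole file as a stream; return (uncompressed bytes, newlines)."""
    total = 0
    newlines = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            newlines += chunk.count(b"\n")
    return total, newlines


def gzip_isize(path):
    """Return the ISIZE trailer field: uncompressed size modulo 2**32.

    It is the last 4 bytes of the file, little-endian. For a single-member
    file larger than 4 GiB it is smaller than the true size; for a
    multi-member file it only describes the last member.
    """
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)
        (isize,) = struct.unpack("<I", f.read(4))
    return isize
```

For example, `write_tsv_gz("big.tsv.gz", 300_000_000)` should give an uncompressed size above 4 GiB with these short rows (a rough estimate, not measured). `stream_stats("big.tsv.gz")` then returns the true byte count and the number of lines, including the header, to compare against the number of rows read_tsv reports. If the streamed size is at least `U32_RANGE` and `gzip_isize` returns a much smaller value, the trailer has wrapped.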
