allow INT64 values as string #41

Open
jeffbski-rga opened this issue Apr 3, 2020 · 0 comments

In older versions of parquetjs/parquetjs-lite it was acceptable to pass values to parquetTransform as strings. This worked well for converting CSV to Parquet, since every value arrives as a string.

Unfortunately, in the new version of parquetjs-lite, providing strings for an INT64 column logs an assertion failure saying that the number is not an integer. I couldn't see where this occurs in the code, so it might be in a dependency. The workaround is to convert the string to a number or int64 before passing the value to parquetTransform. This is not ideal, since the preprocessing code must then know the type of each value and convert the INT64 ones (all the others are fine as strings). The schema already has this information, so it would be nice for the conversion to fall squarely on parquetTransform instead of being handled in multiple places.
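A minimal sketch of the kind of preprocessing step the workaround requires, for illustration only. The `fields` object and the `coerceInt64Strings` helper are hypothetical names, not part of parquetjs-lite; the field definitions are assumed to use the `{ type: 'INT64' }` shape. Values outside the safe integer range are left as strings here, because converting them to a JS number would lose precision.

```javascript
// Hypothetical field definitions (an assumption for this sketch).
const fields = {
  name: { type: 'UTF8' },
  quantity: { type: 'INT64' },
  price: { type: 'DOUBLE' },
};

// Return a copy of `row` in which string values of INT64 columns are
// converted to numbers. All other columns are passed through unchanged.
function coerceInt64Strings(row, fieldDefs) {
  const out = { ...row };
  for (const [key, def] of Object.entries(fieldDefs)) {
    const value = out[key];
    if (def.type !== 'INT64' || typeof value !== 'string') continue;

    const text = value.trim();
    if (!/^-?\d+$/.test(text)) {
      throw new TypeError(`Column "${key}" is not an integer: "${value}"`);
    }

    const n = Number(text);
    // Only convert when the value is exactly representable as a JS number;
    // larger values stay as strings so no digits are silently changed.
    if (Number.isSafeInteger(n)) {
      out[key] = n;
    }
  }
  return out;
}

const row = coerceInt64Strings({ name: 'widget', quantity: '42', price: '1.5' }, fields);
console.log(row.quantity, typeof row.quantity);
```

The point of the sketch is that the caller has to repeat type knowledge that the schema already holds, which is the duplication this issue asks to avoid.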

Converting CSV to parquet is a popular use case with the cloud now, so it would be nice to make this as easy as possible.

If we could once again allow strings for INT64, then no transform would be needed for the common types (UTF8, Double, boolean, INT64). A further advantage of allowing strings is that they can carry values larger than a JS number can represent exactly, even on older platforms that don't yet support BigInt.
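To illustrate the precision point, here is a small sketch in plain JavaScript (no parquetjs-lite calls). It checks whether an integer string survives a round trip through a JS number. The helper name `survivesNumberRoundTrip` is hypothetical.

```javascript
// True when converting the decimal string to a JS number and back
// yields the same digits, i.e. no precision was lost.
function survivesNumberRoundTrip(text) {
  return String(Number(text)) === text;
}

console.log(survivesNumberRoundTrip('42'));               // true
console.log(survivesNumberRoundTrip('9007199254740993')); // false: 2^53 + 1 rounds to 2^53
```

A value such as 2^53 + 1 fits comfortably in an INT64 column but not in a JS number, so a string is the only lossless carrier on platforms without BigInt.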

I would be happy to put in a PR for this if I knew what needed to change, but I couldn't see where the assertion comes from. The Parquet file seemed to be created correctly in either case, so I think we just need to silence the assertion.
