In older versions of parquetjs/parquetjs-lite it was acceptable to send values to parquetTransform as strings. This worked well for converting CSV to parquet, since every value arrives as a string.
Unfortunately, in the new version of parquetjs-lite, providing strings for an INT64 column logs an "assertion failed" message saying the number is not an integer. I couldn't see where this occurs in the code, so it might come from a dependency. The workaround is to convert the string to a number or int64 before passing the value to parquetTransform. This is not ideal, because the preprocessing code must then know the type of each value and adjust the ones that are INT64 (all the others are fine as strings). The schema already has this information, so it would be nice for the conversion to fall squarely on parquetTransform rather than being handled in multiple places.
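For anyone hitting the same thing, the workaround described above can be sketched roughly as follows. This is only an illustration under assumptions: `coerceInt64Columns` and the `columnTypes` map are hypothetical names of my own, not part of parquetjs-lite, and whether the writer accepts a `BigInt` for out-of-range values may depend on the version in use.

```javascript
// Hypothetical helper, not part of parquetjs-lite: convert CSV string values
// for INT64 columns before the row is handed to the writer.
// `columnTypes` is an assumed plain object mapping column name -> type name,
// mirroring the types declared in the schema.
function coerceInt64Columns(row, columnTypes) {
  const out = { ...row };
  for (const [name, type] of Object.entries(columnTypes)) {
    const value = out[name];
    // Only INT64 columns that arrived as strings need conversion.
    if (type !== 'INT64' || typeof value !== 'string') continue;

    const trimmed = value.trim();
    if (!/^-?\d+$/.test(trimmed)) {
      throw new TypeError(`Column "${name}": "${value}" is not an integer`);
    }

    const asNumber = Number(trimmed);
    // Use a plain number when it is exactly representable; otherwise
    // fall back to BigInt so that no digits are lost.
    out[name] = Number.isSafeInteger(asNumber) ? asNumber : BigInt(trimmed);
  }
  return out;
}

// Example row as it might come out of a CSV parser (all strings).
const row = { id: '42', name: 'widget', total: '9007199254740993' };
const types = { id: 'INT64', name: 'UTF8', total: 'INT64' };
const converted = coerceInt64Columns(row, types);
```

The downside is exactly the one described above: the type map duplicates information the schema already holds.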
Converting CSV to parquet is a popular use case with the cloud now, so it would be nice to make this as easy as possible.
If we could once again allow strings for INT64, then no transform would be needed for the common types (UTF8, DOUBLE, BOOLEAN, INT64). The nice thing about allowing strings is that they can carry values larger than a JS number can represent exactly, even on older platforms that don't yet support BigInt.
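To illustrate the point about large values, here is a small sketch (the function name `losesPrecisionAsNumber` is my own, purely for illustration) that checks whether a canonical integer string survives a round trip through a JS number:

```javascript
// Returns true when converting a canonical integer string (no leading zeros,
// no whitespace) to a JS number and back changes the digits.
function losesPrecisionAsNumber(intString) {
  return String(Number(intString)) !== intString;
}

const small = losesPrecisionAsNumber('42');               // fits exactly
const large = losesPrecisionAsNumber('9007199254740993'); // 2^53 + 1, does not fit
```

A value like the second one is well within INT64 range but cannot be passed as a plain number without silently changing it, which is why keeping it as a string is attractive.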
I would be happy to put in a PR for this if I knew what needed to change, but I couldn't see where the assertion was coming from. The parquet file still seemed to be created correctly in either case, so I think we just need to silence the assertion.