I have a Parquet file with 10k records. It has 7 string columns and 1 DOUBLE column. When I read this file and convert the records into a SQL query (batch insert), I noticed that from some point in the file onward, the reader starts returning a different value for the DOUBLE column. My iteration code is very simple.
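For illustration only, a loop of this kind might turn the records into a single multi-row `INSERT`. The sketch below is not the reporter's code. The table name `readings` and the column `meterId` are hypothetical, and the records are hard-coded plain objects standing in for what the parquetjs-lite cursor (`cursor.next()`) returns.

```javascript
// Hypothetical sketch, not the reporter's code: build one multi-row INSERT
// statement from plain record objects. Table and column names are made up.

// Encode a JavaScript value as an SQL literal.
function sqlLiteral(value) {
  if (value === null || value === undefined) return "NULL";
  if (typeof value === "number") {
    // NaN and Infinity have no portable SQL literal.
    if (!Number.isFinite(value)) throw new Error(`Cannot encode ${value} as SQL`);
    // String() yields the shortest decimal that round-trips to the same double.
    return String(value);
  }
  // Standard SQL escaping: double every single quote inside a string literal.
  return "'" + String(value).replace(/'/g, "''") + "'";
}

// Build "INSERT INTO t (a, b) VALUES (...), (...);" for all records at once.
function buildBatchInsert(table, columns, records) {
  if (records.length === 0) throw new Error("No records to insert");
  const rows = records.map(
    (record) => "(" + columns.map((column) => sqlLiteral(record[column])).join(", ") + ")"
  );
  return `INSERT INTO ${table} (${columns.join(", ")}) VALUES ${rows.join(", ")};`;
}

// Stand-ins for records read from the Parquet file; readValue is the DOUBLE column.
const sampleRecords = [
  { meterId: "A-1", readValue: 10.5 },
  { meterId: "O'Brien", readValue: 99.25 },
];
console.log(buildBatchInsert("readings", ["meterId", "readValue"], sampleRecords));
```

Because `String()` emits a round-tripping decimal, the SQL formatting step should not itself alter a DOUBLE value. Building SQL text by hand is only reasonable for a quick check like this; a production import would normally use the database driver's parameterized queries.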
`record.readValue` is the DOUBLE column. The Parquet file was written with `parquet-mr version 1.10.1`. I couldn't find a clear pattern in the wrong values. Here is a screenshot of a diff between the output for the same Parquet file read with a different reader and with the parquetjs-lite reader.
When the same value starts repeating in the actual data, the parquetjs-lite reader starts returning a different value (1542.3070...) instead of the correct one. That value is not actually random: it is one of the values from the file, but from a different index (somewhere in the previous rows).
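To pin down where the outputs diverge, one could compare the DOUBLE column from both readers row by row. The helper below is hypothetical (`findMismatches` is not part of parquetjs-lite). It reports each mismatching row together with the earlier rows of the reference output where the wrong value legitimately appears, which is a quick way to confirm that the wrong value comes from another index. It assumes both columns have already been collected into arrays, e.g. by gathering `record.readValue` from each reader; the numbers in the example are made up.

```javascript
// Hypothetical helper (not part of parquetjs-lite): compare the DOUBLE column
// as read by a trusted reader (`expected`) and by parquetjs-lite (`actual`).
function findMismatches(expected, actual) {
  if (expected.length !== actual.length) {
    throw new Error(`Row count differs: ${expected.length} vs ${actual.length}`);
  }
  // Maps each value seen so far in `expected` to the row indices where it appeared.
  const seen = new Map();
  const mismatches = [];
  for (let i = 0; i < expected.length; i++) {
    // Object.is treats NaN as equal to NaN (and distinguishes 0 from -0).
    if (!Object.is(expected[i], actual[i])) {
      mismatches.push({
        row: i,
        expected: expected[i],
        actual: actual[i],
        // Earlier rows whose correct value equals the wrong value returned here.
        earlierRows: [...(seen.get(actual[i]) ?? [])],
      });
    }
    const rows = seen.get(expected[i]);
    if (rows) rows.push(i);
    else seen.set(expected[i], [i]);
  }
  return mismatches;
}

// Made-up example: row 1's value reappears once a value starts repeating.
const reference = [10.5, 99.25, 7.25, 7.25, 7.25];
const fromParquetjs = [10.5, 99.25, 7.25, 99.25, 99.25];
console.log(findMismatches(reference, fromParquetjs));
```

The `Map` keeps the scan linear in the number of rows, so every mismatch can be traced back without rescanning earlier rows.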
I hope I explained the issue clearly. I have spent the last 12 hours debugging this but haven't found a clear cause yet. My only hunch is that it is related to repetition levels, but I can't confirm it. This is currently affecting our production system, and I have even started rewriting this function in Python just because of it. I hope this can be addressed properly so I can go back to JavaScript.
muratcorlu changed the title from "Wrong value red randomly from a DOUBLE column" to "Wrong value from a DOUBLE column with repeating values" on Aug 3, 2020.