Wrong value from a DOUBLE column with repeating values #48

Open
muratcorlu opened this issue Aug 1, 2020 · 1 comment


muratcorlu commented Aug 1, 2020

I have a Parquet file with 10k records in it. It has 7 columns that are strings and 1 that is a DOUBLE. When I read this file and convert the records to a SQL query (batch insert), I noticed that from some point in the file onward the reader returns a different value for this DOUBLE column. My iteration code is very simple:

    while (record = await cursor.next()) {
      count++;
      if (queryData) {
        queryData += ',';
      }
      queryData += `("${record.someId}","${record.someId2}","${record.someId3}","${record.someId4}","${record.readDate}",${record.readValue},"${record.unit}")`;
    }

record.readValue is the DOUBLE column. The Parquet file was written with parquet-mr version 1.10.1. I couldn't find a clear pattern in which values come out wrong. Here is a screenshot of a diff between the output of a different reader and the output of the parquetjs-lite reader for the same Parquet file:

[screenshot: diff of the two readers' output]
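To produce a diff like this one, it can help to pull only the DOUBLE column out of each reader into an array. The following is a minimal sketch, assuming only that the cursor exposes an async `next()` that resolves to a record, or to a falsy value at the end of the data, as in the loop above. `collectColumn` is a hypothetical helper name, not part of parquetjs-lite.

```javascript
// Hypothetical helper: gather one column from a cursor into an array.
// Assumes cursor.next() resolves to a record object, or to a falsy
// value once the data is exhausted (the same contract the loop above uses).
async function collectColumn(cursor, columnName) {
  const values = [];
  let record;
  while ((record = await cursor.next())) {
    values.push(record[columnName]);
  }
  return values;
}
```

Calling `collectColumn(cursor, 'readValue')` and writing one value per line to a file gives output that can be diffed directly against the same column dumped by another reader.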

When the same value starts repeating in the actual data, the parquetjs-lite reader starts returning a different value (1542.3070...) than the correct one. That value is not actually "random": it is one of the values from the document, but from another index (somewhere in the previous rows).
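The pattern described here (a wrong value that matches the correct value of some earlier row) can be checked mechanically. This is a rough sketch, assuming the column has already been read into two plain arrays, one from a trusted reader and one from the reader under test. `findFirstMismatch` is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper: locate the first row where two readers disagree.
// `expected` comes from a trusted reader, `actual` from the reader under test.
function findFirstMismatch(expected, actual) {
  const length = Math.min(expected.length, actual.length);
  for (let i = 0; i < length; i++) {
    // Object.is treats NaN as equal to NaN, which suits DOUBLE columns.
    if (!Object.is(expected[i], actual[i])) {
      return {
        index: i,
        expected: expected[i],
        actual: actual[i],
        // Last earlier row whose correct value equals the wrong value, or -1.
        earlierIndex: expected.slice(0, i).lastIndexOf(actual[i]),
      };
    }
  }
  return null; // no mismatch in the overlapping range
}
```

A non-negative `earlierIndex` indicates that the wrong value was taken from a previous row rather than being corrupted data, which matches the behavior in the screenshot.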

I hope I have explained the issue clearly. I have spent the last 12 hours debugging this problem but haven't found a clear cause yet. My only suspicion is that it has something to do with repetition levels, but I cannot confirm that. This is currently an issue in our production system. I have even started rewriting this function in Python because of it. I hope this can be addressed properly so that I can return to JavaScript.

muratcorlu changed the title from "Wrong value red randomly from a DOUBLE column" to "Wrong value from a DOUBLE column with repeating values" Aug 3, 2020
garyirick-rga commented

This might be fixed by this PR: #81
