Incorrect preview of parquet files with decimals #7957

Open
3 tasks done
ihenry opened this issue May 24, 2024 · 7 comments

ihenry commented May 24, 2024

Preflight Checklist

Storage Explorer Version

1.33.1

Regression From

No response

Architecture

x64

Storage Explorer Build Number

20240410.2

Platform

All

OS Version

Windows 11 & MacOS 14.5

Bug Description

Parquet files containing decimal columns of different precision and scale, such as (5,3), (9,5), and (38,6), are previewed incorrectly.

Steps to Reproduce

Previewing a Parquet file in Azure Storage Explorer that contains columns defined as Decimal(5,3), Decimal(9,5), and Decimal(38,6) shows incorrect results. The preview should match the one in DBeaver, showing 0 or 0.xxx as appropriate.
DBeaver with DuckDB shows the following preview:
Screenshot 2024-05-24 at 12 58 58

DBeaver with DuckDB Metadata
Screenshot 2024-05-24 at 12 52 55

Azure Storage Explorer 1.33.1
Screenshot 2024-05-24 at 12 52 32

Actual Experience

We expected to see the raw values, but the preview actually shows {"type":"Buffer","data":[0,0,0,0]}

Contributor

craxal commented May 25, 2024

@ihenry Can you share your Parquet file or a small sample file that we could test with?

@craxal craxal added the 🪲 bug Issue is not intended behavior label May 25, 2024
Author

ihenry commented May 27, 2024

Thanks @craxal. I have emailed a sample Parquet file to the sehelp mailbox.

@craxal craxal self-assigned this Jun 3, 2024
@craxal craxal added this to the 1.35.0 milestone Jun 3, 2024
Contributor

craxal commented Jun 11, 2024

Issue reproduced on our end.

Are all of your decimal values intentionally 0? Every buffer that's parsed seems to contain only zeroes.

It seems that the library we use does not currently support decimal values (see https://github.com/LibertyDSNP/parquetjs#list-of-supported-types--encodings). We might be able to work around this by parsing the buffer ourselves.

Author

ihenry commented Jun 12, 2024

Yes, that extract came from a system that contains sample data, where the default value appears to be zero.
I have seen the same behaviour with non-zero decimal values too, but that was real data, which is more difficult to share.

@craxal craxal modified the milestones: 1.35.0, 1.36.0 Jul 15, 2024
Contributor

craxal commented Sep 3, 2024

@ihenry I'm having trouble producing a Parquet file with non-zero decimal values encoded as byte arrays. Can you either provide another sample file or guidance as to how you produced the sample you emailed earlier?

Author

ihenry commented Sep 5, 2024

@craxal, I have emailed a new sample file to the sehelp email address. I am using SAP Datasphere to extract the data from an SAP system and land it in a Microsoft Azure Data Lake Storage Gen2 target.

Contributor

craxal commented Sep 9, 2024

Thank you very much.

Unfortunately, it does not look like parsing a buffer array into a displayable decimal ourselves is as trivial as I had hoped. If each element in the array had represented a base-10 digit, this would have been pretty straightforward, but it doesn't appear to be the case. Attempting to decode these values correctly makes me nervous and seems wasteful when the library already has support lined up (LibertyDSNP/parquetjs#91).

We'll keep this work item open for tracking, but we'll need to wait for library support.
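
For anyone following along, here is a rough sketch of what decoding such a buffer would involve. It assumes the buffers follow the Parquet format specification's byte-array DECIMAL layout, where the bytes hold the unscaled value as a big-endian two's-complement integer and the scale comes from the column schema. `decodeDecimal` is a hypothetical helper, not part of parquetjs or Storage Explorer, and it has not been checked against the sample files in this issue.

```typescript
// Hypothetical helper (an assumption for illustration, not a parquetjs API).
// `bytes`: the buffer's data array, assumed to be the unscaled value as a
//          big-endian two's-complement integer.
// `scale`: number of digits after the decimal point, taken from the schema.
function decodeDecimal(bytes: ArrayLike<number>, scale: number): string {
  // Copy into a mutable magnitude array so the input is left untouched.
  const mag: number[] = [];
  for (let i = 0; i < bytes.length; i++) {
    mag.push(bytes[i] & 0xff);
  }

  // The top bit of the first byte is the sign bit.
  const negative = mag.length > 0 && (mag[0] & 0x80) !== 0;
  if (negative) {
    // Two's-complement negate: invert every byte, then add one.
    let carry = 1;
    for (let i = mag.length - 1; i >= 0; i--) {
      const v = (mag[i] ^ 0xff) + carry;
      mag[i] = v & 0xff;
      carry = v >> 8;
    }
  }

  // Repeatedly divide the magnitude by 10; each remainder is the next
  // least significant base-10 digit. This avoids any 64-bit limit, so it
  // also covers wide columns such as Decimal(38,6).
  let digits = "";
  while (mag.some((b) => b !== 0)) {
    let rem = 0;
    for (let i = 0; i < mag.length; i++) {
      const cur = rem * 256 + mag[i];
      mag[i] = Math.floor(cur / 10);
      rem = cur % 10;
    }
    digits = rem + digits;
  }

  if (scale > 0) {
    // Left-pad so there is at least one digit before the decimal point.
    while (digits.length < scale + 1) {
      digits = "0" + digits;
    }
    digits = digits.slice(0, -scale) + "." + digits.slice(-scale);
  } else if (digits === "") {
    digits = "0";
  }
  return (negative ? "-" : "") + digits;
}

// The buffer from the bug report, read as a Decimal(5,3) column:
console.log(decodeDecimal([0, 0, 0, 0], 3)); // "0.000"
// 0x003039 is 12345, so with scale 3:
console.log(decodeDecimal([0x00, 0x30, 0x39], 3)); // "12.345"
```

The long division works on the byte array directly rather than on a number, which is why it is more involved than a digit-per-element layout would have been.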

@craxal craxal modified the milestones: 1.36.0, 1.37.0 Sep 18, 2024