additional support for decimal type #81

Closed
nirmal82 opened this issue May 10, 2023 · 8 comments

@nirmal82

Help needed.
I am trying to generate a Parquet file from an existing schema that has a field with the following data type:
fixed_len_byte_array(16) LICENSE_TERM_IN_MONTHS (DECIMAL(38,0));

Requesting support for this data type.

@wilwade
Member

wilwade commented May 10, 2023

@nirmal82 We only just added Decimal support in v1.2.3 (https://github.com/LibertyDSNP/parquetjs/releases/tag/v1.2.3)

If you upgrade to that release it should work!

If it doesn't, then it sounds like we might have a bug.

wilwade self-assigned this May 10, 2023
@nirmal82
Author

@wilwade

I am using the latest code. Even when reading the Parquet file, it produces a Buffer value instead of a number or string. I am not good with JavaScript, so can you please help unblock me on reading and writing Parquet files that have a DECIMAL data type?
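
For readers who hit the same symptom: per the Parquet LogicalTypes specification, a FIXED_LEN_BYTE_ARRAY decimal holds the unscaled value as a big-endian two's-complement integer, so a raw Buffer can be decoded by hand. The following is a minimal sketch, assuming Node.js with BigInt support; `decodeDecimal` is a hypothetical helper name and is not part of parquetjs.

```javascript
// Decode a big-endian two's-complement byte sequence (the layout of a
// Parquet FIXED_LEN_BYTE_ARRAY decimal) into a decimal string.
// `decodeDecimal` is a hypothetical helper, not a parquetjs function.
function decodeDecimal(buf, scale = 0) {
  if (buf.length === 0) throw new RangeError('empty decimal buffer');
  // Accumulate the bytes as an unsigned integer, most significant first.
  let unscaled = 0n;
  for (const byte of buf) {
    unscaled = (unscaled << 8n) | BigInt(byte);
  }
  // If the sign bit is set, subtract 2^(8 * length) to get the negative value.
  if (buf[0] & 0x80) {
    unscaled -= 1n << BigInt(8 * buf.length);
  }
  const negative = unscaled < 0n;
  let digits = (negative ? -unscaled : unscaled).toString();
  if (scale > 0) {
    // Left-pad so at least one digit remains before the decimal point.
    digits = digits.padStart(scale + 1, '0');
    digits = digits.slice(0, -scale) + '.' + digits.slice(-scale);
  }
  return (negative ? '-' : '') + digits;
}

// Example: 16 bytes holding the value 36 for a DECIMAL(38,0) column.
const sample = Buffer.alloc(16);
sample[15] = 36;
console.log(decodeDecimal(sample)); // prints 36
```

Returning a string rather than a Number avoids losing digits, since a JavaScript Number cannot represent every 38-digit integer exactly.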

@wilwade
Member

wilwade commented May 31, 2023

@nirmal82 could you share an example file that produces this issue?

wilwade added the "needs information" label (Needs more information from the original issue creator) Jun 12, 2023
@nirmal82
Author

@wilwade I'm attaching both my sample Parquet file and the JSON data files.

Here are the steps:

  • Extract the schema from the Parquet file using the reader.
  • Read the content from the JSON file.
  • Pass the schema and content to the Parquet writer to generate new Parquet files.

testParquet.zip

Thanks

wilwade removed the "needs information" label (Needs more information from the original issue creator) Jun 21, 2023
@wilwade
Member

wilwade commented Jun 21, 2023

@nirmal82 I see that other readers can handle this file, but the type doesn't appear to be supported.

https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal

For INT64-backed decimals the spec requires precision <= 18, but this file has it set to 38.

pqrs schema decimal.parquet
Metadata for file: decimal.parquet

version: 1
num of rows: 1
created by: parquet-mr version 1.12.2 (build 77e30c8093386ec52c3cfa6c34b7ef3321322c94)
metadata:
  writer.model.name: example
message spark_schema {
  REQUIRED FIXED_LEN_BYTE_ARRAY (16) ID (DECIMAL(38,0));
  OPTIONAL BYTE_ARRAY NAME (STRING);
  OPTIONAL BYTE_ARRAY ADDRESS (STRING);
  OPTIONAL INT96 CREATE_DTM;
  OPTIONAL INT96 UPDATE_DTM;
}
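
For context on the numbers in the schema above: the linked LogicalTypes document caps INT64 decimals at precision 18 and says that for FIXED_LEN_BYTE_ARRAY the precision is limited by the array size. The sketch below computes that limit as floor(log10(2^(8n-1) - 1)) for an n-byte array. It assumes Node.js with BigInt support, and `maxPrecisionForBytes` is a hypothetical helper name.

```javascript
// Largest decimal precision whose every value fits in `numBytes` bytes of
// two's-complement storage: floor(log10(2^(8 * numBytes - 1) - 1)).
// `maxPrecisionForBytes` is a hypothetical helper name.
function maxPrecisionForBytes(numBytes) {
  const maxValue = (1n << BigInt(8 * numBytes - 1)) - 1n;
  // For a positive integer, floor(log10(x)) is its digit count minus one.
  return maxValue.toString().length - 1;
}

console.log(maxPrecisionForBytes(4));  // prints 9
console.log(maxPrecisionForBytes(8));  // prints 18
console.log(maxPrecisionForBytes(16)); // prints 38
```

By this calculation a 16-byte array is exactly wide enough for DECIMAL(38,0), which is consistent with other readers accepting the file.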

@wilwade
Member

wilwade commented Jun 21, 2023

Which, now that I read it again, is what you said in your original post. I just missed it.

@wilwade
Member

wilwade commented Jun 22, 2023

Looked into it a bit more. This library currently only supports decimals with precision up to 18, because it doesn't convert anything above that into the correct fixed_len_byte_array or binary base type.

I'm not sure we would be able to do so without just using strings for the input and output due to the limitations of JavaScript working with big decimals.
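
One possible shape for the string-based approach mentioned above, offered as a sketch and not as the library's implementation: parse the decimal string into a BigInt unscaled value, then write it as big-endian two's-complement bytes. It assumes Node.js with BigInt support; `encodeDecimal` and its parameters are hypothetical names.

```javascript
// Encode a decimal string as a big-endian two's-complement Buffer of
// `byteLength` bytes, the layout of a Parquet FIXED_LEN_BYTE_ARRAY decimal.
// `encodeDecimal` is a hypothetical helper, not an existing parquetjs function.
function encodeDecimal(text, scale, byteLength) {
  const match = /^(-?)(\d+)(?:\.(\d+))?$/.exec(text.trim());
  if (!match) throw new TypeError(`not a decimal string: ${text}`);
  const [, sign, whole, fraction = ''] = match;
  if (fraction.length > scale) {
    throw new RangeError(`more than ${scale} fractional digits: ${text}`);
  }
  // Unscaled integer: the digits with the decimal point removed,
  // right-padded with zeros up to `scale` fractional digits.
  let unscaled = BigInt(whole + fraction.padEnd(scale, '0'));
  if (sign === '-') unscaled = -unscaled;

  // Reject values outside the signed range of `byteLength` bytes.
  const bits = BigInt(8 * byteLength);
  const min = -(1n << (bits - 1n));
  const max = (1n << (bits - 1n)) - 1n;
  if (unscaled < min || unscaled > max) {
    throw new RangeError(`${text} does not fit in ${byteLength} bytes`);
  }
  // Wrap negatives into the unsigned range, then fill bytes from the low end.
  let remaining = BigInt.asUintN(8 * byteLength, unscaled);
  const out = Buffer.alloc(byteLength);
  for (let i = byteLength - 1; i >= 0; i--) {
    out[i] = Number(remaining & 0xffn);
    remaining >>= 8n;
  }
  return out;
}

// Example: a 38-digit value for a DECIMAL(38,0) column stored in 16 bytes.
const encoded = encodeDecimal('12345678901234567890123456789012345678', 0, 16);
console.log(encoded.toString('hex'));
```

Taking strings as input sidesteps the precision limit of JavaScript Numbers, at the cost of callers having to format their values as text.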

Open to ideas or PRs for sure!

@wilwade
Member

wilwade commented Jun 23, 2023

@nirmal82 I opened a new issue to document this limitation for anyone searching and to track support for it: #91

wilwade closed this as completed Jun 23, 2023