-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-33155 Revisit Parquet Test Suite #19405
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Jack Del Vecchio <[email protected]>
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-33155 Jirabot Action Result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one question about the contents of a comment block.
* ├── district=1 | ||
* │ ├── firstname=Alice | ||
* │ │ └── city=New%20York | ||
* │ │ └── part_0_of_table_0_from_worker_0.parquet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These part... names are files? If so, they are all the same for every value combination. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments/questions.
DATASET([{0, 'Corrupt Parquet File', ''}], RECORDDEF), | ||
CORRUPT_PARQUET); | ||
|
||
OUTPUT(CORRUPT_RESULT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the output this workunit will not do anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@@ -31,6 +36,3 @@ EMPTY_PARQUET := DATASET([], RECORDDEF); | |||
ParquetIO.Write(EMPTY_PARQUET, filePath, TRUE); | |||
|
|||
read_data := ParquetIO.Read(RECORDDEF, filePath); | |||
|
|||
OUTPUT(read_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this deleted it will not read the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
- createDirectoryIterator is used to list all matching Parquet files and the order is not guarenteed - Fix parquetEmpty.ecl and parquetCorrupt.ecl by adding the OUTPUT statements back
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved.
Change the plugin to throw an error if a file is not found when opening a file for reading
Change output filename of partitioned file parts. part_00_0.parquet -> part_0_of_table_0_from_worker_0.parquet
These files aren't meant to be read directly, but I think the new format makes more sense when trying to debug.
Update tests so that they are all being run and are able to use external files in testing/regress/download
Remove parquetString test and key file and move testing to parquetTypes.ecl
Update test cases in parquetTypes.ecl
Type of change:
Checklist:
Smoketest:
Testing: