feat(query): Support use parquet format when spilling #16612
Signed-off-by: coldWater <[email protected]>
PR Summary
Benchmark: dataset: tpch sf100 settings:
sql
Compared with Arrow IPC, Parquet's smaller file size comes mainly from dictionary encoding, at the cost of considerably higher CPU usage. For highly discrete (high-cardinality) data, Parquet has no significant size advantage.
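To illustrate why dictionary encoding helps with repetitive columns but not with highly discrete ones, here is a minimal sketch using a toy size model. It is not Parquet's actual on-disk format and not code from this PR: the 4-byte length prefix, the 4-byte index width, and the function and column names are all assumptions made for illustration (real Parquet bit-packs indices and adds page-level compression).

```python
def plain_size(values):
    # Toy plain encoding: a 4-byte length prefix plus the UTF-8 bytes of each value.
    return sum(4 + len(v.encode("utf-8")) for v in values)


def dictionary_size(values):
    # Toy dictionary encoding: each distinct value is stored once,
    # then every row stores a fixed 4-byte index into that dictionary.
    distinct = set(values)
    dict_bytes = sum(4 + len(v.encode("utf-8")) for v in distinct)
    return dict_bytes + 4 * len(values)


# Hypothetical columns: one with 4 distinct values, one with all-unique values.
low_cardinality = [f"status_{i % 4}" for i in range(10_000)]
high_cardinality = [f"order_{i:08d}" for i in range(10_000)]

# Ratio below 1 means the dictionary form is smaller than the plain form.
low_ratio = dictionary_size(low_cardinality) / plain_size(low_cardinality)
high_ratio = dictionary_size(high_cardinality) / plain_size(high_cardinality)

print(f"low-cardinality ratio:  {low_ratio:.2f}")
print(f"high-cardinality ratio: {high_ratio:.2f}")
```

In this model the repetitive column shrinks substantially, while the all-unique column gets slightly larger because every value still has to be stored once in the dictionary in addition to the per-row indices.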
LGTM !
LGTM, needs rebase.
cd8304a to 22fca28
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
Support using the parquet format when spilling. You can switch to arrow ipc via set spilling_file_format = 'arrow'.
Tests
Type of change