[FEA] Support pre-split in GPU project exec #11916
Comments
For a project we just need to be careful with window operations, and with the performance impact this can have. Some window operations need the entire window in a single batch to be able to process the data. That is achieved in spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/window/GpuWindowExec.scala, lines 156 to 162 at 4df6d60.
A ProjectExec can be inserted before the window and after the sort in some cases; see spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/window/GpuWindowExecMeta.scala, line 145 at 4df6d60.
We mainly need to make sure that if we can split the input batch into smaller batches, we do not mark Project as preserving the batching; see spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/basicPhysicalOperators.scala, line 223 at 4df6d60.
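The batching contract can be illustrated with a small model. This is a hypothetical sketch, not the plugin's actual types: the point is just that an operator which may split its input must stop advertising that it passes batches through unchanged.

```scala
// Simplified, hypothetical model of the batching contract (not the
// plugin's real API). A node that may split its input batch must not
// claim to preserve upstream batching guarantees, because a downstream
// window operation may rely on "one window per batch".
sealed trait Batching
case object PassThrough extends Batching // batches forwarded unchanged
case object MayResize extends Batching   // batches may be split or merged

def outputBatching(preSplitEnabled: Boolean): Batching =
  if (preSplitEnabled) MayResize else PassThrough
```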
If we update the code to do a pre-project split, we could also update it to split on retry; see spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/basicPhysicalOperators.scala, line 395 at 4df6d60.
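The split-on-retry idea can be sketched generically. The names `withRetrySplit`, `process`, and `split` below are illustrative stand-ins, not the plugin's real retry utilities: on a failure (e.g. an OOM), the failing piece of work is split in half and each half is retried.

```scala
import scala.annotation.tailrec

// Hypothetical retry-with-split loop (illustrative only): when
// `process` fails on a unit of work, `split` breaks it into smaller
// pieces that are pushed back onto the work list and retried.
@tailrec
def withRetrySplit[T](work: List[T], done: List[T] = Nil)(
    process: T => Either[Throwable, T],
    split: T => List[T]): List[T] = work match {
  case Nil => done.reverse
  case head :: tail =>
    process(head) match {
      case Right(out) => withRetrySplit(tail, out :: done)(process, split)
      case Left(_)    => withRetrySplit(split(head) ::: tail, done)(process, split)
    }
}
```

Note this keeps output order only per piece; a real implementation would also have to bound the recursion so a piece that can no longer be split surfaces the original failure.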
When I first put in the pre-project split code for hash aggregate, I also implemented it for project, but I saw a large performance regression, so I reverted that part of the code. Please make sure that we measure this performance regression, especially around window operations that require all of the data for a window to be in a single batch.
After enabling the pre-split for Project, we hit another case where some expressions of complex type in the project list had their sizes wrongly estimated. This produced a ~3 GB batch, leading to an OOM when performing the project.
We need to rework the estimation code to cover these cases at least.
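To make the estimation problem concrete, here is a minimal, hypothetical per-row size estimator that recurses into nested types. The type names and the fixed per-element costs are assumptions for illustration; the hard part the issue describes is that projection expressions can change the size of complex values, which flat per-type guesses like these miss.

```scala
// Hypothetical conservative per-row size estimate for a column type
// (illustrative only, not the plugin's estimation code). Variable-width
// and nested types need average-length hints; getting these wrong is
// exactly how a ~3 GB output batch can slip past the pre-split.
sealed trait DType
case object IntT extends DType
case object LongT extends DType
case class StringT(avgLen: Int) extends DType
case class ListT(elem: DType, avgElems: Int) extends DType
case class StructT(fields: List[DType]) extends DType

def estimateRowBytes(t: DType): Long = t match {
  case IntT        => 4L
  case LongT       => 8L
  case StringT(n)  => 4L + n                     // offset entry + payload
  case ListT(e, n) => 4L + n * estimateRowBytes(e) // offset + elements
  case StructT(fs) => fs.map(estimateRowBytes).sum
}
```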
Is your feature request related to a problem? Please describe.
We met a CPU OOM due to a quite large batch (~5.4 GB) with more than 250 columns.
After checking the query event log, we found a big projection after a symmetric join. This projection was building about 266 columns from a batch with only about 50 columns, so the projected batch size (~5 GB) grew to about 5 times the input batch size (~1 GB).
Describe the solution you'd like
Add pre-split support to the GPU project exec, similar to what was done in the GPU aggregate exec to avoid producing large batches after some aggregations.
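The core of the pre-split idea can be sketched in a few lines. This is an assumed sketch, not the plugin's code: estimate the projected output size from the input batch and a per-row output estimate, then split the input into enough pieces that each projected piece stays under the target batch size.

```scala
// Hypothetical pre-split calculation (illustrative only): how many
// pieces to split the input batch into so the projected output of each
// piece stays under the target batch size.
def numPreSplits(inputRows: Long,
                 estOutBytesPerRow: Long,
                 targetBatchBytes: Long): Int = {
  val estOutBytes = inputRows * estOutBytesPerRow
  math.max(1, math.ceil(estOutBytes.toDouble / targetBatchBytes).toInt)
}
```

With the numbers from this issue (a ~1 GB input projecting to ~5 GB against a 1 GB target), this would ask for 5 splits; the whole scheme is only as good as `estOutBytesPerRow`, which is where the complex-type estimation problem mentioned above bites.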
Additional context
The exception call stack