[VL] Rewrite collect_set and collect_list aggregate function #4805
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/oap-project/gluten/issues Then could you also rename commit message and pull request title in the following format?
Soft suggestion: Could we add a Gluten test case to ensure the function doesn't fall back? Thanks.
backends-velox/src/test/scala/io/glutenproject/execution/VeloxAggregateFunctionsSuite.scala
Run Gluten Clickhouse CI
cc @zhztheplayer @PHILO-HE @rui-mo @liujiayi771 Thank you.
@@ -54,9 +54,7 @@ class VeloxTestSettings extends BackendTestSettings {
    "SPARK-32038: NormalizeFloatingNumbers should work on distinct aggregate",
    // Replaced with another test.
    "SPARK-19471: AggregationIterator does not initialize the generated result projection" +
      " before using it",
Could you also test whether the test case for "SPARK-31993: concat_ws in agg function with plenty of string/array types columns" can now be included?
Thank you for the reminder; I enabled that test.
Thanks for your fix! It's nice to introduce such a rule before the inconsistent behavior is fixed in Velox.
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
What changes were proposed in this pull request?
IsNotNull(partial_in) to skip null values before going to native collect_set
If(IsNull(result), CreateArray(Seq.empty), result) to replace a null result with an empty array
How was this patch tested?
Added a test.
The following test passes:
SPARK-17641: collect functions should not collect null values
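To make the intent of the two rewrite steps concrete, here is a minimal plain-Scala sketch of their combined semantics. It is not the rule from this PR and does not use Spark or Gluten APIs: `nativeCollectSet` is a hypothetical stand-in for the native aggregate, and the assumption that it yields a null result (modelled as `None`) when it receives no rows is mine, inferred from the need for the second step.

```scala
// Plain-Scala model of the two rewrite steps; no Spark or Gluten dependency.
// Nullable SQL values are modelled as Option[Int]; None stands for NULL.
object CollectSetRewriteSketch {

  // Hypothetical stand-in for the native aggregate. Assumption: it returns
  // NULL (None) when no rows reach it, and the distinct values otherwise.
  def nativeCollectSet(rows: Seq[Int]): Option[Seq[Int]] =
    if (rows.isEmpty) None else Some(rows.distinct)

  def collectSet(input: Seq[Option[Int]]): Seq[Int] = {
    // Step 1, analogous to IsNotNull(partial_in): drop nulls before aggregating.
    val nonNull = input.flatten
    // Step 2, analogous to If(IsNull(result), CreateArray(Seq.empty), result):
    // a null aggregate result becomes an empty array.
    nativeCollectSet(nonNull).getOrElse(Seq.empty)
  }

  def main(args: Array[String]): Unit = {
    // Mixed input: the null is skipped and duplicates collapse.
    println(collectSet(Seq(Some(1), None, Some(2), Some(1))))
    // All-null input: the result is an empty array rather than null.
    println(collectSet(Seq(None, None)))
  }
}
```

The second call is the case the SPARK-17641 test name describes: nulls are never collected, and a group with only nulls still produces an empty array.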