Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] Rewrite collect_set and collect_list aggregate function #4805

Merged
merged 3 commits into from
Mar 7, 2024

Conversation

ulysses-you
Copy link
Contributor

@ulysses-you ulysses-you commented Feb 29, 2024

What changes were proposed in this pull request?

  • Add IsNotNull(partial_in) to skip null value before going to native collect_set
  • Add If(IsNull(result), CreateArray(Seq.empty), result) to replace null to empty array

How was this patch tested?

add test

pass SPARK-17641: collect functions should not collect null values

Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@zhztheplayer
Copy link
Member

Soft suggestion: Could we add a Gluten test case to ensure the function doesn't get fallen back? Thanks.

Copy link

Run Gluten Clickhouse CI

Copy link

github-actions bot commented Mar 6, 2024

Run Gluten Clickhouse CI

@ulysses-you ulysses-you changed the title [VL] Support collect_set aggregate function [VL] Rewrite collect_set and collect_list aggregate function Mar 6, 2024
Copy link

github-actions bot commented Mar 6, 2024

Run Gluten Clickhouse CI

Copy link

github-actions bot commented Mar 7, 2024

Run Gluten Clickhouse CI

@ulysses-you
Copy link
Contributor Author

cc @zhztheplayer @PHILO-HE @rui-mo @liujiayi771 thank you

@@ -54,9 +54,7 @@ class VeloxTestSettings extends BackendTestSettings {
"SPARK-32038: NormalizeFloatingNumbers should work on distinct aggregate",
// Replaced with another test.
"SPARK-19471: AggregationIterator does not initialize the generated result projection" +
" before using it",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also test whether the test case for "SPARK-31993: concat_ws in agg function with plenty of string/array types columns" can now be included?

Copy link
Contributor Author

@ulysses-you ulysses-you Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the reminder, enabled that test

@liujiayi771
Copy link
Contributor

#4763

Copy link

github-actions bot commented Mar 7, 2024

Run Gluten Clickhouse CI

Copy link

github-actions bot commented Mar 7, 2024

Run Gluten Clickhouse CI

Copy link
Contributor

@PHILO-HE PHILO-HE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank for your fix! It's nice to introduce such rule before inconsistency behavior is fixed in velox.

@ulysses-you ulysses-you merged commit ae97d0f into apache:main Mar 7, 2024
19 of 20 checks passed
@ulysses-you ulysses-you deleted the agg branch March 7, 2024 08:32
@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_4805_time.csv log/native_master_03_06_2024_bddc3fd79_time.csv difference percentage
q1 38.25 38.77 0.523 101.37%
q2 23.77 24.36 0.584 102.46%
q3 37.27 39.60 2.333 106.26%
q4 37.43 37.58 0.152 100.41%
q5 69.45 69.92 0.470 100.68%
q6 7.27 8.34 1.065 114.65%
q7 81.93 84.40 2.471 103.02%
q8 83.66 86.73 3.070 103.67%
q9 123.16 119.49 -3.666 97.02%
q10 45.95 43.18 -2.774 93.96%
q11 20.52 20.86 0.342 101.67%
q12 28.37 28.05 -0.316 98.89%
q13 47.79 44.55 -3.246 93.21%
q14 18.74 17.03 -1.715 90.85%
q15 29.08 28.15 -0.928 96.81%
q16 14.17 14.07 -0.091 99.36%
q17 101.45 101.46 0.013 100.01%
q18 143.17 145.26 2.092 101.46%
q19 13.57 13.99 0.420 103.10%
q20 28.64 28.21 -0.426 98.51%
q21 229.27 224.18 -5.089 97.78%
q22 14.09 14.99 0.898 106.37%
total 1236.99 1233.17 -3.818 99.69%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants