Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Convert ARRAY_AGG and NTH_VALUE to UDAF #11029

Closed
wants to merge 3 commits into from

Conversation

eejbyfeldt
Copy link
Contributor

@eejbyfeldt eejbyfeldt commented Jun 20, 2024

Still has some test failures that needs to be addressed.

Which issue does this PR close?

Closes #10999.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions physical-expr Physical Expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jun 20, 2024
Still has not test failures that needs to be addressed.
@@ -37,7 +37,7 @@ async fn csv_query_array_agg_distinct() -> Result<()> {
Schema::new(vec![Field::new_list(
"ARRAY_AGG(DISTINCT aggregate_test_100.c2)",
Field::new("item", DataType::UInt32, true),
false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have access to the nullability of input arguments when defining UDAFs? I did not see how to maintain the behavior here.

Comment on lines +190 to +191
|| name_lower_case == "array_agg"
|| name_lower_case == "nth_value"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convert to lower name in follow up to keep the PRs managable size.

@@ -46,6 +46,7 @@ arrow-array = { workspace = true }
arrow-schema = { workspace = true }
datafusion-common = { workspace = true, default-features = true }
datafusion-expr = { workspace = true }
datafusion-functions-aggregate= { workspace = true }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably not desired. Needed because of this function:

https://github.com/eejbyfeldt/datafusion/blob/b42c70bac4af5ef17ce24ca1a2e95efa3f7cece9/datafusion/sql/src/expr/mod.rs#L596-L626

The comment claims to that it should be possible to move to ArrayAgg::simplify, will need to look more into that API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how it would be possible to do with the current simplify API.

@@ -4974,7 +4974,7 @@ logical_plan
02)--Aggregate: groupBy=[[multiple_ordered_table.a, multiple_ordered_table.b]], aggr=[[ARRAY_AGG(multiple_ordered_table.c) ORDER BY [multiple_ordered_table.c DESC NULLS FIRST]]]
03)----TableScan: multiple_ordered_table projection=[a, b, c]
physical_plan
01)AggregateExec: mode=Single, gby=[a@0 as a, b@1 as b], aggr=[ARRAY_AGG(multiple_ordered_table.c) ORDER BY [multiple_ordered_table.c DESC NULLS FIRST]], ordering_mode=Sorted
01)AggregateExec: mode=Single, gby=[a@0 as a, b@1 as b], aggr=[ARRAY_AGG(multiple_ordered_table.c) ORDER BY [multiple_ordered_table.c ASC NULLS LAST]], ordering_mode=Sorted
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is expected?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so 🤔

)
match agg_func.func_def {
AggregateFunctionDefinition::UDF(ref udf) => {
udf.inner().as_any().downcast_ref::<ArrayAgg>().is_some()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can compare with name for UDAF

@eejbyfeldt
Copy link
Contributor Author

Will be replaced by smaller PRs like #11045

@eejbyfeldt eejbyfeldt closed this Jun 21, 2024
@eejbyfeldt eejbyfeldt deleted the i10999 branch September 1, 2024 18:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convert ArrayAgg to UDAF
2 participants