Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move First Value UDAF and builtin first / last function to aggregate-functions #9960

Merged
merged 40 commits into from
Apr 7, 2024

Conversation

jayzhan211
Copy link
Contributor

Which issue does this PR close?

Part of #8708

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions core Core DataFusion crate labels Apr 5, 2024
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211
Copy link
Contributor Author

I think the next step is to make last value udaf, so we can replace builtin first/last.

Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211 jayzhan211 marked this pull request as ready for review April 5, 2024 14:49
@jayzhan211 jayzhan211 requested review from alamb and mustafasrepo April 5, 2024 15:00
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 -- this is looking sweet and close

Can we add a re-export in of this crate in datafusion (like https://github.com/apache/arrow-datafusion/blob/c8584557cdfa7c138ab9039ceac31323f48a44d3/datafusion/core/src/lib.rs#L530-L533)?

Otherwise I think this is basically ready to go. 🚀

Cargo.toml Outdated
@@ -73,6 +74,7 @@ chrono = { version = "0.4.34", default-features = false }
ctor = "0.2.0"
dashmap = "5.4.0"
datafusion = { path = "datafusion/core", version = "37.0.0", default-features = false }
datafusion-aggregate-functions = { path = "datafusion/aggregate-functions", version = "37.0.0" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about calling the new crate datafusion-functions-aggregates (to mirror the naming of datafusion-functions and datafusion-functions-array)?

panic!("Failed to register UDAF: {}", err);
}
}
datafusion_aggregate_functions::register_all(&mut new_self)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@@ -612,6 +612,7 @@ async fn roundtrip_expr_api() -> Result<()> {
lit(1),
),
array_replace_all(make_array(vec![lit(1), lit(2), lit(3)]), lit(2), lit(4)),
// TODO: Add first value after built-in functions are deprecated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we would wait until the built in functions are deprecated 🤔

Copy link
Contributor Author

@jayzhan211 jayzhan211 Apr 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Sorry for leaving this confusing comment. I should have left it in the review comment, not in the code 😆

I plan to support the expr API with arguments like distinct, and ordering after I deprecated all the built-in functions.

However, I added a simple example with lit(1).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upd: It seems I need to support expr with arguments before deprecated builtin

datafusion/aggregate-functions/src/first_last.rs Outdated Show resolved Hide resolved
acc_args: AccumulatorArgs,
) -> Result<Box<dyn Accumulator>> {
let mut all_sort_orders = vec![];
/// TO BE DEPRECATED: Builtin FIRST_VALUE physical aggregate expression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment -- what does it mean to be deprecated? What is the alternative?

Is the idea it will be replaced with a UDF version ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may be replaced with udaf

datafusion/aggregate-functions/src/macros.rs Outdated Show resolved Hide resolved
datafusion/aggregate-functions/src/macros.rs Outdated Show resolved Hide resolved
datafusion/expr/src/utils.rs Outdated Show resolved Hide resolved
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211 jayzhan211 requested a review from alamb April 6, 2024 02:24
@jayzhan211 jayzhan211 marked this pull request as draft April 6, 2024 05:42
@jayzhan211 jayzhan211 marked this pull request as ready for review April 6, 2024 05:43
@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Apr 6, 2024

Besides addressing comments, I also move AggregateFunctionExpr and create_aggregate_expr to functions-aggregate from physical-expr-common because I think they are specific to aggregate.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 -- I think this looks really cool and could be merged.

I think it would be better to leave AggregateFunctionExpr and create_aggregate_expr in the physical-expr-core crate for reasons I mentioned but if you disagree or want to put them back as a follow on PR I think that would be ok too.

let mut all_sort_orders = vec![];

// Construct PhysicalSortExpr objects from Expr objects:
let mut sort_exprs = vec![];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

/// Any implementation of this trait also needs to implement the
/// `PartialEq<dyn Any>` to allows comparing equality between the
/// trait objects.
pub trait AggregateExpr: Send + Sync + Debug + PartialEq<dyn Any> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also move AggregateFunctionExpr and create_aggregate_expr to functions-aggregate from physical-expr-common

While I agree that these two structures are aggregate specific, I still think they should be in datafusion-physical-expr-common because that would allow people to use DataFusion and create their own aggregate functions without having to take DataFusion's built in aggregate functions

I see real value in keeping the APIs required to use datafusion separate from the actual implementations

Copy link
Contributor Author

@jayzhan211 jayzhan211 Apr 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should avoid the user pulling out unnecessary things for them. I thought they would need builtin functions, but they probably don't. I will keep them in common

Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211
Copy link
Contributor Author

Thanks for your review @alamb.

@jayzhan211 jayzhan211 merged commit 2f1c3ab into apache:main Apr 7, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants