-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move First Value UDAF and builtin first / last function to aggregate-functions
#9960
Conversation
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
I think the next step is to make last value udaf, so we can replace builtin first/last. |
Signed-off-by: jayzhan211 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jayzhan211 -- this is looking sweet and close
Can we add a re-export in of this crate in datafusion (like https://github.com/apache/arrow-datafusion/blob/c8584557cdfa7c138ab9039ceac31323f48a44d3/datafusion/core/src/lib.rs#L530-L533)?
Otherwise I think this is basically ready to go. 🚀
Cargo.toml
Outdated
@@ -73,6 +74,7 @@ chrono = { version = "0.4.34", default-features = false } | |||
ctor = "0.2.0" | |||
dashmap = "5.4.0" | |||
datafusion = { path = "datafusion/core", version = "37.0.0", default-features = false } | |||
datafusion-aggregate-functions = { path = "datafusion/aggregate-functions", version = "37.0.0" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about calling the new crate datafusion-functions-aggregates
(to mirror the naming of datafusion-functions
and datafusion-functions-array
)?
panic!("Failed to register UDAF: {}", err); | ||
} | ||
} | ||
datafusion_aggregate_functions::register_all(&mut new_self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
@@ -612,6 +612,7 @@ async fn roundtrip_expr_api() -> Result<()> { | |||
lit(1), | |||
), | |||
array_replace_all(make_array(vec![lit(1), lit(2), lit(3)]), lit(2), lit(4)), | |||
// TODO: Add first value after built-in functions are deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we would wait until the built in functions are deprecated 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Sorry for leaving this confusing comment. I should have left it in the review comment, not in the code 😆
I plan to support the expr
API with arguments like distinct, and ordering after I deprecated all the built-in functions.
However, I added a simple example with lit(1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upd: It seems I need to support expr
with arguments before deprecated builtin
acc_args: AccumulatorArgs, | ||
) -> Result<Box<dyn Accumulator>> { | ||
let mut all_sort_orders = vec![]; | ||
/// TO BE DEPRECATED: Builtin FIRST_VALUE physical aggregate expression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment -- what does it mean to be deprecated? What is the alternative?
Is the idea it will be replaced with a UDF version ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may be replaced with udaf
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Besides addressing comments, I also move |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jayzhan211 -- I think this looks really cool and could be merged.
I think it would be better to leave AggregateFunctionExpr
and create_aggregate_expr
in the physical-expr-core crate for reasons I mentioned but if you disagree or want to put them back as a follow on PR I think that would be ok too.
let mut all_sort_orders = vec![]; | ||
|
||
// Construct PhysicalSortExpr objects from Expr objects: | ||
let mut sort_exprs = vec![]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
/// Any implementation of this trait also needs to implement the | ||
/// `PartialEq<dyn Any>` to allows comparing equality between the | ||
/// trait objects. | ||
pub trait AggregateExpr: Send + Sync + Debug + PartialEq<dyn Any> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also move AggregateFunctionExpr and create_aggregate_expr to functions-aggregate from physical-expr-common
While I agree that these two structures are aggregate specific, I still think they should be in datafusion-physical-expr-common
because that would allow people to use DataFusion and create their own aggregate functions without having to take DataFusion's built in aggregate functions
I see real value in keeping the APIs required to use datafusion separate from the actual implementations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we should avoid the user pulling out unnecessary things for them. I thought they would need builtin functions, but they probably don't. I will keep them in common
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Thanks for your review @alamb. |
Which issue does this PR close?
Part of #8708
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?