-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add 'first' function #697
Conversation
Signed-off-by: andrew-coleman <[email protected]>
@@ -40,3 +40,13 @@ aggregate_functions: | |||
decomposable: MANY | |||
intermediate: any1? | |||
return: any1? | |||
- name: "first" | |||
description: First of a set of values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably returns null if given an empty set.
@@ -40,3 +40,13 @@ aggregate_functions: | |||
decomposable: MANY | |||
intermediate: any1? | |||
return: any1? | |||
- name: "first" | |||
description: First of a set of values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional caveat: Assumes the input is ordered. If the input is not ordered then the result is undefined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the submission. The most similar function is any_value which does not have ordering requirements. This still represents a different enough concept to me for inclusion purposes.
FWIW, we had some discussion around this earlier, and then I concluded to just map first into any_value since in Spark they're the same anyways. That said I'm def fine with having it 🤷 |
Thanks, that worked for me, so I'm happy to close this PR |
Add
first
to the set of generic aggregate functions.This is required to support queries containing
distinct
aggregations that get rewritten by the Spark catalyst query optimiser to containExpand
relations andfirst
function calls.This PR, together with #696 will allow the implementation of the
Expand
relation insubstrait-java
, improving the test pass rate of the TPC-DS suite in thespark
module.