Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add 'first' function #697

Closed
wants to merge 1 commit into from

Conversation

andrew-coleman
Copy link
Contributor

Add first to the set of generic aggregate functions.

This is required to support queries containing distinct aggregations that get rewritten by the Spark catalyst query optimiser to contain Expand relations and first function calls.

This PR, together with #696 will allow the implementation of the Expand relation in substrait-java, improving the test pass rate of the TPC-DS suite in the spark module.

Signed-off-by: andrew-coleman <[email protected]>
@@ -40,3 +40,13 @@ aggregate_functions:
decomposable: MANY
intermediate: any1?
return: any1?
- name: "first"
description: First of a set of values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably returns null if given an empty set.

@@ -40,3 +40,13 @@ aggregate_functions:
decomposable: MANY
intermediate: any1?
return: any1?
- name: "first"
description: First of a set of values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional caveat: Assumes the input is ordered. If the input is not ordered then the result is undefined.

Copy link
Member

@EpsilonPrime EpsilonPrime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the submission. The most similar function is any_value which does not have ordering requirements. This still represents a different enough concept to me for inclusion purposes.

@Blizzara
Copy link
Contributor

FWIW, we had some discussion around this earlier, and then I concluded to just map first into any_value since in Spark they're the same anyways. That said I'm def fine with having it 🤷

@andrew-coleman
Copy link
Contributor Author

FWIW, we had some #652 (comment) around this earlier, and then I concluded to just map first into any_value since in Spark they're the same anyways. That said I'm def fine with having it 🤷

Thanks, that worked for me, so I'm happy to close this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants