feat: add 'first' function #697

andrew-coleman · 2024-08-28T15:07:02Z

Add first to the set of generic aggregate functions.

This is required to support queries containing distinct aggregations, which the Spark Catalyst query optimiser rewrites to contain Expand relations and first function calls.

This PR, together with #696, will allow the implementation of the Expand relation in substrait-java, improving the test pass rate of the TPC-DS suite in the spark module.

Signed-off-by: andrew-coleman <[email protected]>

EpsilonPrime · 2024-08-28T15:10:03Z

extensions/functions_aggregate_generic.yaml

@@ -40,3 +40,13 @@ aggregate_functions:
        decomposable: MANY
        intermediate: any1?
        return: any1?
+  - name: "first"
+    description: First of a set of values.


Probably returns null if given an empty set.

EpsilonPrime · 2024-08-28T15:11:35Z

extensions/functions_aggregate_generic.yaml

@@ -40,3 +40,13 @@ aggregate_functions:
        decomposable: MANY
        intermediate: any1?
        return: any1?
+  - name: "first"
+    description: First of a set of values.


Additional caveat: Assumes the input is ordered. If the input is not ordered then the result is undefined.

EpsilonPrime

Thanks for the submission. The most similar function is any_value which does not have ordering requirements. This still represents a different enough concept to me for inclusion purposes.
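To make the two review notes above concrete, here is a minimal Python sketch of the semantics being discussed. It is an illustration only, not the Substrait or Spark implementation: the helper names `first` and `any_value` mirror the function names in this thread, `None` stands in for null, and null-handling options are deliberately left out.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def first(values: Iterable[T]) -> Optional[T]:
    """Return the first value in iteration order, or None for an empty input.

    The result is only meaningful if the caller has ordered the input.
    """
    for value in values:
        return value
    return None


def any_value(values: Iterable[T]) -> Optional[T]:
    """Return some value from the input, or None for an empty input.

    Any element is a valid answer; picking the first one encountered is
    simply the cheapest choice (an assumption of this sketch).
    """
    return first(values)


ordered = [3, 1, 2]
print(first(ordered))                  # 3
print(first([]))                       # None
print(any_value(ordered) in ordered)   # True
```

The sketch shows why the two can coincide in practice: an engine is free to implement any_value by taking the first row it sees, whereas first additionally promises that the returned row is the first one in the input order.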

Blizzara · 2024-08-28T15:32:10Z

FWIW, we had some discussion around this earlier, and then I concluded to just map first into any_value since in Spark they're the same anyways. That said I'm def fine with having it 🤷

andrew-coleman · 2024-08-29T12:28:31Z

> FWIW, we had some discussion (#652 (comment)) around this earlier, and then I concluded to just map first into any_value since in Spark they're the same anyways. That said I'm def fine with having it 🤷

Thanks, that worked for me, so I'm happy to close this PR

feat: add 'first' function

34a1274

Signed-off-by: andrew-coleman <[email protected]>

andrew-coleman requested review from jacques-n, cpcloud, westonpace, EpsilonPrime and vbarua as code owners August 28, 2024 15:07

EpsilonPrime reviewed Aug 28, 2024


andrew-coleman closed this Aug 29, 2024

andrew-coleman mentioned this pull request Sep 24, 2024

feat: add ExpandRel support to core and spark substrait-io/substrait-java#295

Merged
