Feat: Support array_join function #1290

Open
wants to merge 2 commits into
base: main

erenavsarogullari
Member

Which issue does this PR close?

Related to Epic: #1042
array_join: select array_join(array('hello', '-', 'world'), ' ') => hello - world
DataFusion's array_join function is an alias of the array_to_string function:
https://datafusion.apache.org/user-guide/sql/scalar_functions.html#array-join

Rationale for this change

Defined under Epic: #1042

What changes are included in this PR?

planner.rs: creates the DataFusion array_join physical expression from the Spark physical expression, with return type DataType::Utf8.
expr.proto: adds the array_join message (array_intersect is 62).
QueryPlanSerde.scala: adds the array_join pattern-matching case.
CometExpressionSuite.scala: adds a new unit test for the array_join function.

How are these changes tested?

A new unit test has been added; the first query result is as follows:
Query Result II

@@ -86,6 +86,7 @@ message Expr {
ArrayInsert array_insert = 59;
BinaryExpr array_contains = 60;
BinaryExpr array_remove = 61;
BinaryExpr array_join = 63;
Member

Why not use 62?

Member Author

@erenavsarogullari erenavsarogullari Jan 15, 2025

I have another PR for array_intersect and it uses 62. I think array_intersect can go first.

Contributor

NoeB commented Jan 18, 2025

I think the handling of the third argument (nullReplacement) is missing; see: https://docs.databricks.com/en/sql/language-manual/functions/array_join.html#arguments. I think this will also require a new expression type in expr.proto.

Member Author

erenavsarogullari commented Jan 19, 2025

> I think the handling of the third argument (nullReplacement) is missing; see: https://docs.databricks.com/en/sql/language-manual/functions/array_join.html#arguments. I think this will also require a new expression type in expr.proto.

Thanks, that is right: both Spark and DataFusion support an optional nullReplacement, and this requires a new ArrayJoin proto message.
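For readers following the nullReplacement discussion, here is a minimal standalone sketch of the semantics in question, written in plain Rust with no Comet or DataFusion dependencies. The function name, signature, and the use of `Option<&str>` to model SQL NULL elements are assumptions made for illustration; this is not the code in this PR. It assumes Spark's documented behavior: NULL elements are dropped when no replacement is given and substituted otherwise.

```rust
/// Illustrative model of array_join (hypothetical helper, not the PR's code).
/// `None` in `items` stands in for a SQL NULL array element.
/// - With `null_replacement == None`, NULL elements are skipped.
/// - With `null_replacement == Some(r)`, NULL elements are replaced by `r`.
fn array_join(
    items: &[Option<&str>],
    delimiter: &str,
    null_replacement: Option<&str>,
) -> String {
    items
        .iter()
        // Keep non-NULL elements; substitute or drop NULL ones.
        .filter_map(|&item| item.or(null_replacement))
        .collect::<Vec<&str>>()
        .join(delimiter)
}
```

Under these assumptions, joining `hello`, `-`, `world` with a space yields `hello - world`, matching the example in the PR description, while an array containing a NULL produces different output depending on whether the third argument is supplied. That difference is why a two-argument BinaryExpr cannot carry the full function.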

erenavsarogullari force-pushed the array_join branch 2 times, most recently from a5e54c3 to c5f8e7e, on January 19, 2025 23:07

codecov-commenter commented Jan 20, 2025

Codecov Report

Attention: Patch coverage is 86.20690% with 4 lines in your changes missing coverage. Please review.

Project coverage is 34.71%. Comparing base (be48839) to head (895e701).
Report is 17 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../scala/org/apache/comet/serde/QueryPlanSerde.scala | 86.20% | 0 Missing and 4 partials ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1290      +/-   ##
============================================
+ Coverage     34.69%   34.71%   +0.01%     
- Complexity      991      992       +1     
============================================
  Files           116      117       +1     
  Lines         44891    45148     +257     
  Branches       9864     9956      +92     
============================================
+ Hits          15574    15671      +97     
- Misses        26168    26310     +142     
- Partials       3149     3167      +18     


@jatin510
Contributor

lgtm 👍

erenavsarogullari changed the title from "Feat: Support array_join" to "Feat: Support array_join function" on Jan 20, 2025
@andygrove
Member

@erenavsarogullari could you fix the conflicts on this PR?

@erenavsarogullari
Member Author

Thanks @andygrove. Sure, I will have a look today.
