-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support sort merge join #178
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #178 +/- ##
============================================
+ Coverage 33.32% 33.40% +0.08%
+ Complexity 769 768 -1
============================================
Files 107 107
Lines 35395 36331 +936
Branches 7669 7936 +267
============================================
+ Hits 11795 12137 +342
- Misses 21146 21645 +499
- Partials 2454 2549 +95 ☔ View full report in Codecov by Sentry. |
val operatorDisabledFlag = s"$COMET_EXEC_CONFIG_PREFIX.$operator.disabled" | ||
conf.getConfString(operatorFlag, "false").toBoolean || isCometAllOperatorEnabled(conf) && | ||
!conf.getConfString(operatorDisabledFlag, "false").toBoolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "disable" flag is useful to disable a particular operator in unit test. For example, I disable sort merge join in one existing test below.
// TODO: Support SortMergeJoin with join condition after new DataFusion release | ||
if (join.condition.isDefined) { | ||
return None | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not added join filter support in this PR. I will do it in follow up.
@@ -859,6 +906,7 @@ class CometExecSuite extends CometTestBase { | |||
.saveAsTable("bucketed_table2") | |||
|
|||
withSQLConf( | |||
"spark.comet.exec.sort_merge_join.disabled" -> "true", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test explicitly checks for Spark sort merge join. I'd like to keep what it proposes to test so disable Comet sort merge join here.
common/src/main/java/org/apache/comet/vector/CometPlainVector.java
Outdated
Show resolved
Hide resolved
I will update the diff for failed Spark tests. |
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala
Outdated
Show resolved
Hide resolved
withParquetTable((0 until 10).map(i => (i, i % 5)), "tbl_a") { | ||
withParquetTable((0 until 10).map(i => (i % 10, i + 2)), "tbl_b") { | ||
val df1 = sql("SELECT * FROM tbl_a JOIN tbl_b ON tbl_a._2 = tbl_b._1") | ||
checkSparkAnswerAndOperator(df1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add checks to make sure the plan includes the Comet SMJ, i.e. stripAQEPlan(df.queryExecution.executedPlan).collectFirst...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkSparkAnswerAndOperator
already does the check. If there is Spark SMJ or other join operators, it will report error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. How does checkSparkAnswerAndOperator
checks whether it is Comet SMJ?
Unless we provide includeClasses
, it only checks classes below?
https://github.com/apache/arrow-datafusion-comet/blob/main/spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala#L156
case _: CometScanExec | _: CometBatchScanExec => true
case _: CometSinkPlaceHolder | _: CometScanWrapper => false
case _: CometExec | _: CometShuffleExchangeExec => true
case _: CometBroadcastExchangeExec => true
case _: WholeStageCodegenExec | _: ColumnarToRowExec | _: InputAdapter => true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CometSortMergeJoinExec
along with other native operators are CometExec
so it is listed in checkCometOperators
. We don't white list all native operators but the common base class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks. I think it is good for now. Ideal if we can check CometSortMergeJoinExec
specifically, as we will add other joins. In that way, we can make sure that we are not testing different join accidentally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that makes sense. I can add something to verify join type (sort merge join or hash join or others) in a follow up. Thanks for the suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
withParquetTable((0 until 10).map(i => (i, i % 5)), "tbl_a") { | ||
withParquetTable((0 until 10).map(i => (i % 10, i + 2)), "tbl_b") { | ||
val df1 = sql("SELECT * FROM tbl_a JOIN tbl_b ON tbl_a._2 = tbl_b._1") | ||
checkSparkAnswerAndOperator(df1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks. I think it is good for now. Ideal if we can check CometSortMergeJoinExec
specifically, as we will add other joins. In that way, we can make sure that we are not testing different join accidentally.
common/src/main/java/org/apache/comet/vector/CometPlainVector.java
Outdated
Show resolved
Hide resolved
spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (pending CI)
Merged. Thanks. |
Which issue does this PR close?
Closes #177.
Rationale for this change
What changes are included in this PR?
How are these changes tested?