fix: CometExec's outputPartitioning might not be same as Spark expects after AQE interferes #299
@@ -377,7 +383,8 @@ case class CometProjectExec(
     override val output: Seq[Attribute],
     child: SparkPlan,
     override val serializedPlanOpt: SerializedPlan)
-    extends CometUnaryExec {
+    extends CometUnaryExec
+    with PartitioningPreservingUnaryExecNode {
`PartitioningPreservingUnaryExecNode` implements `outputPartitioning` for `ProjectExec`.
@@ -586,7 +607,8 @@ case class CometHashAggregateExec(
     mode: Option[AggregateMode],
     child: SparkPlan,
     override val serializedPlanOpt: SerializedPlan)
-    extends CometUnaryExec {
+    extends CometUnaryExec
+    with PartitioningPreservingUnaryExecNode {
`PartitioningPreservingUnaryExecNode` implements `outputPartitioning` for `HashAggregateExec`.
 * This is copied from Spark's `PartitioningPreservingUnaryExecNode` because it is only available
 * in Spark 3.4+. This is a workaround to make it available in Spark 3.2+.
 */
trait PartitioningPreservingUnaryExecNode extends UnaryExecNode with AliasAwareOutputExpression {
Spark 3.4 changed `outputPartitioning` a bit further for some nodes, such as `ProjectExec`, so I copied the trait from Spark 3.4.
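To make the idea behind the copied trait concrete, here is a minimal sketch of alias-aware partitioning for a projection. It is a toy model and not Spark's or Comet's code: `Projected`, `projectPartitioning`, and the simplified `Partitioning` classes are hypothetical names, and columns are plain strings rather than Catalyst expressions. The assumption it illustrates is that a child's hash partitioning can be kept when every partitioning key survives the projection (possibly under a new name), and must be degraded to an unknown partitioning otherwise.

```scala
object AliasAwareSketch {
  // Simplified stand-ins for Spark's partitioning classes (hypothetical, toy model).
  sealed trait Partitioning
  final case class UnknownPartitioning(numPartitions: Int) extends Partitioning
  final case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning

  // One projected column: the child column it reads and the name it is output under.
  final case class Projected(source: String, outputName: String)

  // Rewrite the child's partitioning in terms of the projection's output names.
  // Simplification: if a column is projected twice, only the last alias is used.
  def projectPartitioning(child: Partitioning, projectList: Seq[Projected]): Partitioning =
    child match {
      case HashPartitioning(keys, n) =>
        val rename: Map[String, String] =
          projectList.map(p => p.source -> p.outputName).toMap
        val renamed: Seq[Option[String]] = keys.map(rename.get)
        // Keep the hash partitioning only if every key is still visible in the output.
        if (renamed.forall(_.isDefined)) HashPartitioning(renamed.flatten, n)
        else UnknownPartitioning(n)
      case other => other
    }

  def main(args: Array[String]): Unit = {
    val child = HashPartitioning(Seq("a"), 4)
    // SELECT a AS x, b: the key survives under the alias x.
    println(projectPartitioning(child, Seq(Projected("a", "x"), Projected("b", "b"))))
    // SELECT b: the key is dropped, so the partitioning is no longer known.
    println(projectPartitioning(child, Seq(Projected("b", "b"))))
  }
}
```

The real trait works on Catalyst expressions and attribute sets, so this sketch only conveys the renaming idea, not the actual algorithm.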
    extends CometUnaryExec {

  override def outputPartitioning: Partitioning = child.outputPartitioning
Would it make sense to add this `outputPartitioning` to `CometUnaryExec` as the default?
Spark's default `outputPartitioning` is `UnknownPartitioning`, so if we add `child.outputPartitioning` to `CometUnaryExec` as the default, it could silently change an operator's `outputPartitioning` without us noticing.
Currently `CometExec` uses the original Spark plan's `outputPartitioning` as the default, which I think is the safer option. It should be correct because Comet doesn't change output partitioning from Spark, except when Spark changes output partitioning dynamically during execution, as AQE does.
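The trade-off described above can be sketched with a small toy model. Everything here is hypothetical and independent of Spark's and Comet's real classes: `Shuffle`, `FrozenNativeProject`, and `ForwardingNativeProject` are illustrative names, and replacing a 200-partition shuffle with an 8-partition one only stands in for AQE rewriting a child at runtime. The sketch assumes one operator reports the partitioning captured from the original plan at planning time, while the other forwards whatever its current child reports.

```scala
object PartitioningSketch {
  // Simplified stand-ins for Spark's partitioning classes (hypothetical, toy model).
  sealed trait Partitioning { def numPartitions: Int }
  final case class UnknownPartitioning(numPartitions: Int) extends Partitioning
  final case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning

  trait Plan {
    // Conservative default, mirroring the "unknown unless stated" behaviour discussed above.
    def outputPartitioning: Partitioning = UnknownPartitioning(0)
  }

  final case class Shuffle(keys: Seq[String], numPartitions: Int) extends Plan {
    override def outputPartitioning: Partitioning = HashPartitioning(keys, numPartitions)
  }

  // Reports the partitioning that was captured from the original plan at planning time.
  final case class FrozenNativeProject(child: Plan, planned: Partitioning) extends Plan {
    override def outputPartitioning: Partitioning = planned
  }

  // Reports whatever the current child reports, so runtime rewrites are reflected.
  final case class ForwardingNativeProject(child: Plan) extends Plan {
    override def outputPartitioning: Partitioning = child.outputPartitioning
  }

  def main(args: Array[String]): Unit = {
    val plannedChild = Shuffle(Seq("k"), 200)
    val coalescedChild = Shuffle(Seq("k"), 8) // stand-in for a child rewritten at runtime

    val frozen = FrozenNativeProject(coalescedChild, plannedChild.outputPartitioning)
    val forwarding = ForwardingNativeProject(coalescedChild)

    println(frozen.outputPartitioning)     // still describes the 200-partition plan
    println(forwarding.outputPartitioning) // follows the 8-partition child
  }
}
```

In this model the frozen value is safe as long as the child never changes, which matches the reasoning in the comment; it only diverges once the child is replaced after planning.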
I'm not an expert on the specific changes but it looks like we are just porting Spark logic over, so LGTM.
Thank you @andygrove
Merged. Thanks.
…s after AQE interferes (apache#299)
* fix: CometExec's outputPartitioning might not be same as Spark expects after AQE interferes
* Add compatibility with Spark 3.2 and 3.3
* Remove unused import
Which issue does this PR close?
Closes #298.
Rationale for this change
What changes are included in this PR?
How are these changes tested?