[BugFix] Broadcast Join should not generate nondeterministic GRF (backport #44111) #45167

Closed
wants to merge 1 commit into from

@mergify mergify bot commented May 7, 2024

Why I'm doing:

When a broadcast join runs on several BEs, the GRF (global runtime filter) that remote fragments adopt depends on delivery time: the first arrival wins. All instances of the broadcast join must therefore generate identical GRFs. This invariant does not hold in two scenarios.

  1. A join above the broadcast join generates a runtime filter that takes effect on the local right offspring of the broadcast join, so the right offspring of different broadcast join instances output different rows.
  • The upper join is a BUCKET SHUFFLE join in the same fragment as the broadcast join, so each instance of the upper join generates a different local RF.
  • The upper join is in a different fragment and generates a GRF, but the right offspring of the broadcast join receive it at different points in time, so they filter out different rows.
  2. One instance of the probe side of the broadcast join is empty while the other instances are not, so that instance finishes the broadcast join by short-circuiting and generates an empty GRF, while the other instances generate non-empty GRFs.
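The effect of the second scenario can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: a GRF is modeled as a plain set of build-side keys, an empty set rejects every row, and the names `adopt_first_arrival`, `apply_filter`, `be1`..`be3` are hypothetical and do not correspond to StarRocks code.

```python
from itertools import permutations


def adopt_first_arrival(arrivals):
    """Return the filter a remote fragment keeps: the first one delivered."""
    return arrivals[0]


def apply_filter(grf, rows):
    """Keep only the rows whose key is present in the filter."""
    return [row for row in rows if row in grf]


# Hypothetical GRFs built by three instances of the same broadcast join.
# Instance "be3" short-circuited on an empty probe side, so its filter is empty.
grfs = {
    "be1": frozenset({1, 2, 3}),
    "be2": frozenset({1, 2, 3}),
    "be3": frozenset(),
}

# Rows scanned by the remote fragment that consumes the GRF.
remote_rows = [1, 2, 3, 4]

# Try every delivery order and collect the distinct results.
outcomes = set()
for order in permutations(grfs):
    adopted = adopt_first_arrival([grfs[name] for name in order])
    outcomes.add(tuple(apply_filter(adopted, remote_rows)))
```

With these inputs, `outcomes` holds two different results, one per distinct filter, so the rows that survive depend purely on which instance's filter is delivered first.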

Another bug: a runtime filter is pushed down to the wrong side of a left outer join, left anti join, full outer join, right outer join, or right anti join when the probe expr of the runtime filter is not a column of the join's equivalence predicate. Runtime filter pushdown should abide by the same principle as the filter pushdown rule.
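As a rough illustration of that principle, the table below encodes the usual predicate-pushdown rule for a filter that arrives from above a join and references a non-join-key column: it may only go to a side whose rows are preserved by the join. The names `PUSHABLE_SIDES` and `can_push_down` are hypothetical, and the sketch assumes the textbook rule rather than reproducing the planner code.

```python
# Sides that may receive a filter coming from above the join, assuming the
# textbook rule: push only into a side whose rows the join preserves.
# This covers probe exprs that are NOT join equivalence columns.
PUSHABLE_SIDES = {
    "INNER JOIN": {"left", "right"},
    "LEFT OUTER JOIN": {"left"},
    "LEFT ANTI JOIN": {"left"},
    "RIGHT OUTER JOIN": {"right"},
    "RIGHT ANTI JOIN": {"right"},
    "FULL OUTER JOIN": set(),
}


def can_push_down(join_type, side):
    """Return True if a filter may be pushed into `side` of `join_type`."""
    return side in PUSHABLE_SIDES[join_type]
```

For example, `can_push_down("LEFT OUTER JOIN", "right")` is `False`: filtering the null-supplying side would turn matched rows into NULL-extended rows instead of removing them.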

What I'm doing:

  1. If a broadcast join can generate a GRF, its local right offspring should not use runtime filters generated by a join.
  2. If a broadcast join generates an empty GRF, it should not publish it.
  3. Runtime filter pushdown on a left outer join, full outer join, left anti join, right outer join, or right anti join should abide by the same principle as the filter pushdown rule.
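Item 2 can be sketched as a guard on the publishing side. This is a minimal illustration, assuming a GRF is modeled as a set of keys; `filters_to_publish` and `adoptable_filters` are hypothetical helpers, not functions from the BE code.

```python
def filters_to_publish(instance_grfs):
    """Drop empty filters so a short-circuited instance publishes nothing."""
    return [grf for grf in instance_grfs if grf]


def adoptable_filters(instance_grfs):
    """Distinct filters a remote fragment could adopt under first-arrival-wins."""
    return set(filters_to_publish(instance_grfs))


# Hypothetical filters from three instances; the second one short-circuited.
instance_grfs = [frozenset({1, 2, 3}), frozenset(), frozenset({1, 2, 3})]
candidates = adoptable_filters(instance_grfs)
```

Once the empty filter is withheld, only one distinct filter remains in `candidates`, so the delivery order no longer affects what the remote fragment adopts.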

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

This is an automatic backport of pull request #44111 done by [Mergify](https://mergify.com).

Signed-off-by: satanson <[email protected]>
(cherry picked from commit ecbc790)

# Conflicts:
#	be/src/exec/exec_node.h
#	be/src/exec/vectorized/hash_join_node.cpp
#	fe/fe-core/src/main/java/com/starrocks/planner/JoinNode.java
#	fe/fe-core/src/main/java/com/starrocks/planner/PlanFragment.java
#	fe/fe-core/src/main/java/com/starrocks/planner/RuntimeFilterDescription.java
#	fe/fe-core/src/main/java/com/starrocks/planner/RuntimeFilterPushDownContext.java
@mergify mergify bot added the conflicts label May 7, 2024

mergify bot commented May 7, 2024

Cherry-pick of ecbc790 has failed:

On branch mergify/bp/branch-2.5/pr-44111
Your branch is up to date with 'origin/branch-2.5'.

You are currently cherry-picking commit ecbc7907bb.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   be/src/exec/exec_node.cpp
	modified:   be/src/exec/pipeline/fragment_executor.cpp
	modified:   be/src/exec/vectorized/aggregate/aggregate_base_node.cpp
	modified:   be/src/exec/vectorized/hash_join_node.h
	modified:   be/src/exec/vectorized/project_node.cpp
	modified:   be/src/exprs/vectorized/runtime_filter_bank.cpp
	modified:   be/src/exprs/vectorized/runtime_filter_bank.h
	modified:   be/src/runtime/runtime_filter_worker.cpp
	modified:   be/src/runtime/runtime_state.h
	modified:   fe/fe-core/src/main/java/com/starrocks/analysis/JoinOperator.java
	modified:   fe/fe-core/src/main/java/com/starrocks/planner/PlanNode.java
	modified:   fe/fe-core/src/main/java/com/starrocks/sql/plan/PlanFragmentBuilder.java
	new file:   test/sql/test_runtime_filter_push_down_on_left_join/R/test_runtime_filter_push_down_on_left_join
	new file:   test/sql/test_runtime_filter_push_down_on_left_join/T/test_runtime_filter_push_down_on_left_join
	new file:   test/sql/test_runtime_filter_push_down_on_local_right_offsprings_of_broadcast_join_with_grf/R/test_runtime_filter_push_down_on_local_right_offsprings_of_broadcast_join_with_grf
	new file:   test/sql/test_runtime_filter_push_down_on_local_right_offsprings_of_broadcast_join_with_grf/T/test_runtime_filter_push_down_on_local_right_offsprings_of_broadcast_join_with_grf

Unmerged paths:
  (use "git add/rm <file>..." as appropriate to mark resolution)
	both modified:   be/src/exec/exec_node.h
	both modified:   be/src/exec/vectorized/hash_join_node.cpp
	both modified:   fe/fe-core/src/main/java/com/starrocks/planner/JoinNode.java
	both modified:   fe/fe-core/src/main/java/com/starrocks/planner/PlanFragment.java
	both modified:   fe/fe-core/src/main/java/com/starrocks/planner/RuntimeFilterDescription.java
	deleted by us:   fe/fe-core/src/main/java/com/starrocks/planner/RuntimeFilterPushDownContext.java

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@wanpengfei-git wanpengfei-git enabled auto-merge (squash) May 7, 2024 02:17
@mergify mergify bot closed this May 7, 2024
auto-merge was automatically disabled May 7, 2024 02:17

Pull request was closed


mergify bot commented May 7, 2024

@mergify[bot]: Backport conflict, please resolve the conflict and resubmit the PR

@mergify mergify bot deleted the mergify/bp/branch-2.5/pr-44111 branch May 7, 2024 02:17