HIVE-28572: Support Distribute by and Cluster by clauses in CBO #5505

Open
wants to merge 23 commits into
base: master

kasakrisz
Contributor

@kasakrisz kasakrisz commented Oct 14, 2024

What changes were proposed in this pull request?

Currently, CBO is disabled when a query has DISTRIBUTE BY or CLUSTER BY clauses. This patch lifts this limitation by implementing support for these clauses.

  1. Enable these types of conversions from AST, QB to RelNode:
  • DISTRIBUTE BY key0...keyn -> HiveSortExchange(distribution=[hash[key0...keyn]], collation=[[]])
  • DISTRIBUTE BY key0...keyn SORT BY sortKey0...sortKeyn -> HiveSortExchange(distribution=[hash[key0...keyn]], collation=[[sortKey0...sortKeyn]])
  • CLUSTER BY key0...keyn -> HiveSortExchange(distribution=[hash[key0...keyn]], collation=[[key0...keyn]])
    Hence CBO remains enabled.
  2. During the RelNode to AST conversion, HiveSortExchange is converted back to:
TOK_DISTRIBUTEBY
  key0
  ...
  keyn
TOK_SORTBY
  sortKey0
  ...
  sortKeyn
  3. When the CBO return path is enabled, the distribute keys are set as partition keys on the ReduceSinkDesc.
  4. Extend HiveFilterSortTransposeRule to enable pushing a Filter through a SortExchange.
  5. HiveProjectSortExchangeTransposeRule: do not push down a Project when it does not project the expressions referenced by the distribution keys in the SortExchange.
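
For reviewers who want a quick mental model of the clause mapping and the conversion back to AST described above, here is a minimal Python sketch. It is not Hive code: `SortExchangeSpec`, `from_clauses`, `explain`, and `to_ast` are hypothetical names invented for illustration, and omitting TOK_SORTBY when the collation is empty is an assumption rather than something the description states.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SortExchangeSpec:
    """Hypothetical stand-in for the traits carried by HiveSortExchange."""
    distribution: Tuple[str, ...]  # hash distribution keys
    collation: Tuple[str, ...]     # sort keys; empty when no ordering is requested

    def explain(self) -> str:
        # Mimics the notation used in the PR description, not Hive's real plan output.
        return "HiveSortExchange(distribution=[hash[{}]], collation=[[{}]])".format(
            ", ".join(self.distribution), ", ".join(self.collation))

    def to_ast(self) -> List[str]:
        # Renders the token tree from the description, one node per line.
        lines = ["TOK_DISTRIBUTEBY"] + ["  " + key for key in self.distribution]
        if self.collation:  # assumption: no TOK_SORTBY node for an empty collation
            lines += ["TOK_SORTBY"] + ["  " + key for key in self.collation]
        return lines


def from_clauses(distribute_by: Optional[List[str]] = None,
                 sort_by: Optional[List[str]] = None,
                 cluster_by: Optional[List[str]] = None) -> SortExchangeSpec:
    """Map the query clauses to a distribution and a collation."""
    if cluster_by:
        # This sketch treats CLUSTER BY as exclusive of the other two clauses.
        if distribute_by or sort_by:
            raise ValueError("CLUSTER BY cannot be combined with DISTRIBUTE BY or SORT BY")
        # CLUSTER BY uses the same keys for distribution and for sorting.
        return SortExchangeSpec(tuple(cluster_by), tuple(cluster_by))
    if not distribute_by:
        raise ValueError("DISTRIBUTE BY or CLUSTER BY keys are required")
    return SortExchangeSpec(tuple(distribute_by), tuple(sort_by or ()))


if __name__ == "__main__":
    spec = from_clauses(distribute_by=["key0"], sort_by=["sortKey0"])
    print(spec.explain())
    print("\n".join(spec.to_ast()))
```

The sketch only covers the three clause combinations listed in the description; sort direction and null ordering are left out on purpose.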

Why are the changes needed?

CBO provides many optimizations at the logical level. Queries having DISTRIBUTE BY or CLUSTER BY clauses can also benefit from these.

Does this PR introduce any user-facing change?

No, but execution plans may change.

Is the change a dependency upgrade?

No.

How was this patch tested?

mvn test -Dtest.output.overwrite -Dtest=TestMiniLlapLocalCliDriver -Dqfile=distributeby.q,distributeby_cboret.q -pl itests/qtest -Pitests

@kasakrisz kasakrisz force-pushed the HIVE-28572-master-cbo-distribute-by branch from d38310b to fa959f2 Compare October 18, 2024 07:29
@kasakrisz kasakrisz marked this pull request as ready for review October 18, 2024 08:05

sonarcloud bot commented Oct 18, 2024
