Improve performance of external aggregation with a lot of temporary f… #262

Merged


@yokofly yokofly commented Nov 8, 2023

…iles

This PR ports ClickHouse/ClickHouse#55489.

I tested it with this query: SELECT number, count() FROM numbers_mt(5000) GROUP BY number

| Implementation           | Elapsed Time | Processed Rows | Data Processed | Rows/s          | Data/s        |
|--------------------------|--------------|----------------|----------------|-----------------|---------------|
| Proton before this PR    | 0.007 sec    | 5.00 thousand  | 40.00 KB       | 665.96 thousand | 5.33 MB/s     |
| Proton after this PR     | 0.004 sec    | 5.00 thousand  | 40.00 KB       | 1.12 million    | 8.96 MB/s     |
| ClickHouse latest        | 0.003 sec    | 5.00 thousand  | 40.00 KB       | 1.97 million    | 15.78 MB/s    |
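
To make the table easier to compare at a glance, here is a minimal sketch that computes the relative throughput from the Rows/s column. The numbers are copied from the table; the dictionary and the `ratio` helper are names made up for this illustration and are not part of Proton or ClickHouse. Each figure comes from a single run of a very small query, so treat the ratios as rough indications only.

```python
# Throughput figures copied from the Rows/s column of the table above.
rows_per_sec = {
    "Proton before this PR": 665.96e3,
    "Proton after this PR": 1.12e6,
    "ClickHouse latest": 1.97e6,
}


def ratio(faster: str, slower: str) -> float:
    """Return how many times higher the throughput of `faster` is (illustrative helper)."""
    return rows_per_sec[faster] / rows_per_sec[slower]


# Improvement delivered by this PR, and the remaining gap to ClickHouse.
speedup = ratio("Proton after this PR", "Proton before this PR")
gap = ratio("ClickHouse latest", "Proton after this PR")

print(f"Proton speedup from this PR: {speedup:.2f}x")
print(f"Remaining gap to ClickHouse: {gap:.2f}x")
```

With the reported figures this works out to about 1.68x for the PR and about 1.76x for the remaining gap to ClickHouse.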

Proton pipeline:

:) explain pipeline SELECT number, count() FROM numbers_mt(5000) GROUP BY number 

EXPLAIN PIPELINE
SELECT 
  number, count()
FROM 
  numbers_mt(5000)
GROUP BY 
  number

Query id: f25e0897-b550-476b-bfaa-133582a10464

┌─explain─────────────────┐
│ (Expression)            │
│ ExpressionTransform     │
│   (Aggregating)         │
│   AggregatingTransform  │
│     (Expression)        │
│     ExpressionTransform │
│       (ReadFromStorage) │
│       Limit             │
│         Numbers 0 → 1   │
└─────────────────────────┘

9 rows in set. Elapsed: 0.004 sec. 

:) 

ClickHouse pipeline:

:) explain pipeline  SELECT number, count() FROM numbers_mt(5000) GROUP BY number

EXPLAIN PIPELINE
SELECT
    number,
    count()
FROM numbers_mt(5000)
GROUP BY number

Query id: 5ad3231d-0e56-4e04-adec-49a2beaaa0c8

┌─explain───────────────────┐
│ (Expression)              │
│ ExpressionTransform × 8   │
│   (Aggregating)           │
│   Resize 1 → 8            │
│     AggregatingTransform  │
│       (Expression)        │
│       ExpressionTransform │
│         (ReadFromStorage) │
│         Limit             │
│           Numbers 0 → 1   │
└───────────────────────────┘

10 rows in set. Elapsed: 0.002 sec. 

:) 


@yl-lisen yl-lisen left a comment


LGTM

@yokofly yokofly merged commit 727d819 into develop Nov 9, 2023
@yokofly yokofly deleted the porting/issue-3238-improve-performance-of-external-aggr branch November 9, 2023 08:49