[Doc] Bitmap explanation is incorrect #43110

Closed
8 changes: 4 additions & 4 deletions docs/zh/table_design/indexes/Bitmap_index.md
@@ -11,13 +11,13 @@ A Bitmap index is a special database index that uses bitmaps. A bitmap is a

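A Bitmap index can improve query efficiency on a specified column. If a query condition hits a prefix column, StarRocks can use the [prefix index](./Prefix_index_sort_key.md) to speed up the query and return results quickly. However, the length of the prefix index is limited, so to improve query efficiency on a column that is not part of the prefix index, you can create a Bitmap index on that column.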

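- Bitmap indexes are generally suited to high-cardinality columns, where the Bitmap index built on the column is highly selective and applying it filters the data down to a small number of rows.
+ Bitmap indexes are generally suited to low-cardinality columns, where the Bitmap index built on the column is highly selective and applying it filters the data down to a small number of rows.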
Contributor

Hi, the document is correct.
We want users to create bitmap indexes on high-cardinality columns.
Suppose you create a bitmap index on a sex column that only has the values male/female: a filter on that column still reads about half of the records. In that case, the index cannot accelerate the query.
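
The selectivity argument in this comment can be illustrated with a minimal Python sketch. It is not how StarRocks implements Bitmap indexes; the helper names (`build_bitmap_index`, `selected_fraction`) and the sample columns are hypothetical, chosen only to show how the fraction of rows selected by one value depends on cardinality.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to a bitmap (a Python int used as a bit set, one bit per row)."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)


def selected_fraction(index, value, num_rows):
    """Fraction of rows whose bit is set in the bitmap for `value`."""
    return bin(index.get(value, 0)).count("1") / num_rows


num_rows = 1000
# Hypothetical data: a two-valued column and a column with 500 distinct values.
sex = ["male" if i % 2 == 0 else "female" for i in range(num_rows)]
user_id = [i % 500 for i in range(num_rows)]

sex_index = build_bitmap_index(sex)
user_index = build_bitmap_index(user_id)

# An equality filter on the two-valued column still selects half of the rows,
# while the same filter on the 500-valued column selects only 2 of 1000 rows.
low_card_fraction = selected_fraction(sex_index, "male", num_rows)
high_card_fraction = selected_fraction(user_index, 42, num_rows)
print(low_card_fraction, high_card_fraction)
```

In this toy setup the index on the two-valued column cannot prune much, which is the case the comment describes; whether a real index pays off also depends on storage and lookup costs that the sketch does not model.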

Author

I looked at the 2.5 version documentation and Wikipedia, and they say bitmap indexes are suitable for low-cardinality columns. The example on your official website is also a low-cardinality example.

The acceleration effect of Bitmap indexes on queries was verified in StarRocks using the SSB 100G test dataset. The test results are as follows:

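- - Query performance improves noticeably only when a Bitmap index is created on a high-cardinality column (in this test dataset, a noticeable improvement appears once cardinality reaches the order of 100000).
- - A high-cardinality column can be a single high-cardinality column or a combination of multiple columns with high cardinality.
- - Creating a Bitmap index on a low-cardinality column yields essentially no improvement in query performance and may even degrade it.
+ - Query performance improves noticeably only when a Bitmap index is created on a low-cardinality column (in this test dataset, a noticeable improvement appears once cardinality reaches the order of 100000).
+ - A low-cardinality column can be a single low-cardinality column or a combination of multiple columns with low cardinality.
+ - Creating a Bitmap index on a high-cardinality column yields essentially no improvement in query performance and may even degrade it.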

## Advantages
