Site cleanup, mostly minor changes #96

Merged (2 commits) on Jul 30, 2024
2 changes: 1 addition & 1 deletion config.toml
@@ -76,7 +76,7 @@ time_format_blog = "2006.01.02"

[params]
# copyright = " Altinity Inc."
copyright = " Altinity Inc. Altinity®, Altinity.Cloud®, and Altinity Stable® are registered trademarks of Altinity, Inc. ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc."
copyright = " Altinity Inc. Altinity®, Altinity.Cloud®, and Altinity Stable® are registered trademarks of Altinity, Inc. ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc. Kafka, Kubernetes, MySQL, and PostgreSQL are trademarks and property of their respective owners."
privacy_policy = "https://altinity.com/privacy-policy/"
favicon = "/favicon.ico"

8 changes: 7 additions & 1 deletion content/en/_index.md
@@ -21,7 +21,7 @@ The [Altinity Knowledge Base is licensed under Apache 2.0](https://github.com/Al
For more detailed information about Altinity services and support, see the following:

* [Altinity](https://altinity.com/): Provider of Altinity.Cloud, offering SOC-2 certified support for ClickHouse.
* [Altinity ClickHouse Documentation](https://docs.altinity.com): Detailed guides on installing and connecting ClickHouse software to other services.
* [Altinity.com Documentation](https://docs.altinity.com): Detailed guides on working with:
* [Altinity.Cloud](https://docs.altinity.com/altinitycloud/)
* [Altinity.Cloud Anywhere](https://docs.altinity.com/altinitycloudanywhere/)
* [The Altinity Cloud Manager](https://docs.altinity.com/altinitycloud/quickstartguide/clusterviewexplore/)
* [The Altinity Kubernetes Operator for ClickHouse](https://docs.altinity.com/releasenotes/altinity-kubernetes-operator-release-notes/)
* [The Altinity Sink Connector for ClickHouse](https://docs.altinity.com/releasenotes/altinity-sink-connector-release-notes/) and
* [Altinity Backup for ClickHouse](https://docs.altinity.com/releasenotes/altinity-backup-release-notes/)
* [Altinity Blog](https://altinity.com/blog/): Blog posts about ClickHouse the database and Altinity services.

The following sites are also useful references regarding ClickHouse:
@@ -4,7 +4,7 @@ linkTitle: "Dictionaries & arrays"
description: >
Dictionaries & arrays
---
## Dictionary with Clickhouse table as a source
## Dictionary with ClickHouse table as a source

### Test data

4 changes: 2 additions & 2 deletions content/en/altinity-kb-dictionaries/partial-updates.md
@@ -4,7 +4,7 @@ linkTitle: "Partial updates"
description: >
Partial updates
---
Clickhouse is able to fetch from a source only updated rows. You need to define `update_field` section.
ClickHouse is able to fetch from a source only updated rows. You need to define the `update_field` section.

As an example, we have a table in an external source (MySQL, PG, HTTP, ...) defined with the following code sample:

@@ -36,4 +36,4 @@ LIFETIME(MIN 30 MAX 30)

A dictionary with **update_field** `updated_at` will fetch only updated rows. The dictionary saves the time (now) of the last successful update and queries the source with `where updated_at >= previous_update - 1` (shift = 1 sec.).

In case of HTTP source Clickhouse will send get requests with **update_field** as an URL parameter `&updated_at=2020-01-01%2000:01:01`
In case of an HTTP source, ClickHouse will send GET requests with **update_field** as a URL parameter `&updated_at=2020-01-01%2000:01:01`
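
For illustration, a dictionary DDL using `update_field` might look like the following sketch (source type, connection details, and column names are hypothetical; `update_lag` controls the shift mentioned above):

```sql
-- Hypothetical dictionary over an external MySQL table; on each refresh only
-- rows with updated_at newer than the last successful update are fetched.
CREATE DICTIONARY ext_dict
(
    id UInt64,
    value String,
    updated_at DateTime
)
PRIMARY KEY id
SOURCE(MYSQL(
    host 'mysql-host' port 3306
    user 'default' password ''
    db 'db' table 'ext_table'
    update_field 'updated_at'   -- fetch only changed rows
    update_lag 1                -- shift the comparison back by 1 second
))
LAYOUT(HASHED())
LIFETIME(MIN 30 MAX 30);
```
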
@@ -6,7 +6,7 @@ description: >
---


## Dictionary with Clickhouse table as a source with named collections
## Dictionary with ClickHouse table as a source with named collections

### Data for connecting to external sources can be stored in named collections

6 changes: 3 additions & 3 deletions content/en/altinity-kb-integrations/Spark.md
@@ -16,7 +16,7 @@ The trivial & natural way to talk to ClickHouse from Spark is using jdbc. There

ClickHouse-Native-JDBC has some hints about integration with Spark even in the main README file.

'Official' driver does support some conversion of complex data types (Roarring bitmaps) for Spark-Clickhouse integration: https://github.com/ClickHouse/clickhouse-jdbc/pull/596
'Official' driver does support some conversion of complex data types (Roaring bitmaps) for Spark-ClickHouse integration: https://github.com/ClickHouse/clickhouse-jdbc/pull/596

But proper partitioning of the data (to spark partitions) may be tricky with jdbc.

@@ -58,12 +58,12 @@ Arrays, Higher-order functions, machine learning, integration with lot of different
* Using a bunch of ClickHouse and Spark in MFI Soft (Russian) https://www.youtube.com/watch?v=ID8eTnmag0s (russian)
* Spark read and write ClickHouse (Chinese: Spark读写ClickHouse) https://yerias.github.io/2020/12/08/clickhouse/9/#Jdbc%E6%93%8D%E4%BD%9Cclickhouse
* Spark JDBC write clickhouse operation summary (Chinese: Spark JDBC 写 clickhouse 操作总结) https://www.jianshu.com/p/43f78c8a025b?hmsr=toutiao.io&utm_campaign=toutiao.io&utm_medium=toutiao.io&utm_source=toutiao.io
* Spark-sql is based on Clickhouse's DataSourceV2 data source extension (Chinese: spark-sql基于Clickhouse的DataSourceV2数据源扩展)
* Spark-sql is based on ClickHouse's DataSourceV2 data source extension (Chinese: spark-sql基于ClickHouse的DataSourceV2数据源扩展)
https://www.cnblogs.com/mengyao/p/4689866.html
* Alibaba integration instructions (English) https://www.alibabacloud.com/help/doc-detail/191192.htm
* Tencent integration instructions (English) https://intl.cloud.tencent.com/document/product/1026/35884
* Yandex DataProc demo: loading files from S3 to ClickHouse with Spark (Russian) https://www.youtube.com/watch?v=N3bZW0_rRzI
* Clickhouse official documentation_Spark JDBC writes some pits of ClickHouse (Chinese: clickhouse官方文档_Spark JDBC写ClickHouse的一些坑) https://blog.csdn.net/weixin_39615984/article/details/111206050
* ClickHouse official documentation_Spark JDBC writes some pits of ClickHouse (Chinese: clickhouse官方文档_Spark JDBC写ClickHouse的一些坑) https://blog.csdn.net/weixin_39615984/article/details/111206050
* ClickHouse data import: Flink, Spark, Kafka, MySQL, Hive (Chinese: 篇五|ClickHouse数据导入 Flink、Spark、Kafka、MySQL、Hive) https://zhuanlan.zhihu.com/p/299094269
* Baifendian Big Data Technical Team: Practice of ClickHouse data synchronization solution based on multiple Spark tasks (Chinese: 百分点大数据技术团队:基于多 Spark 任务的 ClickHouse 数据同步方案实践) https://www.6aiq.com/article/1635461873075
* SPARK-CLICKHOUSE-ES REAL-TIME PROJECT EIGHTH DAY-PRECISE ONE-TIME CONSUMPTION SAVE OFFSET. (Chinese: SPARK-CLICKHOUSE-ES实时项目第八天-精确一次性消费保存偏移量) https://www.freesion.com/article/71421322524/
@@ -18,6 +18,6 @@ We need to have something like transactions on ClickHouse side to be able to avo

## block-aggregator by eBay

Block Aggregator is a data loader that subscribes to Kafka topics, aggregates the Kafka messages into blocks that follow the Clickhouse’s table schemas, and then inserts the blocks into ClickHouse. Block Aggregator provides exactly-once delivery guarantee to load data from Kafka to ClickHouse. Block Aggregator utilizes Kafka’s metadata to keep track of blocks that are intended to send to ClickHouse, and later uses this metadata information to deterministically re-produce ClickHouse blocks for re-tries in case of failures. The identical blocks are guaranteed to be deduplicated by ClickHouse.
Block Aggregator is a data loader that subscribes to Kafka topics, aggregates the Kafka messages into blocks that follow the ClickHouse table schemas, and then inserts the blocks into ClickHouse. Block Aggregator provides an exactly-once delivery guarantee for loading data from Kafka to ClickHouse. It utilizes Kafka’s metadata to keep track of blocks that are intended to be sent to ClickHouse, and later uses this metadata to deterministically re-produce ClickHouse blocks for retries in case of failures. The identical blocks are guaranteed to be deduplicated by ClickHouse.

[eBay/block-aggregator](https://github.com/eBay/block-aggregator)
@@ -11,7 +11,7 @@ Article is based on feedback provided by one of Altinity clients.
CatBoost:

* It uses gradient boosting - a hard-to-use technique which can outperform neural networks. Gradient boosting is powerful, but it's easy to shoot yourself in the foot using it.
* The documentation on how to use it is quite lacking. The only good source of information on how to properly configure a model to yield good results is this video: [https://www.youtube.com/watch?v=usdEWSDisS0](https://www.youtube.com/watch?v=usdEWSDisS0) . We had to dig around GitHub issues to find out how to make it work with ClickHouse.
* The documentation on how to use it is quite lacking. The only good source of information on how to properly configure a model to yield good results is this video: [https://www.youtube.com/watch?v=usdEWSDisS0](https://www.youtube.com/watch?v=usdEWSDisS0). We had to dig around GitHub issues to find out how to make it work with ClickHouse®.
* CatBoost is fast. Other libraries will take ~5X to ~10X as long to do what CatBoost does.
* CatBoost will do preprocessing out of the box (fills nulls, applies standard scaling, encodes strings as numbers).
* CatBoost has all functions you'd need (metrics, plotters, feature importance)
4 changes: 2 additions & 2 deletions content/en/altinity-kb-integrations/mysql-clickhouse.md
@@ -1,9 +1,9 @@
---
title: "MySQL"
linkTitle: "Integration Clickhouse with MySQL"
linkTitle: "Integrating ClickHouse® with MySQL"
weight: 100
description: >-
Integration Clickhouse with MySQL
Integrating ClickHouse® with MySQL
---

### Replication using MaterializeMySQL.
@@ -34,7 +34,7 @@ kubectl logs chi-chcluster-2-1-0 -c clickhouse-pod -n chcluster --previous
kubectl describe pod chi-chcluster-2-1-0 -n chcluster
```

Q. Clickhouse is caching the Kafka pod's IP and trying to connect to the same ip even when there is a new Kafka pod running and the old one is deprecated. Is there some setting where we could refresh the connection
Q. ClickHouse is caching the Kafka pod's IP and trying to connect to the same IP even when there is a new Kafka pod running and the old one is deprecated. Is there some setting where we could refresh the connection?

`<disable_internal_dns_cache>1</disable_internal_dns_cache>` in config.xml
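
If a config change and restart are not convenient at the moment, the cached entries can also be flushed manually (a sketch, not specific to Kafka):

```sql
-- Drops all cached hostname-to-IP entries so the new pod IP gets re-resolved.
SYSTEM DROP DNS CACHE;
```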

2 changes: 1 addition & 1 deletion content/en/altinity-kb-queries-and-syntax/_index.md
@@ -5,6 +5,6 @@ keywords:
- clickhouse queries
- clickhouse joins
description: >
Learn about ClickHouse queries & syntax, including Joins & Window Functions.
Learn about ClickHouse® queries & syntax, including Joins & Window Functions.
weight: 1
---
@@ -6,7 +6,7 @@ description: >
---
`SELECT * FROM table FINAL`

* Before 20.5 - always executed in a single thread and slow.
* Before ClickHouse® 20.5 - always executed in a single thread and slow.
* Since 20.5 - final can be parallel, see [https://github.com/ClickHouse/ClickHouse/pull/10463](https://github.com/ClickHouse/ClickHouse/pull/10463)
* Since 20.10 - you can use `do_not_merge_across_partitions_select_final` setting.
* Since 22.6 - final even more parallel, see [https://github.com/ClickHouse/ClickHouse/pull/36396](https://github.com/ClickHouse/ClickHouse/pull/36396)
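
As an illustration of the 20.10+ setting mentioned above (the table name is hypothetical; it helps when each partition holds non-overlapping keys):

```sql
-- Merge rows with FINAL only inside each partition, not across partitions.
SELECT *
FROM replacing_table FINAL
SETTINGS do_not_merge_across_partitions_select_final = 1;
```
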
@@ -7,7 +7,7 @@ description: >
Unfortunately not all queries can be killed.
`KILL QUERY` only sets a flag that must be checked by the query.
A query pipeline checks this flag before switching to the next block. If the pipeline is stuck somewhere in the middle, it cannot be killed.
If a query does not stop, the only way to get rid of it is to restart ClickHouse®.
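
For reference, the usual pattern looks like this (the query_id is hypothetical):

```sql
-- Find the query, then ask it to stop; this only raises the flag described above.
SELECT query_id, elapsed, query FROM system.processes;
KILL QUERY WHERE query_id = 'some-query-id' SYNC;  -- SYNC waits until the query actually stops
```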

See also:

@@ -12,7 +12,7 @@ You have 40 parts in 3 partitions. This unscheduled merge selects some partition

`OPTIMIZE TABLE xyz FINAL` -- initiates a cycle of unscheduled merges.

ClickHouse merges parts in this table until will remains 1 part in each partition (if a system has enough free disk space). As a result, you get 3 parts, 1 part per partition. In this case, CH rewrites parts even if they are already merged into a single part. It creates a huge CPU / Disk load if the table ( XYZ) is huge. ClickHouse reads / uncompress / merge / compress / writes all data in the table.
ClickHouse® merges parts in this table until only 1 part remains in each partition (if the system has enough free disk space). As a result, you get 3 parts, 1 part per partition. In this case, ClickHouse rewrites parts even if they are already merged into a single part. It creates a huge CPU / Disk load if the table (XYZ) is huge: ClickHouse reads / uncompresses / merges / compresses / writes all data in the table.

If this table has size 1TB it could take around 3 hours to complete.

@@ -5,7 +5,7 @@ description: >
Parameterized views
---

## ClickHouse version 23.1+
## ClickHouse® version 23.1+

(23.1.6.42, 23.2.5.46, 23.3.1.2823)
Have inbuild support for [parametrized views](https://clickhouse.com/docs/en/sql-reference/statements/create/view#parameterized-view):
@@ -34,7 +34,7 @@ select * from v(xx=[1,2,3]);
```


## ClickHouse versions per 23.1
## ClickHouse versions pre 23.1

Custom settings allow you to emulate parameterized views, as in the sketch below.
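
A minimal sketch of the idea (names are illustrative; it assumes the `custom_` prefix is allowed via `custom_settings_prefixes` in the server config):

```sql
-- The "parameter" lives in a custom setting that the view reads at query time.
CREATE VIEW v_filtered AS
SELECT *
FROM some_table
WHERE x = getSetting('custom_x');

SET custom_x = 2;
SELECT * FROM v_filtered;   -- behaves like a view parameterized by x
```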

@@ -4,7 +4,7 @@ linkTitle: "Possible deadlock avoided. Client should retry"
description: >
Possible deadlock avoided. Client should retry
---
In version 19.14 a serious issue was found: a race condition that can lead to server deadlock. The reason for that was quite fundamental, and a temporary workaround for that was added ("possible deadlock avoided").
In ClickHouse® version 19.14 a serious issue was found: a race condition that can lead to server deadlock. The reason for that was quite fundamental, and a temporary workaround for that was added ("possible deadlock avoided").

Those locks are one of the fundamental things that the core team was actively working on in 2020.

@@ -8,7 +8,7 @@ The execution pipeline is embedded in the partition reading code.

So that works this way:

1. ClickHouse does partition pruning based on `WHERE` conditions.
1. ClickHouse® does partition pruning based on `WHERE` conditions.
2. For every partition, it picks column ranges (aka 'marks' / 'granules') based on primary key conditions.
3. Here the sampling logic is applied: a) in case of `SAMPLE k` (`k` in `0..1` range) it adds the condition `WHERE sample_key < k * max_int_of_sample_key_type` b) in case of `SAMPLE k OFFSET m` it adds the condition `WHERE sample_key BETWEEN m * max_int_of_sample_key_type AND (m + k) * max_int_of_sample_key_type` c) in case of `SAMPLE N` (N>1) it first estimates how many rows are inside the ranges it needs to read and, based on that, converts it to the 3a case (calculates k from the number of rows in the ranges and the desired number of rows)
4. Other conditions are then applied to the data returned by those steps (so the number of rows can be decreased here)
@@ -56,4 +56,4 @@ SELECT count() FROM table WHERE ... AND cityHash64(some_high_card_key) % 10 = 0;
SELECT count() FROM table WHERE ... AND rand() % 10 = 0; -- Non-deterministic
```

ClickHouse will read more data from disk compared to an example with a good SAMPLE key, but it's more universal and can be used if you can't change table ORDER BY key. (To learn more about ClickHouse internals, [ClickHouse Administrator Training](https://altinity.com/clickhouse-training/) is available.)
ClickHouse will read more data from disk compared to an example with a good SAMPLE key, but it's more universal and can be used if you can't change table ORDER BY key. (To learn more about ClickHouse internals, [Administrator Training for ClickHouse](https://altinity.com/clickhouse-training/) is available.)
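
For comparison, a sketch of the "good SAMPLE key" case referenced above (table and columns are illustrative):

```sql
-- The sampling expression is part of the primary key, so SAMPLE prunes granules
-- instead of filtering rows after they have been read.
CREATE TABLE hits
(
    user_id    UInt64,
    event_time DateTime,
    url        String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_time, cityHash64(user_id))
SAMPLE BY cityHash64(user_id);

SELECT count() FROM hits SAMPLE 1/10;  -- reads roughly a tenth of the data
```
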
2 changes: 1 addition & 1 deletion content/en/altinity-kb-queries-and-syntax/ansi-sql-mode.md
@@ -4,7 +4,7 @@ linkTitle: "ANSI SQL mode"
description: >
ANSI SQL mode
---
It's possible to tune some settings which would make ClickHouse more ANSI SQL compatible(and slower):
It's possible to tune some settings which would make ClickHouse® more ANSI SQL compatible (and slower):

```sql
SET join_use_nulls=1; -- introduced long ago
@@ -6,9 +6,7 @@ description: >-
Using array functions to mimic window-functions alike behavior.
---

# Using array functions to mimic window functions alike behavior

There are some usecases when you may want to mimic window functions using Arrays - as an optimization step, or to contol the memory better / use on-disk spiling, or just if you have old ClickHouse version.
There are some use cases when you may want to mimic window functions using Arrays - as an optimization step, or to control the memory better / use on-disk spilling, or just if you have an old ClickHouse® version.

## Running difference sample
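
A minimal sketch of the approach, assuming a simple `events(id, ts, value)` table (illustrative only):

```sql
-- Per id: collect (ts, value) pairs, sort them, diff the values, unroll back to rows.
SELECT
    id,
    p.1 AS ts,
    diff
FROM
(
    SELECT
        id,
        arraySort(x -> x.1, groupArray((ts, value))) AS pairs,
        arrayDifference(arrayMap(x -> x.2, pairs))   AS diffs
    FROM events
    GROUP BY id
)
ARRAY JOIN pairs AS p, diffs AS diff
ORDER BY id, ts;
```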

2 changes: 1 addition & 1 deletion content/en/altinity-kb-queries-and-syntax/async-inserts.md
@@ -5,7 +5,7 @@ description: >
Async INSERTs
---

Async INSERTs is a ClickHouse feature tha enables batching data automatically and transparently on the server-side. We recommend to batch at app/ingestor level because you will have more control and you decouple this responsibility from ClickHouse, but there are use cases where this is not possible and Async inserts come in handy if you have hundreds or thousands of clients doing small inserts.
Async INSERTs is a ClickHouse® feature that enables batching data automatically and transparently on the server side. We recommend batching at the app/ingestor level because you will have more control and you decouple this responsibility from ClickHouse, but there are use cases where this is not possible, and async inserts come in handy if you have hundreds or thousands of clients doing small inserts.

You can check how they work here: [Async inserts](https://clickhouse.com/docs/en/optimize/asynchronous-inserts)
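
A quick sketch of turning them on per session (settings available in recent ClickHouse versions; the table is hypothetical):

```sql
SET async_insert = 1;           -- let the server buffer and batch small INSERTs
SET wait_for_async_insert = 1;  -- 1 = return only after the buffered block is flushed

INSERT INTO events (ts, value) VALUES (now(), 42);
```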

@@ -20,7 +20,7 @@ INSERT INTO events SELECT
FROM numbers(15);
```

## Using window functions (starting from Clickhouse 21.3)
## Using window functions (starting from ClickHouse® 21.3)

```sql
SELECT
@@ -39,4 +39,4 @@ description: >
</tbody>
</table>

See also [https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup41/data_processing.pdf](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup41/data_processing.pdf) (slide 17-22)
See also the presentation [Data processing into ClickHouse®](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup41/data_processing.pdf), especially slides 17-22.
@@ -94,7 +94,7 @@ MemoryTracker: Peak memory usage (for query): 4.05 GiB.

0 rows in set. Elapsed: 4.852 sec. Processed 100.00 million rows, 800.00 MB (20.61 million rows/s., 164.88 MB/s.)

This query faster than first, because ClickHouse doesn't need to merge states for all keys, only for first 1000 (based on LIMIT)
This query is faster than the first because ClickHouse® doesn't need to merge states for all keys, only for the first 1000 (based on LIMIT)


SELECT number % 1000 AS key
2 changes: 1 addition & 1 deletion content/en/altinity-kb-queries-and-syntax/explain-query.md
@@ -27,7 +27,7 @@ SELECT ...
* `SYNTAX` - query text after AST-level optimizations
* `PLAN` - query execution plan
* `PIPELINE` - query execution pipeline
* `ESTIMATE` - https://github.com/ClickHouse/ClickHouse/pull/26131 (since 21.9)
* `ESTIMATE` - See [Estimates for select query](https://github.com/ClickHouse/ClickHouse/pull/26131), available since ClickHouse® 21.9
* `indexes=1` supported starting from 21.6 (https://github.com/ClickHouse/ClickHouse/pull/22352 )
* `json=1` supported starting from 21.6 (https://github.com/ClickHouse/ClickHouse/pull/23082)
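
For instance (hypothetical table, syntax only):

```sql
-- Show how the primary key and skip indexes prune parts/granules (21.6+).
EXPLAIN indexes = 1
SELECT count() FROM hits WHERE event_time >= now() - INTERVAL 1 DAY;

-- Rough estimate of parts/rows/marks to be read (21.9+).
EXPLAIN ESTIMATE
SELECT count() FROM hits WHERE event_time >= now() - INTERVAL 1 DAY;
```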
