Commit

Merge branch 'main' into trademarks
DougTidwell authored Jul 30, 2024
2 parents 3e41a19 + d1f69f2 commit 797cc7d
Showing 14 changed files with 160 additions and 119 deletions.
16 changes: 8 additions & 8 deletions .github/workflows/gh-pages.yml
Original file line number Diff line number Diff line change
Expand Up @@ -8,36 +8,36 @@ on:

jobs:
  deploy:
    runs-on: ubuntu-20.04
    runs-on: ubuntu-22.04
    steps:
      - name: Git checkout
        uses: actions/checkout@v2
        uses: actions/checkout@v4
        with:
          submodules: true # Fetch Hugo themes (true OR recursive)
          fetch-depth: 0 # Fetch all history for .GitInfo and .Lastmod
          ref: main

      - name: Setup Hugo
        uses: peaceiris/actions-hugo@v2
        uses: peaceiris/actions-hugo@v3
        with:
          hugo-version: '0.128.2'
          extended: true

      - name: Cache Hugo modules
        uses: actions/cache@v2
        uses: actions/cache@v4
        with:
          path: /tmp/hugo_cache
          key: ${{ runner.os }}-hugomod-${{ hashFiles('**/go.sum') }}
          restore-keys: |
            ${{ runner.os }}-hugomod-
      - name: Setup Node
        uses: actions/setup-node@v3
        uses: actions/setup-node@v4
        with:
          node-version: '14'
          node-version: '20'

      - name: Cache dependencies
        uses: actions/cache@v2
        uses: actions/cache@v4
        with:
          path: ~/.npm
          key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
Expand All @@ -51,7 +51,7 @@ jobs:
# run: hugo --gc

      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        uses: peaceiris/actions-gh-pages@v4
        if: github.ref == 'refs/heads/main'
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
2 changes: 1 addition & 1 deletion content/en/_index.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,6 @@ For more detailed information about Altinity services support, see the following

The following sites are also useful references regarding ClickHouse:

* [ClickHouse.com documentation](https://clickhouse.com/docs/en/): From Yandex, the creators of ClickHouse
* [ClickHouse.com documentation](https://clickhouse.com/docs/en/): Official documentation from ClickHouse Inc.
* [ClickHouse at Stackoverflow](https://stackoverflow.com/questions/tagged/clickhouse): Community driven responses to questions regarding ClickHouse
* [Google groups (Usenet) yes we remember it](https://groups.google.com/g/clickhouse): The grandparent of all modern discussion boards.
74 changes: 48 additions & 26 deletions content/en/altinity-kb-integrations/ClickHouse_python_drivers.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,9 @@ The **`clickhouse-driver`** is a Python library used for interacting with ClickH
7. **Asynchronous Support**: Supports asynchronous execution of queries using `asyncio`, allowing for non-blocking query execution in asynchronous Python applications.
8. **Customization**: Provides options for customizing connection settings, query execution behavior, and other parameters to suit specific application requirements and performance considerations.
9. **Compatibility**: Works with various versions of ClickHouse, ensuring compatibility and support for different ClickHouse features and functionalities.
10. **Documentation and Community**: Offers comprehensive documentation and active community support, including examples, tutorials, and forums, to assist developers in effectively using the library and addressing any issues or questions they may have.
10. **Documentation and Community**: Offers comprehensive documentation and active community support, including examples, tutorials, and forums, to assist developers in effectively using the library and addressing any issues or questions they may have.
11. **Supports multiple hosts in the connection string**: https://clickhouse-driver.readthedocs.io/en/latest/features.html#multiple-hosts
12. **Connection pooling** (aiohttp)

**Python ecosystem libs/modules:**

Expand All @@ -50,13 +52,15 @@ The ClickHouse Connect Python driver is the ClickHouse, Inc supported-official P
8. **Limited Asynchronous Support**: Some implementations of the driver offer asynchronous support, allowing developers to execute queries asynchronously to improve concurrency and scalability in asynchronous Python applications using asynchronous I/O frameworks like `asyncio`.
9. **Configuration Options**: The driver offers various configuration options, such as connection parameters, authentication methods, and connection pooling settings, allowing developers to customize the driver's behavior to suit their specific requirements and environment.
10. **Documentation and Community**: Offers comprehensive documentation and active community support, including examples, tutorials, and forums, to assist developers in effectively using the library and addressing any issues or questions they may have. [https://clickhouse.com/docs/en/integrations/language-clients/python/intro/](https://clickhouse.com/docs/en/integrations/language-clients/python/intro/)
11. **Multiple hosts in the connection string are not supported**: https://github.com/ClickHouse/clickhouse-connect/issues/74
12. **Connection pooling** (urllib3)

**Python ecosystem libs/modules:**

- Good Pandas/Numpy support: [https://clickhouse.com/docs/en/integrations/python#consuming-query-results-with-numpy-pandas-or-arrow](https://clickhouse.com/docs/en/integrations/python#consuming-query-results-with-numpy-pandas-or-arrow)
- Decent SQLAlchemy 1.3 and 1.4 support (limited feature set)

It is the most recent driver with the latest feature set (query context and query streaming …. )
It is the most recent driver with the latest feature set (query context, query streaming, …) and, as of a recent release, an [asyncio wrapper](https://github.com/ClickHouse/clickhouse-connect/releases/tag/v0.7.16).

You can check multiple official examples here:

Expand All @@ -68,17 +72,20 @@ Also some Altinity examples from repo:

You can clone the repo and use the helper files like `DDL.sql` to setup some tests.

Clickhouse-connect can use a connection pooler (based on urllib3) [https://clickhouse.com/docs/en/integrations/python#customizing-the-http-connection-pool](https://clickhouse.com/docs/en/integrations/python#customizing-the-http-connection-pool)

### Most common use cases:

#### Connection pooler:

- Clickhouse-connect can use a connection pooler (based on urllib3) https://clickhouse.com/docs/en/integrations/python#customizing-the-http-connection-pool
- With Clickhouse-driver you can use **aiohttp** (https://docs.aiohttp.org/en/stable/client_advanced.html#limiting-connection-pool-size)

#### Managing ClickHouse `session_id`:

- clickhouse-driver
- Because it uses the native interface, `session_id` is managed internally by ClickHouse, so it is very rare (unless using asyncio) to get:

`Code: 373. DB::Exception: Session is locked by a concurrent client. (SESSION_IS_LOCKED)` .


- clickhouse-connect: How to use clickhouse-connect in a pythonic way and avoid getting `SESSION_IS_LOCKED` exceptions:
- [https://clickhouse.com/docs/en/integrations/python#managing-clickhouse-session-ids](https://clickhouse.com/docs/en/integrations/python#managing-clickhouse-session-ids)
Expand All @@ -96,17 +103,39 @@ Also in clickhouse documentation some explanation how to set `session_id` with a

[Best practices with flask · Issue #73 · ClickHouse/clickhouse-connect](https://github.com/ClickHouse/clickhouse-connect/issues/73#issuecomment-1325280242)

#### clickhouse-connect & clickhouse-driver with Asyncio
#### Asyncio (asynchronous wrappers)

##### clickhouse-connect

New release with [asyncio wrapper for clickhouse-connect](https://github.com/ClickHouse/clickhouse-connect/releases/tag/v0.7.16)

How the wrapper works: https://clickhouse.com/docs/en/integrations/python#asyncclient-wrapper

Wrapper and connection pooler example:

```python
import clickhouse_connect
import asyncio
from clickhouse_connect.driver.httputil import get_pool_manager

async def main():
    client = await clickhouse_connect.get_async_client(host='localhost', port=8123, pool_mgr=get_pool_manager())
    for i in range(100):
        result = await client.query("SELECT name FROM system.databases")
        print(result.result_rows)

asyncio.run(main())
```

`clickhouse-connect` code is synchronous and running synchronous functions in an async application is a workaround and might not be as efficient as using a library designed for asynchronous operations from the ground up. Problem is there are few libs/modules in Python. So you can use `concurrent.futures` and `ThreadpoolExecutor` or `ProcessPoolExecutor`. Python GIL has a mutex over Threads but not to Processes so if you need performance at the cost of using processes instead of threads (not much different for medium workloads) you can use `ProcesspoolExecutor` instead.
`clickhouse-connect` code is synchronous by default, and running synchronous functions in an async application is a workaround that might not be as efficient as using a library or wrapper designed for asynchronous operations from the ground up. You can use the current wrapper, or take another approach with `asyncio` and `concurrent.futures`, using `ThreadPoolExecutor` or `ProcessPoolExecutor`. The Python GIL is a mutex over threads but not over processes, so if you need performance at the cost of using processes instead of threads (not much different for medium workloads), you can use `ProcessPoolExecutor` instead.

Some info about this from the tinybird guys [https://www.tinybird.co/blog-posts/killing-the-processpoolexecutor](https://www.tinybird.co/blog-posts/killing-the-processpoolexecutor)
Some info about this from the Tinybird guys: https://www.tinybird.co/blog-posts/killing-the-processpoolexecutor

For clickhouse-connect
For clickhouse-connect:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import ProcessPoolExecutor
import clickhouse_connect

# Function to execute a query using clickhouse-connect synchronously
Expand All @@ -115,11 +144,11 @@ def execute_query_sync(query):
    result = client.query(query)
    return result

# Asynchronous wrapper function to run the synchronous function in a thread pool
# Asynchronous wrapper function to run the synchronous function in a process pool
async def execute_query_async(query):
    loop = asyncio.get_running_loop()
    # Use ThreadPoolExecutor to execute the synchronous function
    with ThreadPoolExecutor() as pool:
    # Use ProcessPoolExecutor to execute the synchronous function
    with ProcessPoolExecutor() as pool:
        result = await loop.run_in_executor(pool, execute_query_sync, query)
    return result

Expand All @@ -132,21 +161,14 @@ async def main():
if __name__ == '__main__':
    asyncio.run(main())
```
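
As a minimal sketch of the executor approach described above, the following shares one `ThreadPoolExecutor` across several queries and awaits them together with `asyncio.gather`. The function `blocking_query` is a hypothetical stand-in for a synchronous driver call (it is not part of clickhouse-connect), and `run_queries` and `max_workers=4` are illustrative choices only:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(query: str) -> str:
    """Hypothetical stand-in for a synchronous driver call such as client.query()."""
    time.sleep(0.1)  # simulate waiting on the network; sleeping releases the GIL
    return f"result of {query}"


async def run_queries(queries, max_workers=4):
    loop = asyncio.get_running_loop()
    # One executor is created once and shared by every query in this batch.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [loop.run_in_executor(pool, blocking_query, q) for q in queries]
        # gather returns results in the same order as the submitted queries.
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    print(asyncio.run(run_queries(["SELECT 1", "SELECT 2", "SELECT 3"])))
```

Swapping in `ProcessPoolExecutor` keeps the same shape, provided the function and its arguments can be pickled, which in practice means the function is defined at module level.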
##### Clickhouse-driver

Clickhouse-driver code is also synchronous and suffers the same problem as clickhouse-connect

[https://clickhouse-driver.readthedocs.io/en/latest/quickstart.html#async-and-multithreading](https://clickhouse-driver.readthedocs.io/en/latest/quickstart.html#async-and-multithreading)

So to use an asynchronous approach it is recommended to use a connection pool and some `asyncio` wrapper that can hide the complexity of using the `ThreadPoolExecutor/ProcessPoolExecutor`

To begin testing such environment [aiohttp](https://docs.aiohttp.org/) is a good approach. Here an example:

[https://github.com/lesandie/clickhouse-tests/blob/main/scripts/test_aiohttp_inserts.py](https://github.com/lesandie/clickhouse-tests/blob/main/scripts/test_aiohttp_inserts.py)

How to tune the connection pooler: [https://docs.aiohttp.org/en/stable/client_advanced.html#limiting-connection-pool-size](https://docs.aiohttp.org/en/stable/client_advanced.html#limiting-connection-pool-size))
`clickhouse-driver` code is also synchronous and suffers the same problem as `clickhouse-connect`: https://clickhouse-driver.readthedocs.io/en/latest/quickstart.html#async-and-multithreading

Also `aiochclient` is another good wrapper [https://github.com/maximdanilchenko/aiochclient](https://github.com/maximdanilchenko/aiochclient) for the HTTP interface
So, to use an asynchronous approach, it is recommended to use a connection pool and an asyncio wrapper that hides the complexity of using `ThreadPoolExecutor`/`ProcessPoolExecutor`.

For the native interface you can try [https://github.com/long2ice/asynch](https://github.com/long2ice/asynch)
- To begin testing such an environment, [aiohttp](https://docs.aiohttp.org/) is a good approach. Here is an example: https://github.com/lesandie/clickhouse-tests/blob/main/scripts/test_aiohttp_inserts.py
It simply uses the requests module and aiohttp (you can tune the connection pooler: https://docs.aiohttp.org/en/stable/client_advanced.html#limiting-connection-pool-size)

`asynch` is an asyncio ClickHouse Python Driver with native (TCP) interface support, which reuses most of [clickhouse-driver](https://github.com/mymarilyn/clickhouse-driver) and complies with [PEP249](https://www.python.org/dev/peps/pep-0249/).
- `aiochclient` is another good wrapper for the HTTP interface: https://github.com/maximdanilchenko/aiochclient
- For the native interface you can try https://github.com/long2ice/asynch. `asynch` is an asyncio ClickHouse Python driver with native (TCP) interface support, which reuses most of [clickhouse-driver](https://github.com/mymarilyn/clickhouse-driver) and complies with [PEP249](https://www.python.org/dev/peps/pep-0249/).
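
The idea of an asyncio wrapper that hides the executor can be sketched as follows. This is an illustration only: `FakeSyncClient` and `AsyncWrapper` are hypothetical names, not classes from clickhouse-driver or any of the libraries listed above, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio


class FakeSyncClient:
    """Hypothetical stand-in for a synchronous client; not a real driver class."""

    def query(self, sql: str) -> list:
        return [(sql.upper(),)]


class AsyncWrapper:
    """Expose the methods of a synchronous client as coroutines."""

    def __init__(self, client):
        self._client = client

    def __getattr__(self, name):
        # Only reached for attributes not found on the wrapper itself.
        method = getattr(self._client, name)

        async def call(*args, **kwargs):
            # Runs the blocking method in the event loop's default thread pool.
            return await asyncio.to_thread(method, *args, **kwargs)

        return call


async def main():
    client = AsyncWrapper(FakeSyncClient())
    return await client.query("select 1")


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Whether a single client instance may be shared across threads depends on the driver, so check the driver documentation linked above before reusing one connection for concurrent calls.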
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ To add the `access_management` setting to an Altinity.Cloud ClickHouse Cluster:
1. **Contents**: Enter the following to allow the `clickhouse_operator` that controls the cluster through the `clickhouse-operator` the ability to set administrative options:

```xml
<yandex>
<clickhouse>
<users>
<admin>
<access_management>1</access_management>
Expand All @@ -37,7 +37,7 @@ To add the `access_management` setting to an Altinity.Cloud ClickHouse Cluster:
<access_management>1</access_management>
</clickhouse_operator>
</users>
</yandex>
</clickhouse>
```

`access_management=1` means that the users `admin` and `clickhouse_operator` are able to create users and grant them privileges using SQL.
Expand All @@ -50,7 +50,7 @@ To add the `access_management` setting to an Altinity.Cloud ClickHouse Cluster:
3. **Contents**:

```xml
<yandex>
<clickhouse>
<user_directories replace="replace">
<users_xml>
<path>/etc/clickhouse-server/users.xml</path>
Expand All @@ -62,5 +62,5 @@ To add the `access_management` setting to an Altinity.Cloud ClickHouse Cluster:
<path>/var/lib/clickhouse/access/</path>
</local_directory>
</user_directories>
</yandex>
</clickhouse>
```