Releases: zilliztech/milvus-cdc
Release v2.0.0-rc3
What's Changed
- move replicate channel config to cdc.yaml file from create request by @SimFG in #116
- ci: add ca-certificates in image by @zhuwenxing in #115
- fix the wrong rpc channel name by @SimFG in #119
- use the uri and token to connect the target milvus by @SimFG in #120
- update cdc config by @zhuwenxing in #118
- add mq connection check when starting the server by @SimFG in #121
- support to replicate the rbac message by @SimFG in #109
- feat: support replicate data to kafka by @Ricky-chen1 in #113
- use the TSManager instance to manage the ts info by @SimFG in #123
- keep the origin msg base when writing the data to target milvus by @SimFG in #125
- Fix the issue that the generate ts cannot be incremented normally by @SimFG in #126
- add the generated ts metric by @SimFG in #128
- fix the high cpu and memory usage and ts error by @SimFG in #129
- update the kafka version to fix the _Cfunc_GoString memory leak by @SimFG in #130
- feat: support to replicate rbac message to kafka by @Ricky-chen1 in #122
- fix the large tt lag in the target milvus by @SimFG in #131
- fix the nil point when there is a dropping collection by @SimFG in #133
- fix: resolve connection error by security protocol by @Uijeong97 in #134
- support to replicate when source/target milvus channel aren't equal by @SimFG in #132
- remove unused code by @SimFG in #135
- update the milvus pkg version and skip the env for the milvus param by @SimFG in #136
- reduce the buffer size of message queue by @SimFG in #137
- fix drop collection message lost due to tt delay by @SimFG in #139
- update milvus go-sdk to support the null and default value by @SimFG in #140
- support to replicate the database by the create request by @SimFG in #143
- improve to check the duplicate collection and replicate user and role api by @SimFG in #144
- test: add StartReadCollectionForKafka unit test by @Ricky-chen1 in #138
- improve the db usage way in the create request by @SimFG in #146
- use the source channel name in the task position by @SimFG in #147
- add the target info to the ts manager key by @SimFG in #148
- remove the invalid log by @SimFG in #149
- add data diff tool by @SimFG in #150
- fix get_should_read func function exclude logic error by @SimFG in #151
New Contributors
- @Ricky-chen1 made their first contribution in #113
- @Uijeong97 made their first contribution in #134
Full Changelog: v2.0.0-rc2...v2.0.0-rc3
Changelog
- b8a4949 Fix the issue that the generate ts cannot be incremented normally (#126)
- 22c1172 add milvus ci case for the different channel num
- 8d8f74e add mq connection check when starting the server
- 902667e add rbac function unit test
- e64af8b add the diff data tool
- 643bc7c add the generated ts metric (#128)
- 6fc5647 add the target info to the ts manager key (#148)
- 4b8a125 ci: add ca-certificates in image
- 364e07b feat: support replicate data to kafka (#113)
- 63180d1 feat: support to replicate rbac message to kafka (#122)
- 66886dd fix get_should_read func function exclude logic error
- bc2d58e fix the high cpu and memory usage and ts error (#129)
- 2f29ad5 fix the large tt lag in the target milvus (#131)
- aca351a fix the nil point when there is a dropping collection (#133)
- 487f456 fix the wrong rpc channel name
- 149f4d8 fix: resolve connection error by security protocol (#134)
- 5992c40 improve the db usage way in the create request (#146)
- 2ec39d0 improve to check the duplicate collection and replicate user and role api
- cabf2d3 keep the origin msg base when writing the data to target milvus (#125)
- 0100b63 move replicate channel config to cdc.yaml file from create request
- 10ddbbe reduce the buffer size of message queue
- 6d51b2e remove the invalid log (#149)
- 2045637 remove unused code
- b07fa77 save the collection start position when creating the collection
- 684fc67 support to replicate the database by the create request
- d204aad support to replicate the rbac message
- 0a31d7a support to replicate when source/target milvus channel aren't equal
- 82f9dd0 update cdc config (#118)
- 2313b18 update milvus go-sdk to support the null and default value
- fcf9186 update milvus pkg/proto/go-sdk version
- 51da848 update the kafka version to fix the _Cfunc_GoString memory leak (#130)
- 9c8f9c7 update the milvus pkg version and skip the env for the milvus param
- 76e55fe update the milvus pkg version to fix the drop collection msg lost
- 89d91a5 use the TSManager instance to manage the ts info (#123)
- 3ea1ebf use the source channel name in the task position (#147)
- 2217d68 use the uri and token to connect the target milvus
- c0ef2ca use the usage count for the channel forward map
Release v2.0.0-rc2
Note: If you are using the latest version of the CDC, we recommend Milvus 2.4.7 or above, as these Milvus versions have been adapted for the CDC functionality (the newer the Milvus version, the better it works with the CDC).
Feature
- Supports synchronizing a single collection. Currently, the collection_infos parameter in the create request accepts only one collection. Please refer to the documentation for the complete create parameters: https://github.com/zilliztech/milvus-cdc/blob/main/doc/cdc-usage.md#create-request
POST http://localhost:8444/cdc
Content-Type: application/json

{
  "request_type":"create",
  "request_data":{
    ...
    "collection_infos":[
      {
        "name":"hello_milvus"
      }
    ],
    ...
  }
}
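The one-collection limit can be checked on the client side before the request is sent. The following is a minimal Go sketch, not part of the CDC code base: the type names and the `buildCreateRequest` helper are hypothetical, and only the fields visible in the sample above are modeled (the fields elided with `...` are deliberately left out, so this body alone is not a complete create request).

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// CollectionInfo mirrors one entry of "collection_infos" in the sample request.
type CollectionInfo struct {
	Name string `json:"name"`
}

// RequestData models only the "collection_infos" field; the fields elided
// in the sample request are intentionally not represented here.
type RequestData struct {
	CollectionInfos []CollectionInfo `json:"collection_infos"`
}

// CreateRequest is the outer envelope shown in the sample request.
type CreateRequest struct {
	RequestType string      `json:"request_type"`
	RequestData RequestData `json:"request_data"`
}

// buildCreateRequest is a hypothetical helper. It rejects anything other
// than exactly one collection name, matching the current limitation.
func buildCreateRequest(names ...string) ([]byte, error) {
	if len(names) != 1 {
		return nil, errors.New("collection_infos currently supports exactly one collection")
	}
	if names[0] == "" {
		return nil, errors.New("collection name must not be empty")
	}
	req := CreateRequest{
		RequestType: "create",
		RequestData: RequestData{
			CollectionInfos: []CollectionInfo{{Name: names[0]}},
		},
	}
	return json.Marshal(req)
}

func main() {
	body, err := buildCreateRequest("hello_milvus")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(body))
}
```

Failing early on the client keeps the error message close to the mistake instead of relying on the server's response.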
- Supports collection checkpoints, which can be obtained from the flush method. The returned value needs to be base64 encoded before use (see the sample code below).
Note: This functionality can be used in combination with the latest backup tool for data migration or creating primary-backup instances. The basic steps are:
- Back up the current Milvus instance and obtain the collection checkpoint.
- Restore the data to another Milvus instance.
- Start the CDC, creating a synchronization task for the collection and passing the previously obtained checkpoint.
If it's a data migration, you can then stop Milvus, wait for the CDC to migrate the incremental data, then reclaim the other Milvus services, and finally stop the CDC.
If you use this functionality, a working understanding of Milvus is recommended, as it involves internal Milvus concepts. We will write a case study document based on this information and continue to optimize the usage method.
POST http://localhost:8444/cdc
Content-Type: application/json

{
  "request_type":"create",
  "request_data":{
    ...
    "collection_infos":[
      {
        "name":"hello_milvus",
        "positions": {
          "by-dev-rootcoord-dml_0_450541344162316901v0": "AAAAAAAAAAAAAAAAAAAAAA==",
          "by-dev-rootcoord-dml_1_450541344162316901v1": "AAAAAAAAAAAAAAAAAAAAAA=="
        }
      }
    ],
    ...
  }
}
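// Base64MsgPosition serializes a message position with protobuf and returns
// it base64-encoded; it returns an empty string if serialization fails.
func Base64MsgPosition(position *msgpb.MsgPosition) string {
	positionByte, err := proto.Marshal(position)
	if err != nil {
		return ""
	}
	return base64.StdEncoding.EncodeToString(positionByte)
}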
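To show how encoded checkpoints end up in the positions field, here is a small standalone Go sketch using only the standard library. It assumes the checkpoint bytes have already been serialized (for example by the function above); `encodePositions`, `collectionInfo`, and `collectionInfoJSON` are hypothetical names for illustration. The 16 zero bytes in `main` are placeholder data: they happen to encode to the same string that appears in the sample request and do not represent a real checkpoint.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// encodePositions base64-encodes already-serialized checkpoint bytes,
// keyed by channel name.
func encodePositions(raw map[string][]byte) map[string]string {
	out := make(map[string]string, len(raw))
	for channel, b := range raw {
		out[channel] = base64.StdEncoding.EncodeToString(b)
	}
	return out
}

// collectionInfo mirrors one "collection_infos" entry from the sample request.
type collectionInfo struct {
	Name      string            `json:"name"`
	Positions map[string]string `json:"positions,omitempty"`
}

// collectionInfoJSON renders a single entry with its encoded positions.
func collectionInfoJSON(name string, raw map[string][]byte) ([]byte, error) {
	return json.Marshal(collectionInfo{Name: name, Positions: encodePositions(raw)})
}

func main() {
	// Placeholder bytes standing in for a serialized checkpoint.
	raw := map[string][]byte{
		"by-dev-rootcoord-dml_0_450541344162316901v0": make([]byte, 16),
	}
	body, err := collectionInfoJSON("hello_milvus", raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(body))
}
```

Keeping the positions as a map keyed by channel name follows the shape of the sample request, where each channel carries its own checkpoint.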
- The source/target Milvus supports custom certificate TLS connections.
  - source milvus reference: https://github.com/zilliztech/milvus-cdc/blob/main/doc/cdc-usage.md#configuration
  - target milvus reference: https://github.com/zilliztech/milvus-cdc/blob/main/doc/cdc-usage.md#create-request-with-tls-one-way-authentication
- The etcd supports custom certificate TLS connections.
sourceConfig:
  etcd:
    address:
      - http://127.0.0.1:2379
    rootPath: by-dev
    metaSubPath: meta
    enableAuth: false
    username: root
    password: root123456
    enableTLS: false
    tlsCertPath: deployment/cert/client.pem # path to your cert file
    tlsKeyPath: deployment/cert/client.key # path to your key file
    tlsCACertPath: deployment/cert/ca.pem # path to your CACert file
    tlsMinVersion: 1.3
  ...
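For readers unfamiliar with how the three certificate paths and the minimum version fit together, the sketch below shows one common way a Go client turns such settings into a `tls.Config` using the standard library. It is an illustration only and is not taken from the CDC source: `parseMinVersion` and `buildTLSConfig` are hypothetical helpers, and how the CDC actually consumes these settings is an assumption.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// parseMinVersion maps a version string such as "1.3" to the matching
// crypto/tls constant.
func parseMinVersion(v string) (uint16, error) {
	switch v {
	case "1.0":
		return tls.VersionTLS10, nil
	case "1.1":
		return tls.VersionTLS11, nil
	case "1.2":
		return tls.VersionTLS12, nil
	case "1.3":
		return tls.VersionTLS13, nil
	default:
		return 0, fmt.Errorf("unsupported TLS version %q", v)
	}
}

// buildTLSConfig loads the client certificate/key pair and the CA bundle
// from disk and combines them with the minimum protocol version.
func buildTLSConfig(certPath, keyPath, caPath, minVersion string) (*tls.Config, error) {
	minVer, err := parseMinVersion(minVersion)
	if err != nil {
		return nil, err
	}
	cert, err := tls.LoadX509KeyPair(certPath, keyPath)
	if err != nil {
		return nil, fmt.Errorf("load client certificate: %w", err)
	}
	caPEM, err := os.ReadFile(caPath)
	if err != nil {
		return nil, fmt.Errorf("read CA certificate: %w", err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificates found in " + caPath)
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		RootCAs:      pool,
		MinVersion:   minVer,
	}, nil
}

func main() {
	// Paths and version copied from the sample configuration.
	_, err := buildTLSConfig(
		"deployment/cert/client.pem",
		"deployment/cert/client.key",
		"deployment/cert/ca.pem",
		"1.3",
	)
	if err != nil {
		fmt.Println("TLS config not built:", err)
		return
	}
	fmt.Println("TLS config ready")
}
```

Note that the sample configuration has enableTLS set to false and an http:// address, so the certificate settings would only matter once TLS is switched on.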
Improvement
- Improved stability - This is undoubtedly the biggest highlight of this update. Regardless of whether the source Milvus, CDC, or target Milvus is interrupted, the synchronization task can be resumed and restored to the state before the interruption after the system is repaired.
- Optimized internal logic - After extensive internal testing, with consistent data verification between the source and target Milvus instances, data consistency is now well-guaranteed even under high concurrency scenarios.
- Reduced log output to minimize its impact on synchronization efficiency, and added support for dynamic log level adjustment.
POST http://localhost:8444/cdc
Content-Type: application/json

{
  "request_type":"maintenance",
  "request_data": {
    "operation": "set_log_level",
    "params": {
      "log_level": "debug"
    }
  }
}
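The same maintenance request can be sent from Go. The sketch below is a minimal example using only the standard library; the struct and function names are hypothetical, the endpoint is the one from the sample above, and the response format is not covered by these notes, so only the HTTP status code is reported.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// maintenanceData mirrors the "request_data" object of the sample request.
type maintenanceData struct {
	Operation string            `json:"operation"`
	Params    map[string]string `json:"params"`
}

// maintenanceRequest mirrors the outer envelope of the sample request.
type maintenanceRequest struct {
	RequestType string          `json:"request_type"`
	RequestData maintenanceData `json:"request_data"`
}

// setLogLevelBody builds the JSON body for a set_log_level request.
func setLogLevelBody(level string) ([]byte, error) {
	return json.Marshal(maintenanceRequest{
		RequestType: "maintenance",
		RequestData: maintenanceData{
			Operation: "set_log_level",
			Params:    map[string]string{"log_level": level},
		},
	})
}

// postCDC sends the body to the CDC endpoint and returns the HTTP status code.
func postCDC(url string, body []byte) (int, error) {
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	body, err := setLogLevelBody("debug")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(body))

	status, err := postCDC("http://localhost:8444/cdc", body)
	if err != nil {
		fmt.Println("request failed (is the CDC server running?):", err)
		return
	}
	fmt.Println("status:", status)
}
```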
Other
Comprehensive performance data is currently being tested internally, and we will continue to optimize based on the performance test results.
NOTE: Before using the tool, please carefully read the documentation at cdc usage, as there are still many configuration details that are easy to get wrong. If you encounter any issues, feel free to raise an issue, and I will do my best to help resolve it.
Release v2.0.0-rc1
Note: If you are using the latest version of the CDC, we recommend Milvus 2.4.7 or above, as these Milvus versions have been adapted for the CDC functionality (the newer the Milvus version, the better it works with the CDC).
Feature
- Supports synchronizing a single collection. Currently, the collection_infos parameter in the create request accepts only one collection. Please refer to the documentation for the complete create parameters: https://github.com/zilliztech/milvus-cdc/blob/main/doc/cdc-usage.md#create-request
POST http://localhost:8444/cdc
Content-Type: application/json

{
  "request_type":"create",
  "request_data":{
    ...
    "collection_infos":[
      {
        "name":"hello_milvus"
      }
    ],
    ...
  }
}
- Supports collection checkpoints, which can be obtained from the flush method. The returned value needs to be base64 encoded before use (see the sample code below).
Note: This functionality can be used in combination with the latest backup tool for data migration or creating primary-backup instances. The basic steps are:
- Back up the current Milvus instance and obtain the collection checkpoint.
- Restore the data to another Milvus instance.
- Start the CDC, creating a synchronization task for the collection and passing the previously obtained checkpoint.
If it's a data migration, you can then stop Milvus, wait for the CDC to migrate the incremental data, then reclaim the other Milvus services, and finally stop the CDC.
If you use this functionality, a working understanding of Milvus is recommended, as it involves internal Milvus concepts. We will write a case study document based on this information and continue to optimize the usage method.
POST http://localhost:8444/cdc
Content-Type: application/json

{
  "request_type":"create",
  "request_data":{
    ...
    "collection_infos":[
      {
        "name":"hello_milvus",
        "positions": {
          "by-dev-rootcoord-dml_0_450541344162316901v0": "AAAAAAAAAAAAAAAAAAAAAA==",
          "by-dev-rootcoord-dml_1_450541344162316901v1": "AAAAAAAAAAAAAAAAAAAAAA=="
        }
      }
    ],
    ...
  }
}
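// Base64MsgPosition serializes a message position with protobuf and returns
// it base64-encoded; it returns an empty string if serialization fails.
func Base64MsgPosition(position *msgpb.MsgPosition) string {
	positionByte, err := proto.Marshal(position)
	if err != nil {
		return ""
	}
	return base64.StdEncoding.EncodeToString(positionByte)
}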
- The source/target Milvus supports custom certificate TLS connections.
  - source milvus reference: https://github.com/zilliztech/milvus-cdc/blob/main/doc/cdc-usage.md#configuration
  - target milvus reference: https://github.com/zilliztech/milvus-cdc/blob/main/doc/cdc-usage.md#create-request-with-tls-one-way-authentication
- The etcd supports custom certificate TLS connections.
sourceConfig:
  etcd:
    address:
      - http://127.0.0.1:2379
    rootPath: by-dev
    metaSubPath: meta
    enableAuth: false
    username: root
    password: root123456
    enableTLS: false
    tlsCertPath: deployment/cert/client.pem # path to your cert file
    tlsKeyPath: deployment/cert/client.key # path to your key file
    tlsCACertPath: deployment/cert/ca.pem # path to your CACert file
    tlsMinVersion: 1.3
  ...
Improvement
- Improved stability - This is undoubtedly the biggest highlight of this update. Regardless of whether the source Milvus, CDC, or target Milvus is interrupted, the synchronization task can be resumed and restored to the state before the interruption after the system is repaired.
- Optimized internal logic - After extensive internal testing, with consistent data verification between the source and target Milvus instances, data consistency is now well-guaranteed even under high concurrency scenarios.
- Reduced log output to minimize its impact on synchronization efficiency, and added support for dynamic log level adjustment.
POST http://localhost:8444/cdc
Content-Type: application/json

{
  "request_type":"maintenance",
  "request_data": {
    "operation": "set_log_level",
    "params": {
      "log_level": "debug"
    }
  }
}
Other
Comprehensive performance data is currently being tested internally, and we will continue to optimize based on the performance test results.
NOTE: Before using the tool, please carefully read the documentation at cdc usage, as there are still many configuration details that are easy to get wrong. If you encounter any issues, feel free to raise an issue, and I will do my best to help resolve it.
Release v1.0.0
Milvus CDC (Change Data Capture) v1.0.0
Mar. 20, 2024
What's New
CDC stands for "Change Data Capture". The Milvus CDC tool is designed to capture changes made to upstream Milvus collections and sync (replicate) those changes to downstream Milvus instances.
This is the first release, and you can find detailed references in our GitHub repo.
FAQs
- Can the CDC Tool Be Used with Any Milvus Version?
  No, the Milvus CDC tool requires Milvus version 2.4.0 or above.
- Can the CDC Tool Fully Synchronize All Data?
  Not currently. While the tool can synchronize collection-related data and interfaces, it does not yet support synchronizing aliases, users, roles, etc. For more details on what is supported, please refer to cdc usage.
- How to Use the CDC Tool?
  Detailed usage documentation can be found at cdc usage.
We are thrilled to provide this powerful change data capture capability and look forward to continuously enhancing the Milvus CDC tool based on community feedback.
v0.0.1-test
Changelog
- 365f83d Add the task num limit
- 221ecc7 Fix drop action failure and null pointer issues
- bd19d94 Fix drop partition failure and wrong max task num (#12)
- 457e216 Fix to fail to get the collection name sometimes
- f5196dc Merge pull request #1 from zhuwenxing/main
- d718099 Merge pull request #2 from zhuwenxing/add_issue_template
- 3872f98 Merge pull request #6 from zhuwenxing/add_cdc_test_main
- 311ec3a Merge pull request #9 from SimFG/main
- 1fda2e0 Milvus cdc server and core lib, initial commit
- e87dcd8 ci: add issue template
- 479f4e3 ci: add release pipeline (#13)