[Bug]: After the message queue component's pod is killed and recovered, all Milvus flush operations time out. #39197

Status: Open · opened by zhuwenxing (Contributor) on Jan 13, 2025 · 2 comments
Labels: feature/streaming node · kind/bug · triage/accepted
Milestone: 2.6.0
Assignee: chyezh

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:master-20250110-e5eb1159-amd64
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):kafka/pulsar    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior


[2025-01-11T07:17:31.906Z] platform linux -- Python 3.10.12, pytest-8.3.4, pluggy-1.5.0 -- /usr/bin/python3

[2025-01-11T07:17:31.906Z] cachedir: .pytest_cache

[2025-01-11T07:17:31.906Z] metadata: {'Python': '3.10.12', 'Platform': 'Linux-4.18.0-425.19.2.el8_7.x86_64-x86_64-with-glibc2.35', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.24.0', 'Faker': '19.2.0', 'allure-pytest': '2.7.0', 'assume': '2.4.3', 'cov': '2.8.1', 'forked': '1.6.0', 'html': '3.1.1', 'level': '0.1.1', 'metadata': '3.1.1', 'parallel': '0.1.1', 'print': '0.2.1', 'random-order': '1.1.1', 'repeat': '0.8.0', 'rerunfailures': '14.0', 'sugar': '0.9.5', 'tags': '1.8.1', 'timeout': '1.3.3', 'xdist': '2.5.0'}, 'CI': 'true', 'BUILD_NUMBER': '19115', 'BUILD_ID': '19115', 'BUILD_URL': 'https://qa-jenkins.milvus.io/job/chaos-test-kafka-cron/19115/', 'NODE_NAME': 'chaos-test-kafka-cron-19115-wszms-x9m3v-ng6b7', 'JOB_NAME': 'chaos-test-kafka-cron', 'BUILD_TAG': 'jenkins-chaos-test-kafka-cron-19115', 'EXECUTOR_NUMBER': '0', 'JENKINS_URL': 'https://qa-jenkins.milvus.io/', 'WORKSPACE': '/home/jenkins/agent/workspace', 'GIT_COMMIT': 'f2ce474eca96d53d42cff453ab752ce66719ccfe', 'GIT_URL': '[email protected]:zhuwenxing/test-jobs.git', 'GIT_BRANCH': 'origin/main'}

[2025-01-11T07:17:31.906Z] Test order randomisation NOT enabled. Enable with --random-order or --random-order-bucket=<bucket_type>

[2025-01-11T07:17:31.906Z] tags: all

[2025-01-11T07:17:31.906Z] rootdir: /home/jenkins/agent/workspace/tests/python_client

[2025-01-11T07:17:31.906Z] configfile: pytest.ini

[2025-01-11T07:17:31.906Z] plugins: asyncio-0.24.0, Faker-19.2.0, allure-pytest-2.7.0, assume-2.4.3, cov-2.8.1, forked-1.6.0, html-3.1.1, level-0.1.1, metadata-3.1.1, parallel-0.1.1, print-0.2.1, random-order-1.1.1, repeat-0.8.0, rerunfailures-14.0, sugar-0.9.5, tags-1.8.1, timeout-1.3.3, xdist-2.5.0

[2025-01-11T07:17:31.906Z] asyncio: mode=strict, default_loop_scope=function

[2025-01-11T07:17:31.906Z] collecting ... collected 2 items

[2025-01-11T07:17:31.906Z] selected 2 items

[2025-01-11T07:17:31.906Z] 

[2025-01-11T07:17:31.906Z] testcases/test_data_persistence.py::TestDataPersistence::test_milvus_default[default] 

[2025-01-11T07:17:31.906Z] -------------------------------- live log setup --------------------------------

[2025-01-11T07:17:31.906Z] [2025-01-11 07:17:31 - INFO - ci_test]: ################################################################################ (conftest.py:232)

[2025-01-11T07:17:31.906Z] [2025-01-11 07:17:31 - INFO - ci_test]: [initialize_milvus] Log cleaned up, start testing... (conftest.py:233)

[2025-01-11T07:17:31.906Z] [2025-01-11 07:17:31 - INFO - ci_test]: [setup_class] Start setup class... (client_base.py:41)

[2025-01-11T07:17:31.906Z] [2025-01-11 07:17:31 - INFO - ci_test]: *********************************** setup *********************************** (client_base.py:47)

[2025-01-11T07:17:31.906Z] [2025-01-11 07:17:31 - INFO - ci_test]: pymilvus version: 2.6.0rc44 (client_base.py:48)

[2025-01-11T07:17:31.906Z] [2025-01-11 07:17:31 - INFO - ci_test]: [setup_method] Start setup test case test_milvus_default. (client_base.py:49)

[2025-01-11T07:17:31.906Z] -------------------------------- live log call ---------------------------------

[2025-01-11T07:17:31.906Z] [2025-01-11 07:17:31 - INFO - ci_test]: server version: e5eb115 (client_base.py:166)

[2025-01-11T07:17:31.906Z] [2025-01-11 07:17:31 - INFO - ci_test]: all database: ['prod', 'default'] (test_data_persistence.py:26)

[2025-01-11T07:20:39.359Z] 2025-01-11 07:20:31,627 [WARNING][handler]: Retry timeout: 180s (decorators.py:106)

[2025-01-11T07:20:39.360Z] 2025-01-11 07:20:31,628 [ERROR][handler]: RPC error: [flush], <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for flush timeout, collection: Hello_Milvus, flusht_ts: 455233988598431755)>, <Time:{'RPC start': '2025-01-11 07:17:31.573124', 'RPC error': '2025-01-11 07:20:31.627967'}> (decorators.py:140)

[2025-01-11T07:20:39.360Z] [2025-01-11 07:20:31 - ERROR - ci_test]: Traceback (most recent call last):

[2025-01-11T07:20:39.360Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 22, in inner_wrapper

[2025-01-11T07:20:39.360Z]     res = func(*args, **_kwargs)

[2025-01-11T07:20:39.360Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 53, in api_request

[2025-01-11T07:20:39.360Z]     return func(*arg, **kwargs)

[2025-01-11T07:20:39.360Z]   File "/usr/local/lib/python3.10/dist-packages/pymilvus/orm/collection.py", line 318, in flush

[2025-01-11T07:20:39.360Z]     conn.flush([self.name], timeout=timeout, **kwargs)

[2025-01-11T07:20:39.360Z]   File "/usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py", line 141, in handler

[2025-01-11T07:20:39.360Z]     raise e from e

[2025-01-11T07:20:39.360Z]   File "/usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py", line 137, in handler

[2025-01-11T07:20:39.360Z]     return func(*args, **kwargs)

[2025-01-11T07:20:39.360Z]   File "/usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py", line 176, in handler

[2025-01-11T07:20:39.360Z]     return func(self, *args, **kwargs)

[2025-01-11T07:20:39.360Z]   File "/usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py", line 107, in handler

[2025-01-11T07:20:39.360Z]     raise MilvusException(

[2025-01-11T07:20:39.360Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for flush timeout, collection: Hello_Milvus, flusht_ts: 455233988598431755)>

[2025-01-11T07:20:39.360Z]  (api_request.py:35)

[2025-01-11T07:20:39.360Z] [2025-01-11 07:20:31 - ERROR - ci_test]: (api_response) : <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for flush timeout, collection: Hello_Milvus, flusht_ts: 455233988598431755)> (api_request.py:36)

[2025-01-11T07:20:39.360Z] FAILED

[2025-01-11T07:20:39.360Z] ------------------------------ live log teardown -------------------------------

[2025-01-11T07:20:39.360Z] [2025-01-11 07:20:31 - INFO - ci_test]: *********************************** teardown *********************************** (test_data_persistence.py:15)

[2025-01-11T07:20:39.360Z] [2025-01-11 07:20:31 - INFO - ci_test]: [teardown_method] Start teardown test case test_milvus_default... (test_data_persistence.py:16)

[2025-01-11T07:20:39.360Z] [2025-01-11 07:20:31 - INFO - ci_test]: skip drop collection (test_data_persistence.py:18)

[2025-01-11T07:20:39.360Z] 

[2025-01-11T07:20:39.360Z] testcases/test_data_persistence.py::TestDataPersistence::test_milvus_default[prod] 

[2025-01-11T07:20:39.360Z] -------------------------------- live log setup --------------------------------

[2025-01-11T07:20:39.360Z] [2025-01-11 07:20:31 - INFO - ci_test]: *********************************** setup *********************************** (client_base.py:47)

[2025-01-11T07:20:39.360Z] [2025-01-11 07:20:31 - INFO - ci_test]: pymilvus version: 2.6.0rc44 (client_base.py:48)

[2025-01-11T07:20:39.360Z] [2025-01-11 07:20:31 - INFO - ci_test]: [setup_method] Start setup test case test_milvus_default. (client_base.py:49)

[2025-01-11T07:20:39.360Z] -------------------------------- live log call ---------------------------------

[2025-01-11T07:20:39.360Z] [2025-01-11 07:20:31 - INFO - ci_test]: server version: e5eb115 (client_base.py:166)

[2025-01-11T07:20:39.360Z] [2025-01-11 07:20:31 - INFO - ci_test]: all database: ['default', 'prod'] (test_data_persistence.py:26)

[2025-01-11T07:23:45.771Z] 2025-01-11 07:23:32,179 [WARNING][handler]: Retry timeout: 180s (decorators.py:106)

[2025-01-11T07:23:45.771Z] 2025-01-11 07:23:32,179 [ERROR][handler]: RPC error: [flush], <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for flush timeout, collection: Hello_Milvus, flusht_ts: 455234035837042700)>, <Time:{'RPC start': '2025-01-11 07:20:31.790322', 'RPC error': '2025-01-11 07:23:32.179439'}> (decorators.py:140)

[2025-01-11T07:23:45.771Z] [2025-01-11 07:23:32 - ERROR - ci_test]: Traceback (most recent call last):

[2025-01-11T07:23:45.771Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 22, in inner_wrapper

[2025-01-11T07:23:45.771Z]     res = func(*args, **_kwargs)

[2025-01-11T07:23:45.771Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 53, in api_request

[2025-01-11T07:23:45.771Z]     return func(*arg, **kwargs)

[2025-01-11T07:23:45.771Z]   File "/usr/local/lib/python3.10/dist-packages/pymilvus/orm/collection.py", line 318, in flush

[2025-01-11T07:23:45.771Z]     conn.flush([self.name], timeout=timeout, **kwargs)

[2025-01-11T07:23:45.771Z]   File "/usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py", line 141, in handler

[2025-01-11T07:23:45.771Z]     raise e from e

[2025-01-11T07:23:45.771Z]   File "/usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py", line 137, in handler

[2025-01-11T07:23:45.771Z]     return func(*args, **kwargs)

[2025-01-11T07:23:45.771Z]   File "/usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py", line 176, in handler

[2025-01-11T07:23:45.771Z]     return func(self, *args, **kwargs)

[2025-01-11T07:23:45.771Z]   File "/usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py", line 107, in handler

[2025-01-11T07:23:45.771Z]     raise MilvusException(

[2025-01-11T07:23:45.771Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for flush timeout, collection: Hello_Milvus, flusht_ts: 455234035837042700)>

[2025-01-11T07:23:45.771Z]  (api_request.py:35)

[2025-01-11T07:23:45.771Z] [2025-01-11 07:23:32 - ERROR - ci_test]: (api_response) : <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for flush timeout, collection: Hello_Milvus, flusht_ts: 455234035837042700)> (api_request.py:36)

[2025-01-11T07:23:45.771Z] FAILED

[2025-01-11T07:23:45.771Z] ------------------------------ live log teardown -------------------------------

[2025-01-11T07:23:45.771Z] [2025-01-11 07:23:32 - INFO - ci_test]: *********************************** teardown *********************************** (test_data_persistence.py:15)

[2025-01-11T07:23:45.771Z] [2025-01-11 07:23:32 - INFO - ci_test]: [teardown_method] Start teardown test case test_milvus_default... (test_data_persistence.py:16)

[2025-01-11T07:23:45.771Z] [2025-01-11 07:23:32 - INFO - ci_test]: skip drop collection (test_data_persistence.py:18)

[2025-01-11T07:23:45.771Z] [2025-01-11 07:23:32 - INFO - ci_test]: [teardown_class] Start teardown class... (client_base.py:44)

[2025-01-11T07:23:45.771Z] 

[2025-01-11T07:23:45.771Z] 

[2025-01-11T07:23:45.771Z] =================================== FAILURES ===================================

[2025-01-11T07:23:45.771Z] _______________ TestDataPersistence.test_milvus_default[default] _______________

[2025-01-11T07:23:45.771Z] 

[2025-01-11T07:23:45.771Z] self = <test_data_persistence.TestDataPersistence object at 0x7fcc028688b0>

[2025-01-11T07:23:45.771Z] db_name = 'default'

[2025-01-11T07:23:45.771Z] 

[2025-01-11T07:23:45.771Z]     @pytest.mark.tags(CaseLabel.L3)

[2025-01-11T07:23:45.771Z]     @pytest.mark.parametrize("db_name", ["default", "prod"])

[2025-01-11T07:23:45.771Z]     def test_milvus_default(self, db_name):

[2025-01-11T07:23:45.771Z]         self._connect()

[2025-01-11T07:23:45.771Z]         # create database if not exist

[2025-01-11T07:23:45.771Z]         dbs, _ = self.database_wrap.list_database()

[2025-01-11T07:23:45.771Z]         log.info(f"all database: {dbs}")

[2025-01-11T07:23:45.771Z]         if db_name not in dbs:

[2025-01-11T07:23:45.772Z]             log.info(f"create database {db_name}")

[2025-01-11T07:23:45.772Z]             self.database_wrap.create_database(db_name)

[2025-01-11T07:23:45.772Z]         self.database_wrap.using_database(db_name)

[2025-01-11T07:23:45.772Z]         # create collection

[2025-01-11T07:23:45.772Z]         name = "Hello_Milvus"

[2025-01-11T07:23:45.772Z]         t0 = time.time()

[2025-01-11T07:23:45.772Z]         collection_w = self.init_collection_wrap(name=name, active_trace=True)

[2025-01-11T07:23:45.772Z]         tt = time.time() - t0

[2025-01-11T07:23:45.772Z]         assert collection_w.name == name

[2025-01-11T07:23:45.772Z] >       entities = collection_w.num_entities

[2025-01-11T07:23:45.772Z] 

[2025-01-11T07:23:45.772Z] testcases/test_data_persistence.py:37: 

[2025-01-11T07:23:45.772Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

[2025-01-11T07:23:45.772Z] ../base/collection_wrapper.py:67: in num_entities

[2025-01-11T07:23:45.772Z]     self.flush()

[2025-01-11T07:23:45.772Z] ../utils/wrapper.py:18: in inner_wrapper

[2025-01-11T07:23:45.772Z]     res, result = func(*args, **kwargs)

[2025-01-11T07:23:45.772Z] ../base/collection_wrapper.py:164: in flush

[2025-01-11T07:23:45.772Z]     check_items, check, **kwargs).run()

[2025-01-11T07:23:45.772Z] ../check/func_check.py:45: in run

[2025-01-11T07:23:45.772Z]     result = self.assert_succ(self.succ, True)

[2025-01-11T07:23:45.772Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

[2025-01-11T07:23:45.772Z] 

[2025-01-11T07:23:45.772Z] self = <check.func_check.ResponseChecker object at 0x7fcc026df160>

[2025-01-11T07:23:45.772Z] actual = False, expect = True

[2025-01-11T07:23:45.772Z] 

[2025-01-11T07:23:45.772Z]     def assert_succ(self, actual, expect):

[2025-01-11T07:23:45.772Z] >       assert actual is expect, f"Response of API {self.func_name} expect {expect}, but got {actual}"

[2025-01-11T07:23:45.772Z] E       AssertionError: Response of API flush expect True, but got False

[2025-01-11T07:23:45.772Z] 

[2025-01-11T07:23:45.772Z] ../check/func_check.py:130: AssertionError

[2025-01-11T07:23:45.772Z] ------------------------------ Captured log setup ------------------------------

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - INFO - ci_test]: ################################################################################ (conftest.py:232)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - INFO - ci_test]: [initialize_milvus] Log cleaned up, start testing... (conftest.py:233)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - INFO - ci_test]: [setup_class] Start setup class... (client_base.py:41)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - INFO - ci_test]: *********************************** setup *********************************** (client_base.py:47)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - INFO - ci_test]: pymilvus version: 2.6.0rc44 (client_base.py:48)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - INFO - ci_test]: [setup_method] Start setup test case test_milvus_default. (client_base.py:49)

[2025-01-11T07:23:45.772Z] ------------------------------ Captured log call -------------------------------

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_request)  : [Connections.connect] args: ['default', '', '', 'default', ''], kwargs: {'host': '10.255.20.172', 'port': 19530} (api_request.py:52)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_response) : None  (api_request.py:27)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - INFO - ci_test]: server version: e5eb115 (client_base.py:166)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_request)  : [list_database] args: ['default', None], kwargs: {} (api_request.py:52)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_response) : ['prod', 'default']  (api_request.py:27)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - INFO - ci_test]: all database: ['prod', 'default'] (test_data_persistence.py:26)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_request)  : [using_database] args: ['default', 'default'], kwargs: {} (api_request.py:52)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_response) : None  (api_request.py:27)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_request)  : [FieldSchema] args: ['int64', <DataType.INT64: 5>, ''], kwargs: {'is_primary': False, 'is_partition_key': False, 'nullable': False} (api_request.py:52)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_response) : {'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>}  (api_request.py:27)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_request)  : [FieldSchema] args: ['varchar', <DataType.VARCHAR: 21>, ''], kwargs: {'max_length': 65535, 'is_primary': False, 'is_partition_key': False, 'nullable': False} (api_request.py:52)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_response) : {'name': 'varchar', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 65535}}  (api_request.py:27)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_request)  : [FieldSchema] args: ['float_vector', <DataType.FLOAT_VECTOR: 101>, ''], kwargs: {'dim': 128, 'is_primary': False, 'nullable': False} (api_request.py:52)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_response) : {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}  (api_request.py:27)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_request)  : [FieldSchema] args: ['float', <DataType.FLOAT: 10>, ''], kwargs: {'is_primary': False, 'nullable': False} (api_request.py:52)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_response) : {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}  (api_request.py:27)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_request)  : [FieldSchema] args: ['json_field', <DataType.JSON: 23>, ''], kwargs: {'is_primary': False, 'nullable': False} (api_request.py:52)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_response) : {'name': 'json_field', 'description': '', 'type': <DataType.JSON: 23>}  (api_request.py:27)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_request)  : [CollectionSchema] args: [[{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}, {'name': 'varchar', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 65535}}, {'name': 'json_field', 'description': '', 'type': <DataTyp......, kwargs: {'primary_field': 'int64', 'auto_id': False, 'enable_dynamic_field': False} (api_request.py:52)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_response) : {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}, {'name': 'varchar', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params......  (api_request.py:27)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_request)  : [Connections.has_connection] args: ['default'], kwargs: {} (api_request.py:52)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_response) : True  (api_request.py:27)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_request)  : [Collection] args: ['Hello_Milvus', {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}, {'name': 'varchar', 'description': '', 'type': <DataType.VARC......, kwargs: {'consistency_level': 'Strong'} (api_request.py:52)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_response) : <Collection>:

[2025-01-11T07:23:45.772Z] -------------

[2025-01-11T07:23:45.772Z] <name>: Hello_Milvus

[2025-01-11T07:23:45.772Z] <description>: 

[2025-01-11T07:23:45.772Z] <schema>: {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}, {'n......  (api_request.py:27)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:17:31 - DEBUG - ci_test]: (api_request)  : [Collection.flush] args: [], kwargs: {'timeout': 180} (api_request.py:52)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:20:31 - ERROR - ci_test]: Traceback (most recent call last):

[2025-01-11T07:23:45.772Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 22, in inner_wrapper

[2025-01-11T07:23:45.772Z]     res = func(*args, **_kwargs)

[2025-01-11T07:23:45.772Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 53, in api_request

[2025-01-11T07:23:45.772Z]     return func(*arg, **kwargs)

[2025-01-11T07:23:45.772Z]   File "/usr/local/lib/python3.10/dist-packages/pymilvus/orm/collection.py", line 318, in flush

[2025-01-11T07:23:45.772Z]     conn.flush([self.name], timeout=timeout, **kwargs)

[2025-01-11T07:23:45.772Z]   File "/usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py", line 141, in handler

[2025-01-11T07:23:45.772Z]     raise e from e

[2025-01-11T07:23:45.772Z]   File "/usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py", line 137, in handler

[2025-01-11T07:23:45.772Z]     return func(*args, **kwargs)

[2025-01-11T07:23:45.772Z]   File "/usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py", line 176, in handler

[2025-01-11T07:23:45.772Z]     return func(self, *args, **kwargs)

[2025-01-11T07:23:45.772Z]   File "/usr/local/lib/python3.10/dist-packages/pymilvus/decorators.py", line 107, in handler

[2025-01-11T07:23:45.772Z]     raise MilvusException(

[2025-01-11T07:23:45.772Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for flush timeout, collection: Hello_Milvus, flusht_ts: 455233988598431755)>

[2025-01-11T07:23:45.772Z]  (api_request.py:35)

[2025-01-11T07:23:45.772Z] [2025-01-11 07:20:31 - ERROR - ci_test]: (api_response) : <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for flush timeout, collection: Hello_Milvus, flusht_ts: 455233988598431755)> (api_request.py:36)

Expected Behavior

No response

Steps To Reproduce

No response
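
No explicit steps were provided, but the failing call can be reconstructed from the CI log above. The following is a minimal, hypothetical sketch: the schema, collection name, host address, and 180 s timeout are taken from the log, while the chaos step itself (killing the Kafka/Pulsar pods and letting them recover while the cluster is running) is injected externally by the chaos framework and is not shown.

```python
# Minimal sketch reconstructed from the CI log above (not the original test).
# Assumes a Milvus cluster with the streaming node enabled, reachable at HOST,
# and that the MQ pods (Kafka or Pulsar) were killed and recovered beforehand.
from pymilvus import (
    Collection, CollectionSchema, DataType, FieldSchema,
    MilvusException, connections,
)

HOST = "10.255.20.172"  # proxy address from the log; replace with your own

connections.connect(host=HOST, port=19530)

fields = [
    FieldSchema("int64", DataType.INT64, is_primary=True),
    FieldSchema("float", DataType.FLOAT),
    FieldSchema("varchar", DataType.VARCHAR, max_length=65535),
    FieldSchema("json_field", DataType.JSON),
    FieldSchema("float_vector", DataType.FLOAT_VECTOR, dim=128),
]
collection = Collection(
    "Hello_Milvus",
    CollectionSchema(fields, auto_id=False),
    consistency_level="Strong",
)

try:
    # After the MQ pod kill/recovery this call never completes; the client-side
    # retry gives up after 180 s with "wait for flush timeout" (see log above).
    collection.flush(timeout=180)
except MilvusException as e:
    print(f"flush failed: {e}")
```

Note that the test triggers the flush indirectly through `collection.num_entities`, which calls `flush()` under the hood, as the traceback above shows.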

Milvus Log

Both failing tests below were run with the streaming node enabled.

kafka pod kill chaos test
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-kafka-cron/detail/chaos-test-kafka-cron/19115/pipeline
log:
artifacts-kafka-pod-kill-19115-server-logs.tar.gz

pod info

[2025-01-11T07:17:23.022Z] + grep kafka-pod-kill-19115

[2025-01-11T07:17:23.023Z] + kubectl get pods -o wide

[2025-01-11T07:17:23.023Z] kafka-pod-kill-19115-0                                           2/2     Running                  0                9m22s   10.104.32.181   4am-node39   <none>           <none>

[2025-01-11T07:17:23.023Z] kafka-pod-kill-19115-1                                           2/2     Running                  1 (9m12s ago)    9m22s   10.104.19.140   4am-node28   <none>           <none>

[2025-01-11T07:17:23.023Z] kafka-pod-kill-19115-2                                           2/2     Running                  1 (9m11s ago)    9m22s   10.104.30.35    4am-node38   <none>           <none>

[2025-01-11T07:17:23.023Z] kafka-pod-kill-19115-etcd-0                                      1/1     Running                  0                33m     10.104.19.107   4am-node28   <none>           <none>

[2025-01-11T07:17:23.023Z] kafka-pod-kill-19115-etcd-1                                      1/1     Running                  0                33m     10.104.32.163   4am-node39   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-etcd-2                                      1/1     Running                  0                33m     10.104.30.22    4am-node38   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-exporter-bf45b8f56-gszgk                    1/1     Running                  0                9m22s   10.104.9.16     4am-node14   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-milvus-datanode-647c98d9dc-g5prt            1/1     Running                  3 (32m ago)      33m     10.104.23.193   4am-node27   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-milvus-datanode-647c98d9dc-n7bm8            1/1     Running                  3 (32m ago)      33m     10.104.16.41    4am-node21   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-milvus-indexnode-5b858b848-j54p7            1/1     Running                  3 (32m ago)      33m     10.104.23.192   4am-node27   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-milvus-indexnode-5b858b848-rdskq            1/1     Running                  3 (32m ago)      33m     10.104.32.155   4am-node39   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-milvus-indexnode-5b858b848-w246g            1/1     Running                  3 (32m ago)      33m     10.104.27.140   4am-node31   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-milvus-mixcoord-76c4f99656-8x89h            1/1     Running                  3 (32m ago)      33m     10.104.26.73    4am-node32   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-milvus-proxy-8655b57684-m7cnw               1/1     Running                  3 (32m ago)      33m     10.104.27.139   4am-node31   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-milvus-querynode-7fbfb796f4-bmrnn           1/1     Running                  3 (32m ago)      33m     10.104.34.17    4am-node37   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-milvus-querynode-7fbfb796f4-djpks           1/1     Running                  3 (32m ago)      33m     10.104.14.7     4am-node18   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-milvus-querynode-7fbfb796f4-v9jnl           1/1     Running                  3 (32m ago)      33m     10.104.26.74    4am-node32   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-milvus-streamingnode-999cc87b4-ldm5z        1/1     Running                  3 (32m ago)      33m     10.104.23.191   4am-node27   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-minio-0                                     1/1     Running                  0                33m     10.104.32.164   4am-node39   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-minio-1                                     1/1     Running                  0                33m     10.104.19.106   4am-node28   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-minio-2                                     1/1     Running                  0                33m     10.104.15.248   4am-node20   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-minio-3                                     1/1     Running                  0                33m     10.104.30.23    4am-node38   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-zookeeper-0                                 1/1     Running                  0                33m     10.104.32.162   4am-node39   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-zookeeper-1                                 1/1     Running                  0                33m     10.104.19.110   4am-node28   <none>           <none>

[2025-01-11T07:17:23.024Z] kafka-pod-kill-19115-zookeeper-2                                 1/1     Running                  0                33m     10.104.15.249   4am-node20   <none>           <none>

pulsar pod kill chaos test
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/20120/pipeline
log:

artifacts-pulsar-pod-failure-20120-server-logs.tar.gz

Pulsar has an additional issue: after the pod kill, 2 of the 3 bookies keep restarting and never become ready again (0/1 Running with 8 restarts in the listing below).
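
As a side note, the readiness check behind the `kubectl get pods` listing under "pod info" below can be scripted. Here is a small, hypothetical helper using the official `kubernetes` Python client; the namespace name and the "bookie" name filter are assumptions based on the pod names in the listing:

```python
# Hypothetical readiness check mirroring `kubectl get pods | grep bookie`.
# Requires the official client: pip install kubernetes. The namespace is an
# assumption; adjust it to wherever the chaos test deploys Pulsar.
from kubernetes import client, config

def unready_bookies(namespace: str = "chaos-testing") -> list[str]:
    """Return bookie pods whose containers are running but not ready (0/1)."""
    config.load_kube_config()
    v1 = client.CoreV1Api()
    unready = []
    for pod in v1.list_namespaced_pod(namespace).items:
        name = pod.metadata.name
        if "bookie" not in name or "init" in name:  # skip the bookie-init job
            continue
        statuses = pod.status.container_statuses or []
        if not statuses or not all(s.ready for s in statuses):
            unready.append(name)
    return unready

if __name__ == "__main__":
    # For the run below this would report bookie-1 and bookie-2 (both 0/1).
    print(unready_bookies())
```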

pod info

[2025-01-10T09:15:33.354Z] + kubectl get pods -o wide

[2025-01-10T09:15:33.355Z] + grep pulsar-pod-failure-20120

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-etcd-0                                   1/1     Running                  0                34m     10.104.32.46    4am-node39   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-etcd-1                                   1/1     Running                  0                34m     10.104.19.203   4am-node28   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-etcd-2                                   1/1     Running                  0                34m     10.104.26.119   4am-node32   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-milvus-datanode-bd87c6f86-fbpfv          1/1     Running                  2 (33m ago)      34m     10.104.30.151   4am-node38   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-milvus-datanode-bd87c6f86-nrw4k          1/1     Running                  2 (33m ago)      34m     10.104.33.236   4am-node36   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-milvus-indexnode-86c56b45bc-4cm8f        1/1     Running                  2 (33m ago)      34m     10.104.34.145   4am-node37   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-milvus-indexnode-86c56b45bc-7chwr        1/1     Running                  2 (33m ago)      34m     10.104.14.147   4am-node18   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-milvus-indexnode-86c56b45bc-sxvrq        1/1     Running                  2 (33m ago)      34m     10.104.23.10    4am-node27   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-milvus-mixcoord-5475cb7979-kvlq9         1/1     Running                  2 (33m ago)      34m     10.104.23.9     4am-node27   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-milvus-proxy-77c6d556df-x5px9            1/1     Running                  2 (33m ago)      34m     10.104.33.235   4am-node36   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-milvus-querynode-69959fd6fc-cq6ww        1/1     Running                  2 (33m ago)      34m     10.104.15.101   4am-node20   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-milvus-querynode-69959fd6fc-dvh2x        1/1     Running                  2 (33m ago)      34m     10.104.33.237   4am-node36   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-milvus-querynode-69959fd6fc-gjrnh        1/1     Running                  2 (33m ago)      34m     10.104.16.9     4am-node21   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-milvus-streamingnode-548f5d78f5-vqwdd    1/1     Running                  2 (33m ago)      34m     10.104.23.11    4am-node27   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-minio-0                                  1/1     Running                  0                34m     10.104.32.50    4am-node39   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-minio-1                                  1/1     Running                  0                34m     10.104.19.206   4am-node28   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-minio-2                                  1/1     Running                  0                34m     10.104.26.114   4am-node32   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-minio-3                                  1/1     Running                  0                34m     10.104.24.129   4am-node29   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-pulsarv3-bookie-0                        1/1     Running                  8 (10m ago)      34m     10.104.19.204   4am-node28   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-pulsarv3-bookie-1                        0/1     Running                  8 (10m ago)      34m     10.104.32.51    4am-node39   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-pulsarv3-bookie-2                        0/1     Running                  8 (10m ago)      34m     10.104.26.120   4am-node32   <none>           <none>

[2025-01-10T09:15:33.615Z] pulsar-pod-failure-20120-pulsarv3-bookie-init-r4k45               0/1     Completed                0                34m     10.104.19.197   4am-node28   <none>           <none>

[2025-01-10T09:15:33.616Z] pulsar-pod-failure-20120-pulsarv3-broker-0                        1/1     Running                  8 (9m55s ago)    34m     10.104.6.41     4am-node13   <none>           <none>

[2025-01-10T09:15:33.616Z] pulsar-pod-failure-20120-pulsarv3-broker-1                        1/1     Running                  8 (10m ago)      34m     10.104.13.144   4am-node16   <none>           <none>

[2025-01-10T09:15:33.616Z] pulsar-pod-failure-20120-pulsarv3-proxy-0                         1/1     Running                  8 (10m ago)      34m     10.104.32.42    4am-node39   <none>           <none>

[2025-01-10T09:15:33.616Z] pulsar-pod-failure-20120-pulsarv3-proxy-1                         1/1     Running                  8 (10m ago)      34m     10.104.13.140   4am-node16   <none>           <none>

[2025-01-10T09:15:33.616Z] pulsar-pod-failure-20120-pulsarv3-pulsar-init-5rd6x               0/1     Completed                0                34m     10.104.6.40     4am-node13   <none>           <none>

[2025-01-10T09:15:33.616Z] pulsar-pod-failure-20120-pulsarv3-recovery-0                      1/1     Running                  8 (10m ago)      34m     10.104.13.143   4am-node16   <none>           <none>

[2025-01-10T09:15:33.616Z] pulsar-pod-failure-20120-pulsarv3-zookeeper-0                     1/1     Running                  8 (10m ago)      34m     10.104.19.205   4am-node28   <none>           <none>

[2025-01-10T09:15:33.616Z] pulsar-pod-failure-20120-pulsarv3-zookeeper-1                     1/1     Running                  8 (10m ago)      34m     10.104.32.47    4am-node39   <none>           <none>

[2025-01-10T09:15:33.616Z] pulsar-pod-failure-20120-pulsarv3-zookeeper-2                     1/1     Running                  8 (10m ago)      34m     10.104.26.113   4am-node32   <none>           <none>

Anything else?

No response

zhuwenxing added the kind/bug and needs-triage labels on Jan 13, 2025.
zhuwenxing (Contributor, Author) commented:

/assign @chyezh

PTAL

zhuwenxing (Contributor, Author) commented:

The Kafka pod kill chaos test passes when the streaming node is not enabled.
https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-kafka-cron/detail/chaos-test-kafka-cron/19093/pipeline


chyezh added the feature/streaming node label on Jan 13, 2025.
yanliang567 added the triage/accepted label, removed needs-triage, removed their own assignment, and added this issue to the 2.6.0 milestone on Jan 13, 2025.