[Bug]: pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Unexpected error, message=<can't start new thread>) #2716

Open
PapowFish opened this issue Jan 22, 2025 · 10 comments
Labels: kind/bug (issues or changes related to a bug), needs-triage (issue needs triage)

@PapowFish

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Hi, when I was using Towhee's video-embedding pipeline to extract video features and insert them into the Milvus database, I encountered the "can't start new thread" error. The problem occurs after approximately 18 videos have been stored. I asked the Milvus developers for their opinion, and their assessment was: "Looks like the code created many MilvusClient and might never close any MilvusClient to release the connection resources in operator_pool."
So, does this pipeline need additional settings?

Expected Behavior

I have approximately 10,000 videos and need to use the built-in model of Towhee for embedding, and then store them in Milvus.

Steps To Reproduce

My environment:
Package                      Version
---------------------------- -----------
absl-py                      2.1.0
annotated-types              0.7.0
astunparse                   1.6.3
av                           14.0.1
backports.tarfile            1.2.0
cachetools                   5.5.0
certifi                      2024.12.14
cffi                         1.17.1
chardet                      5.2.0
charset-normalizer           3.4.1
clang                        5.0
cryptography                 44.0.0
defusedxml                   0.7.1
docutils                     0.21.2
filelock                     3.16.1
filestools                   0.2.1
flatbuffers                  24.12.23
fsspec                       2024.12.0
gast                         0.4.0
google-auth                  2.37.0
google-auth-oauthlib         1.0.0
google-pasta                 0.2.0
grpcio                       1.67.1
h5py                         3.12.1
huggingface-hub              0.27.1
idna                         3.10
importlib_metadata           8.5.0
jaraco.classes               3.4.0
jaraco.context               6.0.1
jaraco.functools             4.1.0
jax                          0.4.30
jaxlib                       0.4.30
jeepney                      0.8.0
keras                        2.12.0
Keras-Preprocessing          1.1.2
keyring                      25.6.0
libclang                     18.1.1
lxml                         5.3.0
Markdown                     3.7
markdown-it-py               3.0.0
MarkupSafe                   3.0.2
mdurl                        0.1.2
milvus-lite                  2.4.11
ml-dtypes                    0.4.1
more-itertools               10.6.0
namex                        0.0.8
nh3                          0.2.20
numpy                        1.23.5
oauthlib                     3.2.2
opencv-python                4.10.0.84
opt-einsum                   3.3.0
optree                       0.13.1
packaging                    24.2
pandas                       2.2.3
pillow                       11.1.0
pip                          24.2
pkginfo                      1.12.0
plyvel                       1.5.1
protobuf                     4.25.5
pyasn1                       0.6.1
pyasn1_modules               0.4.1
pycparser                    2.22
pydantic                     2.10.5
pydantic_core                2.27.2
Pygments                     2.19.1
pymilvus                     2.5.2
pyperclip                    1.9.0
python-dateutil              2.9.0.post0
python-dotenv                1.0.1
pytz                         2024.2
PyYAML                       6.0.2
readme_renderer              44.0
regex                        2024.11.6
requests                     2.32.3
requests-oauthlib            2.0.0
requests-toolbelt            1.0.0
rfc3986                      2.0.0
rich                         13.9.4
rsa                          4.9
safetensors                  0.5.2
scipy                        1.13.1
SecretStorage                3.3.3
setuptools                   75.1.0
six                          1.15.0
tabulate                     0.9.0
tdqm                         0.0.1
tenacity                     9.0.0
tensorboard                  2.12.3
tensorboard-data-server      0.7.2
tensorboard-plugin-wit       1.8.1
tensorflow-estimator         2.12.0
tensorflow-io-gcs-filesystem 0.37.1
termcolor                    1.1.0
timm                         1.0.13
tokenizers                   0.19.1
torch                        1.8.1
torchvision                  0.9.1+cu111
towhee                       1.1.3
towhee.models                1.1.3
tqdm                         4.67.1
transformers                 4.44.2
twine                        6.0.1
typing_extensions            4.12.2
tzdata                       2024.2
ujson                        5.10.0
urllib3                      2.3.0
Werkzeug                     3.1.3
wheel                        0.44.0
wrapt                        1.12.1
zipp                         3.21.0

My code:
import os
from concurrent.futures import ThreadPoolExecutor
from towhee import AutoPipes, AutoConfig
from pathlib import Path
from tqdm import tqdm
import gc
import torch


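# Configure the built-in video_embedding pipeline.
emb_conf = AutoConfig.load_config('video_embedding')
emb_conf.collection = 'video_collection'
emb_conf.start_time = None
emb_conf.end_time = None
emb_conf.milvus_host = '10.56.4.24'
emb_conf.device = 1  # GPU device index; -1 selects the CPU
emb_conf.model = 'isc'
emb_conf.leveldb_path = 'video_collection.db'

# Build the pipeline once and reuse it for every video.
emb_pipe = AutoPipes.pipeline('video_embedding', emb_conf)

video_path = ""  # directory containing the videos (left empty in the report)
files = [os.path.join(video_path, f) for f in os.listdir(video_path)]

# Embed each video and insert it into Milvus; log failures and continue.
for video_file in tqdm(files):
    try:
        emb_pipe(video_file)
    except Exception as e:
        print(f"embedding {video_file} error: {e}")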

Environment

- Towhee version (e.g. v0.1.3 or 8b23a93): v1.1.3
- OS (Ubuntu or CentOS): Ubuntu
- CPU/Memory: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz / 500G
- GPU: Tesla K80
- Others:

Anything else?

No response

PapowFish added the kind/bug and needs-triage labels on Jan 22, 2025
@junjiejiangjjj
Contributor

Hi @PapowFish, it looks like a server-side error. Is your Milvus service still working properly?

@PapowFish
Author

Thanks for your reply. Yes, I am running v2.5.3 Milvus standalone on another server and storing data through a remote connection. What exactly does "server-side error" mean? Can you explain it in more detail? Is it due to insufficient server system resources?

@junjiejiangjjj
Contributor

This code creates only one milvus_client and one thread pool whose size equals the number of CPUs, so it is unlikely to exhaust all threads. Check the system-wide maximum number of threads:

cat /proc/sys/kernel/threads-max

@PapowFish
Author

On the machine running the code, threads-max is 4126101; on the server where Milvus is installed, it is 125443. Is the error caused by having too many videos?

@junjiejiangjjj
Contributor

Use this command to see the number of threads in the process:

ps -T -p PID | wc -l
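
The same count can be taken from inside the Python process, which makes it easy to log next to the pipeline's progress. This is a minimal sketch, assuming Linux, where each thread appears as a numeric directory under /proc/<pid>/task. `count_threads` and its `proc_root` parameter are hypothetical names introduced for illustration. Unlike the ps command piped into wc -l, it does not count the header line, so it reports one less.

```python
import os
from pathlib import Path


def count_threads(pid=None, proc_root="/proc"):
    """Return the number of OS threads of process `pid` (default: this process).

    Returns None if the process or the /proc filesystem is unavailable.
    """
    if pid is None:
        pid = os.getpid()
    task_dir = Path(proc_root) / str(pid) / "task"
    try:
        # One numeric subdirectory per thread, including native (non-Python) threads.
        return sum(1 for entry in task_dir.iterdir() if entry.name.isdigit())
    except OSError:
        return None


if __name__ == "__main__":
    print(f"threads in this process: {count_threads()}")
```

Reading /proc rather than calling `threading.active_count()` is deliberate: the latter only sees threads started through Python's threading module, not threads created by native libraries.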

@PapowFish
Author

It seems to be 380 when the error occurs.

@junjiejiangjjj
Contributor

Is that the Milvus standalone process or the Towhee process?

@PapowFish
Author

Probably Towhee. It is the number of threads in the running Python script; the Milvus standalone does not seem to start that process.

@junjiejiangjjj
Contributor

I used your code to process 500 videos (each approximately a few dozen seconds long), but could not reproduce your problem. While it ran, I used the command ps -T -p PID | wc -l and observed the thread count peaking above 1500, yet the program kept working normally.
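
To see whether the thread count grows with each processed video, rather than sampling it once, the count can be recorded after every item. The following is a sketch under the assumption of a Linux host. `process_with_thread_log` and `current_thread_count` are hypothetical helpers, and `process` stands for any callable, such as the `emb_pipe` object from the report.

```python
import os


def current_thread_count():
    """Number of OS threads in this process, read from /proc (Linux); None elsewhere."""
    try:
        return len(os.listdir("/proc/self/task"))
    except OSError:
        return None


def process_with_thread_log(items, process, count_fn=current_thread_count):
    """Call `process(item)` for each item and record the thread count afterwards.

    Returns a list of (item, thread_count, error) tuples; `error` is None on success.
    """
    log = []
    for item in items:
        error = None
        try:
            process(item)
        except Exception as exc:  # keep going, as the loop in the report does
            error = exc
        log.append((item, count_fn(), error))
    return log
```

A call such as `process_with_thread_log(files, emb_pipe)` would then show the count per video. A count that rises steadily with each item would suggest threads are not being released; a count that stays flat until the failure would point toward a limit outside the script.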

@PapowFish
Author

Thank you for your reply! It seems to be a hardware issue. After switching to another computer, it has been running smoothly. Thank you very much!
