Utilize one database instance for multiple classification jobs #190

Open
valentynbez opened this issue Apr 30, 2023 · 0 comments

Improvement Description
If the taxonomic classifier is already loaded into memory, it should be possible to make it available to several processes simultaneously. I assume the main obstacle is the GIL. It could be sidestepped by using the ONNX library.

On the other hand, adding another dependency comes at a cost, so the question is whether the plugin relies on many scikit-learn functions, or whether trained models can be delivered to users without scikit-learn.
The hard way would be to write a custom prediction function in C/C++.

Current Behavior
joblib pickles the classifier and then starts multiple processes in parallel, multiplying RAM usage by the number of processes. Databases will keep growing in size, so this is worth taking into consideration.
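
To illustrate the scaling described above, here is a minimal sketch using only the standard library. `ToyClassifier` and `simulate_worker_copies` are hypothetical stand-ins, not the plugin's code or joblib's internals; the sketch only mimics the effect of each worker unpickling its own copy of the model.

```python
import pickle


class ToyClassifier:
    """Hypothetical stand-in for a trained classifier (not the plugin's model)."""

    def __init__(self, n_weights):
        self.weights = [0.5] * n_weights


def simulate_worker_copies(model, n_jobs):
    """Mimic process-based parallelism: serialize once, deserialize per worker."""
    payload = pickle.dumps(model)
    # Each worker process would hold its own independent deserialized copy.
    copies = [pickle.loads(payload) for _ in range(n_jobs)]
    return len(payload), copies


payload_size, copies = simulate_worker_copies(ToyClassifier(10_000), n_jobs=4)
# Lower bound on the serialized data held across all workers.
total_bytes = payload_size * len(copies)
print(f"{len(copies)} copies, at least {total_bytes} bytes in total")
```

Every copy is a separate object with its own weights, so the footprint grows linearly with `n_jobs`.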

Proposed Behavior
Multiple classification processes run from a single classifier instance loaded into memory.
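
One possible shape for this, sketched with threads instead of processes so that all workers share the same address space. `ToyClassifier` and `classify_batches` are hypothetical names used for illustration only. Because the toy `predict` is pure Python, the GIL serializes it; the pattern would only give a real speedup if the prediction call releases the GIL, which is an assumption about whatever runtime ends up being used.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyClassifier:
    """Hypothetical stand-in for a trained classifier (not the plugin's model)."""

    def __init__(self, weights):
        self.weights = weights

    def predict(self, features):
        # Simple linear score; a real model would do far more work here.
        score = sum(w * x for w, x in zip(self.weights, features))
        return "taxon_A" if score >= 0 else "taxon_B"


def classify_batches(model, batches, n_threads=4):
    """Classify batches concurrently while sharing ONE model instance."""

    def classify_one_batch(batch):
        # Read-only access to the shared model; nothing is copied or pickled.
        return [model.predict(features) for features in batch]

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map preserves the order of the input batches.
        return list(pool.map(classify_one_batch, batches))


model = ToyClassifier([1.0, -2.0])
batches = [[[3.0, 1.0], [0.0, 1.0]], [[1.0, 1.0]]]
results = classify_batches(model, batches)
print(results)
```

The model is only read during prediction, so no locking is needed in this sketch; a classifier that mutates internal state while predicting would need extra care.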

Priority
Low. Machines with a lot of resources are available to users.

References

  1. https://onnx.ai/sklearn-onnx/auto_tutorial/plot_abegin_convert_pipeline.html