OpenVINO™ Model Server 2022.3.0.1
The 2022.3.0.1 version is a patch release for the OpenVINO Model Server. It includes a few bug fixes and enhancements in the C API.
New Features
- Added support for DAG pipelines to the inference execution method OVMS_Inference in the C API. The servableName parameter can be either a model name or a pipeline name.
- Added a debug log in the AUTO plugin execution to report which physical device is used. The AUTO plugin allocates the best available device for model execution; for troubleshooting purposes, at the debug log level the model server now reports which device is used for each inference execution.
- Allowed enabling metrics collection via CLI parameters while using a configuration file. Metrics collection can be configured either via CLI parameters or in the configuration file; enabling metrics in the CLI no longer blocks using a configuration file to define multiple models for serving.
- Added a client sample in Java to demonstrate KServe API usage.
- Added a client sample in Go to demonstrate KServe API usage.
- Added client samples demonstrating asynchronous calls via KServe API.
- Added a demo showcasing OVMS with GPT-J-6b model from Hugging Face.
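With the DAG support above, a pipeline can be executed through the same C API call path as a single model. A minimal sketch of that flow, modeled on the C API demo; the configuration path, pipeline name, input name, shape, and version (0 meaning default) are illustrative assumptions, error-status handling is omitted, and exact signatures should be checked against the ovms.h header shipped with the release:

```c
#include <stdint.h>
#include "ovms.h"  // OVMS C API header shipped with the model server

int main() {
    OVMS_Server* server = NULL;
    OVMS_ServerSettings* serverSettings = NULL;
    OVMS_ModelsSettings* modelsSettings = NULL;
    OVMS_ServerNew(&server);
    OVMS_ServerSettingsNew(&serverSettings);
    OVMS_ModelsSettingsNew(&modelsSettings);
    // Configuration file defining the models and the DAG pipeline (path is an assumption)
    OVMS_ModelsSettingsSetConfigPath(modelsSettings, "/config/config.json");
    OVMS_ServerStartFromConfigurationFile(server, serverSettings, modelsSettings);

    // servableName may now be a pipeline name, not only a model name
    OVMS_InferenceRequest* request = NULL;
    OVMS_InferenceRequestNew(&request, server, "my_pipeline", 0);

    // Illustrative single input; name, datatype, and shape depend on the pipeline
    float data[10] = {0};
    int64_t shape[2] = {1, 10};
    OVMS_InferenceRequestAddInput(request, "input", OVMS_DATATYPE_FP32, shape, 2);
    OVMS_InferenceRequestInputSetData(request, "input", data, sizeof(data),
                                      OVMS_BUFFERTYPE_CPU, 0);

    // Same inference call regardless of whether servableName is a model or a DAG
    OVMS_InferenceResponse* response = NULL;
    OVMS_Inference(server, request, &response);

    OVMS_InferenceResponseDelete(response);
    OVMS_InferenceRequestDelete(request);
    return 0;
}
```

Building and running this requires linking against the model server's C API library, so it is a sketch of the call sequence rather than a standalone program.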
Bug fixes
- Fixed model server image building with NVIDIA plugin on a host with NVIDIA Container Toolkit installed.
- Fixed the KServe API response to include the DAG pipeline name for calls to a DAG. Based on the API definition, the response includes the servable name; for DAG processing it now returns the pipeline name instead of an empty value.
- The default number of gRPC and REST workers is now calculated correctly based on allocated CPU cores. When the model server is started in a Docker container with constrained CPU allocation, the default number of frontend threads is set more efficiently.
- Corrected reporting of the number of streams in the metrics while using non-CPU plugins; before this fix, a zero value was returned. This metric suggests the optimal number of active parallel inference calls for the best throughput.
- Fixed handling model mapping with model reloads.
- Fixed handling model mapping with dynamic shape/batch size.
- ovmsclient no longer conflicts with the tensorflow-serving-api package when installed in the same Python environment.
- Fixed debug image building.
- Fixed C-API demo building.
- Applied security fixes.
Other changes
- Updated the OpenCV version to 4.7. OpenCV is an included dependency used for image transformation in the custom nodes and for JPEG/PNG input decoding.
- Lengthened the request waiting timeout during DAG reloads. On slower machines, the timeout was sporadically reached during DAG configuration reloads, resulting in failed requests.
- ovmsclient now has more relaxed requirements on the numpy version.
- Improved unit tests stability.
- Improved documentation.
You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:
docker pull openvino/model_server:2022.3.0.1
docker pull openvino/model_server:2022.3.0.1-gpu
or use provided binary packages.
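As a usage sketch, the pulled image can be started with a configuration file while enabling metrics from the CLI, as introduced in this release; the mounted paths and port numbers below are illustrative assumptions:

```shell
# Serve the models defined in a config file; metrics are exposed on the REST port
docker run -d --rm -p 9000:9000 -p 8000:8000 \
  -v /opt/models:/models \
  -v /opt/config.json:/config.json \
  openvino/model_server:2022.3.0.1 \
  --config_path /config.json \
  --port 9000 --rest_port 8000 \
  --metrics_enable
```

With a setup like this, Prometheus-format metrics should be reachable on the REST endpoint (e.g. http://localhost:8000/metrics).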