-
Promote
InferenceService
andClusterServingRuntime
/ServingRuntime
CRD from v1beta1 to v1- Improve
InferenceService
CRD for REST/gRPC protocol interface - Unify model storage spec and implementation between KServe and ModelMesh
- Add Status to
ServingRuntime
for both ModelMesh and KServe, surfaceServingRuntime
validation errors and deployment status - Deprecate
TrainedModel
CRD and useInferenceService
annotation to allow dynamic model updates as alternative option to storage initializer - Collocate transformer and predictor in the pod to reduce sidecar resources and networking latency
- Stablize
RawDeployment
mode with comprehensive testing for supported features
- Improve
-
All model formats to support v2 inference protocol including custom serving runtime
- TorchServe to support v2 gRPC inference protocol
- Support batching for v2 inference protocol
- Transformer and Explainer v2 inference protocol interoperability
- Improve codec for v2 inference protocol
Reference: Control plane issues, Data plane issues,Serving Runtime issues.
- Improve KServe Python SDK dependency management with Poetry
- Create standarized model packaging API
- Improve KServe model server observability with metrics and distruted tracing
- Support batch inference
Reference:Python SDK issues, Storage issues
- Support TorchServe ServingRuntime
- Add PVC support and unify storage implementation with KServe
- Add optional ingress for ModelMesh deployments
- Etcd secret security for multi-namespace mode
- Add estimated model size field
Reference: ModelMesh issues
- Improve
InferenceGraph
spec for replica and concurrency control - Allow setting resource limits per
InferenceGraph
- Support distributed tracing
- Support gRPC for
InferenceGraph
- Standalone
Transformer
support forInferenceGraph
- Support traffic mirroring node
- Support
RawDeployment
mode forInferenceGraph
Reference: InferenceGraph issues
- Document KServe ServiceMesh setup with mTLS
- Support programmatic authentication token
- Implement per service level auth
- Add support for SPIFFE/SPIRE identity integration with
InferenceService
Reference: Auth related issues
- Add ModelMesh docs and explain the use cases for classic KServe and ModelMesh
- Unify the data plane v1 and v2 page formats
- Improve v2 data plane docs to tell the story why and what changed
- Clean up the examples in kserve repo and unify them with the website's by creating one source of truth for example documentation
- Update any out-of-date documentation and make sure the website as a whole is consistent and cohesive