Added docs for raw deployment autoscaling. #312
base: main
✅ Deploy Preview for elastic-nobel-0aef7a ready!
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: andyi2it.
Signed-off-by: Andrews Arokiam <[email protected]>
serving.kserve.io/deploymentMode: RawDeployment
serving.kserve.io/autoscalerClass: hpa
serving.kserve.io/metric: cpu
serving.kserve.io/targetUtilizationPercentage: "80"
these are the annotations for the old schema
Also document the supported metric types for RawDeployment mode.
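For context, the annotations in this hunk would sit under `metadata.annotations` of an `InferenceService`. The following is a minimal sketch, assuming the v1beta1 API; the resource name and `storageUri` are placeholders rather than values from this PR, and per the review comment above these annotations belong to the older schema.

```yaml
# Sketch only: the name and storageUri are hypothetical placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-model                 # hypothetical name
  annotations:
    # Annotations quoted from the diff (old schema, per the review)
    serving.kserve.io/deploymentMode: RawDeployment
    serving.kserve.io/autoscalerClass: hpa
    serving.kserve.io/metric: cpu
    serving.kserve.io/targetUtilizationPercentage: "80"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://example-bucket/path/to/model"   # placeholder
```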
### HPA in Raw Deployment
When using KServe in `RawDeployment` mode, Knative is not installed. In this mode, when you deploy an `InferenceService`, KServe uses the **Kubernetes Horizontal Pod Autoscaler (HPA)** for autoscaling instead of the **Knative Pod Autoscaler (KPA)**. For more information about KServe's autoscaler, see [this page](https://kserve.github.io/website/master/modelserving/v1beta1/torchserve/#knative-autoscaler).
better to refer to the official Knative autoscaler doc.
The default for `scaleMetric` is `concurrency`; possible values are `concurrency`, `rps`, `cpu`, and `memory`.
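As a rough illustration of the `scaleMetric` field described above, the sketch below sets a CPU-based target on a predictor running in `RawDeployment` mode. It assumes the v1beta1 `scaleMetric` and `scaleTarget` predictor fields; the resource name, target value, replica bounds, and `storageUri` are placeholders chosen for illustration, not values taken from this PR.

```yaml
# Sketch only: names and numeric values are illustrative assumptions.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-model                 # hypothetical name
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment
    serving.kserve.io/autoscalerClass: hpa
spec:
  predictor:
    minReplicas: 1                    # assumed lower bound
    maxReplicas: 3                    # assumed upper bound
    scaleMetric: cpu                  # one of the values listed above
    scaleTarget: 80                   # assumed target utilization (%)
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://example-bucket/path/to/model"   # placeholder
```

With HPA, the `cpu` and `memory` metrics are the natural fit; whether `concurrency` and `rps` apply outside Knative is something the docs in this PR should state explicitly.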
## Autoscaler for KServe's Raw Deployment Mode
Maybe worth a separate page for this; this doc is a bit too long.
"Fixes #303" Update Autoscaling docs for Raw deployment mode
Proposed Changes