NOTE: This repository is being integrated into the official Torchserve repo and is subject to heavy changes in the future.
This repository contains an example of deploying models with third-party dependencies (like 🤗 Transformers, SparseML, etc.) on Torchserve as ready-to-use Docker containers on cloud services such as AWS.
For the purposes of this repository, we deploy the models on an AWS `t2.micro` instance, which is free for 750 hours per month under the AWS Free Tier on a new account. We work with a 🤗 MobileViT Transformer model for the task of image classification using its `pipeline` feature. The handler code in `scripts` can also be used as a simple template for deploying any 🤗 `pipeline` for any supported task with Torchserve.
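As a rough illustration, a custom Torchserve handler wrapping a 🤗 `pipeline` could look like the minimal sketch below. This is not a copy of the handler in `scripts`: the class name, image decoding, and batching details are simplified assumptions.

```python
# A minimal sketch of a Torchserve handler wrapping a 🤗 pipeline.
import io

from PIL import Image
from transformers import pipeline
from ts.torch_handler.base_handler import BaseHandler


class MobileViTHandler(BaseHandler):
    """Serves apple/mobilevit-xx-small via the image-classification pipeline."""

    def initialize(self, context):
        # Load the pipeline once per worker; device=-1 keeps it on CPU,
        # which is all a t2.micro offers.
        self.pipe = pipeline(
            task="image-classification",
            model="apple/mobilevit-xx-small",
            device=-1,
        )
        self.initialized = True

    def preprocess(self, data):
        # Torchserve passes each request body under "data" or "body".
        images = []
        for row in data:
            payload = row.get("data") or row.get("body")
            images.append(Image.open(io.BytesIO(payload)).convert("RGB"))
        return images

    def inference(self, images):
        return self.pipe(images)

    def postprocess(self, outputs):
        # One list of {label, score} dicts per request in the batch.
        return outputs
```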
This work may also be extended to deploy Torchserve Docker containers with HF models at scale using AWS CloudFormation and AWS EKS (as explained in the official Torchserve repo) or AWS SageMaker, incorporating utilities like AWS ELB and CloudWatch.
We also benchmark the REST API calls (wall-clock latency) and compare model performance for the following approaches (a timing sketch follows this list):
- Deploying the MobileViT XX Small Hugging Face model with a custom Torchserve handler (see the `HF-only` directory).
- Deploying the MobileViT XX Small Hugging Face model in scripted (TorchScript) mode with a custom Torchserve handler (see the `HF-scripted` directory; a TorchScript export sketch also follows this list).
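For the scripted-mode approach, the model can be exported to TorchScript roughly as in the sketch below. The actual export in `HF-scripted` may differ, and the 256x256 input shape is assumed from MobileViT's default preprocessing.

```python
# A sketch of exporting MobileViT to TorchScript for scripted-mode serving.
import torch
from transformers import MobileViTForImageClassification

# torchscript=True makes the model return tuples, which torch.jit.trace needs.
model = MobileViTForImageClassification.from_pretrained(
    "apple/mobilevit-xx-small", torchscript=True
)
model.eval()

# Trace with a dummy batch of one 256x256 RGB image (BCHW).
dummy = torch.randn(1, 3, 256, 256)
traced = torch.jit.trace(model, dummy)
traced.save("mobilevit-xx-small-scripted.pt")
```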
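The latency benchmark itself can be as simple as timing repeated POSTs against the Torchserve inference endpoint, as sketched below; the model name `mobilevit` and the test image path are assumptions, not the repo's actual values.

```python
# A simple latency probe against Torchserve's inference API (default port 8080).
import time

import requests

URL = "http://localhost:8080/predictions/mobilevit"  # assumed model name

with open("test.jpg", "rb") as f:  # any test image
    payload = f.read()

requests.post(URL, data=payload)  # warm-up call
latencies = []
for _ in range(50):
    start = time.perf_counter()
    resp = requests.post(URL, data=payload)
    resp.raise_for_status()
    latencies.append(time.perf_counter() - start)

print(f"mean latency: {1000 * sum(latencies) / len(latencies):.1f} ms")
```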
- Verify HF `pipeline` functionality on an AWS EC2 `t2.micro` instance.
- Add benchmarking scripts for throughput with Torchserve utilities.
- Add a dynamic batching explanation (a minimal registration sketch follows this list).
- Integrate inference optimizations from the 🤗 Optimum library (an ONNX Runtime sketch follows this list):
  - ONNX Runtime inference
  - Quantized ONNX Runtime inference
- Optional: Try ONNX-TensorRT integration (reference)
- Try LLM.int8 integration
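Until the full dynamic-batching explanation lands, the short version is that Torchserve batches requests per model, configured at registration time through the management API (port 8081). The sketch below shows an assumed registration; the `batch_size` and `max_batch_delay` values are illustrative, not tuned.

```python
# A sketch of registering a model with dynamic batching via the management API.
import requests

resp = requests.post(
    "http://localhost:8081/models",
    params={
        "url": "mobilevit.mar",   # assumed model archive name
        "model_name": "mobilevit",
        "batch_size": 8,          # max requests merged into one batch
        "max_batch_delay": 50,    # ms to wait for the batch to fill
        "initial_workers": 1,
    },
)
print(resp.json())
```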
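For the planned 🤗 Optimum integration, ONNX Runtime inference slots into the same `pipeline` flow, roughly as below. Treat this as a sketch: the export flag has changed across Optimum releases (older versions use `from_transformers=True` instead of `export=True`), and quantized inference would additionally go through Optimum's `ORTQuantizer`.

```python
# A sketch of ONNX Runtime inference through 🤗 Optimum.
from optimum.onnxruntime import ORTModelForImageClassification
from transformers import AutoFeatureExtractor, pipeline

# Export the PyTorch checkpoint to ONNX and load it with ONNX Runtime.
model = ORTModelForImageClassification.from_pretrained(
    "apple/mobilevit-xx-small", export=True
)
extractor = AutoFeatureExtractor.from_pretrained("apple/mobilevit-xx-small")

pipe = pipeline("image-classification", model=model, feature_extractor=extractor)
print(pipe("test.jpg"))  # assumed local test image
```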
- https://github.com/pytorch/serve
- https://github.com/huggingface/transformers
- https://github.com/huggingface/optimum
- https://huggingface.co/docs/transformers/main_classes/pipelines
- My Torchserve + AWS Notion journal
- https://huggingface.co/apple/mobilevit-xx-small
- https://huggingface.co/course/chapter2/2?fw=pt
- https://github.com/aws-samples/amazon-sagemaker-endpoint-deployment-of-siamese-network-with-torchserve
- https://github.com/cceyda/lit-NER
- https://github.com/tescal2/TorchServeOnAWS
There are many ways to support open-source work, and ⭐ing it is one of them.
In case of bugs or queries, raise an issue, or even better, open a PR with a fix.