NOTE: This repository is being integrated into the official TorchServe repo and is subject to heavy changes in the future.

# HF-torchserve-pipeline

This repository contains an example of deploying models with third-party dependencies (like 🤗 Transformers, SparseML, etc.) on TorchServe as ready-to-use Docker containers on cloud services like AWS.

For the purposes of this repository, we deploy the models on an AWS t2.micro instance, which can be used for free (for 750 hours) on a new AWS account. We work with a 🤗 MobileViT Transformer model for the task of image classification using its pipeline feature; the handler code in `scripts` can also be used as a minimal template for deploying any 🤗 pipeline, roughly along the lines of the sketch below.
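As a hedged illustration of the idea (not the repository's actual handler), the following sketch wraps a 🤗 image-classification pipeline in a custom TorchServe handler. The class name, the `apple/mobilevit-small` checkpoint, and the request parsing are assumptions made for this example:

```python
# Hypothetical TorchServe handler wrapping a 🤗 pipeline.
# The checkpoint name and handler structure are illustrative;
# the repo's real handler in scripts/ may differ.
from io import BytesIO

from PIL import Image
from transformers import pipeline
from ts.torch_handler.base_handler import BaseHandler


class MobileViTHandler(BaseHandler):
    """Serves a 🤗 image-classification pipeline via TorchServe."""

    def initialize(self, context):
        # Load the pipeline once per worker process.
        self.pipe = pipeline(
            "image-classification", model="apple/mobilevit-small"
        )
        self.initialized = True

    def preprocess(self, data):
        # TorchServe delivers each request body under "data" or "body".
        images = []
        for row in data:
            payload = row.get("data") or row.get("body")
            images.append(Image.open(BytesIO(payload)).convert("RGB"))
        return images

    def inference(self, images):
        # The pipeline accepts a list of PIL images and returns
        # one list of {label, score} dicts per image.
        return self.pipe(images)

    def postprocess(self, outputs):
        # One JSON-serializable response per request in the batch.
        return outputs
```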

This work can also be extended to deploy any 🤗 pipeline for any supported task with TorchServe.

This work may also be extended to deploy the TorchServe Docker containers with HF models at scale using AWS CloudFormation & AWS EKS (as explained in the official TorchServe repo) or AWS SageMaker, incorporating utilities like AWS ELB & CloudWatch.

We also plan to benchmark the REST API calls (request latency) and compare model performance across these deployment approaches; a minimal timing sketch follows.
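As a minimal sketch of what such a latency benchmark could look like, assuming TorchServe is serving a model named `mobilevit` on the default inference port 8080 (the model name and image path are hypothetical):

```python
# Hypothetical latency micro-benchmark against a local TorchServe
# inference endpoint; model name and test image are assumptions.
import statistics
import time

import requests

URL = "http://localhost:8080/predictions/mobilevit"  # default inference API

with open("cat.jpg", "rb") as f:
    payload = f.read()

latencies = []
for _ in range(50):
    start = time.perf_counter()
    resp = requests.post(URL, data=payload)
    resp.raise_for_status()
    latencies.append(time.perf_counter() - start)

print(f"p50:  {statistics.median(latencies) * 1000:.1f} ms")
print(f"mean: {statistics.mean(latencies) * 1000:.1f} ms")
```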

## Todo

- [ ] Verify HF pipeline functionality with AWS EC2 t2.micro.
- [ ] Add benchmarking scripts for throughput with TorchServe utilities.
- [ ] Add a dynamic batching explanation (see the sketch after this list).
- [ ] Integrate inference optimizations from the 🤗 Optimum library:
  - [ ] Try LLM.int8() integration.
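On dynamic batching: TorchServe can aggregate concurrent requests into a single batch, controlled by the `batch_size` and `max_batch_delay` settings supplied when a model is registered. A minimal sketch of registering a model with dynamic batching via the management API (default port 8081); the archive name `mobilevit.mar` is an assumption:

```python
# Hypothetical model registration enabling TorchServe dynamic batching
# via the management API; the .mar archive name is an assumption.
import requests

resp = requests.post(
    "http://localhost:8081/models",
    params={
        "url": "mobilevit.mar",   # archive built with torch-model-archiver
        "batch_size": "8",        # max requests aggregated into one batch
        "max_batch_delay": "50",  # ms to wait while filling a batch
        "initial_workers": "1",
    },
)
print(resp.json())
```

With batching enabled, the handler's `preprocess` receives up to `batch_size` requests at once, which is why the handler sketch above iterates over the incoming list.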


## Support

There are many ways to support an open-source work; ⭐ing it is one of them.

## Issues

In case of bugs or queries, raise an Issue, or better yet, a PR with fixes.