This demo is adapted from the blog post *Real-time Serving for XGBoost, Scikit-Learn RandomForest, LightGBM, and More* and shows how to build and deploy predictive models with XGBoost and the Triton Inference Server on GPU-accelerated servers.
- Make sure you have properly set up your server. See the docs for details. These notebooks were tested on the following configuration:
- NVIDIA RTX A6000 x 2
- Ubuntu 20.04 running on Linux x86
- CUDA Toolkit 11.6
- NVIDIA Container Toolkit 1.9.0
- Recommended disk space: 64 GB
- Open ports: 8888 (Jupyter Lab), 8000 (Triton HTTP), 8001 (Triton gRPC), and 8002 (Triton metrics)
Create a shared Docker volume that will hold the model repository.
sudo docker volume create volume1
sudo mkdir -p /var/lib/docker/volumes/volume1/_data/model_repository
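Triton loads models from this repository using its standard layout: one directory per model containing a config.pbtxt and a numbered subdirectory for each model version. The pre-built model copied in a later step follows this convention; a minimal sketch with placeholder names:
model_repository/
└── <model_name>/
    ├── config.pbtxt
    └── 1/
        └── <model file>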
The PyTorch container (v22.03) on NGC ships with many pre-built libraries that make doing data science easy. Mount the shared volume so you can save models to the model repository.
sudo docker pull nvcr.io/nvidia/pytorch:22.03-py3
sudo docker run --gpus=all -dt --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
--network host --mount source=volume1,destination=/workspace/volume1 \
--name pytorch nvcr.io/nvidia/pytorch:22.03-py3
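As a quick sanity check, you can confirm that the container sees both GPUs before proceeding:
sudo docker exec pytorch nvidia-smi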
Clone this repository onto the server so you have easy access to the Jupyter notebooks. Copy the pre-built model into the model repository.
sudo docker exec -it pytorch /bin/bash
git clone https://github.com/nwstephens/triton-xgboost.git
cp -R triton-xgboost/data/pre-built/ volume1/model_repository/.
exit
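You can verify from the host that the model landed in the shared volume:
sudo ls /var/lib/docker/volumes/volume1/_data/model_repository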
The Triton container (v22.03) on NGC will serve models in the model repository.
sudo docker pull nvcr.io/nvidia/tritonserver:22.03-py3
sudo docker run --gpus=all -dt --network host \
-v /var/lib/docker/volumes/volume1/_data/model_repository:/models \
--name tritonserver nvcr.io/nvidia/tritonserver:22.03-py3 tritonserver --model-repository=/models
Verify that Triton is running correctly. The HTTP request returns status 200 if Triton is ready and a non-200 status if it is not.
curl -v localhost:8000/v2/health/ready
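You can also check the readiness of an individual model through the same v2 API, where <model_name> is a placeholder for a model in your repository:
curl -v localhost:8000/v2/models/<model_name>/ready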
Check the Triton logs. You should see the pre-built model listed in the model repository table. If your model is not displayed in the table, check the path to the model repository and your CUDA drivers.
sudo docker logs tritonserver
If you shut down your server instance to save costs, you can restart the containers when you bring the instance back online.
sudo docker start pytorch
sudo docker start tritonserver
Jupyter Lab is pre-installed on the PyTorch container. Note that the command below disables the security token by passing --NotebookApp.token=''. If you want to require a security token when you log into the server, remove that option.
sudo docker exec -it pytorch /bin/bash
nohup jupyter-lab --NotebookApp.token='' --no-browser --port=8888 &
exit
Open Jupyter Lab in a browser at http://<server-ip>:8888 and make sure port 8888 is open. Open the XGBoost notebook and follow the instructions for building and deploying a model to Triton. Then open the Triton notebook and follow the instructions for submitting inference requests to Triton. As an optional exercise, you can run the Performance Analyzer against your model; see the sketches below.
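The Triton notebook covers inference requests in detail, but you can also hit Triton's v2 REST endpoint directly with curl as a quick smoke test. This is a minimal sketch: <model_name> is a placeholder, and the input name, shape, and datatype must match your model's config.pbtxt (input__0 with FP32 follows the naming used in Triton FIL backend examples, which your model may or may not share).
# Placeholder model name and input values; adjust to match your model's config.pbtxt
curl -X POST localhost:8000/v2/models/<model_name>/infer \
    -H "Content-Type: application/json" \
    -d '{"inputs": [{"name": "input__0", "shape": [1, 4], "datatype": "FP32", "data": [0.1, 0.2, 0.3, 0.4]}]}'
The Performance Analyzer (perf_analyzer) ships in the Triton SDK container. A basic run might look like the following, assuming the matching 22.03 SDK image and again a placeholder model name:
# Pull the Triton SDK container, which includes perf_analyzer
sudo docker pull nvcr.io/nvidia/tritonserver:22.03-py3-sdk
# Measure throughput and latency at client concurrency levels 1 through 4
sudo docker run --rm -it --network host nvcr.io/nvidia/tritonserver:22.03-py3-sdk \
    perf_analyzer -m <model_name> -u localhost:8000 --concurrency-range 1:4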