From 282efc1f4c853c4f9cafb6aaf76dac23853d5d49 Mon Sep 17 00:00:00 2001
From: Chivier Humber
Date: Fri, 16 Aug 2024 08:34:25 +0000
Subject: [PATCH] Document Sync by Tina

---
 docs/stable/cli/cli_api.md                    |  41 +++-
 docs/stable/cli/sllm_cli_doc.md               | 181 ------------------
 .../getting_started/docker_quickstart.md      |   8 +-
 3 files changed, 41 insertions(+), 189 deletions(-)
 delete mode 100644 docs/stable/cli/sllm_cli_doc.md

diff --git a/docs/stable/cli/cli_api.md b/docs/stable/cli/cli_api.md
index 0f4e581..99288a2 100644
--- a/docs/stable/cli/cli_api.md
+++ b/docs/stable/cli/cli_api.md
@@ -43,7 +43,9 @@ After setting up the ServerlessLLM cluster, you can use the commands listed belo
 ```
 
 ### sllm-cli deploy
-Deploy a model using a configuration file or model name.
+Deploy a model using a configuration file or model name, with options to override the default configuration. The configuration file requires only minimal specification, as sensible defaults are provided for advanced configuration options.
+
+For more details on the advanced configuration options and their default values, please refer to the [Example Configuration File](#example-configuration-file-configjson) section.
 
 ##### Usage
 ```bash
@@ -52,18 +54,49 @@ sllm-cli deploy [OPTIONS]
 ```
 
 ##### Options
 - `--model <model_name>`
-  - Model name to deploy with default configuration. The model name must be a huggingface pretrained model name. You can find the list of available models [here](https://huggingface.co/models).
+  - Model name to deploy with the default configuration. The model name must be a Hugging Face pretrained model name. You can find the list of available models [here](https://huggingface.co/models).
 
 - `--config <config_path>`
-  - Path to the JSON configuration file.
+  - Path to the JSON configuration file. The configuration file can be incomplete, and missing sections will be filled in by the default configuration.
 
-##### Example
+- `--backend <backend_name>`
+  - Override the backend in the default configuration.
+
+- `--num_gpus <number>`
+  - Override the number of GPUs in the default configuration.
+
+- `--target <number>`
+  - Override the target concurrency in the default configuration.
+
+- `--min_instances <number>`
+  - Override the minimum number of instances in the default configuration.
+
+- `--max_instances <number>`
+  - Override the maximum number of instances in the default configuration.
+
+##### Examples
+Deploy using a model name with the default configuration:
 ```bash
 sllm-cli deploy --model facebook/opt-1.3b
+```
+
+Deploy using a configuration file:
+```bash
 sllm-cli deploy --config /path/to/config.json
 ```
 
+Deploy using a model name and override the backend:
+```bash
+sllm-cli deploy --model facebook/opt-1.3b --backend transformers
+```
+
+Deploy using a model name and override multiple configuration options:
+```bash
+sllm-cli deploy --model facebook/opt-1.3b --num_gpus 2 --target 5 --min_instances 1 --max_instances 5
+```
+
 ##### Example Configuration File (`config.json`)
+This file can be incomplete, and missing sections will be filled in by the default configuration:
 ```json
 {
   "model": "facebook/opt-1.3b",
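To make the incomplete-configuration behavior concrete: a minimal `config.json` sketch that specifies only the model name, leaving the backend, GPU count, and auto-scaling settings to the defaults shown in the example configuration file. That this is equivalent to `sllm-cli deploy --model facebook/opt-1.3b` is an assumption based on the fill-in behavior described above:

```json
{
  "model": "facebook/opt-1.3b"
}
```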
diff --git a/docs/stable/cli/sllm_cli_doc.md b/docs/stable/cli/sllm_cli_doc.md
deleted file mode 100644
index 0f4e581..0000000
--- a/docs/stable/cli/sllm_cli_doc.md
+++ /dev/null
@@ -1,181 +0,0 @@
-## ServerlessLLM CLI Documentation
-
-### Overview
-`sllm-cli` is a command-line interface (CLI) tool designed for managing and interacting with ServerlessLLM models. This document provides an overview of the available commands and their usage.
-
-### Getting Started
-
-Before using the `sllm-cli` commands, you need to start the ServerlessLLM cluster. Follow the guides below to set up your cluster:
-
-- [Installation Guide](../getting_started/installation.md)
-- [Docker Quickstart Guide](../getting_started/docker_quickstart.md)
-- [Quickstart Guide](../getting_started/quickstart.md)
-
-After setting up the ServerlessLLM cluster, you can use the commands listed below to manage and interact with your models.
-
-### Example Workflow
-
-1. **Deploy a Model**
-   > Deploy a model using the model name, which must be a huggingface pretrained model name. i.e. "facebook/opt-1.3b" instead of "opt-1.3b".
-   ```bash
-   sllm-cli deploy --model facebook/opt-1.3b
-   ```
-
-2. **Generate Output**
-   ```bash
-   echo '{
-     "model": "facebook/opt-1.3b",
-     "messages": [
-       {
-         "role": "user",
-         "content": "Please introduce yourself."
-       }
-     ],
-     "temperature": 0.7,
-     "max_tokens": 50
-   }' > input.json
-   sllm-cli generate input.json
-   ```
-
-3. **Delete a Model**
-   ```bash
-   sllm-cli delete facebook/opt-1.3b
-   ```
-
-### sllm-cli deploy
-Deploy a model using a configuration file or model name.
-
-##### Usage
-```bash
-sllm-cli deploy [OPTIONS]
-```
-
-##### Options
-- `--model <model_name>`
-  - Model name to deploy with default configuration. The model name must be a huggingface pretrained model name. You can find the list of available models [here](https://huggingface.co/models).
-
-- `--config <config_path>`
-  - Path to the JSON configuration file.
-
-##### Example
-```bash
-sllm-cli deploy --model facebook/opt-1.3b
-sllm-cli deploy --config /path/to/config.json
-```
-
-##### Example Configuration File (`config.json`)
-```json
-{
-  "model": "facebook/opt-1.3b",
-  "backend": "transformers",
-  "num_gpus": 1,
-  "auto_scaling_config": {
-    "metric": "concurrency",
-    "target": 1,
-    "min_instances": 0,
-    "max_instances": 10
-  },
-  "backend_config": {
-    "pretrained_model_name_or_path": "facebook/opt-1.3b",
-    "device_map": "auto",
-    "torch_dtype": "float16"
-  }
-}
-```
-
-### sllm-cli delete
-Delete deployed models by name.
-
-##### Usage
-```bash
-sllm-cli delete [MODELS]
-```
-
-##### Arguments
-- `MODELS`
-  - Space-separated list of model names to delete.
-
-##### Example
-```bash
-sllm-cli delete facebook/opt-1.3b facebook/opt-2.7b meta/llama2
-```
-
-### sllm-cli generate
-Generate outputs using the deployed model.
-
-##### Usage
-```bash
-sllm-cli generate [OPTIONS] <input_path>
-```
-
-##### Options
-- `-t`, `--threads <num_threads>`
-  - Number of parallel generation processes. Default is 1.
-
-##### Arguments
-- `input_path`
-  - Path to the JSON input file.
-
-##### Example
-```bash
-sllm-cli generate --threads 4 /path/to/request.json
-```
-
-##### Example Request File (`request.json`)
-```json
-{
-  "model": "facebook/opt-1.3b",
-  "messages": [
-    {
-      "role": "user",
-      "content": "Please introduce yourself."
-    }
-  ],
-  "temperature": 0.3,
-  "max_tokens": 50
-}
-```
-
-### sllm-cli replay
-Replay requests based on workload and dataset.
-
-##### Usage
-```bash
-sllm-cli replay [OPTIONS]
-```
-
-##### Options
-- `--workload <workload_path>`
-  - Path to the JSON workload file.
-
-- `--dataset <dataset_path>`
-  - Path to the JSON dataset file.
-
-- `--output <output_path>`
-  - Path to the output JSON file for latency results. Default is `latency_results.json`.
-
-##### Example
-```bash
-sllm-cli replay --workload /path/to/workload.json --dataset /path/to/dataset.json --output /path/to/output.json
-```
-
-#### sllm-cli update
-Update a deployed model using a configuration file or model name.
-
-##### Usage
-```bash
-sllm-cli update [OPTIONS]
-```
-
-##### Options
-- `--model <model_name>`
-  - Model name to update with default configuration.
-
-- `--config <config_path>`
-  - Path to the JSON configuration file.
-
-##### Example
-```bash
-sllm-cli update --model facebook/opt-1.3b
-sllm-cli update --config /path/to/config.json
-```
diff --git a/docs/stable/getting_started/docker_quickstart.md b/docs/stable/getting_started/docker_quickstart.md
index 99e9821..0de0ae9 100644
--- a/docs/stable/getting_started/docker_quickstart.md
+++ b/docs/stable/getting_started/docker_quickstart.md
@@ -6,9 +6,9 @@ sidebar_position: 2
 
 This guide will help you get started with the basics of using ServerlessLLM with Docker. Please make sure you have Docker installed on your system and have installed ServerlessLLM CLI following the [installation guide](./installation.md).
 
-## Pre-requirements
+## Pre-requisites
 
-Ensure you have the following pre-requirements installed:
+Ensure you have the following pre-requisites:
 
 1. **GPUs**: Ensure you have at least 2 GPUs available. If more GPUs are provided, you can adjust the number of workers and the number of devices assigned to each worker.
 2. **NVIDIA Docker Toolkit**: This allows Docker to use NVIDIA GPUs. You can find the installation guide [here](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).
@@ -109,8 +109,8 @@ sllm-cli deploy --model facebook/opt-1.3b
 Expected output:
 
 ```plaintext
-INFO xx-xx xx:xx:xx deploy.py:36] Deploying model facebook/opt-1.3b with default configuration.
-INFO xx-xx xx:xx:xx deploy.py:49] Model registered successfully.
+INFO 08-01 07:38:12 deploy.py:36] Deploying model facebook/opt-1.3b with default configuration.
+INFO 08-01 07:39:00 deploy.py:49] Model registered successfully.
 ```
 
 ### Step 6: Query the Model
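For the query step that the final context line introduces (Step 6), ServerlessLLM exposes an OpenAI-compatible chat completions API; the request body mirrors the `request.json` format shown earlier. A minimal sketch, assuming the head node listens on `127.0.0.1:8343` (the address and port are assumptions and may differ in your deployment):

```bash
# Query the deployed model through the OpenAI-compatible endpoint.
# Address and port are assumptions; adjust them to your deployment.
curl http://127.0.0.1:8343/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "facebook/opt-1.3b",
        "messages": [
          {"role": "user", "content": "Please introduce yourself."}
        ]
      }'
```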