Commit

Document Sync by Tina
Chivier committed Aug 9, 2024
1 parent 3d0b8d4 commit d05e017
Showing 8 changed files with 20 additions and 32 deletions.
Binary file modified docs/images/serverlessllm.jpg
6 changes: 1 addition & 5 deletions docs/stable/cli/_category_.json
@@ -1,8 +1,4 @@
 {
   "label": "ServerlessLLM CLI",
-  "position": 4,
-  "link": {
-    "type": "generated-index",
-    "description": "TODO"
-  }
+  "position": 4
 }
6 changes: 1 addition & 5 deletions docs/stable/getting_started/_category_.json
@@ -1,8 +1,4 @@
 {
   "label": "Getting Started",
-  "position": 2,
-  "link": {
-    "type": "generated-index",
-    "description": "TODO"
-  }
+  "position": 2
 }
4 changes: 2 additions & 2 deletions docs/stable/getting_started/docker_quickstart.md
@@ -13,9 +13,9 @@ Ensure you have the following pre-requirements installed:
 1. **GPUs**: Ensure you have at least 2 GPUs available. If more GPUs are provided, you can adjust the number of workers and the number of devices assigned to each worker.
 2. **NVIDIA Docker Toolkit**: This allows Docker to use NVIDIA GPUs. You can find the installation guide [here](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).
 
-## Local Test Using Docker
+## Run ServerlessLLM using Docker
 
-First, let's start a local Docker-based ray cluster to test the ServerlessLLM.
+First, let's start a local Docker-based ray cluster to run ServerlessLLM.
 
 ### Step 1: Build Docker Images
 
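Before building the images described in docker_quickstart.md, it helps to confirm that Docker can actually reach the GPUs required by the prerequisites above. A minimal sketch, not part of this commit, assuming the NVIDIA Container Toolkit is installed and a public CUDA base image is available (the exact image tag is an assumption):

```bash
# Sanity check (assumption: nvidia/cuda:12.2.0-base-ubuntu22.04 is pullable; any CUDA base image works).
# The toolkit mounts the host driver, so nvidia-smi inside the container lists the host GPUs.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```

If fewer than 2 GPUs are listed, adjust the number of workers and devices per worker as noted in the GPUs prerequisite.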
4 changes: 2 additions & 2 deletions docs/stable/getting_started/quickstart.md
@@ -6,8 +6,8 @@ sidebar_position: 1
 
 This guide will help you get started with the basics of using ServerlessLLM. Please make sure you have installed the ServerlessLLM following the [installation guide](./installation.md).
 
-## Local test
-First, let's start a local ray cluster to test the ServerlessLLM. You can start a local ray cluster by running the following command:
+## Run ServerlessLLM on your local machine
+First, let's start a local ray cluster to run ServerlessLLM. You can start a local ray cluster by running the following command:
 
 Start a local ray cluster with 1 head node and 1 worker node:
 ```bash
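The bash block at the end of this hunk is collapsed in the diff view, so the actual commands from quickstart.md are not shown here. As a rough sketch only, a one-head, one-worker local Ray cluster can be started with the standard Ray CLI; the port, CPU counts, and GPU counts below are assumptions rather than the documented values:

```bash
# Sketch of a local Ray cluster (assumed values; see quickstart.md for the documented commands).
# Head node: no GPUs attached in this sketch.
ray start --head --port=6379 --num-cpus=4 --num-gpus=0

# Worker node on the same machine, joining the head node started above.
ray start --address=127.0.0.1:6379 --num-cpus=4 --num-gpus=2
```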
20 changes: 12 additions & 8 deletions docs/stable/intro.md
@@ -4,24 +4,28 @@
 
 # Serverless LLM
 
-<!-- logo (../images/serverlessllm.jpg) -->
-![ServerlessLLM](../images/serverlessllm.jpg)
+<!-- Scaled logo -->
+<img src="../images/serverlessllm.jpg" alt="ServerlessLLM" width="256px">
 
-ServerlessLLM is a fast, affordable, and easy-to-use library designed for multi-LLM serving, also known as [Serverless Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html), [Inference Endpoint](https://huggingface.co/inference-endpoints/dedicated), or [Model Endpoints](https://learn.microsoft.com/en-us/azure/machine-learning/concept-endpoints?view=azureml-api-2). This library is ideal for environments with limited GPU resources, as it allows efficient dynamic loading of models onto GPUs. By supporting high levels of GPU multiplexing, it maximizes GPU utilization without the need to dedicate GPUs to individual models.
+ServerlessLLM is a **fast** and **easy-to-use** serving system designed for **affordable** multi-LLM serving, also known as LLM-as-a-Service. ServerlessLLM is ideal for environments with multiple LLMs that need to be served on limited GPU resources, as it enables efficient dynamic loading of LLMs onto GPUs. By elastically scaling model instances and multiplexing GPUs, ServerlessLLM can significantly reduce costs compared to traditional GPU-dedicated serving systems while still providing low-latency (Time-to-First-Token, TTFT) LLM completions.
 
 ## Documentation
 
 ### Getting Started
 
 - [Install ServerlessLLM](./getting_started/installation.md)
-- [Deploy a ServerlessLLM cluster on your local machine](./getting_started/quickstart.md)
-- [Deploy ServerlessLLM using Docker (Recommended)](./getting_started/docker_quickstart.md)
-- [Deploy ServerlessLLM on a multi-machine cluster](./getting_started/multi_machine_setup.md)
+- [Quickstart](./getting_started/quickstart.md)
+- [Quickstart with Docker](./getting_started/docker_quickstart.md)
+- [Multi-machine Quickstart](./getting_started/multi_machine_setup.md)
 
+### ServerlessLLM Serve
+
+- [Storage-Aware Scheduler](./serve/storage_aware_scheduling.md)
+
 ### ServerlessLLM Store
 
-- [Use ServerlessLLM Store in your own code](./store/quickstart.md)
+- [Quickstart](./store/quickstart.md)
 
 ### ServerlessLLM CLI
 
-- [ServerlessLLM CLI Documentation](./cli/cli_api.md)
+- [ServerlessLLM CLI API](./cli/cli_api.md)
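The rewritten intro paragraph frames ServerlessLLM as multi-LLM serving (LLM-as-a-Service): each request names the model it wants, and the system loads that model onto a GPU on demand. As an illustration of that usage pattern only, assuming an OpenAI-style chat completions endpoint and a deployed model named facebook/opt-1.3b (host, port, and model name are assumptions, not taken from this commit):

```bash
# Hypothetical request against a running ServerlessLLM endpoint; adjust host, port, and model to your deployment.
curl -X POST http://127.0.0.1:8343/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "facebook/opt-1.3b",
        "messages": [{"role": "user", "content": "Hello! What can you do?"}]
      }'
```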
6 changes: 1 addition & 5 deletions docs/stable/serve/_category_.json
@@ -1,8 +1,4 @@
 {
   "label": "ServerlessLLM Serve",
-  "position": 3,
-  "link": {
-    "type": "generated-index",
-    "description": "TODO"
-  }
+  "position": 3
 }
6 changes: 1 addition & 5 deletions docs/stable/store/_category_.json
@@ -1,8 +1,4 @@
 {
   "label": "ServerlessLLM Store",
-  "position": 5,
-  "link": {
-    "type": "generated-index",
-    "description": "`sllm-store` is an internal library of ServerlessLLM that provides high-performance model loading from local storage into GPU memory. You can also install and use this library in your own projects."
-  }
+  "position": 5
 }
