Commit

Document Sync by Tina
Chivier committed Aug 9, 2024
1 parent 3d0b8d4 commit d05e017
Showing 8 changed files with 20 additions and 32 deletions.
Binary file modified docs/images/serverlessllm.jpg
6 changes: 1 addition & 5 deletions docs/stable/cli/_category_.json
@@ -1,8 +1,4 @@
 {
   "label": "ServerlessLLM CLI",
-  "position": 4,
-  "link": {
-    "type": "generated-index",
-    "description": "TODO"
-  }
+  "position": 4
 }
6 changes: 1 addition & 5 deletions docs/stable/getting_started/_category_.json
@@ -1,8 +1,4 @@
 {
   "label": "Getting Started",
-  "position": 2,
-  "link": {
-    "type": "generated-index",
-    "description": "TODO"
-  }
+  "position": 2
 }
4 changes: 2 additions & 2 deletions docs/stable/getting_started/docker_quickstart.md
@@ -13,9 +13,9 @@ Ensure you have the following pre-requirements installed:
 1. **GPUs**: Ensure you have at least 2 GPUs available. If more GPUs are provided, you can adjust the number of workers and the number of devices assigned to each worker.
 2. **NVIDIA Docker Toolkit**: This allows Docker to use NVIDIA GPUs. You can find the installation guide [here](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).
 
-## Local Test Using Docker
+## Run ServerlessLLM using Docker
 
-First, let's start a local Docker-based ray cluster to test the ServerlessLLM.
+First, let's start a local Docker-based ray cluster to run ServerlessLLM.
 
 ### Step 1: Build Docker Images
 
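Before building the images described in docker_quickstart.md, it helps to confirm that Docker can actually reach the GPUs required by the prerequisites above. A minimal sketch, not part of this commit, assuming the NVIDIA Container Toolkit is installed and a public CUDA base image is available (the exact image tag is an assumption):

```bash
# Sanity check (assumption: nvidia/cuda:12.2.0-base-ubuntu22.04 is pullable; any CUDA base image works).
# The toolkit mounts the host driver, so nvidia-smi inside the container lists the host GPUs.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```

If fewer than 2 GPUs are listed, adjust the number of workers and devices per worker as noted in the GPUs prerequisite.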
4 changes: 2 additions & 2 deletions docs/stable/getting_started/quickstart.md
@@ -6,8 +6,8 @@ sidebar_position: 1
 
 This guide will help you get started with the basics of using ServerlessLLM. Please make sure you have installed the ServerlessLLM following the [installation guide](./installation.md).
 
-## Local test
-First, let's start a local ray cluster to test the ServerlessLLM. You can start a local ray cluster by running the following command:
+## Run ServerlessLLM on your local machine
+First, let's start a local ray cluster to run ServerlessLLM. You can start a local ray cluster by running the following command:
 
 Start a local ray cluster with 1 head node and 1 worker node:
 ```bash
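The bash block at the end of this hunk is collapsed in the diff view, so the actual commands from quickstart.md are not shown here. As a rough sketch only, a one-head, one-worker local Ray cluster can be started with the standard Ray CLI; the port, CPU counts, and GPU counts below are assumptions rather than the documented values:

```bash
# Sketch of a local Ray cluster (assumed values; see quickstart.md for the documented commands).
# Head node: no GPUs attached in this sketch.
ray start --head --port=6379 --num-cpus=4 --num-gpus=0

# Worker node on the same machine, joining the head node started above.
ray start --address=127.0.0.1:6379 --num-cpus=4 --num-gpus=2
```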
20 changes: 12 additions & 8 deletions docs/stable/intro.md
@@ -4,24 +4,28 @@
 
 # Serverless LLM
 
-<!-- logo (../images/serverlessllm.jpg) -->
-![ServerlessLLM](../images/serverlessllm.jpg)
+<!-- Scaled logo -->
+<img src="../images/serverlessllm.jpg" alt="ServerlessLLM" width="256px">
 
-ServerlessLLM is a fast, affordable, and easy-to-use library designed for multi-LLM serving, also known as [Serverless Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html), [Inference Endpoint](https://huggingface.co/inference-endpoints/dedicated), or [Model Endpoints](https://learn.microsoft.com/en-us/azure/machine-learning/concept-endpoints?view=azureml-api-2). This library is ideal for environments with limited GPU resources, as it allows efficient dynamic loading of models onto GPUs. By supporting high levels of GPU multiplexing, it maximizes GPU utilization without the need to dedicate GPUs to individual models.
+ServerlessLLM is a **fast** and **easy-to-use** serving system designed for **affordable** multi-LLM serving, also known as LLM-as-a-Service. ServerlessLLM is ideal for environments with multiple LLMs that need to be served on limited GPU resources, as it enables efficient dynamic loading of LLMs onto GPUs. By elastically scaling model instances and multiplexing GPUs, ServerlessLLM can significantly reduce costs compared to traditional GPU-dedicated serving systems while still providing low-latency (Time-to-First-Token, TTFT) LLM completions.
 
 ## Documentation
 
 ### Getting Started
 
 - [Install ServerlessLLM](./getting_started/installation.md)
-- [Deploy a ServerlessLLM cluster on your local machine](./getting_started/quickstart.md)
-- [Deploy ServerlessLLM using Docker (Recommended)](./getting_started/docker_quickstart.md)
-- [Deploy ServerlessLLM on a multi-machine cluster](./getting_started/multi_machine_setup.md)
+- [Quickstart](./getting_started/quickstart.md)
+- [Quickstart with Docker](./getting_started/docker_quickstart.md)
+- [Multi-machine Quickstart](./getting_started/multi_machine_setup.md)
 
+### ServerlessLLM Serve
+
+- [Storage-Aware Scheduler](./serve/storage_aware_scheduling.md)
+
 ### ServerlessLLM Store
 
-- [Use ServerlessLLM Store in your own code](./store/quickstart.md)
+- [Quickstart](./store/quickstart.md)
 
 ### ServerlessLLM CLI
 
-- [ServerlessLLM CLI Documentation](./cli/cli_api.md)
+- [ServerlessLLM CLI API](./cli/cli_api.md)
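The rewritten intro paragraph frames ServerlessLLM as multi-LLM serving (LLM-as-a-Service): each request names the model it wants, and the system loads that model onto a GPU on demand. As an illustration of that usage pattern only, assuming an OpenAI-style chat completions endpoint and a deployed model named facebook/opt-1.3b (host, port, and model name are assumptions, not taken from this commit):

```bash
# Hypothetical request against a running ServerlessLLM endpoint; adjust host, port, and model to your deployment.
curl -X POST http://127.0.0.1:8343/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "facebook/opt-1.3b",
        "messages": [{"role": "user", "content": "Hello! What can you do?"}]
      }'
```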
6 changes: 1 addition & 5 deletions docs/stable/serve/_category_.json
@@ -1,8 +1,4 @@
 {
   "label": "ServerlessLLM Serve",
-  "position": 3,
-  "link": {
-    "type": "generated-index",
-    "description": "TODO"
-  }
+  "position": 3
 }
6 changes: 1 addition & 5 deletions docs/stable/store/_category_.json
@@ -1,8 +1,4 @@
 {
   "label": "ServerlessLLM Store",
-  "position": 5,
-  "link": {
-    "type": "generated-index",
-    "description": "`sllm-store` is an internal library of ServerlessLLM that provides high-performance model loading from local storage into GPU memory. You can also install and use this library in your own projects."
-  }
+  "position": 5
 }
