
Commit

updated CO pages
Alfrick committed Jan 10, 2025
1 parent b0e0607 commit 0a46995
Showing 10 changed files with 134 additions and 63 deletions.
18 changes: 14 additions & 4 deletions docs/portal-guide/compute-orchestration/README.mdx
@@ -45,9 +45,19 @@ If you’re not using Compute Orchestration, the Shared SaaS (Serverless) deploy

## Compute Clusters and Nodepools

We use clusters and nodepools to organize and manage the compute resources required for the Compute Orchestration capabilities.
We use [clusters and nodepools](https://docs.clarifai.com/portal-guide/compute-orchestration/set-up-compute) to organize and manage the compute resources required for the Compute Orchestration capabilities.

A compute cluster in Clarifai acts as the overarching computational environment where models are executed, whether for training or inference. A nodepool refers to a set of dedicated nodes (virtual machine instances) within a cluster that share similar configurations and resources, such as CPU or GPU type, memory size, and other performance parameters.
:::tip Cluster

A compute cluster in Clarifai acts as the overarching computational environment where models are executed, whether for training or inference.

:::

:::tip Nodepool

A nodepool refers to a set of dedicated nodes (virtual machine instances) within a cluster that share similar configurations and resources, such as CPU or GPU type, memory size, and other performance parameters.

:::

Cluster configuration lets you specify where and how your models are run, ensuring better performance, lower latency, and adherence to regional regulations. You can specify a cloud provider, such as AWS, that will provide the underlying compute infrastructure for hosting and running your models. You can also specify the geographic location of the data center where the compute resources will be hosted.
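
For teams that prefer to script this setup, the same cluster choices can be expressed through the Clarifai Python SDK. The sketch below is illustrative only: it assumes the SDK's compute-orchestration helpers (a `User` client with a `create_compute_cluster` method that reads a YAML config naming the cloud provider and region); consult the current SDK reference for the exact method and config field names.

```python
# Illustrative sketch, not a verbatim recipe: assumes the compute-orchestration
# helpers in the Clarifai Python SDK (`pip install clarifai`); method and
# parameter names may differ in your SDK version.
from clarifai.client.user import User

client = User(user_id="YOUR_USER_ID", pat="YOUR_PAT")

# compute_cluster_config.yaml is assumed to declare the cloud provider
# (e.g. AWS) and the region (e.g. us-east-1) where the cluster will run.
cluster = client.create_compute_cluster(
    compute_cluster_id="my-gpu-cluster",
    config_filepath="compute_cluster_config.yaml",
)
```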

@@ -65,7 +75,7 @@ With compute orchestration, you can ensure these nodepools are properly scaled u

- It provides access to a wide range of accelerator options suited to your use case. You can configure multiple compute clusters, each tailored to your AI development stage, performance requirements, and budget. You can also run affordable proofs of concept or compute-heavy LLMs and LVMs in production, all from a single product.

- It offers flexibility to make deployments in any cloud service provider, on-premises, or air-gapped environment, allowing users to leverage their hardware of choice without being locked into a single vendor. Or, you can make deployments in Clarifai’s compute to avoid having to worry about managing infrastructure.
- It offers flexibility to [make deployments](https://docs.clarifai.com/portal-guide/compute-orchestration/deploy-model) in any cloud service provider, on-premises, or air-gapped environment, allowing users to leverage their hardware of choice without being locked into a single vendor. Or, you can make deployments in Clarifai’s compute to avoid having to worry about managing infrastructure.

- You can customize auto-scaling settings to prevent cold-start issues, handle traffic swings, and scale down to zero for cost efficiency. The ability to scale from zero to infinity ensures both flexibility and cost management.

@@ -101,7 +111,7 @@ With compute orchestration, you can ensure these nodepools are properly scaled u

- Low-latency deployment minimizes gRPC hops, speeding up communication.

- New model types are easily supported with a unified protobuf format, and local inference runners allow users to test models before deploying to the cloud.
- New model types are easily supported with a unified protobuf format, and local inference runners allow users to [test models](https://docs.clarifai.com/sdk/advance-model-operations/model-upload#step-4-test-the-model-locally) before deploying to the cloud.


import DocCardList from '@theme/DocCardList';
4 changes: 2 additions & 2 deletions docs/portal-guide/compute-orchestration/deploy-model.md
@@ -44,9 +44,9 @@ To deploy a model, navigate to your cluster or nodepool page and click the **Dep

![ ](/img/compute-orchestration/compute-11.png)

Alternatively, navigate to your model's page, go to the **Deployments** tab, and click the **Deploy a Model** button.
> Alternatively, navigate to your model's page, go to the **Deployments** tab, and click the **Deploy a Model** button.
![ ](/img/compute-orchestration/compute-12.png)
> ![ ](/img/compute-orchestration/compute-12.png)
You’ll be redirected to a page where you can customize the compute options for deploying your model.
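
If you are scripting deployments instead of using the portal, a rough equivalent with the Clarifai Python SDK might look like the sketch below. It assumes a `Nodepool` client with a `create_deployment` method driven by a YAML config that names the model and its compute options; treat these names as assumptions and verify them against the current SDK reference.

```python
# Illustrative sketch: assumes the Clarifai Python SDK's Nodepool client and
# create_deployment helper; constructor and parameter names are assumptions.
from clarifai.client.nodepool import Nodepool

nodepool = Nodepool(
    user_id="YOUR_USER_ID",
    nodepool_id="my-gpu-nodepool",
    pat="YOUR_PAT",
)

# deployment_config.yaml is assumed to reference the model to deploy and any
# compute options (instance type, autoscaling range, and so on).
deployment = nodepool.create_deployment(
    deployment_id="my-deployment",
    config_filepath="deployment_config.yaml",
)
```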

104 changes: 104 additions & 0 deletions docs/portal-guide/compute-orchestration/instance-types.md
@@ -0,0 +1,104 @@
---
description: Learn about the instance types we support
sidebar_position: 4
pagination_next: null
---

# Instance Types

**Learn about the instance types we support**
<hr />

We offer a range of Amazon Web Services (AWS) instance types designed to handle a variety of machine learning workloads. These instances vary in their CPU, RAM (Random Access Memory), and GPU configurations, allowing you to strike the right balance of performance and cost for your use case.

:::info

You can [contact us](https://www.clarifai.com/explore/contact-us) to unlock access to our most powerful instances, including NVIDIA A100 and H100 GPUs.

:::

## t3a Instances

The `t3a` series is designed for cost-effective, general-purpose workloads that do not require GPU acceleration. It offers a balanced combination of CPU and memory, making it ideal for lightweight applications.

|Instance Type |GPUs | Total GPU RAM | CPU | RAM |
|--------------------|----------------|----------------|---------------|---------------|
| `t3a.medium` | - | - | 2x CPU | 4GiB |
| `t3a.large` | - | - | 2x CPU | 8GiB |
| `t3a.xlarge` | - | - | 4x CPU | 16GiB |
| `t3a.2xlarge` | - | - | 8x CPU | 32GiB |

**Key Features**

- vCPUs (virtual CPUs) — Burstable performance for intermittent, compute-heavy tasks. Ideal for CPU-intensive operations like running traditional models or pre-processing pipelines. For example, `t3a.medium` offers two vCPUs, while `t3a.2xlarge` offers eight vCPUs.

- RAM — Determines the capacity for handling data in memory. It ranges from 4 GiB to 32 GiB, allowing you to handle lightweight, data-intensive workloads without requiring GPU acceleration.

**Example Use Case**

- Running simple models for classification tasks.

## g4dn Instances

The `g4dn` series is designed for moderate GPU-accelerated workloads, making it suitable for small-to-medium-scale machine learning tasks.

|Instance Type |GPUs | Total GPU RAM | CPU | RAM |
|--------------------|----------------|----------------|---------------|---------------|
| `g4dn.xlarge` | 1x T4 | 16GiB | 4x CPU | 16GiB |

**Key Features**

- NVIDIA T4 GPUs — Optimized for inference and light model training, offering a balance of performance and cost.

- vCPUs and RAM — Includes four vCPUs and 16 GiB of RAM for data processing and workload orchestration.

**Example Use Cases**

- Inference workloads, such as running NLP models like BERT-base for text summarization and question answering.

- Fine-tuning pre-trained models for specific tasks like object detection or sentiment analysis.

## g5 Instances

The `g5` series delivers enhanced GPU capabilities and is designed for tasks requiring higher memory and computational power, such as large-scale deep learning model training.

|Instance Type |GPUs | Total GPU RAM | CPU | RAM |
|--------------------|----------------|----------------|---------------|---------------|
| `g5.xlarge` | 1x A10G | 24GiB | 4x CPU | 16GiB |
| `g5.2xlarge` | 1x A10G | 24GiB | 8x CPU | 32GiB |

**Key Features**

- NVIDIA A10G GPUs — High memory bandwidth and compute power for complex deep learning models and advanced workloads.

- vCPUs and RAM — Increased CPU and memory for tasks involving heavy data processing alongside GPU computation.

**Example Use Cases**

- Training mid-sized NLP models like GPT-2 or T5 for text generation, or training image segmentation models like UNet or Mask R-CNN for medical imaging.

- Running object tracking or pose estimation workflows in real-time video analysis.

## g6 Instances

The `g6` series offers next-generation GPU technologies and is designed for the most demanding machine learning workloads, including large-scale model training and high-performance simulations. Each instance type in the `g6` series is tailored to specific workloads.

|Instance Type |GPUs | Total GPU RAM | CPU | RAM |
|--------------------|----------------|----------------|---------------|---------------|
| `g6.xlarge` | 1x L4 | 24GiB | 4x CPU | 16GiB |
| `g6.2xlarge` | 1x L4 | 24GiB | 8x CPU | 32GiB |
| `g6e.xlarge` | 1x L40S | 48GiB | 4x CPU | 32GiB |
| `g6e.12xlarge` | 4x L40S | 192GiB | 48x CPU | 384GiB |

**Key Features**

- Next-Gen GPUs — NVIDIA L4 and L40S GPUs deliver exceptional performance for training and inference tasks, with GPU memory scaling from 24 GiB to 192 GiB.

- High vCPU & RAM Configurations — Ideal for handling massive datasets and parallel processing for complex workflows.

**Example Use Cases**

- The `g6.xlarge` and `g6.2xlarge` instances support mid-tier workloads, such as fine-tuning the BERT-large model or running computer vision tasks like text-to-image generation.

- The `g6e.xlarge` and `g6e.12xlarge` instances support high-end workloads, such as training large-scale language models like GPT-4 or T5-XL for multi-modal tasks.
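
To make the cost and performance trade-off concrete, here is a small, self-contained Python sketch (not a Clarifai API) that picks the smallest instance type for a given GPU-memory and system-memory requirement. The catalog holds a subset of the tables on this page; the sizing heuristic itself is an assumption for illustration.

```python
# Pick the smallest instance type from this page's tables that satisfies a
# rough memory requirement. This is a local helper for illustration, not part
# of the Clarifai SDK.
from dataclasses import dataclass
from typing import Optional

@dataclass
class InstanceType:
    name: str
    gpus: int
    gpu_ram_gib: int  # total GPU RAM; 0 means CPU-only
    vcpus: int
    ram_gib: int

CATALOG = [
    InstanceType("t3a.medium", 0, 0, 2, 4),
    InstanceType("t3a.2xlarge", 0, 0, 8, 32),
    InstanceType("g4dn.xlarge", 1, 16, 4, 16),
    InstanceType("g5.xlarge", 1, 24, 4, 16),
    InstanceType("g5.2xlarge", 1, 24, 8, 32),
    InstanceType("g6.xlarge", 1, 24, 4, 16),
    InstanceType("g6e.xlarge", 1, 48, 4, 32),
    InstanceType("g6e.12xlarge", 4, 192, 48, 384),
]

def smallest_fit(min_gpu_ram_gib: int, min_ram_gib: int) -> Optional[InstanceType]:
    """Return the smallest catalog entry meeting both memory requirements."""
    candidates = [
        it for it in CATALOG
        if it.gpu_ram_gib >= min_gpu_ram_gib and it.ram_gib >= min_ram_gib
    ]
    # Smallest GPU RAM first, then smallest system RAM, as a rough cost proxy.
    return min(candidates, key=lambda it: (it.gpu_ram_gib, it.ram_gib), default=None)

# Example: a 7B-parameter model in 16-bit precision needs roughly 14 GiB of
# GPU RAM for the weights alone, so the smallest fit here is g4dn.xlarge.
print(smallest_fit(min_gpu_ram_gib=14, min_ram_gib=16))
```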

1 change: 0 additions & 1 deletion docs/portal-guide/compute-orchestration/manage.md
@@ -1,7 +1,6 @@
---
description: Edit and delete your clusters and nodepools
sidebar_position: 3
pagination_next: null
---

# Managing Your Compute
70 changes: 14 additions & 56 deletions docs/portal-guide/compute-orchestration/set-up-compute.md
@@ -20,7 +20,7 @@ After creating a custom cluster, you can configure nodepools within it to optimi

For example, you may create a nodepool for GPU-intensive tasks and another for lighter workloads running on CPUs.

With clusters and nodepools, you can organize and manage the compute resources necessary for running your models and workflows.
With clusters and nodepools, you can organize and manage (_orchestrate_) the compute resources necessary for running your models and workflows.

## Default Cluster and Nodepool

@@ -40,13 +40,16 @@ For the Compute Orchestration Public Preview, the default shared cluster is avai

You can create a custom cluster with custom nodepools that match your computational needs.

To get started, click your profile icon in the upper-right corner and select the **Compute** option in the list that drops down.
To get started, log in to the Clarifai platform and select the **Compute Orchestration** option in the top navigation bar.

![ ](/img/compute-orchestration/compute-1.png)

Alternatively, you can click the **Compute Settings** button found in the **Deployments** tab on a model's viewer page or anywhere this button appears in the platform.
<br/>
<br/>

![ ](/img/compute-orchestration/compute-1-1.png)
> Alternatively, you can click the **Compute Settings** button found in the **Deployments** tab on a model's viewer page or anywhere this button appears in the platform.
> ![ ](/img/compute-orchestration/compute-1-1.png)
You’ll be redirected to the Compute Orchestration pane. Then, click the **Create a new cluster** button.

@@ -75,25 +78,23 @@ After configuring the settings, click the **Continue** button in the upper-right

After clicking the **Continue** button upon creating a cluster, you’ll be redirected to a page where you can specify the configurations for your new nodepool.

Alternatively, you can create a new nodepool from an existing cluster by clicking the **Create a new nodepool** button in the upper-right corner of the cluster's page.
> Alternatively, you can create a new nodepool from an existing cluster by clicking the **Create a new nodepool** button in the upper-right corner of the cluster's page.
![ ](/img/compute-orchestration/compute-7.png)
> ![ ](/img/compute-orchestration/compute-7.png)
These are the configuration options you can set for your new nodepool:

![ ](/img/compute-orchestration/compute-8.png)

- **Nodepool ID** — Provide an ID that helps identify the nodepool to use when deploying your models. We recommend an easy-to-remember ID that’s related to the nodepool’s use case.

- **Nodepool Description** — Optionally, provide a short description that summarizes the details related to the nodepool.
- **Instance Configuration** — Provide an ID that helps identify the nodepool to use when deploying your models. We recommend an easy-to-remember ID that’s related to the nodepool’s use case. Optionally, provide a short description that summarizes the details related to the nodepool.

- **Node autoscaling range** — Specify the minimum and maximum number of nodes that the system can automatically scale within a nodepool, based on the workload demand. This means that the system will spin up more nodes to handle increased traffic or scale down when demand decreases to optimize costs. For instance, you can set your nodepool to scale between 1 and 5 nodes, depending on how many requests your model is processing. A minimum value of 1 (rather than 0) prevents cold start delays after inactivity, which is essential for meeting latency requirements, though it ensures that at least one node will always be running, which incurs compute costs. Alternatively, setting the minimum to 0 eliminates costs during idle periods but may introduce cold start delays when traffic resumes.
- **Node Autoscaling Range** — Specify the minimum and maximum number of nodes that the system can automatically scale within a nodepool, based on the workload demand. This means that the system will spin up more nodes to handle increased traffic or scale down when demand decreases to optimize costs. For instance, you can set your nodepool to scale between 1 and 5 nodes, depending on how many requests your model is processing. A minimum value of 1 (rather than 0) prevents cold start delays after inactivity, which is essential for meeting latency requirements, though it ensures that at least one node will always be running, which incurs compute costs. Alternatively, setting the minimum to 0 eliminates costs during idle periods but may introduce cold start delays when traffic resumes.

- **Instance Type** — Select the instance type you would like the deployment to run on. You can find an explanation of the available instance types [below](#instance-types).
- **Instance Type** — Select the instance type you would like the deployment to run on. You can find an explanation of the available instance types [here](https://docs.clarifai.com/portal-guide/compute-orchestration/instance-types).

- **Spot instances** (default is off) — Enable this option if you want to rent spare, unused compute capacity at significantly lower prices compared to regular on-demand instances. If no spot instances are available, Clarifai will automatically fall back to on-demand instances. Note that spot instances can be terminated if capacity is needed elsewhere, making your node temporarily unavailable. For greater reliability, leave this option unchecked to use only on-demand instances.
- **Spot Instances** (_default is off_) — Enable this option if you want to rent spare, unused compute capacity at significantly lower prices compared to regular on-demand instances. If no spot instances are available, Clarifai will automatically fall back to on-demand instances. Note that spot instances can be terminated if capacity is needed elsewhere, making your node temporarily unavailable. For greater reliability, leave this option unchecked to use only on-demand instances.

After configuring the settings, click the **Create Nodepool** button in the upper-right corner. You'll then be redirected to your cluster's page, where the newly created nodepool will be listed in a table.
After configuring the settings, click the **Create** button in the upper-right corner. You'll then be redirected to your cluster's page, where the newly created nodepool will be listed in a table.

![ ](/img/compute-orchestration/compute-9.png)
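
For reference, the nodepool options described above (instance type, node autoscaling range, and spot versus on-demand capacity) can also be supplied programmatically. The sketch below assumes the Clarifai Python SDK's `ComputeCluster.create_nodepool` helper and a YAML config whose fields mirror those options; the exact schema may differ, so check the current SDK reference.

```python
# Illustrative sketch: assumes the Clarifai Python SDK compute-orchestration
# helpers; the method names and the YAML schema referenced below are assumptions.
from clarifai.client.compute_cluster import ComputeCluster

cluster = ComputeCluster(
    user_id="YOUR_USER_ID",
    compute_cluster_id="my-gpu-cluster",
    pat="YOUR_PAT",
)

# nodepool_config.yaml is assumed to capture the options above, for example
# an instance type (e.g. g5.xlarge), a node autoscaling range (min/max
# instances), and whether spot capacity may be used.
nodepool = cluster.create_nodepool(
    nodepool_id="my-gpu-nodepool",
    config_filepath="nodepool_config.yaml",
)
```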

@@ -106,46 +107,3 @@ If you click on a nodepool listed in the table, you'll be taken to its individua
After creating a nodepool, you can use it to deploy a model.

[Click here](https://docs.clarifai.com/portal-guide/compute-orchestration/deploy-model) to learn more about deploying a model using a nodepool.

### Instance Types

We offer a range of Amazon Web Services (AWS) instance types, each tailored to specific workloads. These instances vary in their CPU, RAM (Random Access Memory), and GPU configurations, which allow you to orchestrate the right balance of performance and cost for your use case.

:::info

You can contact us to unlock access to our most powerful instances, including NVIDIA A100 and H100 GPUs.

:::

#### t3a Instances (t3a.small, t3a.medium, t3a.large, etc.)

The `t3a` series consists of general-purpose Amazon EC2 burstable performance instances. These are ideal for low- to moderate-compute tasks that don’t require GPU acceleration, making them cost-effective for various applications. These instances are powered by AMD EPYC processors.

- **vCPUs (virtual CPUs)** — These instances can burst to higher levels of CPU performance when needed. For example, `t3a.small` has 2 vCPUs, while `t3a.2xlarge` offers 8 vCPUs, providing better parallel processing for CPU-intensive workloads.
- **RAM** — RAM determines the capacity for handling data in memory. For `t3a` instances, RAM ranges from 2 GiB (in `t3a.small`) to 32 GiB (in `t3a.2xlarge`), allowing the instances to support more memory-heavy workloads as needed.

These instances are best suited for web servers, development environments, and applications with variable workloads that occasionally require more CPU power.

#### g4dn Instances (g4dn.xlarge)

The `g4dn` instances are designed for tasks requiring GPU acceleration, such as machine learning inference, graphic rendering, and video transcoding.

- **GPU** — These instances are equipped with NVIDIA T4 GPUs, which offer excellent performance for inferencing tasks on trained machine learning models and graphics processing.
- **vCPUs and RAM** — The `g4dn.xlarge` instance provides 4 vCPUs and 16 GiB of RAM, along with 16 GiB of GPU memory, which is critical for handling high-volume workloads that rely on both CPU and GPU power.

#### g5 Instances (g5.xlarge, g5.2xlarge)

The `g5` instances are next-generation GPU instances designed for high-performance tasks, such as large-scale deep learning model training and data processing. They are equipped with NVIDIA A10G GPUs.

- **vCPUs and RAM** — `g5.xlarge` includes 4 vCPUs and 16 GiB of RAM, while `g5.2xlarge` offers 8 vCPUs and 32 GiB of RAM. Both have 24 GiB of GPU memory, which makes them suitable for memory-intensive workloads and high-fidelity model training.

These instances are optimized for tasks that require a high degree of parallel processing and large amounts of data, such as video analytics, 3D rendering, and scientific computing.

#### g6 Instances (g6.xlarge)

The `g6` series represents the latest in high-performance GPU computing, optimized for demanding applications such as AI model training, large-scale simulations, and data analytics.

- **vCPUs and RAM** — The `g6.xlarge` instance includes 4 vCPUs, 16 GiB of RAM, and a next-gen NVIDIA L4 GPU.
- **GPU Memory** — It offers 24 GiB of GPU memory, which provides ample resources for running advanced deep learning models and other compute-heavy tasks.

These instances are ideal for applications that require cutting-edge GPU performance, such as autonomous driving simulations, deep learning model training, and large-scale data processing.
Binary file modified static/img/compute-orchestration/compute-1.png
Binary file modified static/img/compute-orchestration/compute-10.png
Binary file modified static/img/compute-orchestration/compute-3.png
Binary file modified static/img/compute-orchestration/compute-8.png
Binary file modified static/img/compute-orchestration/compute-9.png
