From fa10679ddb958bf64fec5a1dc7b145b1f8e92272 Mon Sep 17 00:00:00 2001
From: Sam B <107863539+Sam-Bieberich@users.noreply.github.com>
Date: Fri, 18 Oct 2024 19:23:33 -0500
Subject: [PATCH 1/3] Update README.md

fixed container spelling
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index e044e7a..2bef8fc 100644
--- a/README.md
+++ b/README.md
@@ -14,7 +14,7 @@ Note that KubeShare 1.0 is deprecated. Refer to the [KubeShare 1.0 branch](https
 
 ## Prerequisite & Limitation
 
-* A Kubernetes cluster with [garbage collection](https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/), [DNS enabled](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) [nvidia-continaer-runtime](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)installed.
+* A Kubernetes cluster with [garbage collection](https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/), [DNS enabled](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) [nvidia-container-runtime](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)installed.
 * Only support a kubernetes cluster that uses the environment variable `NVIDIA_VISIBLE_DEVICES` to control which GPUs will be made accessible inside the container.
 * You also ensures that the *prometheus* is installed, because we will pull the data from it.
 * It can't compatible with other scheduler to manage gpu resource

From d579c11d4f1c54dcb451fb93bf6e0ca337b2a928 Mon Sep 17 00:00:00 2001
From: Sam B <107863539+Sam-Bieberich@users.noreply.github.com>
Date: Fri, 18 Oct 2024 19:24:00 -0500
Subject: [PATCH 2/3] Update README.md

components spelling
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 2bef8fc..58892e4 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,7 @@ Note that KubeShare 1.0 is deprecated. Refer to the [KubeShare 1.0 branch](https
 -->
 
 ## Deployment
-1. [Deploy Componments](doc/deploy.md)
+1. [Deploy Components](doc/deploy.md)
 
 ## Workloads
 

From 7fc48899eb404ba98fded9c15609f88d1aaed135 Mon Sep 17 00:00:00 2001
From: Sam B <107863539+Sam-Bieberich@users.noreply.github.com>
Date: Fri, 18 Oct 2024 20:08:21 -0500
Subject: [PATCH 3/3] Update deploy.md

---
 doc/deploy.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/deploy.md b/doc/deploy.md
index 437ad0c..b77a2d0 100644
--- a/doc/deploy.md
+++ b/doc/deploy.md
@@ -135,8 +135,8 @@ kubectl apply -f deploy/aggregator.yaml
 kubectl apply -f deploy/collector.yaml
 ```
 
-+ **Make sure the enpoint of kubeshare-aggregator & kubeshare-collector of prometheus is up.**
-+ Query the metric `gpu_capacity` , You will get the GPU specification
++ **Make sure the endpoint of kubeshare-aggregator & kubeshare-collector of prometheus is up.**
++ Query the metric `gpu_capacity`, You will get the GPU specification
 ```
 gpu_capacity{endpoint="collector",index="0",instance="xxx.xxx.xxx.xxx:9004",job="kubeshare-collector",memory="34089730048",model="Tesla V100-PCIE-32GB",namespace="kube-system",node="ubuntu",pod="kubeshare-collector-wrrl6",service="kubeshare-collector",uuid="GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"}