Help mounting nvoptix.bin in containers #4269
-
Hi,

After trying to upgrade our Bottlerocket AMIs, we realized OptiX was broken. The reason behind that is that nvoptix.bin is not mounted into the containers. I suppose that, in recent OptiX versions, … I updated … However, when I start a container with the right environment variables (e.g. …), the file is still not mounted. The configurations for Bottlerocket and AL2023 are as follows:

# Bottlerocket
accept-nvidia-visible-devices-as-volume-mounts = true
accept-nvidia-visible-devices-envvar-when-unprivileged = false
[nvidia-container-cli]
root = "/"
path = "/usr/bin/nvidia-container-cli"
environment = []
ldconfig = "@/sbin/ldconfig" # AL2023
#accept-nvidia-visible-devices-as-volume-mounts = false
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
#swarm-resource = "DOCKER_RESOURCE_GPU"
[nvidia-container-cli]
#debug = "/var/log/nvidia-container-toolkit.log"
environment = []
#ldcache = "/etc/ld.so.cache"
ldconfig = "@/sbin/ldconfig"
load-kmods = true
#no-cgroups = false
#path = "/usr/bin/nvidia-container-cli"
#root = "/run/nvidia/driver"
#user = "root:video"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc", "crun"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
[nvidia-container-runtime-hook]
path = "nvidia-container-runtime-hook"
skip-mode-detection = false
[nvidia-ctk]
path = "nvidia-ctk" However I do not feel like this would be the issue. If I'm right, the There is a specific configuration property that can be set to enable debugging logs for the nvidia container runtime: debug = "/var/log/nvidia-container-runtime.log" Enabling that setting actually logs the files that are being mounted on AL2023. On Bottlerocket, it does nothing. I'm wondering maybe the binary is not allowed to write that path but I might be wrong. Basically my questions would be:
Thanks in advance for any help
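For reference, the debug setting mentioned above lives in the [nvidia-container-runtime] section of the toolkit's config.toml (on AL2023 typically /etc/nvidia-container-runtime/config.toml). A minimal sketch of what enabling it looks like, based on the stock configuration shown above:

[nvidia-container-runtime]
# Uncommenting "debug" makes the runtime write detailed logs, including the
# mounts it performs, to the given file.
debug = "/var/log/nvidia-container-runtime.log"
# Optionally raise the verbosity from the default "info".
log-level = "debug"

Note that on Bottlerocket the system configuration is generated by the OS rather than edited in place, so the equivalent change may not be possible by editing the file directly.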
-
Hi @emaincourt, …
-
Hello @emaincourt, the architecture that we followed in Bottlerocket to mount NVIDIA libraries and binaries from the host into the containers is now considered "legacy", and hence the difference between the configurations for Bottlerocket and AL2023. In the "legacy" stack, libnvidia-container was in charge of finding and mounting the libraries and binaries. However, that changed with the move to CDI and the new nvidia-container-toolkit support. This is why you don't see nvoptix.bin mounted in the containers: the nvidia-container-toolkit is now in charge of mounting some binaries when CDI is used.

Once we migrate our stack to the new NVIDIA Container Toolkit CDI support, the extra binaries that are "missing" in the containers should be available. We are still planning when the migration will occur, but contributions are welcome! A potential workaround you can try is to force the mount in your pods through normal Kubernetes directives.
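For illustration, a minimal sketch of that workaround as a pod spec. The image name and the host path are placeholders (where nvoptix.bin actually lives on a Bottlerocket host depends on how the driver is packaged, so verify it on a node first):

apiVersion: v1
kind: Pod
metadata:
  name: optix-test
spec:
  containers:
    - name: app
      image: my-optix-image   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1
      volumeMounts:
        - name: nvoptix
          # Bind-mount the single file at the location OptiX typically expects.
          mountPath: /usr/share/nvidia/nvoptix.bin
          readOnly: true
  volumes:
    - name: nvoptix
      hostPath:
        # Assumed host location; adjust to wherever the driver places it on Bottlerocket.
        path: /usr/share/nvidia/nvoptix.bin
        type: File

This only papers over the missing mount until the CDI-based toolkit support lands.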