List of NVIDIA drivers with issues #41
Comments
@ehfd could you help me, please? |
Upgrade your driver. Use the latest minor release of each major release if you are in the 535 or 550 branch. Versions earlier than 535.113.01 or 550.67 have bugs. |
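The version advice above can be turned into a quick check. The following is only a sketch: the function names (`version_ge`, `min_fixed`, `driver_status`) are made up for illustration, and the minimum versions are the ones quoted in this thread (535.113.01 and 550.67), not an official NVIDIA list. Branches not mentioned here are reported as `unknown`.

```shell
#!/usr/bin/env bash

# version_ge A B: succeed when dotted version A is >= B.
# Compares up to three numeric components; missing ones count as 0.
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)
  local i x y
  for ((i = 0; i < 3; i++)); do
    # 10# forces base 10 so components such as "05" are not read as octal.
    x=$((10#${a[i]:-0}))
    y=$((10#${b[i]:-0}))
    if ((x > y)); then return 0; fi
    if ((x < y)); then return 1; fi
  done
  return 0
}

# min_fixed VERSION: print the minimum version quoted in this thread
# for the branch of VERSION; fail for branches the thread does not cover.
min_fixed() {
  case "${1%%.*}" in
    535) echo 535.113.01 ;;
    550) echo 550.67 ;;
    *) return 1 ;;
  esac
}

# driver_status VERSION: print "ok", "upgrade", or "unknown".
driver_status() {
  local v=$1 min
  if ! min=$(min_fixed "$v"); then
    echo unknown
    return
  fi
  if version_ge "$v" "$min"; then echo ok; else echo upgrade; fi
}
```

On a machine with the driver installed, the installed version can be fed in with `driver_status "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"`.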
I tried DP-0, DP-1, DP-2, DFP. |
Try setting VIDEO_PORT to DP-0, perhaps. "none" is not optimal. What's your environment? |
@ehfd Kubernetes cluster, nvidia-container-toolkit, NVIDIA device plugin, Talos Linux, RTX4090 with 535.86.05 nvidia driver. |
Similar issue with egl desktop. Perhaps an issue with driver 535. |
@ehfd Hmm, I don't have any issues with EGL desktop on 535. |
Mostly because we use Xvfb in EGL desktop variant, not Xorg. |
I can reproduce the error. For now, do NOT upgrade to NVIDIA 535. |
In NVIDIA 535.86.05 with
|
Works up to 530.41.03.
And in 525.60.13.
|
I've emailed the NVIDIA driver team. Waiting for response. |
TO OUR USERS: Please send an email to
|
@maxpain https://forums.developer.nvidia.com/t/if-you-have-a-problem-please-read-this-first/27131 Could you (as well as everyone else affected) provide an nvidia-bug-report.log.gz generated after hitting the error when running Xorg, either here or in the NVIDIA forum post above? The more reports, the better. |
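For anyone unsure how to produce that file: the NVIDIA driver ships a script named `nvidia-bug-report.sh` that writes `nvidia-bug-report.log.gz` to the current directory when run as root. The wrapper below is a hypothetical helper, not part of this project; it assumes `sudo` is available and that you run it where the driver is installed (inside a container the script may not be present).

```shell
#!/usr/bin/env bash

# collect_bug_report [SCRIPT]: run NVIDIA's bug-report script
# (default name: nvidia-bug-report.sh) and show the resulting archive.
# The optional SCRIPT argument exists only to make the helper testable.
collect_bug_report() {
  local script=${1:-nvidia-bug-report.sh}
  if ! command -v "$script" >/dev/null 2>&1; then
    echo "$script not found on PATH; run this where the NVIDIA driver is installed" >&2
    return 1
  fi
  # Assumption: sudo is available. The script needs root and writes
  # nvidia-bug-report.log.gz into the current working directory.
  sudo "$script" && ls -l nvidia-bug-report.log.gz
}
```

Run `collect_bug_report` right after Xorg fails so the log captures the error, then attach the resulting archive.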
NVIDIA has added this issue to their internal tracker. |
Good news: NVIDIA said they found the source of the issue and they will ship the fix in the next release. |
Maybe this issue was fixed in 535.129.03 and 545.29.02. |
It seems to be the case @bongole. I will check if all edge cases were addressed. |
@bongole What's the environment that made it work? Is it this container? |
I tested the command below on a bare-metal Ubuntu 22.04 server with an RTX 4060 Ti.
docker run --gpus all -it --rm --tmpfs /dev/shm:rw -e SIZEW=1920 -e SIZEH=1080 -e REFRESH=60 -e DPI=96 -e CDEPTH=24 -e VIDEO_PORT=DFP -e PASSWD=mypasswd -e WEBRTC_ENCODER=nvh264enc -e BASIC_AUTH_PASSWORD=mypasswd -e ENABLE_HTTPS_WEB=true --network host ghcr.io/selkies-project/nvidia-glx-desktop:latest
OS info:
$ uname -a
Linux gpu-server 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ nvidia-smi
Thu Nov 9 11:15:09 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02              Driver Version: 545.29.02    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     Off | 00000000:01:00.0 Off |                  N/A |
| 32%   29C    P0              29W / 165W |      4MiB / 16380MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+ |
I cannot confirm on 535.129.03 because my testing node is currently broken. Information regarding this is appreciated. |
NVIDIA 550 drivers <= 550.5x have issues with Vulkan. Use 550.67 or higher. |
Hello. I'm trying to run this container in my home Kubernetes cluster on Talos Linux with an RTX 4090 GPU.
NVIDIA driver: 535.86.05