Setting Enable_devices_cg = "YES" hides the GPU devices that are not reserved by the current job.
However, the feature does not seem to work for the first job run just after a node reboot. Subsequent jobs are fine.
Tested on Debian 9.13 nodes with V100 and A100 GPUs. The nodes were rebooted several times and the problem is reproducible.
Workaround: running /usr/bin/nvidia-smi -L || exit 5 from the /etc/default/oar-node startup script fixes the problem (probably by loading the NVIDIA drivers). As a side effect, it also checks at boot time that the NVIDIA drivers are working.
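
A slightly more verbose form of this workaround could look like the sketch below. It assumes that /etc/default/oar-node is sourced by the node's startup script, as the workaround implies. The helper name check_nvidia_driver and the NVIDIA_SMI override variable are hypothetical, introduced only for illustration. The binary path and the exit code 5 come from the one-liner above.

```shell
# Sketch for /etc/default/oar-node (assumed to be sourced at node startup).
# check_nvidia_driver and NVIDIA_SMI are hypothetical names, not part of OAR.
check_nvidia_driver() {
    # Default to the path used in the workaround; allow an override for testing.
    smi="${NVIDIA_SMI:-/usr/bin/nvidia-smi}"
    # Listing the GPUs presumably forces the NVIDIA driver to load,
    # so that the devices exist before the first job starts.
    if "$smi" -L >/dev/null 2>&1; then
        return 0
    fi
    echo "oar-node: '$smi -L' failed, NVIDIA driver not ready" >&2
    return 5
}
```

Adding the line check_nvidia_driver || exit 5 after the function definition would then reproduce the behaviour of the one-liner, with a message on stderr when the driver check fails.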