Start any container with the local_docker scheduler on a machine with an NVIDIA GPU.
Run nvidia-smi inside the container to verify that the container does not detect the GPU.
pretrain/0
pretrain/0 =============
pretrain/0 == PyTorch ==
pretrain/0 =============
pretrain/0
pretrain/0 NVIDIA Release 23.12 (build 76438008)
pretrain/0 PyTorch Version 2.2.0a0+81ea7a4
pretrain/0
pretrain/0 Container image Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
pretrain/0
pretrain/0 Copyright (c) 2014-2023 Facebook Inc.
pretrain/0 Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
pretrain/0 Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
pretrain/0 Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
pretrain/0 Copyright (c) 2011-2013 NYU (Clement Farabet)
pretrain/0 Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
pretrain/0 Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
pretrain/0 Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
pretrain/0 Copyright (c) 2015 Google Inc.
pretrain/0 Copyright (c) 2015 Yangqing Jia
pretrain/0 Copyright (c) 2013-2016 The Caffe contributors
pretrain/0 All rights reserved.
pretrain/0
pretrain/0 Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
pretrain/0
pretrain/0 This container image and its contents are governed by the NVIDIA Deep Learning Container License.
pretrain/0 By pulling and using the container, you accept the terms and conditions of this license:
pretrain/0 https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
pretrain/0
pretrain/0 Failed to detect NVIDIA driver version.
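The nvidia-smi check from the steps above can also be scripted. The following is a minimal sketch, not part of torchx: the helper name `nvidia_smi_sees_gpu` is hypothetical, and it assumes that `nvidia-smi -L` prints one line per visible GPU when the driver is reachable from inside the container.

```python
import shutil
import subprocess


def nvidia_smi_sees_gpu(timeout_s: float = 30.0) -> bool:
    """Return True if `nvidia-smi -L` succeeds and lists at least one GPU.

    Hypothetical helper for illustration; run it inside the container.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The binary is not on PATH, so the driver utilities were not mounted.
        return False
    try:
        result = subprocess.run(
            [exe, "-L"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # `nvidia-smi -L` is assumed to print lines that start with "GPU <index>:".
    return result.returncode == 0 and "GPU " in result.stdout


if __name__ == "__main__":
    print("GPU visible:", nvidia_smi_sees_gpu())
```

With the "compute" capability described in this report, the helper would be expected to return False inside the container; after switching to "gpu" it should return True.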
Expected behavior
If the device request capability is set to "gpu", the GPU devices should be visible inside the container and the NVIDIA driver should be detected.
After changing "compute" to "gpu", it works as expected:
pretrain/0
pretrain/0 =============
pretrain/0 == PyTorch ==
pretrain/0 =============
pretrain/0
pretrain/0 NVIDIA Release 23.12 (build 76438008)
pretrain/0 PyTorch Version 2.2.0a0+81ea7a4
pretrain/0
pretrain/0 Container image Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
pretrain/0
pretrain/0 Copyright (c) 2014-2023 Facebook Inc.
pretrain/0 Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
pretrain/0 Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
pretrain/0 Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
pretrain/0 Copyright (c) 2011-2013 NYU (Clement Farabet)
pretrain/0 Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
pretrain/0 Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
pretrain/0 Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
pretrain/0 Copyright (c) 2015 Google Inc.
pretrain/0 Copyright (c) 2015 Yangqing Jia
pretrain/0 Copyright (c) 2013-2016 The Caffe contributors
pretrain/0 All rights reserved.
pretrain/0
pretrain/0 Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
pretrain/0
pretrain/0 This container image and its contents are governed by the NVIDIA Deep Learning Container License.
pretrain/0 By pulling and using the container, you accept the terms and conditions of this license:
pretrain/0 https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
pretrain/0
pretrain/0 NOTE: CUDA Forward Compatibility mode ENABLED.
pretrain/0 Using CUDA 12.3 driver version 545.23.08 with kernel driver version 535.129.03.
pretrain/0 See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
pretrain/0
Environment
torchx version (e.g. 0.1.0rc1): 0.6.0
Python version: 3.10
OS (e.g., Linux): AL2
How you installed torchx (conda, pip, source, docker): pip
🐛 Bug
The device request capability in the Docker scheduler should be "gpu", not "compute":
https://github.com/pytorch/torchx/blob/main/torchx/schedulers/docker_scheduler.py#L308
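To illustrate the proposed change, here is a minimal sketch of a GPU device request as it appears in the `HostConfig.DeviceRequests` field of the Docker Engine API. This is not the torchx code at the linked line: the helper name `gpu_device_request` is hypothetical, and setting `Driver` to "nvidia" is an assumption. The only point is that the capability list contains "gpu" rather than "compute".

```python
import json


def gpu_device_request(num_gpus: int) -> dict:
    """Build one HostConfig.DeviceRequests entry that asks for NVIDIA GPUs.

    Hypothetical helper for illustration; it is not taken from torchx.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return {
        "Driver": "nvidia",         # assumption: NVIDIA container runtime is installed
        "Count": num_gpus,          # number of GPUs to expose to the container
        "Capabilities": [["gpu"]],  # the fix proposed here: "gpu", not "compute"
    }


if __name__ == "__main__":
    # Print the request as the JSON fragment the Docker Engine API would receive.
    print(json.dumps(gpu_device_request(1), indent=2))
```

In the Docker SDK for Python, the corresponding object is `docker.types.DeviceRequest`, which accepts `count`, `driver`, and `capabilities` keyword arguments.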
Module (check all that applies):
torchx.spec
torchx.component
torchx.apps
torchx.runtime
torchx.cli
torchx.schedulers
torchx.pipelines
torchx.aws
torchx.examples
other
To Reproduce
Steps to reproduce the behavior:
Environment
conda
,pip
, source,docker
): pipAdditional context