Hi @arjunsuresh, I have run into an issue with Docker migration.
I ran the command below on system A. It built the Docker image successfully, and the ResNet50 inference inside the container also ran successfully. I then saved the image as docker-with-test-successfully-1.tar.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=resnet50 \
--implementation=nvidia \
--framework=tensorrt \
--category=edge \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--docker --quiet \
--test_query_count=1000
I loaded it on system B, which has an almost identical configuration, and ran the ResNet50 inference again with the command below, but it failed with the following log. Is there any limitation on Docker migration that I should be aware of?
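For reference, the export/import step described above can be sketched with the standard docker save and docker load subcommands. This is a minimal sketch under assumptions: the helper names export_image and import_image are hypothetical, and it does not reproduce the exact steps (for example any docker commit) used on system A.

```shell
# Hypothetical helpers; only the standard `docker save` / `docker load`
# subcommands are used, nothing here is CM-specific. Source this file.

# Export an image to a tar archive.
# $1: image reference (e.g. docker-with-test-successfully-1:latest)
# $2: output tar path
export_image() {
  docker save -o "$2" "$1"
}

# Import a tar archive produced by export_image on another host.
# $1: tar path
import_image() {
  docker load -i "$1"
}
```

Usage: on system A run export_image docker-with-test-successfully-1:latest docker-with-test-successfully-1.tar, copy the tar file to system B, then run import_image docker-with-test-successfully-1.tar there.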
bob@Bob-Tomcat-Product:~$ docker images
REPOSITORY                        TAG       IMAGE ID       CREATED      SIZE
docker-with-test-successfully-1   latest    2b63d4ccc258   9 days ago   35.5GB
bob@Bob-Tomcat-Product:~$ docker run -it docker-with-test-successfully-1:latest /bin/bash
cmuser@d37b940a1f0a:~$ ls
CM cm-run-script-versions.json configshardware version_info.json
cmuser@d37b940a1f0a:~$ cm run script --tags=run-mlperf,inference,_r4.1-dev \
> --model=resnet50 \
> --implementation=nvidia \
> --framework=tensorrt \
> --category=edge \
> --scenario=Offline \
> --execution_mode=valid \
> --device=cuda \
> --division=closed \
> --rerun \
> --quiet
INFO:root:* cm run script "run-mlperf inference _r4.1-dev"
INFO:root: * cm run script "detect os"
INFO:root: ! cd /home/cmuser
INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-os/customize.py
INFO:root: * cm run script "detect cpu"
INFO:root: * cm run script "detect os"
INFO:root: ! cd /home/cmuser
INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-os/customize.py
INFO:root: ! cd /home/cmuser
INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-cpu/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-cpu/customize.py
INFO:root: * cm run script "get python3"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/bba8cf8097b64518/cm-cached-state.json
INFO:root:Path to Python: /usr/bin/python3
INFO:root:Python version: 3.8.10
INFO:root: * cm run script "get mlcommons inference src"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/21f79a83541549b7/cm-cached-state.json
INFO:root: * cm run script "get sut description"
INFO:root: * cm run script "detect os"
INFO:root: ! cd /home/cmuser
INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-os/customize.py
INFO:root: * cm run script "detect cpu"
INFO:root: * cm run script "detect os"
INFO:root: ! cd /home/cmuser
INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-os/customize.py
INFO:root: ! cd /home/cmuser
INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-cpu/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-cpu/customize.py
INFO:root: * cm run script "get python3"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/bba8cf8097b64518/cm-cached-state.json
INFO:root:Path to Python: /usr/bin/python3
INFO:root:Python version: 3.8.10
INFO:root: * cm run script "get compiler"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/6285b87ff0f74d8a/cm-cached-state.json
INFO:root: * cm run script "get cuda-devices _with-pycuda"
INFO:root: * cm run script "get cuda _toolkit"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/b5a3a8af88c14cc7/cm-cached-state.json
INFO:root:ENV[CM_CUDA_PATH_LIB_CUDNN_EXISTS]: no
INFO:root:ENV[CM_CUDA_VERSION]: 12.2
INFO:root:ENV[CM_CUDA_VERSION_STRING]: cu122
INFO:root:ENV[CM_NVCC_BIN_WITH_PATH]: /usr/local/cuda/bin/nvcc
INFO:root:ENV[CUDA_HOME]: /usr/local/cuda
INFO:root: * cm run script "get python3"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/bba8cf8097b64518/cm-cached-state.json
INFO:root:Path to Python: /usr/bin/python3
INFO:root:Python version: 3.8.10
INFO:root: * cm run script "get generic-python-lib _package.pycuda"
INFO:root: * cm run script "get python3"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/bba8cf8097b64518/cm-cached-state.json
INFO:root:Path to Python: /usr/bin/python3
INFO:root:Python version: 3.8.10
INFO:root: ! cd /home/cmuser
INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/get-generic-python-lib/validate_cache.sh from tmp-run.sh
INFO:root: ! call "detect_version" from /home/cmuser/CM/repos/mlcommons@cm4mlops/script/get-generic-python-lib/customize.py
Detected version: 2022.2.2
INFO:root: * cm run script "get python3"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/bba8cf8097b64518/cm-cached-state.json
INFO:root:Path to Python: /usr/bin/python3
INFO:root:Python version: 3.8.10
INFO:root: ! load /home/cmuser/CM/repos/local/cache/a29ea6efe3564a4b/cm-cached-state.json
INFO:root: * cm run script "get generic-python-lib _package.numpy"
INFO:root: * cm run script "get python3"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/bba8cf8097b64518/cm-cached-state.json
INFO:root:Path to Python: /usr/bin/python3
INFO:root:Python version: 3.8.10
INFO:root: ! cd /home/cmuser
INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/get-generic-python-lib/validate_cache.sh from tmp-run.sh
INFO:root: ! call "detect_version" from /home/cmuser/CM/repos/mlcommons@cm4mlops/script/get-generic-python-lib/customize.py
Detected version: 1.23.5
INFO:root: * cm run script "get python3"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/bba8cf8097b64518/cm-cached-state.json
INFO:root:Path to Python: /usr/bin/python3
INFO:root:Python version: 3.8.10
INFO:root: ! load /home/cmuser/CM/repos/local/cache/19ca7b3b57a74cd2/cm-cached-state.json
INFO:root: ! cd /home/cmuser
INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/get-cuda-devices/detect.sh from tmp-run.sh
Traceback (most recent call last):
File "/home/cmuser/CM/repos/mlcommons@cm4mlops/script/get-cuda-devices/detect.py", line 1, in <module>
import pycuda.driver as cuda
File "/home/cmuser/.local/lib/python3.8/site-packages/pycuda/driver.py", line 66, in <module>
from pycuda._driver import * # noqa
ImportError: /lib/x86_64-linux-gnu/libcuda.so.1: file too short
CM error: Portable CM script failed (name = get-cuda-devices, return code = 256)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note that it is often a portability issue of a third-party tool or a native script
wrapped and unified by this CM script (automation recipe). Please re-run
this script with --repro flag and report this issue with the original
command line, cm-repro directory and full log here:
https://github.com/mlcommons/cm4mlops/issues
The CM concept is to collaboratively fix such issues inside portable CM scripts
to make existing tools and native scripts more portable, interoperable
and deterministic. Thank you!
cmuser@d37b940a1f0a:~$
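The loader message "file too short" means that /lib/x86_64-linux-gnu/libcuda.so.1 inside the container is smaller than an ELF header, typically an empty file. One plausible cause, not confirmed in this thread, is that the image was produced with docker commit from a container started with GPU access: the NVIDIA container runtime bind-mounts the host driver libraries at run time, and the empty mount-target files can end up in the committed image. A minimal check that could be run inside the container, using a hypothetical helper named check_lib and assuming GNU coreutils stat (present in Ubuntu-based images):

```shell
# Hypothetical helper: report whether a shared library file looks truncated.
# A 64-bit ELF header is 64 bytes, so a smaller file cannot be loaded.
# $1: path to the library (symlinks are followed)
# Returns 0 if the size looks plausible, 1 if truncated, 2 if missing.
check_lib() {
  local path="$1"
  if [ ! -e "$path" ]; then
    echo "missing: $path"
    return 2
  fi
  local size
  size=$(stat -L -c %s "$path")
  if [ "$size" -lt 64 ]; then
    echo "truncated: $path ($size bytes)"
    return 1
  fi
  echo "ok: $path ($size bytes)"
}
```

For example, check_lib /lib/x86_64-linux-gnu/libcuda.so.1 prints "truncated: ..." when the file is empty. Passing the size check does not prove the library matches the host driver; it only rules out the truncated-file case.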
@Bob123Yang Was Docker porting working for you before? We normally don't test this, and we recommend launching the Docker image only through the CM command. The reason is that any options the docker run command needs (like --gpus=all for Nvidia) and mounts (for models, datasets, results, etc.) are added by CM itself. Also, the Docker build for the Nvidia container now takes only about 15-20 minutes, since a prebuilt PyTorch whl is used.
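If the image is nevertheless started by hand, it at least needs GPU access: with the NVIDIA Container Toolkit installed on the host, the --gpus flag makes the runtime mount the host driver libraries (including libcuda.so.1) into the container. A minimal sketch, assuming the toolkit is installed on system B; the helper run_image and its optional mount argument are illustrative only, and this does not reproduce the full set of options and mounts that CM adds.

```shell
# Hypothetical helper: start an interactive shell in an image with all GPUs
# exposed through the standard `docker run --gpus` flag. Requires bash.
# $1: image reference
# $2: optional host directory, bind-mounted at the same path in the container
run_image() {
  local image="$1"
  local mount_dir="${2:-}"
  local args=(run --gpus=all -it)
  if [ -n "$mount_dir" ]; then
    args+=(-v "$mount_dir:$mount_dir")
  fi
  docker "${args[@]}" "$image" /bin/bash
}
```

For example, run_image docker-with-test-successfully-1:latest starts the container from this thread with GPU access; any model, dataset, or results directories the run depends on would still have to be mounted separately.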
Another option is to copy the Docker image over under the same name; the CM command should then pick it up automatically instead of rebuilding it.
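After loading the archive, the image can be given another name with docker tag. The target name is a placeholder: the image name and tag that CM looks for are not given in this thread, so it would have to be copied exactly from docker images on system A. The helper name retag_image is hypothetical.

```shell
# Hypothetical helper: give an already loaded image a new reference.
# $1: existing image reference (e.g. docker-with-test-successfully-1:latest)
# $2: target reference; placeholder, copy the real name/tag from system A
retag_image() {
  local src="$1" dst="$2"
  docker tag "$src" "$dst" || return 1
  # Confirm that the new reference resolves to an image.
  docker image inspect "$dst" > /dev/null
}
```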