
Failure to build the GPT-J Docker image after successful installation of tensorrt-llm #2022

Open · Bob123Yang opened this issue on Jan 8, 2025 · 2 comments

@Bob123Yang

Hi @arjunsuresh,

When I ran the command below to build the Docker image for GPT-J:

cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
   --model=gptj-99 \
   --implementation=nvidia \
   --framework=tensorrt \
   --category=edge \
   --scenario=Offline \
   --execution_mode=test \
   --device=cuda \
   --docker --quiet \
   --test_query_count=50

I got the failure below. I'm not sure whether it is related to the existing Docker image (built for ResNet50 several days earlier) or not:

Successfully installed tensorrt-llm

[notice] A new release of pip is available: 23.3.1 -> 24.3.1
[notice] To update, run: python3 -m pip install --upgrade pip
Initializing model from /mnt/models/GPTJ-6B/checkpoint-final
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:16<00:00,  5.48s/it]
[TensorRT-LLM][WARNING] The manually set model data type is torch.float16, but the data type of the HuggingFace model is torch.float32.
Initializing tokenizer from /mnt/models/GPTJ-6B/checkpoint-final
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading calibration dataset
Traceback (most recent call last):
  File "/code/tensorrt_llm/examples/quantization/quantize.py", line 363, in <module>
    main(args)
  File "/code/tensorrt_llm/examples/quantization/quantize.py", line 255, in main
    calib_dataloader = get_calib_dataloader(
  File "/code/tensorrt_llm/examples/quantization/quantize.py", line 187, in get_calib_dataloader
    dataset = load_dataset("cnn_dailymail", name="3.0.0", split="train")
  File "/home/bob1/.local/lib/python3.10/site-packages/datasets/load.py", line 2129, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/bob1/.local/lib/python3.10/site-packages/datasets/load.py", line 1849, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/home/bob1/.local/lib/python3.10/site-packages/datasets/load.py", line 1731, in dataset_module_factory
    raise e1 from None
  File "/home/bob1/.local/lib/python3.10/site-packages/datasets/load.py", line 1618, in dataset_module_factory
    raise ConnectionError(f"Couldn't reach '{path}' on the Hub ({e.__class__.__name__})") from e
ConnectionError: Couldn't reach 'cnn_dailymail' on the Hub (LocalEntryNotFoundError)
make: *** [Makefile:102: devel_run] Error 1
make: Leaving directory '/home/bob1/CM/repos/local/cache/2479e8f0ba164d4c/repo/docker'

CM error: Portable CM script failed (name = get-ml-model-gptj, return code = 256)


^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note that it is often a portability issue of a third-party tool or a native script
wrapped and unified by this CM script (automation recipe). Please re-run
this script with --repro flag and report this issue with the original
command line, cm-repro directory and full log here:

https://github.com/mlcommons/cm4mlops/issues

The CM concept is to collaboratively fix such issues inside portable CM scripts
to make existing tools and native scripts more portable, interoperable
and deterministic. Thank you!
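
For reference, the failing call can be reproduced outside CM. In huggingface_hub, a LocalEntryNotFoundError means the Hub could not be reached and no cached copy of the file was found. The sketch below (the dataset name and split are taken from the traceback; everything else is an assumption) both tests connectivity and, when it succeeds, seeds the local cache:

from datasets import load_dataset

# Same call as in quantize.py's get_calib_dataloader (per the traceback).
# A successful run downloads cnn_dailymail into the local Hugging Face
# cache (~/.cache/huggingface/datasets by default), so later runs can
# resolve it without hitting the Hub.
dataset = load_dataset("cnn_dailymail", name="3.0.0", split="train")
print(dataset)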
@arjunsuresh (Contributor)

It looks like a network error, as the code is working fine for me. Maybe retry?

[notice] To update, run: python3 -m pip install --upgrade pip
Initializing model from /mnt/models/GPTJ-6B/checkpoint-final
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:12<00:00,  4.10s/it]
[TensorRT-LLM][WARNING] The manually set model data type is torch.float16, but the data type of the HuggingFace model is torch.float32.
Initializing tokenizer from /mnt/models/GPTJ-6B/checkpoint-final
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading calibration dataset
README.md: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15.6k/15.6k [00:00<00:00, 43.8MB/s]
train-00000-of-00003.parquet: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 257M/257M [00:02<00:00, 86.9MB/s]
train-00001-of-00003.parquet: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 257M/257M [00:02<00:00, 103MB/s]
train-00002-of-00003.parquet: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 259M/259M [00:02<00:00, 99.9MB/s]
validation-00000-of-00001.parquet: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 34.7M/34.7M [00:00<00:00, 49.1MB/s]
test-00000-of-00001.parquet: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 30.0M/30.0M [00:01<00:00, 24.4MB/s]
Generating train split: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 287113/287113 [00:02<00:00, 105507.79 examples/s]
Generating validation split: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 13368/13368 [00:00<00:00, 96769.25 examples/s]
Generating test split: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 11490/11490 [00:00<00:00, 116725.17 examples/s]
{'quant_cfg': {'*weight_quantizer': {'num_bits': (4, 3), 'axis': None}, '*input_quantizer': {'num_bits': (4, 3), 'axis': None}, 'default': {'num_bits': (4, 3), 'axis': None}, '*.query_key_value.output_quantizer': {'num_bits': (4, 3), 'axis': None, 'enable': True}, '*.Wqkv.output_quantizer': {'num_bits': (4, 3), 'axis': None, 'enable': True}, '*.W_pack.output_quantizer': {'num_bits': (4, 3), 'axis': None, 'enable': True}, '*.c_attn.output_quantizer': {'num_bits': (4, 3), 'axis': None, 'enable': True}, '*.k_proj.output_quantizer': {'num_bits': (4, 3), 'axis': None, 'enable': True}, '*.v_proj.output_quantizer': {'num_bits': (4, 3), 'axis': None, 'enable': True}}, 'algorithm': 'max'}
Starting quantization...
Replaced 507 modules to quantized modules
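
If the network inside the container is unreliable, one possible workaround (an assumption, not something verified in this thread) is to populate the Hugging Face cache once and then run the quantization step against it in offline mode:

import os

# Hypothetical workaround: force the `datasets` library to use only the
# local cache. HF_DATASETS_OFFLINE is the library's documented offline
# switch; the cache must already contain cnn_dailymail (e.g. from a prior
# successful download, or a host cache mounted into the container).
# It must be set before `datasets` is imported.
os.environ["HF_DATASETS_OFFLINE"] = "1"

from datasets import load_dataset

# Resolves from the cache and fails fast if the dataset is missing,
# instead of raising ConnectionError after trying the Hub.
dataset = load_dataset("cnn_dailymail", name="3.0.0", split="train")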

@Bob123Yang (Author)

Okay, I will try it later.
