
ValueError: Cannot load <class 'pyramid_dit.mmdit_modules.modeling_pyramid_mmdit.PyramidDiffusionMMDiT'> #230

Open
xsolo opened this issue Jan 7, 2025 · 1 comment


xsolo commented Jan 7, 2025

This error happens when I run python app_multigpu.py and then click generate in the UI to create a video:

[2025-01-08 17:04:23,329] torch.distributed.run: [WARNING]
[2025-01-08 17:04:23,329] torch.distributed.run: [WARNING] *****************************************
[2025-01-08 17:04:23,329] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2025-01-08 17:04:23,329] torch.distributed.run: [WARNING] *****************************************
/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/timm/models/hub.py:4: FutureWarning: Importing from timm.models.hub is deprecated, please import via timm.models
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.models", FutureWarning)
/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/timm/models/hub.py:4: FutureWarning: Importing from timm.models.hub is deprecated, please import via timm.models
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.models", FutureWarning)
| distributed init (rank 1): env://, gpu 1
| distributed init (rank 0): env://, gpu 0
Setting the Sequence Parallel Size 2
The config attributes {'axes_dims_rope': [16, 24, 24], 'num_single_layers': 16} were passed to PyramidDiffusionMMDiT, but are not expected and will be ignored. Please verify your config.json configuration file.
The config attributes {'axes_dims_rope': [16, 24, 24], 'num_single_layers': 16} were passed to PyramidDiffusionMMDiT, but are not expected and will be ignored. Please verify your config.json configuration file.
Using the rotary position embedding
Using temporal causal attention
We interp the position embedding of condition latents
Traceback (most recent call last):
  File "/home/xsolo/Pyramid-Flow/scripts/app_multigpu_engine.py", line 133, in <module>
    main()
  File "/home/xsolo/Pyramid-Flow/scripts/app_multigpu_engine.py", line 54, in main
    model = PyramidDiTForVideoGeneration(
  File "/home/xsolo/Pyramid-Flow/pyramid_dit/pyramid_dit_for_video_gen_pipeline.py", line 141, in __init__
    self.dit = build_pyramid_dit(
  File "/home/xsolo/Pyramid-Flow/pyramid_dit/pyramid_dit_for_video_gen_pipeline.py", line 81, in build_pyramid_dit
    dit = PyramidDiffusionMMDiT.from_pretrained(
  File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 894, in from_pretrained
    raise ValueError(
ValueError: Cannot load <class 'pyramid_dit.mmdit_modules.modeling_pyramid_mmdit.PyramidDiffusionMMDiT'> from /home/xsolo/pf/diffusion_transformer_768p because the following keys are missing:
 transformer_blocks.1.attn.norm_add_k.weight, transformer_blocks.2.attn.norm_add_k.weight, transformer_blocks.3.attn.norm_add_q.weight, transformer_blocks.7.attn.norm_add_q.weight, transformer_blocks.6.attn.norm_add_k.weight, transformer_blocks.0.attn.norm_add_k.weight, pos_embed.proj.bias, transformer_blocks.6.attn.norm_add_q.weight, transformer_blocks.2.attn.norm_add_q.weight, transformer_blocks.3.attn.norm_add_k.weight, transformer_blocks.1.attn.norm_add_q.weight, transformer_blocks.4.attn.norm_add_q.weight, transformer_blocks.5.attn.norm_add_q.weight, transformer_blocks.0.attn.norm_add_q.weight, transformer_blocks.4.attn.norm_add_k.weight, transformer_blocks.7.attn.norm_add_k.weight, pos_embed.proj.weight, transformer_blocks.5.attn.norm_add_k.weight.
 Please make sure to pass `low_cpu_mem_usage=False` and `device_map=None` if you want to randomly initialize those weights or else make sure your checkpoint file is correct.
(The second worker, rank 1, prints the same traceback and ValueError; the two tracebacks were interleaved in the raw output.)
[2025-01-08 17:04:33,356] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 410096) of binary: /home/xsolo/miniconda3/envs/pyramid/bin/python
Traceback (most recent call last):
  File "/home/xsolo/miniconda3/envs/pyramid/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/torch/distributed/run.py", line 806, in main
    run(args)
  File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/home/xsolo/Pyramid-Flow/scripts/app_multigpu_engine.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2025-01-08_17:04:33
  host      : linux-gpu-2l4
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 410097)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2025-01-08_17:04:33
  host      : linux-gpu-2l4
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 410096)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Traceback (most recent call last):
  File "app_multigpu.py", line 36, in run_inference_multigpu
    subprocess.run(cmd, check=True)
  File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['./scripts/app_multigpu_engine.sh', '2', 'diffusion_transformer_768p', '/home/xsolo/pf/', 't2v', '16', '9', '5', '768p', '/tmp/tmpsp1tuqx_/e62fc12a-5b82-404a-9c22-ce853ee5533e_output.mp4', 'Hugging bears']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 2364, in run_sync_in_worker_thread
    return await future
  File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 864, in run
    result = context.run(func, *args)
  File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "app_multigpu.py", line 54, in generate_text_to_video
    return run_inference_multigpu(gpus, variant, model_path, temp, guidance_scale, video_guidance_scale, resolution, prompt)
  File "app_multigpu.py", line 38, in run_inference_multigpu
    raise RuntimeError(f"Error during video generation: {e}")
RuntimeError: Error during video generation: Command '['./scripts/app_multigpu_engine.sh', '2', 'diffusion_transformer_768p', '/home/xsolo/pf/', 't2v', '16', '9', '5', '768p', '/tmp/tmpsp1tuqx_/e62fc12a-5b82-404a-9c22-ce853ee5533e_output.mp4', 'Hugging bears']' returned non-zero exit status 1.
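
For reference, here is a minimal sketch of what the hint at the end of the ValueError suggests, i.e. loading the transformer directly while letting diffusers randomly initialize the missing weights. This is only an isolated experiment using the path from my setup, not how app_multigpu_engine.py actually builds the model:

from pyramid_dit.mmdit_modules.modeling_pyramid_mmdit import PyramidDiffusionMMDiT

# Sketch only: follow the hint from diffusers. With low_cpu_mem_usage=False and
# device_map=None, the missing keys (pos_embed.proj.* and the
# transformer_blocks.*.attn.norm_add_* weights) are randomly initialized
# instead of raising a ValueError.
dit = PyramidDiffusionMMDiT.from_pretrained(
    "/home/xsolo/pf/diffusion_transformer_768p",
    low_cpu_mem_usage=False,
    device_map=None,
)

Of course this only bypasses the check; if those tensors really are absent from the checkpoint, the model would run with random values for those layers.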

xsolo commented Jan 8, 2025

(pyramid) ➜  ~ tree /home/xsolo/pf/
pf
├── README.md
├── causal_video_vae
│   ├── config.json
│   └── diffusion_pytorch_model.bin
├── diffusion_transformer_384p
│   ├── config.json
│   └── diffusion_pytorch_model.safetensors
├── diffusion_transformer_768p
│   ├── config.json
│   └── diffusion_pytorch_model.safetensors
├── diffusion_transformer_image
│   ├── config.json
│   └── diffusion_pytorch_model.safetensors
├── text_encoder
│   ├── config.json
│   └── model.safetensors
├── text_encoder_2
│   ├── config.json
│   ├── model-00001-of-00002.safetensors
│   ├── model-00002-of-00002.safetensors
│   └── model.safetensors.index.json
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
└── tokenizer_2
    ├── special_tokens_map.json
    ├── spiece.model
    ├── tokenizer.json
    └── tokenizer_config.json
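
To double-check the checkpoint itself, a small safetensors sketch like the following (path taken from the tree above) should show whether the keys that from_pretrained reports as missing are actually stored in the 768p file:

from safetensors import safe_open

# Debugging sketch: list the tensor names in the downloaded checkpoint and
# filter for the keys the ValueError says are missing.
ckpt = "/home/xsolo/pf/diffusion_transformer_768p/diffusion_pytorch_model.safetensors"
with safe_open(ckpt, framework="pt") as f:
    keys = list(f.keys())

print(len(keys), "tensors in checkpoint")
print([k for k in keys if "norm_add" in k or k.startswith("pos_embed.proj")])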
