This error happens when I run `python app_multigpu.py` and click the UI button to generate a video:
[2025-01-08 17:04:23,329] torch.distributed.run: [WARNING]
[2025-01-08 17:04:23,329] torch.distributed.run: [WARNING] *****************************************
[2025-01-08 17:04:23,329] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2025-01-08 17:04:23,329] torch.distributed.run: [WARNING] *****************************************
/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/timm/models/hub.py:4: FutureWarning: Importing from timm.models.hub is deprecated, please import via timm.models
warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.models", FutureWarning)
/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/timm/models/hub.py:4: FutureWarning: Importing from timm.models.hub is deprecated, please import via timm.models
warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.models", FutureWarning)
| distributed init (rank 1): env://, gpu 1
| distributed init (rank 0): env://, gpu 0
Setting the Sequence Parallel Size 2
The config attributes {'axes_dims_rope': [16, 24, 24], 'num_single_layers': 16} were passed to PyramidDiffusionMMDiT, but are not expected and will be ignored. Please verify your config.json configuration file.
The config attributes {'axes_dims_rope': [16, 24, 24], 'num_single_layers': 16} were passed to PyramidDiffusionMMDiT, but are not expected and will be ignored. Please verify your config.json configuration file.
Using the rotary position embedding
Using temporal causal attention
We interp the position embedding of condition latents
Traceback (most recent call last):
File "/home/xsolo/Pyramid-Flow/scripts/app_multigpu_engine.py", line 133, in <module>
main()
File "/home/xsolo/Pyramid-Flow/scripts/app_multigpu_engine.py", line 54, in main
model = PyramidDiTForVideoGeneration(
File "/home/xsolo/Pyramid-Flow/pyramid_dit/pyramid_dit_for_video_gen_pipeline.py", line 141, in __init__
self.dit = build_pyramid_dit(
File "/home/xsolo/Pyramid-Flow/pyramid_dit/pyramid_dit_for_video_gen_pipeline.py", line 81, in build_pyramid_dit
dit = PyramidDiffusionMMDiT.from_pretrained(
File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 894, in from_pretrained
raise ValueError(
ValueError: Cannot load <class 'pyramid_dit.mmdit_modules.modeling_pyramid_mmdit.PyramidDiffusionMMDiT'> from /home/xsolo/pf/diffusion_transformer_768p because the following keys are missing:
pos_embed.proj.weight, pos_embed.proj.bias, transformer_blocks.0.attn.norm_add_q.weight, transformer_blocks.0.attn.norm_add_k.weight, transformer_blocks.1.attn.norm_add_q.weight, transformer_blocks.1.attn.norm_add_k.weight, transformer_blocks.2.attn.norm_add_q.weight, transformer_blocks.2.attn.norm_add_k.weight, transformer_blocks.3.attn.norm_add_q.weight, transformer_blocks.3.attn.norm_add_k.weight, transformer_blocks.4.attn.norm_add_q.weight, transformer_blocks.4.attn.norm_add_k.weight, transformer_blocks.5.attn.norm_add_q.weight, transformer_blocks.5.attn.norm_add_k.weight, transformer_blocks.6.attn.norm_add_q.weight, transformer_blocks.6.attn.norm_add_k.weight, transformer_blocks.7.attn.norm_add_q.weight, transformer_blocks.7.attn.norm_add_k.weight.
Please make sure to pass `low_cpu_mem_usage=False` and `device_map=None` if you want to randomly initialize those weights or else make sure your checkpoint file is correct.
(The second rank prints an identical traceback and ValueError; the interleaved duplicate output is omitted here.)
[2025-01-08 17:04:33,356] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 410096) of binary: /home/xsolo/miniconda3/envs/pyramid/bin/python
Traceback (most recent call last):
File "/home/xsolo/miniconda3/envs/pyramid/bin/torchrun", line 8, in<module>sys.exit(main())
File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/home/xsolo/Pyramid-Flow/scripts/app_multigpu_engine.py FAILED
------------------------------------------------------------
Failures:
[1]:
time: 2025-01-08_17:04:33
host : linux-gpu-2l4
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 410097)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time: 2025-01-08_17:04:33
host : linux-gpu-2l4
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 410096)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Traceback (most recent call last):
File "app_multigpu.py", line 36, in run_inference_multigpu
subprocess.run(cmd, check=True)
File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['./scripts/app_multigpu_engine.sh', '2', 'diffusion_transformer_768p', '/home/xsolo/pf/', 't2v', '16', '9', '5', '768p', '/tmp/tmpsp1tuqx_/e62fc12a-5b82-404a-9c22-ce853ee5533e_output.mp4', 'Hugging bears']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/gradio/queueing.py", line 536, in process_events
response = await route_utils.call_process_api(
File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/gradio/route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/gradio/blocks.py", line 1935, in process_api
result = await self.call_function(
File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/gradio/blocks.py", line 1520, in call_function
prediction = await anyio.to_thread.run_sync( # type: ignore
File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 2364, in run_sync_in_worker_thread
return await future
File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 864, in run
result = context.run(func, *args)
File "/home/xsolo/miniconda3/envs/pyramid/lib/python3.8/site-packages/gradio/utils.py", line 826, in wrapper
response = f(*args, **kwargs)
File "app_multigpu.py", line 54, in generate_text_to_video
return run_inference_multigpu(gpus, variant, model_path, temp, guidance_scale, video_guidance_scale, resolution, prompt)
File "app_multigpu.py", line 38, in run_inference_multigpu
raise RuntimeError(f"Error during video generation: {e}")
RuntimeError: Error during video generation: Command '['./scripts/app_multigpu_engine.sh', '2', 'diffusion_transformer_768p', '/home/xsolo/pf/', 't2v', '16', '9', '5', '768p', '/tmp/tmpsp1tuqx_/e62fc12a-5b82-404a-9c22-ce853ee5533e_output.mp4', 'Hugging bears']' returned non-zero exit status 1.
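For what it's worth, one way to narrow this down is to check whether the tensors named in the ValueError are actually absent from the downloaded checkpoint (e.g. an incomplete download) or whether they exist but the loader expects different names (a code/checkpoint version mismatch). Below is just a sketch of such a check; the shard name `diffusion_pytorch_model.safetensors` is an assumption based on the usual diffusers layout and may not match what is actually inside the `diffusion_transformer_768p` folder.

```python
# Hypothetical sanity check (not part of the Pyramid-Flow repo): list the tensor names
# stored in the checkpoint file and compare them against the keys reported as missing.
from safetensors import safe_open

# Assumed shard name; adjust to whatever file(s) the diffusion_transformer_768p folder contains.
ckpt = "/home/xsolo/pf/diffusion_transformer_768p/diffusion_pytorch_model.safetensors"

# Keys taken from the ValueError above.
missing = {
    "pos_embed.proj.weight",
    "pos_embed.proj.bias",
    *(f"transformer_blocks.{i}.attn.norm_add_{qk}.weight"
      for i in range(8) for qk in ("q", "k")),
}

with safe_open(ckpt, framework="pt", device="cpu") as f:
    stored = set(f.keys())

print("reported missing and absent from file:", sorted(missing - stored))
print("reported missing but present in file:", sorted(missing & stored))
```

If those keys really are absent from the file, re-downloading the 768p weights seems like the first thing to try; if they are present, the failure is more likely a mismatch between the repo code and the checkpoint version.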