Mamba2 doesn't support Multi-GPU training (fast path) #35770
Comments
This looks like an issue with mamba_ssm, as you said that the slow path is working. Can you raise an issue there instead?
Hi Marc, I'm not sure, since I have already done multi-GPU Mamba2 training with another framework (https://github.com/state-spaces/s4) and it worked fine. Maybe it depends on how the parallelism is done.
Oh, thanks for the context. What did you use for multi-GPU training? Can you share your accelerate config?
Where can I find my accelerate config?
Using "torchrun --nproc_per_node=2 train.py" will trigger DP. You can also just do : |
System Info
transformers version: 4.46.3

Who can help?
@ylacombe, @eustlb, @muellerzr, @SunMarc
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Expected behavior
Hi! When using cuda_kernels_forward in Mamba2 on multiple GPUs, the following error appears (full traceback at the end):
However, it works just fine when I'm using the slower path, torch_forward.
Do you know how to address this issue?
I'm using SFTTrainer (which inherits from the Transformers Trainer).
Thanks a lot.
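For reference, a minimal sketch of the kind of setup being described (the checkpoint, dataset, and training arguments below are placeholders chosen for illustration, not taken from this report):

```python
# Minimal multi-GPU Mamba2 SFT sketch. With mamba-ssm and causal-conv1d
# installed, Mamba2 generally takes cuda_kernels_forward; without them it
# falls back to the slower torch_forward path.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset

trainer = SFTTrainer(
    model="mistralai/Mamba-Codestral-7B-v0.1",  # placeholder Mamba2 checkpoint
    train_dataset=dataset,
    args=SFTConfig(output_dir="mamba2-sft", per_device_train_batch_size=1),
)
trainer.train()

# Launching with e.g. `torchrun --nproc_per_node=2 train.py` exercises the
# fast path on each GPU; forcing the slow torch_forward path (e.g. by not
# installing mamba-ssm) avoids the failure described above.
```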
Traceback