Fix RMSNormGated in Zamba2 #35943
base: main
Rebase zamba2
rebase on upstream
Co-authored-by: Arthur <[email protected]>
This reverts commit 9007a52.
Co-authored-by: Arthur <[email protected]>
class Zamba2RMSNormGated(MambaRMSNormGated):
    pass
class Zamba2RMSNormGated(torch.nn.Module):
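The new class body is not shown in this excerpt. As a rough sketch of what a grouped, gated RMSNorm can look like in PyTorch, the block below assumes that the SiLU gate is applied before normalization and that each group spans hidden_size // n_groups channels. The class name GroupedRMSNormGatedSketch and its constructor arguments are hypothetical and are not the code added in this PR.

```python
from typing import Optional

import torch
from torch import nn


class GroupedRMSNormGatedSketch(nn.Module):
    """Hypothetical gated RMSNorm that normalizes each of n_groups channel chunks separately."""

    def __init__(self, hidden_size: int, n_groups: int = 1, eps: float = 1e-6):
        super().__init__()
        if hidden_size % n_groups != 0:
            raise ValueError("hidden_size must be divisible by n_groups")
        self.weight = nn.Parameter(torch.ones(hidden_size))  # learned per-channel scale
        self.group_size = hidden_size // n_groups
        self.eps = eps

    def forward(self, hidden_states: torch.Tensor, gate: Optional[torch.Tensor] = None) -> torch.Tensor:
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)  # compute the statistics in float32
        # Assumption: the SiLU gate is applied *before* normalization.
        if gate is not None:
            hidden_states = hidden_states * nn.functional.silu(gate.to(torch.float32))
        *prefix, last_dim = hidden_states.shape
        # Split channels into (n_groups, group_size) so each group gets its own RMS.
        grouped = hidden_states.reshape(*prefix, last_dim // self.group_size, self.group_size)
        variance = grouped.pow(2).mean(dim=-1, keepdim=True)
        grouped = grouped * torch.rsqrt(variance + self.eps)
        hidden_states = grouped.reshape(*prefix, last_dim)
        # Apply the scale, then cast back to the caller's dtype.
        return (self.weight * hidden_states).to(input_dtype)


norm = GroupedRMSNormGatedSketch(hidden_size=8, n_groups=2)
x = torch.randn(2, 3, 8)
z = torch.randn(2, 3, 8)
print(norm(x, gate=z).shape)  # torch.Size([2, 3, 8])
```

With n_groups=1, this sketch reduces to a plain gated RMSNorm over the full hidden dimension, so the two variants only diverge when there is more than one group.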
I think this will also affect the Mamba2 code (Codestral Mamba also uses ngroups > 1), so I'd be in favor of implementing this in the Mamba2 code and then using modular.
cc @molbap
I think so, but as I'm not a maintainer, I'll leave the decision to the others 👀
cc @molbap I think!
What does this PR do?
This PR extends Zamba2RMSNormGated to allow config.mamba_ngroups > 1. The Zamba2 7B checkpoints have config.mamba_ngroups=2, so this change is necessary for a correct forward pass.
I defined Zamba2RMSNormGated inside modular_zamba2.py instead of importing it, as this differs from the definition in modeling_mamba2.py. The implementation in this PR is the torch version of the mamba-ssm implementation of the original Mamba2 (used here and torch implementation given here).
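To illustrate why a checkpoint trained with config.mamba_ngroups=2 needs the grouped variant, the sketch below compares a single shared RMS against per-group RMS on an input whose two channel groups have very different scales. It assumes the SiLU gate is applied before normalization; the helper gated_rms_norm is hypothetical and is not the function used in this PR or in mamba-ssm.

```python
import torch
import torch.nn.functional as F


def gated_rms_norm(x, gate, weight, n_groups, eps=1e-6):
    """Hypothetical helper: SiLU-gate x, then RMS-normalize each of n_groups channel chunks."""
    input_dtype = x.dtype
    x = x.float() * F.silu(gate.float())
    *prefix, dim = x.shape
    grouped = x.reshape(*prefix, n_groups, dim // n_groups)
    grouped = grouped * torch.rsqrt(grouped.pow(2).mean(dim=-1, keepdim=True) + eps)
    return (weight.float() * grouped.reshape(*prefix, dim)).to(input_dtype)


# Two channel groups with very different scales (the second is 10x the first).
x = torch.tensor([[1.0, -1.0, 1.0, -1.0, 10.0, -10.0, 10.0, -10.0]])
gate = torch.full_like(x, 20.0)  # silu(20) is almost exactly 20, so gating barely distorts x
weight = torch.ones(x.shape[-1])

one_group = gated_rms_norm(x, gate, weight, n_groups=1)
two_groups = gated_rms_norm(x, gate, weight, n_groups=2)

# A single shared RMS is dominated by the large group: |first half| ~ 0.14, |second half| ~ 1.41.
print(one_group.abs()[0, 0].item(), one_group.abs()[0, -1].item())
# Per-group RMS rescales each half to unit RMS, so every entry is ~ +/-1.
print(torch.allclose(two_groups, x.sign(), atol=1e-4))  # True
```

Under these assumptions, normalizing with one group when the weights expect two would shrink the smaller group's activations roughly tenfold in this example, which changes the forward pass.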
Who can review?
@ArthurZucker @Cyrilvallez