In my research, RMSNorm and SimpleRMSNorm allow my models to converge early, but not necessarily fast. ScaleNorm converges the fastest, but there is a substantial delay in when it starts. LayerNorm is the worst in my use-case as it's quite unstable.
Furthermore, both RMSNorm and SimpleRMSNorm have undesirable side-effects at the output of my model due to my loss functions and boundary constraints. ScaleNorm does not suffer from this.
So it would be cool to be able to specify different normalizations at different depths. In my use-case, I would like to experiment with using SimpleRMSNorm in the early layers, then switching to ScaleNorm in the last layers.
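
To make the request concrete, here is a minimal sketch of a per-depth normalization schedule in plain Python. It is not tied to any existing API: `norm_schedule`, `NORMS`, and the `switch_at` parameter are hypothetical names used only for illustration. The two norm functions follow the usual definitions as I understand them (SimpleRMSNorm as `x / ||x|| * sqrt(d)` with no learned parameters, ScaleNorm as `g * x / ||x||` with a single scalar gain `g`), and the loop is only a stand-in for real layers.

```python
import math

def simple_rms_norm(x, eps=1e-8):
    """x / ||x|| * sqrt(d): output has unit RMS, no learned parameters."""
    l2 = math.sqrt(sum(v * v for v in x))
    scale = math.sqrt(len(x)) / max(l2, eps)
    return [v * scale for v in x]

def scale_norm(x, g=1.0, eps=1e-8):
    """g * x / ||x||: g is a single scalar gain (learned in a real model)."""
    l2 = math.sqrt(sum(v * v for v in x))
    return [g * v / max(l2, eps) for v in x]

# Hypothetical registry mapping a name to a normalization function.
NORMS = {"simple_rms": simple_rms_norm, "scale": scale_norm}

def norm_schedule(depth, switch_at):
    """Hypothetical helper: return one norm name per layer.

    Layers [0, switch_at) use SimpleRMSNorm, layers [switch_at, depth) use ScaleNorm.
    """
    if not 0 <= switch_at <= depth:
        raise ValueError("switch_at must be between 0 and depth")
    return ["simple_rms"] * switch_at + ["scale"] * (depth - switch_at)

# Example: 6 layers, switch to ScaleNorm for the last 2.
schedule = norm_schedule(depth=6, switch_at=4)

x = [3.0, 4.0]
for name in schedule:
    # Stand-in for "normalize, then run the layer's block".
    x = NORMS[name](x)
```

In a real model the same idea would amount to passing a list of norm types (one per layer) instead of a single global flag, so each layer picks its own normalization when it is constructed.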
Cheers