Feature request: different normalization layers at different depths #199

pfeatherstone · 2023-10-24T08:49:46Z

In my research, RMSNorm and SimpleRMSNorm let my models start converging early, but not necessarily quickly. ScaleNorm converges the fastest, but there is a substantial delay before convergence begins. LayerNorm is the worst in my use case, as it is quite unstable.
Furthermore, both RMSNorm and SimpleRMSNorm have undesirable side effects at the output of my model because of my loss functions and boundary constraints. ScaleNorm does not suffer from this.

So it would be cool to be able to specify different normalizations at different depths. In my use case, I would like to experiment with SimpleRMSNorm in the early layers, then switch to ScaleNorm in the last layers.
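
To make the request concrete, here is a rough, framework-free sketch of a per-depth norm schedule. It is not the library's API: `build_norm_schedule`, `switch_at`, `simple_rms_norm` and `make_scale_norm` are hypothetical names made up for illustration, and the norms are written over plain Python lists using the usual definitions (SimpleRMSNorm divides by the root-mean-square with no learned parameters; ScaleNorm rescales to an L2 norm `g`, where `g` would be a single learned scalar in a real model).

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
NormFn = Callable[[Vector], List[float]]


def simple_rms_norm(x: Vector, eps: float = 1e-8) -> List[float]:
    """Divide x by its root-mean-square; no learned parameters."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return [v / max(rms, eps) for v in x]


def make_scale_norm(g: float, eps: float = 1e-8) -> NormFn:
    """Return a ScaleNorm-style function that rescales x to L2 norm g.

    g is a fixed float here; in a real model it would be a learned scalar.
    """
    def scale_norm(x: Vector) -> List[float]:
        l2 = math.sqrt(sum(v * v for v in x))
        return [g * v / max(l2, eps) for v in x]
    return scale_norm


def build_norm_schedule(depth: int, switch_at: int, dim: int) -> List[NormFn]:
    """Hypothetical helper: one norm per layer.

    Layers [0, switch_at) get SimpleRMSNorm, layers [switch_at, depth) get ScaleNorm.
    """
    if not 0 <= switch_at <= depth:
        raise ValueError("switch_at must lie in [0, depth]")
    early = [simple_rms_norm] * switch_at
    # Assumption: initialise g to sqrt(dim), one independent g per late layer.
    late = [make_scale_norm(g=math.sqrt(dim)) for _ in range(depth - switch_at)]
    return early + late


# Usage: 6 layers, SimpleRMSNorm in the first 4, ScaleNorm in the last 2.
norms = build_norm_schedule(depth=6, switch_at=4, dim=4)
x = [1.0, -2.0, 3.0, -4.0]
for layer_idx, norm in enumerate(norms):
    x = norm(x)  # a real block would also apply attention / feedforward here
```

With `g` set to `sqrt(dim)`, the two norms give the same output at initialisation; they only diverge once `g` is trained. Something like a list of norm types or a per-layer callback would cover my use case equally well.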

Cheers
