Several fixes related to rotary position embeddings #35376

Open · wants to merge 1 commit into main
Conversation


@mseeger commented on Dec 20, 2024

What does this PR do?

  • Changes related to `position_embeddings` being a mandatory argument
  • Remove the `position_ids` argument of `apply_rotary_pos_emb`
  • Replace `torch.stack` by `torch.cat`; the former requires equal shapes (see the first sketch after this list)
  • `esm`: RoPE depends on `position_ids`, which was ignored. The fix changes behavior, but should improve correctness.
  • `gpt_neox`: Removed the selection of the attention compute type via class
  • `gptj`, `codegen`: RoPE must be applied per head; also fixes some shape issues. This probably changes behavior.
  • `nemotron`: `config.partial_rotary_factor` was not implemented. Because of this, its default is changed to 1, so that behavior in the default case does not change (see the second sketch below).
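
To make the `torch.stack` / `torch.cat` point concrete, here is a minimal, self-contained illustration (a sketch of the underlying shape rule, not the PR's actual diff): with an odd number of rotary channels, a rotate-half style helper splits the last dimension into unequal halves, which `torch.cat` tolerates but `torch.stack` does not.

```python
import torch

def rotate_half(x):
    # Split the last dimension into two (possibly unequal) halves and swap
    # them with a sign flip; for an odd size such as 7 the halves are 3 and 4.
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    # torch.cat only needs the non-concatenated dimensions to agree, so the
    # unequal halves are fine and the output keeps the original last dim.
    return torch.cat((-x2, x1), dim=-1)

x = torch.randn(2, 4, 10, 7)  # (batch, heads, seq, odd rotary dim)
print(rotate_half(x).shape)   # torch.Size([2, 4, 10, 7])

# torch.stack, by contrast, requires all of its inputs to have exactly the
# same shape, so the analogous stack-based construction fails for odd sizes:
# torch.stack((-x[..., 3:], x[..., :3]), dim=-1)
# -> RuntimeError: stack expects each tensor to be equal size
```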

Fixes #35233 ("Shape mismatch in RoPE embeddings gpt_neox model when rotary_ndims is odd"). This is the first of two PRs providing the fix; I split it to make reviewing easier.
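
For the `nemotron` item above, the following is a rough, hypothetical sketch of the usual partial-rotary pattern (the helper name and shapes are illustrative, not the model's actual code): `partial_rotary_factor` determines how many channels of each head are rotated, and a factor of 1.0 rotates the whole head, which is why defaulting it to 1 leaves existing behavior unchanged.

```python
import torch

def rotate_half(x):
    half = x.shape[-1] // 2
    return torch.cat((-x[..., half:], x[..., :half]), dim=-1)

def apply_partial_rotary(q, cos, sin, partial_rotary_factor=1.0):
    # Hypothetical helper: rotate only the leading `rotary_ndims` channels of
    # each head and pass the remaining channels through unchanged.
    head_dim = q.shape[-1]
    rotary_ndims = int(head_dim * partial_rotary_factor)
    q_rot, q_pass = q[..., :rotary_ndims], q[..., rotary_ndims:]
    q_rot = (q_rot * cos) + (rotate_half(q_rot) * sin)
    # With partial_rotary_factor=1.0, q_pass is empty and this reduces to
    # full-head RoPE, so a default of 1 does not change the existing behavior.
    return torch.cat((q_rot, q_pass), dim=-1)

batch, heads, seq, head_dim = 2, 8, 16, 64
factor = 0.5                          # rotate only half of each head
rotary_ndims = int(head_dim * factor)
q = torch.randn(batch, heads, seq, head_dim)
cos = torch.randn(seq, rotary_ndims)  # stand-ins for the real cos/sin tables
sin = torch.randn(seq, rotary_ndims)
print(apply_partial_rotary(q, cos, sin, factor).shape)  # torch.Size([2, 8, 16, 64])
```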

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
