Several fixes related to rotary position embeddings #35376

Open · wants to merge 1 commit into main
Conversation


@mseeger commented on Dec 20, 2024

What does this PR do?

  • Changes related to `position_embeddings` being a mandatory argument
  • Remove the `position_ids` argument of `apply_rotary_pos_emb`
  • Replace `torch.stack` by `torch.cat`; the former requires equal shapes (see the first sketch after this list)
  • `esm`: RoPE depends on `position_ids`, which was ignored. The fix changes behavior, but should improve correctness.
  • `gpt_neox`: Removed the selection of the attention compute type via class
  • `gptj`, `codegen`: RoPE must be applied per head; also fixes some shape issues. This probably changes behavior.
  • `nemotron`: `config.partial_rotary_factor` was not implemented. Because of this, its default is changed to 1, so that behavior in the default case does not change (see the second sketch below).
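
To make the `torch.stack` / `torch.cat` point concrete, here is a minimal, self-contained illustration (a sketch of the underlying shape rule, not the PR's actual diff): with an odd number of rotary channels, a rotate-half style helper splits the last dimension into unequal halves, which `torch.cat` tolerates but `torch.stack` does not.

```python
import torch

def rotate_half(x):
    # Split the last dimension into two (possibly unequal) halves and swap
    # them with a sign flip; for an odd size such as 7 the halves are 3 and 4.
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    # torch.cat only needs the non-concatenated dimensions to agree, so the
    # unequal halves are fine and the output keeps the original last dim.
    return torch.cat((-x2, x1), dim=-1)

x = torch.randn(2, 4, 10, 7)  # (batch, heads, seq, odd rotary dim)
print(rotate_half(x).shape)   # torch.Size([2, 4, 10, 7])

# torch.stack, by contrast, requires all of its inputs to have exactly the
# same shape, so the analogous stack-based construction fails for odd sizes:
# torch.stack((-x[..., 3:], x[..., :3]), dim=-1)
# -> RuntimeError: stack expects each tensor to be equal size
```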

Fixes #35233 ("Shape mismatch in RoPE embeddings gpt_neox model when rotary_ndims is odd"). This is the first of two PRs providing the fix; I split it to make reviewing easier.
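
For the `nemotron` item above, the following is a rough, hypothetical sketch of the usual partial-rotary pattern (the helper name and shapes are illustrative, not the model's actual code): `partial_rotary_factor` determines how many channels of each head are rotated, and a factor of 1.0 rotates the whole head, which is why defaulting it to 1 leaves existing behavior unchanged.

```python
import torch

def rotate_half(x):
    half = x.shape[-1] // 2
    return torch.cat((-x[..., half:], x[..., :half]), dim=-1)

def apply_partial_rotary(q, cos, sin, partial_rotary_factor=1.0):
    # Hypothetical helper: rotate only the leading `rotary_ndims` channels of
    # each head and pass the remaining channels through unchanged.
    head_dim = q.shape[-1]
    rotary_ndims = int(head_dim * partial_rotary_factor)
    q_rot, q_pass = q[..., :rotary_ndims], q[..., rotary_ndims:]
    q_rot = (q_rot * cos) + (rotate_half(q_rot) * sin)
    # With partial_rotary_factor=1.0, q_pass is empty and this reduces to
    # full-head RoPE, so a default of 1 does not change the existing behavior.
    return torch.cat((q_rot, q_pass), dim=-1)

batch, heads, seq, head_dim = 2, 8, 16, 64
factor = 0.5                          # rotate only half of each head
rotary_ndims = int(head_dim * factor)
q = torch.randn(batch, heads, seq, head_dim)
cos = torch.randn(seq, rotary_ndims)  # stand-ins for the real cos/sin tables
sin = torch.randn(seq, rotary_ndims)
print(apply_partial_rotary(q, cos, sin, factor).shape)  # torch.Size([2, 8, 16, 64])
```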

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
