weird logic in positional embedding in APTModel (self.wpe)? #65

othertea · 2024-01-21T00:13:20Z

I'm finding our handling of the initial positional embeddings, before the APT blocks, (self.wpe or its absence in the definition of APTModel) to be a bit weird.
They are initialized here:

protein-lm-scaling/protein_lm/modeling/models/apt/model_pytorch.py

Lines 453 to 460 in 86ca8f5

    
           if self.position_embedding=="learned" or self.position_embedding == 'rope' or self.position_embedding == 'rerope' or self.position_embedding=="linear_rope_scaling" or self.position_embedding =="dynamic_rope_scaling": 
        
               self.wpe = nn.Embedding(config.max_position_embeddings, self.embed_dim) 
        
               self.alibi = None 
        
           elif self.position_embedding=="alibi": 
        
               maxpos = config.n_positions 
        
               attn_heads = config.n_head 
        
               alibi = create_alibi_tensor(attn_heads,maxpos) 
        
               self.register_buffer('alibi',alibi)

and used here:

protein-lm-scaling/protein_lm/modeling/models/apt/model_pytorch.py

Lines 567 to 571 in 86ca8f5

    
           if self.position_embedding=="learned" or self.position_embedding == 'rope' or self.position_embedding == 'rerope' or self.position_embedding=="linear_rope_scaling" or self.position_embedding =="dynamic_rope_scaling": 
        
               position_embeds = self.wpe(position_ids) 
        
               hidden_states = inputs_embeds + position_embeds 
        
           else: 
        
               hidden_states = inputs_embeds

It seems that for learned embedding as well as for variants of rope, a learned positional embedding is added before passing on to the blocks. Only for alibi is this positional embedding omitted. (The APT blocks have rope/alibi as was specified, so this first positional embedding being omitted does not mean that these positional embeddings are never used.)
This seems weird to me because I don't see why rope should be grouped with learned embeddings. It makes more sense to me for rope variants to also omit having an initial positional embedding (i.e., no self.wpe). I would also be more okay with all of them having an initial positional embedding, but this doesn't seem the standard way language models are implemented e.g., in llama.

Tagging @talkhanz who I think was the original author of this logic, and @jamaliki @jeffreyruffolo @NZ99 @pascalnotin for their thoughts.

The text was updated successfully, but these errors were encountered:

jamaliki · 2024-01-23T14:03:09Z

This is weird to me too! It should only be there for learned embeddings, IMO.

talkhanz · 2024-01-30T06:17:21Z

Hey @othertea. Thanks for pointing this out! I believe you are correct about ignoring the positional embeddings for rope and its variants. I think I was trying to push a bit too hard to make the code similar to tranception :P and so made this error in that spirit.
As I understand it, the if conditions and initialization within APTModel need to be rectified? I'll make them in another PR? @othertea
@pascalnotin let me know your thoughts?

othertea · 2024-01-31T05:28:06Z

Thanks for confirming my suspicions, @jamaliki and @talkhanz !
@talkhanz don't worry about doing anything, I'll make the PR with the updates and tag you! I'm thinking it might be better to wait until the mup PR #64 is merged so that we avoid possibly creating merge conflict problems for @NZ99 .

talkhanz pushed a commit to talkhanz/protein-lm-scaling that referenced this issue Mar 15, 2024

fixed position encoding schemes pos_embeds as per OpenBioML#65

64def51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

weird logic in positional embedding in APTModel (self.wpe)? #65

weird logic in positional embedding in APTModel (self.wpe)? #65

othertea commented Jan 21, 2024 •

edited

Loading

jamaliki commented Jan 23, 2024

talkhanz commented Jan 30, 2024

othertea commented Jan 31, 2024

weird logic in positional embedding in APTModel (self.wpe)? #65

weird logic in positional embedding in APTModel (self.wpe)? #65

Comments

othertea commented Jan 21, 2024 • edited Loading

jamaliki commented Jan 23, 2024

talkhanz commented Jan 30, 2024

othertea commented Jan 31, 2024

othertea commented Jan 21, 2024 •

edited

Loading