
Possible issues in "The Annotated Transformer" #6

Open
alexeyr opened this issue Jun 11, 2019 · 0 comments

Comments


alexeyr commented Jun 11, 2019

  1. In http://nlp.seas.harvard.edu/2018/04/03/attention.html#encoder, the paper text says

    the output of each sub-layer is LayerNorm(x+Sublayer(x))... We apply dropout (cite) to the output of each sub-layer, before it is added to the sub-layer input and normalized.

    but the code is

    return x + self.dropout(sublayer(self.norm(x)))
    

    It seems it should instead be

    return self.norm(x + self.dropout(sublayer(x)))
    

    to match the text.

  2. In Encoder and Decoder, where does the extra norm on top of the stack come from?

  3. The notebook says

    In our model, we share the same weight matrix between the two embedding layers and the pre-softmax linear transformation, similar to (cite). In the embedding layers, we multiply those weights by sqrt(d_model).

    This is described in http://nlp.seas.harvard.edu/2018/04/03/attention.html#additional-components-bpe-search-averaging, but it may be better to link to that section from the quoted passage; I couldn't find it at first.

  4. Should http://nlp.seas.harvard.edu/2018/04/01/attention.html link to the updated version http://nlp.seas.harvard.edu/2018/04/03/attention.html?
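To make the discrepancy in point 1 concrete, here is a minimal sketch (my own, using NumPy instead of PyTorch, with dropout omitted for simplicity) contrasting the two residual-connection orderings: what the notebook's code computes (normalize the input, then add the residual) versus what the paper's text describes, LayerNorm(x + Sublayer(x)). The function names and the stand-in sublayer are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize the last dimension to zero mean and unit variance.
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def sublayer_as_coded(x, sublayer):
    # What the notebook's code does: norm the *input*, add the residual,
    # and apply no norm to the output.
    return x + sublayer(layer_norm(x))

def sublayer_as_described(x, sublayer):
    # What the paper's text describes: LayerNorm(x + Sublayer(x)).
    return layer_norm(x + sublayer(x))

x = np.array([1.0, 2.0, 3.0])
f = lambda v: 2.0 * v  # stand-in for an attention / feed-forward sublayer

print(sublayer_as_coded(x, f))      # output is NOT normalized
print(sublayer_as_described(x, f))  # output has zero mean, unit variance
```

Note the two functions produce different outputs for the same input, so the code and the text genuinely disagree; this ordering difference is also why (point 2) the coded variant needs an extra norm at the top of the stack, since its per-layer outputs are never normalized.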

@alexeyr alexeyr changed the title SublayerConnection definition in "The Annotated Transformer" Possible issues in "The Annotated Transformer" Jun 11, 2019