
make_embedding_matrix assumes that the words are fed in the same order as the vocab #22

Open
sumeetsk opened this issue Oct 20, 2019 · 0 comments


I'm studying 5_3_Document_Classification_with_CNN.

The make_embedding_matrix helper's docs say it should be fed a list of words in the dataset. However, for the embedding matrix to return the correct pretrained embedding for each word, the word list must be passed in the same order as the vocabulary, and the vocabulary's word indices must have no gaps. These are big assumptions.

I think the correct way to construct the embedding matrix is to pass the vocab to make_embedding_matrix and use the vocab's token_to_idx mapping to determine which row of the embedding matrix each word's pretrained vector should populate.

Correct me if I'm wrong.
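For illustration, here is a minimal sketch of what I mean. It assumes the vocab exposes a token_to_idx dict (as in the book's Vocabulary class) and that the pretrained embeddings are available as a word-to-vector dict; the function name and signature here are hypothetical, not the notebook's actual API:

```python
import numpy as np

def make_embedding_matrix(word_to_vec, token_to_idx, embedding_dim):
    """Build an embedding matrix whose row i holds the pretrained vector
    for the vocabulary token whose index is i.

    word_to_vec:  dict mapping word -> pretrained vector (np.ndarray)
    token_to_idx: the vocab's token -> index mapping
    """
    vocab_size = max(token_to_idx.values()) + 1
    # Rows for words absent from the pretrained embeddings get a small
    # random initialization instead of being left at zero.
    matrix = np.random.uniform(-0.25, 0.25, (vocab_size, embedding_dim))
    for token, idx in token_to_idx.items():
        if token in word_to_vec:
            # Indexing by the vocab's own idx means row order no longer
            # depends on the order the words were fed in.
            matrix[idx] = word_to_vec[token]
    return matrix
```

Because each row is addressed through token_to_idx, the result is correct regardless of iteration order, and gaps in the index space simply stay randomly initialized.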
