target vocab size #35
Comments
Hmm, did you preprocess the full training set, or the provided sample set?
@da03 Emm, I preprocessed the full training set.
That's weird. Can you load the provided pretrained model, find the vocab (https://github.com/harvardnlp/im2markup/blob/master/src/model/model.lua#L64), and compare it with your vocabulary to see where they differ?
@da03 It's soooo weird. I downloaded both the processed files and the raw files from http://lstm.seas.harvard.edu/latex/data/. I got a vocabulary of size 496 from both (using "train_filter.lst"). When I use "train.lst" (without filtering), I get a vocabulary of size 519, still not equal to yours. I also compared the vocabulary in the provided pretrained model with mine, but I cannot find the exact cause of the discrepancy.
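For anyone running the same comparison, here is a minimal sketch, assuming both vocabularies have been exported as plain-text files with one token per line (the file names are placeholders, not files from the repo):

```python
# Compare two token vocabularies stored one token per line.
# "my_vocab.txt" and "pretrained_vocab.txt" are placeholder names.

def load_vocab(path):
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

mine = load_vocab("my_vocab.txt")
pretrained = load_vocab("pretrained_vocab.txt")

print("size (mine / pretrained):", len(mine), "/", len(pretrained))
print("only in mine:", sorted(mine - pretrained))
print("only in pretrained:", sorted(pretrained - mine))
```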
It seems to be caused by the different sizes of the training set. I found that the vocab size in that repo is 499, and its training set has 76444 examples: https://github.com/ritheshkumar95/im2latex-tensorflow/tree/master/im2markup
I found that the provided model has a vocabulary of size 525; however, following the preprocessing steps, I got a vocabulary of size 496.
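For context, a minimal sketch of how such a token vocabulary is typically derived, assuming the formulas are pre-tokenized with spaces. The choice of train list (filtered vs. unfiltered) and any minimum-frequency cutoff both change the final size, which would account for counts like 496, 499, 519, and 525. The file name and threshold below are hypothetical and not the repo's actual script:

```python
# Build a vocabulary by counting token frequencies over a formula file
# (one space-tokenized formula per line) and keeping tokens seen at least
# `min_count` times. The file name and threshold are illustrative only.
from collections import Counter

def build_vocab(formula_file, min_count=1):
    counts = Counter()
    with open(formula_file, encoding="utf-8") as f:
        for line in f:
            counts.update(line.strip().split())
    return sorted(tok for tok, c in counts.items() if c >= min_count)

vocab = build_vocab("train_formulas.lst", min_count=1)
# A smaller training list or a higher cutoff yields a smaller vocabulary.
print(len(vocab))
```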