target vocab size #35

mily33 · 2020-04-27T02:44:53Z

I found that the provided model has a vocabulary size of 525; however, after following the preprocessing steps, I got a vocabulary of size 496.

da03 · 2020-04-27T02:46:16Z

Hmm, did you preprocess the full training set, or the provided sample set?

mily33 · 2020-04-27T02:50:32Z

@da03 Emm, I preprocessed the full training set.

da03 · 2020-04-27T02:57:24Z

That's weird, can you load the provided pretrained model, find the vocab (https://github.com/harvardnlp/im2markup/blob/master/src/model/model.lua#L64) and then compare it to your vocabulary to see where they differ?
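A minimal sketch of that comparison in Python, assuming both vocabularies have already been dumped to plain-text files with one token per line. The helper names `load_vocab` and `diff_vocab` are hypothetical, and extracting the vocabulary from the Torch checkpoint is not shown here.

```python
def load_vocab(path):
    """Read a vocabulary file with one token per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f if line.strip()}


def diff_vocab(reference, candidate):
    """Return (tokens only in reference, tokens only in candidate), each sorted."""
    return sorted(reference - candidate), sorted(candidate - reference)


# Toy in-memory example; in practice both sets would come from load_vocab(...).
pretrained = {"\\alpha", "\\beta", "\\gamma", "x"}
mine = {"\\alpha", "x", "y"}

missing, extra = diff_vocab(pretrained, mine)
print("only in pretrained vocab:", missing)
print("only in my vocab:", extra)
```

Looking at which tokens fall on each side (rare symbols, special tokens, or tokenizer variants) usually narrows down whether the gap comes from the data split or from the preprocessing itself.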

mily33 · 2020-04-27T08:53:51Z

@da03 It's soooo weird. I downloaded both the processed files and the raw files from http://lstm.seas.harvard.edu/latex/data/. I got a vocabulary of size 496 from both sets of files (using "train_filter.lst"). When I used the "train.lst" file (without filtering), I got a size of 519, which is still not equal to yours. I also compared the vocabulary in the provided pretrained model with mine, but I could not find the exact cause of the difference.

mily33 · 2020-04-27T09:03:54Z

It seems to be caused by the different sizes of the training sets. I found that the vocab size in this repo is 499, and its training set has 76444 examples: https://github.com/ritheshkumar95/im2latex-tensorflow/tree/master/im2markup
My training set has 75275 examples after filtering, the same as your provided processed training set.
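To illustrate why the vocabulary size can track the training-set size, here is a minimal sketch that builds a vocabulary from whitespace-tokenized formulas. The function `build_vocab` and its `min_count` parameter are hypothetical and are not the repository's preprocessing script; the point is only that dropping formulas can remove tokens that appear in no other line.

```python
from collections import Counter


def build_vocab(formulas, min_count=1):
    """Collect tokens that occur at least min_count times across all formulas."""
    counts = Counter(tok for line in formulas for tok in line.split())
    return {tok for tok, c in counts.items() if c >= min_count}


# Toy data: the last formula is the only one that uses "\oint" and "f".
full = ["\\alpha + \\beta", "x ^ 2", "\\oint f"]
filtered = full[:2]  # pretend the filter dropped the last formula

vocab_full = build_vocab(full)
vocab_filtered = build_vocab(filtered)
print(len(vocab_full), len(vocab_filtered))
print("lost by filtering:", sorted(vocab_full - vocab_filtered))
```

With a real corpus, the same set difference between the filtered and unfiltered vocabularies would show which tokens account for the gap.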
