Found some data label inconsistencies #23
Comments
Hey, did you open the files correctly? See this quote from the Zenodo webpage:
Sorry to waste your time. I looked at the webpage again and checked what you said.
f = open("./im2latex_formulas.lst", encoding="ISO-8859-1", newline="\n")
Hmm, that is peculiar: I downloaded the
f = open("./im2latex_formulas.lst", newline="\n")
len(f.readlines())
Out[11]: 103559
f = open("./im2latex_formulas.lst", encoding="ISO-8859-1", newline="\n")
len(f.readlines())
Out[13]: 103559
I do not think changing the encoding helps; the difference comes from the way newlines are handled on different OSes.
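To illustrate the newline point, here is a minimal sketch that counts lines in a small synthetic file under two `newline` settings of Python's built-in `open()`. The file contents and the helper name `count_lines` are made up for the demo and are not taken from the dataset; the assumption is that some formulas contain a stray carriage return inside a record.

```python
import os
import tempfile


def count_lines(path, newline):
    """Count lines in `path` using the given `newline` mode of open()."""
    with open(path, encoding="ISO-8859-1", newline=newline) as f:
        return len(f.readlines())


# Synthetic demo file (not the real dataset): three "\n"-terminated records,
# the second one containing a stray carriage return.
with tempfile.NamedTemporaryFile("wb", suffix=".lst", delete=False) as tmp:
    tmp.write(b"a + b\nx \r y\nc = d\n")
    demo_path = tmp.name

universal = count_lines(demo_path, None)  # splits on "\n", "\r", and "\r\n"
strict = count_lines(demo_path, "\n")     # splits on "\n" only
os.remove(demo_path)

print(universal, strict)  # prints: 4 3
```

With the default universal-newlines mode the stray `\r` starts an extra line, so every record after it shifts by one; passing `newline="\n"` keeps one record per line.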
Excuse me, I am also interested in this project. Are you still working on formula recognition? Have you successfully reproduced the EM results from the paper?
im2latex_train.lst contains the entry: 51238 1a00a76d4e basic
The formulas around line 51238 in im2latex_formulas.lst do not match the LaTeX shown in image 1a00a76d4e.
1a00a76d4e should instead point to line 51729 in im2latex_formulas.lst.
I have found several cases like this, but I am not sure how many there are.
I downloaded the data from https://zenodo.org/record/56198#.XZ7yK_n_yHt.
Is anything wrong?
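A mismatch like this can arise when the formulas file is split on more than just `\n`. The sketch below is only an illustration on synthetic data: it assumes each entry in im2latex_train.lst has the form "formula index, image name, render type" (as in the entry quoted above) and that the index is a 0-based line number, which is an assumption here. The names `lookup`, `raw`, and `abc123` are hypothetical.

```python
def lookup(train_entry, formulas):
    """Return (image_name, formula) for one train-list entry.

    Assumes the entry is "<formula_idx> <image_name> <render_type>" and that
    formula_idx is a 0-based index into `formulas` (an assumption).
    """
    idx, image_name, _render_type = train_entry.split()
    return image_name, formulas[int(idx)]


# Synthetic stand-in for the formulas file: record 1 contains a stray "\r".
raw = "f0\nf1 \r stray\nf2\n"
train_entry = "2 abc123 basic"  # hypothetical entry pointing at record 2

good_formulas = raw.split("\n")   # split on "\n" only
bad_formulas = raw.splitlines()   # also splits on "\r", shifting later records

print(lookup(train_entry, good_formulas))  # ('abc123', 'f2')
print(lookup(train_entry, bad_formulas))   # ('abc123', ' stray')
```

Once an extra line boundary appears, every later index points at the wrong formula, which would be consistent with the correct formula being found several hundred lines further down.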