Hello. Thanks for sharing your work.
I trained a model following the steps in the README and ran the evaluation using run_all_evaluator.sh.
It turns out that most of the metrics are identical to the results reported in your paper, except PPL.
The results for my trained model are:
ll_scores: [(-9.701861720617387, 106.5074394250216), (-10.269295644873736, 120.9065905583248)]
The mean PPL is 113.7.
However, according to the paper it should be around 32.
I think this may be due to a different vocabulary or to training KenLM on a different corpus. I used yelp_corpus_adapter directly for data preparation and yelp/reviews-train.txt to train KenLM.
Did I miss something?
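For what it's worth, this is how I understand the PPL computation from the KenLM scores (a minimal sketch assuming the standard kenlm Python bindings; the .arpa path and file names are placeholders from my setup). The mean PPL of 113.7 above is just the average of the two file-level perplexities, (106.5 + 120.9) / 2.

```python
# Minimal sketch: per-sentence perplexity from KenLM log10 scores
# (assumes the standard `kenlm` Python bindings; paths are placeholders).
import kenlm

model = kenlm.Model("yelp/reviews-train.arpa")  # LM trained on reviews-train.txt

def sentence_ppl(sentence: str) -> float:
    # model.score returns the total log10 probability of the sentence,
    # including the end-of-sentence token when eos=True.
    log10_prob = model.score(sentence, bos=True, eos=True)
    num_tokens = len(sentence.split()) + 1  # +1 for the </s> token
    return 10.0 ** (-log10_prob / num_tokens)
    # (kenlm's model.perplexity(sentence) computes the same quantity)

# Mean PPL over one file of generated sentences
with open("generated_sentences.txt") as f:
    ppls = [sentence_ppl(line.strip()) for line in f]
print(sum(ppls) / len(ppls))
```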
I have the same issue. I tried training the language model on the dev and test splits as well but got a similar PPL. Note that overall_evaluator.py needs two changes: line 62 should become `ll_score, ppl_score = language_fluency.score_generated_sentences(generated_text_file_path, options.language_model_path)` and line 68 should become `ll_scores.append(ppl_score)`, because `score_generated_sentences` returns a tuple of negative log likelihood and perplexity and the script previously output the whole tuple (this might be related to the KenLM version).
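For context, the patched section would look roughly like this (the surrounding loop and variable names are my reconstruction, not the actual script; only the two quoted lines above are the real edits):

```python
# Hypothetical reconstruction of the relevant block in overall_evaluator.py;
# only the two edited lines are taken verbatim from the fix above.
ll_scores = list()
for generated_text_file_path in generated_text_file_paths:  # loop variable is a guess
    # line 62: unpack the (negative log likelihood, perplexity) tuple
    ll_score, ppl_score = language_fluency.score_generated_sentences(
        generated_text_file_path, options.language_model_path)
    # line 68: append only the perplexity, not the whole tuple
    ll_scores.append(ppl_score)
print("ll_scores:", ll_scores)
```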