Ensemble of All

Repository containing an Ensemble of Language Models evaluated on several LM benchmarks.

Thanks to this work, one does not need to create an individual model that sets a new state of the art to attract attention; it is sufficient to develop a model that learns patterns the other models do not. Thus, even a suboptimal model can be valuable.

quick start

To run the evaluation on all datasets:

$ python main.py

This will compute the optimal weighted combination of the models in the ensemble for each dataset. It will also generate an index.html file containing graphs and charts: LM Leaderboard
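For intuition, here is a minimal sketch of what optimizing such mixture weights might look like. The file names, the one-probability-per-line format, and the use of `scipy.optimize.minimize` are assumptions for illustration, not the repository's actual implementation:

```python
import numpy as np
from scipy.optimize import minimize

def ensemble_perplexity(weights, probs):
    """Perplexity of a weighted mixture of per-word probabilities.

    probs: array of shape (n_models, n_words), each row holding the
    probability one model assigns to every word of the corpus.
    """
    mixture = weights @ probs                      # shape (n_words,)
    return np.exp(-np.mean(np.log(mixture)))

def optimal_weights(probs):
    """Find mixture weights on the simplex that minimize perplexity."""
    n_models = probs.shape[0]

    # Parametrize weights with a softmax so they stay positive and sum to 1.
    def objective(logits):
        w = np.exp(logits - logits.max())
        w /= w.sum()
        return ensemble_perplexity(w, probs)

    res = minimize(objective, np.zeros(n_models), method="Nelder-Mead")
    w = np.exp(res.x - res.x.max())
    return w / w.sum()

# Hypothetical usage: two models' per-word probabilities on the
# validation set (file paths and format are assumptions).
valid_probs = np.stack([
    np.loadtxt("penntreebank/model_a.valid.txt"),
    np.loadtxt("penntreebank/model_b.valid.txt"),
])
weights = optimal_weights(valid_probs)
print("weights:", weights, "valid ppl:", ensemble_perplexity(weights, valid_probs))
```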

results

Perplexity on validation and test with Ensemble of All:

| dataset      | valid ppl | test ppl |
|--------------|-----------|----------|
| penntreebank | 48.92     | 47.31    |
| wikitext-2   | 55.40     | 53.73    |
| wikitext-103 | 13.12     | 13.29    |

addition of new models

New models can easily be added to the ensemble by producing their per-word probabilities on the validation and test sets and placing them into the dataset-specific folders. We have reproduced most of the open-sourced models; their per-word probabilities can be found in the individual folders PennTreeBank, Wikitext-2, and Wikitext-103.
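As a sketch of producing such a file, a new model's probabilities might be dumped like this. The `(seq_len, vocab)` output shape, the one-value-per-line format, and the file naming are assumptions based on the description above:

```python
import numpy as np
import torch

def dump_word_probs(model, token_ids, out_path):
    """Write the probability the model assigns to each target word
    of the corpus, one value per line (format is an assumption)."""
    with torch.no_grad():
        logits = model(token_ids[:-1])             # assumed (seq_len, vocab) output
        log_probs = torch.log_softmax(logits, dim=-1)
        targets = token_ids[1:]                    # next-word targets
        idx = torch.arange(len(targets))
        word_probs = log_probs[idx, targets].exp() # p(target word) per position
    np.savetxt(out_path, word_probs.cpu().numpy())

# e.g. dump_word_probs(my_model, valid_ids, "wikitext-2/my_model.valid.txt")
```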

citation

Please cite the arXiv paper if you use our work:

@misc{herel2023advancing,
      title={Advancing State of the Art in Language Modeling}, 
      author={David Herel and Tomas Mikolov},
      year={2023},
      eprint={2312.03735},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

License

Our code is MIT licensed. See the LICENSE file for details.
