Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts
This repository contains the code used in the Mixture-of-Supernets (MoS) work.
| Folder/File | Experiments |
|---|---|
| `mos-mt/mos-mt/` | Machine Translation |
| `mos-bert/mos-bert/` | BERT Pretraining |
If you use this code, please cite:
    @inproceedings{jawahar2024mos,
      title     = {Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts},
      author    = {Ganesh Jawahar and Haichuan Yang and Yunyang Xiong and Zechun Liu and Dilin Wang and Fei Sun and Meng Li and Aasish Pappu and Barlas Oguz and Muhammad Abdul-Mageed and Laks V. S. Lakshmanan and Raghuraman Krishnamoorthi and Vikas Chandra},
      booktitle = {Findings of the Association for Computational Linguistics: ACL 2024},
      year      = {2024},
    }
This repository is GPL-licensed.