Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts
This repository contains the code used in the Mixture-of-Supernets (MoS) work.
| Folder/File | Experiments |
|---|---|
| `mos-mt/mos-mt/` | Machine Translation |
| `mos-bert/mos-bert/` | BERT Pretraining |
If you use this code, please cite:
    @inproceedings{jawahar2024mos,
      title     = {Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts},
      author    = {Ganesh Jawahar and Haichuan Yang and Yunyang Xiong and Zechun Liu and Dilin Wang and Fei Sun and Meng Li and Aasish Pappu and Barlas Oguz and Muhammad Abdul-Mageed and Laks V. S. Lakshmanan and Raghuraman Krishnamoorthi and Vikas Chandra},
      booktitle = {Findings of the Association for Computational Linguistics: ACL 2024},
      year      = {2024},
    }
This repository is GPL-licensed.