Group 17
report: https://sii-czxy.feishu.cn/wiki/Flu7wMvuCisUTekVYBccFWnKnbg?from=from_copylink
implemented version & terminal command:
- single card:
python main_singlecard.py -n official_singlecard
- torch.DDP:
python -m torch.distributed.launch --nproc_per_node 4 main_ddp.py -n official_ddp
- parameter-server:
python main_paramserver.py --name official_paramserver
- all-reduce tree:
python main_allreduce_tree.py --name official_allreduce_tree
- all-reduce ring:
python main_allreduce_ring.py --name official_allreduce_ring
- vanilla selfDDP:
python -m torch.distributed.launch --nproc_per_node 4 main_selfddp.py -n official_selfddp
- syncBN selfDDP:
python -m torch.distributed.launch --nproc_per_node 4 main_selfddp_syncbn.py -n official_selfddp_syncbn
all the experiment logs used in the report can be found in dir outputs
.