A list of Transformer papers
- Attention Is All You Need, NIPS 2017 (paper): the original Transformer paper
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context, ACL 2019 (paper) (pytorch & tensorflow code): segment-level recurrence and relative position encoding
- COMET: Commonsense Transformers for Automatic Knowledge Graph Construction, ACL 2019 (paper) (pytorch code): generates knowledge graph tuples
- Adaptive Attention Span in Transformers, ACL 2019 (paper) (pytorch code): adaptive attention span
- XLNet: Generalized Autoregressive Pretraining for Language Understanding, arXiv 2019 (paper) (tensorflow code): permutation language modeling
- Syntactically Supervised Transformers for Faster Neural Machine Translation, ACL 2019 (paper) (pytorch code): non-autoregressive decoding
- Fine-tuning Pre-Trained Transformer Language Models to Distantly Supervised Relation Extraction, ACL 2019 (paper) (pytorch code): relation extraction
- Learning Deep Transformer Models for Machine Translation, ACL 2019 (paper) (pytorch code): pre-norm residual connections for training deep encoders
- Large Batch Optimization for Deep Learning: Training BERT in 76 Minutes, arXiv 2019 (paper): large-batch distributed training with the LAMB optimizer
- Universal Transformers, ICLR 2019 (paper) (tensorflow code): recurrent Transformer blocks
- Lattice Transformer for Speech Translation, ACL 2019 (paper): attention over lattice inputs (directed acyclic graphs, DAGs)
- ERNIE: Enhanced Language Representation with Informative Entities, ACL 2019 (paper) (code): incorporates knowledge graph entities into language representations
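All of the papers above build on the scaled dot-product attention introduced in "Attention Is All You Need". A minimal NumPy sketch of that core operation (illustrative only: a single head, no masking and no learned projections):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (seq_len_q, d_k), K: (seq_len_k, d_k), V: (seq_len_k, d_v).
    Returns attended values of shape (seq_len_q, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (seq_len_q, seq_len_k)
    # numerically stable row-wise softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# toy example: 3 query positions, 4 key/value positions, d_k = d_v = 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8)
```

The 1/sqrt(d_k) scaling keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with vanishing gradients; the entries above mostly vary what the attention is computed over (segments, lattices, adaptive spans) rather than this core formula.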