A Transformer implementation adapted from an implementation from Harvard.
The Levenshtein Transformer implementation is built on top of the Transformer above, with many elements borrowed from Fairseq.