language: ru, en
thumbnail:
tags: translation, fsmt
license: Apache 2.0
datasets: wmt19
metrics: bleu, sacrebleu

MyModel

Model description

This is a ported version of the fairseq WMT19 transformer for {src_lang}-{tgt_lang}.

For more details, please see Facebook FAIR's WMT19 News Translation Task Submission.

The abbreviation FSMT stands for FairSeqMachineTranslation.

All four models are available:

Intended uses & limitations

How to use

from transformers import FSMTForConditionalGeneration, FSMTTokenizer

mname = "facebook/wmt19-ru-en"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

# Named `text` rather than `input` to avoid shadowing the Python builtin.
text = "Машинное обучение - это здорово, не так ли?"
# Encode the Russian source sentence as a PyTorch tensor of token IDs.
input_ids = tokenizer.encode(text, return_tensors="pt")
outputs = model.generate(input_ids)
# Decode the first (and only) generated sequence back into text.
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded)  # Machine learning is great, isn't it?
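To translate several sentences in one call, the single-sentence example above can be wrapped in a small helper. This is a minimal sketch rather than part of the original model card: `load_fsmt` and `translate_batch` are hypothetical names, and `num_beams=5` is an arbitrary illustrative value. It assumes the tokenizer can pad a batch and return PyTorch tensors when called directly.

```python
from typing import List


def load_fsmt(mname: str = "facebook/wmt19-ru-en"):
    """Load tokenizer and model (hypothetical helper; imports transformers lazily)."""
    from transformers import FSMTForConditionalGeneration, FSMTTokenizer

    tokenizer = FSMTTokenizer.from_pretrained(mname)
    model = FSMTForConditionalGeneration.from_pretrained(mname)
    return tokenizer, model


def translate_batch(sentences: List[str], tokenizer, model, num_beams: int = 5) -> List[str]:
    """Translate a list of sentences with a single generate() call (hypothetical helper)."""
    if not sentences:
        return []
    # Pad to the longest sentence so the batch forms a rectangular tensor.
    batch = tokenizer(sentences, return_tensors="pt", padding=True)
    # Pass the attention mask so padding positions are ignored by the encoder.
    outputs = model.generate(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        num_beams=num_beams,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Usage would look like `tokenizer, model = load_fsmt()` followed by `translate_batch(["Привет, мир!", "Как дела?"], tokenizer, model)`. Keeping model loading separate from translation means the weights are downloaded and loaded once, however many batches are translated.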

Limitations and bias

  • Neither the original model nor this port seems to handle inputs with repeated sub-phrases well: the content gets truncated.
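One way to screen for this limitation is to check inputs for repeated word sequences and to flag translations that come out much shorter than their source. The sketch below is a rough heuristic and not something the original model card provides: `repeated_ngrams` and `looks_truncated` are hypothetical helpers, and the n-gram size of 3 and the length ratio of 0.5 are arbitrary assumptions. Word-count ratios between Russian and English vary, so a flagged sentence is only a candidate for manual review.

```python
from collections import Counter
from typing import List


def repeated_ngrams(text: str, n: int = 3) -> List[str]:
    """Return word n-grams that occur more than once in `text` (hypothetical helper)."""
    words = text.lower().split()
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(grams)
    # Keep first-occurrence order and report each repeated n-gram once.
    seen = set()
    result = []
    for gram in grams:
        if counts[gram] > 1 and gram not in seen:
            seen.add(gram)
            result.append(gram)
    return result


def looks_truncated(source: str, translation: str, min_ratio: float = 0.5) -> bool:
    """Flag a translation whose word count is below `min_ratio` of the source's."""
    src_len = len(source.split())
    if src_len == 0:
        return False
    return len(translation.split()) / src_len < min_ratio
```

A caller could run `repeated_ngrams` on each input before translation and `looks_truncated` on each input/output pair afterwards, then inspect only the sentences that trip either check.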

Training data

Pretrained weights were left identical to the original model released by fairseq. For more details, please see the paper.

Training procedure

Eval results

pair | fairseq | transformers
------|---------|-------------
ru-en | 41.3 | 39.20
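The gap between the two columns can be computed directly. The snippet below is a small sketch that copies the two scores from the table above, assuming both are BLEU scores as the metadata suggests; `bleu_gap` is a hypothetical helper name.

```python
# Scores copied from the results table above (assumed to be BLEU).
scores = {"ru-en": {"fairseq": 41.3, "transformers": 39.20}}


def bleu_gap(pair: str) -> float:
    """Return how many points the port scores below the original (hypothetical helper)."""
    row = scores[pair]
    # Round to two decimals to hide floating-point noise in the subtraction.
    return round(row["fairseq"] - row["transformers"], 2)


print(bleu_gap("ru-en"))  # 2.1
```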

The score was calculated using this code:

git clone https://github.com/huggingface/transformers
cd transformers
export PAIR=ru-en
export DATA_DIR=data/$PAIR
export SAVE_DIR=data/$PAIR
export BS=8
export NUM_BEAMS=15
mkdir -p $DATA_DIR
sacrebleu -t wmt19 -l $PAIR --echo src > $DATA_DIR/val.source
sacrebleu -t wmt19 -l $PAIR --echo ref > $DATA_DIR/val.target
echo $PAIR
PYTHONPATH="src:examples/seq2seq" python examples/seq2seq/run_eval.py \
    facebook/wmt19-$PAIR \
    $DATA_DIR/val.source \
    $SAVE_DIR/test_translations.txt \
    --reference_path $DATA_DIR/val.target \
    --score_path $SAVE_DIR/test_bleu.json \
    --bs $BS \
    --task translation \
    --num_beams $NUM_BEAMS

BibTeX entry and citation info

@inproceedings{...,
  year={2020},
  title={Facebook FAIR's WMT19 News Translation Task Submission},
  author={Ng, Nathan and Yee, Kyra and Baevski, Alexei and Ott, Myle and Auli, Michael and Edunov, Sergey},
  booktitle={Proc. of WMT},
}