More robust testing for chaining sacremoses CLI #150

alvations · 2024-07-06T09:41:05Z

The CLI flags and the chaining of commands through a pipeline should be tested more robustly than just running the examples in README.md.

Not sure if this is still the case, but I found the following in https://aclanthology.org/2020.wmt-1.88.pdf:

During our early experiments we noticed several issues with our preprocessing pipeline which we fixed for the later experiments. In particular, we noticed that some sacremoses command line flags were broken, and the out-of-the-box inference tool from FairSeq did not fully replicate the preprocessing pipeline used for training (punctuation normalization and vocabulary-aware subword segmentation). The original pipeline (called v1) was used for our baseline models. The later experiments used the fixed implementations of sacremoses and FairSeq (denoted by v2).
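
As a starting point, here is a rough sketch of the kind of check I have in mind: run one chained invocation and compare it with feeding the same input through each command separately. The helper names `run_cli` and `chained_equals_stepwise` are made up for illustration and are not part of sacremoses. The sacremoses invocation in the trailing comment is an assumption based on the README-style usage and should be verified against `sacremoses --help` before relying on it.

```python
import subprocess


def run_cli(args, text):
    """Run a command with `text` on stdin and return what it writes to stdout."""
    result = subprocess.run(
        args, input=text, capture_output=True, text=True, check=True
    )
    return result.stdout


def chained_equals_stepwise(chained_args, step_args_list, text):
    """Return True if one chained call matches piping each step into the next."""
    expected = text
    for step_args in step_args_list:
        # Feed the previous step's output into the next step.
        expected = run_cli(step_args, expected)
    return run_cli(chained_args, text) == expected


# Hypothetical usage, assuming `sacremoses` is on PATH and that the
# subcommands can be chained the way the README examples suggest:
#
#   base = ["sacremoses", "-l", "en"]
#   assert chained_equals_stepwise(
#       base + ["normalize", "tokenize"],
#       [base + ["normalize"], base + ["tokenize"]],
#       "Hello, world!\n",
#   )
```

The same helper could be parametrized over each flag combination, so a broken flag shows up as a mismatch or a non-zero exit code (`check=True` raises `CalledProcessError` in that case).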
