More robust testing for chaining sacremoses CLI #150

Open
alvations opened this issue Jul 6, 2024 · 0 comments

@alvations
Contributor

The CLI flags and chaining through pipelines should be tested more robustly than with just the examples in README.md.

I'm not sure whether this is still the case, but I found the following in https://aclanthology.org/2020.wmt-1.88.pdf:

During our early experiments we noticed several issues with our preprocessing pipeline which we fixed for the later experiments. In particular, we noticed that some sacremoses command line flags were broken, and the out-of-the-box inference tool from FairSeq did not fully replicate the preprocessing pipeline used for training (punctuation normalization and vocabulary-aware subword segmentation). The original pipeline (called v1) was used for our baseline models. The later experiments used the fixed implementations of sacremoses and FairSeq (denoted by v2).
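
As a rough starting point for such tests, one option is to check that a single chained invocation produces the same output as running each stage in its own process. The sketch below is only an illustration: the helper names (`run_cli`, `piped`, `chain_matches_pipe`, `check_sacremoses_chain`) are hypothetical, and the `sacremoses -l en normalize tokenize` command line is assumed from the chaining style shown in README.md, not verified against the current CLI.

```python
import shutil
import subprocess


def run_cli(argv, text):
    """Run a command with `text` on stdin and return its stdout as a string."""
    result = subprocess.run(
        argv, input=text, capture_output=True, text=True, check=True
    )
    return result.stdout


def piped(stages, text):
    """Feed `text` through each command in `stages`, one process per stage."""
    for argv in stages:
        text = run_cli(argv, text)
    return text


def chain_matches_pipe(chained_argv, stages, text):
    """Return True if one chained call gives the same output as separate calls."""
    return run_cli(chained_argv, text) == piped(stages, text)


def check_sacremoses_chain(sample="Hello , world !  It 's a  test .\n"):
    """Compare chained vs. piped normalize+tokenize; None if sacremoses is absent.

    Assumption: the subcommands `normalize` and `tokenize` can be chained
    after the global `-l en` option, as in the README.md examples.
    """
    if shutil.which("sacremoses") is None:
        return None
    base = ["sacremoses", "-l", "en"]
    return chain_matches_pipe(
        base + ["normalize", "tokenize"],
        [base + ["normalize"], base + ["tokenize"]],
        sample,
    )
```

Calling `check_sacremoses_chain()` returns `True` or `False` when the CLI is installed, and `None` otherwise. The same helper could be parametrized over flag combinations so that each flag is exercised both alone and inside a chain.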
