Provide a script to cleanly download and normalize text #3
Labels: enhancement, good first issue, help wanted
Rather than the current system, in which each sub-corpus lives in its own folder with its own code, create a top-level downloads.sh that can re-assemble the sub-corpora. Separately, have the downloaded and pre-processed sub-corpora ready to be referenced from the ADR and NMT repos as submodules, etc.
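A top-level downloads.sh could stay thin and hand the per-corpus work to a small driver. The Python sketch below shows one possible shape for that driver. It rests on assumptions that are not part of this issue: each sub-corpus folder may contain a hypothetical `download.sh` that fetches its data, and that fetch step leaves the text in a hypothetical `raw.txt`. "Normalize" is interpreted here as Unicode NFC plus whitespace cleanup. The real repo's layout and normalization rules may differ.

```python
"""Hypothetical sketch of the logic a top-level downloads.sh could drive.

Assumed layout (not taken from the repo): every sub-corpus lives in
<root>/<name>/, may have its own fetch script, and leaves its text in raw.txt.
"""
import re
import subprocess
import unicodedata
from pathlib import Path

RAW_NAME = "raw.txt"          # hypothetical per-sub-corpus output file
FETCH_SCRIPT = "download.sh"  # hypothetical per-sub-corpus fetch script


def normalize_text(text: str) -> str:
    """Apply NFC normalization, collapse runs of spaces/tabs, drop blank lines."""
    text = unicodedata.normalize("NFC", text)
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line) + "\n"


def fetch(subdir: Path) -> None:
    """Run the sub-corpus's own fetch script, if it has one."""
    script = subdir / FETCH_SCRIPT
    if script.is_file():
        subprocess.run(["bash", str(script)], cwd=subdir, check=True)


def assemble(root: Path, out_dir: Path, run_fetch: bool = True) -> list[str]:
    """Fetch every sub-corpus under root and write normalized copies to out_dir.

    Returns the names of the sub-corpora that produced output.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for subdir in sorted(p for p in root.iterdir() if p.is_dir()):
        if run_fetch:
            fetch(subdir)
        raw = subdir / RAW_NAME
        if not raw.is_file():
            continue  # nothing downloaded for this sub-corpus
        clean = normalize_text(raw.read_text(encoding="utf-8"))
        (out_dir / f"{subdir.name}.txt").write_text(clean, encoding="utf-8")
        written.append(subdir.name)
    return written
```

For example, `assemble(Path("corpora"), Path("normalized"))` would fetch and normalize every sub-corpus under `corpora/` into one flat `normalized/` directory. A single output directory like that is one convenient thing to publish as its own repo, so that ADR and NMT could pull it in as a submodule instead of re-running the downloads.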