
Scripts to use for preparing corpora for Elpis and other language tools


CoEDL/text-helpers


Text helpers

A collection of scripts for preparing text files for tasks such as automatic speech recognition (ASR).

See the README.md file in each folder for details on how to use its scripts. Each folder's README and License information also gives the specific copyright and license terms that apply to it.
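Text for ASR language models is usually normalised first, so that the same word is always written the same way. The sketch below shows one common approach and is not taken from any script in this repository. It lowercases each line, removes punctuation while keeping apostrophes and hyphens that fall inside a word, collapses whitespace, and drops lines that end up empty. The function names `normalise_line` and `normalise_corpus` are hypothetical, and so are the normalisation rules, since a real corpus may need language-specific choices.

```python
import re
import unicodedata
from typing import List


def normalise_line(line: str) -> str:
    """Hypothetical normaliser: lowercase, strip punctuation, collapse whitespace."""
    # NFC keeps composed characters (e.g. accented letters) in one consistent form.
    line = unicodedata.normalize("NFC", line).lower()
    # Replace anything that is not a word character, whitespace, apostrophe or hyphen.
    line = re.sub(r"[^\w\s'-]", " ", line)
    # Drop apostrophes/hyphens that are not between two word characters (e.g. "--").
    line = re.sub(r"(?<!\w)['-]|['-](?!\w)", " ", line)
    # Collapse runs of whitespace into single spaces and trim the ends.
    return " ".join(line.split())


def normalise_corpus(text: str) -> List[str]:
    """Normalise each line of a corpus and drop lines that become empty."""
    cleaned_lines = []
    for raw in text.splitlines():
        cleaned = normalise_line(raw)
        if cleaned:
            cleaned_lines.append(cleaned)
    return cleaned_lines
```

For example, `normalise_corpus("Hello, World!\n\n  It's -- fine.")` gives `["hello world", "it's fine"]`. In Python 3, `\w` matches Unicode letters, so characters such as `ŋ` are kept rather than stripped.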

Contents

PDF Cleaner

This script extracts text from a PDF file.
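Text extracted from a PDF usually needs some tidying before it is useful as a corpus. This README does not describe what the PDF Cleaner script does internally, so the following is only a hypothetical, standard-library sketch of common post-extraction cleanup. It assumes the raw text has already been extracted by some PDF tool. It drops lines that contain only a page number, rejoins words hyphenated across a line break, and merges wrapped lines back into paragraphs. `clean_pdf_text` is an illustrative name. Be aware that the dehyphenation rule will also join genuine hyphenated compounds that happen to break at a line end.

```python
import re


def clean_pdf_text(raw: str) -> str:
    """Hypothetical cleanup of text already extracted from a PDF."""
    # Drop lines that contain only a page number, e.g. "12" or "- 12 -".
    kept = [ln for ln in raw.splitlines() if not re.fullmatch(r"\s*-?\s*\d+\s*-?\s*", ln)]
    text = "\n".join(kept)
    # Rejoin words split across a line break: "implemen-\ntation" -> "implementation".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Treat blank lines as paragraph breaks; join wrapped lines inside each paragraph.
    paragraphs = re.split(r"\n\s*\n", text)
    return "\n\n".join(" ".join(p.split()) for p in paragraphs if p.strip())
```

For example, `"The implemen-\ntation works\nwell.\n\n12\n\nNext para-\ngraph here."` becomes `"The implementation works well.\n\nNext paragraph here."`.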

ZMS

A collection of scripts, written by Romi Hill (Appen) and Zara Maxwell-Smith (CoEDL), for extracting text from PDF (and other format) files, compiling and cleaning corpora, and experimenting with external lexicons for cleaning and corpus analysis.
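A simple way to use an external lexicon in corpus analysis is to measure how many corpus tokens it covers and to list the tokens it does not cover. Those tokens are often typos, extraction errors, or words missing from the lexicon. The sketch below illustrates that idea and is not the ZMS scripts' actual method. It assumes a hypothetical lexicon format with one entry per line, where the headword is the first tab-separated field. The function names `load_lexicon` and `lexicon_coverage` are illustrative.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def load_lexicon(lines: Iterable[str]) -> Set[str]:
    """Collect lowercased headwords (assumed: first tab-separated field per line)."""
    words = set()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            words.add(line.split("\t")[0].lower())
    return words


def lexicon_coverage(tokens: Iterable[str], lexicon: Set[str]) -> Tuple[float, Counter]:
    """Return the fraction of tokens found in the lexicon and a Counter of the rest."""
    total = 0
    known = 0
    out_of_lexicon = Counter()
    for tok in tokens:
        total += 1
        word = tok.lower()
        if word in lexicon:
            known += 1
        else:
            out_of_lexicon[word] += 1
    # Avoid dividing by zero on an empty corpus.
    coverage = known / total if total else 0.0
    return coverage, out_of_lexicon
```

With a toy lexicon containing `cat`, `sat` and `mat`, the tokens of `"The cat sat on the mat"` give a coverage of `0.5`, and the out-of-lexicon counts are `the` (2) and `on` (1). Sorting that Counter with `most_common()` is a quick way to see which gaps matter most.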
