Text helpers

A collection of scripts for preparing text files for tasks such as automatic speech recognition (ASR).

See the README.md file in each folder for usage instructions, and the README and license information in each folder for specific copyright and licensing details.

Contents

PDF Cleaner

This script extracts text from a PDF file.

ZMS

A collection of scripts written by Romi Hill (Appen) and Zara Maxwell-Smith (CoEDL) for extracting text from PDF and other file formats, compiling and cleaning corpora, and experimenting with external lexicons for cleaning and corpus analysis.