A deep learning toolkit specialized for handwritten document analysis

Transkribus/PyLaia

PyLaia

PyLaia is a device-agnostic, PyTorch-based deep learning toolkit specialized for handwritten document analysis. It is also the successor to Laia.

Disclaimer: The easiest way to learn how to use PyLaia is to follow the IAM example for HTR. Apologies for not having better documentation at the moment; I will keep improving it and adding other examples.

Installation

To install PyLaia, run the following commands:

git clone https://github.com/jpuigcerver/PyLaia
cd PyLaia
pip install -r requirements.txt
python setup.py install

The following Python scripts will be installed in your system:

  • pylaia-htr-create-model: Create a VGG-like model with BLSTMs on top for handwritten text recognition. The script has several options to customize the model. The architecture is based on the paper "Are Multidimensional Recurrent Layers Really Necessary for Handwritten Text Recognition?" (2017) by J. Puigcerver.
  • pylaia-htr-decode-ctc: Decode text-line images using a trained model and the CTC algorithm. It also outputs the segmentation boundaries of the recognized characters/words. For example, to obtain the recognized hypotheses along with the corresponding word segmentation boundaries:
    pylaia-htr-decode-ctc \
      --print_args True \
      --train_path ./model \
      --model_filename model \
      --batch_size 24 \
      --print_img_ids \
      --use_letters \
      --separator=" " \
      --space "<space>" \
      --print_word_segm \
      --join_str " " \
      symbols.txt Lines-Processed test.lst | less
  • pylaia-htr-train-ctc: Train a model using the CTC algorithm and a set of text-line images and their transcripts.
  • pylaia-htr-netout: Dump the output of the model for a set of text-line images in order to decode using an external language model.
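
After installation, it can be useful to confirm that the four scripts listed above are actually reachable from your shell. The following is a minimal sketch, not part of PyLaia itself: the function name check_pylaia_scripts is hypothetical, and it only assumes that the scripts are installed under the names given in the list.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with PyLaia): report which of the
# entry points named in this README can be found on the PATH.
check_pylaia_scripts() {
  missing=0
  for script in pylaia-htr-create-model pylaia-htr-train-ctc \
                pylaia-htr-decode-ctc pylaia-htr-netout; do
    # `command -v` is the POSIX way to test whether a command is resolvable.
    if command -v "$script" >/dev/null 2>&1; then
      echo "found:   $script"
    else
      echo "missing: $script"
      missing=$((missing + 1))
    fi
  done
  # The exit status is the number of missing scripts (0 means all were found).
  return "$missing"
}

check_pylaia_scripts || echo "Some scripts are not on the PATH; re-run the installation steps."
```

If any script is reported as missing, a likely cause is that the directory where pip or setup.py placed the scripts is not on your PATH.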

Some examples need additional tools and packages that are not installed by pip install -r requirements.txt. For instance, ImageMagick is typically used to process images, and Kaldi is used to perform Viterbi decoding (and lattice generation), combining the output of the neural network with an n-gram language model.
