hocr-parser

Python parser for hOCR files using lxml


hOCR is an open standard for representing the results of optical character recognition (OCR). The results of OCR (the recognized text, layout, styles, etc.) are represented in hOCR using XHTML. This Python module parses an existing hOCR file and gives easy access to the hOCR elements and their attributes.
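To make the format concrete, here is a minimal sketch of what reading hOCR involves. It deliberately does not use this module's API (which is not shown in this README); it uses only the standard library's `xml.etree.ElementTree`. The sample markup is hand-written for illustration, and the helper names `parse_title` and `words` are hypothetical. It assumes the common hOCR convention that element properties such as `bbox` are stored in the `title` attribute, separated by semicolons.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written hOCR fragment (illustrative, not real OCR output).
HOCR = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<div class="ocr_page" title="bbox 0 0 600 800">
<span class="ocr_line" title="bbox 10 20 300 50">
<span class="ocrx_word" title="bbox 10 20 120 50; x_wconf 96">Hello</span>
<span class="ocrx_word" title="bbox 130 20 300 50; x_wconf 91">world</span>
</span>
</div>
</body>
</html>"""

# hOCR is XHTML, so elements live in the XHTML namespace.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def parse_title(title):
    """Split an hOCR title attribute into a {property: [values]} dict."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


def words(hocr_text):
    """Return (text, bbox) pairs for every ocrx_word element."""
    root = ET.fromstring(hocr_text)
    result = []
    for span in root.iterfind(".//x:span[@class='ocrx_word']", NS):
        props = parse_title(span.get("title", ""))
        bbox = tuple(int(v) for v in props.get("bbox", []))
        result.append((span.text, bbox))
    return result


print(words(HOCR))
```

Each word comes back with its recognized text and its bounding box as a tuple of four integers. A dedicated parser saves you from writing this kind of attribute handling yourself.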

Install

Python 3.6+ is required, and you'll probably want to install this package in a virtual environment. Until I push the package to PyPI, you can install it directly from GitHub with pip:

pip install git+https://github.com/jlieth/hocr-parser

Similar projects

External links