HOCkeR

Python package for combining hOCR files and images into searchable PDFs

What is hOCkeR?

HOCkeR is a Python package for combining hOCR files and images into searchable PDFs. It lays the recognized text on top of the image and writes both to a single PDF. The code is based on HOCRConverter by jbrinley, which was written for Python 2 and does not run on newer versions of Python, so I created this package as an update to the original code.
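
To place text over the image, each word's position has to come from the hOCR file. In the hOCR format, every element carries its bounding box in the `title` attribute as `bbox x0 y0 x1 y1`. The sketch below shows one way to read those boxes using only the standard library. It is an illustration of the file format, not hOCkeR's own code; the `WordBoxParser` class and the sample markup are hypothetical.

```python
import re
from html.parser import HTMLParser

# hOCR stores pixel coordinates in the title attribute: "bbox x0 y0 x1 y1; ..."
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class WordBoxParser(HTMLParser):
    """Collect (text, bbox) pairs for elements of one hOCR class (hypothetical helper)."""

    def __init__(self, element_class="ocrx_word"):
        super().__init__()
        self.element_class = element_class
        self.words = []
        self._bbox = None  # set while inside a matching element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        match = BBOX_RE.search(attrs.get("title") or "")
        if self.element_class in classes and match:
            self._bbox = tuple(int(v) for v in match.groups())

    def handle_data(self, data):
        # Attach the first non-empty text run to the pending bounding box
        if self._bbox is not None and data.strip():
            self.words.append((data.strip(), self._bbox))
            self._bbox = None


# Made-up sample in the style of Tesseract's hOCR output
sample = (
    "<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Hello</span> "
    "<span class='ocrx_word' title='bbox 109 92 175 116; x_wconf 93'>world</span>"
)

parser = WordBoxParser()
parser.feed(sample)
parser.close()
words = parser.words
```

Each entry in `words` pairs a word with its `(x0, y0, x1, y1)` box in image pixels, which is the information needed to position invisible or overlaid text in a PDF page.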

How to install

To install the package, run the following command within a Python environment:

pip install hocker

If any errors occur whilst installing, try installing from the .whl file linked here instead.

How to use hOCkeR

Below is an example of how to use hOCkeR to combine a .png image and a .hocr file into a PDF:

import hocker as hkr

image_path = 'path/to/image.png'
hocr_path = 'path/to/image.hocr'

# Specify the element in the hocr file to use as the text
hocr = hkr.HOCRCombiner('ocrx_word') # For tesseract outputs, it is 'ocrx_word'

# Specify the image and hOCR paths
hocr.locate_image(image_path)
hocr.locate_hocr(hocr_path)

# Output the PDF
hocr.to_pdf('path/to/output.pdf')
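
For a folder of scanned pages, the same four calls can be repeated per file. The sketch below is one possible approach: `pair_inputs` and `convert_all` are hypothetical helpers, not part of hOCkeR. It assumes each image has a `.hocr` file with the same base name in the same folder. Because it is not stated whether one `HOCRCombiner` can be reused across files or whether its methods accept `Path` objects, the sketch creates a new combiner per page and passes plain strings.

```python
from pathlib import Path


def pair_inputs(directory, image_suffix=".png"):
    """Return (image, hocr, pdf) path triples for images that have a sibling .hocr file."""
    pairs = []
    for image in sorted(Path(directory).glob(f"*{image_suffix}")):
        hocr = image.with_suffix(".hocr")
        if hocr.exists():
            pairs.append((image, hocr, image.with_suffix(".pdf")))
    return pairs


def convert_all(directory, element="ocrx_word"):
    """Convert every matched pair using only the calls shown in the example above."""
    import hocker as hkr  # imported here so pair_inputs works without the package

    for image, hocr, pdf in pair_inputs(directory):
        combiner = hkr.HOCRCombiner(element)  # fresh combiner per page (assumption)
        combiner.locate_image(str(image))
        combiner.locate_hocr(str(hocr))
        combiner.to_pdf(str(pdf))
```

Calling `convert_all('path/to/scans')` would then write one PDF next to each image that has a matching hOCR file; images without one are skipped.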

Credits & links

hOCkeR by Lucas Warwick
HOCRConverter by jbrinley
