ocrallmypdf

This is a minimal wrapper around the jbarlow83/ocrmypdf container which adds an inotify-based directory watcher and automatically runs OCR processing on all incoming files in the input directory.

Think of this as a simplified version of cmccambridge/ocrmypdf-auto, where ocrallmypdf is using the existing tools incron and task-spooler to achieve a similar effect.

USAGE

Use your container runtime of choice and run:

docker run -d \
  -v /path/to/input:/input \
  -v /path/to/output:/output \
  ghcr.io/ansemjo/ocrallmypdf

Existing files in the input directory are not processed. The processing only triggers on CLOSE_WRITE and MOVED_TO events.

The container can be customized with a number of environment variables:

env	description	default
`INPUT`	watched input directory inside of the container	`/input`
`OUTPUT`	output directory for processed file in the container, be careful not to create an inotify loop by using the same directories!	`/output`
`OCR_OPTIONS`	options passed to `ocrmypdf` command	`--clean --deskew --output-type pdfa --skip-text`
`OCR_LANGUAGE`	language used with tesseract (`-l` flag of `ocrmypdf`), missing languages will be installed on startup	`deu+eng`
`REMOVE_ORIGINAL`	truthy value whether to remove original input files	`yes`
`JOBS`	number of concurrent jobs in task spooler, using too many at once may lead to ocr timeouts, since `ocrmypdf` uses all cores per process by default	`1`

Name		Name	Last commit message	Last commit date
Latest commit History 88 Commits
.github		.github
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
entrypoint.sh		entrypoint.sh
ocrmypdf.sh		ocrmypdf.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ocrallmypdf

USAGE

About

Releases

Packages

Contributors 2

Languages

License

ansemjo/ocrallmypdf

Folders and files

Latest commit

History

Repository files navigation

ocrallmypdf

USAGE

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages