quipucamayoc
relies on Poppler for some of its PDF handling tools, such as:
- Extracting images from PDFs
- Extracting text from PDFs (usually embedded text from PDfs already OCRed)
Sadly as of 2022 there are no multi-platform Python bindings so we rely on calling the Poppler tools as command line utilities. Even then, installing Poppler on Windows might be a bit confusing so below we describe the alternative steps:
- Install the latest release from the poppler-windows Github repo.
- Alternatively, install Poppler on Windows using Anaconda
- Experimentally,
quipucamayoc
includes Poppler, although it might not correspond to the latest available release.
Note that for simplicity we include all files from poppler-windows
(37 items; 20mb uncompressed) but only a subset is needed (18 files, 15mb):
- freetype.dll
- jbig.dll
- lcms2.dll
- Lerc.dll
- libcrypto-3-x64.dll
- libcurl.dll
- libdeflate.dll
- libpng16.dll
- libssh2.dll
- openjp2.dll
- pdfimages.exe
- pdfinfo.exe
- pdftohtml.exe
- pdftotext.exe
- poppler.dll
- tiff.dll
- zlib.dll
- zstd.dll