From 71069a21baf143cf817ffe5f3e9987c9bf3b788f Mon Sep 17 00:00:00 2001
From: bosd
Date: Sun, 10 Nov 2024 00:57:26 +0100
Subject: [PATCH] Update pdfminer url to new pdfminer.six

---
 docs/user/advanced.rst     | 6 +++---
 docs/user/how-it-works.rst | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/user/advanced.rst b/docs/user/advanced.rst
index 18a89b6d..f4405e37 100644
--- a/docs/user/advanced.rst
+++ b/docs/user/advanced.rst
@@ -282,12 +282,12 @@ Let's get back to the *x* coordinates we got from plotting the text that exists
 "NUMBER TYPE DBA NAME","","","LICENSEE NAME","ADDRESS","CITY","ST","ZIP","PHONE NUMBER","EXPIRES"
 "...","...","...","...","...","...","...","...","...","..."
 
-Ah! Since `PDFMiner `_ merged the strings, "NUMBER", "TYPE" and "DBA NAME", all of them were assigned to the same cell. Let's see how we can fix this in the next section.
+Ah! Since `PDFMiner `_ merged the strings, "NUMBER", "TYPE" and "DBA NAME", all of them were assigned to the same cell. Let's see how we can fix this in the next section.
 
 Split text along separators
 ---------------------------
 
-To deal with cases like the output from the previous section, you can pass ``split_text=True`` to :meth:`read_pdf() `, which will split any strings that lie in different cells but have been assigned to a single cell (as a result of being merged together by `PDFMiner `_).
+To deal with cases like the output from the previous section, you can pass ``split_text=True`` to :meth:`read_pdf() `, which will split any strings that lie in different cells but have been assigned to a single cell (as a result of being merged together by `PDFMiner `_).
 
 .. code-block:: pycon
     :class: full-width
@@ -636,7 +636,7 @@ Tweak layout generation
 
 pypdf_table_extraction is built on top of PDFMiner's functionality of grouping characters on a page into words and sentences. In some cases (such as `#170 `_ and `#215 `_), PDFMiner can group characters that should belong to the same sentence into separate sentences.
 
-To deal with such cases, you can tweak PDFMiner's `LAParams kwargs `_ to improve layout generation, by passing the keyword arguments as a dict using ``layout_kwargs`` in :meth:`read_pdf() `. To know more about the parameters you can tweak, you can check out `PDFMiner docs `_.
+To deal with such cases, you can tweak PDFMiner's `LAParams kwargs `_ to improve layout generation, by passing the keyword arguments as a dict using ``layout_kwargs`` in :meth:`read_pdf() `. To know more about the parameters you can tweak, you can check out `PDFMiner docs `_.
 
 .. code-block:: pycon
 
diff --git a/docs/user/how-it-works.rst b/docs/user/how-it-works.rst
index dfd2eea3..b19b9bc3 100644
--- a/docs/user/how-it-works.rst
+++ b/docs/user/how-it-works.rst
@@ -20,7 +20,7 @@ Where *Hybrid* is a combination of the *Network* and *Lattice* parser.
 Stream
 ------
 
-Stream can be used to parse tables that have whitespaces between cells to simulate a table structure. It is built on top of PDFMiner's functionality of grouping characters on a page into words and sentences, using `margins `_.
+Stream can be used to parse tables that have whitespaces between cells to simulate a table structure. It is built on top of PDFMiner's functionality of grouping characters on a page into words and sentences, using `margins `_.
 
 1. Words on the PDF page are grouped into text rows based on their *y* axis overlaps.