From 3dc1ade6aa72b36c55f6c09195f613a0277ac950 Mon Sep 17 00:00:00 2001
From: bosd
Date: Mon, 14 Oct 2024 12:55:03 +0200
Subject: [PATCH 01/13] [IMP] Documentation Update advanced.rst

---
 docs/user/advanced.rst | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/docs/user/advanced.rst b/docs/user/advanced.rst
index 2fd696e3..06c52dd6 100644
--- a/docs/user/advanced.rst
+++ b/docs/user/advanced.rst
@@ -78,7 +78,6 @@ Let's plot all the text present on the table's PDF page.
     $ pypdf_table_extraction lattice -plot text foo.pdf
 
 .. figure:: ../_static/png/plot_text.png
-    :height: 674
     :width: 1366
     :scale: 50%
     :alt: A plot of all text on a PDF page
@@ -199,7 +198,6 @@ You can also visualize the textedges found on a page by specifying ``kind='texte
     $ pypdf_table_extraction stream -plot textedge foo.pdf
 
 .. figure:: ../_static/png/plot_textedge.png
-    :height: 674
     :width: 1366
     :scale: 50%
     :alt: A plot of relevant textedges on a PDF page
@@ -400,7 +398,6 @@ Let's see the table area that is detected by default.
     $ pypdf_table_extraction stream -plot contour edge_tol.pdf
 
 .. figure:: ../_static/png/edge_tol_1.png
-    :height: 674
     :width: 1366
     :scale: 50%
     :alt: Table area with default edge_tol
@@ -421,7 +418,6 @@ To improve the detected area, you can increase the ``edge_tol`` (default: 50) va
     $ pypdf_table_extraction stream -e 500 -plot contour edge_tol.pdf
 
 .. figure:: ../_static/png/edge_tol_2.png
-    :height: 674
     :width: 1366
     :scale: 50%
     :alt: Table area with custom edge_tol

From e04762d7fb9bb94870cc28df0313ec7a154c77fc Mon Sep 17 00:00:00 2001
From: bosd
Date: Mon, 14 Oct 2024 13:10:31 +0200
Subject: [PATCH 02/13] Update intro.rst

---
 docs/user/intro.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/user/intro.rst b/docs/user/intro.rst
index 825e10e4..213d9e35 100644
--- a/docs/user/intro.rst
+++ b/docs/user/intro.rst
@@ -44,6 +44,6 @@ As you can already guess, that library is named after `The Camelot Project`_.
 .. _Arthurian legend: https://en.wikipedia.org/wiki/King_Arthur
 
 pypdf_table_extracion License
------------------------------
+------------------------------
 
 .. include:: ../../LICENSE

From fd41e22ae5ecd3641e66b091b5838299be41e042 Mon Sep 17 00:00:00 2001
From: bosd
Date: Mon, 14 Oct 2024 13:17:29 +0200
Subject: [PATCH 03/13] Update intro.rst

---
 docs/user/intro.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/user/intro.rst b/docs/user/intro.rst
index 213d9e35..aa07904b 100644
--- a/docs/user/intro.rst
+++ b/docs/user/intro.rst
@@ -43,7 +43,7 @@ As you can already guess, that library is named after `The Camelot Project`_.
 .. _Monty Python and the Holy Grail: https://en.wikipedia.org/wiki/Monty_Python_and_the_Holy_Grail
 .. _Arthurian legend: https://en.wikipedia.org/wiki/King_Arthur
 
-pypdf_table_extracion License
-------------------------------
+pypdf_table_extraction License
+-------------------------------
 
 .. include:: ../../LICENSE

From 36ab544a66c10aaf8ccb43f856499362c69e38a4 Mon Sep 17 00:00:00 2001
From: bosd
Date: Mon, 14 Oct 2024 13:21:37 +0200
Subject: [PATCH 04/13] Update intro.rst

---
 docs/user/intro.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/user/intro.rst b/docs/user/intro.rst
index aa07904b..7e962e4b 100644
--- a/docs/user/intro.rst
+++ b/docs/user/intro.rst
@@ -44,6 +44,6 @@ As you can already guess, that library is named after `The Camelot Project`_.
 .. _Arthurian legend: https://en.wikipedia.org/wiki/King_Arthur
 
 pypdf_table_extraction License
--------------------------------
+------------------------------
 
 .. include:: ../../LICENSE

From 88bd0ad10196e6063d14bda28e1a69ea313a8cf2 Mon Sep 17 00:00:00 2001
From: bosd
Date: Mon, 14 Oct 2024 16:19:28 +0200
Subject: [PATCH 05/13] Advanced Image Layout tests

---
 docs/user/advanced.rst | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/docs/user/advanced.rst b/docs/user/advanced.rst
index 06c52dd6..41afd1b3 100644
--- a/docs/user/advanced.rst
+++ b/docs/user/advanced.rst
@@ -78,7 +78,6 @@ Let's plot all the text present on the table's PDF page.
     $ pypdf_table_extraction lattice -plot text foo.pdf
 
 .. figure:: ../_static/png/plot_text.png
-    :width: 1366
     :scale: 50%
     :alt: A plot of all text on a PDF page
     :align: center
@@ -399,7 +398,6 @@ Let's see the table area that is detected by default.
 
 .. figure:: ../_static/png/edge_tol_1.png
     :width: 1366
-    :scale: 50%
     :alt: Table area with default edge_tol
     :align: center
 
@@ -418,8 +416,6 @@ To improve the detected area, you can increase the ``edge_tol`` (default: 50) va
     $ pypdf_table_extraction stream -e 500 -plot contour edge_tol.pdf
 
 .. figure:: ../_static/png/edge_tol_2.png
-    :width: 1366
-    :scale: 50%
     :alt: Table area with custom edge_tol
     :align: center
 

From 8da7e4f4a7a01a4091be740c989140917650bf80 Mon Sep 17 00:00:00 2001
From: bosd
Date: Mon, 14 Oct 2024 17:04:33 +0200
Subject: [PATCH 06/13] [IMP] Image display in docs

---
 docs/user/advanced.rst | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/docs/user/advanced.rst b/docs/user/advanced.rst
index 41afd1b3..01aac682 100644
--- a/docs/user/advanced.rst
+++ b/docs/user/advanced.rst
@@ -78,7 +78,6 @@ Let's plot all the text present on the table's PDF page.
     $ pypdf_table_extraction lattice -plot text foo.pdf
 
 .. figure:: ../_static/png/plot_text.png
-    :scale: 50%
     :alt: A plot of all text on a PDF page
     :align: center
 
@@ -197,8 +196,6 @@ You can also visualize the textedges found on a page by specifying ``kind='texte
     $ pypdf_table_extraction stream -plot textedge foo.pdf
 
 .. figure:: ../_static/png/plot_textedge.png
-    :width: 1366
-    :scale: 50%
     :alt: A plot of relevant textedges on a PDF page
     :align: center
 
@@ -397,7 +394,6 @@ Let's see the table area that is detected by default.
     $ pypdf_table_extraction stream -plot contour edge_tol.pdf
 
 .. figure:: ../_static/png/edge_tol_1.png
-    :width: 1366
     :alt: Table area with default edge_tol
     :align: center
 

From f29b1b40a1370f823cc95f08b635ff1b4ec8b45b Mon Sep 17 00:00:00 2001
From: bosd
Date: Tue, 15 Oct 2024 07:28:55 +0200
Subject: [PATCH 07/13] Update advanced.rst

---
 docs/user/advanced.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/user/advanced.rst b/docs/user/advanced.rst
index 01aac682..7da382b8 100644
--- a/docs/user/advanced.rst
+++ b/docs/user/advanced.rst
@@ -644,10 +644,10 @@ To deal with such cases, you can tweak PDFMiner's `LAParams kwargs ` flavor, pypdf_table_extraction uses ``ghostscript`` to convert PDF pages to images for line recognition. If you face installation issues with ``ghostscript``, you can use an alternate image conversion backend called ``poppler``. You can specify which image conversion backend you want to use with
+When using the :ref:`Lattice ` flavor, pypdf_table_extraction uses ``ghostscript`` to convert PDF pages to images for line recognition. If you face installation issues with ``ghostscript``, you can use an alternative image conversion backend called ``poppler``. You can specify which image conversion backend you want to use with
 .. code-block:: pycon

From 95d74c1305f4b02450488fa3a260c769ded61bd7 Mon Sep 17 00:00:00 2001
From: bosd
Date: Thu, 17 Oct 2024 16:48:22 +0200
Subject: [PATCH 08/13] Fixup logo URL

Change the url of the logo, so it will show correctly on pypi

From 9e0b724ba941a2ed1c42864c3d20646ab2acdc27 Mon Sep 17 00:00:00 2001
From: bosd
Date: Sat, 26 Oct 2024 21:07:13 +0200
Subject: [PATCH 09/13] Update how-it-works.rst

---
 docs/user/how-it-works.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/user/how-it-works.rst b/docs/user/how-it-works.rst
index 9f0efdb3..dfd2eea3 100644
--- a/docs/user/how-it-works.rst
+++ b/docs/user/how-it-works.rst
@@ -39,7 +39,7 @@ Lattice
 
 Lattice is more deterministic in nature, and it does not rely on guesses. It can be used to parse tables that have demarcated lines between cells, and it can automatically parse multiple tables present on a page.
 
-It starts by converting the PDF page to an image using ghostscript, and then processes it to get horizontal and vertical line segments by applying a set of morphological transformations (erosion and dilation) using OpenCV.
+It starts by converting the PDF page to an image using an image conversion backend (default pdfium), and then processes it to get horizontal and vertical line segments by applying a set of morphological transformations (erosion and dilation) using OpenCV.
 
 Let's see how Lattice processes the second page of `this PDF`_, step-by-step.
 

From 1ad26c789c8aad8dace026123153d86ce94f1933 Mon Sep 17 00:00:00 2001
From: bosd
Date: Sat, 2 Nov 2024 21:40:41 +0100
Subject: [PATCH 10/13] Update api.rst: Add Network and Hybrid parser

---
 docs/api.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/api.rst b/docs/api.rst
index 93c2ffa8..b79ff587 100644
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -21,6 +21,12 @@ Lower-Level Classes
 .. autoclass:: camelot.parsers.Lattice
    :inherited-members:
 
+.. autoclass:: camelot.parsers.Network
+   :inherited-members:
+
+.. autoclass:: camelot.parsers.Hybrid
+   :inherited-members:
+
 Lower-Lower-Level Classes
 -------------------------
 

From bb207e43d69a0df23e7969982f2904d881202478 Mon Sep 17 00:00:00 2001
From: bosd
Date: Sat, 2 Nov 2024 21:43:37 +0100
Subject: [PATCH 11/13] Update api.rst: Add plotting

---
 docs/api.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/api.rst b/docs/api.rst
index b79ff587..2a2ea14e 100644
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -37,3 +37,9 @@ Lower-Lower-Level Classes
    :inherited-members:
 
 .. autoclass:: camelot.core.Cell
+
+Plotting
+--------
+
+.. autoclass:: camelot.plotting.Plotmethods
+   :inherited-members:

From d1e00ed73dadf201d6a144de22fb41786e4bab0e Mon Sep 17 00:00:00 2001
From: bosd
Date: Sat, 2 Nov 2024 21:50:45 +0100
Subject: [PATCH 12/13] Update api.rst Fixup Plot

---
 docs/api.rst | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/api.rst b/docs/api.rst
index 2a2ea14e..85d629eb 100644
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -41,5 +41,4 @@ Lower-Lower-Level Classes
 Plotting
 --------
 
-.. autoclass:: camelot.plotting.Plotmethods
-   :inherited-members:
+.. autofunction:: camelot.plot

From e015a5dc052a708b79cf4273e882e962b7be2946 Mon Sep 17 00:00:00 2001
From: bosd
Date: Sat, 2 Nov 2024 21:57:53 +0100
Subject: [PATCH 13/13] Update api.rst: Fixup Plotting

---
 docs/api.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/api.rst b/docs/api.rst
index 85d629eb..4c453c23 100644
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -42,3 +42,6 @@ Plotting
 --------
 
 .. autofunction:: camelot.plot
+
+.. autoclass:: camelot.plotting.PlotMethods
+   :inherited-members: