Various documentation fixes #227

Merged
merged 13 commits into from
Nov 9, 2024
14 changes: 14 additions & 0 deletions docs/api.rst
@@ -21,6 +21,12 @@ Lower-Level Classes
.. autoclass:: camelot.parsers.Lattice
:inherited-members:

+.. autoclass:: camelot.parsers.Network
+:inherited-members:
+
+.. autoclass:: camelot.parsers.Hybrid
+:inherited-members:
+
Lower-Lower-Level Classes
-------------------------

@@ -31,3 +37,11 @@ Lower-Lower-Level Classes
:inherited-members:

.. autoclass:: camelot.core.Cell
+
+Plotting
+--------
+
+.. autofunction:: camelot.plot
+
+.. autoclass:: camelot.plotting.PlotMethods
+:inherited-members:
18 changes: 3 additions & 15 deletions docs/user/advanced.rst
@@ -78,9 +78,6 @@ Let's plot all the text present on the table's PDF page.
$ pypdf_table_extraction lattice -plot text foo.pdf

.. figure:: ../_static/png/plot_text.png
-:height: 674
-:width: 1366
-:scale: 50%
:alt: A plot of all text on a PDF page
:align: center

@@ -199,9 +196,6 @@ You can also visualize the textedges found on a page by specifying ``kind='texte
$ pypdf_table_extraction stream -plot textedge foo.pdf

.. figure:: ../_static/png/plot_textedge.png
-:height: 674
-:width: 1366
-:scale: 50%
:alt: A plot of relevant textedges on a PDF page
:align: center

@@ -400,9 +394,6 @@ Let's see the table area that is detected by default.
$ pypdf_table_extraction stream -plot contour edge_tol.pdf

.. figure:: ../_static/png/edge_tol_1.png
-:height: 674
-:width: 1366
-:scale: 50%
:alt: Table area with default edge_tol
:align: center

@@ -421,9 +412,6 @@ To improve the detected area, you can increase the ``edge_tol`` (default: 50) va
$ pypdf_table_extraction stream -e 500 -plot contour edge_tol.pdf

.. figure:: ../_static/png/edge_tol_2.png
-:height: 674
-:width: 1366
-:scale: 50%
:alt: Table area with custom edge_tol
:align: center

@@ -656,10 +644,10 @@ To deal with such cases, you can tweak PDFMiner's `LAParams kwargs <https://gith

.. _image-conversion-backend:

-Use alternate image conversion backends
----------------------------------------
+Use alternative image conversion backends
+-----------------------------------------

-When using the :ref:`Lattice <lattice>` flavor, pypdf_table_extraction uses ``ghostscript`` to convert PDF pages to images for line recognition. If you face installation issues with ``ghostscript``, you can use an alternate image conversion backend called ``poppler``. You can specify which image conversion backend you want to use with
+When using the :ref:`Lattice <lattice>` flavor, pypdf_table_extraction uses ``ghostscript`` to convert PDF pages to images for line recognition. If you face installation issues with ``ghostscript``, you can use an alternative image conversion backend called ``poppler``. You can specify which image conversion backend you want to use with

.. code-block:: pycon

2 changes: 1 addition & 1 deletion docs/user/how-it-works.rst
@@ -39,7 +39,7 @@ Lattice

Lattice is more deterministic in nature, and it does not rely on guesses. It can be used to parse tables that have demarcated lines between cells, and it can automatically parse multiple tables present on a page.

-It starts by converting the PDF page to an image using ghostscript, and then processes it to get horizontal and vertical line segments by applying a set of morphological transformations (erosion and dilation) using OpenCV.
+It starts by converting the PDF page to an image using an image conversion backend (default pdfium), and then processes it to get horizontal and vertical line segments by applying a set of morphological transformations (erosion and dilation) using OpenCV.
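
Reviewer note: the erosion-then-dilation idea in the sentence above can be sketched in a few lines of plain Python. This is only an illustration on a tiny hand-made binary image, not the library's OpenCV code; the function names (``erode_rows``, ``dilate_rows``, ``horizontal_lines``) and the sample ``page`` are hypothetical.

```python
# Illustrative sketch only: morphological opening (erosion, then dilation)
# with a 1 x k horizontal kernel on a small binary image. The real Lattice
# parser uses OpenCV; all names below are assumptions made for this example.

def erode_rows(img, k):
    """Keep a pixel only if the k-wide horizontal window starting at it is all 1s."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w - k + 1):
            if all(img[y][x + i] for i in range(k)):
                out[y][x] = 1
    return out


def dilate_rows(img, k):
    """Grow every surviving pixel back into a k-wide horizontal run."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for i in range(k):
                    if x + i < w:
                        out[y][x + i] = 1
    return out


def horizontal_lines(img, k):
    """Opening: runs shorter than k pixels vanish, longer runs are restored."""
    return dilate_rows(erode_rows(img, k), k)


page = [
    [0, 1, 0, 0, 0, 0, 1, 0],  # isolated pixels (parts of vertical strokes)
    [1, 1, 1, 1, 1, 1, 1, 1],  # a full-width ruling line
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 0],  # a 2-pixel run, shorter than the kernel
]

lines = horizontal_lines(page, k=4)
for row in lines:
    print(row)
```

Only the full-width row survives the opening. Running the same idea with a k x 1 vertical kernel would pick out the vertical strokes instead, which is how horizontal and vertical segments can be separated.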

Let's see how Lattice processes the second page of `this PDF`_, step-by-step.

4 changes: 2 additions & 2 deletions docs/user/intro.rst
@@ -43,7 +43,7 @@ As you can already guess, that library is named after `The Camelot Project`_.
.. _Monty Python and the Holy Grail: https://en.wikipedia.org/wiki/Monty_Python_and_the_Holy_Grail
.. _Arthurian legend: https://en.wikipedia.org/wiki/King_Arthur

-pypdf_table_extracion License
------------------------------
+pypdf_table_extraction License
+------------------------------

.. include:: ../../LICENSE