[IMP] reduce pdf object loop #253

bosd · 2024-11-02T08:28:35Z

Originally, Camelot scanned all PDF objects on the page two or three times to collect (1) horizontal text, (2) vertical text, and (3) images.
This PR changes that to a single scan that collects everything.

This is a port of nexus-lib#1
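
For readers who want the idea in code, here is a minimal sketch of the difference between one traversal per object kind and a single traversal that sorts objects into buckets. It is not the code from this PR: `Node`, `walk`, `collect_three_passes`, and `collect_one_pass` are hypothetical names, and `Node` is a stand-in for a real PDF layout object, assumed here to form a simple tree.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a PDF layout object (not a real library class)."""
    kind: str  # "container", "htext", "vtext", or "image"
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield every descendant of `node`, depth first."""
    for child in node.children:
        yield child
        yield from walk(child)


def collect_three_passes(page: Node) -> Tuple[List[Node], List[Node], List[Node]]:
    """Baseline: one full traversal of the page per object kind."""
    htext = [n for n in walk(page) if n.kind == "htext"]
    vtext = [n for n in walk(page) if n.kind == "vtext"]
    images = [n for n in walk(page) if n.kind == "image"]
    return htext, vtext, images


def collect_one_pass(page: Node) -> Tuple[List[Node], List[Node], List[Node]]:
    """Single traversal: each object is visited once and routed to its bucket."""
    buckets = {"htext": [], "vtext": [], "image": []}
    for n in walk(page):
        bucket = buckets.get(n.kind)
        if bucket is not None:  # containers and unknown kinds are skipped
            bucket.append(n)
    return buckets["htext"], buckets["vtext"], buckets["image"]
```

Both functions return the same three lists in the same order; the single-pass version simply walks the object tree once instead of three times, which is where the saving comes from on pages with many objects.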

bosd · 2024-11-05T17:29:48Z

Converted it back to draft, as it can be optimized further.
In its current form it also partially undoes another optimization from #255.

bosd added the performance label Nov 2, 2024

bosd force-pushed the imp-reduce-pdf-object-loop branch from a5d4f83 to 4fe629e November 2, 2024 14:45

bosd added the good first issue label Nov 2, 2024

bosd marked this pull request as draft November 5, 2024 17:28

takaaki-mizuno and others added 4 commits November 9, 2024 12:18

Reduce loops to improve process speed

f2871de

Fixed import omissions.

cbb9d1a

Fix get_and_text_objects

495821b

[REM] obsolete function get_text_objects, Update imports

65c6620

Update the pyppdf_table_extraction imports

bosd force-pushed the imp-reduce-pdf-object-loop branch from 4fe629e to 65c6620 November 9, 2024 11:19

[REF]: further reduce object loops

84b23c0

Get all image, char and text objects in one throw.

bosd marked this pull request as ready for review November 9, 2024 14:56

bosd merged commit 06f48a1 into py-pdf:main Nov 9, 2024
14 checks passed

bosd deleted the imp-reduce-pdf-object-loop branch November 9, 2024 15:05
