[IMP] reduce pdf object loop #253

Merged
merged 5 commits into from
Nov 9, 2024

@bosd (Collaborator) commented Nov 2, 2024

Originally, Camelot scans all PDF objects on a page 2 to 3 times to collect (1) horizontal text, (2) vertical text, and (3) images.
This PR changes that to a single scan that collects everything.

This is a port of nexus-lib#1
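
To illustrate the idea described above, here is a minimal sketch of replacing several full traversals with one traversal that sorts objects into buckets. The `Obj` class, the `kind` strings, and the function names are hypothetical stand-ins for illustration only; they are not Camelot's or pdfminer's actual classes, and this is not the code merged in this PR.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


# Hypothetical stand-in for a layout object; not a real Camelot/pdfminer class.
@dataclass
class Obj:
    kind: str  # "htext", "vtext", "image", or "container"
    children: List["Obj"] = field(default_factory=list)


def walk(obj: Obj) -> Iterator[Obj]:
    """Yield every object in the tree rooted at obj, depth first."""
    yield obj
    for child in obj.children:
        yield from walk(child)


def collect_multi_pass(page: Obj):
    """Multi-scan approach: one full traversal per object kind."""
    horizontal = [o for o in walk(page) if o.kind == "htext"]
    vertical = [o for o in walk(page) if o.kind == "vtext"]
    images = [o for o in walk(page) if o.kind == "image"]
    return horizontal, vertical, images


def collect_single_pass(page: Obj):
    """Single-scan approach: one traversal, objects bucketed as they are seen."""
    buckets = {"htext": [], "vtext": [], "image": []}
    for o in walk(page):
        bucket = buckets.get(o.kind)
        if bucket is not None:  # skip containers and other kinds
            bucket.append(o)
    return buckets["htext"], buckets["vtext"], buckets["image"]


# A small made-up page tree with nested containers.
page = Obj("container", [
    Obj("htext"),
    Obj("container", [Obj("vtext"), Obj("image"), Obj("htext")]),
])

# Both approaches return the same objects in the same order.
assert collect_single_pass(page) == collect_multi_pass(page)
```

Under these assumptions, both functions produce identical results, but the single-pass version visits each object once instead of once per kind, which is where the saving would come from on pages with many objects.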

@bosd added the performance label Nov 2, 2024
@bosd force-pushed the imp-reduce-pdf-object-loop branch from a5d4f83 to 4fe629e November 2, 2024 14:45
@bosd added the good first issue label Nov 2, 2024
@bosd marked this pull request as draft November 5, 2024 17:28
@bosd (Collaborator, Author) commented Nov 5, 2024

Converted it back to draft, as it can be further optimized.
In its current form it also partially undoes another optimization from #255.

@bosd force-pushed the imp-reduce-pdf-object-loop branch from 4fe629e to 65c6620 November 9, 2024 11:19
Get all image, char and text objects in one throw.
@bosd marked this pull request as ready for review November 9, 2024 14:56
@bosd merged commit 06f48a1 into py-pdf:main Nov 9, 2024
14 checks passed
@bosd deleted the imp-reduce-pdf-object-loop branch November 9, 2024 15:05
Labels: good first issue, performance
3 participants