Jules - various #295

Merged
merged 8 commits into from
Jun 7, 2024


julian-smith-artifex-com
Collaborator

No description provided.

pip._internal.req is not always available. Instead we parse requirements.txt by
hand.
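
A minimal sketch of what parsing requirements.txt by hand could look like, using only the standard library. The function name `parse_requirements` and the simplifications (no line continuations, option lines such as `-r` skipped) are assumptions for illustration, not the code in this PR.

```python
def parse_requirements(text):
    """Return requirement specifiers from requirements.txt-style text.

    Simplified on purpose: blank lines, comment lines and option lines
    (for example `-r other.txt`) are skipped, and backslash line
    continuations are not handled.
    """
    requirements = []
    for line in text.splitlines():
        # Drop a trailing comment; pip expects whitespace before the '#'.
        line = line.split(" #", 1)[0].strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        requirements.append(line)
    return requirements
```

This avoids importing `pip._internal.req`, at the cost of ignoring the less common requirements-file syntax.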
@JorjMcKie JorjMcKie self-requested a review June 5, 2024 10:28
Collaborator

@JorjMcKie JorjMcKie left a comment


I have looked through it and haven't seen problems.
I think we should let @green also review it and approve after that (if not too urgent / time permitting).

Avoid treating bullet-lists as tables.

The main change is in pdf2docx/layout/Blocks.py:Blocks.collect_stream_lines().

Other:

Added fitz.TEXT_CID_FOR_UNKNOWN_UNICODE to flags passed to PyMuPDF's
get_text().

New setting `raw_exceptions` propagates Python exceptions instead of just
generating a diagnostic.

New setting `sort`, passed to PyMuPDF's get_text().

pdf2docx/table/Cell.py: use hasattr() as workaround to avoid error when no
`.is_text_block` member.
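
The hasattr() workaround can be sketched roughly as below. The classes `TextBlock` and `OtherBlock` and the helper `is_text_block` are hypothetical stand-ins, not the actual code in pdf2docx/table/Cell.py.

```python
class TextBlock:
    """Hypothetical block type that has the member."""
    is_text_block = True


class OtherBlock:
    """Hypothetical block type that lacks `.is_text_block` entirely."""


def is_text_block(block):
    # Guard with hasattr() so objects without the member are treated as
    # non-text blocks instead of raising AttributeError.
    return hasattr(block, "is_text_block") and bool(block.is_text_block)
```

`getattr(block, "is_text_block", False)` would behave the same way here; either form hides the missing member, which is presumably why a fixme note is useful.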
Specifically, we automate conversion of .docx files to .pdf. We use docx2pdf on
Windows (requires Word) and LibreOffice on other platforms.
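
A rough sketch of how such a platform switch might look. The helper names, the `soffice` executable name and the output-path convention are assumptions, not the code in this PR; docx2pdf is imported lazily because it is only needed on Windows.

```python
import platform
import subprocess
from pathlib import Path


def libreoffice_command(docx_path, out_dir, executable="soffice"):
    """Build a headless LibreOffice command that converts one file to PDF."""
    return [
        executable, "--headless", "--convert-to", "pdf",
        "--outdir", str(out_dir), str(docx_path),
    ]


def docx_to_pdf(docx_path, out_dir):
    """Convert a .docx file to .pdf and return the expected output path."""
    docx_path = Path(docx_path)
    out_dir = Path(out_dir)
    pdf_path = out_dir / (docx_path.stem + ".pdf")
    if platform.system() == "Windows":
        # docx2pdf drives Microsoft Word, so Word must be installed.
        from docx2pdf import convert
        convert(str(docx_path), str(pdf_path))
    else:
        subprocess.run(libreoffice_command(docx_path, out_dir), check=True)
    return pdf_path
```

Because the two converters render documents differently, expected test values may need to differ per platform.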

Updated expected sidx_required values for libreoffice.

Define separate pytest test for each sample file, using
@pytest.mark.parametrize.
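
As an illustration of the per-sample approach, a parametrized test might look like the sketch below. The sample names and the `check_sample` placeholder are hypothetical; the real tests convert each sample and compare the result against expected values.

```python
import pytest

# Hypothetical sample names; the real list comes from the test samples.
SAMPLE_NAMES = ["demo-text", "demo-table", "demo-image"]


def check_sample(name):
    """Placeholder for the real convert-and-compare step."""
    return bool(name)


@pytest.mark.parametrize("sample", SAMPLE_NAMES)
def test_sample(sample):
    # pytest generates one test per entry, so each sample file passes or
    # fails independently and is reported under its own test id.
    assert check_sample(sample)
```

A single failing sample can then be rerun on its own with pytest's `-k` option.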
Default is true, which avoids treating bullet lists as tables.
@julian-smith-artifex-com
Collaborator Author

Have force-pushed a small change to add a fixme note about new use of hasattr(), after comment by robin.

@julian-smith-artifex-com julian-smith-artifex-com requested review from dothinking and greendreamer and removed request for dothinking June 6, 2024 12:55
@julian-smith-artifex-com julian-smith-artifex-com merged commit c50f3d5 into master Jun 7, 2024
4 checks passed
@julian-smith-artifex-com julian-smith-artifex-com deleted the jules branch June 7, 2024 09:03
2 participants