enhancing recovery_to_doc #14396

Open: wants to merge 2 commits into main

GreatV (Collaborator) commented Dec 16, 2024

This pull request improves layout analysis and document generation in ppstructure/recovery/recovery_to_doc.py (chiefly the sorted_layout_boxes function), adds new test cases, and updates the testing configuration.

Enhancements to layout analysis:

  • ppstructure/recovery/recovery_to_doc.py: Improved the sorted_layout_boxes function by adding explanatory comments, refining the criteria for classifying boxes as left-column or right-column, and ensuring single-column boxes are identified correctly.
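
To illustrate the kind of classification described above, here is a minimal sketch. It is not the code from this PR: `classify_box`, `sort_reading_order`, the `tol` parameter, and the `bbox` dictionary layout are all hypothetical names chosen for illustration, and the actual criteria in sorted_layout_boxes may differ. The idea is that a box counts as left or right only if it sits on one side of the page midline, and anything spanning the midline (such as a centered title) is treated as single-column.

```python
def classify_box(bbox, page_width, tol=0.05):
    """Return 'left', 'right', or 'full' for a box [x1, y1, x2, y2].

    Hypothetical criteria: `tol` is an assumed tolerance (fraction of the
    page width) that lets a box overshoot the midline slightly.
    """
    x1, _, x2, _ = bbox
    mid = page_width / 2
    margin = tol * page_width
    if x2 <= mid + margin and x1 < mid:
        return "left"
    if x1 >= mid - margin and x2 > mid:
        return "right"
    # The box spans the midline, so treat it as single-column (full width).
    return "full"


def sort_reading_order(boxes, page_width):
    """Order boxes top to bottom, reading the left column before the right.

    A full-width box flushes the columns collected so far, so a title
    stays above the two-column text that follows it.
    """
    ordered, left, right = [], [], []
    for box in sorted(boxes, key=lambda b: (b["bbox"][1], b["bbox"][0])):
        kind = classify_box(box["bbox"], page_width)
        if kind == "left":
            left.append(box)
        elif kind == "right":
            right.append(box)
        else:
            ordered += left + right
            left, right = [], []
            ordered.append(box)
    return ordered + left + right
```

Under these assumed criteria, a centered title such as `[200, 10, 400, 40]` on a 600-pixel-wide page is classified as `full` rather than being pushed into the right column, which is the symptom reported in the linked issue.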

New test cases:

  • tests/test_recovery_to_doc.py: Added comprehensive test cases for double-column and single-column document structure analysis and docx generation, including validations for layout detection, column separation, and document content.
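
As a rough picture of what such test cases could look like, the sketch below uses plain `assert`-based test functions that pytest would collect. It does not reproduce tests/test_recovery_to_doc.py: `column_of` is a hypothetical stand-in for the layout logic under test, and the box coordinates are made-up fixtures.

```python
def column_of(bbox, page_width):
    """Hypothetical stand-in for the layout logic under test."""
    x1, _, x2, _ = bbox
    mid = page_width / 2
    if x2 <= mid:
        return "left"
    if x1 >= mid:
        return "right"
    return "single"


def test_double_column_layout():
    page_width = 600
    boxes = [[50, 50, 280, 200], [320, 50, 550, 200]]
    # Layout detection: one box per column.
    assert [column_of(b, page_width) for b in boxes] == ["left", "right"]
    # Column separation: the left box ends before the right box starts.
    assert boxes[0][2] < boxes[1][0]


def test_single_column_layout():
    page_width = 600
    boxes = [[50, 10, 550, 40], [50, 50, 550, 400]]
    # Every box spans the midline, so none is assigned to a side column.
    assert all(column_of(b, page_width) == "single" for b in boxes)
```

Checks on the generated docx content would need the real pipeline output, so they are left out of this sketch.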

Testing configuration updates:

  • pytest.ini: Added configurations to ignore specific deprecation warnings and enabled verbose output for tests.
  • tests/test_paddleocr.py: Removed an unnecessary encoding declaration.

close #14308

paddle-bot bot commented Dec 16, 2024

Thanks for your contribution!

SWHL (Collaborator) commented Dec 16, 2024

Please provide unit tests for the corresponding case.

@GreatV GreatV requested a review from SWHL December 23, 2024 08:32
Development

Successfully merging this pull request may close these issues.

Recognition is inaccurate; the title is always assigned to the right column
2 participants