This pull request includes several changes to improve the layout analysis and document generation functionality in `ppstructure/recovery/recovery_to_doc.py`, adds new tests, and updates the testing configuration. The most important changes are the enhancement of the `sorted_layout_boxes` function, the addition of new test cases, and updates to the testing configuration.

Enhancements to layout analysis:
- `ppstructure/recovery/recovery_to_doc.py`: Improved the `sorted_layout_boxes` function by adding comments for clarity, refining the criteria for classifying boxes as left- or right-column, and ensuring that single-column boxes are correctly identified. [1] [2] [3]

New test cases:
- `tests/test_recovery_to_doc.py`: Added test cases for double-column and single-column document structure analysis and .docx generation, with checks for layout detection, column separation, and document content.

Testing configuration updates:
- `pytest.ini`: Added configuration to ignore specific deprecation warnings and enabled verbose test output.
- `tests/test_paddleocr.py`: Removed an unnecessary encoding declaration.

close #14308
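For reviewers, here is a minimal, hypothetical sketch of the kind of column classification `sorted_layout_boxes` performs: each layout box is labeled left column, right column, or single (full-width) from its horizontal extent relative to the page width, and then put into reading order. This is not the code in `recovery_to_doc.py`. The function names `classify_box` and `reading_order`, the `{"bbox": [x0, y0, x1, y1]}` box format, and the simple midpoint rule are all assumptions for illustration, and the real thresholds may differ. The two pytest-style tests mirror the double-column and single-column scenarios that the new tests cover.

```python
"""Hypothetical sketch of column classification for layout boxes.

Not the PaddleOCR implementation: the box format, function names, and
the midpoint rule are assumptions chosen for illustration.
"""
from typing import Dict, List

Box = Dict[str, List[float]]  # assumed format: {"bbox": [x0, y0, x1, y1]}


def classify_box(box: Box, page_width: float) -> str:
    """Label a box 'left', 'right', or 'single' by its horizontal extent."""
    x0, _, x1, _ = box["bbox"]
    mid = page_width / 2
    if x1 <= mid:
        return "left"    # box lies entirely in the left half
    if x0 >= mid:
        return "right"   # box lies entirely in the right half
    return "single"      # box straddles the midpoint: full-width content


def reading_order(boxes: List[Box], page_width: float) -> List[Box]:
    """Order boxes top-to-bottom; within a two-column band, left column first.

    Column boxes are buffered and flushed (left, then right) whenever a
    full-width box interrupts the two-column region.
    """
    ordered: List[Box] = []
    left: List[Box] = []
    right: List[Box] = []
    for box in sorted(boxes, key=lambda b: (b["bbox"][1], b["bbox"][0])):
        label = classify_box(box, page_width)
        if label == "left":
            left.append(box)
        elif label == "right":
            right.append(box)
        else:
            # A full-width box ends the current two-column band.
            ordered += left + right
            left, right = [], []
            ordered.append(box)
    ordered += left + right  # flush any trailing two-column band
    return ordered


def test_double_column_boxes_are_separated():
    w = 1000
    boxes = [
        {"bbox": [50, 100, 450, 200]},   # left column, upper
        {"bbox": [550, 100, 950, 200]},  # right column, upper
        {"bbox": [50, 250, 450, 350]},   # left column, lower
    ]
    assert [classify_box(b, w) for b in boxes] == ["left", "right", "left"]
    order = reading_order(boxes, w)
    # Both left-column boxes come before the right-column box.
    assert [b["bbox"][0] for b in order] == [50, 50, 550]


def test_full_width_box_is_single_column():
    w = 1000
    title = {"bbox": [100, 20, 900, 80]}   # spans the page midpoint
    left = {"bbox": [50, 100, 450, 200]}
    right = {"bbox": [550, 100, 950, 200]}
    assert classify_box(title, w) == "single"
    assert reading_order([right, left, title], w) == [title, left, right]
```

Running this file with `pytest -v` collects the two `test_` functions. A midpoint rule is the simplest way to separate columns; a production heuristic typically needs margins or fractional thresholds so that slightly wide column boxes are not misread as full-width.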