Inaccurate recognition: the title is always placed on the right side #14308
Comments
Could this be caused by the automatic two-column splitting?
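Regarding the issue "Inaccurate recognition: the title is always placed on the right side", here is an analysis of possible causes, followed by possible solutions:
Cause analysis
Solutions
Below are several directions for optimization, to be checked and adjusted step by step:
1. Use a more suitable layout analysis model
2. Adjust the detection parameters
3. Disable or tune layout recovery
4. Check the data preprocessing
Other related discussions and issues
Summary
Adjusting the layout analysis model, the detection parameters, and the recovery mode can reduce the incorrect grouping of titles. If the problem persists, please provide more detailed input images and output logs for further investigation.
Response generated by 🤖 feifei-bot | chatgpt-4o-latest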
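I tried all of these methods and none of them really helped. Also, if I remove the layout recovery option recovery=true, the Word document is no longer output.
This is probably a bug. I'll take a look when I have time.
The rule used by the sorted_layout_boxes method in recovery_to_doc.py is fairly simple, so it easily misjudges a single-column layout as a two-column one.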
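To illustrate how a simple position-based rule can send a centered title to the right column, here is a minimal sketch. It is not the actual code of sorted_layout_boxes: the function name classify_column, the (x0, y0, x1, y1) box format, and the quarter/half/three-quarter page-width thresholds are all assumptions chosen for illustration.

```python
from typing import Tuple

# Hypothetical box format for this sketch: (x0, y0, x1, y1) in pixels.
BBox = Tuple[float, float, float, float]


def classify_column(bbox: BBox, page_width: float) -> str:
    """Assign a box to 'left', 'right' or 'single' from its x-extent only.

    Illustrative heuristic with assumed thresholds; it does not look at
    the other boxes on the page or at the box type (title, text, ...).
    """
    x0, _, x1, _ = bbox
    # Starts in the left quarter and ends before the 3/4 mark: left column.
    if x0 < page_width / 4 and x1 < 3 * page_width / 4:
        return "left"
    # Starts past the left quarter and ends past the middle: right column.
    if x0 > page_width / 4 and x1 > page_width / 2:
        return "right"
    # Anything else is treated as spanning the full page width.
    return "single"


if __name__ == "__main__":
    page_width = 1000
    boxes = {
        "centered title": (300, 50, 700, 100),
        "full-width paragraph": (50, 150, 950, 400),
        "left-column paragraph": (50, 450, 450, 700),
    }
    for name, box in boxes.items():
        print(f"{name}: {classify_column(box, page_width)}")
```

With these assumed thresholds, the centered title on a single-column page starts past the left quarter and ends past the middle, so it is classified as "right" even though the page has only one column. A rule that also considered the neighboring boxes or the box type would be needed to avoid this.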
🔎 Search before asking
🐛 Bug (Problem description)
Inaccurate recognition: the title is always placed on the right side
🏃‍♂️ Environment
paddleocr --image_dir=./png_test/5 --type=structure --recovery=true --formula=true --recovery_to_markdown=true --lang=ch --output=./2
🌰 Minimal Reproducible Example
Input image:
Recognition result: