OCR recognition model #158

Merged
sokovninn merged 34 commits into main from feat/ocr-recognition on Jan 29, 2025

sokovninn
Member

@sokovninn sokovninn commented Jan 23, 2025

New OCR recognition model, loss, metric and visualizer

The most important changes are summarized below:

Losses:

  • Introduced CTCLoss with optional focal loss weighting in luxonis_train/attached_modules/losses/ctc_loss.py and updated __init__.py to include CTCLoss.
  • Updated luxonis_train/attached_modules/losses/README.md to document CTCLoss.
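
To illustrate what "optional focal loss weighting" can mean for a CTC loss, here is a minimal sketch. It is not the code in ctc_loss.py, which is not shown in this thread. It assumes one common formulation in which each per-sample CTC loss `l` is scaled by `alpha * (1 - exp(-l)) ** gamma`; the function name and the `alpha` and `gamma` defaults are hypothetical.

```python
import math


def focal_weighted_mean(ctc_losses, alpha=1.0, gamma=2.0):
    """Average per-sample CTC losses after focal re-weighting.

    Assumption: exp(-loss) is treated as the probability the model assigns
    to the target sequence, so samples it already gets right are down-weighted.
    """
    if not ctc_losses:
        raise ValueError("ctc_losses must not be empty")
    total = 0.0
    for loss in ctc_losses:
        prob = math.exp(-loss)  # likelihood of the correct transcription
        total += alpha * (1.0 - prob) ** gamma * loss
    return total / len(ctc_losses)
```

With `gamma=0` and `alpha=1` the weighting disappears and the function returns the plain mean, which is one way to read "optional" in the bullet above.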

Metrics:

  • Added OCRAccuracy metric for OCR tasks in luxonis_train/attached_modules/metrics/ocr_accuracy.py and updated __init__.py to include OCRAccuracy.
  • Updated luxonis_train/attached_modules/metrics/README.md to document OCRAccuracy.
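
As a rough picture of what an OCR accuracy metric measures, the sketch below computes exact-match accuracy over decoded strings. This is an assumption about the general idea only: the actual OCRAccuracy in ocr_accuracy.py may be defined differently, and the function name here is hypothetical.

```python
def exact_match_accuracy(predictions, targets):
    """Fraction of predicted strings that equal their target string exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(pred == target for pred, target in zip(predictions, targets))
    return correct / len(targets)
```

Exact match is strict: a single wrong character makes the whole sample count as a miss, which is why the follow-up comment in this thread lists more advanced OCR metrics as a possible improvement.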

Visualizers:

  • Introduced OCRVisualizer for visualizing OCR tasks in luxonis_train/attached_modules/visualizers/ocr_visualizer.py and updated __init__.py to include OCRVisualizer.
  • Updated luxonis_train/attached_modules/visualizers/README.md to document OCRVisualizer.

Predefined Models:

  • Added OCRRecognitionModel to luxonis_train/config/predefined_models/__init__.py and updated README.md to document its components and parameters.

Toy dataset creation example

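import glob
import os

from tqdm import tqdm


def toy_ocr_generator():
    # Each PNG in the working directory is one sample; the file stem is its text label.
    im_paths = glob.glob("*.png")
    labels = [os.path.splitext(os.path.basename(path))[0] for path in im_paths]
    for path, label in tqdm(zip(im_paths, labels), total=len(im_paths)):
        if label:  # skip samples with an empty label
            yield {
                "file": path,
                "annotation": {
                    "metadata": {"text": label},
                },
            }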
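
To check the record format without any image files on disk, the same mapping can be sketched over an explicit list of paths. The helper name `records_from_paths` and the example paths are hypothetical; only the dictionary layout is taken from the generator above.

```python
import os


def records_from_paths(paths):
    """Yield one record per image path, using the file stem as the text label."""
    for path in paths:
        label = os.path.splitext(os.path.basename(path))[0]
        if label:  # guard against an empty label
            yield {"file": path, "annotation": {"metadata": {"text": label}}}


# Hypothetical paths: the files do not need to exist for this check.
records = list(records_from_paths(["data/hello.png", "data/world.png"]))
```

Each record pairs a file with a `metadata` entry whose `text` field holds the transcription, so an image named `hello.png` is labelled `hello`.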

Examples from the overfitted model on the toy dataset

[Images: ocr_recognition-OCRCTCHead_OCRVisualizer_3, ocr_recognition-OCRCTCHead_OCRVisualizer_2, ocr_recognition-OCRCTCHead_OCRVisualizer_1, ocr_recognition-OCRCTCHead_OCRVisualizer_0]

@sokovninn sokovninn requested a review from a team as a code owner January 23, 2025 21:36
@sokovninn sokovninn requested review from kozlov721, klemen1999, tersekmatija and conorsim and removed request for a team January 23, 2025 21:36
@github-actions github-actions bot added the documentation and enhancement labels Jan 23, 2025
@sokovninn
Member Author

sokovninn commented Jan 23, 2025

Possible improvements include:

  • Adding more advanced OCR metrics
  • Adding a temporal NRTR head together with NRTRLoss
  • Improving visualization
  • Adding a large variant
  • Improving encoder to handle more edge cases
  • Adding a beam search decoder
  • Adding OCR-specific augmentations
  • Adding OCR detection model (same backbone)
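
For context on the beam search item: the simplest way to turn CTC outputs into text is greedy decoding, which takes the most likely class per frame, collapses repeats, and drops blanks. The sketch below shows that baseline. It is an assumption that the current head decodes this way, since the thread does not say; the function name, the blank index of 0, and the id-to-character offset are all hypothetical.

```python
def ctc_greedy_decode(frame_ids, charset, blank=0):
    """Collapse consecutive repeated ids, drop blanks, and map ids to characters.

    Assumption: id 0 is the CTC blank and ids 1..N map to charset[0..N-1].
    """
    chars = []
    previous = None
    for idx in frame_ids:
        # Emit a character only when the id changes and is not the blank.
        if idx != previous and idx != blank:
            chars.append(charset[idx - 1])
        previous = idx
    return "".join(chars)
```

A beam search decoder would instead keep several candidate prefixes per frame and sum the probabilities of alignments that collapse to the same text, which can recover transcriptions that the per-frame argmax misses.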

Collaborator

@klemen1999 klemen1999 left a comment

Generally LGTM, left some comments. One thing we also want to make sure of is the integration with HubAI and depthai-nodes: the archived model should have correct archive data so that the parser can parse it.

luxonis_train/config/predefined_models/README.md (outdated, resolved)
luxonis_train/nodes/backbones/pplcnet_v3/blocks.py (outdated, resolved)
luxonis_train/nodes/backbones/pplcnet_v3/pplcnet_v3.py (outdated, resolved)
luxonis_train/nodes/heads/ocr_ctc_head.py (outdated, resolved)
luxonis_train/nodes/heads/ocr_ctc_head.py (resolved)
luxonis_train/nodes/necks/svtr_neck/blocks.py (outdated, resolved)

codecov bot commented Jan 27, 2025

Codecov Report

Attention: Patch coverage is 88.88889% with 78 lines in your changes missing coverage. Please review.

Project coverage is 94.23%. Comparing base (631b905) to head (32f4bfc).
Report is 40 commits behind head on main.

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| luxonis_train/nodes/necks/svtr_neck/blocks.py | 75.00% | 37 Missing ⚠️ |
| luxonis_train/nodes/heads/ocr_ctc_head.py | 79.03% | 13 Missing ⚠️ |
| ...nis_train/nodes/backbones/pplcnet_v3/pplcnet_v3.py | 84.48% | 9 Missing ⚠️ |
| luxonis_train/utils/ocr.py | 89.33% | 8 Missing ⚠️ |
| luxonis_train/loaders/utils.py | 22.22% | 7 Missing ⚠️ |
| luxonis_train/nodes/backbones/pplcnet_v3/blocks.py | 97.69% | 3 Missing ⚠️ |
| .../config/predefined_models/ocr_recognition_model.py | 97.61% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #158      +/-   ##
==========================================
- Coverage   96.31%   94.23%   -2.09%     
==========================================
  Files         147      202      +55     
  Lines        6304     9349    +3045     
==========================================
+ Hits         6072     8810    +2738     
- Misses        232      539     +307     


@sokovninn sokovninn requested a review from klemen1999 January 27, 2025 18:29
Collaborator

@kozlov721 kozlov721 left a comment

In general LGTM, left some small comments

configs/ocr_recognition_light_model.yaml (outdated, resolved)
@sokovninn sokovninn requested a review from kozlov721 January 28, 2025 17:06
Collaborator

@klemen1999 klemen1999 left a comment

Generally LGTM, just a couple of small comments

luxonis_train/nodes/README.md (outdated, resolved)
@sokovninn sokovninn merged commit 7b1fccd into main Jan 29, 2025
7 of 10 checks passed
@sokovninn sokovninn deleted the feat/ocr-recognition branch January 29, 2025 17:10
Labels
documentation Improvements or additions to documentation enhancement New feature or request
3 participants