feat: ✨ scale_detections function added to adjust bbox,masks,obb for scaled images #1711

Open
wants to merge 2 commits into
base: develop
2 changes: 2 additions & 0 deletions supervision/__init__.py
@@ -40,6 +40,7 @@
)
from supervision.dataset.utils import mask_to_rle, rle_to_mask
from supervision.detection.core import Detections
from supervision.detection.detection_utils import scale_detections
from supervision.detection.line_zone import (
    LineZone,
    LineZoneAnnotator,
@@ -219,6 +220,7 @@
    "resize_image",
    "rle_to_mask",
    "scale_boxes",
    "scale_detections",
    "scale_image",
    "xcycwh_to_xyxy",
    "xywh_to_xyxy",
82 changes: 82 additions & 0 deletions supervision/detection/detection_utils.py
@@ -0,0 +1,82 @@
from typing import Tuple

import cv2
import numpy as np

from supervision.detection.core import ORIENTED_BOX_COORDINATES, Detections


def scale_detections(
Collaborator

@onuralpszr can we change the function name to something specific to letterboxing? Otherwise it will confuse users.

Collaborator Author

@hardikdava Maybe scale_letterbox_detections? What do you think, @LinasKo?

Contributor

Un-letterboxing feels like a very narrow use case.

We should provide general functions which also enable letterbox reversal, and highlight that case in the examples & docs.

While it might be frustrating, let me take the time to review this in-depth over the weekend & start of next week, for reversing API decisions is hard.

Collaborator Author

@LinasKo I can add an enum parameter and rename the other two parameters to src_res and target_res. Depending on the enum value, the function would apply either letterbox-aware scaling or plain scaling. Should I also add cases that handle plain scaling without padding?
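
To make the proposal above concrete, here is a minimal sketch of how an enum could switch between the two behaviours. Only src_res and target_res come from the comment; ScaleMode, its members, and scale_and_offset are hypothetical names invented for illustration and are not part of supervision or this PR. The letterbox branch follows the same steps as the diff, but uses integer division for the content size to avoid float rounding.

```python
from enum import Enum
from typing import Tuple


class ScaleMode(Enum):
    """Hypothetical enum; not part of supervision or this PR."""

    PLAIN = "plain"          # image was stretched to src_res, no padding
    LETTERBOX = "letterbox"  # aspect ratio kept, remainder padded


def scale_and_offset(
    src_res: Tuple[int, int],
    target_res: Tuple[int, int],
    mode: ScaleMode,
) -> Tuple[Tuple[float, float], Tuple[int, int]]:
    """Return ((scale_x, scale_y), (pad_left, pad_top)).

    A point (x, y) in src_res maps to
    ((x - pad_left) * scale_x, (y - pad_top) * scale_y) in target_res.
    """
    src_w, src_h = src_res
    target_w, target_h = target_res

    if mode is ScaleMode.PLAIN:
        # Independent scaling per axis, no padding to remove.
        return (target_w / src_w, target_h / src_h), (0, 0)

    # Letterbox: locate the unpadded content region inside src_res.
    if target_w * src_h >= src_w * target_h:
        # Target is relatively wider: content spans the full src width.
        content_w = src_w
        content_h = src_w * target_h // target_w
    else:
        # Target is relatively taller: content spans the full src height.
        content_h = src_h
        content_w = src_h * target_w // target_h

    scale = target_w / content_w
    pad_left = (src_w - content_w) // 2
    pad_top = (src_h - content_h) // 2
    return (scale, scale), (pad_left, pad_top)
```

With this shape, the plain case needs no padding logic at all, and the letterbox case returns a single uniform scale, which matches the single `scale` value used in the diff.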

Contributor

I suggest renaming to either scale_detections_to_fill or undo_letterboxing_detections.

I'd like some advice from @SkalskiP here - I've explained more in the global review comment.

    detections: Detections,
    letterbox_wh: Tuple[int, int],
    resolution_wh: Tuple[int, int],
) -> Detections:
    """
    Scale bounding box coordinates and, when present, masks and oriented
    bounding boxes to a new resolution, accounting for the letterbox padding
    applied during resizing. Returns a new Detections object.

    Args:
        detections (Detections): The Detections object to be scaled.
        letterbox_wh (Tuple[int, int]): The width and height of the letterboxed image.
        resolution_wh (Tuple[int, int]): The target width and height for scaling.

    Returns:
        Detections: A new Detections object scaled to the target resolution.
    """
Contributor

The doc should come with an example showing how to use this. It should make it evident that letterboxing can be undone using this operation.

Contributor

We should also find a place in the docs where we could showcase the scenario of:

  1. Letterbox an image
  2. Send the appropriately sized image to the model
  3. Undo the letterboxing

Without this, discovery of the method will be very low.
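
As a rough illustration of the round trip described in the steps above, the following sketch works through the coordinate arithmetic in plain Python, without supervision or a model. The parameter names resolution_wh and letterbox_wh are taken from the diff; letterbox_params, to_letterbox, and undo_letterbox are hypothetical helpers, and integer division is used for the resized size (an assumption that differs slightly from the `int(...)` float arithmetic in the PR).

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def letterbox_params(
    resolution_wh: Tuple[int, int], letterbox_wh: Tuple[int, int]
) -> Tuple[float, int, int]:
    """Return (scale, padding_left, padding_top) for undoing a letterbox."""
    input_w, input_h = resolution_wh
    letterbox_w, letterbox_h = letterbox_wh

    if input_w * letterbox_h >= letterbox_w * input_h:
        # Image is relatively wider: bars are added at the top and bottom.
        width_new = letterbox_w
        height_new = letterbox_w * input_h // input_w
    else:
        # Image is relatively taller: bars are added on the left and right.
        height_new = letterbox_h
        width_new = letterbox_h * input_w // input_h

    scale = input_w / width_new
    padding_left = (letterbox_w - width_new) // 2
    padding_top = (letterbox_h - height_new) // 2
    return scale, padding_left, padding_top


def to_letterbox(box: Box, scale: float, pad_left: int, pad_top: int) -> Box:
    """Map a box from the original image into the letterboxed frame."""
    x1, y1, x2, y2 = box
    return (
        x1 / scale + pad_left,
        y1 / scale + pad_top,
        x2 / scale + pad_left,
        y2 / scale + pad_top,
    )


def undo_letterbox(box: Box, scale: float, pad_left: int, pad_top: int) -> Box:
    """Map a box from the letterboxed frame back to the original image."""
    x1, y1, x2, y2 = box
    return (
        (x1 - pad_left) * scale,
        (y1 - pad_top) * scale,
        (x2 - pad_left) * scale,
        (y2 - pad_top) * scale,
    )


# A 1280x720 image letterboxed into 640x640: content is 640x360,
# so there are 140-pixel bars above and below and the scale is 2.
scale, pad_left, pad_top = letterbox_params((1280, 720), (640, 640))
model_box = to_letterbox((100.0, 50.0, 300.0, 250.0), scale, pad_left, pad_top)
restored = undo_letterbox(model_box, scale, pad_left, pad_top)
```

Here `model_box` stands in for what a detector would return on the letterboxed input, and `restored` is the same box in original image coordinates. Subtracting the padding before multiplying by the scale is the same order of operations the diff applies to `xyxy`.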

    input_w, input_h = resolution_wh
    letterbox_w, letterbox_h = letterbox_wh

    target_ratio = letterbox_w / letterbox_h
    image_ratio = input_w / input_h

    if image_ratio >= target_ratio:
        width_new = letterbox_w
        height_new = int(letterbox_w / image_ratio)
    else:
        height_new = letterbox_h
        width_new = int(letterbox_h * image_ratio)

    scale = input_w / width_new
    padding_top = (letterbox_h - height_new) // 2
    padding_left = (letterbox_w - width_new) // 2

    boxes = detections.xyxy.copy()
    boxes[:, [0, 2]] -= padding_left
    boxes[:, [1, 3]] -= padding_top
Contributor

We should use move_boxes from utils

    boxes[:, [0, 2]] *= scale
    boxes[:, [1, 3]] *= scale

    scaled_mask = None
    if detections.mask is not None:
        masks = []
        for mask in detections.mask:
            mask = mask[
                padding_top : padding_top + height_new,
                padding_left : padding_left + width_new,
            ]
Contributor

We can use move_masks despite #1715 - we know that padding will be positive.

            scaled_mask_i = cv2.resize(
                mask.astype(np.uint8),
                (input_w, input_h),
                interpolation=cv2.INTER_LINEAR,
            ).astype(bool)
            masks.append(scaled_mask_i)
        scaled_mask = np.array(masks)

    if ORIENTED_BOX_COORDINATES in detections.data:
        obbs = np.array(detections.data[ORIENTED_BOX_COORDINATES]).copy()
        obbs[:, :, 0] -= padding_left
        obbs[:, :, 1] -= padding_top
Contributor

We can use move_oriented_boxes from utils.

        obbs[:, :, 0] *= scale
        obbs[:, :, 1] *= scale
        detections.data[ORIENTED_BOX_COORDINATES] = obbs

    return Detections(
        xyxy=boxes,
        mask=scaled_mask,
        confidence=detections.confidence,
        class_id=detections.class_id,
        tracker_id=detections.tracker_id,
        data=detections.data,
        metadata=detections.metadata,
    )