## Abstractions
The main abstractions introduced by `maskrcnn_benchmark` that are useful to have in mind are the following:

### ImageList
In PyTorch, the first dimension of the input to the network generally represents the batch dimension, and thus all elements of the same batch have the same height / width. In order to support images with different sizes and aspect ratios in the same batch, we created the `ImageList` class, which holds internally a batch of images (of possibly different sizes). The images are padded with zeros such that they have the same final size and are batched over the first dimension. The original sizes of the images before padding are stored in the `image_sizes` attribute, and the batched tensor in `tensors`. We provide a convenience function `to_image_list` that accepts a few different input types, including a list of tensors, and returns an `ImageList` object.
```python
import torch
from maskrcnn_benchmark.structures.image_list import to_image_list

images = [torch.rand(3, 100, 200), torch.rand(3, 150, 170)]
batched_images = to_image_list(images)

# it is also possible to make the final batched size
# a multiple of a specific integer
batched_images_32 = to_image_list(images, size_divisible=32)
```
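To make the padding behavior concrete, here is a small sketch that inspects the resulting objects. The shapes in the comments are what the padding rule described above implies for these inputs, not values quoted from the library, so treat them as expectations:

```python
# the batched tensor is zero-padded to the largest height/width in the batch
print(batched_images.tensors.shape)  # expected: torch.Size([2, 3, 150, 200])

# the original, unpadded (height, width) of each image is preserved
print(batched_images.image_sizes)  # expected: (100, 200) and (150, 170)

# with size_divisible=32, both spatial dimensions are rounded up
# to the next multiple of 32 (expected here: 150 -> 160, 200 -> 224)
print(batched_images_32.tensors.shape)  # expected: torch.Size([2, 3, 160, 224])
```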
### BoxList
The `BoxList` class holds a set of bounding boxes (represented as an Nx4 tensor) for a specific image, as well as the size of the image as a `(width, height)` tuple. It also contains a set of methods for performing geometric transformations on the bounding boxes (such as cropping, scaling and flipping). The class accepts bounding boxes in two different input formats:
- `xyxy`, where each box is encoded by its `x1`, `y1`, `x2` and `y2` coordinates, and
- `xywh`, where each box is encoded by its `x1`, `y1`, `w` and `h`.
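As a quick illustration of the two modes, the sketch below creates a box in `xyxy` form and switches representation with `convert`; the exact resulting width/height values depend on the library's endpoint convention, so this is a sketch rather than a definitive reference:

```python
from maskrcnn_benchmark.structures.bounding_box import BoxList

# the same box in both encodings
box_xyxy = BoxList([[10, 10, 50, 50]], image_size=(100, 200), mode='xyxy')
# now encoded as x1, y1, w, h; the exact w/h values follow
# the library's endpoint convention
box_xywh = box_xyxy.convert('xywh')
```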
Additionally, each `BoxList` instance can also hold arbitrary additional information for each bounding box, such as labels, visibility, probability scores, etc.
Here is an example of how to create a `BoxList` from a list of coordinates:
```python
import torch
from maskrcnn_benchmark.structures.bounding_box import BoxList, FLIP_LEFT_RIGHT

width = 100
height = 200
boxes = [
    [0, 10, 50, 50],
    [50, 20, 90, 60],
    [10, 10, 50, 50]
]
# create a BoxList with 3 boxes
bbox = BoxList(boxes, image_size=(width, height), mode='xyxy')

# perform some box transformations
bbox_scaled = bbox.resize((width * 2, height * 3))
bbox_flipped = bbox.transpose(FLIP_LEFT_RIGHT)

# add labels for each bbox
labels = torch.tensor([0, 10, 1])
bbox.add_field('labels', labels)

# BoxList also supports indexing: here we select boxes 0 and 2
bbox_subset = bbox[[0, 2]]
```
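As a follow-up sketch, fields attached with `add_field` are carried along when a `BoxList` is indexed; that the subset keeps the labels of boxes 0 and 2 is our reading of the field semantics described above, not a documented guarantee:

```python
# fields travel with the boxes through indexing (assumed behavior):
subset_labels = bbox_subset.get_field('labels')  # expected: tensor([0, 1])
```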