Image processing is a crucial aspect of computer vision and is widely used in various applications such as face detection, object recognition, and image enhancement. OpenCV (Open Source Computer Vision Library) is a popular open-source library that provides a wide range of tools and functions for image processing. In this tutorial, we will explore the different types of images, how they are represented, and demonstrate basic image processing techniques using OpenCV.
A black and white image, also known as a binary image, consists of pixels with only two possible values: 0 (black) and 255 (white). These images are often used in applications where the distinction between the object and the background is essential.
A grayscale image contains shades of gray varying from black to white. Each pixel in a grayscale image has an intensity value ranging from 0 to 255, where 0 represents black, 255 represents white, and the values in between represent varying shades of gray.
RGB images are the most common type of images used in digital photography and computer graphics. An RGB image consists of three color channels: Red, Green, and Blue. Each pixel in an RGB image is represented by three values, one for each channel, which combine to produce a specific color.
Let’s have an example. In the illustration above, imagine the picture on the left only has a size of 5x5 pixels. Thus, the image consists of 25 pixels in total. In reality, we would barely see a picture this small, but it is a good size for illustration purposes. As you can see, the image has a grid. This grid can be used to access each pixel. On the right, we see how the python library OpenCV (cv2) stores this particular picture in matrices with a [5, 5, 3]
shape. The last index (3) indicates the three different colors: Red, Green, and Blue.
If we now access one particular pixel, for instance at location [4, 4]
, we receive the pixel values [24, 23, 34]
. The first value depicts the intensity for the color red, the second one represents the intensity for the color green, and the last one for blue. In combination, these three colors yield a new color which is depicted at this particular pixel location. Those values range from 0 to 255.
Image credit: Medium/Frederik vom Lehn
In OpenCV, images are represented as NumPy arrays. The way these arrays are structured depends on the type of image:
- Binary and Grayscale Images: Represented as a 2D array of shape
[height, width]
. - RGB Images: Represented as a 3D array of shape
[height, width, 3]
, where the last dimension corresponds to the three color channels (Red, Green, and Blue).
Let's start by reading and displaying an image using OpenCV.
import cv2
import matplotlib.pyplot as plt
# Read an RGB image
image = cv2.imread('path_to_image.jpg')
# Convert the image from BGR (OpenCV format) to RGB (Matplotlib format)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# Display the image
plt.imshow(image_rgb)
plt.title('RGB Image')
plt.show()
OpenCV and PyTorch use different conventions for representing images:
- OpenCV:
[height, width, colorchannel]
- PyTorch:
[colorchannel, height, width]
This difference is important to keep in mind when working with images across these libraries.
To access the pixel values of an image in OpenCV, you can use array indexing. For example, to access the pixel at location [4, 4]
in an RGB image:
# Access the pixel at location [4, 4]
pixel = image[4, 4]
print(f'Pixel at [4, 4]: {pixel}')
One of the most common image processing tasks is converting an RGB image to a grayscale image. This can be done using the cv2.cvtColor
function.
# Convert the RGB image to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Display the grayscale image
plt.imshow(gray_image, cmap='gray')
plt.title('Grayscale Image')
plt.show()
Thresholding is a technique used to create binary images. It converts a grayscale image into a binary image by applying a threshold value.
# Apply a binary threshold to the grayscale image
_, binary_image = cv2.threshold(gray_image, 127, 255, cv2.THRESH_BINARY)
# Display the binary image
plt.imshow(binary_image, cmap='gray')
plt.title('Binary Image')
plt.show()
Edge detection is used to identify the boundaries within an image. The Canny edge detector is a popular edge detection algorithm provided by OpenCV.
# Apply Canny edge detection
edges = cv2.Canny(image, 100, 200)
# Display the edges
plt.imshow(edges, cmap='gray')
plt.title('Edge Detection')
plt.show()
This tutorial provides an introduction to image processing with OpenCV, covering the types of images and basic image processing techniques. OpenCV offers a wide range of functionalities for image processing, making it a powerful tool for computer vision applications. Explore the various functions and methods provided by OpenCV to further enhance your image processing capabilities.
For more advanced image processing techniques and applications, refer to the OpenCV documentation.