Module 5.1

Computer Vision Basics

Master the foundation of how machines see and understand images. Learn how digital images are stored as pixel arrays, work with different color spaces like RGB and HSV, apply geometric transformations such as rotation and scaling, and detect edges and contours for object recognition using OpenCV and Python.

50 min
Beginner
Hands-on
What You'll Learn
  • Understand how images are represented digitally
  • Work with color spaces (RGB, BGR, Grayscale, HSV)
  • Perform basic image transformations and filtering
  • Apply edge detection and contour finding
  • Build foundational CV pipelines with OpenCV
Contents
01

Introduction to Computer Vision

Computer Vision is one of the most exciting fields in artificial intelligence, teaching machines to "see" and understand the visual world just like humans do. Imagine teaching a computer to recognize faces in photos, detect objects in videos, or even guide a self-driving car through traffic. From medical imaging systems that spot tumors early to smartphone apps that apply fun filters to your selfies, computer vision powers countless applications you use every day. In this beginner-friendly lesson, you will learn the fundamental building blocks that make all of these amazing systems possible, starting from the very basics of how computers represent and process images.

What is Computer Vision?

Computer Vision (CV) is the science of teaching computers to interpret and understand the visual world around us. Think about how effortlessly you can recognize a friend's face in a crowd, read a street sign while driving, or catch a ball thrown at you. These tasks seem simple to humans, but they require incredibly complex processing that our brains perform automatically. Computer Vision aims to give machines this same ability by using cameras, algorithms, and artificial intelligence. The field combines image processing (manipulating pixel values), machine learning (teaching computers to recognize patterns), and domain-specific knowledge to extract meaningful information from photos and videos. Whether you want to build a face recognition system, create an app that identifies plants, or develop autonomous robots, understanding computer vision fundamentals is your essential first step.

Key Concept

Computer Vision

A field of AI that trains computers to interpret and understand the visual world. Using digital images and deep learning models, machines can accurately identify and classify objects, and react to what they "see."

Real-World Applications

Computer vision has transformed virtually every industry you can think of, and new applications emerge every day. In healthcare, CV algorithms analyze X-rays, MRIs, and CT scans to help doctors detect diseases like cancer earlier than ever before, potentially saving millions of lives. In retail, stores like Amazon Go use computer vision to let customers simply walk in, grab items, and walk out without waiting in checkout lines. Self-driving cars from Tesla, Waymo, and others use dozens of cameras and CV algorithms to detect pedestrians, read traffic signs, identify lane markings, and navigate complex traffic scenarios. Your smartphone uses CV for face unlock, photo organization, and those fun AR filters on social media. Even agriculture uses drones with computer vision to monitor crop health across thousands of acres. Understanding these real-world applications helps you see why mastering CV fundamentals opens doors to some of the most exciting and impactful careers in technology.

Autonomous Vehicles
Object detection, lane tracking, pedestrian recognition
Medical Imaging
Tumor detection, X-ray analysis, retinal scans
Robotics
Pick and place, navigation, quality inspection
Security
Face recognition, surveillance, anomaly detection

Getting Started with OpenCV

OpenCV (Open Source Computer Vision Library) is the most popular and powerful library for computer vision, used by everyone from beginners learning their first image processing scripts to major tech companies building production systems. Originally developed by Intel in 1999 and now maintained by a global community of developers, OpenCV provides over 2,500 highly optimized algorithms covering everything from basic image loading to advanced deep learning inference. The best part? It is completely free and open source! OpenCV works seamlessly with Python and NumPy, which means if you already know basic Python, you are ready to start building computer vision applications. The library handles all the complex low-level operations (like memory management and hardware optimization), so you can focus on the fun part: solving real-world vision problems.

# Install OpenCV (run in terminal)
# pip install opencv-python

# Standard imports for computer vision
import cv2
import numpy as np

# Check OpenCV version
print(f"OpenCV Version: {cv2.__version__}")  # 4.8.0 (or your version)

Loading Your First Image

Every computer vision project starts with one fundamental skill: loading an image into your program. Think of it like opening a photo on your computer, except now you can programmatically access and modify every single pixel! OpenCV makes this incredibly easy with the cv2.imread() function, which reads image files (JPEG, PNG, BMP, and many more formats) from your hard drive and converts them into NumPy arrays that you can manipulate with Python code. Once loaded, you can display images in popup windows, examine their dimensions and color information, and save modified versions back to disk. Let us start by loading your first image and exploring what information OpenCV gives us about it.

# Load an image from file
image = cv2.imread('photo.jpg')

# Check if image loaded successfully
if image is None:
    print("Error: Could not load image")
else:
    # Display image properties
    print(f"Image shape: {image.shape}")      # (height, width, channels)
    print(f"Image dtype: {image.dtype}")      # uint8 (0-255)
    print(f"Image size: {image.size}")        # total values (height * width * channels)

# Display the image in a window
cv2.imshow('My First Image', image)
cv2.waitKey(0)       # Wait for any key press
cv2.destroyAllWindows()  # Close all windows

Displaying Images with Matplotlib

When working in Jupyter notebooks or Google Colab (which are incredibly popular for learning and experimenting with computer vision), you cannot use OpenCV's popup windows because notebooks run in a web browser. Instead, you will use Matplotlib, Python's go-to plotting library, to display images inline right in your notebook. However, here is where OpenCV's BGR channel order becomes really important! Since OpenCV loads images in BGR format but Matplotlib expects RGB format, you must convert your images before displaying them, or your colors will look completely wrong (blue objects will appear red and vice versa). The good news is that the conversion is just one line of code: cv2.cvtColor(image, cv2.COLOR_BGR2RGB). Get in the habit of always doing this conversion when displaying OpenCV images with Matplotlib, and you will avoid a lot of head-scratching moments when your images look weird!

import matplotlib.pyplot as plt

# Load image (BGR format in OpenCV)
image_bgr = cv2.imread('photo.jpg')

# Convert BGR to RGB for Matplotlib
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

# Display with Matplotlib
plt.figure(figsize=(10, 6))
plt.imshow(image_rgb)
plt.title('Image displayed with Matplotlib')
plt.axis('off')  # Hide axes
plt.show()

Function                | Description              | Example
cv2.imread()            | Load image from file     | img = cv2.imread('photo.jpg')
cv2.imshow()            | Display image in window  | cv2.imshow('Title', img)
cv2.imwrite()           | Save image to file       | cv2.imwrite('output.png', img)
cv2.waitKey()           | Wait for key press       | cv2.waitKey(0)
cv2.destroyAllWindows() | Close all windows        | cv2.destroyAllWindows()

Practice Questions

Task: Write code to load an image and print its dimensions, number of channels, and total pixel count.

# Your code here: Load 'sample.jpg' and print:
# - Height and width
# - Number of color channels
# - Total number of pixels
import cv2

# Load the image
image = cv2.imread('sample.jpg')

# Extract dimensions
height, width, channels = image.shape

# Calculate total pixels
total_pixels = height * width

print(f"Height: {height} pixels")
print(f"Width: {width} pixels")
print(f"Channels: {channels}")
print(f"Total pixels: {total_pixels:,}")

Task: Load an image with OpenCV, convert it to RGB, and display it using Matplotlib with a title.

# Your code here: 
# 1. Load 'landscape.jpg' with cv2.imread()
# 2. Convert from BGR to RGB
# 3. Display with matplotlib, add title "Landscape Photo"
import cv2
import matplotlib.pyplot as plt

# Load image (BGR format)
image_bgr = cv2.imread('landscape.jpg')

# Convert BGR to RGB
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

# Display with Matplotlib
plt.figure(figsize=(12, 8))
plt.imshow(image_rgb)
plt.title('Landscape Photo', fontsize=14)
plt.axis('off')
plt.show()

Task: Load two images and display them side by side using Matplotlib subplots.

# Your code here:
# 1. Load 'before.jpg' and 'after.jpg'
# 2. Convert both to RGB
# 3. Create 1x2 subplot figure
# 4. Display images with titles "Before" and "After"
import cv2
import matplotlib.pyplot as plt

# Load both images
before_bgr = cv2.imread('before.jpg')
after_bgr = cv2.imread('after.jpg')

# Convert both to RGB
before_rgb = cv2.cvtColor(before_bgr, cv2.COLOR_BGR2RGB)
after_rgb = cv2.cvtColor(after_bgr, cv2.COLOR_BGR2RGB)

# Create side-by-side plot
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

axes[0].imshow(before_rgb)
axes[0].set_title('Before', fontsize=14)
axes[0].axis('off')

axes[1].imshow(after_rgb)
axes[1].set_title('After', fontsize=14)
axes[1].axis('off')

plt.tight_layout()
plt.show()
02

Image Fundamentals

Before you can build impressive computer vision applications, you need to understand the simple but powerful concept at the heart of all digital images: pixels and numbers. When you look at a beautiful photograph on your screen, what you are actually seeing is a giant grid of tiny colored squares, each one called a pixel. To a computer, that same photo is just a big table (or matrix) of numbers, where each number represents how bright or what color each pixel should be. This might sound abstract at first, but once you grasp this fundamental concept, everything else in computer vision starts to make sense. In this section, you will learn exactly how computers store images, how to access and modify individual pixels, and how to work with both grayscale (black and white) and color images using Python and NumPy arrays.

Interactive: Pixel Intensity Explorer

[Interactive demo: adjust a pixel value from 0 (black) through 128 (gray) to 255 (white) on an 8x8 pixel grid to see how intensity values map to grayscale shades.]

Pixels: The Building Blocks

A pixel (short for "picture element") is the smallest building block of any digital image, like a single LEGO brick in a massive LEGO creation. If you zoom in enough on any photo on your computer or phone, you will eventually see these tiny individual squares. In a grayscale (black and white) image, each pixel holds just one number ranging from 0 (pure black) to 255 (pure white), with all the shades of gray in between. A value of 128 would be medium gray, 50 would be dark gray, and 200 would be light gray. Color images are more complex because they need to represent the full rainbow of colors. Instead of one number, each pixel in a color image contains three numbers: one for red, one for green, and one for blue (the RGB model). By mixing different intensities of these three primary colors, we can create any color you can imagine. Understanding how pixels work is absolutely fundamental because every single operation in computer vision, from detecting edges to recognizing faces, ultimately comes down to reading and manipulating these pixel values.

Key Concept

Pixel

The smallest addressable element of a digital image. Each pixel contains numerical values representing color or intensity. Standard images use 8-bit values (0-255) per channel.

# Create a simple grayscale image (8x8 pixels)
import numpy as np
import cv2

# Values range from 0 (black) to 255 (white)
grayscale_img = np.array([
    [0, 50, 100, 150, 200, 255, 200, 150],
    [50, 100, 150, 200, 255, 200, 150, 100],
    [100, 150, 200, 255, 200, 150, 100, 50],
    [150, 200, 255, 200, 150, 100, 50, 0],
    [200, 255, 200, 150, 100, 50, 0, 50],
    [255, 200, 150, 100, 50, 0, 50, 100],
    [200, 150, 100, 50, 0, 50, 100, 150],
    [150, 100, 50, 0, 50, 100, 150, 200]
], dtype=np.uint8)

print(f"Shape: {grayscale_img.shape}")  # (8, 8)
print(f"Data type: {grayscale_img.dtype}")  # uint8
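
The same array idea extends to color images: each pixel holds three numbers instead of one. As a quick sketch (using BGR channel order, since that is what OpenCV expects):

```python
import numpy as np

# A tiny 2x2 color image: each pixel is three values (B, G, R in OpenCV's order)
color_img = np.zeros((2, 2, 3), dtype=np.uint8)
color_img[0, 0] = [255, 0, 0]      # blue pixel
color_img[0, 1] = [0, 255, 0]      # green pixel
color_img[1, 0] = [0, 0, 255]      # red pixel
color_img[1, 1] = [255, 255, 255]  # white pixel (all channels at maximum)

print(color_img.shape)  # (2, 2, 3) - height, width, channels
```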

Images as NumPy Arrays

Here is where the magic happens: OpenCV stores every image as a NumPy array, which is essentially a powerful, multi-dimensional table of numbers. If you have used NumPy before for data science or math operations, you already have a head start! When you load an image with OpenCV, you get back a NumPy array where each element represents a pixel value. The .shape property tells you the image dimensions: a grayscale image has shape (height, width), meaning it is a 2D array with rows and columns. A color image has shape (height, width, channels), typically (height, width, 3) for the three color channels. This array-based representation is incredibly powerful because you can use all of NumPy's tools to manipulate images. Want to make an image brighter? Just add a number to all pixels. Want to access a specific pixel? Use array indexing like image[row, column]. Want to crop a region? Use array slicing. The possibilities are endless, and the syntax is clean and intuitive.

# Load a color image and examine its structure
image = cv2.imread('photo.jpg')

# Image dimensions
height, width, channels = image.shape
print(f"Height: {height}, Width: {width}, Channels: {channels}")

# Access a single pixel (row, column)
pixel = image[100, 200]  # Returns [B, G, R] values
print(f"Pixel at (100, 200): B={pixel[0]}, G={pixel[1]}, R={pixel[2]}")

# Modify a pixel
image[100, 200] = [255, 0, 0]  # Set to blue (BGR)

# Access a region of interest (ROI)
roi = image[50:150, 100:250]  # Rows 50-150, Columns 100-250
print(f"ROI shape: {roi.shape}")
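
One caveat with "just add a number to all pixels": uint8 arithmetic wraps around at 255, so plain NumPy addition can turn bright pixels dark. A minimal sketch of the pitfall and a safe alternative, using NumPy only (cv2.add performs an equivalent saturating addition):

```python
import numpy as np

# Three sample pixel values from a grayscale image
pixel_row = np.array([50, 200, 240], dtype=np.uint8)

# Plain uint8 addition wraps around: 200 + 100 = 300, which overflows to 44
wrapped = pixel_row + 100

# Widen to int16, add, clip to the valid range, then convert back
brighter = np.clip(pixel_row.astype(np.int16) + 100, 0, 255).astype(np.uint8)

print(wrapped)   # 150, 44, 84 - the bright pixels wrapped to dark values
print(brighter)  # 150, 255, 255 - saturated at 255 as intended
```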

Grayscale vs Color Images

One of the most important decisions you will make for each computer vision task is whether to work with color or grayscale images. Grayscale images have just one channel (brightness only), which means they use three times less memory than color images and process three times faster. That might not sound like much for a single small image, but when you are processing thousands of images or working with video at 30 frames per second, the difference adds up quickly! More importantly, many computer vision algorithms, especially edge detection and shape analysis, do not actually need color information. They work just as well (or even better) on grayscale images because they focus on structure rather than color. On the other hand, if color is important for your task, like detecting a red stop sign among gray buildings, you obviously need to keep the color information. The rule of thumb is simple: start with grayscale for speed and simplicity, and only use color when your specific task requires it.

# Load as grayscale directly
gray_image = cv2.imread('photo.jpg', cv2.IMREAD_GRAYSCALE)
print(f"Grayscale shape: {gray_image.shape}")  # (height, width) - no channel dimension

# Convert color to grayscale
color_image = cv2.imread('photo.jpg')
gray_converted = cv2.cvtColor(color_image, cv2.COLOR_BGR2GRAY)

# Create a blank image (black)
blank_gray = np.zeros((480, 640), dtype=np.uint8)
blank_color = np.zeros((480, 640, 3), dtype=np.uint8)

# Create a white image
white_image = np.ones((480, 640), dtype=np.uint8) * 255

# Create an image with a specific color (red in BGR)
red_image = np.zeros((480, 640, 3), dtype=np.uint8)
red_image[:, :] = [0, 0, 255]  # BGR format

Reading Modes

OpenCV's imread function accepts a flag parameter that controls how the image is loaded. You can force grayscale conversion, preserve alpha channels, or load unchanged. Choosing the right mode prevents unnecessary processing and memory usage.

Flag                 | Value | Description
cv2.IMREAD_COLOR     |     1 | Load as BGR color image (default)
cv2.IMREAD_GRAYSCALE |     0 | Load as grayscale
cv2.IMREAD_UNCHANGED |    -1 | Load with alpha channel if present

# Different reading modes
color = cv2.imread('photo.jpg', cv2.IMREAD_COLOR)       # Default BGR
gray = cv2.imread('photo.jpg', cv2.IMREAD_GRAYSCALE)    # Grayscale
unchanged = cv2.imread('logo.png', cv2.IMREAD_UNCHANGED)  # With alpha

print(f"Color shape: {color.shape}")       # (h, w, 3)
print(f"Gray shape: {gray.shape}")         # (h, w)
print(f"Unchanged shape: {unchanged.shape}")  # (h, w, 4) if has alpha

Drawing on Images

OpenCV provides functions for drawing shapes, lines, and text on images. These are useful for visualizing results, annotating detections, and creating debug outputs. All drawing functions modify the image in-place, so create a copy if you need to preserve the original.

# Create a blank canvas
canvas = np.zeros((400, 600, 3), dtype=np.uint8)

# Draw a line (image, start, end, color, thickness)
cv2.line(canvas, (50, 50), (550, 50), (255, 0, 0), 2)

# Draw a rectangle (image, top-left, bottom-right, color, thickness)
cv2.rectangle(canvas, (50, 100), (200, 200), (0, 255, 0), 3)

# Draw a filled rectangle (thickness = -1)
cv2.rectangle(canvas, (250, 100), (400, 200), (0, 255, 255), -1)

# Draw a circle (image, center, radius, color, thickness)
cv2.circle(canvas, (500, 150), 50, (0, 0, 255), 2)

# Draw text (image, text, origin, font, scale, color, thickness)
cv2.putText(canvas, 'OpenCV Drawing', (50, 350), 
            cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)

Practice Questions

Task: Create a 256x256 grayscale image where pixel values increase from left (0) to right (255).

# Your code here: Create horizontal gradient image
# Hint: Use np.tile or np.arange with reshaping
import numpy as np
import cv2

# Create gradient row
gradient_row = np.arange(256, dtype=np.uint8)

# Tile to create full image
gradient_image = np.tile(gradient_row, (256, 1))

print(f"Shape: {gradient_image.shape}")
print(f"Left edge value: {gradient_image[0, 0]}")   # 0
print(f"Right edge value: {gradient_image[0, 255]}")  # 255

# Display
cv2.imshow('Gradient', gradient_image)
cv2.waitKey(0)

Task: Load an image, extract a 100x100 region from the center, and replace it with red color.

# Your code here:
# 1. Load 'photo.jpg'
# 2. Calculate center coordinates
# 3. Extract 100x100 ROI from center
# 4. Fill ROI with red (BGR: 0, 0, 255)
import cv2

# Load image
image = cv2.imread('photo.jpg')
height, width = image.shape[:2]

# Calculate center region bounds
center_y, center_x = height // 2, width // 2
start_y, end_y = center_y - 50, center_y + 50
start_x, end_x = center_x - 50, center_x + 50

# Extract ROI (for reference)
roi = image[start_y:end_y, start_x:end_x].copy()

# Replace center with red
image[start_y:end_y, start_x:end_x] = [0, 0, 255]

# Display result
cv2.imshow('Modified Image', image)
cv2.waitKey(0)

Task: Create a 400x400 image with concentric circles forming a target pattern (5 rings).

# Your code here:
# 1. Create 400x400 white canvas
# 2. Draw 5 concentric circles from center
# 3. Alternate between red and white colors
# 4. Draw a small filled circle at center
import numpy as np
import cv2

# Create white canvas
canvas = np.ones((400, 400, 3), dtype=np.uint8) * 255
center = (200, 200)

# Draw concentric circles
radii = [180, 140, 100, 60, 20]
colors = [(0, 0, 255), (255, 255, 255), (0, 0, 255), (255, 255, 255), (0, 0, 255)]

for radius, color in zip(radii, colors):
    cv2.circle(canvas, center, radius, color, -1)

# Draw outline rings around each filled circle
for radius in radii:
    cv2.circle(canvas, center, radius, (0, 0, 0), 2)

# Center bullseye
cv2.circle(canvas, center, 10, (0, 0, 0), -1)

cv2.imshow('Target', canvas)
cv2.waitKey(0)
03

Color Spaces

Color spaces are different ways of describing and organizing colors using numbers, and understanding them is crucial for effective computer vision work. Think of it like having different languages to describe the same color: you could say a tomato is "red" in everyday English, or you could precisely specify its RGB values as (255, 99, 71) for a computer. Different color spaces excel at different tasks. RGB (Red, Green, Blue) is intuitive because it matches how computer monitors create colors by mixing light. HSV (Hue, Saturation, Value) separates the actual color (hue) from how vivid or bright it is, making it much easier to detect objects by color regardless of lighting conditions. Grayscale strips away all color information, leaving just brightness values, which simplifies and speeds up many processing tasks. In this section, you will learn how to convert between these color spaces, when to use each one, and practical techniques for color-based object detection that you can apply to your own projects.

RGB and BGR Color Models

RGB (Red, Green, Blue) is the most intuitive color model because it directly matches how your computer monitor, phone screen, and TV create colors. Every pixel on these displays contains three tiny lights: one red, one green, and one blue. By varying the intensity of each light from 0 (off) to 255 (maximum brightness), we can create over 16 million different colors! Pure red is (255, 0, 0), pure green is (0, 255, 0), pure blue is (0, 0, 255), white is all lights at max (255, 255, 255), and black is all lights off (0, 0, 0). Now here is an important quirk you need to remember: OpenCV uses BGR (Blue, Green, Red) format instead of RGB, with the channels in reverse order. This historical decision dates back to early camera hardware from the 1990s. It is a common source of confusion for beginners because if you load an image with OpenCV and display it with another library like Matplotlib (which expects RGB), your colors will look completely wrong, with blues appearing red and reds appearing blue! The solution is simple: always convert with cv2.cvtColor(image, cv2.COLOR_BGR2RGB) before displaying with other libraries.

Color Model

BGR (Blue, Green, Red)

OpenCV's default color format where channels are ordered Blue, Green, Red. A pixel with value [255, 0, 0] is pure blue, not red. Always convert to RGB before displaying with Matplotlib.

# Understanding BGR channel order
import cv2
import numpy as np

# Create a pure blue image (BGR: Blue=255, Green=0, Red=0)
blue_image = np.zeros((100, 100, 3), dtype=np.uint8)
blue_image[:, :] = [255, 0, 0]  # This is BLUE in BGR

# Create a pure red image
red_image = np.zeros((100, 100, 3), dtype=np.uint8)
red_image[:, :] = [0, 0, 255]  # This is RED in BGR

# Split channels
image = cv2.imread('photo.jpg')
b, g, r = cv2.split(image)
print(f"Blue channel shape: {b.shape}")

# Merge channels (can reorder)
rgb_image = cv2.merge([r, g, b])  # Now in RGB order

Converting Between Color Spaces

OpenCV makes converting between different color spaces incredibly easy with the cvtColor function (short for "convert color"). You simply pass your image and a conversion code that tells OpenCV what to convert from and to. For example, cv2.COLOR_BGR2RGB converts from BGR to RGB, cv2.COLOR_BGR2GRAY converts to grayscale, and cv2.COLOR_BGR2HSV converts to HSV. The conversion codes follow a consistent naming pattern: COLOR_source2destination. OpenCV supports dozens of color spaces, but the ones you will use most often as a beginner are BGR, RGB, Grayscale, and HSV. Each conversion happens instantly, even for large images, thanks to OpenCV's highly optimized code running under the hood. One important note: these conversions create a new image, they do not modify the original. So you can safely convert the same image to multiple color spaces without worrying about messing up your original data.

# Load color image
image_bgr = cv2.imread('photo.jpg')

# Convert to different color spaces
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
image_gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
image_hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
image_lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB)

print(f"Original BGR: {image_bgr.shape}")   # (h, w, 3)
print(f"Grayscale: {image_gray.shape}")     # (h, w)
print(f"HSV: {image_hsv.shape}")            # (h, w, 3)
print(f"LAB: {image_lab.shape}")            # (h, w, 3)

HSV Color Space

HSV stands for Hue, Saturation, and Value, and it is a game-changer for color-based object detection. Here is why: in RGB, detecting a "red" object is tricky because a red apple in bright sunlight and the same apple in shadow have completely different RGB values, even though they are both obviously red to our eyes. HSV solves this problem by separating "what color is it" (Hue) from "how vivid is the color" (Saturation) and "how bright is it" (Value). The Hue channel represents pure colors on a circular scale: in OpenCV, red is around 0-10 or 170-179 (it wraps around!), orange is around 10-25, yellow is 25-35, green is 35-85, cyan is 85-100, blue is 100-130, and purple/magenta is 130-170. Saturation ranges from 0 (gray, no color) to 255 (pure, vivid color). Value ranges from 0 (black) to 255 (brightest). By specifying a range of Hue values and allowing wide ranges for Saturation and Value, you can detect objects by color regardless of lighting conditions. This is the secret behind most color-based tracking and segmentation systems!

Hue (H)
Color type: 0-179 in OpenCV. Red is near 0/180, Green around 60, Blue around 120.
Saturation (S)
Color purity: 0-255. Low values are grayish, high values are vivid.
Value (V)
Brightness: 0-255. Zero is black, 255 is brightest.
# Color detection using HSV
image = cv2.imread('colored_objects.jpg')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# Define range for blue color detection
lower_blue = np.array([100, 50, 50])
upper_blue = np.array([130, 255, 255])

# Create mask where blue pixels are white, others black
mask = cv2.inRange(hsv, lower_blue, upper_blue)

# Apply mask to original image
blue_only = cv2.bitwise_and(image, image, mask=mask)

# Common HSV ranges (OpenCV uses H: 0-179)
# Red: H=0-10 or H=170-180, S=50-255, V=50-255
# Green: H=35-85, S=50-255, V=50-255
# Blue: H=100-130, S=50-255, V=50-255
# Yellow: H=20-35, S=50-255, V=50-255

Grayscale Conversion

Converting a color image to grayscale is not as simple as just averaging the RGB channels, and understanding why reveals something fascinating about human vision! Our eyes are not equally sensitive to all colors. We are most sensitive to green (which makes sense evolutionarily, since detecting green plants and predators in green forests was crucial for survival), moderately sensitive to red, and least sensitive to blue. The standard grayscale conversion formula reflects this: Gray = 0.299 x Red + 0.587 x Green + 0.114 x Blue. Notice how green contributes almost 59% to the final brightness, while blue contributes only about 11%! This weighted average produces grayscale images that match how humans perceive brightness. For computer vision, grayscale is incredibly useful: edge detection algorithms like Canny, shape analysis, template matching, and many machine learning models work exclusively on grayscale images. Converting to grayscale is often your first preprocessing step, and OpenCV can do it either during loading (cv2.IMREAD_GRAYSCALE) or afterward (cv2.COLOR_BGR2GRAY).

# Standard grayscale conversion
image = cv2.imread('photo.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Grayscale formula (approximately):
# Gray = 0.299*R + 0.587*G + 0.114*B

# Manual grayscale conversion for understanding
b, g, r = cv2.split(image)
gray_manual = (0.114 * b + 0.587 * g + 0.299 * r).astype(np.uint8)

# Load directly as grayscale (more efficient)
gray_direct = cv2.imread('photo.jpg', cv2.IMREAD_GRAYSCALE)

Color Space | Channels                   | Best Use Case
BGR/RGB     | 3 (Blue, Green, Red)       | Display, standard processing
Grayscale   | 1 (Intensity)              | Edge detection, shape analysis
HSV         | 3 (Hue, Saturation, Value) | Color-based segmentation
LAB         | 3 (Lightness, a, b)        | Color correction, perceptual uniformity
YCrCb       | 3 (Luma, Chroma)           | Skin detection, video compression

Practice Questions

Task: Load a color image, split it into B, G, R channels, and display each channel as a grayscale image.

# Your code here:
# 1. Load 'photo.jpg'
# 2. Split into B, G, R channels
# 3. Display each channel using matplotlib subplots
import cv2
import matplotlib.pyplot as plt

# Load image
image = cv2.imread('photo.jpg')
b, g, r = cv2.split(image)

# Display channels
fig, axes = plt.subplots(1, 4, figsize=(16, 4))

# Original (convert to RGB for display)
axes[0].imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
axes[0].set_title('Original')

# Individual channels (displayed as grayscale)
axes[1].imshow(b, cmap='gray')
axes[1].set_title('Blue Channel')

axes[2].imshow(g, cmap='gray')
axes[2].set_title('Green Channel')

axes[3].imshow(r, cmap='gray')
axes[3].set_title('Red Channel')

for ax in axes:
    ax.axis('off')
plt.tight_layout()
plt.show()

Task: Create a mask to detect green objects in an image using HSV color space.

# Your code here:
# 1. Load image and convert to HSV
# 2. Define green color range (H: 35-85, S: 50-255, V: 50-255)
# 3. Create mask using cv2.inRange()
# 4. Apply mask to show only green regions
import cv2
import numpy as np

# Load image and convert to HSV
image = cv2.imread('colored_objects.jpg')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# Define green color range
lower_green = np.array([35, 50, 50])
upper_green = np.array([85, 255, 255])

# Create mask
mask = cv2.inRange(hsv, lower_green, upper_green)

# Apply mask to original image
green_only = cv2.bitwise_and(image, image, mask=mask)

# Display results
cv2.imshow('Original', image)
cv2.imshow('Mask', mask)
cv2.imshow('Green Objects', green_only)
cv2.waitKey(0)

Task: Detect both red and blue objects, highlight red in one color and blue in another.

# Your code here:
# 1. Load image and convert to HSV
# 2. Create masks for red (0-10 and 170-180) and blue (100-130)
# 3. Combine red masks with bitwise_or
# 4. Create output showing red regions in magenta, blue in cyan
import cv2
import numpy as np

# Load and convert
image = cv2.imread('colored_objects.jpg')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# Red detection (wraps around hue spectrum)
lower_red1 = np.array([0, 50, 50])
upper_red1 = np.array([10, 255, 255])
lower_red2 = np.array([170, 50, 50])
upper_red2 = np.array([180, 255, 255])

mask_red1 = cv2.inRange(hsv, lower_red1, upper_red1)
mask_red2 = cv2.inRange(hsv, lower_red2, upper_red2)
mask_red = cv2.bitwise_or(mask_red1, mask_red2)

# Blue detection
lower_blue = np.array([100, 50, 50])
upper_blue = np.array([130, 255, 255])
mask_blue = cv2.inRange(hsv, lower_blue, upper_blue)

# Create output image
output = image.copy()
output[mask_red > 0] = [255, 0, 255]  # Magenta for red
output[mask_blue > 0] = [255, 255, 0]  # Cyan for blue

cv2.imshow('Detected Colors', output)
cv2.waitKey(0)
04

Image Transformations

Image transformations are the essential tools that let you modify, enhance, and prepare images for analysis. Think of them as photo editing operations, but with precise mathematical control. Need to resize an image to fit a neural network's input requirements? That is a transformation. Want to rotate a document photo so the text is upright? Transformation. Need to blur an image to reduce noise before detecting edges? Also a transformation! These operations fall into two main categories: geometric transformations (like resizing, rotating, and flipping) that change where pixels are located, and pixel-wise transformations (like blurring and thresholding) that change pixel values. In real-world computer vision pipelines, you will almost always apply several transformations to preprocess your images before the main analysis step. In this section, you will learn the most commonly used transformations, when to apply each one, and practical code examples you can adapt for your own projects.

Resizing Images

Resizing is one of the most common operations in computer vision, used to change an image's dimensions by making it larger or smaller. But here is an interesting challenge: when you shrink an image from 1000x1000 pixels to 100x100, you are discarding 99% of the original pixels! And when you enlarge an image, you need to create new pixels that were not there before. How does the computer decide which pixels to keep or what values to give new pixels? That is where interpolation methods come in. They are mathematical algorithms that calculate new pixel values based on the surrounding pixels. For shrinking images, INTER_AREA gives the best results because it properly averages pixels. For enlarging, INTER_LINEAR (fast and good) or INTER_CUBIC (slower but smoother) work well. Resizing is absolutely essential when working with deep learning models, which typically require fixed input sizes like 224x224 or 416x416 pixels. It is also crucial for managing memory when processing very large images.

# Resize by specifying dimensions
image = cv2.imread('photo.jpg')
height, width = image.shape[:2]

# Resize to specific dimensions
resized = cv2.resize(image, (400, 300))  # (width, height)

# Resize by scale factor
scaled_down = cv2.resize(image, None, fx=0.5, fy=0.5)
scaled_up = cv2.resize(image, None, fx=2.0, fy=2.0)

# Different interpolation methods
shrunk_area = cv2.resize(image, (200, 150), interpolation=cv2.INTER_AREA)
enlarged_linear = cv2.resize(image, (800, 600), interpolation=cv2.INTER_LINEAR)
enlarged_cubic = cv2.resize(image, (800, 600), interpolation=cv2.INTER_CUBIC)

print(f"Original: {image.shape}")
print(f"Resized: {resized.shape}")

Geometric Transformations

Geometric transformations physically move pixels around to change an image's orientation, position, or shape. The simplest transformations are flips: horizontal flip mirrors an image left-to-right (like looking in a mirror), vertical flip flips it upside-down, and flipping both axes rotates it 180 degrees. Rotation is slightly more complex because pixels do not always land exactly on the new grid, requiring interpolation to calculate the new values. OpenCV handles rotation using transformation matrices, which are 2x3 grids of numbers that mathematically describe how to move each pixel. Do not worry if the math sounds intimidating: OpenCV's getRotationMatrix2D function creates the matrix for you automatically! Just specify the center point, rotation angle, and scale factor. One tricky thing about rotation: by default, OpenCV keeps the output image the same size as the input, which means the corners of your rotated image might get cut off. The solution is to calculate a larger output size that fits the entire rotated image, which we demonstrate in the code examples.

# Flip operations
horizontal_flip = cv2.flip(image, 1)   # Flip horizontally
vertical_flip = cv2.flip(image, 0)     # Flip vertically
both_flip = cv2.flip(image, -1)        # Flip both axes

# Rotation
height, width = image.shape[:2]
center = (width // 2, height // 2)

# Get rotation matrix (center, angle, scale)
rotation_matrix = cv2.getRotationMatrix2D(center, 45, 1.0)

# Apply rotation
rotated = cv2.warpAffine(image, rotation_matrix, (width, height))

# Rotation with automatic size adjustment
def rotate_with_padding(image, angle):
    h, w = image.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    
    # Calculate new dimensions
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])
    new_w = int(h * sin + w * cos)
    new_h = int(h * cos + w * sin)
    
    # Adjust rotation matrix
    M[0, 2] += (new_w - w) / 2
    M[1, 2] += (new_h - h) / 2
    
    return cv2.warpAffine(image, M, (new_w, new_h))

rotated_full = rotate_with_padding(image, 30)

Image Blurring

Blurring might seem counterproductive at first: why would you want to make your images less sharp? The answer is noise reduction and preprocessing. Real-world images from cameras always contain some random noise, tiny random variations in pixel values caused by sensor imperfections, low light, or compression artifacts. This noise can confuse edge detection and other algorithms, causing them to detect "edges" that are just noise. Blurring smooths out these small variations while preserving the larger structures you actually care about. Different blur types serve different purposes: Average blur (also called box blur) treats all neighboring pixels equally and is fast but can create artifacts. Gaussian blur weights center pixels more heavily than edge pixels, creating smoother, more natural-looking results and is the most commonly used blur. Median blur replaces each pixel with the median of its neighbors, making it incredibly effective at removing "salt and pepper" noise (random black and white pixels). Bilateral filter is special because it blurs while preserving edges by only averaging pixels with similar colors, perfect for skin smoothing and artistic effects.

Key Concept

Kernel (Filter)

A small matrix used to apply effects like blurring, sharpening, or edge detection. The kernel slides over the image, computing new pixel values from the weighted sum of neighboring pixels.

# Average (Box) Blur
blur_avg = cv2.blur(image, (5, 5))  # 5x5 kernel

# Gaussian Blur (weighted average, better for natural images)
blur_gaussian = cv2.GaussianBlur(image, (5, 5), 0)

# Median Blur (excellent for salt-and-pepper noise)
blur_median = cv2.medianBlur(image, 5)  # Kernel size must be odd

# Bilateral Filter (preserves edges while blurring)
blur_bilateral = cv2.bilateralFilter(image, 9, 75, 75)

# Compare blur strengths
blur_light = cv2.GaussianBlur(image, (3, 3), 0)
blur_medium = cv2.GaussianBlur(image, (7, 7), 0)
blur_heavy = cv2.GaussianBlur(image, (15, 15), 0)

Thresholding

Thresholding is the process of converting a grayscale image into a binary (pure black and white) image by comparing each pixel to a threshold value. If a pixel's value is above the threshold, it becomes white (255); if below, it becomes black (0). This simple operation is incredibly powerful for separating objects from backgrounds, a critical step before contour detection and shape analysis. For example, if you have a white document on a dark desk, thresholding can create a clean mask showing just the document. The challenge is choosing the right threshold value. Too low and you will include background noise; too high and you will lose parts of your object. Otsu's method helps by automatically calculating the optimal threshold for images with two distinct brightness peaks (bimodal histograms). For images with uneven lighting (like a photo of text with shadows), adaptive thresholding is your best friend: it calculates different thresholds for different regions of the image, handling lighting variations that would defeat simple thresholding. This is the secret behind document scanning apps that make photos of paper look crisp and clean!

# Convert to grayscale first
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Simple threshold
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
_, binary_inv = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)

# Otsu's method (automatic threshold selection)
_, otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Adaptive thresholding (handles varying lighting)
adaptive_mean = cv2.adaptiveThreshold(gray, 255, 
                                       cv2.ADAPTIVE_THRESH_MEAN_C,
                                       cv2.THRESH_BINARY, 11, 2)

adaptive_gaussian = cv2.adaptiveThreshold(gray, 255,
                                           cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                           cv2.THRESH_BINARY, 11, 2)
Threshold Type             | Description                         | Best Use
THRESH_BINARY              | Above threshold becomes max, else 0 | Simple foreground extraction
THRESH_BINARY_INV          | Inverse of BINARY                   | Dark objects on light background
THRESH_OTSU                | Automatic threshold selection       | Bimodal histograms
ADAPTIVE_THRESH_MEAN_C     | Local mean threshold                | Uneven lighting
ADAPTIVE_THRESH_GAUSSIAN_C | Gaussian-weighted local threshold   | Documents, text

Practice Questions

Task: Load an image and resize it to 224x224 pixels (common deep learning input size).

# Your code here:
# 1. Load 'photo.jpg'
# 2. Resize to 224x224 using appropriate interpolation
# 3. Print original and new shapes
import cv2

# Load image
image = cv2.imread('photo.jpg')
print(f"Original shape: {image.shape}")

# Resize to 224x224 (common model input size)
# INTER_AREA is the recommended interpolation when shrinking an image
resized = cv2.resize(image, (224, 224), interpolation=cv2.INTER_AREA)
print(f"Resized shape: {resized.shape}")

# Display
cv2.imshow('Resized 224x224', resized)
cv2.waitKey(0)

Task: Apply Gaussian, Median, and Bilateral blur to an image and display all results side by side.

# Your code here:
# 1. Load image
# 2. Apply GaussianBlur (7,7), medianBlur(7), bilateralFilter(9,75,75)
# 3. Display original and all 3 blurred versions using matplotlib
import cv2
import matplotlib.pyplot as plt

# Load image
image = cv2.imread('photo.jpg')
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Apply different blurs
gaussian = cv2.GaussianBlur(image, (7, 7), 0)
median = cv2.medianBlur(image, 7)
bilateral = cv2.bilateralFilter(image, 9, 75, 75)

# Convert all to RGB for display
gaussian_rgb = cv2.cvtColor(gaussian, cv2.COLOR_BGR2RGB)
median_rgb = cv2.cvtColor(median, cv2.COLOR_BGR2RGB)
bilateral_rgb = cv2.cvtColor(bilateral, cv2.COLOR_BGR2RGB)

# Display comparison
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes[0, 0].imshow(image_rgb)
axes[0, 0].set_title('Original')
axes[0, 1].imshow(gaussian_rgb)
axes[0, 1].set_title('Gaussian Blur')
axes[1, 0].imshow(median_rgb)
axes[1, 0].set_title('Median Blur')
axes[1, 1].imshow(bilateral_rgb)
axes[1, 1].set_title('Bilateral Filter')

for ax in axes.flat:
    ax.axis('off')
plt.tight_layout()
plt.show()

Task: Create a document scanner effect: convert to grayscale, apply adaptive threshold, and clean up with morphological operations.

# Your code here:
# 1. Load document image and convert to grayscale
# 2. Apply Gaussian blur to reduce noise
# 3. Use adaptive Gaussian thresholding
# 4. Apply morphological close operation to clean up
import cv2
import numpy as np

# Load and convert to grayscale
image = cv2.imread('document.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Apply Gaussian blur to reduce noise
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# Adaptive thresholding
binary = cv2.adaptiveThreshold(blurred, 255,
                                cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                cv2.THRESH_BINARY, 11, 2)

# Morphological operations to clean up
kernel = np.ones((2, 2), np.uint8)
cleaned = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

# Display results
cv2.imshow('Original', image)
cv2.imshow('Scanned Document', cleaned)
cv2.waitKey(0)
05

Edge Detection and Contours

Edge detection is one of the most fundamental and powerful techniques in computer vision, used to find the boundaries of objects in images. Think about how you recognize objects in the real world: you see the outline of a coffee cup against the table, the edge of a car against the road, or the boundary between sky and mountains. Your brain is essentially performing edge detection! In digital images, edges occur where there are sudden changes in pixel brightness, like where a dark object meets a light background. By detecting these edges, we can find object boundaries, identify shapes, and simplify complex scenes into clean line drawings. Once we have edges, we can take it a step further with contour detection, which groups connected edge pixels into closed boundaries that represent complete object outlines. These contours can then be analyzed to measure object sizes, count items, recognize shapes, and much more. In this section, you will learn the famous Canny edge detector (used in countless real-world applications) and how to find and analyze contours like a pro.

Understanding Edges

To understand edges mathematically, imagine walking across an image from left to right while measuring pixel brightness. In a smooth, uniform region (like a clear blue sky), the brightness stays roughly constant, meaning there is very little change from one pixel to the next. But when you cross from one object to another (like from the sky to a building), the brightness suddenly jumps up or down, creating a sharp transition. These sudden jumps are what we call edges! Edge detection algorithms work by calculating the gradient (rate of change) of pixel intensity at every location in the image. Where the gradient is high, there is an edge. Where it is low, you are in a uniform region. The Sobel operator, which you will learn about shortly, calculates gradients in both the horizontal and vertical directions, allowing us to detect edges running in any direction. The challenge is distinguishing real edges (object boundaries) from noise (random pixel variations), which is why preprocessing with blur and careful threshold selection is so important.
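The "walk across a row and watch the brightness" idea can be checked numerically. This tiny pure-NumPy sketch (the pixel values are invented for illustration) shows the gradient staying near zero in flat regions and spiking at the jump:

```python
import numpy as np

# One row of pixel intensities: a flat dark region, then a sudden
# jump to a flat bright region (a step edge)
row = np.array([10, 10, 11, 10, 10, 200, 200, 199, 200],
               dtype=np.float32)

# Gradient = difference between neighbouring pixels
gradient = np.diff(row)
print(gradient)  # small everywhere except a jump of 190

# The edge sits where the gradient magnitude is largest
edge_index = int(np.argmax(np.abs(gradient)))
print(edge_index)  # 4: the transition between pixels 4 and 5
```

Notice that the small fluctuations (the 11 and the 199) also produce nonzero gradients; that is exactly the noise-versus-edge problem that blurring and threshold selection address.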

Key Concept

Image Gradient

The directional change in intensity at each pixel. High gradient magnitude indicates an edge. Computed using derivative operators like Sobel, which measure intensity changes in x and y directions.

Sobel Edge Detection

The Sobel operator is one of the foundational edge detection methods, and understanding it helps you grasp how all edge detection works. Sobel uses small 3x3 kernels (filters) that slide across your image, performing convolution operations at each position. One kernel detects horizontal edges (vertical changes in brightness), and another detects vertical edges (horizontal changes). By applying both kernels, you get gradient values in the x and y directions at every pixel. Where both gradients are low, you are in a smooth region. Where one or both are high, you have found an edge! The magnitude (strength) of the edge is calculated as the square root of (gradient_x squared + gradient_y squared), which gives the overall edge strength regardless of direction. The Sobel operator is particularly useful when you need the raw gradient information, not just binary edges. For example, gradient magnitude and direction are inputs to more sophisticated algorithms like Canny, and are also used in feature descriptors like HOG (Histogram of Oriented Gradients) for object detection.

# Load and convert to grayscale
image = cv2.imread('photo.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Sobel gradients
sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)  # Gradient in x (finds vertical edges)
sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)  # Gradient in y (finds horizontal edges)

# Convert to absolute values for display
sobel_x_abs = cv2.convertScaleAbs(sobel_x)
sobel_y_abs = cv2.convertScaleAbs(sobel_y)

# Combine gradients
sobel_combined = cv2.addWeighted(sobel_x_abs, 0.5, sobel_y_abs, 0.5, 0)

# Calculate gradient magnitude
magnitude = np.sqrt(sobel_x**2 + sobel_y**2)
magnitude = np.uint8(np.clip(magnitude, 0, 255))

Canny Edge Detection

The Canny edge detector, developed by John Canny in 1986, is considered the gold standard for edge detection and is used in countless real-world applications. What makes Canny special is its multi-stage approach that produces clean, thin, well-connected edges with minimal noise. First, it applies Gaussian blur to reduce noise (though you should still apply your own blur beforehand for best results). Second, it calculates gradients using Sobel operators. Third comes the magic: non-maximum suppression, which thins thick edges down to single-pixel-wide lines by keeping only the pixels with the strongest gradients in each local region. Finally, hysteresis thresholding uses two thresholds (low and high) to classify edges: pixels above the high threshold are definitely edges, pixels below the low threshold are definitely not edges, and pixels in between are kept only if they connect to strong edges. This clever approach ensures continuous edge lines without gaps. The two threshold parameters give you control over sensitivity: lower thresholds detect more edges (including some noise), higher thresholds detect only the strongest edges.

# Prepare image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# Canny edge detection
# threshold1: Lower threshold for edge linking
# threshold2: Upper threshold for strong edges
edges = cv2.Canny(blurred, 50, 150)

# Experiment with different thresholds
edges_sensitive = cv2.Canny(blurred, 30, 100)   # More edges, more noise
edges_strict = cv2.Canny(blurred, 100, 200)     # Fewer, stronger edges

# Auto Canny using median
median_val = np.median(blurred)
lower = int(max(0, 0.7 * median_val))
upper = int(min(255, 1.3 * median_val))
edges_auto = cv2.Canny(blurred, lower, upper)

Finding Contours

While edge detection finds individual edge pixels, contour detection takes the next step by grouping connected edge pixels into complete object boundaries. Think of contours as the outlines you would draw if you traced around objects with a pen. To find contours, you first need a binary image where your objects are white and the background is black (created using thresholding or edge detection). Then cv2.findContours() traces along the boundaries of white regions, returning a list of contours where each contour is an array of (x, y) coordinate points forming the boundary. The retrieval mode parameter controls which contours to find: RETR_EXTERNAL finds only the outermost boundaries (ignoring any holes or nested shapes), RETR_LIST finds all contours without any hierarchy information, and RETR_TREE finds all contours with full parent-child relationships (useful when objects contain other objects). Once you have contours, you can draw them on images for visualization, calculate their properties, or use them for object counting and measurement.

# Prepare binary image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# Find contours
contours, hierarchy = cv2.findContours(binary, cv2.RETR_EXTERNAL, 
                                        cv2.CHAIN_APPROX_SIMPLE)

print(f"Found {len(contours)} contours")

# Draw all contours on a copy
output = image.copy()
cv2.drawContours(output, contours, -1, (0, 255, 0), 2)  # -1 draws all

# Draw individual contours
for i, contour in enumerate(contours):
    cv2.drawContours(output, [contour], 0, (0, 255, 0), 2)

# Retrieval modes:
# RETR_EXTERNAL: Only outermost contours
# RETR_LIST: All contours, no hierarchy
# RETR_TREE: Full hierarchy of contours

Contour Analysis

Once you have detected contours, the real fun begins: analyzing them to extract useful information about the objects in your image! OpenCV provides a rich set of functions for measuring contour properties. cv2.contourArea() calculates the area enclosed by a contour in pixels, which is great for filtering out noise (small contours) or finding the largest object. cv2.arcLength() measures the perimeter (total boundary length), and the ratio of area to perimeter can help identify shape types. cv2.boundingRect() returns the smallest upright rectangle that contains the contour, giving you a simple bounding box for object localization. cv2.minEnclosingCircle() finds the smallest circle that contains the contour, useful for detecting circular objects. cv2.approxPolyDP() simplifies a contour to fewer points while preserving its essential shape, turning a complex 1000-point contour into a simple triangle or rectangle with 3-4 points. Finally, cv2.moments() calculates statistical moments that let you find the centroid (center of mass) and orientation of shapes. These tools together enable powerful applications like counting objects, measuring sizes, recognizing shapes, and much more!

# Analyze each contour
for contour in contours:
    # Area and perimeter
    area = cv2.contourArea(contour)
    perimeter = cv2.arcLength(contour, closed=True)
    
    # Skip small contours (noise)
    if area < 100:
        continue
    
    # Bounding rectangle
    x, y, w, h = cv2.boundingRect(contour)
    cv2.rectangle(output, (x, y), (x+w, y+h), (255, 0, 0), 2)
    
    # Minimum enclosing circle
    (cx, cy), radius = cv2.minEnclosingCircle(contour)
    cv2.circle(output, (int(cx), int(cy)), int(radius), (0, 0, 255), 2)
    
    # Contour approximation (simplify shape)
    epsilon = 0.02 * perimeter
    approx = cv2.approxPolyDP(contour, epsilon, True)
    
    # Centroid using moments
    M = cv2.moments(contour)
    if M['m00'] > 0:
        cx = int(M['m10'] / M['m00'])
        cy = int(M['m01'] / M['m00'])
        cv2.circle(output, (cx, cy), 5, (0, 255, 255), -1)
Function                 | Returns             | Use Case
cv2.contourArea()        | Area in pixels      | Size filtering, object counting
cv2.arcLength()          | Perimeter length    | Shape complexity measure
cv2.boundingRect()       | x, y, width, height | Object localization
cv2.minEnclosingCircle() | Center, radius      | Circular object detection
cv2.approxPolyDP()       | Simplified contour  | Shape recognition
cv2.moments()            | Shape moments       | Centroid, orientation

Practice Questions

Task: Load an image, apply Gaussian blur, then detect edges using Canny.

# Your code here:
# 1. Load 'photo.jpg' and convert to grayscale
# 2. Apply GaussianBlur (5,5)
# 3. Apply Canny with thresholds 50, 150
# 4. Display original and edges side by side
import cv2
import matplotlib.pyplot as plt

# Load and prepare
image = cv2.imread('photo.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# Canny edge detection
edges = cv2.Canny(blurred, 50, 150)

# Display
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
axes[0].imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
axes[0].set_title('Original')
axes[1].imshow(edges, cmap='gray')
axes[1].set_title('Canny Edges')

for ax in axes:
    ax.axis('off')
plt.tight_layout()
plt.show()

Task: Find and count all objects in an image using contour detection.

# Your code here:
# 1. Load image and convert to grayscale
# 2. Apply threshold (Otsu's method)
# 3. Find external contours
# 4. Filter contours by area (> 500 pixels)
# 5. Draw bounding boxes and count
import cv2

# Load and threshold
image = cv2.imread('objects.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Find contours
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, 
                                cv2.CHAIN_APPROX_SIMPLE)

# Filter and draw
output = image.copy()
count = 0
for contour in contours:
    area = cv2.contourArea(contour)
    if area > 500:  # Filter small contours
        count += 1
        x, y, w, h = cv2.boundingRect(contour)
        cv2.rectangle(output, (x, y), (x+w, y+h), (0, 255, 0), 2)
        cv2.putText(output, str(count), (x, y-10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

print(f"Found {count} objects")
cv2.imshow('Detected Objects', output)
cv2.waitKey(0)

Task: Detect contours and classify shapes as triangles, rectangles, or circles based on vertex count.

# Your code here:
# 1. Load image with shapes, threshold it
# 2. Find contours
# 3. For each contour, use approxPolyDP to get vertices
# 4. Classify: 3 vertices = triangle, 4 = rectangle, > 8 = circle
# 5. Label each shape on the image
import cv2

# Load and prepare
image = cv2.imread('shapes.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)

# Find contours
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, 
                                cv2.CHAIN_APPROX_SIMPLE)

output = image.copy()

for contour in contours:
    # Skip small contours
    if cv2.contourArea(contour) < 100:
        continue
    
    # Approximate polygon
    perimeter = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, 0.04 * perimeter, True)
    vertices = len(approx)
    
    # Classify shape
    if vertices == 3:
        shape = "Triangle"
        color = (0, 255, 0)
    elif vertices == 4:
        shape = "Rectangle"
        color = (255, 0, 0)
    elif vertices > 8:
        shape = "Circle"
        color = (0, 0, 255)
    else:
        shape = "Polygon"
        color = (255, 255, 0)
    
    # Draw and label
    cv2.drawContours(output, [contour], 0, color, 2)
    M = cv2.moments(contour)
    if M['m00'] > 0:
        cx = int(M['m10'] / M['m00'])
        cy = int(M['m01'] / M['m00'])
        cv2.putText(output, shape, (cx-30, cy),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

cv2.imshow('Shape Classification', output)
cv2.waitKey(0)

Key Takeaways

Images as Arrays

Digital images are NumPy arrays where each element represents pixel intensity. Grayscale images are 2D arrays (height x width), while color images are 3D arrays (height x width x channels)

Color Space Selection

Choose color spaces based on your task: RGB for display, BGR for OpenCV, HSV for color-based segmentation, and Grayscale for edge detection and shape analysis

Geometric Transformations

Resize, rotate, flip, and crop images using OpenCV functions. Use interpolation methods like INTER_LINEAR for smooth scaling and INTER_NEAREST for preserving sharp edges

Image Filtering

Apply kernels for blurring (Gaussian, median), sharpening, and noise reduction. Kernel size affects the strength of the filter effect

Edge Detection

Use Canny edge detection with appropriate thresholds to find object boundaries. Sobel and Laplacian detect gradients in specific directions

Contour Analysis

Extract and analyze contours to identify shapes, calculate areas, find bounding boxes, and detect objects in preprocessed binary images

Knowledge Check

Quick Quiz

Test your understanding of Computer Vision fundamentals

1 How does OpenCV store color images by default?
2 What is the shape of a 640x480 RGB image as a NumPy array?
3 Which color space is best for color-based object segmentation?
4 What does Gaussian blur primarily help with in image preprocessing?
5 Which edge detection algorithm uses two thresholds for hysteresis?
6 What function is used to find contours in OpenCV?