Capstone Project 4

Object Detection

Build a production-ready object detection system using the YOLO (You Only Look Once) architecture. You will apply transfer learning with pre-trained models, process images and video streams in real time, and deploy a complete detection pipeline that identifies multiple object classes.

Estimated time: 14-18 hours
Difficulty: Advanced
Points: 700
What You Will Build
  • YOLO model implementation
  • Transfer learning pipeline
  • Custom dataset training
  • Real-time video detection
  • Performance optimization
01

Project Overview

Object detection is one of the most widely deployed computer vision tasks, powering autonomous vehicles, security systems, retail analytics, and industrial automation. In this project, you will use YOLOv8, a recent release in the YOLO family, to build a real-time object detection system. You will start from weights pre-trained on the COCO dataset and fine-tune them on a custom dataset. Target performance: mAP50 above 0.5 on your custom classes and inference above 30 FPS.

Skills Applied: This project tests your proficiency in deep learning architectures, transfer learning, data annotation, model optimization, and real-time video processing with OpenCV.
YOLO Architecture

Understand single-shot detection and anchor-free design

Transfer Learning

Fine-tune pre-trained models on custom datasets

Video Processing

Real-time detection on webcam and video files

Optimization

Achieve high FPS with model optimization techniques

Learning Objectives

Technical Skills
  • Implement YOLOv8 using Ultralytics library
  • Prepare and annotate custom datasets
  • Fine-tune models with transfer learning
  • Process video streams with OpenCV
  • Evaluate models with mAP, precision, recall
Computer Vision Concepts
  • Understand bounding box regression
  • Learn non-maximum suppression (NMS)
  • Explore anchor-free detection methods
  • Handle multi-scale object detection
  • Optimize for real-time performance
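Two of the concepts listed above, intersection over union (IoU) and non-maximum suppression (NMS), are easier to reason about with a small example. The following is a minimal pure-Python sketch of greedy NMS, assuming boxes are given as (x1, y1, x2, y2) pixel corners. The function names `iou` and `nms` are illustrative; Ultralytics already applies NMS internally, so this is for understanding rather than something you need to add to the pipeline.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2) corner coordinates in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.5) -> List[int]:
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it, repeat.

    Returns the indices of the boxes that survive, in descending score order.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard every remaining box that overlaps the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

Raising `iou_thresh` keeps more overlapping boxes (useful for crowded scenes); lowering it suppresses duplicates more aggressively.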
02

Business Scenario

SecureVision AI

You have been hired as a Computer Vision Engineer at SecureVision AI, a startup developing intelligent surveillance systems for retail stores and warehouses. The company needs a real-time object detection system that can identify products, people, and potential security threats from CCTV footage. Your task is to build the core detection engine.

"We need a detection system that runs at 30+ FPS on standard hardware, can detect at least 10 custom object classes, and provides accurate bounding boxes. The system should work with both recorded video and live camera feeds. Can you build this?"

Alex Chen, CTO, SecureVision AI

Technical Challenges to Solve

Model Selection
  • Which YOLO variant to use (YOLOv8n, s, m, l, x)?
  • Trade-offs between speed and accuracy
  • Memory constraints on edge devices
  • Pre-trained weights selection
Data Preparation
  • How to annotate images efficiently?
  • YOLO annotation format requirements
  • Data augmentation strategies
  • Handling class imbalance
Real-time Processing
  • Video capture and frame processing
  • Drawing bounding boxes efficiently
  • Handling different video resolutions
  • Frame rate optimization
Evaluation
  • Calculating mAP correctly
  • Understanding IoU thresholds
  • Precision-recall curves
  • Confusion matrix for detection
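To make the evaluation questions above concrete, here is a hedged sketch of how average precision (AP) for a single class can be computed from a precision-recall curve. It assumes each detection has already been matched against ground truth at a chosen IoU threshold and flagged as a true or false positive. The function name and inputs are illustrative. mAP50 is the mean of this value over classes at IoU 0.5, and mAP50-95 averages it over IoU thresholds from 0.5 to 0.95 in steps of 0.05. Evaluation libraries may interpolate the curve differently (for example, COCO-style 101-point sampling), so their numbers can differ slightly from this all-point version.

```python
from typing import List


def average_precision(is_tp: List[bool], scores: List[float], num_gt: int) -> float:
    """All-point interpolated AP for one class at one IoU threshold.

    is_tp[i]  -- True if detection i matched an unclaimed ground-truth box
    scores[i] -- confidence of detection i
    num_gt    -- number of ground-truth boxes for this class
    """
    if num_gt == 0:
        return 0.0
    # Walk through detections from most to least confident
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area under the curve: sum rectangles wherever recall increases
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

For example, three detections ranked TP, FP, TP against two ground-truth boxes give recall steps at 0.5 and 1.0 with envelope precisions 1.0 and 2/3, so AP = 0.5 × 1.0 + 0.5 × 2/3 ≈ 0.833.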
Pro Tip: Start with YOLOv8n (nano) for fast iteration during development, then scale up to larger models for final training. Always validate on a held-out test set.
03

The Dataset

You will work with the COCO128 dataset for initial experiments and can add other datasets for custom object classes. Download the data from Kaggle, or let Ultralytics fetch its built-in datasets automatically.

Primary Dataset: COCO128

COCO128 is a small tutorial dataset composed of the first 128 images from the COCO train2017 dataset. It covers the 80 COCO object classes and is well suited to testing and prototyping a detection pipeline, though it is too small to train a production model.

Dataset Info: 128 images | 80 object classes | YOLO format annotations | ~7MB | Default configuration uses the same 128 images for training and validation
Additional Datasets for Custom Training

Use these datasets to train on specific object classes:

  • Vehicles: cars, trucks, buses, motorcycles (Kaggle)
  • Safety Equipment: helmets, vests, PPE detection (Kaggle)
  • Face Detection: face bounding boxes (Kaggle)
Note: For custom training, aim for roughly 100-500 annotated images per class as a starting point; more images generally help. Use tools like LabelImg, Roboflow, or CVAT for annotation.
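Annotation tools usually export YOLO labels for you, but it helps to know what the files contain. In YOLO format, each image has a matching .txt file with one line per object: a class index followed by the box center, width, and height, all normalized to the range 0-1 by the image size. The helpers below are a minimal sketch of converting between pixel corner boxes and that format. The function names are illustrative and not part of any library.

```python
from typing import Tuple


def to_yolo_line(class_id: int, x1: float, y1: float, x2: float, y2: float,
                 img_w: int, img_h: int) -> str:
    """Convert a pixel box (x1, y1, x2, y2) into one YOLO label line."""
    xc = (x1 + x2) / 2 / img_w   # normalized box center x
    yc = (y1 + y2) / 2 / img_h   # normalized box center y
    w = (x2 - x1) / img_w        # normalized box width
    h = (y2 - y1) / img_h        # normalized box height
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


def from_yolo_line(line: str, img_w: int, img_h: int) -> Tuple[int, float, float, float, float]:
    """Convert one YOLO label line back into (class_id, x1, y1, x2, y2) pixels."""
    cls, xc, yc, w, h = line.split()
    xc_px, yc_px = float(xc) * img_w, float(yc) * img_h
    w_px, h_px = float(w) * img_w, float(h) * img_h
    return (int(cls), xc_px - w_px / 2, yc_px - h_px / 2,
            xc_px + w_px / 2, yc_px + h_px / 2)
```

A box from (100, 200) to (300, 400) in a 640x480 image becomes the line `0 0.312500 0.625000 0.312500 0.416667` for class 0.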
04

Project Requirements

Your project must include all of the following components. This is a comprehensive computer vision project covering YOLO implementation, transfer learning, and real-time video processing.

1. Data Preparation

Prepare and explore the dataset:

  • Download COCO128 or chosen dataset
  • Visualize sample images with annotations
  • Analyze class distribution
  • Create train/val/test splits (70/20/10)
  • Create data.yaml configuration file
Deliverable: Data exploration notebook with visualizations of annotated images and class distribution charts.
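The split and configuration steps above could be sketched as follows. This assumes a hypothetical layout where images live under `images/train`, `images/val`, and `images/test` inside a dataset root, with labels in a parallel `labels/` tree. The function names, the fixed seed, and the example class names are illustrative. The generated text follows the Ultralytics dataset YAML layout (`path`, `train`, `val`, `test`, `names`).

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_dataset(items: Sequence[str],
                  ratios: Tuple[float, float, float] = (0.7, 0.2, 0.1),
                  seed: int = 42) -> Dict[str, List[str]]:
    """Shuffle image names reproducibly and split them into train/val/test."""
    shuffled = sorted(items)               # sort first so the result is deterministic
    random.Random(seed).shuffle(shuffled)  # local RNG, does not touch global state
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],   # whatever remains
    }


def data_yaml_text(root: str, class_names: Sequence[str]) -> str:
    """Build the contents of data.yaml for a dataset rooted at `root`."""
    lines = [
        f"path: {root}",
        "train: images/train",
        "val: images/val",
        "test: images/test",
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"
```

After splitting, copy each image and its label file into the matching folder, then write the YAML with something like `open("data/data.yaml", "w").write(data_yaml_text("data", ["person", "helmet"]))`.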
2. YOLO Model Setup

Set up YOLOv8 with Ultralytics:

  • Install Ultralytics library
  • Load pre-trained YOLOv8 model
  • Run inference on sample images
  • Understand model outputs (boxes, scores, classes)
  • Visualize detections with bounding boxes
Deliverable: Notebook demonstrating pre-trained model inference with visualizations.
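A minimal sketch of this step with the Ultralytics Python API might look like the following. It assumes `pip install ultralytics` has been run and uses the `yolov8n.pt` pre-trained weights, which the library downloads on first use. The helper names `summarize` and `detect_image` are illustrative. Each result exposes its detections through `boxes`, with `xyxy` for pixel corner coordinates, `conf` for scores, and `cls` for class indices.

```python
from typing import Dict, List, Sequence


def summarize(class_ids: Sequence[float], confidences: Sequence[float],
              names: Dict[int, str], min_conf: float = 0.25) -> List[str]:
    """Turn raw class ids and scores into readable 'label score' strings."""
    return [
        f"{names[int(c)]} {s:.2f}"
        for c, s in zip(class_ids, confidences)
        if s >= min_conf
    ]


def detect_image(image_path: str, weights: str = "yolov8n.pt") -> List[str]:
    """Run a pre-trained model on one image and describe what it found."""
    # Imported lazily so summarize() works without Ultralytics installed
    from ultralytics import YOLO

    model = YOLO(weights)             # loads (and if needed downloads) the weights
    result = model(image_path)[0]     # one Results object per input image
    boxes = result.boxes
    print(boxes.xyxy)                 # tensor of (x1, y1, x2, y2) pixel corners
    annotated = result.plot()         # BGR image array with boxes and labels drawn
    print(annotated.shape)
    return summarize(boxes.cls.tolist(), boxes.conf.tolist(), result.names)
```

Calling `detect_image("sample.jpg")` on an image of your choice (the filename is a placeholder) returns a list of label and score strings for the detections above the confidence cutoff.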
3. Transfer Learning

Fine-tune on custom dataset:

  • Configure training hyperparameters
  • Set up data augmentation
  • Train model with transfer learning
  • Monitor training with TensorBoard or Weights and Biases
  • Save best model checkpoint

Target Performance: mAP50 over 0.5

Deliverable: Training notebook with loss curves, mAP progression, and final metrics.
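As a starting point, the training steps above could be wired up as in the sketch below. The hyperparameter values are illustrative defaults rather than tuned recommendations, and the run name `custom_yolov8n` is hypothetical. The keys used (`data`, `epochs`, `imgsz`, `batch`, `patience`, `fliplr`, `hsv_v`, `mosaic`, `name`) are arguments accepted by the Ultralytics `train` method. Ultralytics saves the best checkpoint as `best.pt` inside the run directory's `weights/` folder.

```python
from typing import Any, Dict


def build_train_args(data_yaml: str = "data/data.yaml", epochs: int = 50,
                     **overrides: Any) -> Dict[str, Any]:
    """Collect training hyperparameters in one place so runs are reproducible."""
    args: Dict[str, Any] = {
        "data": data_yaml,      # dataset configuration file
        "epochs": epochs,
        "imgsz": 640,           # training image size
        "batch": 16,
        "patience": 10,         # stop early if validation stops improving
        "fliplr": 0.5,          # augmentation: horizontal flip probability
        "hsv_v": 0.4,           # augmentation: brightness jitter
        "mosaic": 1.0,          # augmentation: mosaic probability
        "name": "custom_yolov8n",  # hypothetical run name
    }
    args.update(overrides)
    return args


def fine_tune(weights: str = "yolov8n.pt", **overrides: Any):
    """Fine-tune COCO pre-trained weights on the dataset in data.yaml."""
    # Imported lazily so build_train_args() works without Ultralytics installed
    from ultralytics import YOLO

    model = YOLO(weights)                          # start from pre-trained weights
    model.train(**build_train_args(**overrides))   # logs metrics each epoch
    return model
```

For a quick smoke test before a long run, something like `fine_tune(epochs=3, batch=8)` verifies that the data paths and labels load correctly.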
4. Model Evaluation

Comprehensive evaluation on test set:

  • Calculate mAP50 and mAP50-95
  • Generate precision-recall curves
  • Create confusion matrix
  • Analyze per-class performance
  • Identify failure cases and edge cases
Deliverable: Evaluation notebook with metrics, curves, and error analysis.
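The evaluation steps above could be sketched as follows, assuming the fine-tuned weights are at `models/best.pt` and that `data/data.yaml` defines a `test` split. The returned metrics object exposes `box.map50`, `box.map` (mAP50-95), and `box.maps` (per-class mAP50-95). The helper `weakest_classes` is illustrative and simply ranks classes so error analysis can start with the worst performers.

```python
from typing import Dict, List, Sequence, Tuple


def weakest_classes(names: Dict[int, str], per_class_map: Sequence[float],
                    k: int = 3) -> List[Tuple[str, float]]:
    """Return the k classes with the lowest per-class mAP, worst first."""
    pairs = [(names[i], float(per_class_map[i])) for i in range(len(per_class_map))]
    return sorted(pairs, key=lambda pair: pair[1])[:k]


def evaluate(weights: str = "models/best.pt",
             data_yaml: str = "data/data.yaml") -> List[Tuple[str, float]]:
    """Validate on the held-out test split and report headline metrics."""
    # Imported lazily so weakest_classes() works without Ultralytics installed
    from ultralytics import YOLO

    model = YOLO(weights)
    # plots=True asks for PR curves and a confusion matrix in the run directory
    metrics = model.val(data=data_yaml, split="test", plots=True)
    print(f"mAP50:    {metrics.box.map50:.3f}")
    print(f"mAP50-95: {metrics.box.map:.3f}")
    return weakest_classes(model.names, list(metrics.box.maps))
```

The classes returned by `evaluate()` are good candidates for collecting more training images or inspecting failure cases.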
5. Real-time Video Detection

Build real-time detection pipeline:

  • Set up OpenCV video capture
  • Process frames with YOLO model
  • Draw bounding boxes and labels
  • Display FPS counter
  • Support webcam and video file input
  • Optionally save processed video

Target Performance: Over 30 FPS on GPU, over 10 FPS on CPU

Deliverable: Python script for real-time detection with demo video recording.
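One possible shape for the detection script is sketched below, assuming OpenCV (`opencv-python`) and Ultralytics are installed. `FPSCounter` and `run_video` are illustrative names. The FPS counter averages over a sliding window of recent frames so the on-screen number does not flicker. The output codec `mp4v` and the 30 FPS writer rate are assumptions you may need to change for your platform and source video.

```python
import time
from collections import deque
from typing import Optional, Union


class FPSCounter:
    """Moving-average FPS over the last `window` frame timestamps."""

    def __init__(self, window: int = 30) -> None:
        self.stamps: deque = deque(maxlen=window)

    def tick(self, now: Optional[float] = None) -> float:
        """Record one frame and return the current FPS estimate."""
        self.stamps.append(time.perf_counter() if now is None else now)
        if len(self.stamps) < 2:
            return 0.0
        elapsed = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / elapsed if elapsed > 0 else 0.0


def run_video(source: Union[int, str] = 0, weights: str = "yolov8n.pt",
              save_path: Optional[str] = None) -> None:
    """Detect objects on a webcam (source=0) or a video file path. Press q to quit."""
    import cv2                      # imported lazily; FPSCounter needs neither
    from ultralytics import YOLO

    model = YOLO(weights)
    cap = cv2.VideoCapture(source)
    if not cap.isOpened():
        raise RuntimeError(f"Could not open video source: {source!r}")
    fps, writer = FPSCounter(), None
    try:
        while True:
            ok, frame = cap.read()
            if not ok:              # end of file or camera failure
                break
            annotated = model(frame, verbose=False)[0].plot()
            cv2.putText(annotated, f"FPS: {fps.tick():.1f}", (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
            if save_path and writer is None:
                h, w = annotated.shape[:2]
                fourcc = cv2.VideoWriter_fourcc(*"mp4v")
                writer = cv2.VideoWriter(save_path, fourcc, 30.0, (w, h))
            if writer is not None:
                writer.write(annotated)
            cv2.imshow("detections", annotated)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        cap.release()
        if writer is not None:
            writer.release()
        cv2.destroyAllWindows()
```

`run_video(0)` reads from the default webcam, while `run_video("input.mp4", save_path="videos/demo.mp4")` processes a file (the input filename is a placeholder) and records the annotated output.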
6. Optimization and Deployment

Optimize for production:

  • Export model to ONNX format
  • Compare inference speeds (PyTorch vs ONNX)
  • Document hardware requirements
  • Create inference script for deployment
Deliverable: Exported model files and deployment documentation.
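The export and speed comparison could be sketched like this, assuming `onnx` and `onnxruntime` are installed alongside Ultralytics. The `export` method returns the path of the exported file, and the same `YOLO` class can load that ONNX file for inference. The helper names and the `sample.jpg` placeholder image are hypothetical. Timings include pre- and post-processing, so treat them as end-to-end latency rather than raw model speed.

```python
import time
from typing import Callable, Dict


def mean_latency_ms(fn: Callable[[], object], runs: int = 20, warmup: int = 3) -> float:
    """Average wall-clock time of fn() in milliseconds, after warm-up calls."""
    for _ in range(warmup):          # warm-up absorbs one-time setup costs
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000 / runs


def compare_pytorch_onnx(weights: str = "models/best.pt",
                         image: str = "sample.jpg") -> Dict[str, float]:
    """Export the model to ONNX and time both backends on the same image."""
    # Imported lazily so mean_latency_ms() works without Ultralytics installed
    from ultralytics import YOLO

    pt_model = YOLO(weights)
    onnx_path = pt_model.export(format="onnx")     # writes a .onnx file next to the weights
    onnx_model = YOLO(onnx_path, task="detect")    # ONNX files do not store the task
    return {
        "pytorch_ms": mean_latency_ms(lambda: pt_model(image, verbose=False)),
        "onnx_ms": mean_latency_ms(lambda: onnx_model(image, verbose=False)),
    }
```

Record the hardware used next to the numbers in your deployment documentation, since the relative speed of the two backends depends heavily on CPU, GPU, and runtime versions.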
05

YOLO Architecture

YOLO (You Only Look Once) is a single-stage detector: it predicts bounding boxes and class scores for the entire image in one forward pass, with no separate region-proposal step. This is what makes it fast enough for real-time applications.

YOLOv8 Model Variants
Model      Size (MB)   mAP50-95   Speed (ms)   Use Case
yolov8n    6.3         37.3       1.2          Edge devices, real-time
yolov8s    22.4        44.9       1.9          Balanced speed/accuracy
yolov8m    52.0        50.2       4.0          General purpose
yolov8l    87.7        52.9       6.5          High accuracy needed
yolov8x    136.7       53.9       10.8         Maximum accuracy
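The speed and accuracy trade-off in the table can be expressed as a small selection rule: pick the most accurate variant that fits a latency and size budget. The sketch below copies the figures from the table above. The `pick_variant` helper is illustrative, and because the table's speeds depend on the hardware they were measured on, you should replace them with your own measurements before relying on the result.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    name: str
    size_mb: float
    map50_95: float
    speed_ms: float


# Figures copied from the variants table; speeds are hardware dependent
VARIANTS = [
    Variant("yolov8n", 6.3, 37.3, 1.2),
    Variant("yolov8s", 22.4, 44.9, 1.9),
    Variant("yolov8m", 52.0, 50.2, 4.0),
    Variant("yolov8l", 87.7, 52.9, 6.5),
    Variant("yolov8x", 136.7, 53.9, 10.8),
]


def pick_variant(max_latency_ms: float,
                 max_size_mb: Optional[float] = None) -> Optional[str]:
    """Most accurate variant within the latency (and optional size) budget."""
    candidates = [
        v for v in VARIANTS
        if v.speed_ms <= max_latency_ms
        and (max_size_mb is None or v.size_mb <= max_size_mb)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v.map50_95).name
```

With a 5 ms budget this selects `yolov8m`; adding a 25 MB size limit for an edge device drops the choice to `yolov8s`.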
06

Transfer Learning

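Transfer learning starts from weights pre-trained on a large dataset (here, COCO) and fine-tunes them on your custom dataset. Because the model has already learned general visual features, it typically needs fewer labeled images and less training time than training from scratch.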

07

Real-time Video Detection

Build a complete video detection pipeline that processes webcam or video file input in real time.

Performance Tips:
  • Use smaller model (yolov8n) for higher FPS
  • Reduce input resolution for faster processing
  • Use GPU acceleration with CUDA
  • Enable half-precision (FP16) inference
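The tips above map onto a handful of inference arguments. The sketch below assumes PyTorch and Ultralytics are installed and enables FP16 only when a CUDA GPU is available, since half precision is intended for GPU inference. `predict_kwargs` and `fast_model` are illustrative names, and 320 is an example reduced resolution (image sizes should be multiples of 32).

```python
from typing import Any, Dict, Tuple


def predict_kwargs(has_cuda: bool, imgsz: int = 640) -> Dict[str, Any]:
    """Inference settings tuned for speed on the available hardware."""
    return {
        "imgsz": imgsz,                       # smaller input means faster inference
        "device": 0 if has_cuda else "cpu",   # first GPU if present
        "half": has_cuda,                     # FP16 only on GPU
        "verbose": False,                     # skip per-frame console logging
    }


def fast_model(weights: str = "yolov8n.pt") -> Tuple[Any, Dict[str, Any]]:
    """Load the nano model together with speed-oriented predict arguments."""
    # Imported lazily so predict_kwargs() works without these libraries installed
    import torch
    from ultralytics import YOLO

    model = YOLO(weights)
    kwargs = predict_kwargs(torch.cuda.is_available(), imgsz=320)
    return model, kwargs
```

Inside the frame loop, call `model.predict(frame, **kwargs)` and compare the FPS and detection quality against the default 640 resolution; small or distant objects are usually the first to be missed when the input size drops.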
08

Submission Requirements

Create a public GitHub repository with the exact name shown below:

Required Repository Name
object-detection-yolo
github.com/<your-username>/object-detection-yolo
Required Project Structure
object-detection-yolo/
├── notebooks/
│   ├── 01_data_exploration.ipynb     # Dataset exploration
│   ├── 02_model_training.ipynb       # Transfer learning
│   └── 03_evaluation.ipynb           # Model evaluation
├── src/
│   ├── detect_image.py               # Image detection script
│   ├── detect_video.py               # Video detection script
│   └── utils.py                      # Helper functions
├── models/
│   └── best.pt                       # Trained model weights
├── data/
│   └── data.yaml                     # Dataset configuration
├── reports/
│   ├── confusion_matrix.png          # Confusion matrix
│   ├── precision_recall.png          # PR curve
│   └── sample_detections.png         # Sample detection results
├── videos/
│   └── demo.mp4                      # Demo video with detections
├── requirements.txt                  # Python dependencies
└── README.md                         # Project documentation
README.md Required Sections
1. Project Header
  • Project title and description
  • Your full name and submission date
  • Final mAP score achieved
2. Dataset
  • Dataset used and source
  • Number of classes and images
  • Sample images with annotations
3. Model Training
  • YOLO variant used
  • Training hyperparameters
  • Training curves (loss, mAP)
4. Results
  • mAP50 and mAP50-95 scores
  • Per-class performance
  • Confusion matrix
5. Real-time Demo
  • FPS achieved
  • Demo video or GIF
  • Hardware specifications
6. How to Run
  • Installation instructions
  • Training command
  • Inference command
09

Grading Rubric

Your project will be graded on the following criteria. Total: 700 points.

Criteria               Points   Description
Data Preparation       75       Dataset setup, visualization, proper splits
YOLO Implementation    100      Model setup, inference, understanding outputs
Transfer Learning      150      Training pipeline, achieving mAP over 0.5
Model Evaluation       100      Metrics, curves, confusion matrix, error analysis
Real-time Detection    150      Video pipeline, 30+ FPS, clean visualization
Documentation          100      README quality, code comments, demo video
Bonus: ONNX Export     25       Model export and inference comparison
Total                  700
Grading Levels
  • Excellent (630-700): mAP over 0.6, 40+ FPS, exceptional docs
  • Good (525-629): Meets all requirements, good demo
  • Satisfactory (420-524): Meets minimum requirements
  • Needs Work (< 420): Missing components or poor performance

10

Pre-Submission Checklist

Use this checklist to verify you have completed all requirements before submitting.

  • Data and Model
  • Evaluation
  • Real-time Detection
  • Documentation
Final Check: Run all notebooks and scripts from scratch to ensure reproducibility. Test on a different machine if possible.