Project Overview
Object detection is one of the most impactful computer vision applications, powering autonomous vehicles, security systems, retail analytics, and industrial automation. In this project, you will work with YOLOv8, Ultralytics' modern anchor-free iteration of the YOLO architecture, to build a real-time object detection system. You will start from COCO-pretrained weights and fine-tune on a custom dataset. Target performance: mAP50 above 0.5 on your custom classes at over 30 FPS inference speed.
- YOLO Architecture: Understand single-shot detection and anchor-free design
- Transfer Learning: Fine-tune pre-trained models on custom datasets
- Video Processing: Real-time detection on webcam and video files
- Optimization: Achieve high FPS with model optimization techniques
Learning Objectives
Technical Skills
- Implement YOLOv8 using Ultralytics library
- Prepare and annotate custom datasets
- Fine-tune models with transfer learning
- Process video streams with OpenCV
- Evaluate models with mAP, precision, recall
Computer Vision Concepts
- Understand bounding box regression
- Learn non-maximum suppression (NMS)
- Explore anchor-free detection methods
- Handle multi-scale object detection
- Optimize for real-time performance
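The NMS concept above can be sketched in a few lines of plain Python. This is a minimal illustration only (box format `(x1, y1, x2, y2)` assumed); Ultralytics applies a fast vectorized NMS internally, so you never write this yourself in practice.

```python
# Minimal, dependency-free sketch of IoU and greedy NMS for
# axis-aligned boxes in (x1, y1, x2, y2) format.

def iou(a, b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box, drop overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

Lowering `iou_thresh` suppresses more aggressively (fewer duplicate boxes, but nearby objects may be merged); raising it keeps more overlapping detections.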
Business Scenario
SecureVision AI
You have been hired as a Computer Vision Engineer at SecureVision AI, a startup developing intelligent surveillance systems for retail stores and warehouses. The company needs a real-time object detection system that can identify products, people, and potential security threats from CCTV footage. Your task is to build the core detection engine.
"We need a detection system that runs at 30+ FPS on standard hardware, can detect at least 10 custom object classes, and provides accurate bounding boxes. The system should work with both recorded video and live camera feeds. Can you build this?"
Technical Challenges to Solve
- Which YOLO variant to use (YOLOv8n, s, m, l, x)?
- Trade-offs between speed and accuracy
- Memory constraints on edge devices
- Pre-trained weights selection
- How to annotate images efficiently?
- YOLO annotation format requirements
- Data augmentation strategies
- Handling class imbalance
- Video capture and frame processing
- Drawing bounding boxes efficiently
- Handling different video resolutions
- Frame rate optimization
- Calculating mAP correctly
- Understanding IoU thresholds
- Precision-recall curves
- Confusion matrix for detection
The Dataset
You will work with the COCO128 dataset for initial training and can use additional datasets for custom object classes. Download from Kaggle or use the Ultralytics built-in datasets.
Primary Dataset: COCO128
COCO128 is a small tutorial dataset composed of the first 128 images from the COCO train2017 dataset. It contains 80 object classes and is perfect for testing and prototyping object detection models.
Project Requirements
Your project must include all of the following components. This is a comprehensive computer vision project covering YOLO implementation, transfer learning, and real-time video processing.
Data Preparation
Prepare and explore the dataset:
- Download COCO128 or chosen dataset
- Visualize sample images with annotations
- Analyze class distribution
- Create train/val/test splits (70/20/10)
- Create data.yaml configuration file
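The split and config steps above can be sketched with the standard library alone. Paths and class names below are placeholders to adapt to your dataset; Ultralytics expects data.yaml to point at the train/val image folders and list class names.

```python
# Shuffle-and-split (70/20/10) plus a minimal Ultralytics-style
# data.yaml writer. Plain text output, no PyYAML needed.
import random
from pathlib import Path

def split_dataset(image_paths, seed=42):
    """Shuffle deterministically and split 70% train / 20% val / 10% test."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train, n_val = int(0.7 * n), int(0.2 * n)
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])

def write_data_yaml(out_path, root, class_names):
    """Write a minimal data.yaml: dataset root, split dirs, class names."""
    names = "\n".join(f"  {i}: {name}" for i, name in enumerate(class_names))
    Path(out_path).write_text(
        f"path: {root}\ntrain: images/train\nval: images/val\n"
        f"test: images/test\nnames:\n{names}\n"
    )
```

A fixed seed keeps the splits reproducible between notebook runs, which matters when you compare training runs later.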
YOLO Model Setup
Set up YOLOv8 with Ultralytics:
- Install Ultralytics library
- Load pre-trained YOLOv8 model
- Run inference on sample images
- Understand model outputs (boxes, scores, classes)
- Visualize detections with bounding boxes
Transfer Learning
Fine-tune on custom dataset:
- Configure training hyperparameters
- Set up data augmentation
- Train model with transfer learning
- Monitor training with TensorBoard or Weights and Biases
- Save best model checkpoint
Target Performance: mAP50 over 0.5
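A hedged training sketch. The hyperparameters below are common starting points, not prescribed values; tune them for your dataset. The ultralytics import lives inside the function so the file stays importable without the library installed.

```python
# Fine-tuning entry point: merge default hyperparameters with overrides.
TRAIN_ARGS = dict(
    data="data.yaml",   # dataset config from the data-preparation step
    epochs=50,
    imgsz=640,
    batch=16,
    patience=10,        # early stopping when val mAP stops improving
)

def finetune(weights="yolov8n.pt", **overrides):
    """Fine-tune a COCO-pretrained model; best.pt lands under runs/detect/."""
    from ultralytics import YOLO
    model = YOLO(weights)                    # start from pre-trained weights
    return model.train(**{**TRAIN_ARGS, **overrides})
```

Ultralytics logs curves and checkpoints to the `runs/detect/` directory, and picks up TensorBoard or Weights and Biases automatically when those packages are installed.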
Model Evaluation
Comprehensive evaluation on test set:
- Calculate mAP50 and mAP50-95
- Generate precision-recall curves
- Create confusion matrix
- Analyze per-class performance
- Identify failure cases and edge cases
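In practice `model.val()` computes mAP50 and mAP50-95 for you, but it helps to see what is being averaged. Below is a small sketch of per-class average precision from a ranked list of TP/FP flags, using the precision-envelope ("all points") interpolation that COCO-style evaluators apply.

```python
# AP from per-detection TP/FP flags sorted by descending confidence.

def average_precision(tp_flags, n_gt):
    """tp_flags: 1 (true positive) or 0 per detection; n_gt: ground truths."""
    if n_gt == 0:
        return 0.0
    recalls, precisions = [], []
    tps = fps = 0
    for flag in tp_flags:
        tps += flag
        fps += 1 - flag
        recalls.append(tps / n_gt)
        precisions.append(tps / (tps + fps))
    # Precision envelope: p(r) = max precision at any recall >= r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Integrate precision over recall steps.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

mAP50 averages this quantity over classes at IoU threshold 0.5; mAP50-95 additionally averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05.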
Real-time Video Detection
Build real-time detection pipeline:
- Set up OpenCV video capture
- Process frames with YOLO model
- Draw bounding boxes and labels
- Display FPS counter
- Support webcam and video file input
- Optionally save processed video
Target Performance: Over 30 FPS on GPU, over 10 FPS on CPU
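The loop above can be sketched as follows. The FPS meter is pure stdlib and testable on its own; the cv2/ultralytics imports live inside `run()` so the helper works without them. `"best.pt"` is assumed to be the checkpoint from your training step.

```python
# Smoothed FPS counter plus an OpenCV capture-and-detect loop.
import time

class FPSMeter:
    """Exponential-moving-average frames-per-second estimate."""
    def __init__(self, alpha=0.1):
        self.alpha, self.fps, self.last = alpha, 0.0, None

    def tick(self, now=None):
        now = time.perf_counter() if now is None else now
        if self.last is not None:
            inst = 1.0 / max(now - self.last, 1e-9)
            self.fps = inst if self.fps == 0.0 else (
                self.alpha * inst + (1 - self.alpha) * self.fps)
        self.last = now
        return self.fps

def run(source=0, weights="best.pt"):
    """source=0 for the default webcam, or a video file path."""
    import cv2
    from ultralytics import YOLO
    model, meter = YOLO(weights), FPSMeter()
    cap = cv2.VideoCapture(source)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        annotated = model(frame, verbose=False)[0].plot()   # draw boxes
        cv2.putText(annotated, f"{meter.tick():.1f} FPS", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("detections", annotated)
        if cv2.waitKey(1) & 0xFF == ord("q"):               # q to quit
            break
    cap.release()
    cv2.destroyAllWindows()
```

Smoothing the FPS reading (rather than displaying the raw per-frame value) keeps the counter legible when per-frame latency fluctuates.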
Optimization and Deployment
Optimize for production:
- Export model to ONNX format
- Compare inference speeds (PyTorch vs ONNX)
- Document hardware requirements
- Create inference script for deployment
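A hedged export-and-compare sketch: `model.export(format="onnx")` writes an .onnx file next to the weights, and `YOLO()` can load that file directly for inference. The timing helper is plain stdlib, so you can reuse it to benchmark both backends on identical inputs; `"best.pt"` and `"sample.jpg"` are placeholder paths.

```python
# Average seconds per call, plus a PyTorch-vs-ONNX comparison harness.
import time

def time_fn(fn, n=50):
    """Average seconds per call over n runs (one warm-up call first)."""
    fn()                        # warm-up: first call is often much slower
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n

def export_and_compare(weights="best.pt", image="sample.jpg"):
    from ultralytics import YOLO
    pt = YOLO(weights)
    onnx_path = pt.export(format="onnx")     # writes e.g. best.onnx
    onnx = YOLO(onnx_path)
    return {
        "pytorch_s": time_fn(lambda: pt(image, verbose=False)),
        "onnx_s": time_fn(lambda: onnx(image, verbose=False)),
    }
```

Report the resulting per-frame latencies (and the hardware they were measured on) in your README's deployment section.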
YOLO Architecture
YOLO (You Only Look Once) is a single-shot detector that processes the entire image in one forward pass, making it extremely fast for real-time applications.
YOLOv8 Model Variants
| Model | Size (MB) | mAP50-95 | Speed (ms) | Use Case |
|---|---|---|---|---|
| yolov8n | 6.3 | 37.3 | 1.2 | Edge devices, real-time |
| yolov8s | 22.4 | 44.9 | 1.9 | Balanced speed/accuracy |
| yolov8m | 52.0 | 50.2 | 4.0 | General purpose |
| yolov8l | 87.7 | 52.9 | 6.5 | High accuracy needed |
| yolov8x | 136.7 | 53.9 | 10.8 | Maximum accuracy |
Transfer Learning
Transfer learning allows you to leverage pre-trained weights and fine-tune on your custom dataset with fewer images and less training time.
Real-time Video Detection
Build a complete video detection pipeline that processes webcam or video file input in real time. Tips for hitting the FPS targets:
- Use smaller model (yolov8n) for higher FPS
- Reduce input resolution for faster processing
- Use GPU acceleration with CUDA
- Enable half-precision (FP16) inference
Submission Requirements
Create a public GitHub repository with the exact name shown below:
Required Repository Name
object-detection-yolo
Required Project Structure
object-detection-yolo/
├── notebooks/
│ ├── 01_data_exploration.ipynb # Dataset exploration
│ ├── 02_model_training.ipynb # Transfer learning
│ └── 03_evaluation.ipynb # Model evaluation
├── src/
│ ├── detect_image.py # Image detection script
│ ├── detect_video.py # Video detection script
│ └── utils.py # Helper functions
├── models/
│ └── best.pt # Trained model weights
├── data/
│ └── data.yaml # Dataset configuration
├── reports/
│ ├── confusion_matrix.png # Confusion matrix
│ ├── precision_recall.png # PR curve
│ └── sample_detections.png # Sample detection results
├── videos/
│ └── demo.mp4 # Demo video with detections
├── requirements.txt # Python dependencies
└── README.md # Project documentation
README.md Required Sections
1. Project Header
- Project title and description
- Your full name and submission date
- Final mAP score achieved
2. Dataset
- Dataset used and source
- Number of classes and images
- Sample images with annotations
3. Model Training
- YOLO variant used
- Training hyperparameters
- Training curves (loss, mAP)
4. Results
- mAP50 and mAP50-95 scores
- Per-class performance
- Confusion matrix
5. Real-time Demo
- FPS achieved
- Demo video or GIF
- Hardware specifications
6. How to Run
- Installation instructions
- Training command
- Inference command
Grading Rubric
Your project will be graded on the following criteria. Total: 700 points (675 core plus a 25-point bonus).
| Criteria | Points | Description |
|---|---|---|
| Data Preparation | 75 | Dataset setup, visualization, proper splits |
| YOLO Implementation | 100 | Model setup, inference, understanding outputs |
| Transfer Learning | 150 | Training pipeline, achieving mAP over 0.5 |
| Model Evaluation | 100 | Metrics, curves, confusion matrix, error analysis |
| Real-time Detection | 150 | Video pipeline, 30+ FPS, clean visualization |
| Documentation | 100 | README quality, code comments, demo video |
| Bonus: ONNX Export | 25 | Model export and inference comparison |
| Total | 700 | |
Grading Levels
- Excellent: mAP over 0.6, 40+ FPS, exceptional documentation
- Good: meets all requirements, good demo
- Satisfactory: meets minimum requirements
- Needs Work: missing components or poor performance
Pre-Submission Checklist
Use this checklist to verify you have completed all requirements before submitting.