Project Overview
Object detection is one of the most impactful computer vision applications, powering autonomous vehicles, security systems, retail analytics, and industrial automation. In this project, you will work with YOLOv8, Ultralytics' modern anchor-free iteration of the YOLO architecture, to build a real-time object detection system. You will start from COCO-pretrained weights and fine-tune on a custom dataset. Target performance: mAP50 above 0.5 on your custom classes at over 30 FPS inference speed.
- YOLO Architecture: Understand single-shot detection and anchor-free design
- Transfer Learning: Fine-tune pre-trained models on custom datasets
- Video Processing: Real-time detection on webcam and video files
- Optimization: Achieve high FPS with model optimization techniques
Learning Objectives
Technical Skills
- Implement YOLOv8 using Ultralytics library
- Prepare and annotate custom datasets
- Fine-tune models with transfer learning
- Process video streams with OpenCV
- Evaluate models with mAP, precision, recall
Computer Vision Concepts
- Understand bounding box regression
- Learn non-maximum suppression (NMS)
- Explore anchor-free detection methods
- Handle multi-scale object detection
- Optimize for real-time performance
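The NMS concept above can be sketched in a few lines of plain Python. This is a minimal illustration only (box format `(x1, y1, x2, y2)` assumed); Ultralytics applies a fast vectorized NMS internally, so you never write this yourself in practice.

```python
# Minimal, dependency-free sketch of IoU and greedy NMS for
# axis-aligned boxes in (x1, y1, x2, y2) format.

def iou(a, b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box, drop overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

Lowering `iou_thresh` suppresses more aggressively (fewer duplicate boxes, but nearby objects may be merged); raising it keeps more overlapping detections.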
Business Scenario
SecureVision AI
You have been hired as a Computer Vision Engineer at SecureVision AI, a startup developing intelligent surveillance systems for retail stores and warehouses. The company needs a real-time object detection system that can identify products, people, and potential security threats from CCTV footage. Your task is to build the core detection engine.
"We need a detection system that runs at 30+ FPS on standard hardware, can detect at least 10 custom object classes, and provides accurate bounding boxes. The system should work with both recorded video and live camera feeds. Can you build this?"
Technical Challenges to Solve
- Which YOLO variant to use (YOLOv8n, s, m, l, x)?
- Trade-offs between speed and accuracy
- Memory constraints on edge devices
- Pre-trained weights selection
- How to annotate images efficiently?
- YOLO annotation format requirements
- Data augmentation strategies
- Handling class imbalance
- Video capture and frame processing
- Drawing bounding boxes efficiently
- Handling different video resolutions
- Frame rate optimization
- Calculating mAP correctly
- Understanding IoU thresholds
- Precision-recall curves
- Confusion matrix for detection
The Dataset
You will work with the COCO128 dataset for initial training and can use additional datasets for custom object classes. Download from Kaggle or use the Ultralytics built-in datasets.
Primary Dataset: COCO128
COCO128 is a small tutorial dataset composed of the first 128 images from the COCO train2017 dataset. It contains 80 object classes and is perfect for testing and prototyping object detection models.
Project Requirements
Your project must include all of the following components. This is a comprehensive computer vision project covering YOLO implementation, transfer learning, and real-time video processing.
Data Preparation
Prepare and explore the dataset:
- Download COCO128 or chosen dataset
- Visualize sample images with annotations
- Analyze class distribution
- Create train/val/test splits (70/20/10)
- Create data.yaml configuration file
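The split and config steps above can be sketched with the standard library alone. Paths and class names below are placeholders to adapt to your dataset; Ultralytics expects data.yaml to point at the train/val image folders and list class names.

```python
# Shuffle-and-split (70/20/10) plus a minimal Ultralytics-style
# data.yaml writer. Plain text output, no PyYAML needed.
import random
from pathlib import Path

def split_dataset(image_paths, seed=42):
    """Shuffle deterministically and split 70% train / 20% val / 10% test."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train, n_val = int(0.7 * n), int(0.2 * n)
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])

def write_data_yaml(out_path, root, class_names):
    """Write a minimal data.yaml: dataset root, split dirs, class names."""
    names = "\n".join(f"  {i}: {name}" for i, name in enumerate(class_names))
    Path(out_path).write_text(
        f"path: {root}\ntrain: images/train\nval: images/val\n"
        f"test: images/test\nnames:\n{names}\n"
    )
```

A fixed seed keeps the splits reproducible between notebook runs, which matters when you compare training runs later.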
YOLO Model Setup
Set up YOLOv8 with Ultralytics:
- Install Ultralytics library
- Load pre-trained YOLOv8 model
- Run inference on sample images
- Understand model outputs (boxes, scores, classes)
- Visualize detections with bounding boxes
Transfer Learning
Fine-tune on custom dataset:
- Configure training hyperparameters
- Set up data augmentation
- Train model with transfer learning
- Monitor training with TensorBoard or Weights and Biases
- Save best model checkpoint
Target Performance: mAP50 over 0.5
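A hedged training sketch. The hyperparameters below are common starting points, not prescribed values; tune them for your dataset. The ultralytics import lives inside the function so the file stays importable without the library installed.

```python
# Fine-tuning entry point: merge default hyperparameters with overrides.
TRAIN_ARGS = dict(
    data="data.yaml",   # dataset config from the data-preparation step
    epochs=50,
    imgsz=640,
    batch=16,
    patience=10,        # early stopping when val mAP stops improving
)

def finetune(weights="yolov8n.pt", **overrides):
    """Fine-tune a COCO-pretrained model; best.pt lands under runs/detect/."""
    from ultralytics import YOLO
    model = YOLO(weights)                    # start from pre-trained weights
    return model.train(**{**TRAIN_ARGS, **overrides})
```

Ultralytics logs curves and checkpoints to the `runs/detect/` directory, and picks up TensorBoard or Weights and Biases automatically when those packages are installed.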
Model Evaluation
Comprehensive evaluation on test set:
- Calculate mAP50 and mAP50-95
- Generate precision-recall curves
- Create confusion matrix
- Analyze per-class performance
- Identify failure cases and edge cases
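In practice `model.val()` computes mAP50 and mAP50-95 for you, but it helps to see what is being averaged. Below is a small sketch of per-class average precision from a ranked list of TP/FP flags, using the precision-envelope ("all points") interpolation that COCO-style evaluators apply.

```python
# AP from per-detection TP/FP flags sorted by descending confidence.

def average_precision(tp_flags, n_gt):
    """tp_flags: 1 (true positive) or 0 per detection; n_gt: ground truths."""
    if n_gt == 0:
        return 0.0
    recalls, precisions = [], []
    tps = fps = 0
    for flag in tp_flags:
        tps += flag
        fps += 1 - flag
        recalls.append(tps / n_gt)
        precisions.append(tps / (tps + fps))
    # Precision envelope: p(r) = max precision at any recall >= r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Integrate precision over recall steps.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

mAP50 averages this quantity over classes at IoU threshold 0.5; mAP50-95 additionally averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05.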
Real-time Video Detection
Build real-time detection pipeline:
- Set up OpenCV video capture
- Process frames with YOLO model
- Draw bounding boxes and labels
- Display FPS counter
- Support webcam and video file input
- Optionally save processed video
Target Performance: Over 30 FPS on GPU, over 10 FPS on CPU
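The loop above can be sketched as follows. The FPS meter is pure stdlib and testable on its own; the cv2/ultralytics imports live inside `run()` so the helper works without them. `"best.pt"` is assumed to be the checkpoint from your training step.

```python
# Smoothed FPS counter plus an OpenCV capture-and-detect loop.
import time

class FPSMeter:
    """Exponential-moving-average frames-per-second estimate."""
    def __init__(self, alpha=0.1):
        self.alpha, self.fps, self.last = alpha, 0.0, None

    def tick(self, now=None):
        now = time.perf_counter() if now is None else now
        if self.last is not None:
            inst = 1.0 / max(now - self.last, 1e-9)
            self.fps = inst if self.fps == 0.0 else (
                self.alpha * inst + (1 - self.alpha) * self.fps)
        self.last = now
        return self.fps

def run(source=0, weights="best.pt"):
    """source=0 for the default webcam, or a video file path."""
    import cv2
    from ultralytics import YOLO
    model, meter = YOLO(weights), FPSMeter()
    cap = cv2.VideoCapture(source)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        annotated = model(frame, verbose=False)[0].plot()   # draw boxes
        cv2.putText(annotated, f"{meter.tick():.1f} FPS", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("detections", annotated)
        if cv2.waitKey(1) & 0xFF == ord("q"):               # q to quit
            break
    cap.release()
    cv2.destroyAllWindows()
```

Smoothing the FPS reading (rather than displaying the raw per-frame value) keeps the counter legible when per-frame latency fluctuates.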
Optimization and Deployment
Optimize for production:
- Export model to ONNX format
- Compare inference speeds (PyTorch vs ONNX)
- Document hardware requirements
- Create inference script for deployment
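A hedged export-and-compare sketch: `model.export(format="onnx")` writes an .onnx file next to the weights, and `YOLO()` can load that file directly for inference. The timing helper is plain stdlib, so you can reuse it to benchmark both backends on identical inputs; `"best.pt"` and `"sample.jpg"` are placeholder paths.

```python
# Average seconds per call, plus a PyTorch-vs-ONNX comparison harness.
import time

def time_fn(fn, n=50):
    """Average seconds per call over n runs (one warm-up call first)."""
    fn()                        # warm-up: first call is often much slower
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n

def export_and_compare(weights="best.pt", image="sample.jpg"):
    from ultralytics import YOLO
    pt = YOLO(weights)
    onnx_path = pt.export(format="onnx")     # writes e.g. best.onnx
    onnx = YOLO(onnx_path)
    return {
        "pytorch_s": time_fn(lambda: pt(image, verbose=False)),
        "onnx_s": time_fn(lambda: onnx(image, verbose=False)),
    }
```

Report the resulting per-frame latencies (and the hardware they were measured on) in your README's deployment section.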
YOLO Architecture
YOLO (You Only Look Once) is a single-shot detector that processes the entire image in one forward pass, making it extremely fast for real-time applications.
YOLOv8 Model Variants
| Model | Size (MB) | mAP50-95 | Speed (ms) | Use Case |
|---|---|---|---|---|
| yolov8n | 6.3 | 37.3 | 1.2 | Edge devices, real-time |
| yolov8s | 22.4 | 44.9 | 1.9 | Balanced speed/accuracy |
| yolov8m | 52.0 | 50.2 | 4.0 | General purpose |
| yolov8l | 87.7 | 52.9 | 6.5 | High accuracy needed |
| yolov8x | 136.7 | 53.9 | 10.8 | Maximum accuracy |
Transfer Learning
Transfer learning allows you to leverage pre-trained weights and fine-tune on your custom dataset with fewer images and less training time.
Real-time Video Detection
Build a complete video detection pipeline that processes webcam or video file input in real time. Tips for hitting the FPS targets:
- Use smaller model (yolov8n) for higher FPS
- Reduce input resolution for faster processing
- Use GPU acceleration with CUDA
- Enable half-precision (FP16) inference
Submission Requirements
Create a public GitHub repository with the exact name shown below:
Required Repository Name
object-detection-yolo
Required Project Structure
object-detection-yolo/
├── notebooks/
│ ├── 01_data_exploration.ipynb # Dataset exploration
│ ├── 02_model_training.ipynb # Transfer learning
│ └── 03_evaluation.ipynb # Model evaluation
├── src/
│ ├── detect_image.py # Image detection script
│ ├── detect_video.py # Video detection script
│ └── utils.py # Helper functions
├── models/
│ └── best.pt # Trained model weights
├── data/
│ └── data.yaml # Dataset configuration
├── reports/
│ ├── confusion_matrix.png # Confusion matrix
│ ├── precision_recall.png # PR curve
│ └── sample_detections.png # Sample detection results
├── videos/
│ └── demo.mp4 # Demo video with detections
├── requirements.txt # Python dependencies
└── README.md # Project documentation
README.md Required Sections
1. Project Header
- Project title and description
- Your full name and submission date
- Final mAP score achieved
2. Dataset
- Dataset used and source
- Number of classes and images
- Sample images with annotations
3. Model Training
- YOLO variant used
- Training hyperparameters
- Training curves (loss, mAP)
4. Results
- mAP50 and mAP50-95 scores
- Per-class performance
- Confusion matrix
5. Real-time Demo
- FPS achieved
- Demo video or GIF
- Hardware specifications
6. How to Run
- Installation instructions
- Training command
- Inference command
Grading Rubric
Your project will be graded on the following criteria. Total: 700 points (675 core plus a 25-point bonus).
| Criteria | Points | Description |
|---|---|---|
| Data Preparation | 75 | Dataset setup, visualization, proper splits |
| YOLO Implementation | 100 | Model setup, inference, understanding outputs |
| Transfer Learning | 150 | Training pipeline, achieving mAP over 0.5 |
| Model Evaluation | 100 | Metrics, curves, confusion matrix, error analysis |
| Real-time Detection | 150 | Video pipeline, 30+ FPS, clean visualization |
| Documentation | 100 | README quality, code comments, demo video |
| Bonus: ONNX Export | 25 | Model export and inference comparison |
| Total | 700 | |
Grading Levels
- Excellent: mAP over 0.6, 40+ FPS, exceptional documentation
- Good: meets all requirements, good demo
- Satisfactory: meets minimum requirements
- Needs Work: missing components or poor performance
Pre-Submission Checklist
Use this checklist to verify you have completed all requirements before submitting.