Capstone Project 2

MNIST Digit Recognition

Build a Convolutional Neural Network (CNN) to recognize handwritten digits from the famous MNIST dataset. You will implement data augmentation techniques, design a CNN architecture, train the model with proper callbacks, and evaluate performance using accuracy metrics and confusion matrix visualization.

Duration: 6-8 hours | Level: Intermediate | Points: 250
What You Will Build
  • Data preprocessing pipeline
  • CNN architecture design
  • Image augmentation techniques
  • Model training with callbacks
  • Performance evaluation visuals
Contents
01

Project Overview

This project brings together core deep learning concepts from the AI course. You will work with the MNIST dataset containing 70,000 grayscale images of handwritten digits (0-9), each 28×28 pixels. The dataset includes 60,000 training images and 10,000 test images collected from high school students and Census Bureau employees. Your goal is to build a CNN that achieves ≥99% accuracy on the test set through proper architecture design, data augmentation, and hyperparameter tuning.

Skills Applied: This project tests your proficiency in TensorFlow/Keras (CNN layers, model compilation), data preprocessing (normalization, reshaping), augmentation techniques (rotation, shifting), and model evaluation (confusion matrix, classification metrics).
Data Prep

Load, normalize, reshape, and split the MNIST dataset

CNN Design

Design an architecture of Conv2D, pooling, and dense layers

Augmentation

Apply rotation, shift, zoom to expand training data

Evaluation

Analyze accuracy, confusion matrix, and misclassifications

Learning Objectives

Technical Skills
  • Build CNN architectures with TensorFlow/Keras
  • Apply Conv2D, MaxPooling, BatchNorm, and Dropout
  • Implement ImageDataGenerator for augmentation
  • Use callbacks: EarlyStopping, ReduceLROnPlateau
  • Generate confusion matrix and classification report
Analytical Skills
  • Understand image data preprocessing requirements
  • Analyze training curves for overfitting detection
  • Interpret confusion matrix patterns
  • Identify commonly misclassified digit pairs
  • Document findings and methodology
02

Business Scenario

PostalTech Solutions

You have been hired as a Machine Learning Engineer at PostalTech Solutions, a logistics company that processes millions of handwritten postal codes daily. Currently, the company relies on manual data entry, which is slow, expensive, and error-prone. Your task is to build an AI system that can automatically recognize handwritten digits to automate the parcel sorting process.

"We process over 5 million parcels daily, and reading handwritten ZIP codes is our biggest bottleneck. We need an AI system that can accurately recognize digits to automate our sorting process. Accuracy is critical—even 1% error rate means 50,000 misrouted packages per day!"

Michael Torres, CTO of PostalTech Solutions

Project Goals

Model Performance
  • Achieve ≥99% accuracy on test set
  • Minimize confusion between similar digits (3/8, 4/9)
  • Ensure model generalizes to different handwriting styles
  • Optimize inference speed for production use
Technical Deliverables
  • Complete Jupyter notebook with all code
  • Trained model saved in Keras format
  • Visualization of training history
  • Confusion matrix and error analysis
Data Augmentation
  • Implement rotation (±10-15 degrees)
  • Apply width and height shifts (10-15%)
  • Add zoom augmentation (10-15%)
  • Document augmentation impact on accuracy
Documentation
  • Clear README with methodology
  • CNN architecture explanation
  • Key findings and observations
  • Instructions to run the notebook
Pro Tip: Focus on preventing overfitting! Use proper regularization (Dropout, BatchNorm) and monitor validation accuracy during training.
03

The Dataset

You will work with the MNIST dataset, the most famous benchmark in computer vision. The dataset can be loaded directly from TensorFlow/Keras or downloaded from Kaggle:

Dataset Access

Load MNIST directly from TensorFlow or download from Kaggle for offline use.

Download from Kaggle
Original Data Source

The MNIST database (Modified National Institute of Standards and Technology) is a large collection of handwritten digits widely used for training and testing machine learning models. It was created by Yann LeCun, Corinna Cortes, and Chris Burges.

Dataset Info: 70,000 images | Image Size: 28×28 pixels (grayscale) | Training: 60,000 | Test: 10,000 | Classes: 10 (digits 0-9) | Format: NumPy arrays (pixel values 0-255)
Dataset Structure

Array     Shape            Type   Description
X_train   (60000, 28, 28)  uint8  Training images (pixel values 0-255)
y_train   (60000,)         uint8  Training labels (digits 0-9)

Array     Shape            Type   Description
X_test    (10000, 28, 28)  uint8  Test images (pixel values 0-255)
y_test    (10000,)         uint8  Test labels (digits 0-9)

Digit  Training Count  Test Count  Percentage
0      5,923           980         ~10%
1      6,742           1,135       ~11%
2      5,958           1,032       ~10%
3      6,131           1,010       ~10%
4      5,842           982         ~10%
5      5,421           892         ~9%
6      5,918           958         ~10%
7      6,265           1,028       ~10%
8      5,851           974         ~10%
9      5,949           1,009       ~10%
Dataset Stats: 70,000 total images, 28×28 pixels each, grayscale (single channel)
Target Performance: ≥99% test accuracy with proper CNN architecture
Preprocessing Required: You must normalize pixel values from [0, 255] to [0, 1], reshape images to include channel dimension (28, 28, 1), and one-hot encode the labels for categorical crossentropy loss.
04

Project Requirements

Your project must include all of the following components. Structure your Jupyter notebook with clear markdown sections and well-commented code.

1
Data Loading & Preprocessing

Load and prepare the MNIST dataset:

  • Load data using tensorflow.keras.datasets.mnist
  • Normalize pixel values to [0, 1] range
  • Reshape images from (28, 28) to (28, 28, 1) for CNN input
  • One-hot encode labels using to_categorical
  • Create validation split (10-20% of training data)
Deliverable: Preprocessed data ready for CNN training with proper shapes and types.
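The preprocessing steps above can be sketched as follows. The `preprocess` helper name is illustrative, and a random dummy batch stands in for the real `mnist.load_data()` call so the snippet runs offline:

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

def preprocess(images, labels, num_classes=10):
    """Normalize to [0, 1], add a channel axis, one-hot encode labels."""
    x = images.astype("float32") / 255.0      # [0, 255] -> [0, 1]
    x = x.reshape(-1, 28, 28, 1)              # (N, 28, 28) -> (N, 28, 28, 1)
    y = to_categorical(labels, num_classes)   # needed for categorical_crossentropy
    return x, y

# In the notebook, load the real data with:
#   (x_train, y_train), (x_test, y_test) = tensorflow.keras.datasets.mnist.load_data()
# A dummy batch stands in here so the sketch runs without downloading:
dummy_images = np.random.randint(0, 256, size=(8, 28, 28), dtype=np.uint8)
dummy_labels = np.arange(8) % 10
x, y = preprocess(dummy_images, dummy_labels)
print(x.shape, y.shape)  # (8, 28, 28, 1) (8, 10)
```

A validation split can then be carved out of the training arrays, e.g. with `sklearn.model_selection.train_test_split(test_size=0.1, stratify=y_train_labels)`.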
2
Exploratory Data Analysis

Visualize and understand the dataset:

  • Display sample images from each digit class (0-9 grid)
  • Plot class distribution bar chart
  • Analyze pixel intensity distribution
  • Visualize average digit image per class (optional)
Deliverable: Sample digits grid and class distribution visualization saved to figures/.
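A minimal sketch of the sample-digits grid, assuming `images` and `labels` come from the loaded dataset; the function name and `save_path` parameter are illustrative:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line inside a notebook
import matplotlib.pyplot as plt

def plot_digit_grid(images, labels, save_path=None):
    """2x5 grid showing the first example of each digit class (0-9)."""
    fig, axes = plt.subplots(2, 5, figsize=(10, 4))
    for digit, ax in enumerate(axes.flat):
        idx = np.flatnonzero(labels == digit)[0]  # first sample of this class
        ax.imshow(images[idx], cmap="gray")
        ax.set_title(f"Digit: {digit}")
        ax.axis("off")
    fig.tight_layout()
    if save_path:
        fig.savefig(save_path, dpi=150)  # e.g. "figures/sample_digits.png"
    return fig
```

The class-distribution bar chart follows the same pattern with `np.bincount(labels)` and `ax.bar(range(10), counts)`.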
3
CNN Architecture

Design and implement your CNN model:

  • At least 2-3 convolutional blocks (Conv2D + MaxPooling)
  • Include BatchNormalization layers for training stability
  • Apply Dropout (0.25-0.5) for regularization
  • Flatten and add Dense layers with ReLU activation
  • Output layer with 10 units and Softmax activation
Deliverable: CNN model compiled with Adam optimizer and categorical_crossentropy loss.
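One possible architecture meeting the requirements above, sketched with the Keras Sequential API (filter counts and dropout rates are one reasonable choice, not the only one):

```python
from tensorflow.keras import layers, models

def build_cnn(input_shape=(28, 28, 1), num_classes=10):
    """Two conv blocks (Conv2D + BatchNorm + MaxPool + Dropout), then a dense head."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),          # 28x28 -> 14x14
        layers.Dropout(0.25),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),          # 14x14 -> 7x7
        layers.Dropout(0.25),
        layers.Flatten(),                     # 7 * 7 * 64 = 3136
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",           # default learning rate 0.001
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn()
model.summary()
```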
4
Data Augmentation

Implement augmentation to improve generalization:

  • Use ImageDataGenerator from Keras
  • Rotation range: ±10-15 degrees
  • Width and height shift: 0.1-0.15
  • Zoom range: 0.1-0.15
  • Visualize augmented examples

Important: Do NOT use horizontal or vertical flip for digits!

Deliverable: Augmentation examples visualization saved to figures/.
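A sketch of the augmentation setup using the ranges above; the random array is a stand-in for `x_train` so the snippet runs standalone:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rotation/shift/zoom only -- no flips: a mirrored 2 or 5 is no longer a
# valid digit, and a flipped 6 turns into a 9.
datagen = ImageDataGenerator(
    rotation_range=10,       # rotate up to +/-10 degrees
    width_shift_range=0.1,   # shift horizontally by up to 10%
    height_shift_range=0.1,  # shift vertically by up to 10%
    zoom_range=0.1,          # zoom in/out by up to 10%
)

# Preview: feed a (N, 28, 28, 1) batch and pull one augmented batch out.
sample = np.random.rand(9, 28, 28, 1).astype("float32")  # stand-in for x_train
augmented = next(datagen.flow(sample, batch_size=9, shuffle=False))
print(augmented.shape)  # (9, 28, 28, 1)
```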
5
Model Training

Train with proper callbacks and monitoring:

  • EarlyStopping with patience=5 and restore_best_weights=True
  • ReduceLROnPlateau to reduce learning rate on plateau
  • ModelCheckpoint to save best model weights
  • Train for sufficient epochs (20-30) with appropriate batch size
  • Use augmented data via datagen.flow()
Deliverable: Training history object and saved model in models/ folder.
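The callback setup can be sketched as below; monitor choices and the `factor`/`min_lr` values are one reasonable configuration. The `fit` call is shown commented because it assumes `model`, `datagen`, and the data splits from the earlier steps:

```python
from tensorflow.keras.callbacks import (EarlyStopping, ModelCheckpoint,
                                        ReduceLROnPlateau)

callbacks = [
    # Stop when val_accuracy stalls, keeping the best weights seen so far
    EarlyStopping(monitor="val_accuracy", patience=5, restore_best_weights=True),
    # Halve the learning rate when val_loss plateaus
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3, min_lr=1e-5),
    # Keep only the best checkpoint on disk
    ModelCheckpoint("models/mnist_cnn.keras", monitor="val_accuracy",
                    save_best_only=True),
]

# With model/datagen/splits from the earlier steps, training would look like:
# history = model.fit(datagen.flow(x_train, y_train, batch_size=128),
#                     validation_data=(x_val, y_val),
#                     epochs=30, callbacks=callbacks)
```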
6
Model Evaluation

Evaluate performance comprehensively:

  • Calculate test accuracy (target: ≥99%)
  • Generate confusion matrix heatmap
  • Print classification report (precision, recall, F1)
  • Visualize misclassified examples
  • Plot training/validation accuracy and loss curves
Deliverable: Confusion matrix, training history, and misclassified examples in figures/.
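A sketch of the evaluation step with scikit-learn; the `evaluate_predictions` helper name is illustrative, and it takes the one-hot labels and `model.predict` probabilities from the earlier steps:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

def evaluate_predictions(y_true_onehot, y_prob):
    """Convert one-hot/probability arrays to class labels and report metrics."""
    y_true = np.argmax(y_true_onehot, axis=1)
    y_pred = np.argmax(y_prob, axis=1)
    accuracy = float((y_true == y_pred).mean())
    cm = confusion_matrix(y_true, y_pred)
    report = classification_report(y_true, y_pred, digits=4)
    return accuracy, cm, report

# In the notebook:
#   acc, cm, report = evaluate_predictions(y_test, model.predict(x_test))
#   sns.heatmap(cm, annot=True, fmt="d")   # confusion-matrix figure
```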
7
Save Model & Documentation

Save trained model and document your work:

  • Save model in Keras format: model.save('models/mnist_cnn.keras')
  • Write comprehensive README.md with methodology
  • Include architecture summary and key hyperparameters
  • Document final accuracy and key observations
Deliverable: Saved model file and README.md documentation.
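Saving and reloading can be verified with a round trip; the helper name is illustrative, and a tiny stand-in model is used here so the sketch runs without the trained CNN:

```python
import os
import tempfile
from tensorflow.keras import layers, models

def save_and_reload(model, path):
    """Save in the native .keras format, then reload as a sanity check."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    model.save(path)
    return models.load_model(path)

# Round-trip demo with a tiny stand-in (in the project, pass your trained CNN
# and path "models/mnist_cnn.keras"):
tiny = models.Sequential([layers.Input((28, 28, 1)),
                          layers.Flatten(),
                          layers.Dense(10, activation="softmax")])
with tempfile.TemporaryDirectory() as tmp:
    reloaded = save_and_reload(tiny, os.path.join(tmp, "mnist_cnn.keras"))
    print(reloaded.output_shape)  # (None, 10)
```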
05

CNN Architecture Guide

Design your CNN with the following layer components. The recommended architecture achieves ≥99% accuracy with proper training.

Convolutional Layers
  • Conv2D: Extract features using learnable filters
  • Filters: Start with 32, increase to 64, 128
  • Kernel Size: (3, 3) is standard for MNIST
  • Activation: ReLU for non-linearity
  • Padding: 'same' to preserve spatial dimensions
Pooling & Regularization
  • MaxPooling2D: Reduce spatial dimensions by 2×
  • BatchNormalization: Stabilize training
  • Dropout: 0.25 after conv, 0.5 after dense
  • Pool Size: (2, 2) is standard
  • Strides: Default (same as pool size)
Dense Layers
  • Flatten: Convert 2D feature maps to 1D
  • Dense (Hidden): 128-256 units with ReLU
  • Dense (Output): 10 units with Softmax
  • Dropout: 0.5 before output layer
  • Softmax output: yields a probability distribution over the 10 digit classes
Training Configuration
  • Optimizer: Adam (learning rate: 0.001)
  • Loss: categorical_crossentropy
  • Metrics: accuracy
  • Batch Size: 64-128
  • Epochs: 20-30 with early stopping
Recommended Architecture Flow

Input (28×28×1) → Conv Block 1 (32 filters) → MaxPool (14×14) → Conv Block 2 (64 filters) → MaxPool (7×7) → Flatten (3136) → Dense (256) → Output (10)
Best Practice: Add BatchNormalization after each Conv2D layer for faster convergence and better generalization. Use Dropout after pooling layers.
06

Required Visualizations

Create at least 6 visualizations and save them to the figures/ folder. Each visualization should have proper titles, labels, and be publication-quality.

Visualization 1

Sample Digits Grid

Display sample images from each digit class (0-9)

  • 2 rows × 5 columns grid showing digits 0-9
  • Use grayscale colormap (cmap='gray')
  • Add title for each subplot showing the digit
  • Save as: figures/sample_digits.png
Visualization 2

Class Distribution

Bar chart showing number of samples per digit class

  • Bar chart with digit labels on x-axis
  • Count values on y-axis
  • Add data labels on top of each bar
  • Save as: figures/class_distribution.png
Visualization 3

Augmentation Examples

Original image vs multiple augmented versions

  • Original image in first position
  • 9 augmented versions showing rotation, shift, zoom
  • Label each as "Original" or "Augmented #"
  • Save as: figures/augmentation_examples.png
Visualization 4

Training History

Training and validation accuracy/loss over epochs

  • Two subplots: Accuracy and Loss
  • Training and validation curves on each
  • Legend, grid, and proper labels
  • Save as: figures/training_history.png
Visualization 5

Confusion Matrix

Heatmap showing predicted vs actual classes

  • 10×10 matrix heatmap using seaborn
  • Annotated with count values
  • Predicted labels on x-axis, Actual on y-axis
  • Save as: figures/confusion_matrix.png
Visualization 6

Misclassified Examples

Examples where the model made incorrect predictions

  • Grid of 10-15 misclassified images
  • Title showing "True: X, Pred: Y" for each
  • Use red color for title text
  • Save as: figures/misclassified.png
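The misclassified-examples grid from Visualization 6 can be sketched like this; the function name is illustrative, and `y_true`/`y_pred` are the integer class labels from the evaluation step:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line inside a notebook
import matplotlib.pyplot as plt

def plot_misclassified(images, y_true, y_pred, max_images=15):
    """Grid of wrongly predicted digits, titled 'True: X, Pred: Y' in red."""
    wrong = np.flatnonzero(y_true != y_pred)[:max_images]
    cols = 5
    rows = -(-len(wrong) // cols)  # ceiling division
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows))
    for ax, idx in zip(np.ravel(axes), wrong):
        ax.imshow(images[idx].squeeze(), cmap="gray")
        ax.set_title(f"True: {y_true[idx]}, Pred: {y_pred[idx]}", color="red")
    for ax in np.ravel(axes):
        ax.axis("off")
    fig.tight_layout()
    return fig, wrong   # save with fig.savefig("figures/misclassified.png")
```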
07

Submission Requirements

Create a public GitHub repository with the exact name shown below:

Required Repository Name
mnist-digit-recognition
github.com/<your-username>/mnist-digit-recognition
Required Project Structure
mnist-digit-recognition/
├── notebooks/
│   └── mnist_classification.ipynb   # Main Jupyter notebook
├── models/
│   └── mnist_cnn.keras              # Trained model file
├── figures/
│   ├── sample_digits.png            # Sample digits grid
│   ├── class_distribution.png       # Class distribution bar chart
│   ├── augmentation_examples.png    # Augmentation examples
│   ├── training_history.png         # Training accuracy/loss curves
│   ├── confusion_matrix.png         # Confusion matrix heatmap
│   └── misclassified.png            # Misclassified examples
├── requirements.txt                 # Python dependencies
└── README.md                        # Project documentation
README.md Required Sections
1. Project Header
  • Project title and description
  • Your full name and submission date
  • Course and project number
2. Dataset
  • MNIST dataset overview
  • Preprocessing steps applied
  • Train/test split information
3. Model Architecture
  • CNN layer summary
  • Key hyperparameters
  • Augmentation techniques used
4. Results
  • Final test accuracy achieved
  • Training time and epochs
  • Key observations
5. Visualizations
  • Include figure screenshots
  • Brief caption for each
  • Use markdown image syntax
6. How to Run
  • Installation instructions
  • How to run the notebook
  • Required packages
Do Include
  • All required files in correct folders
  • Complete Jupyter notebook with outputs
  • Saved model in .keras format
  • All 6 visualization figures
  • requirements.txt with dependencies
  • Comprehensive README.md
Do Not Include
  • Jupyter checkpoints (.ipynb_checkpoints/)
  • Python cache files (__pycache__/)
  • Large dataset files (load from Keras)
  • Incomplete or broken notebooks
  • Models without training completion
Important: Before submitting, restart your notebook kernel and run all cells to ensure the notebook executes without errors.
Submit Your Project

Enter your GitHub username - we will verify your repository automatically

08

Grading Rubric

Your project will be graded on the following criteria. Total: 250 points.

Criteria                    Points  Description
Data Preprocessing          25      Proper normalization, reshaping, one-hot encoding, and validation split
Exploratory Data Analysis   25      Sample visualization, class distribution, and data understanding
CNN Architecture            50      Well-designed CNN with Conv2D, pooling, BatchNorm, Dropout, and proper output
Data Augmentation           30      Effective augmentation techniques with visualization
Training & Callbacks        25      Proper use of EarlyStopping, ReduceLROnPlateau, and ModelCheckpoint
Model Performance           40      Test accuracy: ≥99% (full), ≥98% (partial), <98% (minimum)
Visualizations              30      All 6 required visualizations with proper labels and saved to figures/
Documentation               25      Complete README, code comments, and clear methodology
Total                       250
Grading Levels

Excellent (225-250): Exceeds all requirements with ≥99% accuracy
Good (188-224): Meets all requirements with good quality
Satisfactory (150-187): Meets minimum requirements
Needs Work (<150): Missing key requirements

Ready to Submit?

Make sure you have completed all requirements and reviewed the grading rubric above.

Submit Your Project
09

Pre-Submission Checklist

Use this checklist to verify you have completed all requirements before submitting your project.

  • Data & Preprocessing
  • CNN Architecture
  • Augmentation & Training
  • Repository Requirements
Final Check: Restart your notebook kernel and run all cells from top to bottom to ensure the notebook executes without errors.