Capstone Project 2

MNIST Digit Recognition

Build a Convolutional Neural Network (CNN) to recognize handwritten digits from the famous MNIST dataset. You will implement data augmentation techniques, design a CNN architecture, train the model with proper callbacks, and evaluate performance using accuracy metrics and confusion matrix visualization.

Duration: 6-8 hours | Level: Intermediate | Points: 250
What You Will Build
  • Data preprocessing pipeline
  • CNN architecture design
  • Image augmentation techniques
  • Model training with callbacks
  • Performance evaluation visuals
Contents
01

Project Overview

This project brings together core deep learning concepts from the AI course. You will work with the MNIST dataset containing 70,000 grayscale images of handwritten digits (0-9), each 28×28 pixels. The dataset includes 60,000 training images and 10,000 test images collected from high school students and Census Bureau employees. Your goal is to build a CNN that achieves ≥99% accuracy on the test set through proper architecture design, data augmentation, and hyperparameter tuning.

Skills Applied: This project tests your proficiency in TensorFlow/Keras (CNN layers, model compilation), data preprocessing (normalization, reshaping), augmentation techniques (rotation, shifting), and model evaluation (confusion matrix, classification metrics).
Data Prep

Load, normalize, reshape, and split the MNIST dataset

CNN Design

Design an architecture of Conv2D, pooling, and dense layers

Augmentation

Apply rotation, shift, zoom to expand training data

Evaluation

Analyze accuracy, confusion matrix, and misclassifications

Learning Objectives

Technical Skills
  • Build CNN architectures with TensorFlow/Keras
  • Apply Conv2D, MaxPooling, BatchNorm, and Dropout
  • Implement ImageDataGenerator for augmentation
  • Use callbacks: EarlyStopping, ReduceLROnPlateau
  • Generate confusion matrix and classification report
Analytical Skills
  • Understand image data preprocessing requirements
  • Analyze training curves for overfitting detection
  • Interpret confusion matrix patterns
  • Identify commonly misclassified digit pairs
  • Document findings and methodology
02

Business Scenario

PostalTech Solutions

You have been hired as a Machine Learning Engineer at PostalTech Solutions, a logistics company that processes millions of handwritten postal codes daily. Currently, the company relies on manual data entry, which is slow, expensive, and error-prone. Your task is to build an AI system that can automatically recognize handwritten digits to automate the parcel sorting process.

"We process over 5 million parcels daily, and reading handwritten ZIP codes is our biggest bottleneck. We need an AI system that can accurately recognize digits to automate our sorting process. Accuracy is critical—even 1% error rate means 50,000 misrouted packages per day!"

Michael Torres, CTO of PostalTech Solutions

Project Goals

Model Performance
  • Achieve ≥99% accuracy on test set
  • Minimize confusion between similar digits (3/8, 4/9)
  • Ensure model generalizes to different handwriting styles
  • Optimize inference speed for production use
Technical Deliverables
  • Complete Jupyter notebook with all code
  • Trained model saved in Keras format
  • Visualization of training history
  • Confusion matrix and error analysis
Data Augmentation
  • Implement rotation (±10-15 degrees)
  • Apply width and height shifts (10-15%)
  • Add zoom augmentation (10-15%)
  • Document augmentation impact on accuracy
Documentation
  • Clear README with methodology
  • CNN architecture explanation
  • Key findings and observations
  • Instructions to run the notebook
Pro Tip: Focus on preventing overfitting! Use proper regularization (Dropout, BatchNorm) and monitor validation accuracy during training.
03

The Dataset

You will work with the MNIST dataset, the most famous benchmark in computer vision. The dataset can be loaded directly from TensorFlow/Keras or downloaded from Kaggle:

Dataset Access

Load MNIST directly from TensorFlow or download from Kaggle for offline use.

Download from Kaggle
Original Data Source

The MNIST database (Modified National Institute of Standards and Technology) is a large collection of handwritten digits widely used for training and testing machine learning models. It was created by Yann LeCun, Corinna Cortes, and Chris Burges.

Dataset Info: 70,000 images | Image Size: 28×28 pixels (grayscale) | Training: 60,000 | Test: 10,000 | Classes: 10 (digits 0-9) | Format: NumPy arrays (pixel values 0-255)
Dataset Structure

Array     Shape            Type   Description
X_train   (60000, 28, 28)  uint8  Training images (pixel values 0-255)
y_train   (60000,)         uint8  Training labels (digits 0-9)

Array     Shape            Type   Description
X_test    (10000, 28, 28)  uint8  Test images (pixel values 0-255)
y_test    (10000,)         uint8  Test labels (digits 0-9)

Digit  Training Count  Test Count  Percentage
0      5,923           980         ~10%
1      6,742           1,135       ~11%
2      5,958           1,032       ~10%
3      6,131           1,010       ~10%
4      5,842           982         ~10%
5      5,421           892         ~9%
6      5,918           958         ~10%
7      6,265           1,028       ~10%
8      5,851           974         ~10%
9      5,949           1,009       ~10%
Dataset Stats: 70,000 total images, 28×28 pixels each, grayscale (single channel)
Target Performance: ≥99% test accuracy with proper CNN architecture
Preprocessing Required: You must normalize pixel values from [0, 255] to [0, 1], reshape images to include channel dimension (28, 28, 1), and one-hot encode the labels for categorical crossentropy loss.
04

Project Requirements

Your project must include all of the following components. Structure your Jupyter notebook with clear markdown sections and well-commented code.

1
Data Loading & Preprocessing

Load and prepare the MNIST dataset:

  • Load data using tensorflow.keras.datasets.mnist
  • Normalize pixel values to [0, 1] range
  • Reshape images from (28, 28) to (28, 28, 1) for CNN input
  • One-hot encode labels using to_categorical
  • Create validation split (10-20% of training data)
Deliverable: Preprocessed data ready for CNN training with proper shapes and types.
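The preprocessing steps above can be sketched as follows. The `preprocess` helper name is illustrative, and a random dummy batch stands in for the real `mnist.load_data()` call so the snippet runs offline:

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

def preprocess(images, labels, num_classes=10):
    """Normalize to [0, 1], add a channel axis, one-hot encode labels."""
    x = images.astype("float32") / 255.0      # [0, 255] -> [0, 1]
    x = x.reshape(-1, 28, 28, 1)              # (N, 28, 28) -> (N, 28, 28, 1)
    y = to_categorical(labels, num_classes)   # needed for categorical_crossentropy
    return x, y

# In the notebook, load the real data with:
#   (x_train, y_train), (x_test, y_test) = tensorflow.keras.datasets.mnist.load_data()
# A dummy batch stands in here so the sketch runs without downloading:
dummy_images = np.random.randint(0, 256, size=(8, 28, 28), dtype=np.uint8)
dummy_labels = np.arange(8) % 10
x, y = preprocess(dummy_images, dummy_labels)
print(x.shape, y.shape)  # (8, 28, 28, 1) (8, 10)
```

A validation split can then be carved out of the training arrays, e.g. with `sklearn.model_selection.train_test_split(test_size=0.1, stratify=y_train_labels)`.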
2
Exploratory Data Analysis

Visualize and understand the dataset:

  • Display sample images from each digit class (0-9 grid)
  • Plot class distribution bar chart
  • Analyze pixel intensity distribution
  • Visualize average digit image per class (optional)
Deliverable: Sample digits grid and class distribution visualization saved to figures/.
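A minimal sketch of the sample-digits grid, assuming `images` and `labels` come from the loaded dataset; the function name and `save_path` parameter are illustrative:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line inside a notebook
import matplotlib.pyplot as plt

def plot_digit_grid(images, labels, save_path=None):
    """2x5 grid showing the first example of each digit class (0-9)."""
    fig, axes = plt.subplots(2, 5, figsize=(10, 4))
    for digit, ax in enumerate(axes.flat):
        idx = np.flatnonzero(labels == digit)[0]  # first sample of this class
        ax.imshow(images[idx], cmap="gray")
        ax.set_title(f"Digit: {digit}")
        ax.axis("off")
    fig.tight_layout()
    if save_path:
        fig.savefig(save_path, dpi=150)  # e.g. "figures/sample_digits.png"
    return fig
```

The class-distribution bar chart follows the same pattern with `np.bincount(labels)` and `ax.bar(range(10), counts)`.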
3
CNN Architecture

Design and implement your CNN model:

  • At least 2-3 convolutional blocks (Conv2D + MaxPooling)
  • Include BatchNormalization layers for training stability
  • Apply Dropout (0.25-0.5) for regularization
  • Flatten and add Dense layers with ReLU activation
  • Output layer with 10 units and Softmax activation
Deliverable: CNN model compiled with Adam optimizer and categorical_crossentropy loss.
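One possible architecture meeting the requirements above, sketched with the Keras Sequential API (filter counts and dropout rates are one reasonable choice, not the only one):

```python
from tensorflow.keras import layers, models

def build_cnn(input_shape=(28, 28, 1), num_classes=10):
    """Two conv blocks (Conv2D + BatchNorm + MaxPool + Dropout), then a dense head."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),          # 28x28 -> 14x14
        layers.Dropout(0.25),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),          # 14x14 -> 7x7
        layers.Dropout(0.25),
        layers.Flatten(),                     # 7 * 7 * 64 = 3136
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",           # default learning rate 0.001
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn()
model.summary()
```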
4
Data Augmentation

Implement augmentation to improve generalization:

  • Use ImageDataGenerator from Keras
  • Rotation range: ±10-15 degrees
  • Width and height shift: 0.1-0.15
  • Zoom range: 0.1-0.15
  • Visualize augmented examples

Important: Do NOT use horizontal or vertical flip for digits!

Deliverable: Augmentation examples visualization saved to figures/.
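A sketch of the augmentation setup using the ranges above; the random array is a stand-in for `x_train` so the snippet runs standalone:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rotation/shift/zoom only -- no flips: a mirrored 2 or 5 is no longer a
# valid digit, and a flipped 6 turns into a 9.
datagen = ImageDataGenerator(
    rotation_range=10,       # rotate up to +/-10 degrees
    width_shift_range=0.1,   # shift horizontally by up to 10%
    height_shift_range=0.1,  # shift vertically by up to 10%
    zoom_range=0.1,          # zoom in/out by up to 10%
)

# Preview: feed a (N, 28, 28, 1) batch and pull one augmented batch out.
sample = np.random.rand(9, 28, 28, 1).astype("float32")  # stand-in for x_train
augmented = next(datagen.flow(sample, batch_size=9, shuffle=False))
print(augmented.shape)  # (9, 28, 28, 1)
```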
5
Model Training

Train with proper callbacks and monitoring:

  • EarlyStopping with patience=5 and restore_best_weights=True
  • ReduceLROnPlateau to reduce learning rate on plateau
  • ModelCheckpoint to save best model weights
  • Train for sufficient epochs (20-30) with appropriate batch size
  • Use augmented data via datagen.flow()
Deliverable: Training history object and saved model in models/ folder.
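The callback setup can be sketched as below; monitor choices and the `factor`/`min_lr` values are one reasonable configuration. The `fit` call is shown commented because it assumes `model`, `datagen`, and the data splits from the earlier steps:

```python
from tensorflow.keras.callbacks import (EarlyStopping, ModelCheckpoint,
                                        ReduceLROnPlateau)

callbacks = [
    # Stop when val_accuracy stalls, keeping the best weights seen so far
    EarlyStopping(monitor="val_accuracy", patience=5, restore_best_weights=True),
    # Halve the learning rate when val_loss plateaus
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3, min_lr=1e-5),
    # Keep only the best checkpoint on disk
    ModelCheckpoint("models/mnist_cnn.keras", monitor="val_accuracy",
                    save_best_only=True),
]

# With model/datagen/splits from the earlier steps, training would look like:
# history = model.fit(datagen.flow(x_train, y_train, batch_size=128),
#                     validation_data=(x_val, y_val),
#                     epochs=30, callbacks=callbacks)
```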
6
Model Evaluation

Evaluate performance comprehensively:

  • Calculate test accuracy (target: ≥99%)
  • Generate confusion matrix heatmap
  • Print classification report (precision, recall, F1)
  • Visualize misclassified examples
  • Plot training/validation accuracy and loss curves
Deliverable: Confusion matrix, training history, and misclassified examples in figures/.
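A sketch of the evaluation step with scikit-learn; the `evaluate_predictions` helper name is illustrative, and it takes the one-hot labels and `model.predict` probabilities from the earlier steps:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

def evaluate_predictions(y_true_onehot, y_prob):
    """Convert one-hot/probability arrays to class labels and report metrics."""
    y_true = np.argmax(y_true_onehot, axis=1)
    y_pred = np.argmax(y_prob, axis=1)
    accuracy = float((y_true == y_pred).mean())
    cm = confusion_matrix(y_true, y_pred)
    report = classification_report(y_true, y_pred, digits=4)
    return accuracy, cm, report

# In the notebook:
#   acc, cm, report = evaluate_predictions(y_test, model.predict(x_test))
#   sns.heatmap(cm, annot=True, fmt="d")   # confusion-matrix figure
```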
7
Save Model & Documentation

Save trained model and document your work:

  • Save model in Keras format: model.save('models/mnist_cnn.keras')
  • Write comprehensive README.md with methodology
  • Include architecture summary and key hyperparameters
  • Document final accuracy and key observations
Deliverable: Saved model file and README.md documentation.
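Saving and reloading can be verified with a round trip; the helper name is illustrative, and a tiny stand-in model is used here so the sketch runs without the trained CNN:

```python
import os
import tempfile
from tensorflow.keras import layers, models

def save_and_reload(model, path):
    """Save in the native .keras format, then reload as a sanity check."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    model.save(path)
    return models.load_model(path)

# Round-trip demo with a tiny stand-in (in the project, pass your trained CNN
# and path "models/mnist_cnn.keras"):
tiny = models.Sequential([layers.Input((28, 28, 1)),
                          layers.Flatten(),
                          layers.Dense(10, activation="softmax")])
with tempfile.TemporaryDirectory() as tmp:
    reloaded = save_and_reload(tiny, os.path.join(tmp, "mnist_cnn.keras"))
    print(reloaded.output_shape)  # (None, 10)
```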
05

CNN Architecture Guide

Design your CNN with the following layer components. The recommended architecture achieves ≥99% accuracy with proper training.

Convolutional Layers
  • Conv2D: Extract features using learnable filters
  • Filters: Start with 32, increase to 64, 128
  • Kernel Size: (3, 3) is standard for MNIST
  • Activation: ReLU for non-linearity
  • Padding: 'same' to preserve spatial dimensions
Pooling & Regularization
  • MaxPooling2D: Reduce spatial dimensions by 2×
  • BatchNormalization: Stabilize training
  • Dropout: 0.25 after conv, 0.5 after dense
  • Pool Size: (2, 2) is standard
  • Strides: Default (same as pool size)
Dense Layers
  • Flatten: Convert 2D feature maps to 1D
  • Dense (Hidden): 128-256 units with ReLU
  • Dense (Output): 10 units with Softmax
  • Dropout: 0.5 before output layer
  • Softmax output: yields a probability distribution over the 10 digit classes
Training Configuration
  • Optimizer: Adam (learning rate: 0.001)
  • Loss: categorical_crossentropy
  • Metrics: accuracy
  • Batch Size: 64-128
  • Epochs: 20-30 with early stopping
Recommended Architecture Flow

Input (28×28×1) → Conv Block 1 (32 filters) → MaxPool (14×14) → Conv Block 2 (64 filters) → MaxPool (7×7) → Flatten (3136) → Dense (256) → Output (10)
Best Practice: Add BatchNormalization after each Conv2D layer for faster convergence and better generalization. Use Dropout after pooling layers.
06

Required Visualizations

Create at least 6 visualizations and save them to the figures/ folder. Each visualization should have proper titles, labels, and be publication-quality.

Visualization 1

Sample Digits Grid

Display sample images from each digit class (0-9)

  • 2 rows × 5 columns grid showing digits 0-9
  • Use grayscale colormap (cmap='gray')
  • Add title for each subplot showing the digit
  • Save as: figures/sample_digits.png
Visualization 2

Class Distribution

Bar chart showing number of samples per digit class

  • Bar chart with digit labels on x-axis
  • Count values on y-axis
  • Add data labels on top of each bar
  • Save as: figures/class_distribution.png
Visualization 3

Augmentation Examples

Original image vs multiple augmented versions

  • Original image in first position
  • 9 augmented versions showing rotation, shift, zoom
  • Label each as "Original" or "Augmented #"
  • Save as: figures/augmentation_examples.png
Visualization 4

Training History

Training and validation accuracy/loss over epochs

  • Two subplots: Accuracy and Loss
  • Training and validation curves on each
  • Legend, grid, and proper labels
  • Save as: figures/training_history.png
Visualization 5

Confusion Matrix

Heatmap showing predicted vs actual classes

  • 10×10 matrix heatmap using seaborn
  • Annotated with count values
  • Predicted labels on x-axis, Actual on y-axis
  • Save as: figures/confusion_matrix.png
Visualization 6

Misclassified Examples

Examples where the model made incorrect predictions

  • Grid of 10-15 misclassified images
  • Title showing "True: X, Pred: Y" for each
  • Use red color for title text
  • Save as: figures/misclassified.png
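The misclassified-examples grid from Visualization 6 can be sketched like this; the function name is illustrative, and `y_true`/`y_pred` are the integer class labels from the evaluation step:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line inside a notebook
import matplotlib.pyplot as plt

def plot_misclassified(images, y_true, y_pred, max_images=15):
    """Grid of wrongly predicted digits, titled 'True: X, Pred: Y' in red."""
    wrong = np.flatnonzero(y_true != y_pred)[:max_images]
    cols = 5
    rows = -(-len(wrong) // cols)  # ceiling division
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows))
    for ax, idx in zip(np.ravel(axes), wrong):
        ax.imshow(images[idx].squeeze(), cmap="gray")
        ax.set_title(f"True: {y_true[idx]}, Pred: {y_pred[idx]}", color="red")
    for ax in np.ravel(axes):
        ax.axis("off")
    fig.tight_layout()
    return fig, wrong   # save with fig.savefig("figures/misclassified.png")
```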
07

Submission Requirements

Create a public GitHub repository with the exact name shown below:

Required Repository Name
mnist-digit-recognition
github.com/<your-username>/mnist-digit-recognition
Required Project Structure
mnist-digit-recognition/
├── notebooks/
│   └── mnist_classification.ipynb   # Main Jupyter notebook
├── models/
│   └── mnist_cnn.keras              # Trained model file
├── figures/
│   ├── sample_digits.png            # Sample digits grid
│   ├── class_distribution.png       # Class distribution bar chart
│   ├── augmentation_examples.png    # Augmentation examples
│   ├── training_history.png         # Training accuracy/loss curves
│   ├── confusion_matrix.png         # Confusion matrix heatmap
│   └── misclassified.png            # Misclassified examples
├── requirements.txt                 # Python dependencies
└── README.md                        # Project documentation
README.md Required Sections
1. Project Header
  • Project title and description
  • Your full name and submission date
  • Course and project number
2. Dataset
  • MNIST dataset overview
  • Preprocessing steps applied
  • Train/test split information
3. Model Architecture
  • CNN layer summary
  • Key hyperparameters
  • Augmentation techniques used
4. Results
  • Final test accuracy achieved
  • Training time and epochs
  • Key observations
5. Visualizations
  • Include figure screenshots
  • Brief caption for each
  • Use markdown image syntax
6. How to Run
  • Installation instructions
  • How to run the notebook
  • Required packages
Do Include
  • All required files in correct folders
  • Complete Jupyter notebook with outputs
  • Saved model in .keras format
  • All 6 visualization figures
  • requirements.txt with dependencies
  • Comprehensive README.md
Do Not Include
  • Jupyter checkpoints (.ipynb_checkpoints/)
  • Python cache files (__pycache__/)
  • Large dataset files (load from Keras)
  • Incomplete or broken notebooks
  • Models without training completion
Important: Before submitting, restart your notebook kernel and run all cells to ensure the notebook executes without errors.
Submit Your Project

Enter your GitHub username - we will verify your repository automatically

08

Grading Rubric

Your project will be graded on the following criteria. Total: 250 points.

Criteria                    Points  Description
Data Preprocessing          25      Proper normalization, reshaping, one-hot encoding, and validation split
Exploratory Data Analysis   25      Sample visualization, class distribution, and data understanding
CNN Architecture            50      Well-designed CNN with Conv2D, pooling, BatchNorm, Dropout, and proper output
Data Augmentation           30      Effective augmentation techniques with visualization
Training & Callbacks        25      Proper use of EarlyStopping, ReduceLROnPlateau, and ModelCheckpoint
Model Performance           40      Test accuracy: ≥99% (full), ≥98% (partial), <98% (minimum)
Visualizations              30      All 6 required visualizations with proper labels and saved to figures/
Documentation               25      Complete README, code comments, and clear methodology
Total                       250
Grading Levels

Excellent (225-250): Exceeds all requirements with ≥99% accuracy
Good (188-224): Meets all requirements with good quality
Satisfactory (150-187): Meets minimum requirements
Needs Work (<150): Missing key requirements

Ready to Submit?

Make sure you have completed all requirements and reviewed the grading rubric above.

Submit Your Project
09

Pre-Submission Checklist

Use this checklist to verify you have completed all requirements before submitting your project.

  • Data & Preprocessing
  • CNN Architecture
  • Augmentation & Training
  • Repository Requirements
Final Check: Restart your notebook kernel and run all cells from top to bottom to ensure the notebook executes without errors.