Project Overview
This project brings together core deep learning concepts from the AI course. You will work with the MNIST dataset of 70,000 grayscale images of handwritten digits (0-9), each 28×28 pixels: 60,000 training images and 10,000 test images, originally collected from American high school students and Census Bureau employees. Your goal is to build a convolutional neural network (CNN) that achieves ≥99% accuracy on the test set through sound architecture design, data augmentation, and hyperparameter tuning.
The project walks through four stages:
- Data Prep: Load, normalize, reshape, and split the MNIST dataset
- CNN Design: Build an architecture of Conv2D, pooling, and dense layers
- Augmentation: Apply rotation, shift, and zoom to expand the training data
- Evaluation: Analyze accuracy, the confusion matrix, and misclassifications
Learning Objectives
Technical Skills
- Build CNN architectures with TensorFlow/Keras
- Apply Conv2D, MaxPooling, BatchNorm, and Dropout
- Implement ImageDataGenerator for augmentation
- Use callbacks: EarlyStopping, ReduceLROnPlateau
- Generate confusion matrix and classification report
Analytical Skills
- Understand image data preprocessing requirements
- Analyze training curves for overfitting detection
- Interpret confusion matrix patterns
- Identify commonly misclassified digit pairs
- Document findings and methodology
Business Scenario
PostalTech Solutions
You have been hired as a Machine Learning Engineer at PostalTech Solutions, a logistics company that processes millions of handwritten postal codes daily. Currently, the company relies on manual data entry which is slow, expensive, and error-prone. Your task is to build an AI system that can automatically recognize handwritten digits to automate the parcel sorting process.
"We process over 5 million parcels daily, and reading handwritten ZIP codes is our biggest bottleneck. We need an AI system that can accurately recognize digits to automate our sorting process. Accuracy is critical—even 1% error rate means 50,000 misrouted packages per day!"
Project Goals
- Achieve ≥99% accuracy on the test set
- Minimize confusion between similar digits (3/8, 4/9)
- Ensure the model generalizes to different handwriting styles
- Optimize inference speed for production use

Deliverables
- Complete Jupyter notebook with all code
- Trained model saved in Keras format
- Visualization of training history
- Confusion matrix and error analysis

Augmentation Requirements
- Implement rotation (±10-15 degrees)
- Apply width and height shifts (10-15%)
- Add zoom augmentation (10-15%)
- Document augmentation impact on accuracy

Documentation
- Clear README with methodology
- CNN architecture explanation
- Key findings and observations
- Instructions to run the notebook
The Dataset
You will work with the MNIST dataset, the most famous benchmark in computer vision. The dataset can be loaded directly from TensorFlow/Keras or downloaded from Kaggle:
Dataset Access
Load MNIST directly from TensorFlow or download from Kaggle for offline use.
Original Data Source
The MNIST database (Modified National Institute of Standards and Technology) is a large collection of handwritten digits widely used for training and testing machine learning models. It was created by Yann LeCun, Corinna Cortes, and Chris Burges.
Dataset Structure
| Array | Shape | Type | Description |
|---|---|---|---|
| X_train | (60000, 28, 28) | uint8 | Training images (pixel values 0-255) |
| y_train | (60000,) | uint8 | Training labels (digits 0-9) |
| X_test | (10000, 28, 28) | uint8 | Test images (pixel values 0-255) |
| y_test | (10000,) | uint8 | Test labels (digits 0-9) |
| Digit | Training Count | Test Count | Percentage |
|---|---|---|---|
| 0 | 5,923 | 980 | ~10% |
| 1 | 6,742 | 1,135 | ~11% |
| 2 | 5,958 | 1,032 | ~10% |
| 3 | 6,131 | 1,010 | ~10% |
| 4 | 5,842 | 982 | ~10% |
| 5 | 5,421 | 892 | ~9% |
| 6 | 5,918 | 958 | ~10% |
| 7 | 6,265 | 1,028 | ~10% |
| 8 | 5,851 | 974 | ~10% |
| 9 | 5,949 | 1,009 | ~10% |
Project Requirements
Your project must include all of the following components. Structure your Jupyter notebook with clear markdown sections and well-commented code.
Data Loading & Preprocessing
Load and prepare the MNIST dataset:
- Load data using `tensorflow.keras.datasets.mnist`
- Normalize pixel values to the [0, 1] range
- Reshape images from (28, 28) to (28, 28, 1) for CNN input
- One-hot encode labels using `to_categorical`
- Create a validation split (10-20% of the training data)
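The steps above can be sketched as a small helper. The `preprocess` function and the random demo batch below are illustrative stand-ins; in the notebook you would pass in the real arrays from `mnist.load_data()`:

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

def preprocess(images, labels, num_classes=10):
    """Normalize to [0, 1], add a channel axis, and one-hot encode labels."""
    x = images.astype("float32") / 255.0      # uint8 [0, 255] -> float32 [0, 1]
    x = x.reshape(-1, 28, 28, 1)              # (N, 28, 28) -> (N, 28, 28, 1)
    y = to_categorical(labels, num_classes)   # 3 -> [0, 0, 0, 1, 0, ...]
    return x, y

# In the notebook you would start from the real data:
# (X_train, y_train), (X_test, y_test) = tensorflow.keras.datasets.mnist.load_data()
# A small random batch stands in here to show the resulting shapes.
demo_images = np.random.randint(0, 256, size=(8, 28, 28), dtype=np.uint8)
demo_labels = np.random.randint(0, 10, size=(8,))
x, y = preprocess(demo_images, demo_labels)
print(x.shape, y.shape)  # (8, 28, 28, 1) (8, 10)
```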
Exploratory Data Analysis
Visualize and understand the dataset:
- Display sample images from each digit class (0-9 grid)
- Plot class distribution bar chart
- Analyze pixel intensity distribution
- Visualize average digit image per class (optional)
CNN Architecture
Design and implement your CNN model:
- At least 2-3 convolutional blocks (Conv2D + MaxPooling)
- Include BatchNormalization layers for training stability
- Apply Dropout (0.25-0.5) for regularization
- Flatten and add Dense layers with ReLU activation
- Output layer with 10 units and Softmax activation
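One way to realize the checklist above is sketched below. The filter counts (32, 64) and the 256-unit dense layer are reasonable choices within the guide's ranges, not the only valid ones:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    # Conv block 1: 28x28 -> 14x14
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    # Conv block 2: 14x14 -> 7x7
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    # Dense head
    layers.Flatten(),                        # 7 * 7 * 64 = 3136 features
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),  # one probability per digit
])
model.summary()
```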
Data Augmentation
Implement augmentation to improve generalization:
- Use `ImageDataGenerator` from Keras
- Rotation range: ±10-15 degrees
- Width and height shift: 0.1-0.15
- Zoom range: 0.1-0.15
- Visualize augmented examples
Important: Do NOT use horizontal or vertical flip for digits!
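A minimal sketch of the generator, with values picked from the middle of the recommended ranges (the exact numbers are a judgment call). Note the deliberate absence of flips, since a mirrored 2 or 5 is no longer a valid digit:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=12,       # degrees
    width_shift_range=0.1,   # fraction of image width
    height_shift_range=0.1,  # fraction of image height
    zoom_range=0.1,
)

# Random noise stands in for real training images here.
demo = np.random.rand(4, 28, 28, 1).astype("float32")
aug = next(datagen.flow(demo, batch_size=4, shuffle=False))
print(aug.shape)  # (4, 28, 28, 1)
```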
Model Training
Train with proper callbacks and monitoring:
- EarlyStopping with patience=5 and restore_best_weights=True
- ReduceLROnPlateau to reduce learning rate on plateau
- ModelCheckpoint to save best model weights
- Train for sufficient epochs (20-30) with appropriate batch size
- Use augmented data via `datagen.flow()`
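The three callbacks can be wired up as follows; the `factor`, `min_lr`, and `batch_size` values are illustrative defaults, and the commented `fit` call assumes `model`, `datagen`, and the split arrays from the earlier steps:

```python
from tensorflow.keras.callbacks import (EarlyStopping, ModelCheckpoint,
                                        ReduceLROnPlateau)

callbacks = [
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2, min_lr=1e-5),
    ModelCheckpoint("models/mnist_cnn.keras", save_best_only=True),
]

# history = model.fit(
#     datagen.flow(X_train, y_train, batch_size=128),
#     validation_data=(X_val, y_val),
#     epochs=30,
#     callbacks=callbacks,
# )
```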
Model Evaluation
Evaluate performance comprehensively:
- Calculate test accuracy (target: ≥99%)
- Generate confusion matrix heatmap
- Print classification report (precision, recall, F1)
- Visualize misclassified examples
- Plot training/validation accuracy and loss curves
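The confusion matrix and classification report come straight from scikit-learn. Tiny dummy label arrays illustrate the calls here; the commented lines show how you would derive `y_true`/`y_pred` from a real model:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# With a real model you would use:
#   y_pred = np.argmax(model.predict(X_test), axis=1)
#   y_true = np.argmax(y_test, axis=1)
y_true = np.array([3, 8, 4, 9, 3, 8])
y_pred = np.array([3, 3, 4, 9, 3, 8])   # one 8 misread as a 3

cm = confusion_matrix(y_true, y_pred)
print(cm)
print(classification_report(y_true, y_pred, zero_division=0))
```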
Save Model & Documentation
Save trained model and document your work:
- Save model in Keras format: `model.save('models/mnist_cnn.keras')`
- Write comprehensive README.md with methodology
- Include architecture summary and key hyperparameters
- Document final accuracy and key observations
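Saving and reloading round-trips cleanly in the `.keras` format. A tiny stand-in model is used below so the snippet is self-contained; in the notebook you would save your trained CNN instead:

```python
import os
from tensorflow.keras import layers, models

os.makedirs("models", exist_ok=True)

# Tiny placeholder model: 28*28 = 784 inputs -> 10 softmax outputs.
tiny = models.Sequential([layers.Input(shape=(28, 28, 1)),
                          layers.Flatten(),
                          layers.Dense(10, activation="softmax")])
tiny.save("models/mnist_cnn.keras")

# Reloading verifies the file is intact.
reloaded = models.load_model("models/mnist_cnn.keras")
print(reloaded.count_params())  # 7850 = 784 * 10 weights + 10 biases
```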
CNN Architecture Guide
Design your CNN with the following layer components. The recommended architecture achieves ≥99% accuracy with proper training.
- Conv2D: Extract features using learnable filters
- Filters: Start with 32, increase to 64, 128
- Kernel Size: (3, 3) is standard for MNIST
- Activation: ReLU for non-linearity
- Padding: 'same' to preserve spatial dimensions
- MaxPooling2D: Reduce spatial dimensions by 2×
- BatchNormalization: Stabilize training
- Dropout: 0.25 after conv, 0.5 after dense
- Pool Size: (2, 2) is standard
- Strides: Default (same as pool size)
- Flatten: Convert 2D feature maps to 1D
- Dense (Hidden): 128-256 units with ReLU
- Dense (Output): 10 units with Softmax
- Dropout: 0.5 before output layer
- Softmax already yields class probabilities; no further activation is needed after the output layer
- Optimizer: Adam (learning rate: 0.001)
- Loss: categorical_crossentropy
- Metrics: accuracy
- Batch Size: 64-128
- Epochs: 20-30 with early stopping
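The compile settings above translate directly into code. A minimal placeholder model is used here just to make the call runnable; substitute your real CNN:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam

net = models.Sequential([layers.Input(shape=(28, 28, 1)),
                         layers.Flatten(),
                         layers.Dense(10, activation="softmax")])
net.compile(optimizer=Adam(learning_rate=0.001),
            loss="categorical_crossentropy",  # matches one-hot labels
            metrics=["accuracy"])
print(net.output_shape)  # (None, 10)
```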
Recommended Architecture Flow
Input 28×28×1 → Conv Block 1 (32 filters) → MaxPool → 14×14 → Conv Block 2 (64 filters) → MaxPool → 7×7 → Flatten (3136) → Dense (256) → Output (10)
Required Visualizations
Create at least 6 visualizations and save them to the figures/ folder.
Each visualization should have proper titles, labels, and be publication-quality.
Sample Digits Grid
Display sample images from each digit class (0-9)
- 2 rows × 5 columns grid showing digits 0-9
- Use grayscale colormap (cmap='gray')
- Add title for each subplot showing the digit
- Save as: `figures/sample_digits.png`
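A sketch of the grid layout; random noise stands in for the images, whereas in the notebook you would pick one `X_train` example per digit class:

```python
import os
import numpy as np
import matplotlib
matplotlib.use("Agg")   # render off-screen; we only save to disk
import matplotlib.pyplot as plt

samples = np.random.rand(10, 28, 28)  # placeholder: one "image" per digit

fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for digit, ax in enumerate(axes.flat):
    ax.imshow(samples[digit], cmap="gray")
    ax.set_title(f"Digit {digit}")
    ax.axis("off")

os.makedirs("figures", exist_ok=True)
fig.savefig("figures/sample_digits.png", dpi=150)
```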
Class Distribution
Bar chart showing number of samples per digit class
- Bar chart with digit labels on x-axis
- Count values on y-axis
- Add data labels on top of each bar
- Save as: `figures/class_distribution.png`
Augmentation Examples
Original image vs multiple augmented versions
- Original image in first position
- 9 augmented versions showing rotation, shift, zoom
- Label each as "Original" or "Augmented #"
- Save as: `figures/augmentation_examples.png`
Training History
Training and validation accuracy/loss over epochs
- Two subplots: Accuracy and Loss
- Training and validation curves on each
- Legend, grid, and proper labels
- Save as: `figures/training_history.png`
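The two-subplot layout can be sketched as below. Dummy values stand in for `history.history` as returned by `model.fit()`:

```python
import os
import matplotlib
matplotlib.use("Agg")   # render off-screen; we only save to disk
import matplotlib.pyplot as plt

# Placeholder numbers; use history.history from your training run.
hist = {
    "accuracy":     [0.90, 0.95, 0.97, 0.98],
    "val_accuracy": [0.93, 0.96, 0.97, 0.98],
    "loss":         [0.33, 0.15, 0.09, 0.06],
    "val_loss":     [0.24, 0.12, 0.09, 0.07],
}

fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(12, 4))
for key in ("accuracy", "val_accuracy"):
    ax_acc.plot(hist[key], label=key)
ax_acc.set_title("Accuracy")
ax_acc.set_xlabel("Epoch")
ax_acc.legend()
ax_acc.grid(True)
for key in ("loss", "val_loss"):
    ax_loss.plot(hist[key], label=key)
ax_loss.set_title("Loss")
ax_loss.set_xlabel("Epoch")
ax_loss.legend()
ax_loss.grid(True)

os.makedirs("figures", exist_ok=True)
fig.savefig("figures/training_history.png", dpi=150)
```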
Confusion Matrix
Heatmap showing predicted vs actual classes
- 10×10 matrix heatmap using seaborn
- Annotated with count values
- Predicted labels on x-axis, Actual on y-axis
- Save as: `figures/confusion_matrix.png`
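The heatmap itself is a single seaborn call. A dummy near-diagonal matrix with a hypothetical 3/8 confusion stands in for the real one:

```python
import os
import numpy as np
import matplotlib
matplotlib.use("Agg")   # render off-screen; we only save to disk
import matplotlib.pyplot as plt
import seaborn as sns

# Placeholder matrix; use confusion_matrix(y_true, y_pred) in the notebook.
cm = np.eye(10, dtype=int) * 98
cm[3, 8] = cm[8, 3] = 2   # hypothetical 3/8 confusion

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax)
ax.set_xlabel("Predicted label")
ax.set_ylabel("Actual label")
ax.set_title("Confusion Matrix")

os.makedirs("figures", exist_ok=True)
fig.savefig("figures/confusion_matrix.png", dpi=150)
```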
Misclassified Examples
Examples where the model made incorrect predictions
- Grid of 10-15 misclassified images
- Title showing "True: X, Pred: Y" for each
- Use red color for title text
- Save as: `figures/misclassified.png`
Submission Requirements
Create a public GitHub repository with the exact name shown below:
Required Repository Name
mnist-digit-recognition
Required Project Structure
```
mnist-digit-recognition/
├── notebooks/
│   └── mnist_classification.ipynb    # Main Jupyter notebook
├── models/
│   └── mnist_cnn.keras               # Trained model file
├── figures/
│   ├── sample_digits.png             # Sample digits grid
│   ├── class_distribution.png        # Class distribution bar chart
│   ├── augmentation_examples.png     # Augmentation examples
│   ├── training_history.png          # Training accuracy/loss curves
│   ├── confusion_matrix.png          # Confusion matrix heatmap
│   └── misclassified.png             # Misclassified examples
├── requirements.txt                  # Python dependencies
└── README.md                         # Project documentation
```
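A minimal `requirements.txt` might look like the following; the package set mirrors the tools required above, and the version pin is a placeholder to adjust to your environment:

```text
tensorflow>=2.12
numpy
matplotlib
seaborn
scikit-learn
notebook
```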
README.md Required Sections
1. Project Header
- Project title and description
- Your full name and submission date
- Course and project number
2. Dataset
- MNIST dataset overview
- Preprocessing steps applied
- Train/test split information
3. Model Architecture
- CNN layer summary
- Key hyperparameters
- Augmentation techniques used
4. Results
- Final test accuracy achieved
- Training time and epochs
- Key observations
5. Visualizations
- Include figure screenshots
- Brief caption for each
- Use markdown image syntax
6. How to Run
- Installation instructions
- How to run the notebook
- Required packages
Do Include
- All required files in correct folders
- Complete Jupyter notebook with outputs
- Saved model in .keras format
- All 6 visualization figures
- requirements.txt with dependencies
- Comprehensive README.md
Do Not Include
- Jupyter checkpoints (.ipynb_checkpoints/)
- Python cache files (__pycache__/)
- Large dataset files (load from Keras)
- Incomplete or broken notebooks
- Models without training completion
You will submit your GitHub username; your repository will be verified automatically.
Grading Rubric
Your project will be graded on the following criteria. Total: 250 points.
| Criteria | Points | Description |
|---|---|---|
| Data Preprocessing | 25 | Proper normalization, reshaping, one-hot encoding, and validation split |
| Exploratory Data Analysis | 25 | Sample visualization, class distribution, and data understanding |
| CNN Architecture | 50 | Well-designed CNN with Conv2D, pooling, BatchNorm, Dropout, and proper output |
| Data Augmentation | 30 | Effective augmentation techniques with visualization |
| Training & Callbacks | 25 | Proper use of EarlyStopping, ReduceLROnPlateau, and ModelCheckpoint |
| Model Performance | 40 | Test accuracy: ≥99% (full credit), 98-99% (partial credit), <98% (minimal credit) |
| Visualizations | 30 | All 6 required visualizations with proper labels and saved to figures/ |
| Documentation | 25 | Complete README, code comments, and clear methodology |
| Total | 250 | |
Grading Levels
Excellent
Exceeds all requirements with ≥99% accuracy
Good
Meets all requirements with good quality
Satisfactory
Meets minimum requirements
Needs Work
Missing key requirements
Ready to Submit?
Make sure you have completed all requirements and reviewed the grading rubric above.
Pre-Submission Checklist
Use this checklist to verify you have completed all requirements before submitting your project.