Capstone Overview
The final capstone project is your opportunity to demonstrate mastery of all AI concepts covered throughout this course. You will build a complete, end-to-end AI system that integrates multiple components: data processing, machine learning models, deep learning architectures, natural language understanding, computer vision (if applicable), and production deployment.
ML and Deep Learning
Build and train models using scikit-learn, TensorFlow, or PyTorch
NLP and Vision
Process text or images using transformers, CNNs, and pre-trained models
Production Deployment
Deploy with Docker, REST APIs, monitoring, and ethical considerations
The Scenario
AI Solutions Startup
You have been hired as the Lead AI Engineer at a cutting-edge AI startup. The CEO has given you an ambitious challenge:
"We need a production-ready AI system that solves a real-world problem. It must demonstrate the full AI lifecycle: from data collection and preprocessing, through model development and training, to deployment and monitoring. The system should be robust, ethical, and ready for our investors to see. You have complete freedom to choose the domain, but it must be impressive and practical."
Your Mission
Choose one of the project options below (or propose your own with instructor approval) and build a complete AI system from scratch. Your solution must integrate multiple AI techniques, demonstrate proper software engineering practices, and include deployment infrastructure.
Project Options
Choose ONE of the following project options. Each option requires you to integrate multiple AI techniques and build a complete end-to-end system.
Intelligent Document Processing System
NLP + Computer Vision + ML Pipeline
Build a system that can process, understand, and extract information from documents (PDFs, images, scanned files). The system should perform OCR, entity extraction, document classification, and provide a search interface.
Required Components:
- OCR Pipeline: Extract text from images and PDFs using Tesseract or cloud OCR
- NLP Processing: Named entity recognition, key phrase extraction, summarization
- Document Classification: Categorize documents using ML or transformer models
- Search Engine: Semantic search with embeddings and vector database
- REST API: Endpoints for upload, processing, search, and retrieval
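To make the semantic-search component concrete, here is a minimal, dependency-free sketch that ranks documents by cosine similarity over bag-of-words counts. This is only an illustration of the ranking idea; a real submission would use learned embeddings (e.g., sentence-transformers) and a vector database, as the requirement states.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real systems use learned sentence embeddings."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str, docs: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Rank documents by similarity to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    return sorted(scored, reverse=True)[:top_k]

docs = [
    "invoice for consulting services rendered in March",
    "employee onboarding checklist and forms",
    "invoice payment terms and due dates",
]
print(search("invoice due date", docs, top_k=1)[0][1])
# prints: invoice payment terms and due dates
```

Swapping `embed` for a transformer encoder and the linear scan for a vector index changes the quality and scale, but not the interface your API exposes.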
Multi-Modal AI Assistant
NLP + Vision + Conversational AI
Create an AI assistant that can understand both text and images, answer questions about uploaded images, maintain conversation context, and provide intelligent responses using modern NLU techniques.
Required Components:
- Intent Recognition: Understand user queries and classify intents
- Image Understanding: Process and describe uploaded images using CNN or vision transformers
- Visual Question Answering: Answer questions about image content
- Context Management: Track conversation history and maintain context
- REST API: Endpoints for text chat, image upload, and multi-turn conversations
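The context-management requirement can be as simple as a bounded window of recent turns. The sketch below is one possible approach, not a prescribed design; the `as_prompt` method name and the turn format are illustrative assumptions.

```python
from collections import deque

class ConversationContext:
    """Keeps a bounded window of recent turns for multi-turn conversations."""

    def __init__(self, max_turns: int = 10):
        self.turns: deque[tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, role: str, message: str) -> None:
        self.turns.append((role, message))

    def as_prompt(self) -> str:
        """Flatten the retained history into a prompt prefix for the model."""
        return "\n".join(f"{role}: {msg}" for role, msg in self.turns)

ctx = ConversationContext(max_turns=2)
ctx.add("user", "What is in this image?")
ctx.add("assistant", "A dog on a beach.")
ctx.add("user", "What breed is it?")   # oldest turn is evicted automatically
print(ctx.as_prompt())                 # only the last two turns remain
```

A production assistant would likely persist history per session and summarize older turns rather than drop them, but the bounded-window idea is the core of keeping context costs predictable.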
Predictive Analytics Platform
ML + Time Series + Dashboard
Build an end-to-end predictive analytics platform for a specific domain (finance, healthcare, e-commerce, IoT, etc.). The system should ingest data, train models, make predictions, and visualize results.
Required Components:
- Data Pipeline: Ingest, clean, and preprocess data from multiple sources
- Feature Engineering: Automated feature extraction and selection
- Model Training: Train and compare multiple models (at least 3 algorithms)
- Prediction Engine: Real-time predictions with confidence scores
- Dashboard: Interactive visualization of data, predictions, and model performance
- REST API: Endpoints for data ingestion, predictions, and model status
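For the "predictions with confidence scores" component, one lightweight baseline is a moving-average forecast with a rough uncertainty band derived from recent variability. This is a deliberately naive sketch to show the shape of the output your prediction engine should return, not a recommended model.

```python
import statistics

def moving_average_forecast(series: list[float], window: int = 3) -> tuple[float, float]:
    """Forecast the next value as the mean of the last `window` points,
    with a rough +/- 2-sigma band from their sample standard deviation."""
    recent = series[-window:]
    point = statistics.mean(recent)
    spread = 2 * statistics.stdev(recent) if len(recent) > 1 else 0.0
    return point, spread

sales = [100.0, 104.0, 98.0, 102.0, 106.0]
forecast, band = moving_average_forecast(sales, window=3)
print(f"next ≈ {forecast:.1f} ± {band:.1f}")   # next ≈ 102.0 ± 8.0
```

Whatever model you actually train, returning a point estimate together with an uncertainty measure is what turns a prediction into something a dashboard can display responsibly.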
Custom Project (With Approval)
Your Own Idea + All Required Components
Have your own idea? You can propose a custom project as long as it meets all the required components listed in the Requirements section below. Submit your proposal via the contact form before starting.
Proposal Must Include:
- Problem statement and target users
- AI techniques to be used (minimum 3 different techniques)
- Data sources and preprocessing plan
- Architecture diagram
- Deployment plan
Requirements
Regardless of which project option you choose, your capstone must include ALL of the following components:
Data Collection and Preprocessing
Your project must include:
- Clear documentation of data sources (public datasets, APIs, web scraping, etc.)
- Data cleaning and validation scripts
- Exploratory data analysis (EDA) in a Jupyter notebook
- Feature engineering and preprocessing pipelines
- Train/validation/test split strategy with justification
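A reproducible split is the kind of thing graders will check first. One minimal approach, assuming your data fits in memory, is a seeded shuffle followed by slicing; for imbalanced labels or time series you would justify a stratified or chronological split instead.

```python
import random

def train_val_test_split(items: list, val_frac: float = 0.15,
                         test_frac: float = 0.15, seed: int = 42):
    """Shuffle with a fixed seed for reproducibility, then slice into splits."""
    rng = random.Random(seed)
    shuffled = items[:]            # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))   # 70 15 15
```

The fixed seed is what makes your reported metrics repeatable; document it alongside your split strategy.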
Model Development
Your project must include:
- At least 3 different AI/ML techniques (classification, NLP, CNN, transformer, etc.)
- Model training notebooks with clear explanations
- Hyperparameter tuning with documented experiments
- Model evaluation with appropriate metrics
- Comparison of multiple approaches with justification for final choice
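When comparing approaches, compute your metrics explicitly rather than trusting a single library default. As an illustration, here is a hand-rolled version of the three most common binary-classification metrics; in practice you would cross-check these against scikit-learn's implementations.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, and recall for a binary classifier (positive class = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "accuracy": correct / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
```

Choosing which of these (or F1, AUC, MAE, etc.) matters for your domain, and saying why, is exactly the "appropriate metrics" justification the rubric rewards.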
Production-Ready Code
Your project must include:
- Modular, well-organized Python code in a src/ directory
- Proper error handling and logging
- Configuration management (environment variables or config files)
- Unit tests with at least 70% code coverage
- Type hints and docstrings for all functions
REST API
Your project must include:
- FastAPI or Flask-based REST API
- At least 5 endpoints with proper HTTP methods
- Input validation and error responses
- API documentation (Swagger/OpenAPI)
- Authentication (basic API key or JWT)
Deployment Infrastructure
Your project must include:
- Dockerfile with optimized multi-stage build
- Docker Compose for local development
- Deployment instructions for cloud platform (AWS, GCP, Azure, or similar)
- CI/CD pipeline configuration (GitHub Actions, GitLab CI, etc.)
- Basic monitoring and health check endpoints
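As a starting point for the multi-stage build, here is an illustrative Dockerfile sketch matching the repository structure below. The `src.api.main:app` entry point, the port, and the uvicorn command are assumptions to adapt to your own layout and framework.

```dockerfile
# ---- build stage: install dependencies into an isolated prefix ----
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# ---- runtime stage: copy only what is needed to run ----
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY src/ ./src/
EXPOSE 8000
# Adjust the module path to wherever your app object actually lives
CMD ["uvicorn", "src.api.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

The two-stage split keeps build tools and pip caches out of the final image, which is what "optimized" means in the requirement above.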
Ethical AI Considerations
Your project must include:
- Bias analysis of your model and data
- Fairness metrics where applicable
- Model explainability (SHAP, LIME, or similar)
- Privacy considerations and data handling practices
- Documented limitations and potential misuse concerns
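If SHAP or LIME feels heavyweight for your model, permutation importance is a simpler, model-agnostic explainability signal you can implement directly: shuffle one feature's column and measure how much a metric degrades. The toy model and data below are purely illustrative.

```python
import random

def permutation_importance(model, X, y, feature_idx, metric, seed=0):
    """Drop in metric score when one feature's column is shuffled.
    A larger drop means the model relies more on that feature."""
    baseline = metric(model, X, y)
    rng = random.Random(seed)
    column = [row[feature_idx] for row in X]
    rng.shuffle(column)
    X_shuffled = [row[:feature_idx] + [v] + row[feature_idx + 1:]
                  for row, v in zip(X, column)]
    return baseline - metric(model, X_shuffled, y)

# Toy 'model': predicts 1 when feature 0 exceeds a threshold
model = lambda row: 1 if row[0] > 0.5 else 0
accuracy = lambda m, X, y: sum(m(r) == t for r, t in zip(X, y)) / len(y)

X = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.9], [0.1, 0.3]]
y = [1, 1, 0, 0]
print(permutation_importance(model, X, y, feature_idx=0, metric=accuracy))
print(permutation_importance(model, X, y, feature_idx=1, metric=accuracy))
```

Here feature 1 never influences the toy model, so shuffling it changes nothing; a near-zero importance for a feature you expected to matter is exactly the kind of finding your bias analysis should discuss.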
Documentation
Your project must include:
- Comprehensive README with setup instructions
- Architecture diagram showing all components
- API documentation with examples
- User guide or demo video (optional but recommended)
- Technical blog post or presentation slides
Submission
Create a public GitHub repository with the exact name shown below:
Required Repository Name
ai-capstone-project
Required Repository Structure
ai-capstone-project/
├── notebooks/ # Jupyter notebooks for EDA and experimentation
│ ├── 01_data_exploration.ipynb
│ ├── 02_feature_engineering.ipynb
│ ├── 03_model_training.ipynb
│ └── 04_evaluation.ipynb
├── src/ # Production Python code
│ ├── __init__.py
│ ├── data/ # Data loading and preprocessing
│ ├── models/ # Model definitions and training
│ ├── api/ # FastAPI/Flask application
│ └── utils/ # Utility functions
├── tests/ # Unit and integration tests
├── config/ # Configuration files
├── docker/ # Docker-related files
│ ├── Dockerfile
│ └── docker-compose.yml
├── docs/ # Documentation
│ ├── architecture.md
│ ├── api.md
│ └── deployment.md
├── .github/ # CI/CD workflows
│ └── workflows/
│ └── ci.yml
├── data/ # Sample data (if small) or data download scripts
├── models/ # Saved model files or download scripts
├── requirements.txt # Python dependencies
├── setup.py # Package setup
├── README.md # Project documentation
├── LICENSE # Open source license
└── .gitignore # Git ignore file
README.md Must Include:
- Your full name and submission date
- Project overview and problem statement
- Architecture diagram with component descriptions
- Installation and setup instructions
- API documentation with example requests
- Model performance metrics and comparisons
- Ethical considerations and limitations
- Future improvements and roadmap
Do Include
- All notebooks with clear outputs
- Well-documented source code
- Working Docker configuration
- Comprehensive test suite
- API with Swagger documentation
- Architecture and deployment diagrams
- Bias analysis and ethics report
Do Not Include
- Large model files (use Git LFS or cloud storage)
- API keys or secrets (use environment variables)
- Virtual environment folders
- Cache files (__pycache__, .pyc)
- IDE configuration files
- Large dataset files (provide download scripts)
Provide your GitHub username when you submit; your repository will be verified automatically.
Grading Rubric
Your capstone will be graded on the following criteria:
| Criteria | Points | Description |
|---|---|---|
| Data Pipeline | 50 | Data collection, cleaning, EDA, and feature engineering quality |
| Model Development | 100 | Multiple techniques, proper training, tuning, and evaluation |
| Code Quality | 75 | Modular code, tests, error handling, and documentation |
| API and Deployment | 100 | REST API, Docker, CI/CD, and deployment readiness |
| Ethical AI | 75 | Bias analysis, fairness metrics, explainability, and limitations |
| Documentation | 50 | README, architecture diagrams, API docs, and user guide |
| Innovation and Polish | 50 | Creativity, user experience, and overall presentation |
| Total | 500 | |
Ready to Submit?
Make sure you have completed all requirements and reviewed the grading rubric above.
Skills Demonstrated
AI and ML Mastery
Data pipelines, model training, hyperparameter tuning, and evaluation across multiple AI techniques
Software Engineering
Clean code, modular architecture, testing, error handling, and production-ready practices
Deployment and DevOps
Docker, CI/CD, REST APIs, monitoring, and cloud deployment knowledge
Responsible AI
Bias detection, fairness metrics, model explainability, and ethical considerations
Pro Tips
Development Strategy
- Start with a working prototype, then iterate
- Use version control from day one
- Write tests as you develop, not after
- Document decisions and trade-offs
Architecture
- Keep components loosely coupled
- Use dependency injection for flexibility
- Plan for scalability from the start
- Create clear interfaces between modules
Time Management
- Allocate time for each component (data, model, API, deploy)
- Leave 20% buffer for unexpected issues
- Get early feedback on architecture
- Do not over-engineer; solve the problem first
Common Pitfalls
- Committing secrets or API keys
- Skipping error handling
- Not testing edge cases
- Poor documentation