Project Overview
This capstone project brings together everything you have learned in the Data Science course. You will work with a realistic e-commerce sales dataset containing 2000 orders spanning an entire year, with 100 products across 3 categories (Electronics, Furniture, Office Supplies), 300 customers, and 4 geographic regions. Your goal is to build a professional-grade analytics dashboard that transforms raw data into actionable business insights through interactive visualizations and comprehensive KPI analysis.
Data Cleaning
Handle missing values, data types, and outliers
KPI Calculation
Compute revenue, AOV, growth rates, and more
Trend Analysis
Identify patterns and seasonality in sales
Dashboard
Build interactive Plotly visualizations
Learning Objectives
Technical Skills
- Master pandas for data cleaning and transformation
- Calculate and interpret business KPIs
- Create interactive visualizations with Plotly
- Perform time-series and trend analysis
- Handle large datasets efficiently
Business Skills
- Translate business questions into analyses
- Derive actionable insights from data
- Present findings to non-technical stakeholders
- Make data-driven recommendations
- Build executive-ready dashboards
Business Scenario
TechMart E-Commerce
You have been hired as a Data Analyst at TechMart, a growing e-commerce company specializing in electronics, furniture, and office supplies. The executive team is preparing for their quarterly board meeting and needs a comprehensive sales dashboard.
"We need to understand our sales performance across regions, identify our top products and customers, and spot any trends that could inform our Q2 strategy. Can you build us an interactive dashboard with the key metrics and visualizations?"
Business Questions to Answer
- What is the total revenue for each month?
- Which region generates the most revenue?
- What is our month-over-month growth rate?
- What are the top 10 best-selling products?
- Which category has the highest sales volume?
- What is the average order value by category?
- Who are our top 5 customers by revenue?
- What is the customer repeat purchase rate?
- How does payment method affect order value?
- Are there any weekly patterns in orders?
- How do discounts impact total revenue?
- What is the regional distribution of orders?
The Dataset
You will work with a realistic e-commerce sales dataset. Download the CSV file and place it
in your project's data/ folder:
Dataset Download
Download the e-commerce sales dataset and save it to your project's data/ folder.
The CSV file contains 2000 real transaction records with all necessary columns for analysis.
Dataset Schema
| Column | Type | Description |
|---|---|---|
order_id | Integer | Unique order identifier |
order_date | Date | Date of the order (YYYY-MM-DD) |
customer_id | String | Unique customer identifier |
customer_name | String | Customer full name |
region | String | Geographic region (North, South, East, West) |
city | String | City name |
product_id | String | Unique product identifier |
product_name | String | Product name |
category | String | Product category (Electronics, Furniture, Office Supplies) |
subcategory | String | Product subcategory |
quantity | Integer | Number of units ordered |
unit_price | Float | Price per unit ($) |
discount | Float | Discount applied (0.0 to 0.20) |
total_amount | Float | Total order amount after discount |
payment_method | String | Payment type (Credit Card, Debit Card, PayPal) |
order_status | String | Order status (Delivered) |
Sample Data Preview
Understanding the data structure is crucial. Here's what a typical row looks like:
| order_id | order_date | customer_name | region | product_name | category | quantity | total_amount |
|---|---|---|---|---|---|---|---|
| 1001 | 2024-04-12 | William Cox | West | Gaming Desktop RGB | Electronics | 1 | $1,709.99 |
| 1002 | 2024-12-14 | Jeffrey Howard | West | Keyboard Mouse Combo | Electronics | 1 | $38.97 |
| 1003 | 2024-09-27 | Richard Gray | North | Monitor 27 inch | Electronics | 1 | $337.49 |
Project Requirements
Your Jupyter Notebook must include all of the following components. Structure your notebook with clear markdown headers and explanations for each section.
Project Setup and Introduction
Begin your notebook with a professional introduction including project title, your full name, submission date, and comprehensive project overview. Clearly state the business context and objectives. Import all required libraries including pandas for data manipulation, numpy for calculations, and plotly for interactive visualizations.
Data Cleaning and Preparation
Data Quality Assessment:
- Inspect the dataset structure using
info(),describe(), andhead() - Identify and document any missing values, duplicates, or inconsistencies
- Verify data types for all 16 columns match the schema
- Check for logical errors (e.g., negative quantities, future dates)
Feature Engineering:
- Convert
order_datefrom string to datetime format for time-series analysis - Extract temporal features: month, month name, week number, day of week, quarter
- Create derived metrics: revenue per item, discount amount, profit margin indicators
- Add categorical flags: is_discounted, is_weekend, season, high_value_order
Data Validation:
- Verify calculations: total_amount = unit_price * quantity * (1 - discount)
- Check value ranges are realistic (prices, quantities, discounts)
- Ensure referential integrity between customer_id, product_id, and names
Exploratory Data Analysis (EDA)
Univariate Analysis:
- Statistical summary of numerical columns (revenue, quantity, discount, unit_price)
- Frequency distributions for categorical variables (category, region, payment_method)
- Identify outliers using box plots and statistical methods (IQR, z-scores)
- Analyze value ranges, means, medians, and standard deviations
Bivariate Analysis:
- Correlation matrix between numerical features (quantity, price, discount, revenue)
- Revenue distribution across categories, regions, and payment methods
- Relationship between discount levels and order values
- Customer purchase patterns and repeat behavior analysis
Pattern Discovery:
- Temporal patterns: sales trends by month, week, day of week
- Geographic patterns: regional performance differences
- Product patterns: category mix, best/worst performers
- Customer patterns: segmentation by purchase frequency and value
KPI Calculations
Calculate all required KPIs organized into four categories (see detailed specifications below):
- Revenue Metrics: Total revenue, AOV, monthly trends, MoM growth, regional breakdown
- Product Metrics: Units sold, top performers, category analysis, discount impact
- Customer Metrics: Total customers, retention rate, CLV, top spenders
- Operational Metrics: Order volume, payment distribution, regional counts, patterns
Create a comprehensive KPI summary section that displays all metrics in a clear, organized format. Consider using pandas DataFrames, formatted tables, or creating custom display functions. Include percentage changes, comparisons, and context for each metric.
calculate_revenue_kpis(df),
calculate_product_kpis(df) to organize your code and enable testing.
Visualizations with Plotly
Create at least 8 professional interactive visualizations using Plotly Express and Graph Objects. Each visualization should be business-ready with:
- Clear titles that explain what the chart shows
- Axis labels with proper units (e.g., "Revenue ($)", "Number of Orders")
- Color schemes that enhance readability and highlight key information
- Interactivity: hover details, zoom, pan, and filtering capabilities
- Annotations for important data points, trends, or insights
- Consistent styling using themes (plotly_dark, plotly_white)
See the Visualizations section below for specific chart requirements and recommendations. Group related charts together and provide context before each visualization explaining what business question it answers.
Insights and Recommendations
Executive Summary: Create a comprehensive analysis summary with clear, actionable insights.
Key Findings (5-7 insights):
- Data-driven insights backed by specific numbers from your analysis
- Trend identification (growth patterns, seasonality, anomalies)
- Performance highlights (top products, regions, customer segments)
- Problem areas or concerns (declining metrics, underperformers)
- Opportunity identification (untapped markets, product potential)
Business Recommendations (3-5 actionable items):
- Specific actions the business should take based on your findings
- Prioritized by potential impact (high/medium/low)
- Include expected outcomes or benefits for each recommendation
- Consider operational feasibility and resource requirements
Future Analysis: Suggest additional analyses or data that would provide deeper insights.
Recommended Workflow
Follow this structured approach to complete your project efficiently and systematically. Each phase builds upon the previous one to create a comprehensive analytics dashboard.
Setup & Initial Exploration
1-2 hours- Set up project folder structure
- Load data and initial inspection
- Create data quality report
- Document observations
df.info(), df.describe(), and df.head()
Data Preparation
2-3 hours- Convert dates & extract features
- Create derived columns
- Add categorical flags
- Validate transformations
- Save cleaned dataset
KPI Calculation
1-2 hours- Create modular KPI functions
- Calculate all metrics
- Build KPI summary
- Verify with spot checks
Visualization Development
2-3 hours- Create 8+ Plotly charts
- Customize styling & themes
- Add interactivity
- Organize by theme
Analysis & Insights
1-2 hours- Analyze patterns & trends
- Document 5-7 key findings
- Develop recommendations
- Create executive summary
Documentation & Submission
1 hour- Write comprehensive README
- Add visualization screenshots
- Create requirements.txt
- Test end-to-end execution
- Push to GitHub & submit
Completion Milestone
Once you complete all 6 phases, you'll have a professional portfolio piece showcasing your data science skills!
KPI Specifications
Calculate and display the following Key Performance Indicators. Create functions for reusable calculations.
- Total Revenue: Sum of all
total_amount - Average Order Value (AOV): Total Revenue / Number of Orders
- Revenue by Month: Monthly revenue totals
- MoM Growth Rate: (Current - Previous) / Previous * 100
- Revenue by Region: Regional breakdown
- Total Units Sold: Sum of all
quantity - Top 10 Products: By revenue and by quantity
- Category Revenue: Revenue per category
- Average Discount: Mean discount rate
- Discount Impact: Revenue with vs without discount
- Total Customers: Unique
customer_idcount - Repeat Customer Rate: Customers with 2+ orders / Total
- Top 5 Customers: By total revenue
- Avg Orders per Customer: Total orders / Unique customers
- Customer Lifetime Value: Avg revenue per customer
- Orders per Day: Average daily order count
- Payment Method Split: Distribution by payment type
- Regional Order Count: Orders per region
- Busiest Day: Day of week with most orders
- Category Mix: Percentage by category
KPI Implementation Tips
Required Visualizations
Create at least 8 of the following visualizations using Plotly. All charts should be interactive and professional-quality.
Business Question: How has revenue changed over the year? Are there growth patterns or seasonal trends?
Chart Type: Line chart with markers
Key Features:
- X-axis: Months (Jan-Dec 2024)
- Y-axis: Total Revenue ($)
- Add markers at each data point
- Include MoM growth rate annotations
- Use gradient colors to highlight growth
Insight to Find: Identify peak sales months, growth trends, potential seasonality
Business Question: Which products are driving the most revenue?
Chart Type: Horizontal bar chart
Key Features:
- Sort from highest to lowest revenue
- Color bars by product category
- Show exact revenue values on bars
- Include percentage of total revenue
Insight to Find: Product concentration (are sales dependent on few products?), category of winners
Business Question: What is the revenue mix across product categories?
Chart Type: Donut chart (preferred) or pie chart
Key Features:
- Show percentages for each category
- Use distinct colors for each segment
- Display actual dollar amounts in hover
- Pull out largest segment slightly
Insight to Find: Category dominance, diversification level, potential gaps
Business Question: How do different regions compare in revenue and order volume?
Chart Type: Grouped bar chart
Key Features:
- X-axis: Regions (North, South, East, West)
- Y-axis: Both revenue and order count (dual axis or grouped)
- Color-code revenue vs orders differently
- Include average order value per region
Insight to Find: Regional disparities, high-value vs high-volume regions, expansion opportunities
Business Question: When do customers prefer to shop? Are there weekly patterns?
Chart Type: Heatmap
Key Features:
- Rows: Days of week (Monday-Sunday)
- Columns: Weeks or months
- Color intensity: Order count or revenue
- Use diverging color scale
Insight to Find: Busiest days, weekly patterns, potential for promotions
Business Question: What is the typical order size? Are there outliers?
Chart Type: Box plot (or violin plot for bonus points)
Key Features:
- Show distribution by category or region
- Display median, quartiles, and outliers
- Use horizontal orientation for better readability
- Include mean line for comparison
Insight to Find: Order value consistency, outlier orders, category pricing differences
Business Question: Is there a relationship between quantity sold and revenue?
Chart Type: Scatter plot
Key Features:
- X-axis: Quantity, Y-axis: Total Amount
- Color points by category
- Size points by discount level (optional)
- Add trend line if correlation exists
Insight to Find: High-value low-quantity products, bulk purchase patterns, category behaviors
Business Question: How does revenue distribute across category/subcategory/product levels?
Chart Type: Treemap
Key Features:
- Hierarchy: Category → Subcategory → Product
- Size: Revenue or units sold
- Color: Profit margin or category
- Interactive drill-down capability
Insight to Find: Category composition, subcategory winners, product concentration
Business Question: What are the key performance metrics at a glance?
Chart Type: Dashboard with indicator cards using Plotly
Key Features:
- Create 6-8 KPI cards showing: Total Revenue, AOV, Orders, Customers, Growth Rate, Top Category
- Use color coding: green for positive, red for negative trends
- Include delta indicators (arrows, percentages)
- Arrange in logical grid layout
Insight to Find: Overall business health snapshot, key metrics summary
Visualization Best Practices
Chart Selection
- Line charts for time trends
- Bar charts for comparisons
- Pie/Donut for composition
- Heatmaps for patterns
Interactivity
- Enable hover tooltips
- Add zoom and pan
- Use consistent colors
- Include legends
Clarity
- Descriptive titles
- Clear axis labels
- Proper formatting
- Annotations for insights
Submission Requirements
Create a public GitHub repository with the exact name shown below:
Required Repository Name
sales-dashboard-project
Required Project Structure
sales-dashboard-project/
├── data/
│ └── ecommerce_sales.csv # The dataset (download from above)
├── notebooks/
│ └── sales_analysis.ipynb # Your main analysis notebook
├── requirements.txt # Python dependencies
└── README.md # REQUIRED - see contents below
README.md Template Structure
Your README should follow this professional structure:
1. Project Header
- Project title
- Your full name
- Submission date
- Course name and project number
2. Business Context
- Brief description of TechMart scenario
- Project objectives
- Dataset overview (2000 orders, 12 months, etc.)
3. Technologies Used
- Python 3.x
- Pandas (version)
- Plotly (version)
- Jupyter Notebook
- Other libraries used
4. Key Findings
- 5-7 data-driven insights
- Each finding with supporting metric
- Business implications
5. Visualizations
- 3-4 screenshots of your best charts
- Brief caption for each
- Use markdown image syntax
6. Installation & Usage
- Clone repository command
- Install dependencies:
pip install -r requirements.txt - How to run notebook
- Expected runtime
7. Project Structure
- Directory tree showing file organization
- Brief description of each major file
8. Contact Information
- Your email (optional)
- GitHub profile link
- LinkedIn (optional)
requirements.txt
pandas>=2.0.0
numpy>=1.24.0
plotly>=5.18.0
jupyter>=1.0.0
nbformat>=5.9.0
Do Include
- Clear markdown sections with headers
- All code cells executed with outputs
- At least 8 interactive Plotly charts
- KPI summary table or cards
- Business insights and recommendations
- README with screenshots
Do Not Include
- Virtual environment folders (venv, .env)
- Any .pyc or __pycache__ files
- Unexecuted notebooks
- Hardcoded absolute file paths
- API keys or credentials
Enter your GitHub username - we will verify your repository automatically
Grading Rubric
Your project will be graded on the following criteria. Total: 500 points.
| Criteria | Points | Description |
|---|---|---|
| Data Cleaning | 75 | Proper handling of data types, missing values, feature engineering |
| KPI Calculations | 100 | All required metrics calculated correctly with clear code |
| Visualizations | 125 | At least 8 interactive, professional Plotly charts |
| Analysis Quality | 75 | Insightful observations and business-relevant findings |
| Code Quality | 50 | Clean, well-organized, reusable code with comments |
| Documentation | 50 | Clear markdown, README with screenshots, requirements.txt |
| Recommendations | 25 | Actionable business recommendations based on analysis |
| Total | 500 |
Ready to Submit?
Make sure you have completed all requirements and reviewed the grading rubric above.
Submit Your ProjectCommon Issues & Solutions
Problem: pd.to_datetime() fails or produces unexpected results
Solution: Specify the date format explicitly: pd.to_datetime(df['order_date'], format='%Y-%m-%d')
Verify: Check that all dates fall within 2024 and month extraction works correctly
Problem: Charts show blank output or only display in separate browser window
Solution 1: In Jupyter Lab, install the extension: !pip install jupyterlab-plotly
Solution 2: Use fig.show(renderer='notebook') for classic Jupyter
Solution 3: Try import plotly.io as pio; pio.renderers.default='iframe'
Problem: KPIs don't match expected values or seem incorrect
Solution: Double-check your groupby operations and aggregation functions
Tips:
- Verify: Total revenue should be around $1.1M
- Check: AOV should be ~$560
- Validate: 300 unique customers, 2000 total orders
- Test calculations on small subset first
Problem: Months appear in alphabetical order (Apr, Aug, Dec...) instead of chronological
Solution: Convert month names to Categorical with proper ordering
Alternative: Use month numbers for sorting, then display month names
Best Practice: Sort by original date column, then aggregate by month
Problem: FileNotFoundError when loading CSV
Solution: Use relative paths from notebook location: '../data/ecommerce_sales.csv'
Check: Verify your folder structure matches the required layout
Avoid: Never use absolute paths like C:/Users/YourName/...