Capstone Project 5

Time Series Forecasting

Build a comprehensive energy consumption forecasting system using time series decomposition, ARIMA modeling, and Facebook Prophet. Master trend analysis, seasonality patterns, and forecast evaluation metrics.

12-18 hours
Advanced
600 Points
What You Will Build
  • Time series decomposition pipeline
  • ARIMA and Prophet forecasting models
  • Seasonal pattern analysis
  • Forecast accuracy evaluation
  • Interactive forecast visualizations
Contents
01

Project Overview

In this capstone project, you will build an end-to-end time series forecasting system for energy consumption prediction. You will apply trend and seasonality decomposition, implement ARIMA and Prophet models, and evaluate forecast accuracy using industry-standard metrics.

Skills Applied: This project tests your proficiency in time series analysis, stationarity testing, seasonal decomposition, ARIMA/SARIMA modeling, and Facebook Prophet.
Decomposition

Extract trend, seasonality, and residuals

ARIMA/SARIMA

Fit statistical time series models

Prophet

Build Facebook Prophet forecasts

Evaluation

Compare models with MAE, RMSE, MAPE

02

Business Scenario

GridSmart Energy Solutions

You have been hired as a Senior Data Scientist at GridSmart Energy Solutions, a leading smart grid technology company serving metropolitan areas across India. The company manages energy distribution for over 2 million households and 50,000 commercial establishments.

"Welcome to the GridSmart analytics team! Our biggest challenge is predicting energy consumption accurately to optimize grid operations and prevent blackouts. We need you to build a forecasting system that captures daily patterns, weekly cycles, and seasonal trends. Your models will directly impact our capacity planning and help us reduce energy waste by 15-20%."

Dr. Priya Sharma, Chief Analytics Officer

Business Questions to Answer

Trend Analysis
  • What is the overall trend in energy consumption?
  • Is demand increasing or stabilizing over time?
  • What is the long-term growth rate?
Seasonality
  • What are the daily consumption patterns?
  • How does temperature affect energy usage?
  • What are weekly and yearly cycles?
Peak Prediction
  • When do peak consumption periods occur?
  • Can we predict peak demand 24-48 hours ahead?
  • How accurate are short-term forecasts?
Model Comparison
  • Which model performs best for short-term forecasts?
  • Which model handles seasonality better?
  • What are the trade-offs between models?
Pro Tip: Think like an energy analyst! Your forecasts should help grid operators plan capacity and prevent blackouts.
03

The Dataset

The dataset contains hourly energy consumption readings from GridSmart's smart meter network, spanning from January 2022 to January 2025. It includes weather data and temporal features essential for time series analysis.

Dataset Schema
Column | Type | Description
date | Datetime | Date of the reading (YYYY-MM-DD)
consumption_kwh | Float | Energy consumption in kilowatt-hours
temperature_c | Float | Ambient temperature in Celsius
humidity_pct | Integer | Relative humidity percentage
is_weekend | Integer | Weekend indicator (0=Weekday, 1=Weekend)
is_holiday | Integer | Public holiday indicator (0=No, 1=Yes)
hour | Integer | Hour of the day (0-23)
day_of_week | Integer | Day of week (0=Monday, 6=Sunday)
month | Integer | Month of the year (1-12)
year | Integer | Year of the reading
Dataset Stats: 3+ years of hourly data, 26,000+ readings, weather features, and calendar indicators
04

Project Requirements

Complete the following steps to build your time series forecasting system. Each step builds upon the previous one, culminating in a comprehensive forecasting solution.

1
Project Setup and Data Loading

Create project structure with data/, notebooks/, and reports/ folders. Load energy_consumption.csv with proper datetime parsing. Set datetime as index and verify data types.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from prophet import Prophet

# Load the data
df = pd.read_csv('data/energy_consumption.csv', parse_dates=['date'])
df.set_index('date', inplace=True)
print(f"Dataset shape: {df.shape}")
df.head()
2
Exploratory Time Series Analysis
  • Plot the complete time series to visualize patterns
  • Analyze hourly, daily, and monthly consumption patterns
  • Identify correlation between consumption and temperature
  • Compare weekday vs weekend consumption profiles
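The weekday-vs-weekend comparison in the last bullet boils down to a grouped hourly profile. A minimal sketch, using a small synthetic frame in place of the real dataset (column names follow the schema above):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset (same column names as the schema)
idx = pd.date_range('2022-01-01', periods=24 * 14, freq='h')
df = pd.DataFrame({
    'consumption_kwh': np.random.default_rng(0).normal(100, 10, len(idx)),
    'hour': idx.hour,
    'is_weekend': (idx.dayofweek >= 5).astype(int),
}, index=idx)

# Mean hourly profile, split into weekday vs weekend columns
profile = (df.groupby(['is_weekend', 'hour'])['consumption_kwh']
             .mean().unstack(level=0))
profile.columns = ['weekday', 'weekend']
print(profile.head())  # 24 rows (hours), 2 columns
```

A single `profile.plot()` then overlays the two daily curves on one chart.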
3
Stationarity Testing
  • Perform Augmented Dickey-Fuller (ADF) test
  • Perform KPSS test for stationarity confirmation
  • Apply differencing if series is non-stationary
  • Document transformation steps needed
from statsmodels.tsa.stattools import adfuller, kpss

daily_data = df['consumption_kwh'].resample('D').sum()  # daily totals

# ADF test (null hypothesis: non-stationary); p < 0.05 -> stationary
adf_result = adfuller(daily_data.dropna())
print(f"ADF Statistic: {adf_result[0]:.4f}")
print(f"p-value: {adf_result[1]:.4f}")

# KPSS test (null hypothesis: stationary); p < 0.05 -> non-stationary
kpss_result = kpss(daily_data.dropna(), regression='c', nlags='auto')
print(f"KPSS Statistic: {kpss_result[0]:.4f}")
print(f"p-value: {kpss_result[1]:.4f}")
4
Time Series Decomposition
  • Perform additive decomposition on daily aggregated data
  • Perform multiplicative decomposition for comparison
  • Extract and analyze trend, seasonal, and residual components
  • Determine which decomposition type fits better
5
ARIMA/SARIMA Model Development
  • Plot ACF and PACF to determine p, d, q parameters
  • Fit ARIMA model with selected parameters
  • Implement SARIMA to capture seasonal patterns
  • Perform model diagnostics (residual analysis)
6
Prophet Model Development
  • Prepare data in Prophet format (ds, y columns)
  • Configure daily and weekly seasonality
  • Add holiday effects for Indian holidays
  • Tune changepoint parameters for trend flexibility
7
Model Evaluation and Comparison
  • Split data into train (80%) and test (20%) sets
  • Calculate MAE, RMSE, and MAPE for both models
  • Perform time series cross-validation
  • Compare model performance and select best model
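The split in the first bullet must be chronological, never random. A sketch on a synthetic daily series (the names `train_data`/`test_data` match the evaluation code later in this brief):

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for the aggregated consumption data
idx = pd.date_range('2022-01-01', periods=1000, freq='D')
daily_data = pd.Series(
    np.random.default_rng(1).normal(2400, 120, len(idx)), index=idx)

# Chronological 80/20 split -- never shuffle a time series
split = int(len(daily_data) * 0.8)
train_data, test_data = daily_data.iloc[:split], daily_data.iloc[split:]

print(f"Train: {train_data.index[0].date()} -> {train_data.index[-1].date()}")
print(f"Test:  {test_data.index[0].date()} -> {test_data.index[-1].date()}")
```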
8
Forecasting and Business Insights
  • Generate 7-day and 30-day forecasts with confidence intervals
  • Visualize forecasts against actual values
  • Identify peak consumption periods
  • Provide actionable business recommendations
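For the peak-identification bullet, one simple rule is to flag forecast hours above a high quantile. A sketch on synthetic model output (the 95th-percentile threshold is illustrative, not prescribed by the brief):

```python
import numpy as np
import pandas as pd

# Synthetic hourly forecast standing in for real model output
idx = pd.date_range('2025-02-01', periods=24 * 7, freq='h')
rng = np.random.default_rng(2)
forecast = pd.Series(
    100 + 30 * np.sin(2 * np.pi * idx.hour / 24) + rng.normal(0, 5, len(idx)),
    index=idx,
)

# Flag hours above the 95th percentile as peak periods
threshold = forecast.quantile(0.95)
peaks = forecast[forecast > threshold]
print(f"Peak threshold: {threshold:.1f} kWh, hours flagged: {len(peaks)}")
```

Grid operators can then be handed `peaks.index` directly as the list of hours needing extra capacity.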
05

Time Series Decomposition

Decomposition separates a time series into its fundamental components: trend, seasonality, and residuals. This helps understand underlying patterns and improves forecasting accuracy.

Additive Decomposition

Use when seasonal variation is constant over time.

Y(t) = Trend + Seasonal + Residual

from statsmodels.tsa.seasonal import seasonal_decompose

# Aggregate to daily for decomposition
daily_data = df['consumption_kwh'].resample('D').sum()

# Additive decomposition
result_add = seasonal_decompose(
    daily_data, 
    model='additive', 
    period=7  # Weekly seasonality
)
result_add.plot()
plt.tight_layout()
plt.show()
Multiplicative Decomposition

Use when seasonal variation changes proportionally with trend.

Y(t) = Trend × Seasonal × Residual

# Multiplicative decomposition
result_mult = seasonal_decompose(
    daily_data, 
    model='multiplicative', 
    period=7
)
result_mult.plot()
plt.tight_layout()
plt.show()

# Compare residual variance
add_var = result_add.resid.var()
mult_var = result_mult.resid.var()
print(f"Additive Var: {add_var:.2f}")
print(f"Multiplicative Var: {mult_var:.2f}")
Decomposition Tips: On daily data, use period=7 for weekly patterns and period=365 for yearly patterns. If the seasonal amplitude grows with the trend, prefer the multiplicative model.
06

Forecasting Models

Implement both ARIMA and Prophet models to compare their forecasting performance on energy consumption data. Each has strengths for different forecasting scenarios.

AR (p)

Autoregressive order - number of lag observations

I (d)

Integration order - degree of differencing

MA (q)

Moving average order - size of moving average window

ARIMA / SARIMA Model
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Plot ACF and PACF to determine p and q
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(daily_data.dropna(), lags=30, ax=axes[0])
plot_pacf(daily_data.dropna(), lags=30, ax=axes[1])
plt.tight_layout()
plt.show()

# Fit SARIMA model with seasonal component
model = SARIMAX(
    train_data,
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 7),  # Weekly seasonality
)
sarima_result = model.fit(disp=False)

# Model diagnostics
sarima_result.plot_diagnostics(figsize=(12, 8))
plt.show()

# Forecast 30 steps ahead; one get_forecast call gives both mean and intervals
forecast_obj = sarima_result.get_forecast(steps=30)
sarima_forecast = forecast_obj.predicted_mean
conf_int = forecast_obj.conf_int()
Facebook Prophet Model

Prophet is designed for business forecasting with automatic handling of seasonality, holidays, and trend changes. It is particularly robust to missing data and outliers.

from prophet import Prophet

# Prepare data for Prophet (requires 'ds' and 'y' columns)
prophet_df = daily_data.reset_index()
prophet_df.columns = ['ds', 'y']

# Create and configure Prophet model
model = Prophet(
    weekly_seasonality=True,
    yearly_seasonality=True,
    changepoint_prior_scale=0.05
)

# Add Indian holidays
model.add_country_holidays(country_name='IN')
model.fit(prophet_df)

# Create future dataframe and forecast
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)

# Plot forecast and components
model.plot(forecast)
model.plot_components(forecast)
plt.show()
Model Comparison
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

def calculate_mape(actual, predicted):
    """Mean Absolute Percentage Error (assumes actual values are never zero)"""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs((actual - predicted) / actual)) * 100

def evaluate_model(actual, predicted, model_name):
    """Evaluate forecasting model performance"""
    mae = mean_absolute_error(actual, predicted)
    rmse = np.sqrt(mean_squared_error(actual, predicted))
    mape = calculate_mape(actual, predicted)
    return {'Model': model_name, 'MAE': mae, 'RMSE': rmse, 'MAPE': mape}

# Compare models (both forecasts must be aligned to the test period index)
comparison_df = pd.DataFrame([
    evaluate_model(test_data, sarima_forecast, 'SARIMA'),
    evaluate_model(test_data, prophet_forecast, 'Prophet')
])
print(comparison_df)
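For the cross-validation bullet in step 7, scikit-learn's `TimeSeriesSplit` produces expanding-window folds (Prophet also ships its own `cross_validation` helper in `prophet.diagnostics`). A minimal sketch on an index stand-in:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Expanding-window folds: each fold trains on everything before its test window
y = np.arange(100)  # stand-in for a 100-day series
folds = list(TimeSeriesSplit(n_splits=5).split(y))

for i, (train_idx, test_idx) in enumerate(folds):
    print(f"Fold {i}: train 0..{train_idx[-1]}, "
          f"test {test_idx[0]}..{test_idx[-1]}")
```

Fit the model on each training window, score on the following test window, and average the metrics across folds for a more robust comparison than a single split.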
07

Required Visualizations

Create the following visualizations to demonstrate your time series analysis and forecasting capabilities. Each visualization should be properly labeled.

1. Line Chart
Complete Time Series

Full consumption data over the entire period

2. Multi-Panel
Decomposition Plot

Trend, seasonal, and residual components

3. Correlation
ACF and PACF

Autocorrelation function plots

4. Heatmap
Hourly Consumption Pattern

Hour vs day of week heatmap

5. Box Plot
Monthly Distribution

Consumption by month showing seasonality

6. Scatter Plot
Temperature vs Consumption

Relationship between temp and usage

7. Diagnostic
SARIMA Residuals

4-panel diagnostic plot for validation

8. Forecast
SARIMA Forecast

Forecast with confidence intervals

9. Prophet Forecast
Prophet Forecast Plot

Prophet forecast with uncertainty

10. Components
Prophet Components

Trend, weekly, and yearly seasonality
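Visualization 4 (the hour-vs-day-of-week heatmap) reduces to a pivot table plus `sns.heatmap`. A sketch of the data prep on synthetic readings (column names per the schema; the seaborn call is left as a comment):

```python
import numpy as np
import pandas as pd

# Synthetic hourly readings standing in for the real dataset
idx = pd.date_range('2022-01-01', periods=24 * 28, freq='h')
df = pd.DataFrame({
    'consumption_kwh': np.random.default_rng(3).normal(100, 10, len(idx)),
    'hour': idx.hour,
    'day_of_week': idx.dayofweek,
})

# Rows = hour of day, columns = day of week, values = mean consumption
pivot = df.pivot_table(index='hour', columns='day_of_week',
                       values='consumption_kwh', aggfunc='mean')
print(pivot.shape)  # (24, 7)

# Then: import seaborn as sns; sns.heatmap(pivot, cmap='YlOrRd')
# with axis labels 'Day of week (0=Mon)' / 'Hour of day'
```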

08

Submission Requirements

Create a public GitHub repository with the exact name shown below:

Required Repository Name
time-series-forecasting
github.com/<your-username>/time-series-forecasting
Required Project Structure
time-series-forecasting/
├── data/
│   └── energy_consumption.csv  # The dataset
├── notebooks/
│   └── time_series_analysis.ipynb  # Your main analysis notebook
├── reports/
│   └── (visualizations)  # Saved plots
├── requirements.txt  # Python dependencies
└── README.md  # Project documentation
README.md Must Include:
  • Your full name and submission date
  • Project overview and business context
  • Key findings (5-7 bullet points)
  • Technologies used (Python, statsmodels, Prophet, etc.)
  • Instructions to run the notebook
  • Screenshots of at least 3 visualizations
requirements.txt
pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
statsmodels>=0.14.0
prophet>=1.1.0
scikit-learn>=1.3.0
jupyter>=1.0.0
Do Include
  • Clear markdown sections with headers
  • All code cells executed with outputs
  • All 10 required visualizations
  • Model comparison table with metrics
  • Business insights and recommendations
  • README with screenshots
Do Not Include
  • Virtual environment folders (venv, .env)
  • Any .pyc or __pycache__ files
  • Unexecuted notebooks
  • Hardcoded absolute file paths
  • API keys or credentials
Important: Before submitting, run Kernel > Restart and Run All to ensure your notebook executes from top to bottom without errors!
Submit Your Project

Enter your GitHub username - we will verify your repository automatically

09

Grading Rubric

Your project will be graded on the following criteria. Total: 600 points.

Criteria | Points | Description
Data Loading & EDA | 110 | Proper datetime parsing, missing values, pattern identification
Stationarity & Decomposition | 120 | ADF/KPSS tests, additive and multiplicative decomposition
ARIMA/SARIMA Model | 80 | Parameter selection, fitting, and diagnostics
Prophet Model | 80 | Configuration, seasonality, and holiday effects
Model Evaluation | 60 | MAE, RMSE, MAPE calculation and comparison
Visualizations | 70 | All 10 required charts with proper labels
Code & Documentation | 40 | Clean code, README, requirements.txt
Business Insights | 40 | Actionable recommendations based on analysis
Total | 600 |

Ready to Submit?

Make sure you have completed all requirements and reviewed the grading rubric above.

10

Pre-Submission Checklist

Notebook Requirements
Repository Requirements