Capstone Project 4

HR Analytics

Analyze employee data to uncover workforce trends, predict attrition risk, and build an interactive HR dashboard that helps organizations retain top talent and optimize HR strategies.

10-14 hours
Intermediate
600 Points
What You Will Build
  • Employee attrition analysis
  • Workforce demographics dashboard
  • Attrition prediction model
  • Compensation analysis report
  • Interactive Power BI dashboard
01

Project Overview

This capstone project focuses on HR analytics techniques used by People Analytics teams worldwide. You will work with realistic employee data containing demographics, performance, and attrition information to uncover workforce trends, identify attrition risk factors, and build predictive models. Your goal is to deliver actionable insights that help HR leaders retain top talent and optimize workforce strategies.

Skills Applied: This project tests your proficiency in Python (pandas, scikit-learn), Power BI (DAX, visualizations), statistical analysis, predictive modeling, and HR domain knowledge.

  • Attrition Analysis: Identify why employees leave and the key risk factors
  • Workforce Demographics: Analyze employee distribution and diversity
  • Predictive Modeling: Build models to predict employee attrition
  • Compensation Analysis: Evaluate salary equity and benchmarks

Learning Objectives

Technical Skills
  • Perform exploratory data analysis on HR data
  • Build classification models for attrition prediction
  • Calculate key HR metrics and KPIs
  • Create statistical analyses and hypothesis tests
  • Design interactive Power BI dashboards
Business Skills
  • Interpret attrition patterns and root causes
  • Develop employee retention strategies
  • Analyze compensation fairness and equity
  • Present insights to HR leadership
  • Make data-driven workforce recommendations
02

Business Scenario

TechFlow Solutions Inc.

You have been hired as an HR Data Analyst at TechFlow Solutions, a mid-size technology company with 1,500 employees across multiple departments. The company's employee turnover has been running above the industry average, and the CHRO wants to understand the root causes and develop data-driven retention strategies.

"Our attrition rate has climbed to 18% this year, well above the tech industry average of 13%. We are losing talented employees and spending millions on recruitment and training. I need a comprehensive analysis that helps us understand who is leaving, why they are leaving, and which employees are at risk so we can take proactive action."

Michael Rodriguez, Chief Human Resources Officer

Business Questions to Answer

Attrition Analysis
  • What is our current attrition rate by department?
  • Which demographics are most likely to leave?
  • What are the top reasons for employee departures?
Workforce Demographics
  • What is our age and tenure distribution?
  • How diverse is our workforce by department?
  • What are the education levels across roles?
Predictive Insights
  • Which employees are at high risk of leaving?
  • What factors best predict attrition?
  • How accurate can our predictions be?
Compensation & Satisfaction
  • Is there pay equity across demographics?
  • How does compensation relate to attrition?
  • What is the relationship between satisfaction and tenure?
Pro Tip: Think like a CHRO! Your analysis should provide clear, actionable insights with specific retention strategies that HR teams can immediately implement to reduce turnover.
03

The Dataset

You will work with a comprehensive HR dataset containing employee demographics, job information, performance metrics, satisfaction scores, and attrition status for workforce analytics.

Kaggle Datasets (Recommended)

Download real HR analytics datasets from Kaggle for authentic analysis experience. These datasets require a free Kaggle account to download.

Primary Datasets (Use These)
| Dataset | Rows | Description |
|---|---|---|
| IBM HR Analytics | 1,470 | Employee attrition dataset with 35 features, ideal for prediction |
| HR Employee Data | 311 | Rich dataset with termination reasons and manager info |
| HR Analytics Dataset | 1,480 | Comprehensive HR data with satisfaction and performance |
| Employee Attrition | 1,470 | Attrition dataset with work-life balance and job involvement |
Dataset Structure
| File Name | Records | Description | Key Columns |
|---|---|---|---|
| employees.csv | 1,470 | Employee master data with demographics and job info | EmployeeNumber, Age, Gender, Department, JobRole, Attrition |
| performance.csv | 1,470 | Performance ratings and satisfaction scores | EmployeeNumber, PerformanceRating, JobSatisfaction, WorkLifeBalance |
| compensation.csv | 1,470 | Salary and compensation details | EmployeeNumber, MonthlyIncome, HourlyRate, PercentSalaryHike |
Employee Data Schema
| Column | Type | Description |
|---|---|---|
| EmployeeNumber | Integer | Unique employee identifier |
| Age | Integer | Employee age in years |
| Gender | String | Male or Female |
| Department | String | Department name (Sales, R&D, HR) |
| JobRole | String | Specific job title |
| MonthlyIncome | Integer | Monthly salary in dollars |
| YearsAtCompany | Integer | Tenure in years |
| Attrition | String | Yes/No - whether employee left |
Sample Data Preview

Understanding the data structure is crucial. Here is what a typical employee record looks like:

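| EmployeeNumber | Age | Department | JobRole | MonthlyIncome | Attrition |
|---|---|---|---|---|---|
| 1001 | 35 | Sales | Sales Executive | $5,993 | No |
| 1002 | 28 | R&D | Research Scientist | $4,120 | Yes |
| 1003 | 42 | HR | HR Manager | $8,500 | No |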
Data Scope: 1,470 employees, 35 features including demographics, performance, and satisfaction scores
Data Quality: Clean dataset with no missing values, ready for analysis and modeling
Data Preparation Note: Convert categorical variables (Attrition, Department, Gender) to appropriate formats for analysis. Create derived features like TenureBand and AgeBand for segmentation.
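
The preparation note above can be sketched roughly as follows with pandas. The sample rows are made up, the `AttritionFlag` column name is hypothetical, and the band edges are assumptions chosen for illustration rather than values prescribed by the project.

```python
import pandas as pd

# Tiny illustrative sample; values are invented, column names follow the schema above.
df = pd.DataFrame({
    "EmployeeNumber": [1001, 1002, 1003, 1004],
    "Age": [35, 28, 42, 23],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Department": ["Sales", "R&D", "HR", "R&D"],
    "YearsAtCompany": [6, 1, 12, 3],
    "Attrition": ["No", "Yes", "No", "Yes"],
})

# Binary flag for rate calculations and modeling (Yes -> 1, No -> 0).
# "AttritionFlag" is a hypothetical name, not part of the dataset.
df["AttritionFlag"] = df["Attrition"].map({"Yes": 1, "No": 0})

# Store nominal columns as pandas categoricals.
for col in ["Department", "Gender"]:
    df[col] = df[col].astype("category")

# Band edges are assumptions; pd.cut uses right-closed intervals by default,
# so (17, 25] covers ages 18 through 25.
age_bins = [17, 25, 35, 45, 55, 120]
age_labels = ["18-25", "26-35", "36-45", "46-55", "56+"]
df["AgeBand"] = pd.cut(df["Age"], bins=age_bins, labels=age_labels)

tenure_bins = [-1, 2, 5, 10, 100]
tenure_labels = ["0-2", "3-5", "6-10", "11+"]
df["TenureBand"] = pd.cut(df["YearsAtCompany"], bins=tenure_bins, labels=tenure_labels)

print(df[["Age", "AgeBand", "YearsAtCompany", "TenureBand", "AttritionFlag"]])
```

Keeping the original `Attrition` column alongside the numeric flag makes it easy to use the Yes/No labels in charts while using the flag for averages.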
04

Project Requirements

Your deliverables must include Python notebooks with analysis, an attrition prediction model, a Power BI dashboard, and comprehensive documentation with actionable recommendations.

1
Data Exploration & Cleaning

Start with comprehensive exploratory data analysis:

  • Load and validate the HR dataset
  • Check for missing values and data types
  • Convert categorical variables appropriately
  • Create derived columns (AgeBand, TenureBand, IncomeLevel)
  • Generate summary statistics and distributions
Notebook: 01_data_exploration.ipynb
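
One possible shape for the load-and-validate step is shown below. It is a minimal sketch: the inline CSV stands in for the real file so the example runs anywhere, and the `validate` helper and its report keys are hypothetical names introduced here for illustration.

```python
import io
import pandas as pd

# Inline CSV stands in for data/hr_data.csv so the sketch is self-contained;
# in the notebook you would call pd.read_csv("data/hr_data.csv") instead.
raw = io.StringIO(
    "EmployeeNumber,Age,Department,MonthlyIncome,Attrition\n"
    "1001,35,Sales,5993,No\n"
    "1002,28,R&D,4120,Yes\n"
    "1003,42,HR,8500,No\n"
)
df = pd.read_csv(raw)


def validate(frame: pd.DataFrame) -> dict:
    """Return basic data-quality counts for a quick sanity check."""
    return {
        "rows": len(frame),
        "missing_values": int(frame.isna().sum().sum()),
        "duplicate_ids": int(frame["EmployeeNumber"].duplicated().sum()),
        "bad_attrition_labels": int((~frame["Attrition"].isin(["Yes", "No"])).sum()),
    }


report = validate(df)
print(report)

# Data types and summary statistics for both numeric and text columns.
print(df.dtypes)
print(df.describe(include="all"))
```

Even when a dataset is described as clean, running these checks once documents that the claim was verified for the file you actually downloaded.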
2
Attrition Analysis

Analyze Attrition Patterns:

  • Overall Rate: Calculate company-wide attrition percentage
  • By Department: Compare attrition across departments
  • By Demographics: Age, gender, education analysis

Key Questions:

  • Which departments have the highest turnover?
  • Which tenure length shows the most attrition?
  • How does satisfaction relate to leaving?
Use cross-tabulations and chi-square tests for categorical analysis
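
A cross-tabulation followed by a chi-square test of independence might look like the sketch below. The counts are invented for illustration, and it assumes SciPy is installed; `chi2_contingency` applies Yates' continuity correction by default for 2x2 tables.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up counts for illustration only: 40 R&D and 40 Sales employees.
df = pd.DataFrame({
    "Department": ["R&D"] * 40 + ["Sales"] * 40,
    "Attrition": ["Yes"] * 5 + ["No"] * 35 + ["Yes"] * 15 + ["No"] * 25,
})

# Observed counts: departments as rows, attrition status as columns.
table = pd.crosstab(df["Department"], df["Attrition"])

# Row-normalized view gives the share of leavers and stayers per department.
rates = pd.crosstab(df["Department"], df["Attrition"], normalize="index")

# Chi-square test of independence between department and attrition.
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(rates.round(3))
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4f}")
```

A small p-value suggests attrition is not independent of department in the sample, but it does not say which department drives the difference; the row-normalized rates are what you would report alongside it.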
3
Predictive Modeling

Build Attrition Prediction Model:

  • Encode categorical features (OneHotEncoder or OrdinalEncoder; scikit-learn's LabelEncoder is intended for the target column)
  • Split data into training and test sets (80/20)
  • Handle class imbalance (SMOTE or class weights)
  • Train classification models (Logistic Regression, Random Forest)

Model Evaluation:

  • Calculate accuracy, precision, recall, F1-score
  • Create confusion matrix visualization
  • Generate feature importance rankings
4
Compensation Analysis

Salary Equity Analysis:

  • By Gender: Compare average salary across genders
  • By Department: Salary distribution by department
  • By Role: Compensation benchmarks by job role

Attrition vs Compensation:

  • Compare salaries of leavers vs stayers
  • Analyze the impact of salary hike percentages
  • Identify underpaid high-risk segments
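
As a rough illustration of the salary comparisons listed above, the sketch below computes a raw gender gap, a within-role gap, and the median income of leavers versus stayers. The six rows are invented, and expressing the gap as a percentage of the male average is one common convention assumed here, not a definition given by the project.

```python
import pandas as pd

# Invented sample rows; column names follow the dataset schema.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "JobRole": ["Sales Executive"] * 3 + ["Research Scientist"] * 3,
    "MonthlyIncome": [6000, 5400, 5800, 4000, 4200, 4100],
    "Attrition": ["No", "Yes", "No", "Yes", "No", "No"],
})

# Raw gap: difference in average income as a percentage of the male average.
avg_by_gender = df.groupby("Gender")["MonthlyIncome"].mean()
raw_gap_pct = (
    (avg_by_gender["Male"] - avg_by_gender["Female"]) / avg_by_gender["Male"] * 100
)

# Within-role gap: comparing inside each job role reduces the effect of role mix.
by_role = df.pivot_table(
    index="JobRole", columns="Gender", values="MonthlyIncome", aggfunc="mean"
)
by_role["GapPct"] = (by_role["Male"] - by_role["Female"]) / by_role["Male"] * 100

# Leavers vs stayers: the median is less sensitive to a few very high salaries.
income_by_attrition = df.groupby("Attrition")["MonthlyIncome"].median()

print(f"Raw gap: {raw_gap_pct:.2f}%")
print(by_role.round(2))
print(income_by_attrition)
```

In this toy sample the raw gap is larger than either within-role gap, which is why an equity analysis typically reports both views.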
5
Power BI Dashboard

Create an interactive HR analytics dashboard with the following pages:

  • Workforce Overview: Total employees, demographics, KPI cards
  • Attrition Analysis: Rates by department, demographics, trends
  • Compensation: Salary distribution, equity analysis
  • Risk Assessment: High-risk employee identification
  • Satisfaction: Job satisfaction and work-life balance

Dashboard Requirements:

  • Interactive slicers (department, gender, job role)
  • Drill-through for employee details
  • Conditional formatting for risk indicators
  • Professional color scheme with HR theme
6
Insights and Recommendations

Key Deliverables:

  • 5-7 data-driven insights about workforce patterns
  • Top risk factors for employee attrition
  • Retention strategy recommendations by segment
  • Compensation adjustment suggestions
  • Executive summary document (1-2 pages)

Recommended Workflow

Total Estimated Time: 10-14 hours
Recommended: Spread across 3-4 days
1
Data Prep
2-3 hours
  • Download dataset
  • EDA and cleaning
  • Feature engineering
2
Analysis
4-5 hours
  • Attrition analysis
  • Predictive modeling
  • Compensation review
3
Dashboard
4-5 hours
  • Power BI visuals
  • Documentation
  • Submit project
05

HR KPI Specifications

Calculate and display the following Key Performance Indicators for HR analytics. Implement these in both Python (for analysis) and Power BI (for visualization).

Attrition Metrics
  • Attrition Rate: (Leavers / Total Employees) * 100
  • Voluntary Turnover: Employees who chose to leave
  • Attrition by Department: Rate per department
  • Tenure at Exit: Avg years before leaving
  • Monthly Attrition: Leavers per month trend
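
A minimal pandas sketch of the first few attrition metrics follows, using invented rows. It applies the formula above, (Leavers / Total Employees) * 100, overall and per department, and computes tenure at exit as the mean tenure of leavers.

```python
import pandas as pd

# Invented sample rows; column names follow the dataset schema.
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "R&D", "R&D", "R&D", "HR"],
    "YearsAtCompany": [2, 6, 1, 8, 3, 10],
    "Attrition": ["Yes", "No", "Yes", "No", "No", "No"],
})

# Boolean mask of leavers; its mean is the share of leavers.
left = df["Attrition"].eq("Yes")

# Attrition Rate = (Leavers / Total Employees) * 100
attrition_rate = left.mean() * 100

# Attrition by Department, highest first.
by_department = (
    left.groupby(df["Department"]).mean().mul(100).sort_values(ascending=False)
)

# Tenure at Exit: average years at the company among leavers only.
tenure_at_exit = df.loc[left, "YearsAtCompany"].mean()

print(f"Attrition rate: {attrition_rate:.1f}%")
print(by_department.round(1))
print(f"Average tenure at exit: {tenure_at_exit:.1f} years")
```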
Workforce Metrics
  • Total Headcount: Count of active employees
  • Average Age: Mean age of workforce
  • Average Tenure: Mean years at company
  • Gender Ratio: Male to Female distribution
  • Department Distribution: Employees per dept
Compensation Metrics
  • Average Salary: Mean monthly income
  • Salary by Department: Avg per department
  • Salary by Role: Avg per job role
  • Gender Pay Gap: Male vs Female salary diff
  • Salary Hike %: Average increase percentage
Satisfaction Metrics
  • Job Satisfaction: Avg score (1-4 scale)
  • Work-Life Balance: Avg score (1-4 scale)
  • Environment Satisfaction: Avg score (1-4 scale)
  • Relationship Satisfaction: Avg score (1-4 scale)
  • Job Involvement: Engagement level score
Attrition Risk Categories
| Risk Level | Criteria | Description | Recommended Action |
|---|---|---|---|
| High Risk | Prediction > 70% | Strong indicators of leaving soon | Immediate retention intervention |
| Medium Risk | Prediction 40-70% | Some concerning patterns present | Proactive engagement and check-ins |
| Low Risk | Prediction < 40% | Stable with positive indicators | Continue regular engagement |
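
The risk categories can be applied to model output with a small helper such as the one below. The function name `risk_level` is hypothetical; the thresholds follow the criteria above, so a predicted probability of exactly 0.70 or 0.40 falls in the Medium band.

```python
def risk_level(probability: float) -> str:
    """Map a predicted attrition probability (0.0 to 1.0) to a risk category."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability > 0.70:
        return "High Risk"
    if probability >= 0.40:
        return "Medium Risk"
    return "Low Risk"


# Example: probabilities as they might come from a classifier's predict_proba.
for p in [0.85, 0.70, 0.40, 0.12]:
    print(f"{p:.2f} -> {risk_level(p)}")
```

With a fitted scikit-learn classifier, the input would typically be the positive-class column of `predict_proba`, and the resulting label can be exported to Power BI for conditional formatting.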
Python: Use pandas for calculations, matplotlib/seaborn for visualizations, and scikit-learn for predictive modeling.
Power BI: Create DAX measures for dynamic calculations. Use CALCULATE with filters for department-specific metrics.
06

Required Visualizations

Create the following visualizations in your Power BI dashboard. All charts should be interactive, professionally formatted, and tell a clear HR analytics story.

1 Attrition by Department (Bar Chart)

Business Question: Which departments have the highest turnover?

  • Departments on Y-axis, attrition rate on X-axis
  • Sort descending by attrition rate
  • Add company average reference line
2 Attrition by Age Group (Column Chart)

Business Question: Which age groups are leaving most?

  • Age bands on X-axis (18-25, 26-35, etc.)
  • Attrition count or rate on Y-axis
  • Color code by risk level
3 Gender Distribution (Donut Chart)

Business Question: What is our workforce gender mix?

  • Male vs Female as segments
  • Total headcount in center
  • Show percentage and count
4 Tenure Distribution (Histogram)

Business Question: How long do employees stay?

  • Years at company on X-axis
  • Employee count on Y-axis
  • Add average tenure reference line
5 Salary by Department (Box Plot)

Business Question: How does pay vary by department?

  • Departments on X-axis
  • Salary distribution showing median, quartiles
  • Highlight outliers for review
6 Satisfaction Heatmap (Matrix)

Business Question: How do satisfaction scores relate?

  • Department on rows, satisfaction type on columns
  • Color intensity = average score
  • Identify low-satisfaction areas
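
The data behind this matrix can be prepared in Python before it is loaded into Power BI. The sketch below uses invented scores; `EnvironmentSatisfaction` is assumed to be the column name for the Environment Satisfaction metric, so check it against your dataset.

```python
import pandas as pd

# Invented scores on the 1-4 scale; EnvironmentSatisfaction is an assumed column name.
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "R&D", "R&D", "HR"],
    "JobSatisfaction": [2, 4, 3, 3, 1],
    "WorkLifeBalance": [3, 3, 2, 4, 2],
    "EnvironmentSatisfaction": [4, 2, 3, 1, 3],
})

score_cols = ["JobSatisfaction", "WorkLifeBalance", "EnvironmentSatisfaction"]

# Departments on rows, satisfaction types on columns, average score in each cell.
matrix = df.groupby("Department")[score_cols].mean()

# Locate the lowest-scoring cell: first the department, then the metric within it.
lowest_dept = matrix.min(axis=1).idxmin()
lowest_metric = matrix.loc[lowest_dept].idxmin()

print(matrix)
print(f"Lowest average: {lowest_metric} in {lowest_dept}")
```

The same table maps directly onto a Power BI matrix visual with conditional background color on the values.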
7 Feature Importance (Bar Chart)

Business Question: What factors drive attrition most?

  • Features on Y-axis, importance on X-axis
  • Top 10 predictive features
  • From Random Forest or similar model
8 KPI Cards (Card)

Business Question: What are key workforce metrics?

  • Total Employees, Attrition Rate, Avg Tenure
  • Avg Salary, Satisfaction Score, Gender Ratio
  • Comparison to previous period
Dashboard Best Practices
Color Strategy
  • Red for attrition and high risk
  • Green for retention and positive
  • Blue for neutral demographics
  • Consistent palette across pages
Interactivity
  • Cross-filtering between visuals
  • Department and role slicers
  • Gender and age filters
  • Drill-through to employee details
Executive Focus
  • Lead with attrition KPIs
  • Highlight high-risk segments
  • Show actionable insights
  • Cost of turnover visibility
07

Submission Requirements

Create a Google Drive folder with the exact name shown below and share the link with view access:

Required Folder Name
HR-Analytics-Project-[YourName]
Example: HR-Analytics-Project-JohnSmith
Required Project Structure
HR-Analytics-Project-[YourName]/
├── data/
│   └── hr_data.csv                   # Original dataset (download from above)
├── notebooks/
│   ├── 01_data_exploration.ipynb     # EDA and data cleaning
│   ├── 02_attrition_analysis.ipynb   # Attrition patterns analysis
│   ├── 03_predictive_model.ipynb     # Attrition prediction model
│   └── 04_compensation_analysis.ipynb # Salary and compensation analysis
├── powerbi/
│   └── hr_dashboard.pbix             # Your Power BI dashboard file
├── screenshots/
│   ├── dashboard_overview.png        # Overview page screenshot
│   ├── dashboard_attrition.png       # Attrition analysis screenshot
│   └── dashboard_compensation.png    # Compensation page screenshot
├── docs/
│   └── executive_summary.pdf         # Executive summary document (1-2 pages)
└── README.md                         # REQUIRED - see contents below
Do Include
  • All Python notebooks with outputs
  • Power BI file (.pbix) with all pages
  • Clear documentation in README
  • Dashboard screenshots
  • Executive summary PDF
  • Original dataset in data folder
Do Not Include
  • Virtual environment folders (.venv)
  • Checkpoint files (.ipynb_checkpoints)
  • Personal or sensitive information
  • Broken file links or formulas
  • Unfinished or draft versions
Submit Your Project

Enter your Google Drive folder link - we will verify your files automatically

08

Grading Rubric

Your project will be graded on the following criteria. Total: 600 points.

| Criteria | Points | Description |
|---|---|---|
| Data Exploration & Cleaning | 75 | Thorough EDA, data quality checks, and proper data preparation |
| Attrition Analysis | 100 | Comprehensive analysis of attrition patterns and root causes |
| Predictive Modeling | 100 | Classification model with proper evaluation and feature importance |
| Compensation Analysis | 75 | Salary equity analysis and compensation benchmarking |
| Power BI Dashboard | 125 | Professional visualizations, interactive features, clean design |
| Documentation | 50 | Complete README, executive summary, clear explanations |
| Recommendations | 75 | Actionable, data-driven retention recommendations |
| Total | 600 | |
Grading Scale
| Grade | Points | Percentage |
|---|---|---|
| Excellent | 540-600 | 90-100% |
| Good | 480-539 | 80-89% |
| Satisfactory | 420-479 | 70-79% |
| Needs Work | <420 | <70% |

09

Pre-Submission Checklist

Python Notebook Requirements
Power BI Dashboard Requirements
Documentation Requirements
Submission Requirements