Project Overview
This capstone project focuses on HR analytics techniques used by People Analytics teams worldwide. You will work with realistic employee data containing demographics, performance, and attrition information to uncover workforce trends, identify attrition risk factors, and build predictive models. Your goal is to deliver actionable insights that help HR leaders retain top talent and optimize workforce strategies.
Attrition Analysis
Identify why employees leave and which risk factors drive departures
Workforce Demographics
Analyze employee distribution and diversity
Predictive Modeling
Build models to predict employee attrition
Compensation Analysis
Evaluate salary equity and benchmarks
Learning Objectives
Technical Skills
- Perform exploratory data analysis on HR data
- Build classification models for attrition prediction
- Calculate key HR metrics and KPIs
- Create statistical analyses and hypothesis tests
- Design interactive Power BI dashboards
Business Skills
- Interpret attrition patterns and root causes
- Develop employee retention strategies
- Analyze compensation fairness and equity
- Present insights to HR leadership
- Make data-driven workforce recommendations
Business Scenario
TechFlow Solutions Inc.
You have been hired as an HR Data Analyst at TechFlow Solutions, a mid-size technology company with 1,500 employees across multiple departments. The company's employee turnover has risen above the industry average, and the CHRO wants to understand the root causes and develop data-driven retention strategies.
"Our attrition rate has climbed to 18% this year, well above the tech industry average of 13%. We are losing talented employees and spending millions on recruitment and training. I need a comprehensive analysis that helps us understand who is leaving, why they are leaving, and which employees are at risk so we can take proactive action."
Business Questions to Answer
- What is our current attrition rate by department?
- Which demographics are most likely to leave?
- What are the top reasons for employee departures?
- What is our age and tenure distribution?
- How diverse is our workforce by department?
- What are the education levels across roles?
- Which employees are at high risk of leaving?
- What factors best predict attrition?
- How accurate can our predictions be?
- Is there pay equity across demographics?
- How does compensation relate to attrition?
- What is the relationship between satisfaction and tenure?
The Dataset
You will work with a comprehensive HR dataset containing employee demographics, job information, performance metrics, satisfaction scores, and attrition status for workforce analytics.
Kaggle Datasets (Recommended)
Download real HR analytics datasets from Kaggle for authentic analysis experience. These datasets require a free Kaggle account to download.
Primary Datasets (Use These)
Dataset Structure
| File Name | Records | Description | Key Columns |
|---|---|---|---|
| employees.csv | 1,470 | Employee master data with demographics and job info | EmployeeID, Age, Gender, Department, JobRole, Attrition |
| performance.csv | 1,470 | Performance ratings and satisfaction scores | EmployeeID, PerformanceRating, JobSatisfaction, WorkLifeBalance |
| compensation.csv | 1,470 | Salary and compensation details | EmployeeID, MonthlyIncome, HourlyRate, PercentSalaryHike |
Employee Data Schema
| Column | Type | Description |
|---|---|---|
| EmployeeNumber | Integer | Unique employee identifier |
| Age | Integer | Employee age in years |
| Gender | String | Male or Female |
| Department | String | Department name (Sales, R&D, HR) |
| JobRole | String | Specific job title |
| MonthlyIncome | Integer | Monthly salary in dollars |
| YearsAtCompany | Integer | Tenure in years |
| Attrition | String | Yes/No - whether employee left |
Sample Data Preview
Understanding the data structure is crucial. Here is what a typical employee record looks like:
| EmployeeNumber | Age | Department | JobRole | MonthlyIncome | Attrition |
|---|---|---|---|---|---|
| 1001 | 35 | Sales | Sales Executive | $5,993 | No |
| 1002 | 28 | R&D | Research Scientist | $4,120 | Yes |
| 1003 | 42 | HR | HR Manager | $8,500 | No |
Project Requirements
Your deliverables must include Python notebooks with analysis, an attrition prediction model, a Power BI dashboard, and comprehensive documentation with actionable recommendations.
Data Exploration & Cleaning
Start with comprehensive exploratory data analysis:
- Load and validate the HR dataset
- Check for missing values and data types
- Convert categorical variables appropriately
- Create derived columns (AgeBand, TenureBand, IncomeLevel)
- Generate summary statistics and distributions
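The cleaning steps above can be sketched with pandas as follows. This is a minimal illustration on a tiny made-up frame, not the project dataset; the band edges and labels for AgeBand, TenureBand, and IncomeLevel are assumptions chosen for the example, and the helper column `AttritionFlag` is a hypothetical name.

```python
import pandas as pd

# Tiny illustrative frame; values are made up and do not come from the dataset.
df = pd.DataFrame({
    "Age": [22, 35, 28, 42, 58],
    "YearsAtCompany": [1, 8, 3, 15, 30],
    "MonthlyIncome": [2500, 5993, 4120, 8500, 15000],
    "Attrition": ["Yes", "No", "Yes", "No", "No"],
})

# Basic validation: missing values per column and inferred data types.
missing_counts = df.isna().sum()
print(missing_counts)
print(df.dtypes)

# Convert the Yes/No target to a 0/1 flag (AttritionFlag is a hypothetical name).
df["AttritionFlag"] = (df["Attrition"] == "Yes").astype(int)

# Band edges are assumptions; with the default right=True each bin
# includes its upper edge, e.g. (17, 25] covers ages 18 through 25.
df["AgeBand"] = pd.cut(
    df["Age"], bins=[17, 25, 35, 45, 55, 100],
    labels=["18-25", "26-35", "36-45", "46-55", "56+"],
)
df["TenureBand"] = pd.cut(
    df["YearsAtCompany"], bins=[-1, 2, 5, 10, 100],
    labels=["0-2", "3-5", "6-10", "11+"],
)
df["IncomeLevel"] = pd.cut(
    df["MonthlyIncome"], bins=[0, 3000, 7000, float("inf")],
    labels=["Low", "Medium", "High"],
)

# Summary statistics for the numeric columns.
print(df.describe())
```

Fixed edges keep the bands comparable across notebooks and the dashboard; quantile-based bands (for example with `pd.qcut`) are an alternative if you want equal-sized groups.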
Attrition Analysis
Analyze Attrition Patterns:
- Overall Rate: Calculate company-wide attrition percentage
- By Department: Compare attrition across departments
- By Demographics: Age, gender, education analysis
Key Questions:
- Which departments have the highest turnover?
- What tenure length shows the most attrition?
- How does satisfaction relate to leaving?
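One way to compute the overall and per-department rates is sketched below. The frame is made-up example data, and the boolean helper column `Left` is a hypothetical name; only `Department` and `Attrition` come from the schema described in this brief.

```python
import pandas as pd

# Made-up example rows; replace with the real employee data.
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "R&D", "R&D", "R&D", "HR"],
    "Attrition": ["Yes", "No", "Yes", "No", "No", "No"],
})

# True where the employee left; the mean of a boolean column is the share of leavers.
df["Left"] = df["Attrition"].eq("Yes")

# Company-wide attrition rate as a percentage.
overall_rate = df["Left"].mean() * 100

# Attrition rate per department, highest first.
by_department = (
    df.groupby("Department")["Left"]
    .mean()
    .mul(100)
    .sort_values(ascending=False)
)

print(f"Overall attrition rate: {overall_rate:.1f}%")
print(by_department.round(1))
```

The same pattern works for any grouping column, so swapping `Department` for a tenure or age band answers the other key questions without new logic.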
Predictive Modeling
Build Attrition Prediction Model:
- Encode categorical features (LabelEncoder/OneHotEncoder)
- Split data into training and test sets (80/20)
- Handle class imbalance (SMOTE or class weights)
- Train classification models (Logistic Regression, Random Forest)
Model Evaluation:
- Calculate accuracy, precision, recall, F1-score
- Create confusion matrix visualization
- Generate feature importance rankings
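The modeling and evaluation steps above might look like the following scikit-learn sketch. It runs on synthetic stand-in data with a made-up labeling rule, so the scores it prints say nothing about the real dataset. It uses one-hot encoding, an 80/20 stratified split, and class weights (rather than SMOTE) for the imbalance; those are choices made for the example, not requirements.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (assumption): replace with the cleaned HR dataset.
rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    "Department": rng.choice(["Sales", "R&D", "HR"], size=n),
    "MonthlyIncome": rng.integers(2000, 15000, size=n),
    "YearsAtCompany": rng.integers(0, 30, size=n),
})
# Made-up rule so the label is learnable: low pay and short tenure raise the leave probability.
leave_prob = 0.05 + 0.35 * (df["MonthlyIncome"] < 5000) + 0.25 * (df["YearsAtCompany"] < 3)
df["Attrition"] = np.where(rng.random(n) < leave_prob, "Yes", "No")

X = df.drop(columns="Attrition")
y = (df["Attrition"] == "Yes").astype(int)

# 80/20 split, stratified so both sets keep the same share of leavers.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# One-hot encode the categorical column and scale the numeric ones.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Department"]),
    ("num", StandardScaler(), ["MonthlyIncome", "YearsAtCompany"]),
])

# class_weight="balanced" reweights the minority class instead of resampling it.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
model.fit(X_train, y_train)

# Evaluation: confusion matrix plus precision, recall, and F1 per class.
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)
print(classification_report(y_test, y_pred, digits=3))

# Predicted probability of leaving, useful for risk scoring.
leave_probability = model.predict_proba(X_test)[:, 1]
```

Keeping the preprocessing inside the `Pipeline` means the encoder and scaler are fitted on the training split only, which avoids leaking test-set information into the model.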
Compensation Analysis
Salary Equity Analysis:
- By Gender: Compare average salary across genders
- By Department: Salary distribution by department
- By Role: Compensation benchmarks by job role
Attrition vs Compensation:
- Compare salaries of leavers vs stayers
- Analyze the impact of salary hike percentages
- Identify underpaid high-risk segments
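The comparisons above can be sketched with a couple of pandas group-bys. The rows below are made up for illustration. The gap formula used here (difference in mean income as a percentage of the male mean) is one common convention and an assumption, since this brief only asks for the male versus female difference; it is also an unadjusted figure that does not control for role or tenure.

```python
import pandas as pd

# Made-up example rows; replace with the real compensation data.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "MonthlyIncome": [6000, 5400, 4000, 4200, 9000, 8400],
    "Attrition": ["No", "Yes", "Yes", "No", "No", "No"],
})

# Average monthly income per gender.
mean_by_gender = df.groupby("Gender")["MonthlyIncome"].mean()

# Unadjusted pay gap as a percentage of the male average (assumed convention).
gap_pct = (
    (mean_by_gender["Male"] - mean_by_gender["Female"])
    / mean_by_gender["Male"] * 100
)

# Median income of leavers ("Yes") versus stayers ("No").
median_by_attrition = df.groupby("Attrition")["MonthlyIncome"].median()

print(mean_by_gender.round(2))
print(f"Unadjusted gender pay gap: {gap_pct:.2f}%")
print(median_by_attrition)
```

Repeating the gender comparison within each job role is a simple first step toward separating pay differences from differences in role mix.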
Power BI Dashboard
Create an interactive HR analytics dashboard with the following pages:
- Workforce Overview: Total employees, demographics, KPI cards
- Attrition Analysis: Rates by department, demographics, trends
- Compensation: Salary distribution, equity analysis
- Risk Assessment: High-risk employee identification
- Satisfaction: Job satisfaction and work-life balance
Dashboard Requirements:
- Interactive slicers (department, gender, job role)
- Drill-through for employee details
- Conditional formatting for risk indicators
- Professional color scheme with HR theme
Insights and Recommendations
Key Deliverables:
- 5-7 data-driven insights about workforce patterns
- Top risk factors for employee attrition
- Retention strategy recommendations by segment
- Compensation adjustment suggestions
- Executive summary document (1-2 pages)
Recommended Workflow
Data Prep (2-3 hours)
- Download dataset
- EDA and cleaning
- Feature engineering
Analysis (4-5 hours)
- Attrition analysis
- Predictive modeling
- Compensation review
Dashboard (4-5 hours)
- Power BI visuals
- Documentation
- Submit project
HR KPI Specifications
Calculate and display the following Key Performance Indicators for HR analytics. Implement these in both Python (for analysis) and Power BI (for visualization).
- Attrition Rate: (Leavers / Total Employees) * 100
- Voluntary Turnover: Employees who chose to leave
- Attrition by Department: Rate per department
- Tenure at Exit: Avg years before leaving
- Monthly Attrition: Leavers per month trend
- Total Headcount: Count of active employees
- Average Age: Mean age of workforce
- Average Tenure: Mean years at company
- Gender Ratio: Male to Female distribution
- Department Distribution: Employees per dept
- Average Salary: Mean monthly income
- Salary by Department: Avg per department
- Salary by Role: Avg per job role
- Gender Pay Gap: Male vs Female salary diff
- Salary Hike %: Average increase percentage
- Job Satisfaction: Avg score (1-4 scale)
- Work-Life Balance: Avg score (1-4 scale)
- Environment Satisfaction: Avg score (1-4 scale)
- Relationship Satisfaction: Avg score (1-4 scale)
- Job Involvement: Engagement level score
Attrition Risk Categories
| Risk Level | Criteria | Description | Recommended Action |
|---|---|---|---|
| High Risk | Prediction > 70% | Strong indicators of leaving soon | Immediate retention intervention |
| Medium Risk | Prediction 40-70% | Some concerning patterns present | Proactive engagement and check-ins |
| Low Risk | Prediction < 40% | Stable with positive indicators | Continue regular engagement |
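The thresholds in the table can be applied to model output with a small helper like the one below. The function name `risk_level` is hypothetical, and treating exactly 40% and exactly 70% as Medium Risk is an interpretation of the table's "40-70%" range.

```python
def risk_level(probability: float) -> str:
    """Map a predicted attrition probability (0.0 to 1.0) to a risk category."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability > 0.70:
        return "High Risk"
    if probability >= 0.40:
        # Both boundaries (0.40 and 0.70) fall in the medium band.
        return "Medium Risk"
    return "Low Risk"


# Example: label a few predicted probabilities.
for p in (0.85, 0.55, 0.10):
    print(p, risk_level(p))
```

Exporting this label as a column alongside the probability makes it straightforward to drive conditional formatting in the dashboard.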
Required Visualizations
Create the following visualizations in your Power BI dashboard. All charts should be interactive, professionally formatted, and tell a clear HR analytics story.
Business Question: Which departments have the highest turnover?
- Departments on Y-axis, attrition rate on X-axis
- Sort descending by attrition rate
- Add company average reference line
Business Question: Which age groups are leaving most?
- Age bands on X-axis (18-25, 26-35, etc.)
- Attrition count or rate on Y-axis
- Color code by risk level
Business Question: What is our workforce gender mix?
- Male vs Female as segments
- Total headcount in center
- Show percentage and count
Business Question: How long do employees stay?
- Years at company on X-axis
- Employee count on Y-axis
- Add average tenure reference line
Business Question: How does pay vary by department?
- Departments on X-axis
- Salary distribution showing median, quartiles
- Highlight outliers for review
Business Question: How do satisfaction scores vary by department?
- Department on rows, satisfaction type on columns
- Color intensity = average score
- Identify low-satisfaction areas
Business Question: What factors drive attrition most?
- Features on Y-axis, importance on X-axis
- Top 10 predictive features
- From Random Forest or similar model
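A ranking like this could be produced from a Random Forest as sketched below, then exported for the chart. The data is synthetic, `NoiseFeature` is a hypothetical column added only to show an uninformative feature, and the label depends on income alone by construction, so the resulting ranking is illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in features (assumption): replace with the encoded HR features.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "MonthlyIncome": rng.integers(2000, 15000, size=n),
    "YearsAtCompany": rng.integers(0, 30, size=n),
    "JobSatisfaction": rng.integers(1, 5, size=n),
    "NoiseFeature": rng.random(n),  # hypothetical, carries no signal
})
# Made-up label that depends only on income, so income should rank first.
y = (X["MonthlyIncome"] < 5000).astype(int)

forest = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
forest.fit(X, y)

# Impurity-based importances sum to 1; keep the top 10 for the chart.
importance = (
    pd.Series(forest.feature_importances_, index=X.columns)
    .sort_values(ascending=False)
    .head(10)
)
print(importance.round(3))
```

Impurity-based importances can favor features with many distinct values, so it may be worth cross-checking the ranking with `sklearn.inspection.permutation_importance` on the test set.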
Business Question: What are key workforce metrics?
- Total Employees, Attrition Rate, Avg Tenure
- Avg Salary, Satisfaction Score, Gender Ratio
- Comparison to previous period
Dashboard Best Practices
Color Strategy
- Red for attrition and high risk
- Green for retention and positive
- Blue for neutral demographics
- Consistent palette across pages
Interactivity
- Cross-filtering between visuals
- Department and role slicers
- Gender and age filters
- Drill-through to employee details
Executive Focus
- Lead with attrition KPIs
- Highlight high-risk segments
- Show actionable insights
- Make the cost of turnover visible
Submission Requirements
Create a Google Drive folder with the exact name shown below and share the link with view access:
Required Folder Name
HR-Analytics-Project-[YourName]
Required Project Structure
HR-Analytics-Project-[YourName]/
├── data/
│ └── hr_data.csv # Original dataset (download from above)
├── notebooks/
│ ├── 01_data_exploration.ipynb # EDA and data cleaning
│ ├── 02_attrition_analysis.ipynb # Attrition patterns analysis
│ ├── 03_predictive_model.ipynb # Attrition prediction model
│ └── 04_compensation_analysis.ipynb # Salary and compensation analysis
├── powerbi/
│ └── hr_dashboard.pbix # Your Power BI dashboard file
├── screenshots/
│ ├── dashboard_overview.png # Overview page screenshot
│ ├── dashboard_attrition.png # Attrition analysis screenshot
│ └── dashboard_compensation.png # Compensation page screenshot
├── docs/
│ └── executive_summary.pdf # Executive summary document (1-2 pages)
└── README.md # REQUIRED - see contents below
Do Include
- All Python notebooks with outputs
- Power BI file (.pbix) with all pages
- Clear documentation in README
- Dashboard screenshots
- Executive summary PDF
- Original dataset in data folder
Do Not Include
- Virtual environment folders (.venv)
- Checkpoint files (.ipynb_checkpoints)
- Personal or sensitive information
- Broken file links or formulas
- Unfinished or draft versions
Grading Rubric
Your project will be graded on the following criteria. Total: 600 points.
| Criteria | Points | Description |
|---|---|---|
| Data Exploration & Cleaning | 75 | Thorough EDA, data quality checks, and proper data preparation |
| Attrition Analysis | 100 | Comprehensive analysis of attrition patterns and root causes |
| Predictive Modeling | 100 | Classification model with proper evaluation and feature importance |
| Compensation Analysis | 75 | Salary equity analysis and compensation benchmarking |
| Power BI Dashboard | 125 | Professional visualizations, interactive features, clean design |
| Documentation | 50 | Complete README, executive summary, clear explanations |
| Recommendations | 75 | Actionable, data-driven retention recommendations |
| Total | 600 | |
Grading Scale
- Excellent: 540-600 points (90-100%)
- Good: 480-539 points (80-89%)
- Satisfactory: 420-479 points (70-79%)
- Needs Work: <420 points (<70%)
Ready to Submit?
Make sure you have completed all requirements and reviewed the grading rubric above.