Project Overview
This capstone project brings together everything you have learned in the Java DSA course. You will work with real task scheduling data containing 100 system tasks with priorities, dependencies, deadlines, and resource requirements. Your goal is to build a professional task scheduling system that applies data structures and algorithms to solve classic scheduling problems and optimize task execution order.
- Priority Queue & Heap: min/max heap implementation from scratch
- Job Scheduling: SJF, Round Robin, and Priority algorithms
- Load Balancing: distribute tasks across resources efficiently
- Optimization: minimize wait time, maximize throughput
Learning Objectives
Technical Skills
- Build min-heap and max-heap from scratch
- Implement classic job scheduling algorithms (SJF, RR, Priority)
- Design load balancing strategies for distributed systems
- Optimize for throughput, latency, and resource utilization
- Apply heaps to solve real scheduling problems
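The from-scratch heap work above boils down to an array plus sift-up and sift-down operations. A minimal min-heap sketch over int keys (illustrative only; your course implementation may use a different API):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal array-backed min-heap of int keys.
class IntMinHeap {
    private final List<Integer> a = new ArrayList<>();

    public void insert(int x) {                // O(log n)
        a.add(x);
        int i = a.size() - 1;
        while (i > 0 && a.get((i - 1) / 2) > a.get(i)) { // sift up
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    public int extractMin() {                  // O(log n)
        int min = a.get(0);
        a.set(0, a.get(a.size() - 1));         // move last element to root
        a.remove(a.size() - 1);
        siftDown(0);
        return min;
    }

    private void siftDown(int i) {
        int n = a.size();
        while (true) {
            int l = 2 * i + 1, r = 2 * i + 2, s = i;
            if (l < n && a.get(l) < a.get(s)) s = l;
            if (r < n && a.get(r) < a.get(s)) s = r;
            if (s == i) return;                // heap property restored
            swap(i, s);
            i = s;
        }
    }

    private void swap(int i, int j) {
        int t = a.get(i); a.set(i, a.get(j)); a.set(j, t);
    }

    public int size() { return a.size(); }
}
```

A max-heap is the same structure with the comparisons flipped.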
Problem-Solving Skills
- Translate OS scheduling concepts to code
- Analyze time and space complexity trade-offs
- Compare scheduling algorithm performance metrics
- Handle priority inversion and starvation
- Test algorithms with real Google cluster data
Problem Scenario
CloudOps Systems
You have been hired as a Systems Engineer at CloudOps Systems, a cloud infrastructure company that manages thousands of automated tasks across distributed servers. The company runs critical operations including database backups, security scans, data processing pipelines, and deployment workflows that must be scheduled efficiently while respecting priorities, dependencies, and resource constraints.
"We process hundreds of tasks daily across multiple servers. We need a scheduling system that can handle task priorities, respect dependencies between tasks, meet strict deadlines, and allocate resources efficiently. Can you build a scheduler that solves these classic computer science problems?"
Problems to Solve
- Process tasks by priority (highest first)
- Use min-heap for earliest deadline first
- Implement max-heap for highest priority first
- Support dynamic priority updates
- Build dependency graph from task data
- Topological sort for valid execution order
- Detect circular dependencies (cycles)
- Find critical path for completion time
- Schedule tasks to meet deadlines
- Maximize number of tasks completed on time
- Minimize total lateness
- Handle impossible schedules gracefully
- Assign tasks to servers with limited resources
- Interval scheduling for non-overlapping tasks
- Multi-resource constraint satisfaction
- Load balancing across servers
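The interval-scheduling item above has a classic greedy solution: sort tasks by finish time, then take each task whose start is at or after the end of the last accepted one. A sketch using hypothetical (start, end) minute pairs rather than the full Task type:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class IntervalScheduling {
    // Select a maximum-size set of non-overlapping [start, end) intervals.
    public static List<int[]> maxNonOverlapping(int[][] intervals) {
        int[][] sorted = intervals.clone();
        Arrays.sort(sorted, Comparator.comparingInt(iv -> iv[1])); // by end time
        List<int[]> chosen = new ArrayList<>();
        int lastEnd = Integer.MIN_VALUE;
        for (int[] iv : sorted) {
            if (iv[0] >= lastEnd) {   // starts after the last chosen task ends
                chosen.add(iv);
                lastEnd = iv[1];
            }
        }
        return chosen;
    }
}
```

Sorting dominates, so the whole routine runs in O(n log n).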
The Dataset
You will work with a comprehensive task scheduling dataset containing 100 system tasks. Download the CSV file containing task data for analysis:
Dataset Download
Download the task scheduler dataset. We provide a curated sample, or you can use the full Google Cluster Dataset from Kaggle for extended analysis.
Original Data Source
This project uses the Google Cluster Data from Kaggle - real traces from Google's production cluster containing millions of job scheduling events. It includes task priorities, resource requirements, scheduling constraints, and execution metrics from actual Google data centers.
Dataset Schema
Each row represents a system task with the following columns:
| Column | Type | Description |
|---|---|---|
| task_id | String | Unique task identifier (e.g., T001, T002) |
| task_name | String | Descriptive name of the task |
| priority | Integer | Priority level 1-5 (5 = highest priority) |
| duration_minutes | Integer | Time required to complete the task |
| deadline | DateTime | Task must complete by this time |
| arrival_time | DateTime | When the task becomes available |
| dependencies | String | Comma-separated task IDs that must complete first |
| category | String | Task category (System, Security, Analytics, etc.) |
| status | String | Current status (Pending, Running, Completed) |
| resource_required | String | Server/resource needed (Server A, B, C, GPU) |
| cpu_cores | Integer | CPU cores required (1-16) |
| memory_mb | Integer | Memory required in MB (64-16384) |
Tasks are distributed across the following categories:
- System (15 tasks)
- Security (12 tasks)
- Analytics (10 tasks)
- Deployment (8 tasks)
- Monitoring (8 tasks)
- Database (7 tasks)
- Network (6 tasks)
- Testing (8 tasks)
- Performance (6 tasks)
- Communication (4 tasks)
- Compliance (5 tasks)
- Others (11 tasks)
For classic scheduling problems, focus on these key fields:
// Example: Task representation for scheduling
class Task {
    String taskId;              // Unique identifier
    int priority;               // 1-5 (use for heap ordering)
    int duration;               // Duration in minutes
    LocalDateTime deadline;     // Must complete by
    List<String> dependencies;  // Tasks that must finish first
    String resource;            // Required server/resource
}

// This becomes input for:
// - Priority Queue Scheduling (max-heap by priority)
// - Earliest Deadline First (min-heap by deadline)
// - Topological Sort (dependency graph)
// - Interval Scheduling (resource allocation)
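One way to map a CSV row onto this Task shape is shown below. The column order is taken from the schema above; everything else (the pre-split String[] input, the handling of "-" for no dependencies) is an assumption, and a real loader must also handle quoted fields, since the dependencies column can itself contain commas:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal Task record matching the fields used for scheduling.
class Task {
    String taskId;
    int priority;
    int duration;
    LocalDateTime deadline;
    List<String> dependencies;
    String resource;
}

class TaskLoader {
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    // Map one pre-split CSV row (schema column order) to a Task.
    public static Task fromRow(String[] c) {
        Task t = new Task();
        t.taskId = c[0];                             // task_id
        t.priority = Integer.parseInt(c[2]);         // priority
        t.duration = Integer.parseInt(c[3]);         // duration_minutes
        t.deadline = LocalDateTime.parse(c[4], FMT); // deadline
        t.dependencies = c[6].isEmpty() || c[6].equals("-")
            ? new ArrayList<>()
            : new ArrayList<>(Arrays.asList(c[6].split(",")));
        t.resource = c[9];                           // resource_required
        return t;
    }
}
```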
Sample Data Preview
Here is what the data looks like from task_scheduler.csv:
| Task ID | Task Name | Priority | Duration | Deadline | Dependencies | Resource |
|---|---|---|---|---|---|---|
| T001 | Database Backup | 5 | 45 min | 2024-01-15 18:00 | - | Server A |
| T003 | Data Processing | 4 | 120 min | 2024-01-15 14:00 | T001 | Server A |
| T015 | ML Model Training | 5 | 240 min | 2024-01-16 08:00 | T010 | GPU Server |
Project Requirements
Your project must include all of the following components. Structure your code with clear organization, documentation, and comprehensive testing.
Scheduling Algorithms (5 Required)
Implement the following classic scheduling algorithms:
- Priority Queue Scheduler: Process tasks by priority using max-heap
- Earliest Deadline First (EDF): Schedule by deadline using min-heap
- Topological Sort Scheduler: Respect task dependencies
- Deadline Scheduling: Maximize tasks completed on time (greedy)
- Task Scheduling with Cooldown: LeetCode 621 - CPU interval scheduling
Data Structures Implementation
Build the following custom data structures:
- TaskPriorityQueue: Custom heap with update and remove operations
- DependencyGraph: Directed graph for task dependencies
- TaskRegistry: Hash map with O(1) task lookup by ID
- ResourceManager: Track server availability and allocation
Testing and Validation
Comprehensive test coverage:
- Unit Tests: JUnit tests for each algorithm and data structure
- Edge Cases: Empty task list, single task, circular dependencies, impossible deadlines
- Performance Tests: Verify time complexity with large inputs
- Integration Tests: Test with actual CSV data
Documentation and Analysis
Technical documentation:
- Algorithm Explanations: Describe approach for each scheduling problem
- Complexity Analysis: Time and space complexity for each solution
- Trade-offs: Compare different scheduling strategies
- Results: Scheduling outcomes on provided dataset
Algorithm Specifications
Implement the following algorithms with optimal time and space complexity. Each algorithm should be thoroughly tested and documented.
Process tasks in order of priority using a max-heap.
// Example:
// tasks = [(T1, p=3), (T2, p=5), (T3, p=1)]
// Output: T2 -> T1 -> T3
public List<Task> scheduleByPriority(List<Task> tasks) {
    // Max-heap: higher priority values come out first
    PriorityQueue<Task> maxHeap = new PriorityQueue<>(
        (a, b) -> Integer.compare(b.priority, a.priority));
    maxHeap.addAll(tasks);
    List<Task> schedule = new ArrayList<>();
    while (!maxHeap.isEmpty()) {
        schedule.add(maxHeap.poll());
    }
    return schedule;
}
Order tasks respecting dependencies (Kahn's Algorithm).
// Example:
// T1 -> T2 -> T3 (T2 depends on T1)
// Output: [T1, T2, T3]
public List<Task> topologicalSort(List<Task> tasks) {
    Map<String, Task> byId = new HashMap<>();
    Map<String, Integer> inDegree = new HashMap<>();
    Map<String, List<String>> dependents = new HashMap<>();
    // Index tasks and build the reverse (dependency -> dependents) adjacency
    for (Task t : tasks) {
        byId.put(t.taskId, t);
        inDegree.put(t.taskId, t.dependencies.size());
        for (String dep : t.dependencies)
            dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(t.taskId);
    }
    Queue<Task> queue = new LinkedList<>();
    for (Task t : tasks)
        if (t.dependencies.isEmpty()) queue.offer(t);
    // BFS: repeatedly emit a task with no remaining dependencies
    List<Task> order = new ArrayList<>();
    while (!queue.isEmpty()) {
        Task t = queue.poll();
        order.add(t);
        for (String next : dependents.getOrDefault(t.taskId, List.of()))
            if (inDegree.merge(next, -1, Integer::sum) == 0)
                queue.offer(byId.get(next));
    }
    return order; // order.size() < tasks.size() means a cycle was detected
}
Schedule tasks by earliest deadline using min-heap.
// Schedule tasks to minimize missed deadlines
public List<Task> scheduleByDeadline(List<Task> tasks) {
    // Min-heap: earliest deadline comes out first
    PriorityQueue<Task> minHeap = new PriorityQueue<>(
        Comparator.comparing((Task t) -> t.deadline));
    minHeap.addAll(tasks);
    List<Task> schedule = new ArrayList<>();
    while (!minHeap.isEmpty()) {
        schedule.add(minHeap.poll());
    }
    return schedule;
}
Schedule tasks with cooldown constraint.
// Given tasks and cooldown n, find minimum time
// tasks = ["A","A","A","B","B","B"], n = 2
// Output: 8 (A -> B -> idle -> A -> B -> idle -> A -> B)
public int leastInterval(char[] tasks, int n) {
    int[] freq = new int[26];
    for (char c : tasks) freq[c - 'A']++;
    Arrays.sort(freq);
    int maxFreq = freq[25];
    // Idle slots created by spacing out the most frequent task
    int idleSlots = (maxFreq - 1) * n;
    for (int i = 24; i >= 0; i--) {
        idleSlots -= Math.min(freq[i], maxFreq - 1); // fill slots with other tasks
    }
    return tasks.length + Math.max(0, idleSlots);
}
Detect circular dependencies in task graph.
// Return true if a circular dependency exists
public boolean hasCycle(List<Task> tasks) {
    Map<String, Task> taskMap = new HashMap<>();
    for (Task t : tasks) taskMap.put(t.taskId, t);
    // 0/absent = unvisited, 1 = visiting, 2 = visited
    Map<String, Integer> state = new HashMap<>();
    for (Task t : tasks) {
        if (dfs(t, state, taskMap)) {
            return true; // Cycle found
        }
    }
    return false;
}

private boolean dfs(Task t, Map<String, Integer> state,
                    Map<String, Task> taskMap) {
    int s = state.getOrDefault(t.taskId, 0);
    if (s == 1) return true;  // Back edge: cycle
    if (s == 2) return false; // Already fully explored
    state.put(t.taskId, 1);
    for (String dep : t.dependencies) {
        Task next = taskMap.get(dep);
        if (next != null && dfs(next, state, taskMap)) return true;
    }
    state.put(t.taskId, 2);
    return false;
}
Maximize tasks completed before deadlines (greedy).
// Maximize number of tasks completed on time.
// Assumes deadlines have been converted to minutes from the
// schedule start (deadlineMin), so they compare against currentTime.
public int maxTasksOnTime(List<Task> tasks) {
    // Sort by deadline, earliest first
    tasks.sort(Comparator.comparingInt(t -> t.deadlineMin));
    // Max-heap of accepted tasks, longest duration on top
    PriorityQueue<Task> scheduled = new PriorityQueue<>(
        (a, b) -> Integer.compare(b.duration, a.duration));
    int currentTime = 0;
    for (Task t : tasks) {
        scheduled.offer(t);
        currentTime += t.duration;
        if (currentTime > t.deadlineMin) {
            // Over deadline: drop the longest accepted task
            currentTime -= scheduled.poll().duration;
        }
    }
    return scheduled.size();
}
Data Structure Specifications
Implement the following data structures to support efficient scheduling operations. Focus on optimal time complexity for the core operations.
Custom priority queue with update and remove by ID.
- insert(Task t): O(log n)
- extractMax(): O(log n)
- updatePriority(id, newPriority): O(log n)
- remove(id): O(log n)
- peek(): O(1)
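The O(log n) updatePriority and remove operations require knowing where each task sits in the heap. A common approach pairs the array heap with an id-to-index map kept in sync on every swap, so an entry can be located in O(1) and the heap repaired in O(log n). A sketch simplified to string ids and int priorities (names are illustrative, not the required API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Max-heap keyed by priority, with an id -> index map for fast lookup.
class IndexedMaxHeap {
    private final List<String> ids = new ArrayList<>();
    private final List<Integer> prio = new ArrayList<>();
    private final Map<String, Integer> pos = new HashMap<>();

    public void insert(String id, int p) {
        ids.add(id); prio.add(p); pos.put(id, ids.size() - 1);
        siftUp(ids.size() - 1);
    }

    public String extractMax() {
        String top = ids.get(0);
        removeAt(0);
        return top;
    }

    public void updatePriority(String id, int p) {
        int i = pos.get(id);
        int old = prio.get(i);
        prio.set(i, p);
        if (p > old) siftUp(i); else siftDown(i); // fix only one direction
    }

    public void remove(String id) { removeAt(pos.get(id)); }

    private void removeAt(int i) {
        int last = ids.size() - 1;
        swap(i, last);                  // move victim to the end
        pos.remove(ids.remove(last));
        prio.remove(last);
        if (i < last) { siftUp(i); siftDown(i); }
    }

    private void siftUp(int i) {
        while (i > 0 && prio.get((i - 1) / 2) < prio.get(i)) {
            swap(i, (i - 1) / 2); i = (i - 1) / 2;
        }
    }

    private void siftDown(int i) {
        int n = ids.size();
        while (true) {
            int l = 2 * i + 1, r = 2 * i + 2, big = i;
            if (l < n && prio.get(l) > prio.get(big)) big = l;
            if (r < n && prio.get(r) > prio.get(big)) big = r;
            if (big == i) return;
            swap(i, big); i = big;
        }
    }

    private void swap(int i, int j) {
        String si = ids.get(i); ids.set(i, ids.get(j)); ids.set(j, si);
        int pi = prio.get(i); prio.set(i, prio.get(j)); prio.set(j, pi);
        pos.put(ids.get(i), i); pos.put(ids.get(j), j); // keep index map in sync
    }
}
```

Java's built-in PriorityQueue has no indexed remove (its remove(Object) is O(n)), which is exactly why this structure must be custom-built.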
Directed graph for task dependency management.
- addTask(Task t): O(1)
- addDependency(from, to): O(1)
- getDependencies(id): O(1)
- getDependents(id): O(1)
- hasCycle(): O(V + E)
- topologicalOrder(): O(V + E)
Fast lookup table for tasks by various keys.
- register(Task t): O(1)
- getById(id): O(1)
- getByCategory(category): O(1)
- getByResource(resource): O(1)
- getByPriority(priority): O(1)
- updateStatus(id, status): O(1)
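The O(1) secondary lookups are typically implemented as extra hash maps from key to a list of tasks, maintained alongside the primary id map at registration time. A minimal sketch showing the id and category indexes (the Task fields here are reduced for illustration; the real registry would index resource, priority, and status the same way):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TaskRegistry {
    // Minimal task record for the sketch.
    static class Task {
        final String id, category;
        Task(String id, String category) { this.id = id; this.category = category; }
    }

    private final Map<String, Task> byId = new HashMap<>();
    private final Map<String, List<Task>> byCategory = new HashMap<>();

    public void register(Task t) {                      // O(1): update every index
        byId.put(t.id, t);
        byCategory.computeIfAbsent(t.category, k -> new ArrayList<>()).add(t);
    }

    public Task getById(String id) {                    // O(1)
        return byId.get(id);
    }

    public List<Task> getByCategory(String c) {         // O(1)
        return byCategory.getOrDefault(c, List.of());
    }
}
```

The trade-off is extra memory and write-time bookkeeping in exchange for constant-time reads on every indexed key.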
Track and allocate server resources to tasks.
- addResource(Resource r): O(1)
- allocate(task, resource): O(1)
- release(task): O(1)
- isAvailable(resource, time): O(log n)
- getAvailableSlot(resource, duration): O(log n)
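The O(log n) availability checks suggest keeping each server's booked intervals in a sorted structure such as a TreeMap keyed by start time: a candidate booking only has to check its floor and ceiling neighbors. A sketch with times in minutes (class and method names are illustrative, not the required API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Per-resource calendar of booked [start, end) intervals.
// TreeMap keyed by start time gives O(log n) neighbor lookups.
class ResourceCalendar {
    private final Map<String, TreeMap<Integer, Integer>> booked = new HashMap<>();

    public boolean isAvailable(String resource, int start, int end) {
        TreeMap<Integer, Integer> cal =
            booked.computeIfAbsent(resource, k -> new TreeMap<>());
        Map.Entry<Integer, Integer> prev = cal.floorEntry(start);
        if (prev != null && prev.getValue() > start) return false; // overlaps earlier booking
        Integer next = cal.ceilingKey(start);
        return next == null || next >= end;            // next booking starts after we finish
    }

    public boolean allocate(String resource, int start, int end) {
        if (!isAvailable(resource, start, end)) return false;
        booked.get(resource).put(start, end);
        return true;
    }
}
```

getAvailableSlot can reuse the same structure by walking the gaps between consecutive entries until one fits the requested duration.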
Submission Guidelines
Submit your project via GitHub. Ensure your repository is public and follows the structure below.
task-scheduler-system/
├── src/
│ ├── main/java/
│ │ ├── models/
│ │ │ └── Task.java
│ │ ├── datastructures/
│ │ │ ├── TaskPriorityQueue.java
│ │ │ ├── DependencyGraph.java
│ │ │ ├── TaskRegistry.java
│ │ │ └── ResourceManager.java
│ │ ├── algorithms/
│ │ │ ├── PriorityScheduler.java
│ │ │ ├── TopologicalScheduler.java
│ │ │ ├── DeadlineScheduler.java
│ │ │ ├── TaskSchedulerWithCooldown.java
│ │ │ └── CycleDetector.java
│ │ └── Main.java
│ └── test/java/
│ ├── datastructures/
│ └── algorithms/
├── data/
│ └── task_scheduler.csv
├── docs/
│ ├── algorithm-analysis.md
│ └── complexity-report.md
├── README.md
└── pom.xml (or build.gradle)
README Requirements
- Project overview and objectives
- Setup instructions (compilation, dependencies)
- Algorithm descriptions with examples
- Time and space complexity analysis
- Usage examples with sample outputs
- Test cases documentation and results
- Performance benchmarks on the dataset
Submission Checklist
- All 5 scheduling algorithms implemented
- All 4 data structures implemented
- JUnit test suite with 90%+ coverage
- Edge cases handled (cycles, empty, impossible)
- Code properly documented with Javadoc
- README with all required sections
- Complexity analysis document
Grading Rubric
Your project will be evaluated on the following criteria. Maximum score is 500 points.
| Category | Points |
|---|---|
| Algorithms | 150 |
| Data Structures | 100 |
| Testing | 100 |
| Documentation | 100 |
| Code Quality | 50 |
| Total | 500 |
- Excellent: Outstanding implementation
- Good: Meets all requirements
- Needs Work: Missing key requirements