Project Overview
This capstone project brings together everything you have learned in the Java DSA course. You will work with real task scheduling data containing 100 system tasks with priorities, dependencies, deadlines, and resource requirements. Your goal is to build a professional task scheduling system that applies data structures and algorithms to solve classic scheduling problems and optimize task execution order.
- Priority Queue & Heap: min/max heap implementation from scratch
- Job Scheduling: SJF, Round Robin, and Priority algorithms
- Load Balancing: distribute tasks across resources efficiently
- Optimization: minimize wait time, maximize throughput
Learning Objectives
Technical Skills
- Build min-heap and max-heap from scratch
- Implement classic job scheduling algorithms (SJF, RR, Priority)
- Design load balancing strategies for distributed systems
- Optimize for throughput, latency, and resource utilization
- Apply heaps to solve real scheduling problems
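The from-scratch heap work above boils down to an array plus sift-up and sift-down operations. A minimal min-heap sketch over int keys (illustrative only; your course implementation may use a different API):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal array-backed min-heap of int keys.
class IntMinHeap {
    private final List<Integer> a = new ArrayList<>();

    public void insert(int x) {                // O(log n)
        a.add(x);
        int i = a.size() - 1;
        while (i > 0 && a.get((i - 1) / 2) > a.get(i)) { // sift up
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    public int extractMin() {                  // O(log n)
        int min = a.get(0);
        a.set(0, a.get(a.size() - 1));         // move last element to root
        a.remove(a.size() - 1);
        siftDown(0);
        return min;
    }

    private void siftDown(int i) {
        int n = a.size();
        while (true) {
            int l = 2 * i + 1, r = 2 * i + 2, s = i;
            if (l < n && a.get(l) < a.get(s)) s = l;
            if (r < n && a.get(r) < a.get(s)) s = r;
            if (s == i) return;                // heap property restored
            swap(i, s);
            i = s;
        }
    }

    private void swap(int i, int j) {
        int t = a.get(i); a.set(i, a.get(j)); a.set(j, t);
    }

    public int size() { return a.size(); }
}
```

A max-heap is the same structure with the comparisons flipped.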
Problem-Solving Skills
- Translate OS scheduling concepts to code
- Analyze time and space complexity trade-offs
- Compare scheduling algorithm performance metrics
- Handle priority inversion and starvation
- Test algorithms with real Google cluster data
Problem Scenario
CloudOps Systems
You have been hired as a Systems Engineer at CloudOps Systems, a cloud infrastructure company that manages thousands of automated tasks across distributed servers. The company runs critical operations including database backups, security scans, data processing pipelines, and deployment workflows that must be scheduled efficiently while respecting priorities, dependencies, and resource constraints.
"We process hundreds of tasks daily across multiple servers. We need a scheduling system that can handle task priorities, respect dependencies between tasks, meet strict deadlines, and allocate resources efficiently. Can you build a scheduler that solves these classic computer science problems?"
Problems to Solve
- Process tasks by priority (highest first)
- Use min-heap for earliest deadline first
- Implement max-heap for highest priority first
- Support dynamic priority updates
- Build dependency graph from task data
- Topological sort for valid execution order
- Detect circular dependencies (cycles)
- Find critical path for completion time
- Schedule tasks to meet deadlines
- Maximize number of tasks completed on time
- Minimize total lateness
- Handle impossible schedules gracefully
- Assign tasks to servers with limited resources
- Interval scheduling for non-overlapping tasks
- Multi-resource constraint satisfaction
- Load balancing across servers
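The interval-scheduling item above has a classic greedy solution: sort tasks by finish time, then take each task whose start is at or after the end of the last accepted one. A sketch using hypothetical (start, end) minute pairs rather than the full Task type:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class IntervalScheduling {
    // Select a maximum-size set of non-overlapping [start, end) intervals.
    public static List<int[]> maxNonOverlapping(int[][] intervals) {
        int[][] sorted = intervals.clone();
        Arrays.sort(sorted, Comparator.comparingInt(iv -> iv[1])); // by end time
        List<int[]> chosen = new ArrayList<>();
        int lastEnd = Integer.MIN_VALUE;
        for (int[] iv : sorted) {
            if (iv[0] >= lastEnd) {   // starts after the last chosen task ends
                chosen.add(iv);
                lastEnd = iv[1];
            }
        }
        return chosen;
    }
}
```

Sorting dominates, so the whole routine runs in O(n log n).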
The Dataset
You will work with a comprehensive task scheduling dataset containing 100 system tasks. Download the CSV file containing task data for analysis:
Dataset Download
Download the task scheduler dataset. We provide a curated sample, or you can use the full Google Cluster Dataset from Kaggle for extended analysis.
Original Data Source
This project uses the Google Cluster Data from Kaggle - real traces from Google's production cluster containing millions of job scheduling events. It includes task priorities, resource requirements, scheduling constraints, and execution metrics from actual Google data centers.
Dataset Schema
Each row represents a system task with the following columns:
| Column | Type | Description |
|---|---|---|
| task_id | String | Unique task identifier (e.g., T001, T002) |
| task_name | String | Descriptive name of the task |
| priority | Integer | Priority level 1-5 (5 = highest priority) |
| duration_minutes | Integer | Time required to complete the task |
| deadline | DateTime | Task must complete by this time |
| arrival_time | DateTime | When the task becomes available |
| dependencies | String | Comma-separated task IDs that must complete first |
| category | String | Task category (System, Security, Analytics, etc.) |
| status | String | Current status (Pending, Running, Completed) |
| resource_required | String | Server/resource needed (Server A, B, C, GPU) |
| cpu_cores | Integer | CPU cores required (1-16) |
| memory_mb | Integer | Memory required in MB (64-16384) |
Tasks are distributed across the following categories:
- System (15 tasks)
- Security (12 tasks)
- Analytics (10 tasks)
- Deployment (8 tasks)
- Monitoring (8 tasks)
- Database (7 tasks)
- Network (6 tasks)
- Testing (8 tasks)
- Performance (6 tasks)
- Communication (4 tasks)
- Compliance (5 tasks)
- Others (11 tasks)
For classic scheduling problems, focus on these key fields:
// Example: Task representation for scheduling
class Task {
    String taskId;              // Unique identifier
    int priority;               // 1-5 (use for heap ordering)
    int duration;               // Duration in minutes
    LocalDateTime deadline;     // Must complete by
    List<String> dependencies;  // Tasks that must finish first
    String resource;            // Required server/resource
}

// This becomes input for:
// - Priority Queue Scheduling (max-heap by priority)
// - Earliest Deadline First (min-heap by deadline)
// - Topological Sort (dependency graph)
// - Interval Scheduling (resource allocation)
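One way to map a CSV row onto this Task shape is shown below. The column order is taken from the schema above; everything else (the pre-split String[] input, the handling of "-" for no dependencies) is an assumption, and a real loader must also handle quoted fields, since the dependencies column can itself contain commas:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal Task record matching the fields used for scheduling.
class Task {
    String taskId;
    int priority;
    int duration;
    LocalDateTime deadline;
    List<String> dependencies;
    String resource;
}

class TaskLoader {
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    // Map one pre-split CSV row (schema column order) to a Task.
    public static Task fromRow(String[] c) {
        Task t = new Task();
        t.taskId = c[0];                             // task_id
        t.priority = Integer.parseInt(c[2]);         // priority
        t.duration = Integer.parseInt(c[3]);         // duration_minutes
        t.deadline = LocalDateTime.parse(c[4], FMT); // deadline
        t.dependencies = c[6].isEmpty() || c[6].equals("-")
            ? new ArrayList<>()
            : new ArrayList<>(Arrays.asList(c[6].split(",")));
        t.resource = c[9];                           // resource_required
        return t;
    }
}
```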
Sample Data Preview
Here is what the data looks like from task_scheduler.csv:
| Task ID | Task Name | Priority | Duration | Deadline | Dependencies | Resource |
|---|---|---|---|---|---|---|
| T001 | Database Backup | 5 | 45 min | 2024-01-15 18:00 | - | Server A |
| T003 | Data Processing | 4 | 120 min | 2024-01-15 14:00 | T001 | Server A |
| T015 | ML Model Training | 5 | 240 min | 2024-01-16 08:00 | T010 | GPU Server |
Project Requirements
Your project must include all of the following components. Structure your code with clear organization, documentation, and comprehensive testing.
Scheduling Algorithms (5 Required)
Implement the following classic scheduling algorithms:
- Priority Queue Scheduler: Process tasks by priority using max-heap
- Earliest Deadline First (EDF): Schedule by deadline using min-heap
- Topological Sort Scheduler: Respect task dependencies
- Deadline Scheduling: Maximize tasks completed on time (greedy)
- Task Scheduling with Cooldown: LeetCode 621 - CPU interval scheduling
Data Structures Implementation
Build the following custom data structures:
- TaskPriorityQueue: Custom heap with update and remove operations
- DependencyGraph: Directed graph for task dependencies
- TaskRegistry: Hash map with O(1) task lookup by ID
- ResourceManager: Track server availability and allocation
Testing and Validation
Comprehensive test coverage:
- Unit Tests: JUnit tests for each algorithm and data structure
- Edge Cases: Empty task list, single task, circular dependencies, impossible deadlines
- Performance Tests: Verify time complexity with large inputs
- Integration Tests: Test with actual CSV data
Documentation and Analysis
Technical documentation:
- Algorithm Explanations: Describe approach for each scheduling problem
- Complexity Analysis: Time and space complexity for each solution
- Trade-offs: Compare different scheduling strategies
- Results: Scheduling outcomes on provided dataset
Algorithm Specifications
Implement the following algorithms with optimal time and space complexity. Each algorithm should be thoroughly tested and documented.
Process tasks in order of priority using a max-heap.
// Example:
// tasks = [(T1, p=3), (T2, p=5), (T3, p=1)]
// Output: T2 -> T1 -> T3
public List<Task> scheduleByPriority(List<Task> tasks) {
    // Max-heap: higher priority values come out first
    PriorityQueue<Task> maxHeap = new PriorityQueue<>(
        (a, b) -> Integer.compare(b.priority, a.priority));
    maxHeap.addAll(tasks);
    List<Task> schedule = new ArrayList<>();
    while (!maxHeap.isEmpty()) {
        schedule.add(maxHeap.poll());
    }
    return schedule;
}
Order tasks respecting dependencies (Kahn's Algorithm).
// Example:
// T1 -> T2 -> T3 (T2 depends on T1)
// Output: [T1, T2, T3]
public List<Task> topologicalSort(List<Task> tasks) {
    Map<String, Task> byId = new HashMap<>();
    Map<String, Integer> inDegree = new HashMap<>();
    Map<String, List<String>> dependents = new HashMap<>();
    // Index tasks and build the reverse (dependency -> dependents) adjacency
    for (Task t : tasks) {
        byId.put(t.taskId, t);
        inDegree.put(t.taskId, t.dependencies.size());
        for (String dep : t.dependencies)
            dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(t.taskId);
    }
    Queue<Task> queue = new LinkedList<>();
    for (Task t : tasks)
        if (t.dependencies.isEmpty()) queue.offer(t);
    // BFS: repeatedly emit a task with no remaining dependencies
    List<Task> order = new ArrayList<>();
    while (!queue.isEmpty()) {
        Task t = queue.poll();
        order.add(t);
        for (String next : dependents.getOrDefault(t.taskId, List.of()))
            if (inDegree.merge(next, -1, Integer::sum) == 0)
                queue.offer(byId.get(next));
    }
    return order; // order.size() < tasks.size() means a cycle was detected
}
Schedule tasks by earliest deadline using min-heap.
// Schedule tasks to minimize missed deadlines
public List<Task> scheduleByDeadline(List<Task> tasks) {
    // Min-heap: earliest deadline comes out first
    PriorityQueue<Task> minHeap = new PriorityQueue<>(
        Comparator.comparing((Task t) -> t.deadline));
    minHeap.addAll(tasks);
    List<Task> schedule = new ArrayList<>();
    while (!minHeap.isEmpty()) {
        schedule.add(minHeap.poll());
    }
    return schedule;
}
Schedule tasks with cooldown constraint.
// Given tasks and cooldown n, find minimum time
// tasks = ["A","A","A","B","B","B"], n = 2
// Output: 8 (A -> B -> idle -> A -> B -> idle -> A -> B)
public int leastInterval(char[] tasks, int n) {
    int[] freq = new int[26];
    for (char c : tasks) freq[c - 'A']++;
    Arrays.sort(freq);
    int maxFreq = freq[25];
    // Idle slots created by spacing out the most frequent task
    int idleSlots = (maxFreq - 1) * n;
    for (int i = 24; i >= 0; i--) {
        idleSlots -= Math.min(freq[i], maxFreq - 1); // fill slots with other tasks
    }
    return tasks.length + Math.max(0, idleSlots);
}
Detect circular dependencies in task graph.
// Return true if a circular dependency exists
public boolean hasCycle(List<Task> tasks) {
    Map<String, Task> taskMap = new HashMap<>();
    for (Task t : tasks) taskMap.put(t.taskId, t);
    // 0/absent = unvisited, 1 = visiting, 2 = visited
    Map<String, Integer> state = new HashMap<>();
    for (Task t : tasks) {
        if (dfs(t, state, taskMap)) {
            return true; // Cycle found
        }
    }
    return false;
}

private boolean dfs(Task t, Map<String, Integer> state,
                    Map<String, Task> taskMap) {
    int s = state.getOrDefault(t.taskId, 0);
    if (s == 1) return true;  // Back edge: cycle
    if (s == 2) return false; // Already fully explored
    state.put(t.taskId, 1);
    for (String dep : t.dependencies) {
        Task next = taskMap.get(dep);
        if (next != null && dfs(next, state, taskMap)) return true;
    }
    state.put(t.taskId, 2);
    return false;
}
Maximize tasks completed before deadlines (greedy).
// Maximize number of tasks completed on time.
// Assumes deadlines have been converted to minutes from the
// schedule start (deadlineMin), so they compare against currentTime.
public int maxTasksOnTime(List<Task> tasks) {
    // Sort by deadline, earliest first
    tasks.sort(Comparator.comparingInt(t -> t.deadlineMin));
    // Max-heap of accepted tasks, longest duration on top
    PriorityQueue<Task> scheduled = new PriorityQueue<>(
        (a, b) -> Integer.compare(b.duration, a.duration));
    int currentTime = 0;
    for (Task t : tasks) {
        scheduled.offer(t);
        currentTime += t.duration;
        if (currentTime > t.deadlineMin) {
            // Over deadline: drop the longest accepted task
            currentTime -= scheduled.poll().duration;
        }
    }
    return scheduled.size();
}
Data Structure Specifications
Implement the following data structures to support efficient scheduling operations. Focus on optimal time complexity for the core operations.
Custom priority queue with update and remove by ID.
- insert(Task t): O(log n)
- extractMax(): O(log n)
- updatePriority(id, newPriority): O(log n)
- remove(id): O(log n)
- peek(): O(1)
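The O(log n) updatePriority and remove operations require knowing where each task sits in the heap. A common approach pairs the array heap with an id-to-index map kept in sync on every swap, so an entry can be located in O(1) and the heap repaired in O(log n). A sketch simplified to string ids and int priorities (names are illustrative, not the required API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Max-heap keyed by priority, with an id -> index map for fast lookup.
class IndexedMaxHeap {
    private final List<String> ids = new ArrayList<>();
    private final List<Integer> prio = new ArrayList<>();
    private final Map<String, Integer> pos = new HashMap<>();

    public void insert(String id, int p) {
        ids.add(id); prio.add(p); pos.put(id, ids.size() - 1);
        siftUp(ids.size() - 1);
    }

    public String extractMax() {
        String top = ids.get(0);
        removeAt(0);
        return top;
    }

    public void updatePriority(String id, int p) {
        int i = pos.get(id);
        int old = prio.get(i);
        prio.set(i, p);
        if (p > old) siftUp(i); else siftDown(i); // fix only one direction
    }

    public void remove(String id) { removeAt(pos.get(id)); }

    private void removeAt(int i) {
        int last = ids.size() - 1;
        swap(i, last);                  // move victim to the end
        pos.remove(ids.remove(last));
        prio.remove(last);
        if (i < last) { siftUp(i); siftDown(i); }
    }

    private void siftUp(int i) {
        while (i > 0 && prio.get((i - 1) / 2) < prio.get(i)) {
            swap(i, (i - 1) / 2); i = (i - 1) / 2;
        }
    }

    private void siftDown(int i) {
        int n = ids.size();
        while (true) {
            int l = 2 * i + 1, r = 2 * i + 2, big = i;
            if (l < n && prio.get(l) > prio.get(big)) big = l;
            if (r < n && prio.get(r) > prio.get(big)) big = r;
            if (big == i) return;
            swap(i, big); i = big;
        }
    }

    private void swap(int i, int j) {
        String si = ids.get(i); ids.set(i, ids.get(j)); ids.set(j, si);
        int pi = prio.get(i); prio.set(i, prio.get(j)); prio.set(j, pi);
        pos.put(ids.get(i), i); pos.put(ids.get(j), j); // keep index map in sync
    }
}
```

Java's built-in PriorityQueue has no indexed remove (its remove(Object) is O(n)), which is exactly why this structure must be custom-built.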
Directed graph for task dependency management.
- addTask(Task t): O(1)
- addDependency(from, to): O(1)
- getDependencies(id): O(1)
- getDependents(id): O(1)
- hasCycle(): O(V + E)
- topologicalOrder(): O(V + E)
Fast lookup table for tasks by various keys.
- register(Task t): O(1)
- getById(id): O(1)
- getByCategory(category): O(1)
- getByResource(resource): O(1)
- getByPriority(priority): O(1)
- updateStatus(id, status): O(1)
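The O(1) secondary lookups are typically implemented as extra hash maps from key to a list of tasks, maintained alongside the primary id map at registration time. A minimal sketch showing the id and category indexes (the Task fields here are reduced for illustration; the real registry would index resource, priority, and status the same way):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TaskRegistry {
    // Minimal task record for the sketch.
    static class Task {
        final String id, category;
        Task(String id, String category) { this.id = id; this.category = category; }
    }

    private final Map<String, Task> byId = new HashMap<>();
    private final Map<String, List<Task>> byCategory = new HashMap<>();

    public void register(Task t) {                      // O(1): update every index
        byId.put(t.id, t);
        byCategory.computeIfAbsent(t.category, k -> new ArrayList<>()).add(t);
    }

    public Task getById(String id) {                    // O(1)
        return byId.get(id);
    }

    public List<Task> getByCategory(String c) {         // O(1)
        return byCategory.getOrDefault(c, List.of());
    }
}
```

The trade-off is extra memory and write-time bookkeeping in exchange for constant-time reads on every indexed key.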
Track and allocate server resources to tasks.
- addResource(Resource r): O(1)
- allocate(task, resource): O(1)
- release(task): O(1)
- isAvailable(resource, time): O(log n)
- getAvailableSlot(resource, duration): O(log n)
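The O(log n) availability checks suggest keeping each server's booked intervals in a sorted structure such as a TreeMap keyed by start time: a candidate booking only has to check its floor and ceiling neighbors. A sketch with times in minutes (class and method names are illustrative, not the required API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Per-resource calendar of booked [start, end) intervals.
// TreeMap keyed by start time gives O(log n) neighbor lookups.
class ResourceCalendar {
    private final Map<String, TreeMap<Integer, Integer>> booked = new HashMap<>();

    public boolean isAvailable(String resource, int start, int end) {
        TreeMap<Integer, Integer> cal =
            booked.computeIfAbsent(resource, k -> new TreeMap<>());
        Map.Entry<Integer, Integer> prev = cal.floorEntry(start);
        if (prev != null && prev.getValue() > start) return false; // overlaps earlier booking
        Integer next = cal.ceilingKey(start);
        return next == null || next >= end;            // next booking starts after we finish
    }

    public boolean allocate(String resource, int start, int end) {
        if (!isAvailable(resource, start, end)) return false;
        booked.get(resource).put(start, end);
        return true;
    }
}
```

getAvailableSlot can reuse the same structure by walking the gaps between consecutive entries until one fits the requested duration.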
Submission Guidelines
Submit your project via GitHub. Ensure your repository is public and follows the structure below.
task-scheduler-system/
├── src/
│ ├── main/java/
│ │ ├── models/
│ │ │ └── Task.java
│ │ ├── datastructures/
│ │ │ ├── TaskPriorityQueue.java
│ │ │ ├── DependencyGraph.java
│ │ │ ├── TaskRegistry.java
│ │ │ └── ResourceManager.java
│ │ ├── algorithms/
│ │ │ ├── PriorityScheduler.java
│ │ │ ├── TopologicalScheduler.java
│ │ │ ├── DeadlineScheduler.java
│ │ │ ├── TaskSchedulerWithCooldown.java
│ │ │ └── CycleDetector.java
│ │ └── Main.java
│ └── test/java/
│ ├── datastructures/
│ └── algorithms/
├── data/
│ └── task_scheduler.csv
├── docs/
│ ├── algorithm-analysis.md
│ └── complexity-report.md
├── README.md
└── pom.xml (or build.gradle)
README Requirements
- Project overview and objectives
- Setup instructions (compilation, dependencies)
- Algorithm descriptions with examples
- Time and space complexity analysis
- Usage examples with sample outputs
- Test cases documentation and results
- Performance benchmarks on the dataset
Submission Checklist
- All 5 scheduling algorithms implemented
- All 4 data structures implemented
- JUnit test suite with 90%+ coverage
- Edge cases handled (cycles, empty, impossible)
- Code properly documented with Javadoc
- README with all required sections
- Complexity analysis document
Grading Rubric
Your project will be evaluated on the following criteria. Maximum score is 500 points.
| Category | Points |
|---|---|
| Algorithms | 150 |
| Data Structures | 100 |
| Testing | 100 |
| Documentation | 100 |
| Code Quality | 50 |
| Total | 500 |
- Excellent: Outstanding implementation
- Good: Meets all requirements
- Needs Work: Missing key requirements