Advanced Graphs

Topological Sort

What is Topological Sort?

Topological Sort is a linear ordering of vertices in a DAG (Directed Acyclic Graph) such that for every directed edge u → v, vertex u comes before v in the ordering. Think of it as arranging tasks so that all prerequisites are completed before their dependent tasks. It's like getting dressed: you must put on socks before shoes, underwear before pants - there's a required order based on dependencies!

Topological Sort is everywhere in computing: build systems (compiling files in the right order), package managers (npm, pip installing dependencies first), course scheduling (prerequisites before advanced courses), spreadsheet calculations (formulas depending on other cells), and task scheduling (project management). The algorithm only works on DAGs because cycles would mean A depends on B which depends on A - impossible to order!

DAG Required

Only works on Directed Acyclic Graphs. If there's a cycle (A→B→C→A), no valid ordering exists! Always check for cycles first using in-degree or DFS.

Multiple Valid Orders

There can be many valid topological orders! If A→C and B→C, both [A,B,C] and [B,A,C] are valid. The algorithm gives ONE valid ordering.

Two Approaches

Kahn's (BFS): Process nodes with 0 incoming edges. DFS: Add nodes to result after visiting all descendants. Both are O(V+E).

Real-World Analogy

Imagine you're getting dressed in the morning. You can't put on shoes before socks, or a jacket before a shirt. Topological sort finds a valid order: underwear → pants → socks → shoes → shirt → jacket. If someone said "you must wear the jacket before the shirt AND the shirt before the jacket," that's a cycle - impossible!

Visual: Topological Sort Example DAG Ordering

Course Prerequisites DAG

Kahn's Algorithm Steps

1. Find in-degree 0: Math
2. Remove Math: Physics, CS101
3. Remove Physics: AI
4. Remove CS101: DS, Acct
5. Remove AI, DS: ML

Result Order

Math Physics CS101 AI DS ML

Real-world: This is exactly how build systems (Maven, npm) determine compilation order!

Kahn's Algorithm (BFS)

// Kahn's Algorithm - BFS approach
// Time: O(V + E), Space: O(V)
List topologicalSort(int n, List> adj) {
    int[] inDegree = new int[n];
    
    // Calculate in-degrees
    for (int u = 0; u < n; u++) {
        for (int v : adj.get(u)) {
            inDegree[v]++;  // v has one more prerequisite
        }
    }
}

Counts how many edges point TO each vertex. In course scheduling, this is the number of prerequisites each course has.

    // Add all vertices with 0 in-degree (no prerequisites)
    Queue queue = new LinkedList<>();
    for (int i = 0; i < n; i++) {
        if (inDegree[i] == 0) {
            queue.offer(i);  // Can start immediately!
        }
    }

Finds all nodes with no incoming edges - these have no dependencies and can be processed first.

    List result = new ArrayList<>();
    
    while (!queue.isEmpty()) {
        int u = queue.poll();       // Get next ready vertex
        result.add(u);               // Add to result
        
        for (int v : adj.get(u)) {   // Process neighbors
            inDegree[v]--;           // Remove dependency
            if (inDegree[v] == 0) {  // All prerequisites done?
                queue.offer(v);      // Ready to process!
            }
        }
    }

Removes each node and decrements neighbors' in-degrees. When a neighbor's in-degree hits 0, all its prerequisites are complete.

    // Check for cycle - if we couldn't process all vertices
    if (result.size() != n) {
        return Collections.emptyList();  // Cycle detected!
    }
    
    return result;  // Valid topological order
}

If some vertices remain unprocessed, there's a circular dependency (cycle). No valid ordering exists!

// Course Schedule - Can finish all courses?
boolean canFinish(int numCourses, int[][] prerequisites) {
    List> adj = new ArrayList<>();
    int[] inDegree = new int[numCourses];
    
    // Build adjacency list
    for (int i = 0; i < numCourses; i++) adj.add(new ArrayList<>());
    for (int[] pre : prerequisites) {
        adj.get(pre[1]).add(pre[0]);  // pre[1] -> pre[0]
        inDegree[pre[0]]++;
    }
    
    // BFS from courses with no prerequisites
    Queue queue = new LinkedList<>();
    for (int i = 0; i < numCourses; i++) {
        if (inDegree[i] == 0) queue.offer(i);
    }
    
    int count = 0;
    while (!queue.isEmpty()) {
        int course = queue.poll();
        count++;
        for (int next : adj.get(course)) {
            if (--inDegree[next] == 0) queue.offer(next);
        }
    }
    
    return count == numCourses;  // All courses completable?
}

Determines if all courses can be finished given prerequisites. Returns false if circular dependencies exist.

DFS Approach

// Topological Sort using DFS
List topoSortDFS(int n, List> adj) {
    boolean[] visited = new boolean[n];  // Track visited vertices
    Stack stack = new Stack<>(); // Store finish order
}

Creates a visited array to avoid revisiting nodes, and a stack to store vertices in reverse topological order.

    // Start DFS from each unvisited vertex
    for (int i = 0; i < n; i++) {
        if (!visited[i]) {
            dfs(i, adj, visited, stack);
        }
    }

Ensures all vertices are processed, even in disconnected graphs. Each unvisited vertex starts a new DFS tree.

    // Pop stack to get topological order
    List result = new ArrayList<>();
    while (!stack.isEmpty()) {
        result.add(stack.pop());
    }
    return result;
}

Stack contains vertices in reverse finish order. Popping gives us correct topological order (dependencies before dependents).

void dfs(int u, List> adj, boolean[] visited, Stack stack) {
    visited[u] = true;  // Mark as visited
    
    // Visit all neighbors first (process dependencies)
    for (int v : adj.get(u)) {
        if (!visited[v]) {
            dfs(v, adj, visited, stack);
        }
    }
    
    stack.push(u);  // Push AFTER all descendants are processed
}

Push vertex to stack AFTER visiting all descendants. This ensures all dependencies are processed before the current vertex, giving correct topological order when we pop.

Union-Find (Disjoint Set Union)

What is Union-Find (Disjoint Set Union)?

Union-Find (also called Disjoint Set Union or DSU) is a data structure that efficiently tracks a collection of non-overlapping (disjoint) sets. It supports two key operations: Find (determine which set an element belongs to) and Union (merge two sets into one). With two clever optimizations - path compression and union by rank - each operation runs in nearly O(1) time, specifically O(α(n)) where α is the inverse Ackermann function (practically constant).

Union-Find is the secret weapon behind many graph algorithms! It powers Kruskal's MST (detecting cycles efficiently), network connectivity (are two computers connected?), social networks (finding friend circles), image processing (connecting pixels of same color), and dynamic graph connectivity. It answers "are X and Y connected?" in near-constant time after processing edges!

Find Operation

Returns the "representative" (root) of the set containing element x. If find(a) == find(b), then a and b are in the same set (connected). Uses path compression for speed!

Union Operation

Merges the sets containing elements x and y. After union(a,b), find(a) == find(b). Uses union by rank to keep trees balanced and shallow!

Near O(1) Time

Path Compression: Flatten tree during find. Union by Rank: Attach smaller tree under larger. Together: O(α(n)) ≈ O(1) per operation!

Real-World Analogy

Imagine a school where students form friend groups. Initially everyone is alone. When two people become friends, their entire friend circles merge. Union-Find tracks this: Find = "Who is the leader of your friend group?" Union = "Your group and their group are now one big group!" The optimizations ensure finding the leader is super fast, even in a school of millions!

Visual: Union-Find Operations Path Compression

Before Path Compression

After find(3)

Path Compression

During find(), make every node point directly to root. Future finds become O(1)!

Union by Rank

Attach smaller tree under larger tree to keep height minimal.

Real-world: Social networks use this to find friend circles and suggest connections!

class UnionFind {
    private int[] parent;  // parent[i] = parent of element i
    private int[] rank;    // rank[i] = approximate height of subtree
    private int count;     // Number of connected components
    
    public UnionFind(int n) {
        parent = new int[n];
        rank = new int[n];
        count = n;  // Initially n separate components
        
        for (int i = 0; i < n; i++) {
            parent[i] = i;  // Each element is its own parent
            rank[i] = 0;    // Initial rank is 0
        }
    }
}

Creates n separate sets where each element points to itself. Think of it as n separate friend groups of size 1. The rank array helps keep trees balanced during union operations.

    // Find with path compression - O(α(n)) ≈ O(1)
    public int find(int x) {
        if (parent[x] != x) {
            parent[x] = find(parent[x]);  // Path compression magic!
        }
        return parent[x];
    }

Finds the root (representative) of x's set. The magic is path compression: while finding root, we make every node on the path point directly to root. Future finds for these nodes become O(1)!

    // Union by rank - keeps trees balanced
    public boolean union(int x, int y) {
        int rootX = find(x);
        int rootY = find(y);
        
        if (rootX == rootY) return false;  // Already in same set
        
        // Attach smaller tree under larger tree
        if (rank[rootX] < rank[rootY]) {
            parent[rootX] = rootY;
        } else if (rank[rootX] > rank[rootY]) {
            parent[rootY] = rootX;
        } else {
            parent[rootY] = rootX;
            rank[rootX]++;  // Increase rank only when equal
        }
        
        count--;  // One less component
        return true;
    }

Merges two sets by attaching smaller tree under larger one. This keeps tree height minimal (logarithmic). Returns false if already connected - useful for cycle detection! Decrements component count on successful merge.

    // Check if x and y are in the same set
    public boolean connected(int x, int y) {
        return find(x) == find(y);
    }
    
    // Get number of connected components
    public int getCount() {
        return count;
    }
}

Helper methods for common queries. connected() checks if two elements belong to same set by comparing their roots. getCount() returns how many separate groups exist - starts at n and decreases with each union.

// Number of Connected Components in Graph
int countComponents(int n, int[][] edges) {
    UnionFind uf = new UnionFind(n);
    
    for (int[] edge : edges) {
        uf.union(edge[0], edge[1]);  // Connect endpoints
    }
    
    return uf.getCount();  // Remaining separate components
}

Classic application: count how many "islands" or separate groups exist after processing all edges. Start with n components, each successful union reduces count by 1. Final count = number of disconnected groups.

// Detect Cycle in Undirected Graph
boolean hasCycle(int n, int[][] edges) {
    UnionFind uf = new UnionFind(n);
    
    for (int[] edge : edges) {
        // If already connected, adding edge creates cycle!
        if (!uf.union(edge[0], edge[1])) {
            return true;  // Cycle found
        }
    }
    
    return false;  // No cycle
}

Elegant cycle detection! If two nodes are already in the same set (union returns false), adding an edge between them creates a cycle. This is exactly how Kruskal's MST algorithm avoids cycles when adding edges.

Practice Questions: Union-Find

Problem: Given an n x n matrix isConnected where isConnected[i][j] = 1 if city i and city j are directly connected. Return the total number of provinces (connected groups).

Example: [[1,1,0],[1,1,0],[0,0,1]] → 2 (cities 0,1 form one group, city 2 is alone)

Show Solution

int findCircleNum(int[][] isConnected) {
    int n = isConnected.length;
    int[] parent = new int[n];
    int[] rank = new int[n];
    
    for (int i = 0; i < n; i++) parent[i] = i;
    
    int provinces = n;
    
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (isConnected[i][j] == 1) {
                if (union(parent, rank, i, j)) {
                    provinces--;
                }
            }
        }
    }
    
    return provinces;
}

int find(int[] parent, int x) {
    if (parent[x] != x) {
        parent[x] = find(parent, parent[x]);
    }
    return parent[x];
}

boolean union(int[] parent, int[] rank, int x, int y) {
    int px = find(parent, x), py = find(parent, y);
    if (px == py) return false;
    
    if (rank[px] < rank[py]) parent[px] = py;
    else if (rank[px] > rank[py]) parent[py] = px;
    else { parent[py] = px; rank[px]++; }
    
    return true;
}

Problem: A tree with n nodes has exactly n-1 edges. Given a graph with n nodes and n edges (one extra), find the edge that can be removed to make it a tree.

Example: [[1,2],[1,3],[2,3]] → [2,3] (removing this edge leaves a valid tree)

Hint: Process edges in order. The first edge that connects two already-connected nodes is the answer.

Show Solution

int[] findRedundantConnection(int[][] edges) {
    int n = edges.length;
    int[] parent = new int[n + 1];
    
    for (int i = 1; i <= n; i++) parent[i] = i;
    
    for (int[] edge : edges) {
        int u = edge[0], v = edge[1];
        int pu = find(parent, u), pv = find(parent, v);
        
        if (pu == pv) {
            return edge;  // Already connected - this creates cycle!
        }
        
        parent[pu] = pv;  // Union
    }
    
    return new int[0];
}

int find(int[] parent, int x) {
    if (parent[x] != x) {
        parent[x] = find(parent, parent[x]);
    }
    return parent[x];
}

Problem: Given a list of accounts where each account is [name, email1, email2, ...]. If two accounts share an email, they belong to the same person. Merge all accounts.

Example: [["John","a@mail","b@mail"],["John","c@mail"],["John","a@mail","d@mail"]] → First and third accounts merge (share "a@mail")

Show Solution

List> accountsMerge(List> accounts) {
    Map emailToName = new HashMap<>();
    Map parent = new HashMap<>();
    
    // Initialize: each email is its own parent
    for (List account : accounts) {
        String name = account.get(0);
        for (int i = 1; i < account.size(); i++) {
            String email = account.get(i);
            emailToName.put(email, name);
            parent.put(email, email);
        }
    }
    
    // Union: connect all emails in same account
    for (List account : accounts) {
        String firstEmail = account.get(1);
        for (int i = 2; i < account.size(); i++) {
            union(parent, firstEmail, account.get(i));
        }
    }
    
    // Group emails by root parent
    Map> groups = new HashMap<>();
    for (String email : parent.keySet()) {
        String root = find(parent, email);
        groups.computeIfAbsent(root, k -> new TreeSet<>()).add(email);
    }
    
    // Build result
    List> result = new ArrayList<>();
    for (String root : groups.keySet()) {
        List merged = new ArrayList<>();
        merged.add(emailToName.get(root));
        merged.addAll(groups.get(root));
        result.add(merged);
    }
    
    return result;
}

String find(Map parent, String x) {
    if (!parent.get(x).equals(x)) {
        parent.put(x, find(parent, parent.get(x)));
    }
    return parent.get(x);
}

void union(Map parent, String x, String y) {
    parent.put(find(parent, x), find(parent, y));
}

Minimum Spanning Tree

What is a Minimum Spanning Tree (MST)?

A Minimum Spanning Tree (MST) is a subset of edges from a weighted, connected, undirected graph that connects all vertices together with the minimum possible total edge weight, while forming no cycles. It's called a "tree" because it has exactly V-1 edges for V vertices (the minimum needed to connect everything without cycles), and "spanning" because it spans (reaches) every vertex.

MST is crucial for real-world optimization! Telecom companies use it to lay minimum cable to connect all cities, electrical grids to minimize wire costs, road planning to connect villages with minimum road construction, network design to connect computers cheaply, and clustering algorithms in machine learning. Two famous algorithms solve this: Kruskal's (greedy edge selection) and Prim's (greedy vertex growth).

Kruskal's Algorithm

Edge-centric: Sort all edges by weight. Greedily add the smallest edge that doesn't create a cycle (use Union-Find to detect). Stop when V-1 edges added. O(E log E).

Prim's Algorithm

Vertex-centric: Start from any vertex. Repeatedly add the cheapest edge connecting MST to a new vertex. Use a min-heap for efficiency. O((V+E) log V).

MST Properties

Has exactly V-1 edges. May not be unique (multiple MSTs with same total weight possible). All vertices connected. No cycles. Minimum total weight.

Real-World Analogy

Imagine you're a telecom company that needs to connect 10 cities with fiber optic cables. You could connect every city to every other city (expensive!), but you only need enough cables so everyone can reach everyone else. MST finds the cheapest way to lay cables connecting all cities - like finding the cheapest road network that still lets you drive between any two cities!

Visual: MST Algorithms Comparison Kruskal vs Prim

Original Weighted Graph

Minimum Spanning Tree

Kruskal's

Sort edges: 1,2,3,4,5,6. Add smallest that doesn't create cycle.

Prim's

Start from A, always add cheapest edge to a new vertex.

Real-world: Telecom companies use MST to lay minimum cable connecting all cities!

Kruskal's Algorithm (Union-Find)

// Kruskal's Algorithm - O(E log E)
int kruskalMST(int n, int[][] edges) {
    // Sort edges by weight (greedy approach)
    Arrays.sort(edges, (a, b) -> a[2] - b[2]);
    
    UnionFind uf = new UnionFind(n);
    int mstWeight = 0;
    int edgesUsed = 0;
}

Sort all edges from smallest to largest weight - this is the greedy foundation. Initialize Union-Find to track connected components and detect cycles. We'll greedily pick smallest edges that don't create cycles.

    for (int[] edge : edges) {
        int u = edge[0], v = edge[1], weight = edge[2];
        
        // Add edge only if it doesn't create a cycle
        if (uf.union(u, v)) {
            mstWeight += weight;
            edgesUsed++;
            
            if (edgesUsed == n - 1) break;  // MST complete!
        }
    }

For each edge (in sorted order), try to add it. Union-Find's union() returns true only if endpoints are in different components - meaning no cycle! Stop when we have V-1 edges (minimum needed to connect V vertices).

    // Return MST weight, or -1 if graph is disconnected
    return edgesUsed == n - 1 ? mstWeight : -1;
}

If we couldn't add V-1 edges, the graph is disconnected (no MST exists). Otherwise return total weight of all edges in MST. This handles edge case of disconnected input graphs.

// Variant: Get actual MST edges (not just total weight)
List getMSTEdges(int n, int[][] edges) {
    Arrays.sort(edges, (a, b) -> a[2] - b[2]);
    
    UnionFind uf = new UnionFind(n);
    List mst = new ArrayList<>();
    
    for (int[] edge : edges) {
        if (uf.union(edge[0], edge[1])) {
            mst.add(edge);  // Store the actual edge
            if (mst.size() == n - 1) break;
        }
    }
    
    return mst;  // List of edges in MST
}

Sometimes you need the actual edges, not just total weight. This variant stores each selected edge in a list. Useful when you need to know which connections to build (e.g., which cables to lay between cities).

Prim's Algorithm (Priority Queue)

// Prim's Algorithm - O((V + E) log V)
int primMST(int n, List> adj) {
    boolean[] inMST = new boolean[n];  // Track vertices in MST
    // Min-heap: {weight, vertex}
    PriorityQueue pq = new PriorityQueue<>((a, b) -> a[0] - b[0]);
    
    pq.offer(new int[]{0, 0});  // Start from vertex 0 with weight 0
    int mstWeight = 0;
    int edgesUsed = 0;
}

Initialize a min-heap (priority queue) sorted by edge weight. Start from any vertex (here vertex 0) with weight 0. The inMST array prevents adding the same vertex twice, which would create a cycle.

    while (!pq.isEmpty() && edgesUsed < n) {
        int[] curr = pq.poll();  // Get minimum weight edge
        int weight = curr[0], u = curr[1];
        
        if (inMST[u]) continue;  // Skip if already in MST
        
        inMST[u] = true;      // Add vertex to MST
        mstWeight += weight;   // Add edge weight to total
        edgesUsed++;
    }

Extract the minimum weight edge from heap. If the vertex is already in MST, skip it (we found a cheaper path earlier). Otherwise, add vertex to MST and accumulate weight. This greedy choice guarantees minimum total weight.

        // Add all edges to non-MST neighbors
        for (int[] edge : adj.get(u)) {
            int v = edge[0], w = edge[1];
            if (!inMST[v]) {
                pq.offer(new int[]{w, v});  // Add to heap
            }
        }
    }
    
    return edgesUsed == n ? mstWeight : -1;  // -1 if disconnected
}

For the newly added vertex, push all its edges to non-MST vertices into the heap. The heap automatically sorts them, so next iteration picks the globally cheapest edge connecting MST to a new vertex. Return -1 if graph is disconnected.

// LeetCode: Min Cost to Connect All Points (Manhattan Distance)
int minCostConnectPoints(int[][] points) {
    int n = points.length;
    boolean[] inMST = new boolean[n];
    PriorityQueue pq = new PriorityQueue<>((a, b) -> a[0] - b[0]);
    
    pq.offer(new int[]{0, 0});  // Start from point 0
    int cost = 0;
    int count = 0;
    
    while (count < n) {
        int[] curr = pq.poll();
        int d = curr[0], u = curr[1];
        
        if (inMST[u]) continue;
        inMST[u] = true;
        cost += d;
        count++;
        
        // Add edges to ALL other points (complete graph)
        for (int v = 0; v < n; v++) {
            if (!inMST[v]) {
                int dist = Math.abs(points[u][0] - points[v][0]) + 
                           Math.abs(points[u][1] - points[v][1]);
                pq.offer(new int[]{dist, v});
            }
        }
    }
    return cost;
}

Classic interview problem! Connect all coordinate points with minimum total cable. Uses Manhattan distance (|x1-x2| + |y1-y2|) as edge weight. It's a complete graph (every point connects to every other), so we compute distances on-the-fly instead of storing adjacency list.

Practice Questions: Minimum Spanning Tree

Problem: Given points array where points[i] = [xi, yi]. The cost to connect two points is the Manhattan distance. Return minimum cost to connect all points.

Example: [[0,0],[2,2],[3,10],[5,2],[7,0]] → 20

Show Solution (Prim's)

int minCostConnectPoints(int[][] points) {
    int n = points.length;
    boolean[] inMST = new boolean[n];
    // {cost, pointIndex}
    PriorityQueue pq = new PriorityQueue<>((a, b) -> a[0] - b[0]);
    
    pq.offer(new int[]{0, 0});  // Start from point 0
    int totalCost = 0;
    int edgesUsed = 0;
    
    while (edgesUsed < n) {
        int[] curr = pq.poll();
        int cost = curr[0], u = curr[1];
        
        if (inMST[u]) continue;
        
        inMST[u] = true;
        totalCost += cost;
        edgesUsed++;
        
        // Add edges to all non-MST points
        for (int v = 0; v < n; v++) {
            if (!inMST[v]) {
                int dist = Math.abs(points[u][0] - points[v][0]) + 
                           Math.abs(points[u][1] - points[v][1]);
                pq.offer(new int[]{dist, v});
            }
        }
    }
    
    return totalCost;
}

Problem: There are n cities labeled 1 to n. Given connections where connections[i] = [city1, city2, cost]. Return minimum cost to connect all cities, or -1 if impossible.

Example: n = 3, connections = [[1,2,5],[1,3,6],[2,3,1]] → 6 (use edges 2-3 and 1-2)

Show Solution (Kruskal's)

int minimumCost(int n, int[][] connections) {
    // Sort edges by cost
    Arrays.sort(connections, (a, b) -> a[2] - b[2]);
    
    int[] parent = new int[n + 1];
    for (int i = 1; i <= n; i++) parent[i] = i;
    
    int totalCost = 0;
    int edgesUsed = 0;
    
    for (int[] conn : connections) {
        int u = conn[0], v = conn[1], cost = conn[2];
        
        int pu = find(parent, u), pv = find(parent, v);
        
        if (pu != pv) {
            parent[pu] = pv;  // Union
            totalCost += cost;
            edgesUsed++;
            
            if (edgesUsed == n - 1) break;  // MST complete
        }
    }
    
    return edgesUsed == n - 1 ? totalCost : -1;
}

int find(int[] parent, int x) {
    if (parent[x] != x) {
        parent[x] = find(parent, parent[x]);
    }
    return parent[x];
}

Problem: There are n houses. You can either build a well inside a house (cost wells[i]) or lay pipes between houses (pipes[j] = [house1, house2, cost]). Find minimum cost to supply water to all houses.

Key Insight: Add a virtual node 0 (water source). Connect node 0 to each house with cost = wells[i]. Now it's a standard MST problem!

Show Solution

int minCostToSupplyWater(int n, int[] wells, int[][] pipes) {
    // Create edges: virtual node 0 → house i with cost wells[i-1]
    List edges = new ArrayList<>();
    
    for (int i = 0; i < n; i++) {
        edges.add(new int[]{0, i + 1, wells[i]});
    }
    
    for (int[] pipe : pipes) {
        edges.add(pipe);
    }
    
    // Kruskal's MST
    Collections.sort(edges, (a, b) -> a[2] - b[2]);
    
    int[] parent = new int[n + 1];
    for (int i = 0; i <= n; i++) parent[i] = i;
    
    int totalCost = 0;
    
    for (int[] edge : edges) {
        int u = edge[0], v = edge[1], cost = edge[2];
        int pu = find(parent, u), pv = find(parent, v);
        
        if (pu != pv) {
            parent[pu] = pv;
            totalCost += cost;
        }
    }
    
    return totalCost;
}

int find(int[] parent, int x) {
    if (parent[x] != x) parent[x] = find(parent, parent[x]);
    return parent[x];
}

Strongly Connected Components

What is a Strongly Connected Component (SCC)?

A Strongly Connected Component (SCC) is a maximal group of vertices in a directed graph where every vertex can reach every other vertex through directed edges. "Strongly connected" means bidirectional reachability: if you can go from A to B, you can also go from B to A (possibly via different paths). "Maximal" means you can't add any more vertices while maintaining this property. Unlike regular connected components (for undirected graphs), SCCs only apply to directed graphs.

SCCs reveal the core structure of directed graphs! They're used in web page analysis (finding tightly linked website clusters), social network analysis (detecting communities with mutual follows), compiler optimization (finding loops in control flow), 2-SAT problems, and deadlock detection. The famous algorithms are Kosaraju's (two DFS passes) and Tarjan's (single DFS with low-link values).

Bidirectional Reach

Within an SCC, you can get from ANY vertex to ANY other vertex. A→B→C→A forms one SCC because you can reach any vertex from any other by following directed edges.

Kosaraju's Algorithm

Two DFS passes: 1) DFS on original graph, record finish times. 2) DFS on reversed graph in decreasing finish order. Each DFS tree = one SCC. O(V+E).

SCC Condensation

You can "condense" a graph by replacing each SCC with a single node. The resulting graph is always a DAG! This enables topological sort on the condensed graph.

Real-World Analogy

Imagine a city's one-way street system. An SCC is like a neighborhood where you can drive from any intersection to any other intersection (even though streets are one-way) - you might take a roundabout route, but you can always get there. Some intersections might only connect one-way to another neighborhood - those are in different SCCs. Finding SCCs tells you which neighborhoods have full internal connectivity!

Kosaraju's Algorithm

// Kosaraju's Algorithm - O(V + E)
class Kosaraju {
    List> adj;         // Original graph
    List> reverseAdj;  // Reversed graph
    boolean[] visited;                // Track visited vertices
    Stack finishOrder;       // Vertices ordered by finish time
}

Set up data structures for both original and reversed graph. The key insight: we need to process vertices in a specific order (finish time from first DFS), then run DFS on reversed graph. This isolates each SCC.

    List> findSCCs(int n, List> graph) {
        adj = graph;
        visited = new boolean[n];
        finishOrder = new Stack<>();
        
        // Step 1: DFS on original graph, record finish order
        for (int i = 0; i < n; i++) {
            if (!visited[i]) {
                dfs1(i);
            }
        }
    }
    
    void dfs1(int u) {
        visited[u] = true;
        for (int v : adj.get(u)) {
            if (!visited[v]) dfs1(v);
        }
        finishOrder.push(u);  // Push AFTER all descendants done
    }

First DFS on original graph records "finish times" by pushing to stack after exploring all descendants. Vertices that finish last are pushed last (top of stack). This ordering is crucial - it ensures we process "source" SCCs first in step 3.

        // Step 2: Build reverse graph (flip all edges)
        reverseAdj = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            reverseAdj.add(new ArrayList<>());
        }
        for (int u = 0; u < n; u++) {
            for (int v : adj.get(u)) {
                reverseAdj.get(v).add(u);  // Reverse: v -> u
            }
        }

Reverse all edge directions. If original had A→B, reversed has B→A. Why? In reversed graph, you can only reach vertices that could originally reach YOU. Combined with finish order, this traps DFS within each SCC.

        // Step 3: DFS on reverse graph in finish order
        Arrays.fill(visited, false);
        List> sccs = new ArrayList<>();
        
        while (!finishOrder.isEmpty()) {
            int v = finishOrder.pop();  // Process by finish order
            if (!visited[v]) {
                List scc = new ArrayList<>();
                dfs2(v, scc);
                sccs.add(scc);  // Each DFS tree = one SCC!
            }
        }
        
        return sccs;
    }

Process vertices in decreasing finish time (pop from stack). Each DFS on reversed graph explores exactly one SCC - it can't "escape" to other SCCs because edges pointing out are now pointing in, and we process in correct order.

    void dfs2(int u, List scc) {
        visited[u] = true;
        scc.add(u);  // Add to current SCC
        
        for (int v : reverseAdj.get(u)) {
            if (!visited[v]) {
                dfs2(v, scc);
            }
        }
    }
}

Simple DFS that collects all vertices reachable in reversed graph from starting point. Since we process in finish order on reversed graph, all vertices reached in one DFS call form exactly one strongly connected component.

// Count number of SCCs in a directed graph
int countSCCs(int n, List> adj) {
    Kosaraju k = new Kosaraju();
    return k.findSCCs(n, adj).size();
}

Simple wrapper to count SCCs. Each element in the returned list is one SCC. Useful for analyzing graph structure - a graph with 1 SCC means every vertex can reach every other vertex (strongly connected graph).

Practice Questions: Strongly Connected Components

Problem: Given a directed graph with n vertices and edges, count the number of strongly connected components.

Example: n = 5, edges = [[0,1],[1,2],[2,0],[1,3],[3,4]] → 3 SCCs: {0,1,2}, {3}, {4}

Show Solution (Kosaraju's)

int countSCCs(int n, int[][] edges) {
    List> adj = new ArrayList<>();
    List> revAdj = new ArrayList<>();
    
    for (int i = 0; i < n; i++) {
        adj.add(new ArrayList<>());
        revAdj.add(new ArrayList<>());
    }
    
    for (int[] e : edges) {
        adj.get(e[0]).add(e[1]);
        revAdj.get(e[1]).add(e[0]);
    }
    
    // Step 1: Get finish order
    boolean[] visited = new boolean[n];
    Stack stack = new Stack<>();
    
    for (int i = 0; i < n; i++) {
        if (!visited[i]) dfs1(i, adj, visited, stack);
    }
    
    // Step 2: Process in reverse finish order on reversed graph
    Arrays.fill(visited, false);
    int sccCount = 0;
    
    while (!stack.isEmpty()) {
        int node = stack.pop();
        if (!visited[node]) {
            dfs2(node, revAdj, visited);
            sccCount++;
        }
    }
    
    return sccCount;
}

void dfs1(int u, List> adj, boolean[] visited, Stack stack) {
    visited[u] = true;
    for (int v : adj.get(u)) {
        if (!visited[v]) dfs1(v, adj, visited, stack);
    }
    stack.push(u);
}

void dfs2(int u, List> revAdj, boolean[] visited) {
    visited[u] = true;
    for (int v : revAdj.get(u)) {
        if (!visited[v]) dfs2(v, revAdj, visited);
    }
}

Problem: Find all critical connections (bridges) in a network. A connection is critical if removing it disconnects two servers.

Example: n = 4, connections = [[0,1],[1,2],[2,0],[1,3]] → [[1,3]]

Hint: Use Tarjan's algorithm with discovery time and low-link values.

Show Solution (Tarjan's)

List> criticalConnections(int n, List> connections) {
    List> adj = new ArrayList<>();
    for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
    
    for (List conn : connections) {
        adj.get(conn.get(0)).add(conn.get(1));
        adj.get(conn.get(1)).add(conn.get(0));
    }
    
    int[] disc = new int[n];  // Discovery time
    int[] low = new int[n];   // Lowest reachable discovery time
    Arrays.fill(disc, -1);
    
    List> bridges = new ArrayList<>();
    int[] time = {0};
    
    for (int i = 0; i < n; i++) {
        if (disc[i] == -1) {
            dfs(i, -1, adj, disc, low, time, bridges);
        }
    }
    
    return bridges;
}

void dfs(int u, int parent, List> adj, 
         int[] disc, int[] low, int[] time, List> bridges) {
    disc[u] = low[u] = time[0]++;
    
    for (int v : adj.get(u)) {
        if (v == parent) continue;
        
        if (disc[v] == -1) {
            dfs(v, u, adj, disc, low, time, bridges);
            low[u] = Math.min(low[u], low[v]);
            
            // If low[v] > disc[u], edge u-v is a bridge
            if (low[v] > disc[u]) {
                bridges.add(Arrays.asList(u, v));
            }
        } else {
            low[u] = Math.min(low[u], disc[v]);
        }
    }
}

Problem: Given a directed graph, find the minimum number of edges to add to make it strongly connected.

Key Insight: After finding SCCs, condense the graph into a DAG of SCCs. Count nodes with in-degree 0 (sources) and out-degree 0 (sinks). Answer = max(sources, sinks), unless graph is already one SCC.

Show Solution

int minEdgesToMakeSCC(int n, int[][] edges) {
    // Step 1: Find all SCCs using Kosaraju's
    List> sccs = findSCCs(n, edges);
    
    if (sccs.size() == 1) return 0;  // Already strongly connected
    
    // Step 2: Map each node to its SCC index
    int[] sccId = new int[n];
    for (int i = 0; i < sccs.size(); i++) {
        for (int node : sccs.get(i)) {
            sccId[node] = i;
        }
    }
    
    // Step 3: Build condensed graph and count in/out degrees
    int sccCount = sccs.size();
    Set hasIncoming = new HashSet<>();
    Set hasOutgoing = new HashSet<>();
    
    for (int[] edge : edges) {
        int u = sccId[edge[0]], v = sccId[edge[1]];
        if (u != v) {
            hasOutgoing.add(u);
            hasIncoming.add(v);
        }
    }
    
    int sources = sccCount - hasIncoming.size();  // SCCs with no incoming edges
    int sinks = sccCount - hasOutgoing.size();    // SCCs with no outgoing edges
    
    return Math.max(sources, sinks);
}

Bipartite Graphs and Matching

What is a Bipartite Graph?

A Bipartite Graph is a graph whose vertices can be divided into two disjoint sets U and V such that every edge connects a vertex in U to a vertex in V. In other words, the graph can be two-colored: no two adjacent vertices share the same color. Think of it like a dating app where you can only match users from Group A with users from Group B - no matches within the same group!

Bipartite graphs model many real-world scenarios: job assignments (workers to tasks), student-course enrollment, stable matching (hospital-resident matching), recommendation systems (users to products), and network flow problems. The key insight is that bipartite graphs have no odd-length cycles - this is how we detect them using BFS/DFS two-coloring!

Two-Colorable

If you can color all vertices with 2 colors such that no edge connects same-colored vertices, it's bipartite. Odd cycles make this impossible.

Two Disjoint Sets

Vertices split into sets U and V. All edges go between sets, never within a set. Perfect for modeling "matching" problems.

Maximum Matching

Find the largest set of edges with no shared vertices. Used in job assignment, resource allocation, and pairing problems.

Real-World Example:

Imagine a job fair where students (Set A) meet companies (Set B). Each student can interview with multiple companies, and each company can interview multiple students. This is naturally bipartite - students only connect to companies, never to other students. Maximum bipartite matching helps us maximize the number of job offers where each student gets at most one job and each position is filled by at most one student!

Bipartite Check - BFS Two-Coloring

// Check if graph is bipartite using BFS two-coloring
// Time: O(V + E), Space: O(V)
boolean isBipartite(int[][] graph) {
    int n = graph.length;
    int[] color = new int[n];  // 0 = uncolored, 1 = red, -1 = blue

Method signature and color array initialization. We use a color array where 0 means uncolored, 1 represents red, and -1 represents blue. This allows us to easily assign opposite colors using negation.

    // Handle disconnected components
    for (int start = 0; start < n; start++) {
        if (color[start] != 0) continue;  // Already colored
        
        Queue queue = new LinkedList<>();
        queue.offer(start);
        color[start] = 1;  // Color first vertex red

Loop through all vertices to handle disconnected components. If a vertex is already colored, skip it. Otherwise, initialize a BFS queue and color the starting vertex red (1) to begin the two-coloring process for this component.

        while (!queue.isEmpty()) {
            int u = queue.poll();
            
            for (int v : graph[u]) {
                if (color[v] == 0) {
                    // Uncolored - give opposite color
                    color[v] = -color[u];
                    queue.offer(v);
                } else if (color[v] == color[u]) {
                    // Same color as neighbor - NOT bipartite!
                    return false;
                }
            }
        }
    }

BFS traversal to color all vertices. For each uncolored neighbor, assign the opposite color using negation (-color[u]). If we find a neighbor with the same color as the current vertex, we've detected an odd-length cycle, which means the graph cannot be bipartite.

    return true;  // Successfully two-colored
}

If we successfully color all vertices without any same-color conflicts, the graph is bipartite. All vertices can be divided into two groups where no two vertices in the same group are adjacent.

Bipartite Check - DFS Approach

// Check if graph is bipartite using DFS
boolean isBipartiteDFS(int[][] graph) {
    int n = graph.length;
    int[] color = new int[n];  // 0 = uncolored

Main method signature and initialization. We create a color array where 0 indicates an uncolored vertex. The DFS approach uses recursion instead of an explicit queue to traverse the graph.

    for (int i = 0; i < n; i++) {
        if (color[i] == 0 && !dfs(graph, color, i, 1)) {
            return false;
        }
    }
    
    return true;
}

Iterate through all vertices to handle disconnected components. For each uncolored vertex, start a DFS with color 1 (red). If any DFS returns false (conflict found), the graph is not bipartite. Return true only if all components are successfully two-colored.

boolean dfs(int[][] graph, int[] color, int u, int c) {
    color[u] = c;

DFS helper method that takes the current vertex u and the color c to assign. First, color the current vertex with the given color before exploring its neighbors.

    for (int v : graph[u]) {
        if (color[v] == 0) {
            // Uncolored - try to color with opposite
            if (!dfs(graph, color, v, -c)) {
                return false;
            }
        } else if (color[v] == c) {
            // Same color conflict!
            return false;
        }
    }

Explore all neighbors of the current vertex. If a neighbor is uncolored, recursively try to color it with the opposite color (-c). If the recursive call fails or if a neighbor already has the same color as the current vertex, we've found a conflict indicating an odd cycle.

    return true;
}

If all neighbors are successfully colored without conflicts, return true. This propagates back up the recursion stack, confirming that this subtree of the DFS is bipartite.

Maximum Bipartite Matching - Hungarian Algorithm

// Maximum Bipartite Matching using DFS (Kuhn's Algorithm)
// Given bipartite graph with left vertices [0, m) and right vertices [0, n)
// adj[u] contains neighbors of left vertex u in the right set
class BipartiteMatching {
    int m, n;  // m = left set size, n = right set size
    List> adj;  // Adjacency list for left vertices
    int[] matchL, matchR;  // matchL[u] = right vertex matched to u, matchR[v] = left vertex matched to v
    boolean[] visited;

Class structure for bipartite matching. We maintain two match arrays: matchL[u] stores which right vertex is matched to left vertex u, and matchR[v] stores which left vertex is matched to right vertex v. The visited array prevents revisiting vertices during augmenting path search.

    int maxMatching(int m, int n, List> adj) {
        this.m = m;
        this.n = n;
        this.adj = adj;
        
        matchL = new int[m];
        matchR = new int[n];
        Arrays.fill(matchL, -1);
        Arrays.fill(matchR, -1);
        
        int matching = 0;

Main method initialization. Store graph dimensions and adjacency list. Initialize both match arrays with -1 to indicate all vertices start unmatched. The matching counter will track the total number of matched pairs found.

        // Try to find augmenting path from each unmatched left vertex
        for (int u = 0; u < m; u++) {
            visited = new boolean[n];
            if (dfs(u)) {
                matching++;
            }
        }
        
        return matching;
    }

Iterate through all left vertices and try to find an augmenting path for each. Reset the visited array for each search to allow vertices to be reconsidered. If DFS finds a valid augmenting path, increment the matching count. Return the total number of matched pairs.

    // Try to find augmenting path starting from left vertex u
    boolean dfs(int u) {
        for (int v : adj.get(u)) {
            if (visited[v]) continue;
            visited[v] = true;

DFS helper to find an augmenting path. For each neighbor v of left vertex u, mark it as visited to prevent cycles. Skip already visited vertices during this search iteration.

            // If v is unmatched OR we can find alternate path for v's current match
            if (matchR[v] == -1 || dfs(matchR[v])) {
                matchL[u] = v;
                matchR[v] = u;
                return true;
            }
        }
        
        return false;
    }
}

The key matching logic. If right vertex v is unmatched (matchR[v] == -1), we can directly match u with v. If v is already matched, we recursively try to find an alternate match for v's current partner. If either succeeds, update both match arrays and return true. Return false if no augmenting path exists.

// Example usage: Job Assignment Problem
int maxJobAssignments(int workers, int jobs, int[][] canDo) {
    // canDo[i] = list of jobs worker i can do
    List> adj = new ArrayList<>();
    for (int[] jobList : canDo) {
        adj.add(new ArrayList<>());
        for (int job : jobList) {
            adj.get(adj.size() - 1).add(job);
        }
    }
    
    BipartiteMatching bm = new BipartiteMatching();
    return bm.maxMatching(workers, jobs, adj);
}

Practical example: Job Assignment Problem. Given workers on the left and jobs on the right, where canDo[i] lists jobs that worker i can perform, find the maximum number of job assignments where each worker gets at most one job and each job is assigned to at most one worker.

Practice Questions: Bipartite Graphs

Problem: Given an undirected graph represented as adjacency list, return true if the graph is bipartite.

Example: graph = [[1,2,3],[0,2],[0,1,3],[0,2]] → false (nodes 0,1,2 form a triangle - odd cycle)

Example: graph = [[1,3],[0,2],[1,3],[0,2]] → true (can split into {0,2} and {1,3})

Show Solution

boolean isBipartite(int[][] graph) {
    int n = graph.length;
    int[] color = new int[n];  // 0 = unvisited
    
    for (int i = 0; i < n; i++) {
        if (color[i] != 0) continue;
        
        Queue queue = new LinkedList<>();
        queue.offer(i);
        color[i] = 1;
        
        while (!queue.isEmpty()) {
            int u = queue.poll();
            for (int v : graph[u]) {
                if (color[v] == 0) {
                    color[v] = -color[u];
                    queue.offer(v);
                } else if (color[v] == color[u]) {
                    return false;
                }
            }
        }
    }
    
    return true;
}

Problem: Given n people and dislikes array where dislikes[i] = [a, b] means person a and b dislike each other. Can we split everyone into two groups such that no two people in the same group dislike each other?

Example: n = 4, dislikes = [[1,2],[1,3],[2,4]] → true (Group 1: {1,4}, Group 2: {2,3})

Show Solution

boolean possibleBipartition(int n, int[][] dislikes) {
    List> adj = new ArrayList<>();
    for (int i = 0; i <= n; i++) adj.add(new ArrayList<>());
    
    for (int[] d : dislikes) {
        adj.get(d[0]).add(d[1]);
        adj.get(d[1]).add(d[0]);
    }
    
    int[] color = new int[n + 1];
    
    for (int i = 1; i <= n; i++) {
        if (color[i] != 0) continue;
        
        Queue queue = new LinkedList<>();
        queue.offer(i);
        color[i] = 1;
        
        while (!queue.isEmpty()) {
            int u = queue.poll();
            for (int v : adj.get(u)) {
                if (color[v] == 0) {
                    color[v] = -color[u];
                    queue.offer(v);
                } else if (color[v] == color[u]) {
                    return false;
                }
            }
        }
    }
    
    return true;
}

Problem: There are m boys and n girls at a party. grid[i][j] = 1 means boy i can invite girl j. Each boy can invite at most one girl, and each girl can accept at most one invitation. Return the maximum number of accepted invitations.

Example: grid = [[1,1,1],[1,0,1],[0,0,1]] → 3

Hint: This is maximum bipartite matching!

Show Solution

int maximumInvitations(int[][] grid) {
    int m = grid.length, n = grid[0].length;
    int[] matchGirl = new int[n];  // matchGirl[j] = boy matched to girl j
    Arrays.fill(matchGirl, -1);
    
    int result = 0;
    
    for (int boy = 0; boy < m; boy++) {
        boolean[] visited = new boolean[n];
        if (canMatch(grid, boy, visited, matchGirl)) {
            result++;
        }
    }
    
    return result;
}

boolean canMatch(int[][] grid, int boy, boolean[] visited, int[] matchGirl) {
    for (int girl = 0; girl < grid[0].length; girl++) {
        if (grid[boy][girl] == 0 || visited[girl]) continue;
        
        visited[girl] = true;
        
        // If girl is unmatched OR her current match can find someone else
        if (matchGirl[girl] == -1 || canMatch(grid, matchGirl[girl], visited, matchGirl)) {
            matchGirl[girl] = boy;
            return true;
        }
    }
    
    return false;
}

Key Takeaways

Topological Sort = DAG

Kahn's (BFS with in-degree) or DFS with stack. O(V+E).

Union-Find = Connectivity

Path compression + rank gives nearly O(1). Great for dynamic connectivity.

MST: Kruskal vs Prim

Kruskal: Sort edges + Union-Find. Prim: PriorityQueue like Dijkstra.

SCC = Kosaraju/Tarjan

Two DFS passes (Kosaraju) or one pass with low-link (Tarjan).

Cycle Detection Matters

TopoSort fails on cycles. Union-Find detects cycles in undirected graphs.

Choose Algorithm Wisely

Kruskal for sparse, Prim for dense graphs. Know trade-offs.

What You'll Learn

Contents

Topological Sort

What is Topological Sort?

DAG Required

Multiple Valid Orders

Two Approaches

Real-World Analogy

Visual: Topological Sort Example DAG Ordering

Kahn's Algorithm Steps

Result Order

Kahn's Algorithm (BFS)

DFS Approach

Union-Find (Disjoint Set Union)

What is Union-Find (Disjoint Set Union)?

Find Operation

Union Operation

Near O(1) Time

Real-World Analogy

Visual: Union-Find Operations Path Compression

Path Compression

Union by Rank

Practice Questions: Union-Find

Easy Number of Provinces (Friend Circles)

Medium Redundant Connection - Find the extra edge

Hard Accounts Merge - Group emails by user

Minimum Spanning Tree

What is a Minimum Spanning Tree (MST)?

Kruskal's Algorithm

Prim's Algorithm

MST Properties

Real-World Analogy

Visual: MST Algorithms Comparison Kruskal vs Prim

Kruskal's

Prim's

Kruskal's Algorithm (Union-Find)

Prim's Algorithm (Priority Queue)

Practice Questions: Minimum Spanning Tree

Medium Min Cost to Connect All Points

Medium Connecting Cities With Minimum Cost

Hard Optimize Water Distribution in a Village

Strongly Connected Components

What is a Strongly Connected Component (SCC)?

Bidirectional Reach

Kosaraju's Algorithm

SCC Condensation

Real-World Analogy

Kosaraju's Algorithm

Practice Questions: Strongly Connected Components

Medium Count Strongly Connected Components

Hard Critical Connections in a Network (Bridges)

Hard Minimum Edges to Make Graph Strongly Connected

Bipartite Graphs and Matching

What is a Bipartite Graph?

Two-Colorable

Two Disjoint Sets

Maximum Matching

Bipartite Check - BFS Two-Coloring

Bipartite Check - DFS Approach

Maximum Bipartite Matching - Hungarian Algorithm

Practice Questions: Bipartite Graphs

Medium Is Graph Bipartite?

Medium Possible Bipartition (Party Groups)

Hard Maximum Number of Accepted Invitations

Key Takeaways

Topological Sort = DAG

Union-Find = Connectivity

MST: Kruskal vs Prim

SCC = Kosaraju/Tarjan

Cycle Detection Matters

Choose Algorithm Wisely

Knowledge Check