📖 Chapter 8.2 ⏱️ ~80 min read 🎯 Gold

Chapter 8.2: Topological Sort & DAG DP

📝 Before You Continue: This chapter requires Chapter 5.1 (graph representation), Chapter 5.2 (BFS/DFS), and Chapter 6.1–6.2 (DP fundamentals). You should be comfortable with adjacency lists and basic memoization.

A directed acyclic graph (DAG) is a directed graph with no cycles. DAGs model dependency relationships — tasks, prerequisites, build systems, puzzle states — and support a special algorithm called topological sort that orders vertices such that every edge goes from earlier to later.

Learning objectives:

  • Implement topological sort via Kahn's (BFS-based) and DFS-based algorithms
  • Detect cycles in directed graphs
  • Apply DP on DAGs: longest path, counting paths, critical path analysis
  • Find Strongly Connected Components (SCCs) using Tarjan's or Kosaraju's algorithm
  • Build condensation DAGs and apply DAG DP on general directed graphs
  • Solve 2-SAT problems by reducing to SCC
  • Recognize "ordering under constraints" problems as topological sort

8.2.0 What Is a DAG?

A directed acyclic graph has edges with direction and no cycles:

DAG (valid):          NOT a DAG (has cycle):
  A → B → D              A → B
  ↓       ↓              ↑   ↓
  C ──────►E              D ← C

DAGs arise naturally in:

  • Prerequisites: course A must come before course B
  • Build systems: compile module A before module B
  • State machines: puzzle states where you can't return to a previous state
  • Task scheduling: event A must happen before event B

The key operation on a DAG is topological ordering: arrange all vertices in a linear sequence such that for every directed edge (u → v), vertex u appears before v.

DAG:                  Valid topological orderings:
A → B → D             A, C, B, D, E
↓       ↑             A, B, C, D, E
C ──────►             (multiple valid orderings may exist)

8.2.1 Kahn's Algorithm (BFS-based Topological Sort)

Core idea: Repeatedly remove vertices with in-degree 0 (no prerequisites). When a vertex is removed, its successors' in-degrees decrease by 1; any that reach 0 are added to the queue.

#include <bits/stdc++.h>
using namespace std;

// Returns topological order, or empty vector if cycle detected
vector<int> topoSort(int n, vector<vector<int>>& adj) {
    // Step 1: compute in-degrees
    vector<int> indegree(n, 0);
    for (int u = 0; u < n; u++)
        for (int v : adj[u])
            indegree[v]++;

    // Step 2: enqueue all sources (in-degree 0)
    queue<int> q;
    for (int i = 0; i < n; i++)
        if (indegree[i] == 0)
            q.push(i);

    vector<int> order;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        order.push_back(u);

        for (int v : adj[u]) {
            indegree[v]--;             // remove edge u → v
            if (indegree[v] == 0)      // v's last prerequisite is done
                q.push(v);
        }
    }

    // If order doesn't contain all vertices, there's a cycle
    if ((int)order.size() != n) return {};  // cycle detected
    return order;
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, m;
    cin >> n >> m;

    vector<vector<int>> adj(n);
    for (int i = 0; i < m; i++) {
        int u, v;
        cin >> u >> v;
        u--; v--;
        adj[u].push_back(v);
    }

    vector<int> order = topoSort(n, adj);
    if (order.empty()) {
        cout << "Cycle detected — no topological order\n";
    } else {
        for (int v : order) cout << v + 1 << " ";
        cout << "\n";
    }

    return 0;
}

Complexity: O(V + E)

💡 Cycle detection: If Kahn's produces fewer than N vertices in the output, there's a cycle. The "stuck" vertices are part of the cycle (their in-degrees never reach 0).

Tracing Kahn's Algorithm

Graph: 0→1, 0→2, 1→3, 2→3, 3→4

Initial in-degrees: [0, 1, 1, 2, 1]
Queue: [0]

Pop 0 → order=[0], decrease indegree of 1→0, 2→0
Queue: [1, 2]

Pop 1 → order=[0,1], decrease indegree of 3→1
Pop 2 → order=[0,1,2], decrease indegree of 3→0
Queue: [3]

Pop 3 → order=[0,1,2,3], decrease indegree of 4→0
Queue: [4]

Pop 4 → order=[0,1,2,3,4]

All 5 vertices processed → valid topological order!

8.2.2 DFS-based Topological Sort

An alternative using DFS: process vertices in reverse finish order.

#include <bits/stdc++.h>
using namespace std;

vector<int> adj_list[100001];
vector<int> topo_order;
int color[100001];  // 0=white(unvisited), 1=gray(in stack), 2=black(done)
bool has_cycle = false;

void dfs(int u) {
    color[u] = 1;  // mark as in-progress
    for (int v : adj_list[u]) {
        if (color[v] == 1) {
            has_cycle = true;  // back edge → cycle!
            return;
        }
        if (color[v] == 0)
            dfs(v);
    }
    color[u] = 2;           // mark as fully processed
    topo_order.push_back(u);  // add to order AFTER finishing subtree
}

int main() {
    int n, m;
    cin >> n >> m;
    // ... read edges ...

    for (int i = 0; i < n; i++)
        if (color[i] == 0)
            dfs(i);

    reverse(topo_order.begin(), topo_order.end());  // ← KEY: reverse finish order
    // topo_order is now a valid topological ordering
}

Why reverse finish order? In DFS, a vertex finishes (gets added to the list) only after all reachable vertices are processed. So later vertices in DFS finish first; reversing gives the topological order.

⚠️ Kahn's vs DFS: Both work. Kahn's is slightly easier to reason about and naturally detects cycles via count. DFS-based is sometimes cleaner for recursive implementations.


8.2.3 DP on DAGs

Once you have a topological ordering, you can run DP on the DAG efficiently: process vertices in topological order and update each vertex's state based on its predecessors.

Key insight: In a topological ordering, when you process vertex v, all predecessors of v have already been processed. So dp[v] can be computed from dp[predecessors].

Longest Path in a DAG

// Longest path ending at each vertex
vector<int> dp(n, 0);  // dp[v] = longest path ending at v

// Process in topological order
for (int u : topo_order) {
    for (int v : adj[u]) {
        dp[v] = max(dp[v], dp[u] + edge_weight[u][v]);
        //           ↑ current best    ↑ extend path through u→v
    }
}

int ans = *max_element(dp.begin(), dp.end());

Counting Paths from Source to Each Vertex

vector<long long> cnt(n, 0);
cnt[source] = 1;  // one way to reach the source

for (int u : topo_order) {
    for (int v : adj[u]) {
        cnt[v] += cnt[u];  // add all ways to reach u, extended by u→v
        cnt[v] %= MOD;     // if answer needs mod
    }
}
// cnt[t] = number of paths from source to t

USACO-Style Example: Critical Path (Earliest Completion Time)

Tasks 1..N with durations. Task v cannot start until all prerequisite tasks are complete. Find the earliest time each task can start.

// earliest_start[v] = max over all predecessors u of (earliest_start[u] + duration[u])
vector<int> earliest(n, 0);

for (int u : topo_order) {
    for (int v : adj[u]) {
        earliest[v] = max(earliest[v], earliest[u] + duration[u]);
    }
}

// Total project completion time = max(earliest[v] + duration[v]) over all v
int finish_time = 0;
for (int v = 0; v < n; v++)
    finish_time = max(finish_time, earliest[v] + duration[v]);

8.2.4 DAG DP Problem Patterns in USACO Gold

Pattern 1: State Transition as DAG

Many DP problems can be visualized as a DAG where:

  • Vertices = DP states
  • Edges = transitions between states
  • DAG property = transitions only go "forward" (no cycles)

Recognizing this turns the DP recurrence into an explicit graph problem.

Pattern 2: Ordering/Scheduling

"N jobs with precedence constraints (job A must finish before job B). Find the order to schedule them / minimum number of stages / critical path."

Direct application of topological sort + DAG DP.

Pattern 3: Counting Paths with Constraints

"How many valid sequences of choices exist, given that choice B cannot follow choice A?"

Model choices as vertices, constraints as directed edges (A → B means "A before B is forbidden"), then count paths in the resulting DAG.

Pattern 4: Shortest/Longest Path in DAG

Shortest path in a DAG can be solved in O(V+E) via toposort + DP — faster than Dijkstra's O(E log V). If the graph has no negative edges but is also a DAG, prefer this approach.

// Shortest path from source s in a DAG (can handle negative weights!)
vector<int> dist(n, INT_MAX);
dist[s] = 0;

for (int u : topo_order) {
    if (dist[u] == INT_MAX) continue;
    for (auto [v, w] : adj[u]) {
        dist[v] = min(dist[v], dist[u] + w);
    }
}

8.2.5 Strongly Connected Components (SCCs)

A Strongly Connected Component (SCC) of a directed graph is a maximal set of vertices such that every vertex in the set is reachable from every other vertex in the set.

Example:
  0 → 1 → 2
  ↑   ↓   ↓
  └── 3   4

SCCs: {0, 1, 3} and {2} and {4}
- 0→1→3→0: all mutually reachable → one SCC
- 2 can be reached from SCC {0,1,3}, but can't return → separate SCC
- 4 similarly isolated

Why SCCs matter in USACO Gold:

  • Condensation DAG: After contracting each SCC to a single node, the result is always a DAG. This lets you apply DAG DP on graphs with cycles.
  • 2-SAT: Boolean satisfiability with 2 literals per clause reduces to SCC.
  • Reachability queries: "Can u reach v?" → check if they're in the same SCC, or if SCC(u) can reach SCC(v) in condensation DAG.

Tarjan's SCC Algorithm

Core idea: One DFS pass. Maintain a stack and two arrays:

  • disc[v]: DFS discovery time of v
  • low[v]: the smallest discovery time reachable from the subtree of v via at most one "back edge"

When low[v] == disc[v], v is the root of an SCC — pop all vertices above v from the stack.

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100001;
vector<int> adj[MAXN];

int disc[MAXN], low[MAXN], timer_val = 0;
bool on_stack[MAXN];
stack<int> stk;

int scc_id[MAXN];   // which SCC does each vertex belong to?
int scc_count = 0;

void dfs(int u) {
    disc[u] = low[u] = ++timer_val;
    stk.push(u);
    on_stack[u] = true;

    for (int v : adj[u]) {
        if (disc[v] == 0) {          // tree edge: v unvisited
            dfs(v);
            low[u] = min(low[u], low[v]);
        } else if (on_stack[v]) {    // back edge within current SCC
            low[u] = min(low[u], disc[v]);
        }
        // cross/forward edges (on_stack[v]==false, disc[v]!=0): ignore
    }

    // If u is the root of an SCC
    if (low[u] == disc[u]) {
        scc_count++;
        while (true) {
            int v = stk.top(); stk.pop();
            on_stack[v] = false;
            scc_id[v] = scc_count;
            if (v == u) break;
        }
    }
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, m;
    cin >> n >> m;
    for (int i = 0; i < m; i++) {
        int u, v; cin >> u >> v; u--; v--;
        adj[u].push_back(v);
    }

    for (int i = 0; i < n; i++)
        if (disc[i] == 0)
            dfs(i);

    cout << "Number of SCCs: " << scc_count << "\n";
    for (int i = 0; i < n; i++)
        cout << "vertex " << i << " → SCC " << scc_id[i] << "\n";

    return 0;
}

Complexity: O(V + E) — single DFS pass.

💡 Note on SCC numbering: Tarjan's algorithm assigns SCC IDs in reverse topological order of the condensation DAG. SCC with ID 1 is a sink in the DAG; SCC with the highest ID is a source.


Kosaraju's SCC Algorithm

Core idea: Two DFS passes.

  1. Pass 1: Run DFS on original graph; push vertices to a stack in finish order.
  2. Pass 2: On the transposed graph (all edges reversed), process vertices in reverse finish order (stack order). Each DFS tree in pass 2 is exactly one SCC.
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100001;
vector<int> adj[MAXN];      // original graph
vector<int> radj[MAXN];     // transposed (reversed) graph
bool visited[MAXN];
int scc_id[MAXN];
stack<int> finish_order;

// Pass 1: DFS on original graph, record finish order
void dfs1(int u) {
    visited[u] = true;
    for (int v : adj[u])
        if (!visited[v])
            dfs1(v);
    finish_order.push(u);   // push when fully finished
}

// Pass 2: DFS on transposed graph, assign SCC
void dfs2(int u, int id) {
    visited[u] = true;
    scc_id[u] = id;
    for (int v : radj[u])
        if (!visited[v])
            dfs2(v, id);
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, m;
    cin >> n >> m;
    for (int i = 0; i < m; i++) {
        int u, v; cin >> u >> v; u--; v--;
        adj[u].push_back(v);
        radj[v].push_back(u);   // reverse edge
    }

    // Pass 1: fill finish_order
    fill(visited, visited + n, false);
    for (int i = 0; i < n; i++)
        if (!visited[i])
            dfs1(i);

    // Pass 2: process in reverse finish order on transposed graph
    fill(visited, visited + n, false);
    int scc_count = 0;
    while (!finish_order.empty()) {
        int u = finish_order.top(); finish_order.pop();
        if (!visited[u])
            dfs2(u, ++scc_count);
    }

    cout << "Number of SCCs: " << scc_count << "\n";
    return 0;
}

Complexity: O(V + E) — two DFS passes.

Tarjan's vs Kosaraju's

Tarjan'sKosaraju's
Passes1 DFS2 DFS
SpaceStack + arraysOriginal + reversed graph
SCC orderReverse topologicalTopological
Typical preferenceCompetitive programmingEasier to understand

💡 In USACO: Tarjan's is slightly more concise and is the preferred choice for competitive programming. Kosaraju's is easier to understand and debug.


Condensation DAG + DP

After finding SCCs, you can build the condensation DAG and run DP on it.

// Build condensation DAG after Tarjan's
// scc_id[v] = which SCC vertex v belongs to (1-indexed, reverse topo order)
// scc_count = total number of SCCs

vector<int> scc_adj[MAXN];   // edges in condensation DAG
set<pair<int,int>> seen;      // avoid duplicate edges

for (int u = 0; u < n; u++) {
    for (int v : adj[u]) {
        if (scc_id[u] != scc_id[v]) {
            // Edge between different SCCs
            auto e = make_pair(scc_id[u], scc_id[v]);
            if (!seen.count(e)) {
                seen.insert(e);
                scc_adj[scc_id[u]].push_back(scc_id[v]);
            }
        }
    }
}

// Now scc_adj is a DAG — apply DAG DP
// e.g., longest path in condensation DAG weighted by SCC size
vector<int> scc_size(scc_count + 1, 0);
for (int i = 0; i < n; i++) scc_size[scc_id[i]]++;

// DAG DP: dp[v] = max vertices on any path ending at SCC v
vector<int> dp(scc_count + 1, 0);
// Process in topological order (Tarjan gives reverse topo, so process 1..scc_count)
for (int u = 1; u <= scc_count; u++) {
    dp[u] += scc_size[u];
    for (int v : scc_adj[u]) {
        dp[v] = max(dp[v], dp[u]);
    }
}
int ans = *max_element(dp.begin() + 1, dp.end());

8.2.6 Difference Constraints (Bonus)

A system of difference constraints is a set of inequalities of the form:

x_j - x_i ≤ w_{ij}

These appear in USACO when problems ask: "assign values to variables such that all pairwise difference constraints are satisfied, and find the minimum/maximum assignment."

Key insight: A system of difference constraints is equivalent to a shortest path problem!

Transform each constraint x_j - x_i ≤ w into a directed edge i → j with weight w.

Then:

  • If the constraint graph has no negative cycles → a feasible solution exists
  • The tightest feasible solution is given by shortest paths from a source vertex
// Solve a system of difference constraints
// Constraints: x[b] - x[a] <= w  →  edge a→b with weight w
// Returns minimum valid assignment (x[i] = dist from virtual source)
// or empty vector if infeasible (negative cycle)

vector<long long> solve_difference_constraints(
        int n,                         // n variables x[0..n-1]
        vector<tuple<int,int,int>>& constraints  // {a, b, w}: x[b]-x[a]<=w
) {
    // Add virtual source s = n, with edges s→i weight 0 for all i
    // (allows all x[i] >= 0 and gives a common reference point)
    int s = n;
    vector<tuple<int,int,int>> edges = constraints;
    for (int i = 0; i < n; i++)
        edges.push_back({s, i, 0});   // x[i] - x[s] <= 0 → x[i] <= 0

    // Run Bellman-Ford from source s
    vector<long long> dist(n + 1, 0);  // source dist = 0

    for (int iter = 0; iter < n; iter++) {
        for (auto [u, v, w] : edges) {
            if (dist[u] + w < dist[v])
                dist[v] = dist[u] + w;
        }
    }

    // Check for negative cycles
    for (auto [u, v, w] : edges) {
        if (dist[u] + w < dist[v])
            return {};  // negative cycle → infeasible
    }

    return vector<long long>(dist.begin(), dist.begin() + n);
}

USACO Pattern: Problems that say "assign times/positions such that A is at least D after B" map directly to difference constraints.


8.2.7 2-SAT (Two-Satisfiability)

2-SAT is one of the most important applications of Tarjan's SCC. It solves the problem of assigning boolean values (true/false) to N variables such that a conjunction of 2-literal clauses is satisfied.

Problem Form

You have N boolean variables x₁, x₂, ..., xₙ (each can be true or false). You are given M clauses, each of the form:

(xᵢ = aᵢ) OR (xⱼ = aⱼ)

where aᵢ, aⱼ ∈ {true, false}.

Goal: Find an assignment satisfying all clauses, or report no solution exists.

💡 USACO disguise: "Choose for each group A or B. If you choose A from group i, you must choose B from group j." That's 2-SAT!

Building the Implication Graph

Key transformation: An OR clause (p OR q) is logically equivalent to two implications:

¬p → q      (if p is false, then q must be true)
¬q → p      (if q is false, then p must be true)

For each variable xᵢ, create two nodes: 2i (xᵢ = true) and 2i+1 (xᵢ = false, i.e., ¬xᵢ).

Variable xᵢ → node 2i   (xᵢ is TRUE)
              node 2i+1  (xᵢ is FALSE, ¬xᵢ)

For clause (xᵢ = a) OR (xⱼ = b):

  • Let p = node for xᵢ = a, ¬p = node for xᵢ = ¬a
  • Let q = node for xⱼ = b, ¬q = node for xⱼ = ¬b
  • Add edges: ¬p → q and ¬q → p

2-SAT Implementation

#include <bits/stdc++.h>
using namespace std;

struct TwoSat {
    int n;
    vector<vector<int>> adj, radj;
    vector<int> order, comp;
    vector<bool> visited;

    TwoSat(int n) : n(n), adj(2*n), radj(2*n), comp(2*n), visited(2*n) {}

    // Add clause: (var u is val_u) OR (var v is val_v)
    // val = true  → use node 2*var
    // val = false → use node 2*var+1
    void add_clause(int u, bool val_u, int v, bool val_v) {
        // ¬(u=val_u) → (v=val_v)
        adj[2*u + !val_u].push_back(2*v + val_v);
        radj[2*v + val_v].push_back(2*u + !val_u);
        // ¬(v=val_v) → (u=val_u)
        adj[2*v + !val_v].push_back(2*u + val_u);
        radj[2*u + val_u].push_back(2*v + !val_v);
    }

    // Force variable u to take value val (add unit clause: u=val is forced)
    // Equivalent to: add_clause(u, val, u, val)
    // i.e., either u=val OR u=val → u must be val
    void force(int u, bool val) {
        // ¬val → val  (i.e., if ¬val then val, which forces val)
        adj[2*u + !val].push_back(2*u + val);
        radj[2*u + val].push_back(2*u + !val);
    }

    void dfs1(int v) {
        visited[v] = true;
        for (int u : adj[v])
            if (!visited[u]) dfs1(u);
        order.push_back(v);
    }

    void dfs2(int v, int c) {
        comp[v] = c;
        for (int u : radj[v])
            if (comp[u] == -1) dfs2(u, c);
    }

    // Returns true if satisfiable; fills result[] with the solution
    bool solve(vector<bool>& result) {
        // Kosaraju's SCC on the implication graph
        fill(visited.begin(), visited.end(), false);
        for (int v = 0; v < 2*n; v++)
            if (!visited[v]) dfs1(v);

        fill(comp.begin(), comp.end(), -1);
        int c = 0;
        for (int i = (int)order.size()-1; i >= 0; i--) {
            if (comp[order[i]] == -1)
                dfs2(order[i], c++);
        }

        result.resize(n);
        for (int i = 0; i < n; i++) {
            // If xᵢ and ¬xᵢ are in the same SCC → contradiction → infeasible
            if (comp[2*i] == comp[2*i+1]) return false;
            // Choose: xᵢ = true iff SCC(xᵢ) comes later than SCC(¬xᵢ)
            // (Kosaraju assigns higher comp IDs to later SCCs in topo order)
            result[i] = comp[2*i] > comp[2*i+1];
        }
        return true;
    }
};

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, m;
    cin >> n >> m;  // n variables, m clauses

    TwoSat sat(n);

    for (int i = 0; i < m; i++) {
        // Read clause: (x[u] = a) OR (x[v] = b)
        // where u,v are 0-indexed, a,b are 0 or 1
        int u, a, v, b;
        cin >> u >> a >> v >> b;
        sat.add_clause(u, a, v, b);
    }

    vector<bool> result;
    if (sat.solve(result)) {
        for (int i = 0; i < n; i++)
            cout << "x[" << i << "] = " << result[i] << "\n";
    } else {
        cout << "UNSATISFIABLE\n";
    }

    return 0;
}

Complexity: O(N + M) — Kosaraju's SCC on the implication graph.

Why This Works

The fundamental insight: if xᵢ and ¬xᵢ end up in the same SCC, then the implication graph contains a path from xᵢ to ¬xᵢ AND from ¬xᵢ to xᵢ. This means xᵢ forces ¬xᵢ and ¬xᵢ forces xᵢ — a contradiction. No valid assignment exists.

If no variable is in the same SCC as its negation, we can always construct a valid assignment by choosing based on topological order.

Tracing a Small Example

3 variables: x₀, x₁, x₂
Clauses:
  (x₀ OR x₁)       → ¬x₀→x₁,  ¬x₁→x₀
  (¬x₀ OR x₂)      → x₀→x₂,   ¬x₂→¬x₀
  (¬x₁ OR ¬x₂)     → x₁→¬x₂,  x₂→¬x₁

Implication graph (nodes: 0=x₀T, 1=x₀F, 2=x₁T, 3=x₁F, 4=x₂T, 5=x₂F):
  1→2, 3→0  (from x₀ OR x₁)
  0→4, 5→1  (from ¬x₀ OR x₂)
  2→5, 4→3  (from ¬x₁ OR ¬x₂)

SCC analysis:
  No variable xi has comp[2i] == comp[2i+1] → SATISFIABLE
  Topological order gives: x₀=false, x₁=true, x₂=false  (one valid assignment)

USACO 2-SAT Patterns

Pattern 1: "Choose exactly one of A or B for each group"

For group i: variable xᵢ = true means "choose A", false means "choose B"
Constraint "if i chooses A then j must choose B":
  add_clause(i, false, j, false)  // ¬(i=A) OR ¬(j=A)

Pattern 2: "At most one of {A, B, C, D} is true" (n-variable at-most-one)

// Chain encoding: if xᵢ is true, then x_{i+1}..x_{n-1} are false
// For each pair (i, j) with i < j: add_clause(i, false, j, false)
// Naive is O(n²) clauses; chain trick gives O(n):
// Introduce auxiliary variables yᵢ = "at least one of x₀..xᵢ is true"
// y₀ = x₀
// yᵢ = yᵢ₋₁ OR xᵢ  →  model as 2-SAT implications

Pattern 3: "Assign each of N elements to side L or R with constraints"

xᵢ = true → element i is on the Left
xᵢ = false → element i is on the Right
Constraint "i and j cannot both be Left": add_clause(i, false, j, false)
Constraint "i must be Left if j is Right": add_clause(i, true, j, true)
  → but this is actually forcing: ¬(i=false) OR ¬(j=false)
  → wait: "i=Left OR j=Left" → add_clause(i, true, j, true)

💡 思路陷阱(Pitfall Patterns)

这一节整理 Gold 解题中最常见的判断失误——看起来应该用某种方法,实际上走错了方向。

陷阱 1:把含环的有向图当 DAG 处理

错误判断: "这题有拓扑顺序,用 toposort + DP 就行了" 实际情况: 图中可能有环(SCC),直接 toposort 会漏掉部分节点

反例:有向图 A→B→C→A→D
错误:toposort 输出 [D](只有 3 个节点在环外)
正确:先用 Tarjan 找 SCC,把 {A,B,C} 缩成一个超级节点,再在缩点 DAG 上做 DP

识别信号: 题目说"有向图"但没说"无环"→ 先检测环或求 SCC,再决定是否能用 toposort


陷阱 2:2-SAT 误认为是普通贪心

错误判断: "每个位置选 A 或 B,有约束,贪心从左到右扫一遍" 实际情况: 约束之间有传递性,贪心无法保证全局一致

反例:5个组,约束如下(若选 A₁ 则必须选 B₂,若选 A₂ 则必须选 B₃...)
贪心:组1选A₁ → 组2选B₂ → 组3选A₃ → 组4选B₄ → 组5可能无解
2-SAT:建立完整蕴含图,SCC 分析一次性得出全局一致解

识别信号: 每个位置/元素有两种选择 + 成对约束("如果...则...")→ 考虑 2-SAT


⚠️ Common Mistakes

  1. Confusing directed and undirected cycles: Topological sort only applies to directed graphs. In undirected graphs, any connected component has a spanning tree — no "cycle detection" needed.

  2. Off-by-one in DP initialization: For "count paths from source," initialize cnt[source] = 1, not cnt[source] = 0. For "longest path," initialize all dp[v] = 0 (not -∞) if you want the length in edges, or properly handle the case where paths may not exist.

  3. Forgetting to handle unreachable vertices: If dist[u] == INT_MAX in DAG shortest path, skip that vertex — extending from an unreachable vertex gives garbage values.

  4. Stack overflow on large DFS-based toposort: For N = 10⁵ with deep chains, recursive DFS may stack overflow. Prefer Kahn's (BFS-based) for large inputs.

  5. Using topological sort on a graph with cycles: Kahn's will silently return a partial ordering. Always check that order.size() == n.


📋 Chapter Summary

📌 Key Takeaways

ConceptSummary
DAGDirected graph with no cycles; models dependency/ordering
Topological sortLinear order where all edges go left→right; O(V+E)
Kahn's algorithmBFS-based; uses in-degree counts; naturally detects cycles
DFS toposortAdd vertex to result after DFS finishes it; reverse at end
Cycle detectionKahn's: count < N means cycle; DFS: gray→gray edge means cycle
DP on DAGProcess states in topo order; dp[v] depends only on dp[predecessors]
Longest pathdp[v] = max(dp[u] + weight) for all predecessors u of v
Counting pathscnt[v] = sum of cnt[u] for all predecessors u of v
SCC (Tarjan's)One DFS; disc[]/low[]/stack; O(V+E); gives reverse topo order
SCC (Kosaraju's)Two DFS on G and Gᵀ; O(V+E); gives topological order
Condensation DAGContract each SCC to one node; result is always a DAG
2-SATN boolean vars + 2-literal clauses; build implication graph → SCC; O(N+M)
Difference constraintsx[j]-x[i]≤w → edge i→j; feasibility = no negative cycle (Bellman-Ford)

❓ FAQ

Q: Is every tree a DAG? A: A rooted tree (with edges pointing from parent to child) is a DAG. An unrooted tree is not directed, so the question doesn't apply directly. If you root a tree, yes, it's a DAG.

Q: Can topological sort have multiple valid orderings? A: Yes. If two vertices have no dependency between them, either can come first. The unique ordering exists only if the DAG is a simple path (chain).

Q: Dijkstra vs DAG shortest path — when to use which? A: If the graph is a DAG, use toposort + DP: O(V+E), handles negative weights, simpler. If the graph has cycles but no negative edges, use Dijkstra: O(E log V). If there are cycles and negative edges, use Bellman-Ford: O(VE).

Q: How do I solve "minimum number of passes/phases to complete all tasks"? A: This is the "longest path" in the DAG (critical path). The minimum number of phases is 1 + length of the longest path.

🔗 Connections to Later Chapters

  • Ch.8.3 (Tree DP): Tree DP is DP on a special DAG (rooted tree). The techniques here generalize directly.
  • Ch.6.3 (Advanced DP): Bitmask DP states often form a DAG (transitions only go from subsets to supersets).
  • Ch.5.2 (BFS/DFS): Both algorithms used here are extensions of BFS/DFS from Ch.5.2.

🏋️ Practice Problems

🟢 Easy

8.2-E1. Course Schedule (LeetCode 207 equivalent) N courses, M prerequisites. Given prerequisite pairs (a, b) meaning "must take b before a," determine if you can complete all courses (i.e., if no cycle exists).

Hint

Run Kahn's algorithm. If the output contains all N courses, no cycle → possible. Otherwise, there's a cycle → impossible.


8.2-E2. Longest Path in a DAG Given N vertices, M directed edges with weights, and a source vertex S. Find the longest path starting from S.

Hint

Topological sort, then process vertices in topo order. Initialize dp[S] = 0, dp[others] = -∞. For each edge u→v: dp[v] = max(dp[v], dp[u] + w).


🟡 Medium

8.2-M1. Count Paths in Grid (Grid DP as DAG) In an N×M grid, you can move right or down. Some cells are blocked. Count the number of paths from (1,1) to (N,M).

Hint

The grid is a DAG (movements only go right/down). Process cells in row-major order (which is already topological order). cnt[i][j] = cnt[i-1][j] + cnt[i][j-1] if cell (i,j) is not blocked.


8.2-M2. Task Scheduling with Dependencies (USACO-style) N tasks with durations, M dependency edges. Each task can only start when all its prerequisites are complete. All tasks run in parallel when possible. Find the minimum total time to complete all tasks.

Hint

This is the critical path method (CPM). Run Kahn's toposort. For each vertex in topo order, compute earliest_start[v] = max(earliest_start[u] + duration[u]) over all predecessors u. Answer = max(earliest_start[v] + duration[v]).


🔴 Hard

8.2-H1. Counting Paths Modulo P (USACO Gold 2012 — Cow Rectangles) Given a DAG with N vertices (up to 10⁵) and M edges. For a source S and target T, count the number of paths from S to T modulo 10⁹+7. Some vertices are "marked"; count only paths that pass through at least one marked vertex.

Hint

Use inclusion-exclusion: (paths through ≥1 marked) = (all paths) − (paths through no marked vertices). For "paths through no marked vertices," simply remove marked vertices from the graph and recount.


🏆 Challenge

8.2-C1. SCC Condensation + DAG DP (Hard) Given a directed graph that may have cycles. Find the maximum number of vertices on any path in the condensation DAG (where each SCC is contracted to a single vertex).

Hint

Run Tarjan's or Kosaraju's to find SCCs and their sizes. Build the condensation DAG. Run longest-path DP on the condensation DAG, where each vertex's "value" is the SCC size. The answer is max dp[v].