📖 Chapter 6.2 ⏱️ ~110 min read 🎯 Advanced

Chapter 6.2: Classic DP Problems

📝 Before You Continue: Make sure you've mastered Chapter 6.1's core DP concepts — states, recurrences, and base cases. You should be able to implement Fibonacci and basic coin change from scratch.

In this chapter, we tackle three of the most important and widely-applied DP problems in competitive programming. Mastering these patterns will help you recognize and solve dozens of USACO problems.

6.2.1 Longest Increasing Subsequence (LIS)

Problem: Given an array A of N integers, find the length of the longest subsequence where elements are strictly increasing. A subsequence doesn't need to be contiguous.

Example: A = [3, 1, 8, 2, 5]

LIS: [1, 2, 5] → length 3
Or: [3, 8] → length 2 (not the longest)
Or: [1, 5] → length 2

💡 Key Insight: A subsequence can skip elements but must maintain relative order. The key DP insight: for each index i, ask "what's the longest increasing subsequence that ends at A[i]?" Then the answer is the maximum over all i.

LIS state transitions — A = [3, 1, 8, 2, 5]:

LIS State Transitions

💡 Transition rule: dp[i] = 1 + max(dp[j]) for all j < i where A[j] < A[i]. Each arrow represents "can extend the subsequence ending at j to include i".

LIS Visualization

The diagram above illustrates the LIS structure: arrows show which earlier elements each position can extend from, and highlighted elements form the longest increasing subsequence.

The diagram shows the array [3,1,4,1,5,9,2,6] with the LIS 1→4→5→6 highlighted in green. Each dp[i] value below the array shows the LIS length ending at that position. Arrows connect elements that extend the subsequence.

`O(N²)` DP Solution

State: dp[i] = length of the longest increasing subsequence ending at index i
Recurrence: dp[i] = 1 + max(dp[j]) for all j < i where A[j] < A[i]
Base case: dp[i] = 1 (a subsequence of just A[i])
Answer: max(dp[0], dp[1], ..., dp[N-1])

Step-by-step trace for A = [3, 1, 8, 2, 5]:

dp[0] = 1  (LIS ending at 3: just [3])

dp[1] = 1  (LIS ending at 1: just [1], since no j<1 with A[j]<1)

dp[2] = 2  (LIS ending at 8: A[0]=3 < 8 → dp[0]+1=2; A[1]=1 < 8 → dp[1]+1=2)
            Best: 2 ([3,8] or [1,8])

dp[3] = 2  (LIS ending at 2: A[1]=1 < 2 → dp[1]+1=2)
            Best: 2 ([1,2])

dp[4] = 3  (LIS ending at 5: A[1]=1 < 5 → dp[1]+1=2; A[3]=2 < 5 → dp[3]+1=3)
            Best: 3 ([1,2,5])

LIS length = max(dp) = 3

// Solution: LIS O(N²) — simple but too slow for N > 5000
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;
    vector<int> A(n);
    for (int &x : A) cin >> x;

    vector<int> dp(n, 1);  // every element alone is a subsequence of length 1

    for (int i = 1; i < n; i++) {
        for (int j = 0; j < i; j++) {
            if (A[j] < A[i]) {              // A[j] can extend subsequence ending at A[i]
                dp[i] = max(dp[i], dp[j] + 1);  // ← KEY LINE
            }
        }
    }

    cout << *max_element(dp.begin(), dp.end()) << "\n";
    return 0;
}

Sample Input: 5 / 3 1 8 2 5 → Output: 3

Complexity Analysis:

Time: O(N²) — double loop
Space: O(N) — dp array

For N ≤ 5000, O(N²) is fast enough. For N up to 10^5, we need the O(N log N) approach.

`O(N log N)` LIS with Binary Search (Patience Sorting)

The key idea: instead of tracking exact dp values, maintain a tails array where tails[k] = the smallest possible tail element of any increasing subsequence of length k+1 seen so far.

Why is this useful? Because if we can maintain this array, we can use binary search to find where to place each new element.

💡 Key Insight (Patience Sorting): Imagine dealing cards to piles. Each pile is a decreasing sequence (like Solitaire). A card goes on the leftmost pile whose top is ≥ it. If no such pile exists, start a new pile. The number of piles equals the LIS length! The tails array is exactly the tops of these piles.

Step-by-step trace for A = [3, 1, 8, 2, 5]:

Process 3: tails = [], no element ≥ 3, so push: tails = [3]
  → LIS length so far: 1

Process 1: tails = [3], lower_bound(1) hits index 0 (3 ≥ 1), replace:
  tails = [1]
  → LIS length still 1; but now the best 1-length subsequence ends in 1 (better!)

Process 8: tails = [1], lower_bound(8) hits end, push: tails = [1, 8]
  → LIS length: 2 (e.g., [1, 8])

Process 2: tails = [1, 8], lower_bound(2) hits index 1 (8 ≥ 2), replace:
  tails = [1, 2]
  → LIS length still 2; but best 2-length subsequence now ends in 2 (better!)

Process 5: tails = [1, 2], lower_bound(5) hits end, push: tails = [1, 2, 5]
  → LIS length: 3 (e.g., [1, 2, 5]) ✓

Answer = tails.size() = 3

ASCII Patience Sorting Visualization:

Cards dealt: 3, 1, 8, 2, 5

After 3:    After 1:    After 8:    After 2:    After 5:
[3]         [1]         [1][8]      [1][2]      [1][2][5]
Pile 1      Pile 1      P1  P2      P1  P2      P1  P2  P3

Number of piles = LIS length = 3 ✓

// Solution: LIS O(N log N) — fast enough for N up to 10^5
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;
    vector<int> A(n);
    for (int &x : A) cin >> x;

    vector<int> tails;  // tails[i] = smallest tail of any IS of length i+1

    for (int x : A) {
        // Find first tail >= x (for strictly increasing: use lower_bound)
        auto it = lower_bound(tails.begin(), tails.end(), x);

        if (it == tails.end()) {
            tails.push_back(x);   // x extends the longest subsequence
        } else {
            *it = x;              // ← KEY LINE: replace to maintain smallest possible tail
        }
    }

    cout << tails.size() << "\n";
    return 0;
}

⚠️ Note: tails doesn't store the actual LIS elements, just its length. The elements in tails are maintained in sorted order, which is why binary search works.

⚠️ Common Mistake: Using lower_bound gives LIS for strictly increasing (A[j] < A[i]). For non-decreasing (A[j] ≤ A[i]), use upper_bound instead.

Complexity Analysis:

Time: O(N log N) — N elements, each with O(log N) binary search
Space: O(N) — the tails array

LIS Application in USACO

Many USACO Silver problems reduce to LIS:

"Minimum number of groups to partition a sequence so each group is non-increasing" → same as LIS length (by Dilworth's theorem)
Sorting with restrictions often becomes LIS
2D LIS: sort by one dimension, find LIS of the other

🔗 Related Problem: USACO 2015 February Silver: "Censoring" — involves finding a pattern that's a subsequence.

6.2.2 The 0/1 Knapsack Problem

Problem: You have N items. Item i has weight w[i] and value v[i]. Your knapsack holds total weight W. Choose items to maximize total value without exceeding weight W. Each item can be used at most once (0/1 = take it or leave it).

Example:

Items: (weight=2, value=3), (weight=3, value=4), (weight=4, value=5), (weight=5, value=6)
W = 8
Best: take items 1+2+3 (weight 2+3+4=9 > 8), or items 1+2 (weight 5, value 7), or items 1+4 (weight 7, value 9), or items 2+4 (weight 8, value 10). Answer: 10.

Visual: Knapsack DP Table

The 2D table shows dp[item][capacity]. Each row adds one item, and each cell represents the best value achievable with that capacity. The answer (8) is in the bottom-right corner. Highlighted cells show where new items changed the optimal value.

Knapsack DP Table

This static reference shows the complete knapsack DP table with the take/skip decisions highlighted for each item at each capacity level.

DP Formulation

0/1 Knapsack decision — take or skip item i:

Knapsack Decision

💡 Key difference from unbounded knapsack: Because each item can only be used once, "take" reads from row dp[i-1], not the current row. This is why the 1D optimized version iterates weight in reverse order.

State: dp[i][w] = maximum value using items 1..i with total weight ≤ w
Recurrence:
- Don't take item i: dp[i][w] = dp[i-1][w]
- Take item i (only if w[i] ≤ w): dp[i][w] = dp[i-1][w - weight[i]] + value[i]
- Take the maximum: dp[i][w] = max(don't take, take)
Base case: dp[0][w] = 0 (no items = zero value)
Answer: dp[N][W]

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, W;
    cin >> n >> W;

    vector<int> weight(n + 1), value(n + 1);
    for (int i = 1; i <= n; i++) cin >> weight[i] >> value[i];

    // dp[i][w] = max value using first i items with weight limit w
    vector<vector<int>> dp(n + 1, vector<int>(W + 1, 0));

    for (int i = 1; i <= n; i++) {
        for (int w = 0; w <= W; w++) {
            dp[i][w] = dp[i-1][w];  // option 1: don't take item i

            if (weight[i] <= w) {    // option 2: take item i (if it fits)
                dp[i][w] = max(dp[i][w], dp[i-1][w - weight[i]] + value[i]);
            }
        }
    }

    cout << dp[n][W] << "\n";
    return 0;
}

Space-Optimized 0/1 Knapsack — `O(W)` Space

We only need the previous row dp[i-1], so we can use a 1D array. Crucial: iterate w from W down to 0 (otherwise item i is used multiple times):

vector<int> dp(W + 1, 0);

for (int i = 1; i <= n; i++) {
    // Iterate BACKWARDS to prevent using item i more than once
    for (int w = W; w >= weight[i]; w--) {
        dp[w] = max(dp[w], dp[w - weight[i]] + value[i]);
    }
}

cout << dp[W] << "\n";

Why backwards? When computing dp[w], we need dp[w - weight[i]] from the previous item's row (not current item's). Iterating backwards ensures dp[w - weight[i]] hasn't been updated by item i yet.

Unbounded Knapsack (Unlimited Items)

If each item can be used multiple times, iterate forwards:

for (int i = 1; i <= n; i++) {
    for (int w = weight[i]; w <= W; w++) {  // FORWARDS — allows reuse
        dp[w] = max(dp[w], dp[w - weight[i]] + value[i]);
    }
}

6.2.3 Grid Path Counting

Problem: Count the number of paths from the top-left corner (1,1) to the bottom-right corner (N,M) of a grid, moving only right or down. Some cells are blocked.

Example: 3×3 grid with no blockages → 6 paths (C(4,2) = 6).

Visual: Grid Path DP Values

Grid DP

Each cell shows the number of paths from (0,0) to that cell. The recurrence dp[i][j] = dp[i-1][j] + dp[i][j-1] adds paths arriving from above and from the left. The Pascal's triangle pattern emerges naturally when there are no obstacles.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, m;
    cin >> n >> m;

    vector<string> grid(n);
    for (int r = 0; r < n; r++) cin >> grid[r];

    // dp[r][c] = number of paths to reach (r, c)
    vector<vector<long long>> dp(n, vector<long long>(m, 0));

    // Base case: starting cell (if not blocked)
    if (grid[0][0] != '#') dp[0][0] = 1;

    // Fill first row (can only come from the left)
    for (int c = 1; c < m; c++) {
        if (grid[0][c] != '#') dp[0][c] = dp[0][c-1];
    }

    // Fill first column (can only come from above)
    for (int r = 1; r < n; r++) {
        if (grid[r][0] != '#') dp[r][0] = dp[r-1][0];
    }

    // Fill rest of the grid
    for (int r = 1; r < n; r++) {
        for (int c = 1; c < m; c++) {
            if (grid[r][c] == '#') {
                dp[r][c] = 0;  // blocked — no paths through here
            } else {
                dp[r][c] = dp[r-1][c] + dp[r][c-1];  // from above + from left
            }
        }
    }

    cout << dp[n-1][m-1] << "\n";
    return 0;
}

Grid Maximum Value Path

Problem: Find the path from (1,1) to (N,M) (moving right or down) that maximizes the sum of values.

vector<vector<int>> val(n, vector<int>(m));
for (int r = 0; r < n; r++)
    for (int c = 0; c < m; c++)
        cin >> val[r][c];

vector<vector<long long>> dp(n, vector<long long>(m, 0));
dp[0][0] = val[0][0];

for (int c = 1; c < m; c++) dp[0][c] = dp[0][c-1] + val[0][c];
for (int r = 1; r < n; r++) dp[r][0] = dp[r-1][0] + val[r][0];

for (int r = 1; r < n; r++) {
    for (int c = 1; c < m; c++) {
        dp[r][c] = max(dp[r-1][c], dp[r][c-1]) + val[r][c];
    }
}

cout << dp[n-1][m-1] << "\n";

6.2.4 USACO DP Example: Hoof Paper Scissors

Problem (USACO 2019 January Silver): Bessie plays N rounds of Hoof-Paper-Scissors (like Rock-Paper-Scissors but with cow gestures). She knows the opponent's moves in advance. She can change her gesture at most K times. Maximize wins.

State: dp[i][j][g] = max wins in the first i rounds, having changed j times, currently playing gesture g.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, k;
    cin >> n >> k;

    // 0=Hoof, 1=Paper, 2=Scissors
    vector<int> opp(n + 1);
    for (int i = 1; i <= n; i++) {
        char c; cin >> c;
        if (c == 'H') opp[i] = 0;
        else if (c == 'P') opp[i] = 1;
        else opp[i] = 2;
    }

    // dp[j][g] = max wins using j changes so far, currently playing gesture g
    // (2D since we process rounds iteratively)
    const int NEG_INF = -1e9;
    vector<vector<int>> dp(k + 1, vector<int>(3, NEG_INF));

    // Initialize: before round 1, 0 changes, any starting gesture
    for (int g = 0; g < 3; g++) dp[0][g] = 0;

    for (int i = 1; i <= n; i++) {
        vector<vector<int>> ndp(k + 1, vector<int>(3, NEG_INF));

        for (int j = 0; j <= k; j++) {
            for (int g = 0; g < 3; g++) {
                if (dp[j][g] == NEG_INF) continue;

                int win = (g == opp[i]) ? 1 : 0;  // do we win this round?

                // Option 1: don't change gesture
                ndp[j][g] = max(ndp[j][g], dp[j][g] + win);

                // Option 2: change gesture (costs 1 change)
                if (j < k) {
                    for (int ng = 0; ng < 3; ng++) {
                        if (ng != g) {
                            int nwin = (ng == opp[i]) ? 1 : 0;
                            ndp[j+1][ng] = max(ndp[j+1][ng], dp[j][g] + nwin);
                        }
                    }
                }
            }
        }

        dp = ndp;
    }

    int ans = 0;
    for (int j = 0; j <= k; j++)
        for (int g = 0; g < 3; g++)
            ans = max(ans, dp[j][g]);

    cout << ans << "\n";
    return 0;
}

6.2.5 Interval DP — Matrix Chain and Burst Balloons Patterns

Interval DP is a powerful DP technique where the state represents a contiguous subarray or subrange, and we combine solutions of smaller intervals to solve larger ones.

💡 Key Insight: When the optimal solution for a range [l, r] depends on how we split that range at some point k, and the sub-problems for [l, k] and [k+1, r] are independent, interval DP applies.

The Interval DP Framework

Interval DP fill order — must fill by increasing interval length:

Interval DP Fill Order

💡 Fill order is critical: You must fill by increasing interval length. When computing dp[l][r], all shorter sub-intervals dp[l][k] and dp[k+1][r] must already be computed.

State:   dp[l][r] = optimal solution for the subproblem on interval [l, r]
Base:    dp[i][i] = cost/value for a single element (often 0 or trivial)
Order:   Fill by increasing interval LENGTH (len = 1, 2, 3, ..., n)
         This ensures dp[l][k] and dp[k+1][r] are computed before dp[l][r]
Transition:
         dp[l][r] = min/max over all split points k in [l, r-1] of:
                    dp[l][k] + dp[k+1][r] + cost(l, k, r)
Answer:  dp[1][n]  (or dp[0][n-1] for 0-indexed)

Enumeration order matters! We enumerate by interval length, not by left endpoint. This guarantees all sub-intervals are solved before we need them.

Classic Example: Matrix Chain Multiplication

Problem: Given N matrices A₁, A₂, ..., Aₙ where matrix Aᵢ has dimensions dim[i-1] × dim[i], find the parenthesization that minimizes the total number of scalar multiplications.

Why DP? Different parenthesizations have wildly different costs:

(A₁A₂)A₃: cost = p×q×r + p×r×s (where shapes are p×q, q×r, r×s)
A₁(A₂A₃): cost = q×r×s + p×q×s

State: dp[l][r] = minimum multiplications to compute the product Aₗ × Aₗ₊₁ × ... × Aᵣ

Transition: Try every split point k ∈ [l, r-1]. When we split at k:

Left product Aₗ...Aₖ has cost dp[l][k], resulting shape dim[l-1] × dim[k]
Right product Aₖ₊₁...Aᵣ has cost dp[k+1][r], resulting shape dim[k] × dim[r]
Multiplying these two results costs dim[l-1] × dim[k] × dim[r]

// Solution: Matrix Chain Multiplication — O(N³) time, O(N²) space
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;  // number of matrices

    // dim[i-1] × dim[i] is the shape of matrix i (1-indexed)
    // So we need n+1 dimensions
    vector<int> dim(n + 1);
    for (int i = 0; i <= n; i++) cin >> dim[i];
    // Matrix i has shape dim[i-1] × dim[i]

    // dp[l][r] = min cost to compute product of matrices l..r
    vector<vector<long long>> dp(n + 1, vector<long long>(n + 1, 0));
    const long long INF = 1e18;

    // Fill dp by increasing interval length
    for (int len = 2; len <= n; len++) {          // interval length
        for (int l = 1; l + len - 1 <= n; l++) {  // left endpoint
            int r = l + len - 1;                   // right endpoint
            dp[l][r] = INF;

            // Try every split point k
            for (int k = l; k < r; k++) {
                long long cost = dp[l][k]                    // left subproblem
                               + dp[k+1][r]                  // right subproblem
                               + (long long)dim[l-1] * dim[k] * dim[r]; // merge cost
                dp[l][r] = min(dp[l][r], cost);
            }
        }
    }

    cout << dp[1][n] << "\n";  // min cost to multiply all n matrices
    return 0;
}

Complexity Analysis:

States: O(N²) — all pairs (l, r) with l ≤ r
Transition: O(N) per state — try all split points k
Total Time: O(N³)
Space: O(N²)

Example trace for N=4, dims = [10, 30, 5, 60, 10]:

Matrices: A1(10×30), A2(30×5), A3(5×60), A4(60×10)

len=2:
  dp[1][2] = dim[0]*dim[1]*dim[2] = 10*30*5 = 1500
  dp[2][3] = dim[1]*dim[2]*dim[3] = 30*5*60 = 9000
  dp[3][4] = dim[2]*dim[3]*dim[4] = 5*60*10 = 3000

len=3:
  dp[1][3]: try k=1: dp[1][1]+dp[2][3]+10*30*60 = 0+9000+18000 = 27000
             try k=2: dp[1][2]+dp[3][3]+10*5*60  = 1500+0+3000  = 4500
             dp[1][3] = 4500
  dp[2][4]: try k=2: dp[2][2]+dp[3][4]+30*5*10  = 0+3000+1500  = 4500
             try k=3: dp[2][3]+dp[4][4]+30*60*10 = 9000+0+18000 = 27000
             dp[2][4] = 4500

len=4:
  dp[1][4]: try k=1: dp[1][1]+dp[2][4]+10*30*10 = 0+4500+3000  = 7500
             try k=2: dp[1][2]+dp[3][4]+10*5*10  = 1500+3000+500 = 5000 ← min!
             try k=3: dp[1][3]+dp[4][4]+10*60*10 = 4500+0+6000  = 10500
             dp[1][4] = 5000

Answer: 5000 scalar multiplications (parenthesization: (A1 A2)(A3 A4))

Template Summary

// Generic Interval DP Template
// Assumes 1-indexed, n elements
void intervalDP(int n) {
    vector<vector<int>> dp(n + 1, vector<int>(n + 1, 0));

    // Base case: intervals of length 1
    for (int i = 1; i <= n; i++) dp[i][i] = base_case(i);

    // Fill by increasing length
    for (int len = 2; len <= n; len++) {
        for (int l = 1; l + len - 1 <= n; l++) {
            int r = l + len - 1;
            dp[l][r] = INF;  // or -INF for maximization

            for (int k = l; k < r; k++) {  // split at k (or k+1)
                int val = dp[l][k] + dp[k+1][r] + cost(l, k, r);
                dp[l][r] = min(dp[l][r], val);  // or max
            }
        }
    }
    // Answer is dp[1][n]
}

⚠️ Common Mistake: Iterating over left endpoint l in the outer loop and length in the inner loop. This is wrong — when you compute dp[l][r], the sub-intervals dp[l][k] and dp[k+1][r] must already be computed. Always iterate by length in the outer loop.

// WRONG — dp[l][k] might not be ready yet!
for (int l = 1; l <= n; l++)
    for (int r = l + 1; r <= n; r++)
        ...

// CORRECT — all shorter intervals are computed first
for (int len = 2; len <= n; len++)
    for (int l = 1; l + len - 1 <= n; l++) {
        int r = l + len - 1;
        ...
    }

6.2.6 Grouped Knapsack (分组背包)

Problem: N groups of items, group i contains cnt[i] items. You must pick at most one item per group (or zero). Maximize total value within weight W.

💡 Key difference from 0/1 knapsack: In 0/1, you decide item-by-item. In grouped, you decide group-by-group — which single item (if any) to pick from each group.

State and Recurrence

State: dp[w] = max value with capacity w (1D rolling, same as optimized 0/1)
Transition: For each group g, for each weight w (descending), try every item j in group g:
```
dp[w] = max(dp[w],  dp[w - weight[g][j]] + value[g][j])   for all j in group g
```
Critical: the inner loops over items within a group must be nested inside the weight loop — otherwise one group's items can be combined with each other.

Implementation

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, W;        // n groups, capacity W
    cin >> n >> W;

    // groups[i] = list of {weight, value} for items in group i
    vector<vector<pair<int,int>>> groups(n);
    for (int i = 0; i < n; i++) {
        int cnt; cin >> cnt;
        groups[i].resize(cnt);
        for (auto& [w, v] : groups[i]) cin >> w >> v;
    }

    vector<int> dp(W + 1, 0);

    for (int i = 0; i < n; i++) {            // for each group
        for (int w = W; w >= 0; w--) {       // iterate capacity DESCENDING
            for (auto [wi, vi] : groups[i]) { // try each item in the group
                if (w >= wi)
                    dp[w] = max(dp[w], dp[w - wi] + vi);
            }
        }
        // After processing group i:
        // dp[w] = best value choosing 0 or 1 item from groups 0..i, capacity ≤ w
    }

    cout << dp[W] << "\n";
    return 0;
}

Complexity: O(N × W × avg_group_size) — where N = number of groups.

Loop Order Explanation

For group g with items {A(w=2,v=3), B(w=3,v=5)}:

CORRECT — items nested INSIDE weight loop:
  for w = W..0:
    try A: dp[w] = max(dp[w], dp[w-2]+3)
    try B: dp[w] = max(dp[w], dp[w-3]+5)
  → At most one item from this group is picked per capacity level

WRONG — weight loop nested INSIDE items loop:
  try A:
    for w = W..0: dp[w] = max(dp[w], dp[w-2]+3)   ← A is "selected"
  try B:
    for w = W..0: dp[w] = max(dp[w], dp[w-3]+5)   ← B can ALSO be selected
  → Both A and B might be selected, violating "at most one per group"

USACO Pattern

"N categories, each category has several options with (cost, benefit); choose at most one from each category; maximize total benefit within budget" → directly apply grouped knapsack.

// Example: N machines, each has multiple upgrade options (cost, speed boost)
// Choose at most one upgrade per machine; maximize total speed within budget B
// → groups = machines, items = upgrade options per machine

6.2.7 Bounded Knapsack (多重背包)

Problem: N types of items, each type has cnt[i] copies available (not unlimited). Maximize value within weight W.

This lies between 0/1 (cnt=1) and unbounded (cnt=∞). Naive approach: expand each item into cnt[i] copies and run 0/1 knapsack → O(N × Σcnt × W), too slow when cnt is large.

Method 1: Binary Splitting (二进制拆分) — O(N log C × W)

Key idea: Any number k ≤ cnt[i] can be represented as a sum of powers of 2 plus a remainder. So split cnt[i] copies into groups of 1, 2, 4, 8, ..., remainder. Each group acts as a single "super-item".

cnt = 13 → groups: 1, 2, 4, 6  (1+2+4+6=13; any 0..13 representable as subset sum)
cnt = 7  → groups: 1, 2, 4     (1+2+4=7)
cnt = 10 → groups: 1, 2, 4, 3  (1+2+4+3=10)

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, W;
    cin >> n >> W;

    // Expand each item type into binary-split "super-items"
    vector<pair<int,int>> items;  // {weight, value} of each super-item

    for (int i = 0; i < n; i++) {
        int wi, vi, ci;
        cin >> wi >> vi >> ci;   // weight, value, count

        // Binary split ci into powers of 2 + remainder
        for (int k = 1; ci > 0; k *= 2) {
            int take = min(k, ci);  // take min(power_of_2, remaining)
            items.push_back({take * wi, take * vi});
            ci -= take;
        }
    }

    // Now run standard 0/1 knapsack on expanded items
    vector<int> dp(W + 1, 0);
    for (auto [w, v] : items) {
        for (int cap = W; cap >= w; cap--)
            dp[cap] = max(dp[cap], dp[cap - w] + v);
    }

    cout << dp[W] << "\n";
    return 0;
}

Complexity: O(Σ log(cnt[i]) × W) ≈ O(N log C × W) where C = max count.

Example: 3 item types, each with count 100 → 3 × 7 = 21 super-items (vs 300 naive).

Method 2: Monotonic Deque Optimization — O(N × W)

For very large counts (cnt up to 10⁶), binary splitting is still too slow. The optimal solution uses a monotonic deque.

Key insight: Group the DP array by residue mod w[i]. Within each residue class, transitions become a sliding window maximum problem.

// Bounded knapsack with monotonic deque — O(N * W)
// For each item type (wi, vi, ci):
void bounded_knapsack_deque(vector<int>& dp, int wi, int vi, int ci, int W) {
    // For each residue r = 0, 1, ..., wi-1:
    //   The relevant dp cells are: dp[r], dp[r+wi], dp[r+2wi], ...
    //   Transition: dp[r + k*wi] = max over j in [k-ci, k-1] of
    //               (dp[r + j*wi] + (k-j)*vi)
    //   Rewrite: dp[r+k*wi] - k*vi = max over j in [k-ci, k] of
    //               (dp[r+j*wi] - j*vi)
    //   → sliding window maximum on the sequence g[j] = dp[r+j*wi] - j*vi

    vector<int> prev = dp;  // snapshot before processing this item type
    for (int r = 0; r < wi; r++) {
        deque<int> dq;  // stores indices j (in the residue class), front = max
        int max_k = (W - r) / wi;

        for (int k = 0; k <= max_k; k++) {
            int idx = r + k * wi;
            int val = prev[idx] - k * vi;  // g[k] = dp[r+k*wi] - k*vi

            // Maintain deque: remove indices outside window [k-ci, k-1]
            while (!dq.empty() && dq.front() < k - ci) dq.pop_front();

            // Update dp[idx] from best in window
            if (!dq.empty()) {
                int j = dq.front();
                dp[idx] = max(dp[idx], prev[r + j * wi] + (k - j) * vi);
            }

            // Add current index to deque (maintain decreasing order of g[])
            while (!dq.empty() && prev[r + dq.back() * wi] - dq.back() * vi <= val)
                dq.pop_back();
            dq.push_back(k);
        }
    }
}

Complexity: O(N × W) — optimal for large counts.

💡 When to use which method:

cnt ≤ 1000, W ≤ 10⁵ → binary splitting (simpler to implement)

cnt up to 10⁶, W up to 10⁵ → monotonic deque (only O(NW))

Comparison: 0/1 vs Grouped vs Bounded vs Unbounded

             0/1         Grouped        Bounded       Unbounded
Count:       1           0 or 1         0..cnt[i]     ∞
Loop direction: ← desc  ← desc (items  ← desc (binary  → asc
             (0/1)       inside w loop)  split gives 0/1)
Key code:    for w: W→wi  for w: W→0:    expand then 0/1   for w: wi→W
                          for j in g:
                            dp[w] = max(...)

6.2.8 Advanced Knapsack Patterns (Gold Level)

Pattern 1: Two-Dimensional Knapsack (二维背包)

Items have two resource constraints (e.g., weight AND volume).

// dp[w][v] = max value with weight ≤ W and volume ≤ V
vector<vector<int>> dp(W + 1, vector<int>(V + 1, 0));

for (int i = 0; i < n; i++) {
    int wi, vi_vol, val;
    cin >> wi >> vi_vol >> val;

    // BOTH dimensions must iterate descending (0/1 constraint)
    for (int w = W; w >= wi; w--)
        for (int v = V; v >= vi_vol; v--)
            dp[w][v] = max(dp[w][v], dp[w - wi][v - vi_vol] + val);
}
cout << dp[W][V] << "\n";

USACO Example: "Select K cows with total weight ≤ W₁ and total cost ≤ W₂ to maximize some property."

Pattern 2: Knapsack with Exactly K Items

// dp[k][w] = max value choosing exactly k items with capacity ≤ w
vector<vector<int>> dp(K + 1, vector<int>(W + 1, -1));
dp[0][0] = 0;  // base: 0 items, 0 weight, 0 value

for (int i = 0; i < n; i++) {
    for (int k = K; k >= 1; k--)
        for (int w = W; w >= weight[i]; w--)
            if (dp[k-1][w - weight[i]] >= 0)
                dp[k][w] = max(dp[k][w], dp[k-1][w - weight[i]] + value[i]);
}

int ans = 0;
for (int w = 0; w <= W; w++)
    if (dp[K][w] >= 0) ans = max(ans, dp[K][w]);
cout << ans << "\n";

Pattern 3: Knapsack on a Tree (依赖背包)

Already covered in Ch.8.3 (§8.3.4c Tree Knapsack). The "connected subset from root" constraint means if you select a node, you must select its parent — which maps to a tree knapsack with O(NW) merging.

思路陷阱 — Knapsack Pitfalls

陷阱 1：分组背包把 items 循环放在外层

// 错误：先遍历组内物品，再遍历容量 → 相当于每个物品独立做 0/1 背包，组内可选多个
for (auto [wi, vi] : groups[g])       // ← 错误顺序
    for (int w = W; w >= wi; w--)
        dp[w] = max(dp[w], dp[w-wi]+vi);

// 正确：先遍历容量，再遍历组内物品 → 每个容量只选组内最优一个
for (int w = W; w >= 0; w--)          // ← 容量在外
    for (auto [wi, vi] : groups[g])
        if (w >= wi) dp[w] = max(dp[w], dp[w-wi]+vi);

陷阱 2：多重背包二进制拆分后忘记用 0/1 背包方向（倒序）

拆分出的"超级物品"仍然是 0/1 约束（每个超级物品最多用一次），必须倒序遍历容量。用正序会让超级物品被重复选取，等效于无限制背包。

⚠️ Common Mistakes in Chapter 6.2

LIS: using upper_bound for strictly increasing: For strictly increasing, use lower_bound. For non-decreasing, use upper_bound. Getting this wrong gives LIS length off by 1.
0/1 Knapsack: iterating weight forward: Iterating w from 0 to W (forward) allows using item i multiple times — that's unbounded knapsack, not 0/1. Always iterate backwards for 0/1.
Grid paths: forgetting to handle blocked cells: If grid[r][c] == '#', set dp[r][c] = 0 (not dp[r-1][c] + dp[r][c-1]).
Overflow in grid path counting: Even for small grids, the number of paths can be astronomically large. Use long long or modular arithmetic.
LIS: thinking tails contains the actual LIS: It doesn't! tails contains the smallest possible tail elements for subsequences of each length. The actual LIS must be reconstructed separately.
Grouped Knapsack: wrong loop nesting (items outside weight): Items loop must be inside the weight loop. If items are in the outer loop, each item is treated independently as a 0/1 item, allowing multiple items from the same group to be selected.
Bounded Knapsack after binary split: iterating weight forward: After binary splitting, super-items are still 0/1 — iterate weight descending. Forward iteration allows reusing the same super-item, giving a wrong result.
2D Knapsack: only one dimension reversed: Both weight and volume constraints require their loops to iterate descending in a 0/1 two-dimensional knapsack.

Chapter Summary

📌 Key Takeaways

Problem	State Definition	Recurrence	Complexity
LIS (`O(N²)`)	`dp[i]` = LIS length ending at A[i]	`dp[i]` = max(`dp[j]`+1), j<i and A[j]<A[i]	`O(N²)`
LIS (`O(N log N)`)	tails[k] = min tail of IS with length k+1	binary search + replace	`O(N log N)`
0/1 Knapsack (1D)	`dp[w]` = max value with capacity ≤ w	reverse iterate w	`O(NW)`
Unbounded Knapsack	`dp[w]` = max value with capacity ≤ w	forward iterate w	`O(NW)`
Grouped Knapsack	`dp[w]` = max value, at most 1 item/group	w descending, items loop inside w loop	`O(N×W×group_size)`
Bounded Knapsack	same as 0/1	binary split → 0/1 knapsack	`O(N log C × W)`
Bounded Knapsack (opt)	same as 0/1	monotonic deque per residue class	`O(NW)`
2D Knapsack	`dp[w][v]` = max value, two constraints	both dimensions reverse iterate	`O(N×W×V)`
Grid Path	`dp[r][c]` = path count to reach (r,c)	`dp[r-1][c]` + `dp[r][c-1]`	`O(RC)`

❓ FAQ

Q1: In the O(N log N) LIS solution, does the tails array store the actual LIS?

A: No! tails stores "the minimum tail element of increasing subsequences of each length". Its length equals the LIS length, but the elements themselves may not form a valid increasing subsequence. To reconstruct the actual LIS, you need to record each element's "predecessor".

Q2: Why does 0/1 knapsack require reverse iteration over w?

A: Because dp[w] needs the "previous row's" dp[w - weight[i]]. If iterating forward, dp[w - weight[i]] may already be updated by the current row (equivalent to using item i multiple times). Reverse iteration ensures each item is used at most once.

Q3: What is the only difference between unbounded knapsack (items usable unlimited times) and 0/1 knapsack code?

A: Just the inner loop direction. 0/1 knapsack: w from W down to weight[i] (reverse). Unbounded knapsack: w from weight[i] up to W (forward).

Q4: What if the grid path can also move up or left?

A: Then simple grid DP no longer works (because there would be cycles). You need BFS/DFS or more complex DP. Standard grid path DP only applies to "right/down only" movement.

🔗 Connections to Later Chapters

Chapter 3.3 (Sorting & Binary Search): binary search is the core of O(N log N) LIS — lower_bound on the tails array
Chapter 6.3 (Advanced DP): extends knapsack to bitmask DP (item sets → bitmask), extends grid DP to interval DP
Chapter 4.1 (Greedy): interval scheduling problems can sometimes be converted to LIS (via Dilworth's theorem)
LIS is extremely common in USACO Silver — 2D LIS, weighted LIS, LIS counting variants appear frequently

Practice Problems

Problem 6.2.1 — LIS Length 🟢 Easy Read N integers. Find the length of the longest strictly increasing subsequence.

Hint

Use the `O(N log N)` approach with `lower_bound` on the `tails` array. Answer is `tails.size()`.

✅ Full Solution

Core Idea: Maintain a tails array where tails[i] = smallest tail element of all increasing subsequences of length i+1. Binary search to find insertion position for each element.

Input: First line N, second line N integers.
Output: LIS length.

Sample:

Input: 6
       3 1 8 2 5 9
Output: 4
(LIS: 1 2 5 9  or  1 2 8 9  etc.)

#include <bits/stdc++.h>
using namespace std;
int main() {
    int n; cin >> n;
    vector<int> a(n);
    for (int& x : a) cin >> x;

    vector<int> tails;
    for (int x : a) {
        // lower_bound: first position where tails[pos] >= x
        auto it = lower_bound(tails.begin(), tails.end(), x);
        if (it == tails.end()) tails.push_back(x);  // extend LIS
        else *it = x;                                 // replace for future
    }
    cout << tails.size() << "\n";
}

Trace for [3,1,8,2,5,9]:

x=3: tails=[3]
x=1: replace 3 → tails=[1]
x=8: extend  → tails=[1,8]
x=2: replace 8 → tails=[1,2]
x=5: extend  → tails=[1,2,5]
x=9: extend  → tails=[1,2,5,9]  length=4 ✓

Complexity: O(N log N) time, O(N) space.

Problem 6.2.2 — Number of LIS 🔴 Hard Read N integers. Find the number of distinct longest increasing subsequences. (Answer modulo 10^9+7.)

Hint

Maintain both `dp[i]` (LIS length ending at i) and `cnt[i]` (number of such LIS). When `dp[j]+1 > dp[i]`: update `dp[i]` and reset `cnt[i]` = `cnt[j]`. When equal: add `cnt[j]` to `cnt[i]`.

✅ Full Solution

Core Idea: O(N²) DP with two arrays. len[i] = length of longest IS ending at index i. cnt[i] = number of IS of that length ending at i.

Sample:

Input: 5
       1 3 5 4 7
Output: 2
(LIS length=4: [1,3,5,7] and [1,3,4,7])

#include <bits/stdc++.h>
using namespace std;
const int MOD = 1e9 + 7;
int main() {
    int n; cin >> n;
    vector<int> a(n);
    for (int& x : a) cin >> x;

    vector<int> len(n, 1);
    vector<long long> cnt(n, 1);

    for (int i = 1; i < n; i++) {
        for (int j = 0; j < i; j++) {
            if (a[j] < a[i]) {
                if (len[j] + 1 > len[i]) {
                    len[i] = len[j] + 1;   // found longer LIS
                    cnt[i] = cnt[j];        // reset count
                } else if (len[j] + 1 == len[i]) {
                    cnt[i] = (cnt[i] + cnt[j]) % MOD;  // same length: add
                }
            }
        }
    }

    int maxLen = *max_element(len.begin(), len.end());
    long long ans = 0;
    for (int i = 0; i < n; i++)
        if (len[i] == maxLen) ans = (ans + cnt[i]) % MOD;
    cout << ans << "\n";
}

Complexity: O(N²) time, O(N) space.

Problem 6.2.3 — 0/1 Knapsack 🟡 Medium N items with weights and values, capacity W. Find maximum value. (N, W ≤ 1000)

Hint

Space-optimized 1D dp: iterate items in outer loop, weights BACKWARDS (W down to weight[i]) in inner loop.

✅ Full Solution

Core Idea: dp[w] = max value with capacity w. For each item, update in reverse to prevent reuse.

Input: N W, then N lines: weight value.
Sample:

Input:
4 10
2 6
2 3
6 5
5 4
Output: 9
(items 1+2: weight=4, value=9)

#include <bits/stdc++.h>
using namespace std;
int main() {
    int n, W; cin >> n >> W;
    vector<int> wt(n), val(n);
    for (int i = 0; i < n; i++) cin >> wt[i] >> val[i];

    vector<int> dp(W + 1, 0);
    for (int i = 0; i < n; i++) {
        for (int w = W; w >= wt[i]; w--)  // REVERSE: prevent reuse
            dp[w] = max(dp[w], dp[w - wt[i]] + val[i]);
    }
    cout << dp[W] << "\n";
}

Complexity: O(N × W) time, O(W) space.

Problem 6.2.4 — Collect Stars 🟡 Medium An N×M grid has stars ('*') and obstacles ('#'). Moving only right or down from (1,1) to (N,M), collect as many stars as possible.

Hint

`dp[r][c]` = max stars collected to reach (r,c). For each cell, `dp[r][c]` = max(`dp[r-1][c]`, `dp[r][c-1]`) + (1 if grid[r][c]=='*').

✅ Full Solution

Core Idea: Standard grid DP. Handle obstacles with -INF (unreachable). Only propagate from valid cells.

Sample:

Input:
3 4
.*..
.**.
...*
Output: 3

#include <bits/stdc++.h>
using namespace std;
int main() {
    int n, m; cin >> n >> m;
    vector<string> g(n);
    for (auto& row : g) cin >> row;

    const int NEG = -1e9;
    vector<vector<int>> dp(n, vector<int>(m, NEG));

    // Start cell
    dp[0][0] = (g[0][0] == '*') ? 1 : 0;

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            if (i == 0 && j == 0) continue;
            if (g[i][j] == '#') continue;  // blocked
            int star = (g[i][j] == '*') ? 1 : 0;
            int best = NEG;
            if (i > 0 && dp[i-1][j] != NEG) best = max(best, dp[i-1][j]);
            if (j > 0 && dp[i][j-1] != NEG) best = max(best, dp[i][j-1]);
            if (best != NEG) dp[i][j] = best + star;
        }
    }
    cout << max(0, dp[n-1][m-1]) << "\n";
}

Complexity: O(N × M) time and space.

Problem 6.2.5 — Variations of Knapsack 🔴 Hard Variant B: Must fill the knapsack exactly (capacity W, must use exactly W weight).

Hint

Initialize `dp[0]` = 0, all other `dp[w]` = INF. Only states reachable from `dp[0]=0` will have finite values.

✅ Full Solution (Variant B — Exact Knapsack)

Core Idea: Same as standard 0/1 knapsack, but initialize dp[w] = INF for all w > 0. Only dp[0] = 0. Answer is dp[W] (INF = impossible).

#include <bits/stdc++.h>
using namespace std;
int main() {
    int n, W; cin >> n >> W;
    vector<int> wt(n), val(n);
    for (int i = 0; i < n; i++) cin >> wt[i] >> val[i];

    const int NEG = -1e9;
    vector<int> dp(W + 1, NEG);
    dp[0] = 0;  // only weight 0 is reachable initially

    for (int i = 0; i < n; i++) {
        for (int w = W; w >= wt[i]; w--) {
            if (dp[w - wt[i]] != NEG)
                dp[w] = max(dp[w], dp[w - wt[i]] + val[i]);
        }
    }
    if (dp[W] == NEG) cout << "impossible\n";
    else cout << dp[W] << "\n";
}

Key difference: Standard knapsack: dp[w] = 0 for all w (can leave capacity unused). Exact knapsack: dp[w] = -INF for w > 0 — only "exactly w filled" states propagate.

Complexity: O(N × W) time, O(W) space.

🏆 Challenge Problem: USACO 2019 January Silver: Grass Planting

✅ Solution Sketch

This combines interval and 1D DP. Key steps:

For each possible interval [l, r], compute the "benefit" of replanting it
dp[i] = max benefit using at most i non-overlapping intervals
Use a standard interval scheduling DP: dp[j] = max over all intervals ending ≤ j of (dp[start-1] + benefit)

The problem tests whether you can identify the interval DP structure under a farming metaphor.

Visual: LIS via Patience Sorting

LIS Patience Sort

This diagram illustrates LIS using the patience sorting analogy. Each "pile" represents a potential subsequence endpoint. The number of piles equals the LIS length. Binary search finds where each card goes in O(log N), giving an O(N log N) overall algorithm.

Visual: Knapsack DP Table

Knapsack DP Table

The 0/1 Knapsack DP table: rows = items considered, columns = capacity. Each cell shows the maximum value achievable. Blue cells show single-item contributions, green cells show combinations, and the starred cell is the optimal answer.

C++ for Competitive Programming: A USACO Guide