📖 Chapter 3.10 ⏱️ ~60 min read 🎯 Advanced

Chapter 3.10: Fenwick Tree (Binary Indexed Tree)

📝 Before You Continue: You should already know prefix sums (Chapter 3.2) and bitwise operations. This chapter complements Segment Tree (Chapter 3.9) — BIT code is shorter, with smaller constants, but supports fewer operations.

Fenwick Tree (also known as Binary Indexed Tree / BIT) is one of the most commonly used data structures in competitive programming: under 15 lines of code, yet supports point updates and prefix queries in O(log N) time.

3.10.1 The Core Idea: What Is `lowbit`?

Bitwise Principle of lowbit

For any positive integer x, lowbit(x) = x & (-x) returns the value of the lowest set bit in the binary representation of x.

x  =  6  →  binary: 0110
-x = -6  →  two's complement: 1010  (bitwise NOT + 1)
x & (-x) = 0010 = 2   ← lowest set bit corresponds to 2^1 = 2

Examples:

x	Binary	-x (two's complement)	x & (-x)	Meaning
1	0001	1111	0001 = 1	Manages 1 element
2	0010	1110	0010 = 2	Manages 2 elements
3	0011	1101	0001 = 1	Manages 1 element
4	0100	1100	0100 = 4	Manages 4 elements
6	0110	1010	0010 = 2	Manages 2 elements
8	1000	1000	1000 = 8	Manages 8 elements

BIT Tree Index Intuition

The elegance of BIT: tree[i] does not store a single element, but stores the sum of a range, with length exactly lowbit(i).

BIT 树形结构示意（n=8）：

flowchart BT
    T1["tree[1]\nA[1]\n管理 1 个"]
    T2["tree[2]\nA[1..2]\n管理 2 个"]
    T3["tree[3]\nA[3]\n管理 1 个"]
    T4["tree[4]\nA[1..4]\n管理 4 个"]
    T5["tree[5]\nA[5]\n管理 1 个"]
    T6["tree[6]\nA[5..6]\n管理 2 个"]
    T7["tree[7]\nA[7]\n管理 1 个"]
    T8["tree[8]\nA[1..8]\n管理 8 个"]
    T1 --> T2
    T3 --> T4
    T2 --> T4
    T5 --> T6
    T7 --> T8
    T6 --> T8
    T4 --> T8
    style T8 fill:#dbeafe,stroke:#3b82f6
    style T4 fill:#e0f2fe,stroke:#0284c7
    style T2 fill:#f0f9ff,stroke:#38bdf8
    style T6 fill:#f0f9ff,stroke:#38bdf8

查询 prefix(7) 的跳转路径：

flowchart LR
    Q7["i=7\n加 tree[7]=A[7]"] -->|"7-lowbit(7)=6"| Q6
    Q6["i=6\n加 tree[6]=A[5..6]"] -->|"6-lowbit(6)=4"| Q4
    Q4["i=4\n加 tree[4]=A[1..4]"] -->|"4-lowbit(4)=0"| Q0
    Q0(["i=0 停止\n共 3 步 = O(log 7)"])
    style Q0 fill:#dcfce7,stroke:#16a34a

💡 跳转规律： 查询时 i -= lowbit(i)（向下跳），更新时 i += lowbit(i)（向上跳）。每次跳转消除最低位的 1，最多 log N 步。

Index i:  1    2    3    4    5    6    7    8
Range managed by tree[i]:
  tree[1] = A[1]            (length lowbit(1)=1)
  tree[2] = A[1]+A[2]       (length lowbit(2)=2)
  tree[3] = A[3]            (length lowbit(3)=1)
  tree[4] = A[1]+...+A[4]   (length lowbit(4)=4)
  tree[5] = A[5]            (length lowbit(5)=1)
  tree[6] = A[5]+A[6]       (length lowbit(6)=2)
  tree[7] = A[7]            (length lowbit(7)=1)
  tree[8] = A[1]+...+A[8]   (length lowbit(8)=8)

更新位置 3 的跳转路径：

flowchart LR
    U3["i=3\n更新 tree[3]"] -->|"3+lowbit(3)=4"| U4
    U4["i=4\n更新 tree[4]"] -->|"4+lowbit(4)=8"| U8
    U8["i=8\n更新 tree[8]"] -->|"8+lowbit(8)=16>n"| U_end
    U_end(["i>n 停止\n共 3 步 = O(log N)"])
    style U_end fill:#dcfce7,stroke:#16a34a

When querying prefix sum prefix(7), jump up via i -= lowbit(i):

i=7: add tree[7] (manages A[7]), then 7 - lowbit(7) = 7 - 1 = 6
i=6: add tree[6] (manages A[5..6]), then 6 - lowbit(6) = 6 - 2 = 4
i=4: add tree[4] (manages A[1..4]), then 4 - lowbit(4) = 4 - 4 = 0, stop

Total 3 steps = O(log 7) ≈ 3 steps.

When updating position 3, jump up via i += lowbit(i):

i=3: update tree[3], then 3 + lowbit(3) = 3 + 1 = 4
i=4: update tree[4], then 4 + lowbit(4) = 4 + 4 = 8
i=8: update tree[8], 8 > n, stop

3.10.2 Point Update + Prefix Query — Complete Code

// ══════════════════════════════════════════════════════════════
// Fenwick Tree (Binary Indexed Tree) — Classic Implementation
// Supports: Point Update O(log N), Prefix Sum Query O(log N)
// Arrays are 1-INDEXED (critical!)
// ══════════════════════════════════════════════════════════════
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 300005;

int n;
long long tree[MAXN];  // BIT array, 1-indexed

// ── lowbit: returns the value of the lowest set bit ──
// x & (-x) works because:
//   -x in two's complement = ~x + 1
//   The lowest set bit of x is preserved, all higher bits cancel out
// Example: x=6 (0110), -x=1010, x&(-x)=0010=2
inline int lowbit(int x) {
    return x & (-x);
}

// ── update: add val to position i ──
// Walk UP the tree: i += lowbit(i)
// Each ancestor that covers position i gets updated
void update(int i, long long val) {
    for (; i <= n; i += lowbit(i))
        tree[i] += val;
    // Time: O(log N) — at most log2(N) iterations
}

// ── query: return prefix sum A[1..i] ──
// Walk DOWN the tree: i -= lowbit(i)
// Decompose [1..i] into O(log N) non-overlapping ranges
long long query(int i) {
    long long sum = 0;
    for (; i > 0; i -= lowbit(i))
        sum += tree[i];
    return sum;
    // Time: O(log N) — at most log2(N) iterations
}

// ── build: initialize BIT from an existing array A[1..n] ──
// Method 1: N individual updates — O(N log N)
void build_slow(long long A[]) {
    fill(tree + 1, tree + n + 1, 0LL);
    for (int i = 1; i <= n; i++)
        update(i, A[i]);
}

// Method 2: O(N) build using the "direct parent" trick
void build_fast(long long A[]) {
    for (int i = 1; i <= n; i++) {
        tree[i] += A[i];
        int parent = i + lowbit(i);  // direct parent in BIT
        if (parent <= n)
            tree[parent] += tree[i];
    }
}

// ── Full Example ──
int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int q;
    cin >> n >> q;

    long long A[MAXN] = {};
    for (int i = 1; i <= n; i++) cin >> A[i];
    build_fast(A);  // O(N) initialization

    while (q--) {
        int type;
        cin >> type;
        if (type == 1) {
            // Point update: A[i] += val
            int i; long long val;
            cin >> i >> val;
            update(i, val);
        } else {
            // Prefix query: sum of A[1..r]
            int r;
            cin >> r;
            cout << query(r) << "\n";
        }
    }
    return 0;
}

3.10.3 Range Query = prefix(r) - prefix(l-1)

Range query sum(l, r) is identical to the prefix sum technique:

// Range sum query: sum of A[l..r]
// Time: O(log N) — two prefix queries
long long range_query(int l, int r) {
    return query(r) - query(l - 1);
    // query(r)   = A[1] + A[2] + ... + A[r]
    // query(l-1) = A[1] + A[2] + ... + A[l-1]
    // difference = A[l] + A[l+1] + ... + A[r]
}

// Example usage:
// A = [3, 1, 4, 1, 5, 9, 2, 6]  (1-indexed)
// range_query(3, 6) = query(6) - query(2)
//                  = (3+1+4+1+5+9) - (3+1)
//                  = 23 - 4 = 19
// Verify: A[3]+A[4]+A[5]+A[6] = 4+1+5+9 = 19 ✓

3.10.4 Comparison: Prefix Sum vs BIT vs Segment Tree

Operation	Prefix Sum Array	Fenwick Tree (BIT)	Segment Tree
Build	`O(N)`	`O(N)` or `O(N log N)`	`O(N)`
Prefix Query	`O(1)`	`O(log N)`	`O(log N)`
Range Query	`O(1)`	`O(log N)`	`O(log N)`
Point Update	`O(N)` rebuild	`O(log N)` ✓	`O(log N)` ✓
Range Update	`O(N)`	`O(log N)` (Difference BIT)	`O(log N)` (lazy tag)
Range Min/Max	`O(1)` (sparse table)	❌ Not supported	✓ Supported
Code Complexity	Minimal	Simple (10 lines)	Complex (50+ lines)
Constant Factor	Smallest	Very small	Larger
Space	`O(N)`	`O(N)`	`O(4N)`

When to choose BIT?

✅ Only need prefix/range sum + point update
✅ Need extremely concise code (fewer bugs in contest)
✅ Counting inversions, merge sort counting problems
❌ Need range min/max → use Segment Tree
❌ Need complex range operations (range multiply, etc.) → use Segment Tree

3.10.5 Interactive Visualization: BIT Update Process

3.10.6 Range Update + Point Query (Difference BIT)

Standard BIT supports "point update + prefix query". Using the difference array technique, it can instead support "range update + point query".

Principle

Let difference array D[i] = A[i] - A[i-1] (D[1] = A[1]), then:

A[i] = D[1] + D[2] + ... + D[i] (i.e., A[i] is the prefix sum of D)
Adding val to all A[l..r] is equivalent to: D[l] += val; D[r+1] -= val

// ══════════════════════════════════════════════════════════════
// Difference BIT: Range Update + Point Query
// ══════════════════════════════════════════════════════════════
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 300005;
int n;
long long diff_bit[MAXN];  // BIT over difference array D[]

inline int lowbit(int x) { return x & (-x); }

// Update D[i] += val in the difference BIT
void diff_update(int i, long long val) {
    for (; i <= n; i += lowbit(i))
        diff_bit[i] += val;
}

// Query A[i] = sum of D[1..i] = prefix query on diff BIT
long long diff_query(int i) {
    long long s = 0;
    for (; i > 0; i -= lowbit(i))
        s += diff_bit[i];
    return s;
}

// Range update: add val to all A[l..r]
// Equivalent to: D[l] += val, D[r+1] -= val
void range_update(int l, int r, long long val) {
    diff_update(l, val);       // D[l] += val
    diff_update(r + 1, -val);  // D[r+1] -= val
}

// Point query: return current value of A[i]
// A[i] = D[1] + D[2] + ... + D[i] = prefix_sum(D, i)
long long point_query(int i) {
    return diff_query(i);
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    int q;
    cin >> n >> q;
    // Initialize: read A[i], build diff BIT
    for (int i = 1; i <= n; i++) {
        long long x; cin >> x;
        // D[i] = A[i] - A[i-1]: use two updates
        diff_update(i, x);
        if (i + 1 <= n) diff_update(i + 1, -x); // will be overridden by next iteration
        // Simpler: just set D[i] = A[i]-A[i-1] directly
    }
    // Better initialization using range_update for each element:
    // fill diff_bit with 0, then range_update(i, i, A[i]) for each i
    // Or: diff_update(1, A[1]); for i>=2: diff_update(i, A[i]-A[i-1])

    while (q--) {
        int type; cin >> type;
        if (type == 1) {
            int l, r; long long val;
            cin >> l >> r >> val;
            range_update(l, r, val);  // A[l..r] += val, O(log N)
        } else {
            int i; cin >> i;
            cout << point_query(i) << "\n";  // query A[i], O(log N)
        }
    }
    return 0;
}

Advanced: Range Update + Range Query (Dual BIT)

To support both range update + range query simultaneously, use two BITs:

// ══════════════════════════════════════════════════════════════
// Double BIT: Range Update + Range Query
// Formula: sum(1..r) = B1[r] * r - B2[r]
// where B1 is BIT over D[], B2 is BIT over (i-1)*D[i]
// ══════════════════════════════════════════════════════════════
long long B1[MAXN], B2[MAXN];  // Two BITs

inline int lowbit(int x) { return x & (-x); }

void add(long long* b, int i, long long v) {
    for (; i <= n; i += lowbit(i)) b[i] += v;
}
long long sum(long long* b, int i) {
    long long s = 0;
    for (; i > 0; i -= lowbit(i)) s += b[i];
    return s;
}

// Range update: add val to A[l..r]
void range_add(int l, int r, long long val) {
    add(B1, l, val);
    add(B1, r + 1, -val);
    add(B2, l, val * (l - 1));     // compensate for prefix formula
    add(B2, r + 1, -val * r);
}

// Prefix sum A[1..r]
long long prefix_sum(int r) {
    return sum(B1, r) * r - sum(B2, r);
}

// Range sum A[l..r]
long long range_sum(int l, int r) {
    return prefix_sum(r) - prefix_sum(l - 1);
}

3.10.7 USACO-Style Problem: Counting Inversions with BIT

Problem Statement

Counting Inversions (O(N log N))

Given an integer array A of length N (distinct elements, range 1..N), count the number of inversions.

Inversion: a pair of indices (i, j) where i < j but A[i] > A[j].

Constraints: N ≤ 3×10⁵, requires O(N log N) solution.

Sample Input:

5
3 1 4 2 5

Sample Output:

Explanation: Inversions are (3,1), (3,2), (4,2), total 3 pairs.

Solution: BIT Inversion Count

// ══════════════════════════════════════════════════════════════
// Counting Inversions using Fenwick Tree — O(N log N)
//
// Key Idea:
//   Process A[i] from left to right.
//   For each A[i], the number of inversions with A[i] as the
//   RIGHT element = count of already-processed values > A[i]
//                 = (elements processed so far) - (elements <= A[i])
//                 = i-1 - prefix_query(A[i])
//   Sum over all i gives total inversions.
//
// BIT role: track frequency of seen values.
//   After seeing value v: update(v, +1)
//   Query # of values <= x: query(x)
// ══════════════════════════════════════════════════════════════
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
const int MAXN = 300005;

int n;
int bit[MAXN];  // BIT for frequency counting; bit[v] tracks how many times v appeared

inline int lowbit(int x) { return x & (-x); }

// Add 1 to position v (we saw value v)
void update(int v) {
    for (; v <= n; v += lowbit(v))
        bit[v]++;
}

// Count how many values in [1..v] have been seen
int query(int v) {
    int cnt = 0;
    for (; v > 0; v -= lowbit(v))
        cnt += bit[v];
    return cnt;
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    cin >> n;
    
    ll inversions = 0;
    
    for (int i = 1; i <= n; i++) {
        int a;
        cin >> a;
        
        // Count inversions where a is the RIGHT element:
        // # of already-seen values GREATER than a
        // = (i-1 elements seen so far) - (# of seen values <= a)
        int less_or_equal = query(a);         // # of seen values in [1..a]
        int greater = (i - 1) - less_or_equal; // # of seen values in [a+1..n]
        inversions += greater;
        
        // Mark that we've now seen value a
        update(a);
    }
    
    cout << inversions << "\n";
    return 0;
}

/*
Trace for A = [3, 1, 4, 2, 5]:

i=1, a=3: seen=[], query(3)=0, greater=0-0=0. inversions=0. update(3).
i=2, a=1: seen=[3], query(1)=0, greater=1-0=1. inversions=1. update(1).
           (3 > 1: that's 1 inversion: (3,1) ✓)
i=3, a=4: seen=[3,1], query(4)=2, greater=2-2=0. inversions=1. update(4).
           (no element > 4 was seen before)
i=4, a=2: seen=[3,1,4], query(2)=1, greater=3-1=2. inversions=3. update(2).
           (3>2 and 4>2: 2 inversions: (3,2),(4,2) ✓)
i=5, a=5: seen=[3,1,4,2], query(5)=4, greater=4-4=0. inversions=3. update(5).

Final: 3 ✓
*/

Complexity Analysis:

Time: O(N log N) — N iterations, each O(log N) for update + query
Space: O(N) for BIT

Extension: If array elements are not in range 1..N, first apply coordinate compression before using BIT:

// Coordinate compression for arbitrary values
vector<int> A(n);
for (int i = 0; i < n; i++) cin >> A[i];

// Step 1: sort and deduplicate
vector<int> sorted_A = A;
sort(sorted_A.begin(), sorted_A.end());
sorted_A.erase(unique(sorted_A.begin(), sorted_A.end()), sorted_A.end());

// Step 2: replace each value with its rank (1-indexed)
for (int i = 0; i < n; i++) {
    A[i] = lower_bound(sorted_A.begin(), sorted_A.end(), A[i]) - sorted_A.begin() + 1;
    // A[i] is now in [1..M] where M = sorted_A.size()
}
// Now use BIT with n = sorted_A.size()

3.10.8 Common Mistakes

❌ Mistake 1: Wrong `lowbit` Implementation

// ❌ WRONG — common typo/confusion
int lowbit(int x) { return x & (x - 1); }  // This CLEARS the lowest bit, not returns it!
// x=6 (0110): x&(x-1) = 0110&0101 = 0100 = 4 (WRONG, should be 2)

int lowbit(int x) { return x % 2; }         // Only works for last bit, not lowbit value

// ✅ CORRECT
int lowbit(int x) { return x & (-x); }
// x=6: -6 = ...11111010 (two's complement)
// 0110 & 11111010 = 0010 = 2 ✓

Memory trick: x & (-x) reads as "x AND negative-x". -x is bitwise NOT plus 1, which clears all bits below the lowest set bit, flips all bits above it, and the AND operation keeps only the lowest set bit.

❌ Mistake 2: 0-indexed Array (the 0-index trap)

BIT must use 1-indexed arrays. 0-indexed causes infinite loops!

// ❌ WRONG — 0-indexed causes infinite loop!
// If i = 0: query loop: i -= lowbit(0) = 0 - (0 & 0) = 0 - 0 = 0 → infinite loop!
// (Actually lowbit(0) = 0 & 0 = 0, so i never decreases)

void query_WRONG(int i) {  // i is 0-indexed
    int s = 0;
    for (; i > 0; i -= lowbit(i))  // if i=0 initially, loop doesn't execute but
        s += bit[i];               // if called with i=0 during calculation... disaster
    return s;
}

// ❌ WRONG — forgetting +1 when converting to 1-indexed
int arr[n]; // 0-indexed A[0..n-1]
for (int i = 0; i < n; i++) {
    update(i, arr[i]);    // BUG: should be update(i+1, arr[i])
}

// ✅ CORRECT — always shift to 1-indexed
for (int i = 0; i < n; i++) {
    update(i + 1, arr[i]);  // convert 0-indexed i to 1-indexed i+1
}
// And remember: query(r+1) - query(l) for 0-indexed range [l, r]

❌ Mistake 3: Integer Overflow in Large Sum

// ❌ WRONG — tree[] should be long long for large sums
int tree[MAXN];   // overflow if sum > 2^31

// ✅ CORRECT
long long tree[MAXN];

// Also: when counting inversions, inversions can be up to N*(N-1)/2 ≈ 4.5×10^10 for N=3×10^5
// Always use long long for the result counter!
long long inversions = 0;  // ✅ not int!

❌ Mistake 4: Forgetting to Clear BIT Between Test Cases

// ❌ WRONG — in problems with multiple test cases
int T; cin >> T;
while (T--) {
    // forgot to clear tree[]!
    // Old data from previous test case corrupts results
    solve();
}

// ✅ CORRECT — reset before each test case
int T; cin >> T;
while (T--) {
    fill(tree + 1, tree + n + 1, 0LL);  // clear BIT
    solve();
}

3.10.9 Chapter Summary

📋 Formula Quick Reference

Operation	Code	Description
lowbit	`x & (-x)`	Value of lowest set bit of x
Point Update	`for(;i<=n;i+=lowbit(i)) t[i]+=v`	Propagate upward
Prefix Query	`for(;i>0;i-=lowbit(i)) s+=t[i]`	Decompose downward
Range Query	`query(r) - query(l-1)`	Difference formula
Range Update (Diff BIT)	`upd(l,+v); upd(r+1,-v)`	Difference array
Inversion Count	`(i-1) - query(a[i])`	Count when processing each element
Array must be	1-indexed	0-indexed → infinite loop

❓ FAQ

Q1: Both BIT and Segment Tree support prefix sum + point update. Which should I choose?

A: Use BIT whenever possible. BIT code is only 10 lines, has smaller constants (empirically 2-3x faster), and lower error probability. Only choose Segment Tree when you need range min/max (RMQ), range coloring, or more complex range operations. In contests, BIT is the "default weapon", Segment Tree is "heavy artillery".

Q2: Can BIT support Range Minimum Query (RMQ)?

A: Standard BIT cannot support RMQ, because the min operation has no "inverse" (cannot "undo" a merged min value like subtraction). For range min/max, use Segment Tree or Sparse Table. There is a "static BIT for RMQ" technique, but it only works without updates and has limited practical use.

Q3: Can BIT support 2D (2D BIT)?

A: Yes! 2D BIT solves 2D prefix sum + point update problems, with complexity O(log N × log M). The code structure uses two nested loops:
// 2D BIT update
void update2D(int x, int y, long long v) {
    for (int i = x; i <= N; i += lowbit(i))
        for (int j = y; j <= M; j += lowbit(j))
            bit[i][j] += v;
}
Less common in USACO, but occasionally needed for 2D coordinate counting problems.

3.10.10 Practice Problems

🟢 Easy 1: Range Sum Query (Single-point Update)

Given an array of length N, support two operations:

1 i x: Increase A[i] by x
2 l r: Query A[l] + A[l+1] + ... + A[r]

Constraints: N, Q ≤ 10⁵.

Hint: Direct BIT application. Use update(i, x) and query(r) - query(l-1).

🟢 Easy 2: Number of Elements Less Than K

Given N operations, each either inserts an integer (range 1..10⁶) or queries "how many of the currently inserted integers are ≤ K?"

Hint: BIT maintains a frequency array over the value domain. update(v, 1) inserts value v, query(K) is the answer.

🟡 Medium 1: Range Add, Point Query

Given an array of length N (initially all zeros), support two operations:

1 l r x: Add x to every element in A[l..r]
2 i: Query the current value of A[i]

Constraints: N, Q ≤ 3×10⁵.

Hint: Use Difference BIT (Section 3.10.6).

🟡 Medium 2: Counting Inversions (with Coordinate Compression)

Given an array of length N with elements in range 1..10⁹ (possibly repeated). Count the number of inversions.

Constraints: N ≤ 3×10⁵.

Hint: First apply coordinate compression, then use BIT counting (variant of Section 3.10.7). Note equal elements: (i,j) with i<j and A[i]>A[j] (strictly greater) counts as an inversion.

🔴 Hard: Range Add, Range Sum (Double BIT)

Given an array of length N, support two operations:

1 l r x: Add x to every element in A[l..r]
2 l r: Query A[l] + ... + A[r]

Constraints: N, Q ≤ 3×10⁵, elements and x can reach 10⁹.

Hint: Use Dual BIT (Dual BIT method at end of Section 3.10.6). Formula: prefix_sum(r) = B1[r] * r - B2[r], where B1 maintains the difference array and B2 maintains the weighted difference array. Derivation: let D[i] be the difference array, then A[1]+...+A[r] = Σᵢ₌₁ʳ Σⱼ₌₁ⁱ D[j] = Σⱼ₌₁ʳ D[j]*(r-j+1) = (r+1)Σ D[j] - Σ jD[j].

💡 Chapter Connection: BIT and Segment Tree are the two most commonly paired data structures in USACO. BIT handles 80% of scenarios with 1/5 the code of Segment Tree. After mastering BIT, return to Chapter 3.9 to learn Segment Tree lazy propagation—the territory BIT cannot reach.

C++ for Competitive Programming: A USACO Guide