📖 Chapter 3.8 ⏱️ ~55 min read 🎯 Intermediate

Chapter 3.8: Maps & Sets

Maps and sets are the workhorses of frequency counting, lookup, and tracking unique elements. In this chapter, we go deep into their practical use in USACO problems.

📝 Before You Continue: You should be comfortable with arrays and basic C++ STL (Chapter 2.4). Understanding hash tables conceptually (Chapter 3.7) will help, but is not strictly required — map and set are tree-based and work without hashing.

3.8.1 `map` vs `unordered_map` — Choosing Wisely

Visual: Map Internal Structure (BST)

Map Structure

std::map stores key-value pairs in a balanced BST (Red-Black tree). This gives O(log N) for all operations and keeps keys sorted automatically — great when you need lower_bound/upper_bound queries. Use unordered_map when you only need O(1) lookups and don't care about order.

The key structural difference between map and unordered_map:

map vs unordered_map

Feature	`map`	`unordered_map`
Underlying structure	Red-black tree	Hash table
Insert/lookup time	`O(log n)`	`O(1)` average, `O(n)` worst
Iterates in	Sorted key order	Arbitrary order
Min/Max key	Available via `.begin()`/`.rbegin()`	Not available
Keys must be	Comparable (has `<`)	Hashable
Use when	You need sorted keys or find min/max	You need fastest possible lookup

For most USACO problems, either works fine. Use unordered_map for speed when keys are integers or strings, map when you need ordered iteration.

Example: Frequency Map

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;

    unordered_map<int, int> freq;
    for (int i = 0; i < n; i++) {
        int x;
        cin >> x;
        freq[x]++;   // increment count; creates with 0 if not present
    }

    // Find the element with highest frequency
    int maxFreq = 0, maxVal = INT_MIN;
    for (auto &[val, count] : freq) {   // structured binding (C++17)
        if (count > maxFreq || (count == maxFreq && val < maxVal)) {
            maxFreq = count;
            maxVal = val;
        }
    }

    cout << "Most frequent: " << maxVal << " (" << maxFreq << " times)\n";

    return 0;
}

3.8.2 Map Operations — Complete Reference

#include <bits/stdc++.h>
using namespace std;

int main() {
    map<string, int> scores;

    // Insert
    scores["Alice"] = 95;
    scores["Bob"] = 87;
    scores["Charlie"] = 92;
    scores.insert({"Dave", 78});    // another way
    scores.emplace("Eve", 88);      // most efficient way

    // Lookup
    cout << scores["Alice"] << "\n";  // 95
    // WARNING: scores["Unknown"] creates it with value 0!

    // Safe lookup
    if (scores.count("Frank")) {
        cout << scores["Frank"] << "\n";
    } else {
        cout << "Frank not found\n";
    }

    // Using find() — returns iterator
    auto it = scores.find("Bob");
    if (it != scores.end()) {
        cout << it->first << ": " << it->second << "\n";  // Bob: 87
    }

    // Update
    scores["Alice"] += 5;    // Alice now has 100

    // Erase
    scores.erase("Charlie");

    // Iterate in sorted key order (map always gives sorted order)
    for (const auto &[name, score] : scores) {
        cout << name << ": " << score << "\n";
    }
    // Alice: 100
    // Bob: 87
    // Dave: 78
    // Eve: 88

    // Size and empty check
    cout << scores.size() << "\n";   // 4
    cout << scores.empty() << "\n";  // 0 (false)

    // Clear all entries
    scores.clear();

    return 0;
}

3.8.3 Set Operations — Complete Reference

#include <bits/stdc++.h>
using namespace std;

int main() {
    set<int> s = {5, 3, 8, 1, 9, 2};
    // s = {1, 2, 3, 5, 8, 9} (always sorted!)

    // Insert
    s.insert(4);   // s = {1, 2, 3, 4, 5, 8, 9}
    s.insert(3);   // already there, no change

    // Erase
    s.erase(8);    // s = {1, 2, 3, 4, 5, 9}

    // Lookup
    cout << s.count(3) << "\n";  // 1 (exists)
    cout << s.count(7) << "\n";  // 0 (not found)

    // Iterator-based queries
    auto it = s.lower_bound(4);  // first element >= 4
    cout << *it << "\n";         // 4

    auto it2 = s.upper_bound(4); // first element > 4
    cout << *it2 << "\n";        // 5

    // Min and Max
    cout << *s.begin() << "\n";   // 1 (min)
    cout << *s.rbegin() << "\n";  // 9 (max)

    // Remove minimum
    s.erase(s.begin());   // removes 1
    cout << *s.begin() << "\n";  // 2

    // Iterate
    for (int x : s) cout << x << " ";
    cout << "\n";  // 2 3 4 5 9

    return 0;
}

3.8.4 USACO Problem: Cow IDs

Problem (USACO 2017 February Bronze): Bessie wants to find the N-th smallest number that doesn't appear in a set of "taken" IDs. Given a set of taken IDs and N, find the N-th available ID.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, q;
    cin >> n >> q;

    set<int> taken;
    for (int i = 0; i < n; i++) {
        int x; cin >> x;
        taken.insert(x);
    }

    // For each query q, find the q-th positive integer NOT in taken
    while (q--) {
        int k; cin >> k;

        // Binary search: find smallest x such that x - (# taken values <= x) >= k
        int lo = 1, hi = 2e9;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            // count of available numbers in [1, mid] = mid - (# taken values <= mid)
            int taken_count = (int)(taken.lower_bound(mid + 1) - taken.begin());
            int available = mid - taken_count;
            if (available >= k) hi = mid;
            else lo = mid + 1;
        }

        cout << lo << "\n";
    }

    return 0;
}

3.8.5 Multiset — Sorted Bag with Duplicates

A multiset is like a set, but allows duplicate values:

#include <bits/stdc++.h>
using namespace std;

int main() {
    multiset<int> ms;
    ms.insert(3);
    ms.insert(1);
    ms.insert(3);   // duplicate allowed
    ms.insert(5);
    ms.insert(1);

    // ms = {1, 1, 3, 3, 5}

    cout << ms.count(3) << "\n";  // 2 (how many 3s)
    cout << ms.count(2) << "\n";  // 0

    // Remove ONE occurrence of 3
    ms.erase(ms.find(3));  // removes only one 3
    // ms = {1, 1, 3, 5}

    // Remove ALL occurrences of 1
    ms.erase(1);  // removes all 1s
    // ms = {3, 5}

    cout << *ms.begin() << "\n";   // 3 (min)
    cout << *ms.rbegin() << "\n";  // 5 (max)

    return 0;
}

Running Median with Two Multisets

Keep track of the median of a stream of numbers using a max-multiset (lower half) and a min-multiset (upper half):

#include <bits/stdc++.h>
using namespace std;

int main() {
    multiset<int> lo;  // max-heap: lower half (use negation or reverse iterator)
    multiset<int> hi;  // min-heap: upper half

    // For simplicity, use two multisets where lo stores values in reverse
    // lo's maximum = lo.rbegin(); hi's minimum = hi.begin()

    int n;
    cin >> n;

    for (int i = 0; i < n; i++) {
        int x;
        cin >> x;

        // Add to appropriate half
        if (lo.empty() || x <= *lo.rbegin()) {
            lo.insert(x);
        } else {
            hi.insert(x);
        }

        // Rebalance: sizes should differ by at most 1
        while (lo.size() > hi.size() + 1) {
            hi.insert(*lo.rbegin());
            lo.erase(lo.find(*lo.rbegin()));
        }
        while (hi.size() > lo.size()) {
            lo.insert(*hi.begin());
            hi.erase(hi.begin());
        }

        // Print median
        if (lo.size() == hi.size()) {
            // Even count: average of two middle values
            double median = (*lo.rbegin() + *hi.begin()) / 2.0;
            cout << fixed << setprecision(1) << median << "\n";
        } else {
            // Odd count: middle value is in lo
            cout << *lo.rbegin() << "\n";
        }
    }

    return 0;
}

3.8.6 Practical Patterns

Pattern 1: Counting Distinct Elements

vector<int> data = {1, 5, 3, 1, 2, 5, 5, 3};
set<int> distinct(data.begin(), data.end());
cout << "Distinct count: " << distinct.size() << "\n";  // 4

Pattern 2: Group by Frequency, Sort by Value

vector<int> nums = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5};
map<int, int> freq;
for (int x : nums) freq[x]++;

// Group values by their frequency
map<int, vector<int>> byFreq;
for (auto &[val, cnt] : freq) {
    byFreq[cnt].push_back(val);
}

// Print in order of frequency
for (auto &[cnt, vals] : byFreq) {
    for (int v : vals) cout << v << " (×" << cnt << ")\n";
}

Pattern 3: Offline Queries with Sorting

Sort queries along with events to process them together in O((N+Q) log N):

// Example: for each query point, count how many events have value <= query point
// Sort both arrays, sweep through with two pointers

⚠️ Common Mistakes in Chapter 3.8

#	Mistake	Why It's Wrong	Fix
1	`map[key]` accessing non-existent key	Auto-creates entry with value 0, pollutes data	Use `m.count(key)` or `m.find(key)` to check first
2	`multiset::erase(value)` deletes all equal values	Expected to delete one, deleted all	Use `ms.erase(ms.find(value))` to delete just one
3	Modifying map/set size during iteration	Iterator invalidated, crash or skipped elements	Use `it = m.erase(it)` for safe deletion
4	`unordered_map` hacked to degrade to `O(N)`	Adversary constructs hash-collision data, TLE	Switch to `map` or use custom hash function
5	Forgetting `set` doesn't store duplicates	`size()` doesn't grow after inserting duplicate, count wrong	Use `multiset` when duplicates needed

Chapter Summary

📌 Key Takeaways

Structure	Ordered	Duplicates	Key Feature	Why It Matters
`map<K,V>`	Yes (sorted)	No (unique keys)	Key-value mapping, `O(log N)`	Frequency counting, ID→attribute mapping
`unordered_map<K,V>`	No	No	`O(1)` average lookup	5-10x faster than map for large data
`set<T>`	Yes (sorted)	No	Ordered unique set	Deduplication, range queries (`lower_bound`)
`unordered_set<T>`	No	No	`O(1)` membership test	Just need to check "seen before?"
`multiset<T>`	Yes (sorted)	Yes	Ordered multiset	Dynamic median, sliding window

🧩 "Which Container to Use" Quick Reference

Need	Recommended Container	Reason
Count occurrences of each element	`map` / `unordered_map`	`freq[x]++` in one line
Deduplicate and sort	`set`	Auto-dedup + auto-sort
Check if element was seen	`unordered_set`	`O(1)` lookup
Dynamic ordered set + find extremes	`set` / `multiset`	`O(1)` access to min/max
Need `lower_bound` / `upper_bound`	`set` / `map`	Only ordered containers support this
Value→index mapping	`map` / `unordered_map`	Coordinate compression etc.

❓ FAQ

Q1: What's the difference between map's [] operator and find?

A: m[key] auto-creates a default value (0 for int) when key doesn't exist; m.find(key) only searches, doesn't create. If you just want to check if a key exists, use m.count(key) or m.find(key) != m.end().

Q2: Both multiset and priority_queue can get extremes — which to use?

A: priority_queue can only get the max (or min) and delete it, doesn't support deletion by value. multiset supports finding and deleting any value, more flexible. If you only need to repeatedly get the extreme, priority_queue is simpler; if you need to delete specific elements (e.g., removing elements leaving a sliding window), use multiset.

Q3: When can unordered_map be slower than map?

A: Two situations: ① When hash collisions are severe (many keys hash to the same bucket), degrades to O(N); ② In contests, adversaries deliberately construct data to hack unordered_map. Solution: use a custom hash function, or switch to map.

Q4: Is C++17 structured binding auto &[key, val] safe? Can I use it in contests?

A: USACO and most contest platforms support C++17, so for (auto &[key, val] : m) is safe to use. It's cleaner than entry.first/entry.second.

🔗 Connections to Later Chapters

Chapter 3.3 (Sorting & Searching): coordinate compression often combines with map (value → compressed index)
Chapter 3.9 (Segment Trees): ordered set's lower_bound can replace simple segment tree queries
Chapters 5.1–5.2 (Graphs): map is commonly used to store adjacency lists for sparse graphs
Chapter 4.1 (Greedy): multiset combined with greedy strategies can efficiently maintain dynamic optimal choices
The map frequency counting pattern appears throughout the book and is one of the most fundamental tools in competitive programming

Practice Problems

Problem 3.8.1 — Two Sum 🟢 Easy Read N integers and a target T. Find two values that sum to T. Print their indices (1-indexed).

✅ Full Solution

#include <bits/stdc++.h>
using namespace std;
int main() {
    int n, T; cin >> n >> T;
    map<int, int> seen;  // value → index
    for (int i = 1; i <= n; i++) {
        int x; cin >> x;
        if (seen.count(T - x)) {
            cout << seen[T - x] << " " << i << "\n";
            return 0;
        }
        seen[x] = i;
    }
    cout << "no solution\n";
}

Complexity: O(N log N) with map, O(N) with unordered_map.

Problem 3.8.2 — Anagram Groups 🟡 Medium Group N words by their sorted-letter form.

✅ Full Solution

#include <bits/stdc++.h>
using namespace std;
int main() {
    int n; cin >> n;
    map<string, vector<string>> groups;
    for (int i = 0; i < n; i++) {
        string w; cin >> w;
        string key = w; sort(key.begin(), key.end());
        groups[key].push_back(w);
    }
    for (auto& [key, words] : groups) {
        sort(words.begin(), words.end());
        for (const string& w : words) cout << w << " ";
        cout << "\n";
    }
}

Complexity: O(N × K log K) where K = average word length.

Problem 3.8.3 — Interval Overlap Count 🟡 Medium Count maximum overlap of N intervals over points 1..M.

✅ Full Solution

#include <bits/stdc++.h>
using namespace std;
int main() {
    int n, M; cin >> n >> M;
    vector<int> diff(M + 2, 0);
    for (int i = 0; i < n; i++) {
        int l, r; cin >> l >> r;
        diff[l]++; diff[r+1]--;  // difference array
    }
    int cur = 0, ans = 0;
    for (int i = 1; i <= M; i++) {
        cur += diff[i];
        ans = max(ans, cur);
    }
    cout << ans << "\n";
}

Complexity: O(N + M).

Problem 3.8.4 — Cow Photography 🔴 Hard Find ordering consistent with all N lists (each a permutation of IDs).

✅ Full Solution

Core Idea: For each pair (a, b), count in how many lists a appears before b. If a appears before b in more than half the lists, a comes before b in the true order. Sort using this pairwise comparison.

#include <bits/stdc++.h>
using namespace std;
int main() {
    int n, k; cin >> n >> k;
    // pos[list][cow] = position of cow in list
    vector<vector<int>> pos(k, vector<int>(n+1));
    for (int i = 0; i < k; i++)
        for (int j = 0; j < n; j++) {
            int x; cin >> x; pos[i][x] = j;
        }
    // count[a][b] = #lists where a before b
    // For large N, use the "majority" approach:
    // Compare each pair in O(k) — feasible for small N
    vector<int> cows(n); iota(cows.begin(), cows.end(), 1);
    sort(cows.begin(), cows.end(), [&](int a, int b){
        int before = 0;
        for (int i = 0; i < k; i++) before += (pos[i][a] < pos[i][b]);
        return before > k / 2;
    });
    for (int c : cows) cout << c << "\n";
}

Problem 3.8.5 — Running Distinct Count 🟢 Easy After each new integer, print count of distinct values seen.

✅ Full Solution

#include <bits/stdc++.h>
using namespace std;
int main() {
    int n; cin >> n;
    unordered_set<int> seen;
    for (int i = 0; i < n; i++) {
        int x; cin >> x;
        seen.insert(x);
        cout << seen.size() << "\n";
    }
}

Complexity: O(N) average.

C++ for Competitive Programming: A USACO Guide