πŸ“– Chapter 3.8 ⏱️ ~55 min read 🎯 Intermediate

Chapter 3.8: Maps & Sets

Maps and sets are the workhorses of frequency counting, lookup, and tracking unique elements. In this chapter, we go deep into their practical use in USACO problems.


3.8.1 map vs unordered_map β€” Choosing Wisely

Visual: Map Internal Structure (BST)

Map Structure

std::map stores key-value pairs in a balanced BST (Red-Black tree). This gives O(log N) for all operations and keeps keys sorted automatically β€” great when you need lower_bound/upper_bound queries. Use unordered_map when you only need O(1) lookups and don't care about order.

Featuremapunordered_map
Underlying structureRed-black treeHash table
Insert/lookup timeO(log n)O(1) average, O(n) worst
Iterates inSorted key orderArbitrary order
Min/Max keyAvailable via .begin()/.rbegin()Not available
Keys must beComparable (has <)Hashable
Use whenYou need sorted keys or find min/maxYou need fastest possible lookup

For most USACO problems, either works fine. Use unordered_map for speed when keys are integers or strings, map when you need ordered iteration.

Example: Frequency Map

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;

    unordered_map<int, int> freq;
    for (int i = 0; i < n; i++) {
        int x;
        cin >> x;
        freq[x]++;   // increment count; creates with 0 if not present
    }

    // Find the element with highest frequency
    int maxFreq = 0, maxVal = INT_MIN;
    for (auto &[val, count] : freq) {   // structured binding (C++17)
        if (count > maxFreq || (count == maxFreq && val < maxVal)) {
            maxFreq = count;
            maxVal = val;
        }
    }

    cout << "Most frequent: " << maxVal << " (" << maxFreq << " times)\n";

    return 0;
}

3.8.2 Map Operations β€” Complete Reference

#include <bits/stdc++.h>
using namespace std;

int main() {
    map<string, int> scores;

    // Insert
    scores["Alice"] = 95;
    scores["Bob"] = 87;
    scores["Charlie"] = 92;
    scores.insert({"Dave", 78});    // another way
    scores.emplace("Eve", 88);      // most efficient way

    // Lookup
    cout << scores["Alice"] << "\n";  // 95
    // WARNING: scores["Unknown"] creates it with value 0!

    // Safe lookup
    if (scores.count("Frank")) {
        cout << scores["Frank"] << "\n";
    } else {
        cout << "Frank not found\n";
    }

    // Using find() β€” returns iterator
    auto it = scores.find("Bob");
    if (it != scores.end()) {
        cout << it->first << ": " << it->second << "\n";  // Bob: 87
    }

    // Update
    scores["Alice"] += 5;    // Alice now has 100

    // Erase
    scores.erase("Charlie");

    // Iterate in sorted key order (map always gives sorted order)
    for (const auto &[name, score] : scores) {
        cout << name << ": " << score << "\n";
    }
    // Alice: 100
    // Bob: 87
    // Dave: 78
    // Eve: 88

    // Size and empty check
    cout << scores.size() << "\n";   // 4
    cout << scores.empty() << "\n";  // 0 (false)

    // Clear all entries
    scores.clear();

    return 0;
}

3.8.3 Set Operations β€” Complete Reference

#include <bits/stdc++.h>
using namespace std;

int main() {
    set<int> s = {5, 3, 8, 1, 9, 2};
    // s = {1, 2, 3, 5, 8, 9} (always sorted!)

    // Insert
    s.insert(4);   // s = {1, 2, 3, 4, 5, 8, 9}
    s.insert(3);   // already there, no change

    // Erase
    s.erase(8);    // s = {1, 2, 3, 4, 5, 9}

    // Lookup
    cout << s.count(3) << "\n";  // 1 (exists)
    cout << s.count(7) << "\n";  // 0 (not found)

    // Iterator-based queries
    auto it = s.lower_bound(4);  // first element >= 4
    cout << *it << "\n";         // 4

    auto it2 = s.upper_bound(4); // first element > 4
    cout << *it2 << "\n";        // 5

    // Min and Max
    cout << *s.begin() << "\n";   // 1 (min)
    cout << *s.rbegin() << "\n";  // 9 (max)

    // Remove minimum
    s.erase(s.begin());   // removes 1
    cout << *s.begin() << "\n";  // 2

    // Iterate
    for (int x : s) cout << x << " ";
    cout << "\n";  // 2 3 4 5 9

    return 0;
}

3.8.4 USACO Problem: Cow IDs

Problem (USACO 2017 February Bronze): Bessie wants to find the N-th smallest number that doesn't appear in a set of "taken" IDs. Given a set of taken IDs and N, find the N-th available ID.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, q;
    cin >> n >> q;

    set<int> taken;
    for (int i = 0; i < n; i++) {
        int x; cin >> x;
        taken.insert(x);
    }

    // For each query q, find the q-th positive integer NOT in taken
    while (q--) {
        int k; cin >> k;

        // Binary search: find smallest x such that x - (# taken values <= x) >= k
        int lo = 1, hi = 2e9;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            // count of available numbers in [1, mid] = mid - (# taken values <= mid)
            int taken_count = (int)(taken.lower_bound(mid + 1) - taken.begin());
            int available = mid - taken_count;
            if (available >= k) hi = mid;
            else lo = mid + 1;
        }

        cout << lo << "\n";
    }

    return 0;
}

3.8.5 Multiset β€” Sorted Bag with Duplicates

A multiset is like a set, but allows duplicate values:

#include <bits/stdc++.h>
using namespace std;

int main() {
    multiset<int> ms;
    ms.insert(3);
    ms.insert(1);
    ms.insert(3);   // duplicate allowed
    ms.insert(5);
    ms.insert(1);

    // ms = {1, 1, 3, 3, 5}

    cout << ms.count(3) << "\n";  // 2 (how many 3s)
    cout << ms.count(2) << "\n";  // 0

    // Remove ONE occurrence of 3
    ms.erase(ms.find(3));  // removes only one 3
    // ms = {1, 1, 3, 5}

    // Remove ALL occurrences of 1
    ms.erase(1);  // removes all 1s
    // ms = {3, 5}

    cout << *ms.begin() << "\n";   // 3 (min)
    cout << *ms.rbegin() << "\n";  // 5 (max)

    return 0;
}

Running Median with Two Multisets

Keep track of the median of a stream of numbers using a max-multiset (lower half) and a min-multiset (upper half):

#include <bits/stdc++.h>
using namespace std;

int main() {
    multiset<int> lo;  // max-heap: lower half (use negation or reverse iterator)
    multiset<int> hi;  // min-heap: upper half

    // For simplicity, use two multisets where lo stores values in reverse
    // lo's maximum = lo.rbegin(); hi's minimum = hi.begin()

    int n;
    cin >> n;

    for (int i = 0; i < n; i++) {
        int x;
        cin >> x;

        // Add to appropriate half
        if (lo.empty() || x <= *lo.rbegin()) {
            lo.insert(x);
        } else {
            hi.insert(x);
        }

        // Rebalance: sizes should differ by at most 1
        while (lo.size() > hi.size() + 1) {
            hi.insert(*lo.rbegin());
            lo.erase(lo.find(*lo.rbegin()));
        }
        while (hi.size() > lo.size()) {
            lo.insert(*hi.begin());
            hi.erase(hi.begin());
        }

        // Print median
        if (lo.size() == hi.size()) {
            // Even count: average of two middle values
            double median = (*lo.rbegin() + *hi.begin()) / 2.0;
            cout << fixed << setprecision(1) << median << "\n";
        } else {
            // Odd count: middle value is in lo
            cout << *lo.rbegin() << "\n";
        }
    }

    return 0;
}

3.8.6 Practical Patterns

Pattern 1: Counting Distinct Elements

vector<int> data = {1, 5, 3, 1, 2, 5, 5, 3};
set<int> distinct(data.begin(), data.end());
cout << "Distinct count: " << distinct.size() << "\n";  // 4

Pattern 2: Group by Frequency, Sort by Value

vector<int> nums = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5};
map<int, int> freq;
for (int x : nums) freq[x]++;

// Group values by their frequency
map<int, vector<int>> byFreq;
for (auto &[val, cnt] : freq) {
    byFreq[cnt].push_back(val);
}

// Print in order of frequency
for (auto &[cnt, vals] : byFreq) {
    for (int v : vals) cout << v << " (Γ—" << cnt << ")\n";
}

Pattern 3: Offline Queries with Sorting

Sort queries along with events to process them together in O((N+Q) log N):

// Example: for each query point, count how many events have value <= query point
// Sort both arrays, sweep through with two pointers

⚠️ Common Mistakes in Chapter 3.8

#MistakeWhy It's WrongFix
1map[key] accessing non-existent keyAuto-creates entry with value 0, pollutes dataUse m.count(key) or m.find(key) to check first
2multiset::erase(value) deletes all equal valuesExpected to delete one, deleted allUse ms.erase(ms.find(value)) to delete just one
3Modifying map/set size during iterationIterator invalidated, crash or skipped elementsUse it = m.erase(it) for safe deletion
4unordered_map hacked to degrade to O(N)Adversary constructs hash-collision data, TLESwitch to map or use custom hash function
5Forgetting set doesn't store duplicatessize() doesn't grow after inserting duplicate, count wrongUse multiset when duplicates needed

Chapter Summary

πŸ“Œ Key Takeaways

StructureOrderedDuplicatesKey FeatureWhy It Matters
map<K,V>Yes (sorted)No (unique keys)Key-value mapping, O(log N)Frequency counting, ID→attribute mapping
unordered_map<K,V>NoNoO(1) average lookup5-10x faster than map for large data
set<T>Yes (sorted)NoOrdered unique setDeduplication, range queries (lower_bound)
unordered_set<T>NoNoO(1) membership testJust need to check "seen before?"
multiset<T>Yes (sorted)YesOrdered multisetDynamic median, sliding window

🧩 "Which Container to Use" Quick Reference

NeedRecommended ContainerReason
Count occurrences of each elementmap / unordered_mapfreq[x]++ in one line
Deduplicate and sortsetAuto-dedup + auto-sort
Check if element was seenunordered_setO(1) lookup
Dynamic ordered set + find extremesset / multisetO(1) access to min/max
Need lower_bound / upper_boundset / mapOnly ordered containers support this
Value→index mappingmap / unordered_mapCoordinate compression etc.

❓ FAQ

Q1: What's the difference between map's [] operator and find?

A: m[key] auto-creates a default value (0 for int) when key doesn't exist; m.find(key) only searches, doesn't create. If you just want to check if a key exists, use m.count(key) or m.find(key) != m.end().

Q2: Both multiset and priority_queue can get extremes β€” which to use?

A: priority_queue can only get the max (or min) and delete it, doesn't support deletion by value. multiset supports finding and deleting any value, more flexible. If you only need to repeatedly get the extreme, priority_queue is simpler; if you need to delete specific elements (e.g., removing elements leaving a sliding window), use multiset.

Q3: When can unordered_map be slower than map?

A: Two situations: β‘  When hash collisions are severe (many keys hash to the same bucket), degrades to O(N); β‘‘ In contests, adversaries deliberately construct data to hack unordered_map. Solution: use a custom hash function, or switch to map.

Q4: Is C++17 structured binding auto &[key, val] safe? Can I use it in contests?

A: USACO and most contest platforms support C++17, so for (auto &[key, val] : m) is safe to use. It's cleaner than entry.first/entry.second.

πŸ”— Connections to Later Chapters

  • Chapter 3.3 (Sorting & Searching): coordinate compression often combines with map (value β†’ compressed index)
  • Chapter 3.9 (Segment Trees): ordered set's lower_bound can replace simple segment tree queries
  • Chapters 5.1–5.2 (Graphs): map is commonly used to store adjacency lists for sparse graphs
  • Chapter 4.1 (Greedy): multiset combined with greedy strategies can efficiently maintain dynamic optimal choices
  • The map frequency counting pattern appears throughout the book and is one of the most fundamental tools in competitive programming

Practice Problems

Problem 3.8.1 β€” Two Sum Read N integers and a target T. Find two values in the array that sum to T. Print their indices (1-indexed). (Hint: use a map to store value β†’ index)

Problem 3.8.2 β€” Anagram Groups Read N words. Group them by their sorted-letter form. Print each group on one line, sorted alphabetically.

  • Example: "eat tea tan ate nat bat" β†’ groups: {ate, eat, tea}, {bat}, {nat, tan}

Problem 3.8.3 β€” Interval Overlap Count Read N intervals [L_i, R_i]. For each integer point from 1 to M, count how many intervals contain it. Output the maximum overlap count. (Hint: use difference array, or sort events and sweep with a set)

Problem 3.8.4 β€” Cow Photography (USACO Bronze Inspired) N cows each have a unique ID. Read N lists (each a permutation of IDs). Find the ordering that's consistent with all lists (the "true" order). (Hint: use maps to count pairwise orderings)

Problem 3.8.5 β€” Running Distinct Count Read N integers one by one. After each new integer, print the count of distinct values seen so far. (Hint: maintain an unordered_set; its size is the answer)