Chapter 3.7: Hashing Techniques
📝 Before You Continue: You should know STL containers (Chapter 3.1) and string basics (Chapter 2.3). This chapter covers hashing principles and advanced competitive programming usage.
Hashing is one of the most important "tools" in competitive programming: it turns complex comparison problems into O(1) numeric comparisons. But hashing is also the easiest technique to get "hacked"—this chapter teaches both how to use it well and how to prevent being hacked.
3.7.1 unordered_map vs map: Internals & Performance
Internal Implementation Comparison
| Feature | map | unordered_map |
|---|---|---|
| Internal structure | Red-black tree (balanced BST) | Hash table |
| Lookup time | O(log N) | O(1) avg, O(N) worst |
| Insert time | O(log N) | O(1) avg, O(N) worst |
| Iteration order | Ordered (ascending by key) | Unordered |
| Memory usage | O(N), smaller constant | O(N), larger constant |
| Worst case | O(log N) (stable) | O(N) (hash collision) |
#include <bits/stdc++.h>
using namespace std;
int main() {
// map: ordered, O(log N)
map<int, int> m;
m[3] = 30; m[1] = 10; m[2] = 20;
for (auto [k, v] : m) cout << k << ":" << v << " ";
// output: 1:10 2:20 3:30 ← ordered!
// unordered_map: unordered, O(1) average
unordered_map<int, int> um;
um[3] = 30; um[1] = 10; um[2] = 20;
// iteration order undefined, but lookup is very fast
// performance difference: N=10^6 operations
// map: ~300ms; unordered_map: ~80ms (roughly)
}
When to Choose Which?
- Use
map: need ordered iteration, needlower_bound/upper_bound, extreme key range (high hash collision risk) - Use
unordered_map: pure lookup/insert, key is integer or string, large N (> 10^5)
3.7.2 Anti-Hack: Custom Hash
Problem: unordered_map's default integer hash is essentially hash(x) = x, allowing attackers to construct many hash collisions, degrading operations to O(N) and causing TLE.
On platforms like Codeforces, this is a common hack technique.
Solution: splitmix64 Hash
// Anti-hack custom hasher — uses splitmix64
struct custom_hash {
static uint64_t splitmix64(uint64_t x) {
x += 0x9e3779b97f4a7c15;
x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
return x ^ (x >> 31);
}
size_t operator()(uint64_t x) const {
static const uint64_t FIXED_RANDOM =
chrono::steady_clock::now().time_since_epoch().count();
return splitmix64(x + FIXED_RANDOM);
}
};
// Usage:
unordered_map<int, int, custom_hash> safe_map;
unordered_set<int, custom_hash> safe_set;
⚠️ Contest tip: When using
unordered_mapon Codeforces, always addcustom_hash. USACO test data won't deliberately construct hacks, but it's a good habit.
3.7.3 String Hashing (Polynomial Hash)
String hashing maps a string to an integer, turning string comparison into numeric comparison (O(1)).
Core Formula
For string s[0..n-1], define the hash value as:
hash(s) = s[0]·B^(n-1) + s[1]·B^(n-2) + ... + s[n-1]·B^0 (mod M)
where B is the base (typically 131 or 131117) and M is a large prime (typically 10⁹+7 or 10⁹+9).
Prefix Hash + Substring Hash O(1)
// String hashing: O(N) preprocessing, O(1) substring hash
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ull;
const ull BASE = 131;
// Use unsigned long long natural overflow (equivalent to mod 2^64)
// Or specify MOD manually:
// const ull MOD = 1e9 + 7;
struct StringHash {
int n;
vector<ull> h, pw;
StringHash(const string& s) : n(s.size()), h(n + 1, 0), pw(n + 1, 1) {
for (int i = 0; i < n; i++) {
h[i + 1] = h[i] * BASE + (s[i] - 'a' + 1); // 1-indexed prefix hash
pw[i + 1] = pw[i] * BASE; // BASE^(i+1)
}
}
// Get hash of substring s[l..r] (0-indexed)
ull get(int l, int r) {
return h[r + 1] - h[l] * pw[r - l + 1]; // ← KEY formula
}
};
int main() {
ios_base::sync_with_stdio(false);
cin.tie(NULL);
string s = "abcabc";
StringHash sh(s);
// Compare if two substrings are equal
// s[0..2] = "abc", s[3..5] = "abc"
cout << (sh.get(0, 2) == sh.get(3, 5) ? "Equal" : "Not Equal") << "\n"; // Equal
// Compare s[0..1] = "ab" vs s[3..4] = "ab"
cout << (sh.get(0, 1) == sh.get(3, 4) ? "Equal" : "Not Equal") << "\n"; // Equal
}
Hash Formula Derivation:
h[r+1] = s[0]*B^r + s[1]*B^(r-1) + ... + s[r]*B^0
h[l] = s[0]*B^(l-1) + ... + s[l-1]*B^0
h[r+1] - h[l] * B^(r-l+1)
= (s[0]*B^r + ... + s[r]*B^0)
- (s[0]*B^r + ... + s[l-1]*B^(r-l+1))
= s[l]*B^(r-l) + s[l+1]*B^(r-l-1) + ... + s[r]*B^0
= hash(s[l..r]) ✓
下图直观展示了前缀哈希数组的构建过程,以及如何用 get(l, r) 公式在 O(1) 内提取任意子串的哈希值:
3.7.4 Double Hashing (Avoiding Collisions)
Single hash (mod M) has collision probability ≈ 1/M. For N substring comparisons, expected collisions ≈ N²/(2M).
The diagram below shows two classic ways to handle collisions — chaining (what unordered_map uses) and linear probing:
- If
M = 10⁹+7, N = 10⁶: collision probability ≈10¹²/(2×10⁹) = 500times! Not safe. - Solution: double hashing, using two different (B, M) pairs simultaneously, collision probability drops to
1/(M₁×M₂) ≈ 10⁻¹⁸.
// Double hashing: two (BASE, MOD) pairs used simultaneously, extremely low collision probability
struct DoubleHash {
static const ull B1 = 131, M1 = 1e9 + 7;
static const ull B2 = 137, M2 = 1e9 + 9;
int n;
vector<ull> h1, h2, pw1, pw2;
DoubleHash(const string& s) : n(s.size()),
h1(n+1,0), h2(n+1,0), pw1(n+1,1), pw2(n+1,1) {
for (int i = 0; i < n; i++) {
ull c = s[i] - 'a' + 1;
h1[i+1] = (h1[i] * B1 + c) % M1;
h2[i+1] = (h2[i] * B2 + c) % M2;
pw1[i+1] = pw1[i] * B1 % M1;
pw2[i+1] = pw2[i] * B2 % M2;
}
}
// Return pair<ull,ull> as the hash "fingerprint" of substring s[l..r]
pair<ull,ull> get(int l, int r) {
ull v1 = (h1[r+1] - h1[l] * pw1[r-l+1] % M1 + M1) % M1;
ull v2 = (h2[r+1] - h2[l] * pw2[r-l+1] % M2 + M2) % M2;
return {v1, v2};
}
};
3.7.5 Application: String Matching (Rabin-Karp)
// Rabin-Karp string matching: find all occurrences of pattern P in text T
// Time: O(N+M) average, O(NM) worst case (but extremely fast in practice)
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ull;
vector<int> rabinKarp(const string& T, const string& P) {
int n = T.size(), m = P.size();
if (m > n) return {};
const ull BASE = 131;
ull patHash = 0, textHash = 0, pow_m = 1;
// Compute BASE^m (natural overflow)
for (int i = 0; i < m - 1; i++) pow_m *= BASE;
// Initial hash
for (int i = 0; i < m; i++) {
patHash = patHash * BASE + P[i];
textHash = textHash * BASE + T[i];
}
vector<int> result;
for (int i = 0; i + m <= n; i++) {
if (textHash == patHash) {
// Verify when hashes match (avoid false positives from collision)
if (T.substr(i, m) == P) result.push_back(i);
}
if (i + m < n) {
// Rolling hash: remove leftmost char, add rightmost char
textHash = textHash - T[i] * pow_m; // remove leftmost
textHash = textHash * BASE + T[i + m]; // add rightmost
}
}
return result;
}
3.7.6 Application: Longest Common Substring
Problem: Given strings S and T, find the length of their longest common substring.
Approach: Binary search on the answer (length L of longest common substring), then use a hash set to check if any substring of length L appears in both strings.
// Longest common substring: O(N log N) — binary search + hashing
int longestCommonSubstring(const string& S, const string& T) {
StringHash hs(S), ht(T);
int ns = S.size(), nt = T.size();
auto check = [&](int len) -> bool {
unordered_set<ull> setS;
for (int i = 0; i + len <= ns; i++)
setS.insert(hs.get(i, i + len - 1));
for (int j = 0; j + len <= nt; j++)
if (setS.count(ht.get(j, j + len - 1)))
return true;
return false;
};
int lo = 0, hi = min(ns, nt);
while (lo < hi) {
int mid = (lo + hi + 1) / 2;
if (check(mid)) lo = mid;
else hi = mid - 1;
}
return lo;
}
⚠️ Common Mistakes
-
Bad modulus choice: Don't use numbers other than
10⁹+7; especially avoid non-prime moduli (high collision rate). Recommended:10⁹+7and10⁹+9as a double hash pair. -
unordered_maphacked: On platforms like Codeforces, the default hash can be attacked. Always usecustom_hash. -
Substring hash subtraction underflow:
h[r+1] - h[l] * pw[r-l+1]may be negative (with signed integers). Useunsigned long longnatural overflow, or(... % M + M) % Mto ensure non-negative. -
BASE doesn't match character set: For lowercase letters (26 types), BASE must be > 26 (typically 31 or 131). For all ASCII characters (128 types), BASE must be > 128 (use 131 or 137).
-
Hash collision causing WA: Even with double hashing, collisions are theoretically possible. If uncertain, add direct string comparison when hashes match.
Chapter Summary
📌 Core Comparison Table
| Tool | Time Complexity | Use Case |
|---|---|---|
map<K,V> | O(log N) | Need ordering, need range queries |
unordered_map<K,V> | O(1) amortized | Only need lookup/insert, key order not required |
| String hash (single) | O(N) preprocess, O(1) query | Substring comparison, pattern matching |
| String hash (double) | O(N) preprocess, O(1) query | High-precision scenarios, avoid collisions |
❓ FAQ
Q1: Which is better — unsigned long long natural overflow double hash or manual mod hash?
A:
ullnatural overflow (equivalent to mod 2⁶⁴) is simpler to code, and 2⁶⁴ is large enough that single-hash collision probability is already very low (≈ 10⁻¹⁸). But crafted data can deliberately cause collisions — double hashing is safer then. Both work in contests;ullis more common.
Q2: What can string hashing do that KMP cannot?
A: String hashing excels at multi-string comparison (e.g., finding longest common substring, palindromic substrings), while KMP only excels at single-pattern matching. Hash + binary search can solve many string problems in O(N log N) that would require more complex KMP implementations.
Q3: Should I use BASE 31 or 131?
A: Use 31 for lowercase letters only (a prime less than 37, avoids too-small hash space). Use 131 for mixed case or digits (a prime greater than 128, covers full ASCII). The key is: BASE must be larger than the character set size and ideally a prime.
Practice Problems
Problem 3.7.1 — Two Sum with Hash 🟢 Easy Given array A, find if any two distinct elements sum to target X.
Hint
For each A[i], check if (X - A[i]) is already in the hash set.✅ Full Solution
#include <bits/stdc++.h>
using namespace std;
int main() {
int n, X; cin >> n >> X;
vector<int> a(n); for (int& x : a) cin >> x;
unordered_set<int> seen;
for (int x : a) {
if (seen.count(X - x)) { cout << "YES\n"; return 0; }
seen.insert(x);
}
cout << "NO\n";
}
Complexity: O(N) average.
Problem 3.7.2 — Substring Check 🟢 Easy Given string T and pattern P, print all starting indices where P appears in T.
Hint
Rolling hash: compute hash of each |P|-length window of T in O(1) using prefix hashes.✅ Full Solution (Rolling Hash)
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ull;
const ull BASE = 131, MOD = (1ULL<<61)-1;
int main() {
string T, P; cin >> T >> P;
int n = T.size(), m = P.size();
// Build prefix hash of T
vector<ull> h(n+1,0), pw(n+1,1);
for(int i=0;i<n;i++) { h[i+1]=(h[i]*BASE+T[i])%MOD; pw[i+1]=pw[i]*BASE%MOD; }
// Hash of P
ull hp=0; for(char c:P) hp=(hp*BASE+c)%MOD;
// Check each window
for(int i=0;i+m<=n;i++){
ull wh=(h[i+m]-h[i]*pw[m]%MOD+MOD*2)%MOD;
if(wh==hp) cout<<i<<"\n";
}
}
Complexity: O(N + M).
Problem 3.7.3 — Longest Palindromic Substring 🟡 Medium Find the length of the longest palindromic substring.
Hint
Binary search on length. A substring s[l..r] is a palindrome iff hash(s[l..r]) == hash(rev(s)[n-1-r..n-1-l]).✅ Full Solution (Hash + Binary Search)
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ull;
const ull BASE=131;
struct Hasher {
vector<ull> h,pw;
Hasher(const string&s){
int n=s.size(); h.resize(n+1,0); pw.resize(n+1,1);
for(int i=0;i<n;i++){h[i+1]=h[i]*BASE+s[i];pw[i+1]=pw[i]*BASE;}
}
ull get(int l,int r){return h[r+1]-h[l]*pw[r-l+1];} // [l,r] 0-indexed
};
int main(){
string s; cin>>s;
string r(s.rbegin(),s.rend());
Hasher hs(s),hr(r);
int n=s.size(), ans=1;
// Check if palindrome of length len exists
auto check=[&](int len)->bool{
for(int i=0;i+len<=n;i++){
int j=i+len-1;
// In reversed string, s[i..j] corresponds to r[n-1-j..n-1-i]
if(hs.get(i,j)==hr.get(n-1-j,n-1-i)) return true;
}
return false;
};
// Binary search on length
int lo=1,hi=n;
while(lo<=hi){int mid=(lo+hi)/2;if(check(mid)){ans=mid;lo=mid+1;}else hi=mid-1;}
cout<<ans<<"\n";
}
Complexity: O(N log N).
Problem 3.7.4 — Count Distinct Substrings 🟡 Medium Given string S of length N (N ≤ 5000), count the number of distinct substrings.
Hint
Insert all O(N²) substring hashes into an unordered_set. Use double hash to avoid collisions.✅ Full Solution
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ull;
int main(){
string s; cin>>s;
int n=s.size();
const ull B1=131,B2=137,M1=1e9+7,M2=1e9+9;
unordered_set<ull> seen;
for(int i=0;i<n;i++){
ull h1=0,h2=0;
for(int j=i;j<n;j++){
h1=(h1*B1+s[j])%M1;
h2=(h2*B2+s[j])%M2;
seen.insert(h1*M2+h2); // combine two hashes
}
}
cout<<seen.size()<<"\n";
}
Complexity: O(N²) time and space (for N ≤ 5000).
Problem 3.7.5 — String Periods 🔴 Hard Find the smallest period of string S (smallest k dividing n such that S = repeat of S[0..k-1]).
Hint
For each divisor k of n, verify s[0..k-1] repeated = s using hash comparison O(n/k) per check.✅ Full Solution
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ull;
const ull BASE=131,MOD=1e9+7;
int main(){
string s; cin>>s;
int n=s.size();
// build prefix hash
vector<ull> h(n+1,0),pw(n+1,1);
for(int i=0;i<n;i++){h[i+1]=(h[i]*BASE+s[i])%MOD;pw[i+1]=pw[i]*BASE%MOD;}
auto getHash=[&](int l,int r){return (h[r+1]-h[l]*pw[r-l+1]%MOD+MOD*2)%MOD;};
// find all divisors of n, try smallest first
vector<int> divs;
for(int i=1;i*i<=n;i++) if(n%i==0){divs.push_back(i);if(i!=n/i)divs.push_back(n/i);}
sort(divs.begin(),divs.end());
for(int k:divs){
bool ok=true;
for(int i=0;i+k<=n&&ok;i+=k)
if(getHash(i,i+k-1)!=getHash(0,k-1)) ok=false;
if(ok){cout<<k<<"\n";return 0;}
}
}
Complexity: O(d(N) × N) ≈ O(N log N) for typical inputs.