📖 第 3.12 章：字符串算法

⏱ 预计阅读时间：60 分钟 | 难度：🟡 中等

前置条件

在学习本章之前，请确保你已掌握：

数组的基本操作（第 3.1 章）
string 类型的使用（第 2.3 章）

🎯 学习目标

学完本章后，你将能够：

理解字符串匹配的朴素算法并分析其效率瓶颈
掌握 KMP 算法的核心思想——「前缀函数」的构建
用 KMP 在 O(N + M) 时间内解决字符串匹配问题
理解 Trie 树（字典树）的结构与操作
运用 Trie 树高效检索字符串集合

3.12.1 为什么需要专门的字符串算法？

问题引入

给定一段文本（长度 N）和一个模式串（长度 M），请找出模式串在文本中所有出现的位置。

文本：  "ababcabcabababd"
模式：  "abab"
结果：  位置 0, 8, 10

这是字符串匹配问题，也是竞赛中最常见的字符串题型之一。

朴素算法的问题

最直觉的做法——枚举文本的每个起始位置，逐字符比较：

📄 最直觉的做法——枚举文本的每个起始位置，逐字符比较：

// 朴素字符串匹配 — O(N * M)
vector<int> naive_search(string text, string pattern) {
    int n = text.size(), m = pattern.size();
    vector<int> result;
    for (int i = 0; i <= n - m; i++) {
        bool match = true;
        for (int j = 0; j < m; j++) {
            if (text[i + j] != pattern[j]) {
                match = false;
                break;
            }
        }
        if (match) result.push_back(i);
    }
    return result;
}

最坏情况举例：

文本：  "aaaaaa...a"（N 个 a）
模式：  "aaa...ab"（M-1 个 a + 1 个 b）
每次匹配都要比较 M 次才发现不匹配
总操作数：N * M

当 N = M = 10^5 时，这就是 10^10 次操作，根本无法通过竞赛题目。

KMP 的核心洞察

朴素算法的浪费在于：每次失配后，从头重新开始匹配，丢弃了已经比较过的信息。

KMP 的思想是：失配时，利用已经匹配的部分跳回到合适位置，而不是从头开始。

3.12.2 前缀函数（π 数组）

KMP 的核心是「前缀函数」，也叫「next 数组」或「失配函数」。

定义

对于字符串 s，前缀函数 π[i] 定义为：

子串 s[0..i] 中，最长的相等真前缀与真后缀的长度。

「真前缀」：不包含整个字符串的前缀。
「真后缀」：不包含整个字符串的后缀。

直觉理解

字符串：  a  b  c  a  b  c  d
下标：    0  1  2  3  4  5  6
π 值：    0  0  0  1  2  3  0

逐个分析：

π[0] = 0：单个字符 "a"，无真前后缀，规定为 0
π[1] = 0：字符串 "ab"，前缀 {a}，后缀 {b}，没有相等的，π = 0
π[2] = 0：字符串 "abc"，前缀 {a, ab}，后缀 {c, bc}，没有相等的，π = 0
π[3] = 1：字符串 "abca"，前缀 {a, ab, abc}，后缀 {a, ca, bca}，相等的最长为 "a"，π = 1
π[4] = 2：字符串 "abcab"，最长相等真前后缀为 "ab"，π = 2
π[5] = 3：字符串 "abcabc"，最长相等真前后缀为 "abc"，π = 3
π[6] = 0：字符串 "abcabcd"，末尾 d 打破了规律，π = 0

π 值的用途

π[i] = k 意味着：如果当前字符（位置 i+1）不匹配，
我们可以安全地退回到模式串的第 k 个位置继续尝试，
因为前 k 个字符肯定已经匹配了（它们是当前已匹配部分的前缀 = 后缀）。

3.12.3 前缀函数的高效构建

朴素方法：O(N²) 或 O(N³)

对每个位置枚举所有可能的长度，逐字符比较，效率很低。

KMP 的关键观察

观察 1：相邻的 π 值最多增加 1。
即 π[i+1] ≤ π[i] + 1。

为什么？若 π[i+1] > π[i] + 1，说明存在更长的相等前后缀，那 π[i] 就不是最大的了，矛盾。

观察 2：失配时如何跳转？

若 s[π[i]] == s[i+1]，则 π[i+1] = π[i] + 1（前缀直接延长）。
若不等，利用 π[π[i]-1] 跳转（尝试更短的前缀），直到找到匹配或退到 0。

图解失配跳转

模式串：  a b c a b d
           ↑     ↑
           已匹配到位置4（"abcab"），π[4]=2
           现在第5个字符 d ≠ c（s[2]）
           
           跳转：j = π[4] = 2，尝试 s[2]='c' vs 'd' → 还是不匹配
           跳转：j = π[1] = 0，尝试 s[0]='a' vs 'd' → 还是不匹配
           j = 0 且失配，π[5] = 0

最终算法实现

📄 查看代码：最终算法实现

#include <bits/stdc++.h>
using namespace std;

// 构建前缀函数（KMP 的核心）
// 时间复杂度：O(N)
vector<int> prefix_function(const string& s) {
    int n = (int)s.length();
    vector<int> pi(n, 0);   // pi[0] = 0 是规定
    
    for (int i = 1; i < n; i++) {
        // j 从上一个 π 值开始尝试
        int j = pi[i - 1];
        
        // 失配时，利用已算出的 π 值回退（关键步骤！）
        while (j > 0 && s[i] != s[j])
            j = pi[j - 1];     // 跳到下一个候选长度
        
        // 匹配则长度加一
        if (s[i] == s[j])
            j++;
        
        pi[i] = j;
    }
    return pi;
}

为什么时间复杂度是 O(N)？

关键在于 j 的变化：

j 每次最多加 1（通过 j++）
j 每次至少减 1（通过 j = pi[j-1]）
j 的总增加量 ≤ N，所以总减少量也 ≤ N
while 循环的总执行次数 ≤ N

整个循环的总操作数是 O(N)！

3.12.4 KMP 字符串匹配

核心技巧：拼接字符串

将 pattern + '#' + text 拼在一起，计算整体的前缀函数：

pattern = "abab"，长度 4
text    = "ababcababd"，长度 9

拼接后：  a b a b # a b a b c a b a b d
下标：    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
                              ↑ 1 ↑ 1 ↑ 1
π 值：    0 0 1 2 0 1 2 3 4 0 1 2 3 4 0
                              ↑         ↑
         当 π[i] = 4（= len(pattern)），说明找到匹配！

分隔符 # 的作用是防止模式串的后缀与文本的前缀在拼接处意外匹配。

完整实现

📄 查看代码：完整实现

// KMP 字符串匹配
// 返回 pattern 在 text 中所有匹配的起始下标（0-indexed）
// 时间复杂度：O(N + M)，N = len(text)，M = len(pattern)
vector<int> kmp_search(const string& text, const string& pattern) {
    if (pattern.empty()) return {};
    
    // 第一步：拼接，用不在字符集中的分隔符
    string combined = pattern + '#' + text;
    int m = pattern.size(), n = text.size();
    
    // 第二步：计算整体前缀函数
    vector<int> pi = prefix_function(combined);
    
    // 第三步：收集匹配位置
    vector<int> result;
    for (int i = m + 1; i <= m + n; i++) {
        if (pi[i] == m) {
            // i 在拼接串中的位置，换算回 text 中的起始位置
            result.push_back(i - 2 * m);
        }
    }
    return result;
}

int main() {
    string text    = "ababcabcabababd";
    string pattern = "abab";
    
    auto pos = kmp_search(text, pattern);
    cout << "模式串 \"" << pattern << "\" 出现在位置：";
    for (int p : pos) cout << p << " ";
    // 输出：0 8 10
    cout << endl;
    
    // 验证
    for (int p : pos) {
        cout << "  text[" << p << ".." << p+pattern.size()-1 << "] = "
             << text.substr(p, pattern.size()) << endl;
    }
    return 0;
}

详细追踪示例

📄 查看代码：详细追踪示例

text = "ababcabab", pattern = "abab"（长度4）

拼接串：  a b a b # a b a b c a b a b
π 值：    0 0 1 2 0 1 2 3 4 0 1 2 3 4
                         ↑           ↑
                       i=8          i=13
                    π[8]=4=m ✓    π[13]=4=m ✓

匹配位置（text 中 0-indexed）：
  i=8：  8 - 2*4 = 0  → text[0..3] = "abab" ✓
  i=13： 13 - 2*4 = 5 → text[5..8] = "abab" ✓

3.12.5 KMP 的进阶应用

应用 1：求字符串的最短周期

📄 查看代码：应用 1：求字符串的最短周期

// 若字符串 s 能被前 k 个字符重复构成，则 k 是周期
// 最短周期 = n - π[n-1]
// 条件：n 能被 (n - π[n-1]) 整除

int min_period(const string& s) {
    int n = s.size();
    auto pi = prefix_function(s);
    int period = n - pi[n - 1];
    if (n % period == 0) return period;
    return n;  // 无法压缩，周期为 n
}

// 示例：
// "abcabc" → π[5]=3，period = 6-3 = 3，6%3=0 ✓，最短周期 3
// "abcab"  → π[4]=2，period = 5-2 = 3，5%3≠0，最短周期 5（整串）

应用 2：统计每个前缀的出现次数

📄 查看代码：应用 2：统计每个前缀的出现次数

// 统计模式串的每个前缀在整个串中出现的次数
vector<int> count_prefixes(const string& s) {
    int n = s.size();
    auto pi = prefix_function(s);
    vector<int> cnt(n + 1, 0);
    
    for (int i = 0; i < n; i++) cnt[pi[i]]++;    // 每个位置贡献
    for (int i = n - 1; i > 0; i--) cnt[pi[i-1]] += cnt[i]; // 传播
    for (int i = 0; i <= n; i++) cnt[i]++;        // 加上自身
    
    return cnt;  // cnt[k] = 长度为 k 的前缀在 s 中出现的次数
}

算法对比

算法	时间复杂度	空间复杂度	适用场景
朴素匹配	O(N × M)	O(1)	小数据，代码简单
KMP	O(N + M)	O(M)	单模式匹配（推荐）
Rabin-Karp（字符串哈希）	O(N + M) 平均	O(1)	多模式或哈希方便的场景
AC 自动机	O(N + ΣM)	O(ΣM)	多模式同时匹配

3.12.6 Trie 树（字典树）

什么是 Trie？

Trie（发音：try）是一种树形数据结构，用来存储和查询字符串集合。

核心思想：用从根节点到某节点的路径来表示一个字符串。每条边对应一个字符，相同前缀的字符串共享路径。

可视化

存入字符串集合 {"cat", "car", "bat", "bar", "can"}：

📄 存入字符串集合 {"cat", "car", "bat", "bar", "can"}：

        根节点
       /      \
      c         b
     / \       / \
    a   ?     a   ?
   /|\        |
  t r n       r  t（"bat"和"bar"共享"ba"前缀）
             / \
            ★   ★
         "bar" "bat"

※ ★ 表示在该节点结尾的字符串存在（"end" 标记）

同一前缀（如 "ca"）只存储一次，节省空间。

Trie 的数组实现

📄 查看代码：Trie 的数组实现

#include <bits/stdc++.h>
using namespace std;

// Trie 树（数组版，只支持小写字母）
const int MAXN = 500005;  // 最大节点数 = 总字符数

struct Trie {
    int ch[MAXN][26];  // ch[u][c] = 节点 u 的第 c 个子节点编号
    bool exist[MAXN];  // exist[u] = 以节点 u 结尾的字符串是否存在
    int cnt;           // 节点计数器（根节点为 0）
    
    Trie() : cnt(0) {
        memset(ch[0], 0, sizeof(ch[0]));  // 初始化根节点
        exist[0] = false;
    }
    
    // 插入字符串
    // 时间复杂度：O(|s|)
    void insert(const string& s) {
        int p = 0;  // 从根节点开始
        for (char c : s) {
            int ci = c - 'a';
            if (!ch[p][ci]) {
                ch[p][ci] = ++cnt;           // 新建节点
                memset(ch[cnt], 0, sizeof(ch[cnt]));
                exist[cnt] = false;
            }
            p = ch[p][ci];                    // 移动到下一个节点
        }
        exist[p] = true;  // 标记字符串末尾
    }
    
    // 查询字符串是否存在
    // 时间复杂度：O(|s|)
    bool find(const string& s) {
        int p = 0;  // 从根节点开始
        for (char c : s) {
            int ci = c - 'a';
            if (!ch[p][ci]) return false;  // 路径断了，不存在
            p = ch[p][ci];
        }
        return exist[p];  // 必须到达结尾标记的节点
    }
    
    // 查询是否有以 prefix 为前缀的字符串
    bool startsWith(const string& prefix) {
        int p = 0;
        for (char c : prefix) {
            int ci = c - 'a';
            if (!ch[p][ci]) return false;
            p = ch[p][ci];
        }
        return true;  // 路径存在即有此前缀（不需要 exist 标记）
    }
};

int main() {
    Trie trie;
    
    // 插入一些单词
    trie.insert("apple");
    trie.insert("app");
    trie.insert("apply");
    trie.insert("banana");
    
    // 查询
    cout << trie.find("apple")       << endl;  // 1（存在）
    cout << trie.find("app")         << endl;  // 1（存在）
    cout << trie.find("ap")          << endl;  // 0（路径存在，但没有标记）
    cout << trie.startsWith("ap")    << endl;  // 1（有以 "ap" 开头的字符串）
    cout << trie.startsWith("ban")   << endl;  // 1
    cout << trie.startsWith("cat")   << endl;  // 0
    
    return 0;
}

详细追踪：插入 "app" 和 "apple"

📄 查看代码：详细追踪：插入 "app" 和 "apple"

初始：只有根节点 0

插入 "app"：
  节点0 --'a'--> 节点1
  节点1 --'p'--> 节点2
  节点2 --'p'--> 节点3，exist[3] = true（"app" 结束）

插入 "apple"：
  节点0 --'a'--> 节点1（已存在，复用）
  节点1 --'p'--> 节点2（已存在，复用）
  节点2 --'p'--> 节点3（已存在，复用）
  节点3 --'l'--> 节点4（新建）
  节点4 --'e'--> 节点5，exist[5] = true（"apple" 结束）

现在 "app" 和 "apple" 共享前缀 "app"

3.12.7 Trie 树的经典应用

应用 1：单词频率统计（带计数的 Trie）

📄 查看代码：应用 1：单词频率统计（带计数的 Trie）

// 增加 count 数组记录每个节点被经过的次数
// count[u] = 以 u 为前缀的字符串的总插入次数

int count_prefix[MAXN];  // count[u] = 经过节点 u 的次数

void insert_with_count(const string& s) {
    int p = 0;
    for (char c : s) {
        int ci = c - 'a';
        if (!ch[p][ci]) {
            ch[p][ci] = ++cnt;
            count_prefix[cnt] = 0;
        }
        p = ch[p][ci];
        count_prefix[p]++;  // 每经过一个节点，计数 +1
    }
    exist[p] = true;
}

// 查询有多少个已插入字符串以 prefix 为前缀
int count_starts_with(const string& prefix) {
    int p = 0;
    for (char c : prefix) {
        int ci = c - 'a';
        if (!ch[p][ci]) return 0;
        p = ch[p][ci];
    }
    return count_prefix[p];
}

应用 2：01-Trie 求最大异或值

当数字以二进制形式插入 Trie 时，可以高效求「与某数异或值最大的数」。

原理：从最高位开始，贪心地走与当前位不同的那条路（使该位异或为 1）。

📄 C++ 完整代码

// 01-Trie：维护整数集合，支持查询"与 x 异或最大的数"
// 按二进制位（从第30位到第0位）建树
const int BIT = 30;
const int MAXN_01 = 3000005;

int ch01[MAXN_01][2];  // 只有 0/1 两个子节点
int cnt01 = 0;

// 将整数 x 插入 01-Trie
void insert01(int x) {
    int p = 0;
    for (int i = BIT; i >= 0; i--) {
        int bit = (x >> i) & 1;       // 取第 i 位
        if (!ch01[p][bit]) ch01[p][bit] = ++cnt01;
        p = ch01[p][bit];
    }
}

// 查询与 x 异或值最大的数，返回最大异或值
int max_xor(int x) {
    int p = 0, res = 0;
    for (int i = BIT; i >= 0; i--) {
        int bit = (x >> i) & 1;
        int want = 1 - bit;            // 想要走与 bit 不同的方向（使异或为1）
        if (ch01[p][want]) {
            p = ch01[p][want];
            res |= (1 << i);           // 这一位异或为 1
        } else {
            p = ch01[p][bit];          // 没有理想方向，只能走 bit 方向
        }
    }
    return res;
}

// 示例：
// 集合 {3, 7, 9, 12}，查询与 6（二进制 110）异或最大
// 6 XOR 9 = 15（1111），是最大值
int main() {
    int arr[] = {3, 7, 9, 12};
    for (int x : arr) insert01(x);
    cout << max_xor(6) << endl;  // 输出 15
    return 0;
}

追踪 max_xor(6) 的过程（6 = 0...0110）：

第30~4位：全 0，Trie 中无这些位，跳过
第3位（bit=0）：want=1，ch[p][1] 存在（9=1001 的第3位=1）→ 走1，res |= 8
第2位（bit=1）：want=0，ch[p][0] 存在（9=1001 的第2位=0）→ 走0，res |= 4
第1位（bit=1）：want=0，ch[p][0] 存在（9=1001 的第1位=0）→ 走0，res |= 2
第0位（bit=0）：want=1，ch[p][1] 存在（9=1001 的第0位=1）→ 走1，res |= 1

最终 res = 8+4+2+1 = 15
对应的数是 9（9 XOR 6 = 15 ✓）

⚠️ 常见错误

错误	原因	修复方案
KMP 匹配位置计算错误	`i - 2*m` 忘记两个 m（模式串 + 分隔符）	画图确认拼接串的下标
π[0] 不初始化为 0	忘记规定	`vector<int> pi(n, 0)`
Trie 节点数组太小	最大节点数 = 所有字符串的总字符数	`MAXN = 所有字符串长度之和 + 1`
忘记初始化新建节点的子指针	`ch[cnt]` 未清零	新建节点时 `memset(ch[cnt], 0, sizeof(ch[cnt]))`
01-Trie 位数设置错误	BIT 设小了导致数据丢失	根据题目数值范围设置，整数通常 `BIT = 30`

💪 练习题（共 8 道，全部含完整解答）

🟢 基础练习（1~3）

题目 1：实现 strStr
在字符串 haystack 中找到 needle 第一次出现的起始位置（若不存在返回 -1）。
要求使用 KMP 实现，时间复杂度 O(N+M)。

示例：

haystack = "hello", needle = "ll"  → 2
haystack = "aaaaa", needle = "bba" → -1

✅ 完整解答

思路： 拼接 needle + '#' + haystack，计算前缀函数，找 π[i] == len(needle) 的位置。

#include <bits/stdc++.h>
using namespace std;

vector<int> prefix_function(const string& s) {
    int n = s.size();
    vector<int> pi(n, 0);
    for (int i = 1; i < n; i++) {
        int j = pi[i - 1];
        while (j > 0 && s[i] != s[j]) j = pi[j - 1];
        if (s[i] == s[j]) j++;
        pi[i] = j;
    }
    return pi;
}

int strStr(string haystack, string needle) {
    if (needle.empty()) return 0;
    int m = needle.size(), n = haystack.size();
    auto pi = prefix_function(needle + '#' + haystack);
    for (int i = m + 1; i <= m + n; i++)
        if (pi[i] == m) return i - 2 * m;
    return -1;
}

int main() {
    cout << strStr("hello",  "ll")  << "\n";  // 2
    cout << strStr("aaaaa",  "bba") << "\n";  // -1
    cout << strStr("sadbutsad", "sad") << "\n";  // 0（第一次出现）
    return 0;
}

追踪（haystack="hello", needle="ll"）：

拼接串：  l l # h e l l o
π 值：    0 1 0 0 0 1 2 0
                        ↑ i=6，π[6]=2=len("ll") ✓
位置 = 6 - 2×2 = 2

题目 2：单词检索
给定 N 个单词，Q 次查询，每次询问某单词是否在词典中。要求插入和查询都用 Trie，总复杂度 O(总字符数)。

✅ 完整解答

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 1000005;
int ch[MAXN][26];
bool exist[MAXN];
int tot = 0;

void insert(const string& s) {
    int p = 0;
    for (char c : s) {
        int ci = c - 'a';
        if (!ch[p][ci]) {
            ch[p][ci] = ++tot;
            fill(ch[tot], ch[tot] + 26, 0);
            exist[tot] = false;
        }
        p = ch[p][ci];
    }
    exist[p] = true;
}

bool query(const string& s) {
    int p = 0;
    for (char c : s) {
        int ci = c - 'a';
        if (!ch[p][ci]) return false;
        p = ch[p][ci];
    }
    return exist[p];
}

int main() {
    ios_base::sync_with_stdio(false); cin.tie(NULL);
    int n, q;
    cin >> n;
    for (int i = 0; i < n; i++) {
        string w; cin >> w;
        insert(w);
    }
    cin >> q;
    while (q--) {
        string w; cin >> w;
        cout << (query(w) ? "YES" : "NO") << "\n";
    }
    return 0;
}

题目 3：最长公共前缀
给定 N 个字符串，求它们的最长公共前缀（LCP）。

示例：

输入：["flower","flow","flight"]
输出："fl"

✅ 完整解答

方法 1（排序法）： 排序后只比较第一个和最后一个字符串的前缀，因为它们字典序差距最大，LCP 最短。

#include <bits/stdc++.h>
using namespace std;

string longestCommonPrefix(vector<string>& strs) {
    if (strs.empty()) return "";
    sort(strs.begin(), strs.end());
    string& first = strs.front();
    string& last  = strs.back();
    int i = 0;
    while (i < (int)first.size() && i < (int)last.size() && first[i] == last[i])
        i++;
    return first.substr(0, i);
}

int main() {
    vector<string> v = {"flower", "flow", "flight"};
    cout << longestCommonPrefix(v) << "\n";  // "fl"
    v = {"dog", "racecar", "car"};
    cout << longestCommonPrefix(v) << "\n";  // ""
    return 0;
}

方法 2（Trie法）： 将所有字符串插入 Trie，从根节点出发沿唯一路径走，直到遇到分叉或终止标记。

string lcp_trie(vector<string>& strs) {
    // 插入所有字符串后，从根出发找只有一个非空子节点且不是末尾的最长路径
    for (auto& s : strs) insert(s);
    
    string result = "";
    int p = 0;
    while (true) {
        if (exist[p]) break;  // 某字符串在此结束，不能继续
        int next = -1, cnt = 0;
        for (int i = 0; i < 26; i++) {
            if (ch[p][i]) { next = i; cnt++; }
        }
        if (cnt != 1) break;  // 有分叉，停止
        result += (char)('a' + next);
        p = ch[p][next];
    }
    return result;
}

🟡 进阶练习（4~6）

题目 4：字符串最小周期
给定字符串 s，求其最短重复周期的长度 T。
即找最小的 T，使得 s 可以由 s[0..T-1] 重复若干次（或截断）构成。

示例：

"abcabc"  → T = 3（"abc" 重复 2 次）
"abababab" → T = 2（"ab" 重复 4 次）
"abcd"    → T = 4（整串本身）

✅ 完整解答

核心公式： T = n - π[n-1]，当 n % T == 0 时字符串可被整除，否则最小周期为 n。

#include <bits/stdc++.h>
using namespace std;

vector<int> prefix_function(const string& s) {
    int n = s.size();
    vector<int> pi(n, 0);
    for (int i = 1; i < n; i++) {
        int j = pi[i - 1];
        while (j > 0 && s[i] != s[j]) j = pi[j - 1];
        if (s[i] == s[j]) j++;
        pi[i] = j;
    }
    return pi;
}

int min_period(const string& s) {
    int n = s.size();
    auto pi = prefix_function(s);
    int T = n - pi[n - 1];
    return (n % T == 0) ? T : n;
}

int main() {
    cout << min_period("abcabc")   << "\n";  // 3
    cout << min_period("abababab") << "\n";  // 2
    cout << min_period("abcd")     << "\n";  // 4
    cout << min_period("aaaaaa")   << "\n";  // 1
    return 0;
}

原理解释： π[n-1] = k 意味着字符串前 k 个字符 = 后 k 个字符。
若 s 有周期 T，则 π[n-1] ≥ n-T，等价于 T ≤ n - π[n-1]。
n - π[n-1] 正是最小的满足条件的 T。

题目 5：数组中任意两数的最大异或值
给定整数数组 nums，求任意两个元素异或的最大值。

示例：

nums = [3, 10, 5, 25, 2, 8]
最大异或：5 XOR 25 = 28
输出：28

✅ 完整解答

思路： 将所有数插入 01-Trie，然后对每个数 x 查询「与 x 异或最大的数」并取全局最大值。

#include <bits/stdc++.h>
using namespace std;

const int BIT = 30;
const int MAXN = 3200005;
int ch[MAXN][2];
int tot = 0;

void insert(int x) {
    int p = 0;
    for (int i = BIT; i >= 0; i--) {
        int b = (x >> i) & 1;
        if (!ch[p][b]) ch[p][b] = ++tot;
        p = ch[p][b];
    }
}

int max_xor_with(int x) {
    int p = 0, res = 0;
    for (int i = BIT; i >= 0; i--) {
        int b = (x >> i) & 1;
        int want = 1 - b;  // 想走异位，使该位异或为 1
        if (ch[p][want]) {
            res |= (1 << i);
            p = ch[p][want];
        } else {
            p = ch[p][b];
        }
    }
    return res;
}

int main() {
    ios_base::sync_with_stdio(false); cin.tie(NULL);
    
    int n; cin >> n;
    vector<int> nums(n);
    for (int& x : nums) { cin >> x; insert(x); }
    
    int ans = 0;
    for (int x : nums)
        ans = max(ans, max_xor_with(x));
    cout << ans << "\n";
    return 0;
}

追踪（nums = [3, 10, 5, 25, 2, 8]，查询 5 = 0b000101）：

插入所有数后对 5 查询：
位30~5：全0，Trie也全0，只能往0走
位4(x=0)：want=1，Trie中有1（25=11001的第4位=1）→ 走1，res+=16
位3(x=0)：want=1，Trie中有1（25的第3位=1）→ 走1，res+=8
位2(x=1)：want=0，Trie中有0（25的第2位=0）→ 走0，res+=4
位1(x=0)：want=1，无1路 → 走0
位0(x=1)：want=0，有0路（25的第0位=1，走不了） → 走1

res = 16+8+4 = 28 = 5 XOR 25 ✓

题目 6：电话号码查找（前缀冲突检测）
给定 N 个电话号码，判断是否存在某号码是另一个号码的前缀。若存在输出 NO，否则输出 YES。

示例：

号码：["911","9116","91125"]
"911" 是 "9116" 的前缀 → 输出 NO

✅ 完整解答

思路： 将所有号码插入 Trie，若插入过程中遇到已有终止标记的节点（当前号码是已插入号码的延伸），或插入完后该节点还有子节点（已插入号码是当前号码的前缀），则存在冲突。

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 1000005;
int ch[MAXN][10];  // 数字 0~9
bool is_end[MAXN];
int tot = 0;

// 插入号码，返回是否发现冲突
bool insert_check(const string& s) {
    int p = 0;
    for (char c : s) {
        int d = c - '0';
        if (is_end[p]) return true;   // 已有号码是当前号码的前缀
        if (!ch[p][d]) {
            ch[p][d] = ++tot;
            fill(ch[tot], ch[tot] + 10, 0);
            is_end[tot] = false;
        }
        p = ch[p][d];
    }
    // 插入完成，检查该节点是否已有子节点（当前号码是已有号码的前缀）
    for (int i = 0; i < 10; i++)
        if (ch[p][i]) return true;
    is_end[p] = true;
    return false;
}

int main() {
    int n; cin >> n;
    bool conflict = false;
    for (int i = 0; i < n; i++) {
        string s; cin >> s;
        if (insert_check(s)) conflict = true;
    }
    cout << (conflict ? "NO" : "YES") << "\n";
    return 0;
}

🔴 挑战练习（7~8）

题目 7：统计所有出现次数 ≥ 2 的子串数量
给定字符串 s（长度 ≤ 1000），统计在 s 中出现次数 ≥ 2 的不同子串的数量。

提示： 枚举每个可能的子串（起点 i，长度 len），用字符串哈希去重并统计出现次数。

✅ 完整解答

思路： 枚举所有子串，用 set<pair<哈希值, 长度>> 标记已统计过的，对每个子串用 KMP 统计出现次数。

#include <bits/stdc++.h>
using namespace std;

vector<int> prefix_function(const string& s) {
    int n = s.size();
    vector<int> pi(n, 0);
    for (int i = 1; i < n; i++) {
        int j = pi[i - 1];
        while (j > 0 && s[i] != s[j]) j = pi[j - 1];
        if (s[i] == s[j]) j++;
        pi[i] = j;
    }
    return pi;
}

int count_occurrences(const string& text, const string& pattern) {
    if (pattern.empty()) return 0;
    int m = pattern.size();
    auto pi = prefix_function(pattern + '#' + text);
    int cnt = 0;
    for (int i = m + 1; i < (int)pi.size(); i++)
        if (pi[i] == m) cnt++;
    return cnt;
}

int main() {
    string s; cin >> s;
    int n = s.size();
    
    set<string> counted;  // 已统计的子串
    int ans = 0;
    
    for (int i = 0; i < n; i++) {
        for (int len = 1; len <= n - i; len++) {
            string sub = s.substr(i, len);
            if (counted.count(sub)) continue;
            counted.insert(sub);
            if (count_occurrences(s, sub) >= 2) ans++;
        }
    }
    
    cout << ans << "\n";
    return 0;
}

注：上面方法对小数据（n ≤ 1000）可行，大数据需用后缀数组或后缀自动机优化至 O(N log N)。

题目 8：最大数组异或路径（树上路径 + 01-Trie）
给定一棵有 N 个节点的树，每条边有权值。求树上任意一条路径，使路径上所有边权的异或和最大。

提示： 利用「两点路径的异或 = 两点到根的异或路径的异或」，先 DFS 求所有节点到根的异或距离，再用 01-Trie 求最大异或对。

✅ 完整解答

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100005;
const int BIT = 30;
const int TRIE_MAXN = MAXN * 31 * 2;

// 01-Trie
int ch_trie[TRIE_MAXN][2];
int tot_trie = 0;

void insert_trie(int x) {
    int p = 0;
    for (int i = BIT; i >= 0; i--) {
        int b = (x >> i) & 1;
        if (!ch_trie[p][b]) ch_trie[p][b] = ++tot_trie;
        p = ch_trie[p][b];
    }
}

int max_xor_trie(int x) {
    int p = 0, res = 0;
    for (int i = BIT; i >= 0; i--) {
        int b = (x >> i) & 1;
        int want = 1 - b;
        if (ch_trie[p][want]) { res |= (1 << i); p = ch_trie[p][want]; }
        else p = ch_trie[p][b];
    }
    return res;
}

// 树 DFS
vector<pair<int,int>> adj[MAXN];
int dist[MAXN];  // dist[i] = 根到 i 的异或距离

void dfs(int u, int parent) {
    insert_trie(dist[u]);
    for (auto [v, w] : adj[u]) {
        if (v == parent) continue;
        dist[v] = dist[u] ^ w;
        dfs(v, u);
    }
}

int main() {
    int n; cin >> n;
    for (int i = 0; i < n - 1; i++) {
        int u, v, w; cin >> u >> v >> w;
        adj[u].push_back({v, w});
        adj[v].push_back({u, w});
    }
    
    dist[1] = 0;
    dfs(1, -1);  // 从节点 1 出发 DFS
    
    // 对每个节点，在 Trie 中找与其异或最大的节点
    int ans = 0;
    for (int i = 1; i <= n; i++)
        ans = max(ans, max_xor_trie(dist[i]));
    
    cout << ans << "\n";
    return 0;
}

关键原理：
设 d[u] = 根到 u 的边权异或和。
路径 u → v 的异或和 = d[u] XOR d[v]（因为公共前缀部分异或两次等于 0）。
所以只需在所有 d[i] 中找最大异或对，01-Trie 做到 O(N log W)。

💡 章节联系： 字符串算法是 USACO Silver/Gold 的常见考点。KMP 解决单模式匹配，Trie 解决多字符串的集合查询和异或问题。学完本章后，可以进一步探索字符串哈希（第 3.7 章）和 AC 自动机（多模式同时匹配，竞赛进阶）。

C++ 竞赛编程：USACO 学习指南