数据结构与算法: 树形结构- Trie树

Trie树,又称为字典树、单词查找树或键树,常用于统计和排序大量的字符串(但不仅限于字符串)。优点是最大限度地减少无谓的字符串的比较,可以用于字符串的最长匹配查找。

Trie树的核心思想是空间换时间,利用字符串的公共前缀来降低查询时间的开销以提高效率。

3个基本性质:

  • 根节点不包含字符,除根节点外每个节点都只包含一个字符。
  • 从根节点到某一节点,路径上经过的字符连接起来,为该节点对应的字符串。
  • 每个节点的所有子节点包含的字符都不相同。

标准Trie树实现

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
public class TrieTree<T> {
private TrieNode<T> root = new TrieNode<>();
static class TrieNode<T>{
// 分隔字符
private char c;
// 值
private T value;
// 子节点
private Map<Character, TrieNode<T>> children = new HashMap<>();
public TrieNode() {
}
public TrieNode(char c) {
this.c = c;
}
public TrieNode(char c, T value) {
this.c = c;
this.value = value;
}
public TrieNode<T> child(Character c){
return children.get(c);
}
public void addChild(TrieNode<T> child){
this.children.put(child.c, child);
}
}
public void addWord(String source, T value){
TrieNode tstNode = root;
for (int i = 0; i < source.length(); i++){
char c = source.charAt(i);
TrieNode tmpNode = tstNode.child(c);
if (tmpNode == null){
tmpNode = new TrieNode(c);
}
if (i == source.length() - 1){
tmpNode.value = value;
}
tstNode.addChild(tmpNode);
tstNode = tmpNode;
}
}
public T search(String source){
TrieNode tstNode = root;
for (int i = 0; i < source.length(); i++){
tstNode = tstNode.child(source.charAt(i));
if (tstNode.value != null){
return (T) tstNode.value;
}
}
return null;
}
public static void main(String[] args) {
TrieTree<String> trieTree = new TrieTree<>();
trieTree.addWord("0995", "新疆托克逊");
trieTree.addWord("0856", "贵州铜仁");
trieTree.addWord("0996", "新疆回族");
System.out.println(trieTree.search("0996"));
}
}

三叉Trie树

三叉Trie树有三个指针:一个指向左边的树,一个指向右边的树,还有一个向下指向单词的下一个数据单元。

单词的读入顺序对于创建平衡的三叉Trie树很重要,一般将输入排好序,选择排序后数据单元集合的中间值作为开始节点。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
public class TernaryTrieTree<T> {
private TrieNode<T> root;
public static class TrieNode<T> {
private TrieNode<T> leftNode;
private TrieNode<T> midNode;
private TrieNode<T> rightNode;
private char c;
private T value;
public TrieNode(char c, T value) {
this.c = c;
this.value = value;
}
}
public void addWord(String source, T value) {
if (root == null){
root = new TrieNode<>(source.charAt(0), value);
}
TrieNode tstNode = root;
int charIndex = 0; // 从词的开头匹配
while (true) {
int charCmp = source.charAt(charIndex) - tstNode.c;
if (charCmp == 0) {
charIndex++;
if (charIndex == source.length()) {
return; // 已有该词
}
if (tstNode.midNode == null) {
tstNode.midNode = new TrieNode(source.charAt(charIndex), value);
}
tstNode = tstNode.midNode;
} else if (charCmp < 0) {
if (tstNode.leftNode == null) {
tstNode.leftNode = new TrieNode(source.charAt(charIndex), value);
}
tstNode = tstNode.leftNode;
} else {
if (tstNode.rightNode == null) {
tstNode.rightNode = new TrieNode(source.charAt(charIndex), value);
}
tstNode = tstNode.rightNode;
}
}
}
public T search(String source) {
if (source == null || source.length() == 0) {
return null;
}
TrieNode tstNode = root;
int charIndex = 0;
char c = source.charAt(charIndex);
int charCmp;
while (true) {
if (tstNode == null) {
return null;
}
charCmp = c - tstNode.c;
if (charCmp == 0) {
charIndex++;
if (charIndex == source.length()) {
return (T) tstNode.value;
} else {
c = source.charAt(charIndex);
}
tstNode = tstNode.midNode;
} else if (charCmp < 0) {
tstNode = tstNode.leftNode;
} else {
tstNode = tstNode.rightNode;
}
}
}
public static void main(String[] args) {
TernaryTrieTree<String> tree = new TernaryTrieTree<>();
String[] strs = {"0995", "0856", "0996"};
// Collections.sort(Arrays.asList(strs));
for (String str: strs){
tree.addWord(str, str);
}
System.out.println(tree.search("0856"));
}
}