Glossary Term
Trie
History, etymology, and pronunciation
- Trie was first abstractly described by Axel Thue in 1912
- René de la Briandais described tries in a computer context in 1959
- Edward Fredkin independently described tries in 1960 and coined the term 'trie'
- Some authors pronounce it as /ˈtriː/ (tree), while others pronounce it as /ˈtraɪ/ (try)
- The term 'trie' comes from the middle syllable of 'retrieval'
Overview and Operations
- Tries are a form of string-indexed look-up data structure
- They are used to store a dictionary list of words
- Tries allow for efficient generation of completion lists
- They are particularly useful for string-searching algorithms like predictive text and spell checking
- Tries can be seen as tree-shaped deterministic finite automata
- Tries support insertion, deletion, and lookup of string keys
- Nodes in a trie contain links to other suffix child nodes or null
- Each node has a parent except for the root node
- The number of links in each node is equal to the number of characters in the alphabet
- Nodes in the trie do not store their associated key, but their position defines the key
Searching and Insertion
- Searching in a trie is guided by the characters in the search string key
- Each node in the trie contains a link to each possible character in the string
- Following the string within the trie leads to the associated value for the key
- A null link during the search indicates the key does not exist in the trie
- The search operation in a standard trie takes O(dm) time, where m is the size of the string key and d is the alphabet size
- Insertion into a trie involves using the characters in the string key as indexes to the children array
- Each node in the trie corresponds to one call of the radix sorting routine
- If a null link is encountered during insertion, a new node is created
- The value of the terminal node is assigned to the input value
- If the terminal node already had a value, it is replaced with the new value
Replacing other data structures and Implementation strategies
- A trie can replace a hash table
- Searching for a node in a trie has a complexity of O(m), while an imperfect hash function may have a worst-case lookup speed of O(N)
- Tries do not require a hash function and do not have collisions
- Tries can sort string keys using alphabetical ordering
- Tries are less efficient than hash tables when data is accessed on a secondary storage device
- Tries can be represented using a vector of pointers or a singly linked list for each node
- Techniques like alphabet reduction can reduce space complexity by using a smaller alphabet
- Storing a vector of ASCII pointers as a bitmap reduces the size of nodes
- Different implementation strategies have trade-offs between memory use and speed
- Memory space can be reduced at the expense of running time
Bitwise tries and Compressed tries
- Bitwise tries address the space requirement issue in naive pointer vector implementations
- Each character in the string key set is represented using individual bits
- Vectorized CPU instructions are used to find the first set bit in a fixed-length key input
- Bitwise tries are cache-local and highly parallelizable
- They perform well on out-of-order execution CPUs
- Radix tree, or compressed trie, is a space-optimized variant of a trie
- Nodes with only one child get merged with their parents, reducing space and time metrics
- Compressed tries work best when the trie remains static and keys are sparse
- Packing the trie can further optimize space usage
- Patricia trees are a binary encoding of string keys in a compressed binary trie
Note: The content did not provide enough information to create a comprehensive fifth group.