Skip to main content
Glossary Term

Suffix array

Suffix Arrays and their Construction - A suffix array is a sorted array of all suffixes of a string. - Suffix arrays were introduced as an alternative to suffix trees. - The first in-place suffix array construction algorithm was developed in 2016. - Enhanced suffix arrays reproduce the functionality of suffix trees. - Suffix arrays require less space than suffix trees. - A suffix array requires 4 bytes per integer, while a suffix tree requires 20 bytes per node. - Compressed suffix arrays and BWT-based indices require less space than suffix arrays. - The SA-IS algorithm is a fast and space-efficient suffix array construction algorithm. - Suffix array construction algorithms differ based on the supported alphabet. - Prefix doubling algorithms find prefixes that honor the lexicographic ordering of suffixes. - Recursive algorithms recursively sort a subset of suffixes and merge the suffix arrays. - Induced copying algorithms sort the remaining suffixes using an already sorted subset. - The DC3/skew algorithm is a recursive algorithm for integer alphabets. - Dynamic suffix arrays update the suffix array of an edited text instead of rebuilding it. - Yuta Mori's DivSufSort is the fastest known suffix sorting algorithm in main memory. - Ilya Grebnov presented a faster implementation of the algorithm in 2021. - The algorithm showed a 65% performance improvement over DivSufSort on Silesia Corpus. - Dynamic suffix arrays are more efficient than rebuilding for inserting letters in the original text. - Open source routines such as qsufsort and DivSufSort are commonly used for suffix array construction. Generalized Suffix Arrays and their Applications - A generalized suffix array contains all suffixes for a set of strings. - It is lexicographically sorted with all suffixes of each string. - Suffix sorting algorithms can be used to compute the Burrows-Wheeler transform (BWT). - The BWT can be computed in linear time using a suffix array. - The BWT is useful for data compression and string searching. Enhanced Suffix Arrays and their Properties - Enhanced suffix arrays consist of suffix arrays and a child table, improving space efficiency and time complexity. - The child table contains information about the parent-child relationship in the suffix tree. - Enhanced suffix arrays can be applied to any algorithm that uses a suffix tree by using lcp-interval trees. - Searching a pattern in an enhanced suffix array has a time complexity of O(m|Σ|). - An lcp-interval is associated with a node in the suffix tree and represents a range of suffixes with a common longest prefix. - The lcp-interval has specific properties, such as the lcp-value and the relationship with other intervals. - The lcp-array stores the lengths of the longest common prefixes between consecutive suffixes. - The lcp-interval reflects the parent-child relationship in the suffix tree. - The lcp-interval can be used to compute the child table in linear time. - The child table, composed of arrays like cldtab, down, and nextlIndex, stores information about the edges of the suffix tree. - The down and nextlIndex arrays maintain the parent-child relationship. - The child table can be constructed by traversing the lcp-interval tree in a bottom-up manner. - Separate algorithms can compute the up/down values and the nextlIndex values. - Constructing the child table is essential for efficient operations on enhanced suffix arrays. - The suffix links in an enhanced suffix array can be computed using suffix link intervals. - Suffix link intervals are generated for each lcp-interval during preprocessing. - The left and right elements of the interval are maintained in the first index of . - The suffix link table is constructed through a breadth-first traversal of the lcp-interval tree. - The suffix link interval is represented by the interval in the l-list. Recent Developments in Suffix Arrays - Yuta Mori's DivSufSort is the fastest known suffix sorting algorithm in main memory. - Ilya Grebnov presented a faster implementation of the algorithm in 2021. - The algorithm showed a 65% performance improvement over DivSufSort on Silesia Corpus. - Dynamic suffix arrays are more efficient than rebuilding for inserting letters in the original text. - Open source routines such as qsufsort and DivSufSort are commonly used for suffix array construction. References - Abouelhoda, Kurtz & Ohlebusch 2004 - I, Kärkkäinen & Kempa 2014 - Gawrychowski & Kociumaka 2017 - Abouelhoda, Kurtz & Ohlebusch 2002 - Kurtz 1999 - Puglisi, Smyth & Turpin 2007 - Fischer 2011 - Mori, Yuta. sais. Archived from the original on 9 Mar 2023. Retrieved 31 Aug 2023 - Burkhardt & Kärkkäinen 2003 - Kulla & Sanders 2007 - L. Ard, "Dynamic Extended Suffix Arrays," Journal of Discrete Algorithms, vol. 8, no. 2, pp. 241, 2010. - S. Burkhardt and J. Kärkkäinen, "Fast Lightweight Suffix Array Construction and Checking," in Combinatorial Pattern Matching, Lecture Notes in Computer Science, vol. 2676, pp. 55-69, 2003. - R. M. Karp, R. E. Miller, and A. L. Rosenberg, "Rapid Identification of Repeated Patterns," in Proceedings of the fourth annual ACM symposium on Theory of computing - STOC 72, pp. 125-136, 1972. - M. Farach, "Optimal Suffix Tree Construction