Suffix array


Suffix Arrays and their Construction
– A suffix array is a sorted array of all suffixes of a string.
– Suffix arrays were introduced as an alternative to suffix trees.
– The first in-place, linear-time suffix array construction algorithm was published in 2016 by Li, Li & Huo.
– Enhanced suffix arrays reproduce the functionality of suffix trees.
– Suffix arrays require less space than suffix trees.
– With 32-bit indices, a suffix array needs 4 bytes per text position (4n bytes for a text of length n), while a careful suffix tree implementation needs about 20 bytes per text position.
– Compressed suffix arrays and BWT-based indices require less space than suffix arrays.
– The SA-IS algorithm is a fast and space-efficient suffix array construction algorithm.
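– Suffix array construction algorithms can be classified by the alphabet they support: constant alphabets, integer alphabets, or general alphabets where only character comparisons are allowed.
– Prefix doubling algorithms sort suffixes by prefixes whose length doubles in each round, until every prefix is unique and gives the rank of its suffix.
– Recursive algorithms recursively sort a subset of suffixes, use it to infer the order of the remaining suffixes, and merge the two suffix arrays.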
– Induced copying algorithms sort the remaining suffixes using an already sorted subset.
– The DC3/skew algorithm is a recursive algorithm for integer alphabets.
– Dynamic suffix arrays update the suffix array of an edited text instead of rebuilding it.
– Yuta Mori’s DivSufSort was the fastest known suffix sorting algorithm in main memory as of 2017.
– In 2021, Ilya Grebnov presented a faster implementation of the algorithm.
– On average, that implementation showed a 65% performance improvement over DivSufSort on the Silesia Corpus.
– When a reasonable number of letters is inserted into the original text, dynamic suffix arrays are generally more efficient than rebuilding the suffix array from scratch.
– Open source routines such as qsufsort and DivSufSort are commonly used for suffix array construction.
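
The prefix doubling idea from the list above can be sketched in a few lines of Python. This is a minimal illustration, not one of the optimized algorithms named above (SA-IS, DC3/skew, DivSufSort); the function name is made up for this example. It uses Python's comparison sort in every round, so it runs in O(n log² n) time rather than linear time.

```python
def suffix_array_prefix_doubling(text: str) -> list[int]:
    """Sort suffixes by their first k characters, doubling k each round."""
    n = len(text)
    if n == 0:
        return []
    # rank[i] orders suffix i by its first k characters (initially k = 1).
    rank = [ord(c) for c in text]
    sa = list(range(n))
    k = 1
    while True:
        # Key for suffix i: rank of its first k characters, then rank of the
        # next k characters (-1 if the suffix ends before position i + k).
        def key(i: int) -> tuple[int, int]:
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        # Re-rank: suffixes with equal keys share a rank, so the new ranks
        # order the suffixes by their first 2k characters.
        new_rank = [0] * n
        for prev, cur in zip(sa, sa[1:]):
            new_rank[cur] = new_rank[prev] + (key(cur) != key(prev))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: the order is final
            return sa
        k *= 2


print(suffix_array_prefix_doubling("banana"))  # [5, 3, 1, 0, 4, 2]
```

For "banana" the sorted suffixes are a, ana, anana, banana, na, nana, which start at positions 5, 3, 1, 0, 4 and 2.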

Generalized Suffix Arrays and their Applications
– A generalized suffix array contains all suffixes for a set of strings.
– Its entries are the suffixes of every string in the set, sorted lexicographically together.
– Suffix sorting algorithms can be used to compute the Burrows-Wheeler transform (BWT).
– The BWT can be computed in linear time using a suffix array.
– The BWT is useful for data compression and string searching.
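
As a rough sketch of the two ideas above, the following Python builds a generalized suffix array as a list of (string index, offset) pairs and derives the BWT from a suffix array. The function names, the "$" sentinel, and the tie-breaking rule for identical suffixes are assumptions made for this example. The suffix arrays are built by naive sorting to keep the code short; only the final BWT step (one pass over the suffix array) is linear.

```python
def generalized_suffix_array(strings: list[str]) -> list[tuple[int, int]]:
    """Return (string index, offset) pairs for all suffixes, sorted lexicographically.

    Identical suffixes from different strings are ordered by string index
    (an assumption of this sketch; implementations often use unique terminators).
    """
    entries = [(d, i) for d, s in enumerate(strings) for i in range(len(s))]
    return sorted(entries, key=lambda e: (strings[e[0]][e[1]:], e[0], e[1]))


def bwt_from_suffix_array(text: str, sentinel: str = "$") -> str:
    """Compute the BWT of text + sentinel by reading the character before each suffix."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Naive suffix array, for illustration only.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # For suffix position 0, s[-1] wraps around to the sentinel.
    return "".join(s[i - 1] for i in sa)


print(generalized_suffix_array(["ab", "b"]))  # [(0, 0), (0, 1), (1, 0)]
print(bwt_from_suffix_array("banana"))        # annb$aa
```

The sentinel is assumed to sort before every character of the text, which holds for "$" and lowercase ASCII letters.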

Enhanced Suffix Arrays and their Properties
– An enhanced suffix array consists of a suffix array together with additional tables, such as the lcp table and a child table; with these it supports suffix tree algorithms at the same time complexity while using less space.
– The child table contains information about the parent-child relationship in the suffix tree.
– Any algorithm that uses a suffix tree can be run on an enhanced suffix array instead by working on the lcp-interval tree.
– Searching for a pattern of length m in an enhanced suffix array takes O(m|Σ|) time, where Σ is the alphabet.
– An lcp-interval is associated with an internal node of the suffix tree and represents a range of suffixes that share a longest common prefix.
– Each lcp-interval has an lcp-value ℓ, the length of that shared prefix, and lcp-intervals nest: an interval embedded in another one is its descendant.
– The lcp-array stores the lengths of the longest common prefixes between consecutive suffixes.
– The nesting of lcp-intervals (the lcp-interval tree) reflects the parent-child relationship in the suffix tree.
– Traversing the lcp-intervals allows the child table to be computed in linear time.
– The child table (cldtab) has three fields, up, down, and nextlIndex, which store information about the edges of the suffix tree.
– The up and down fields are used to find the first child interval of an lcp-interval, and nextlIndex links a child interval to its next sibling.
– The child table can be constructed by traversing the lcp-interval tree in a bottom-up manner.
– Separate algorithms can compute the up/down values and the nextlIndex values.
– Constructing the child table is essential for efficient operations on enhanced suffix arrays.
– The suffix links in an enhanced suffix array can be computed using suffix link intervals.
– During preprocessing, a suffix link interval [l..r] is determined for each lcp-interval [i..j].
– The left and right boundaries l and r of the suffix link interval are stored at the first ℓ-index of [i..j].
– The suffix link table is constructed through a breadth-first traversal of the lcp-interval tree, which collects the ℓ-intervals for each lcp-value ℓ in an ℓ-list.
– The suffix link interval [l..r] of an ℓ-interval [i..j] is then looked up in the (ℓ−1)-list.
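
To make the lcp-array and the lcp-intervals above concrete, here is a minimal Python sketch. It computes the lcp-array with the linear-time method of Kasai et al. and then reports every lcp-interval as a triple (lcp-value, left boundary, right boundary) in bottom-up order using a stack. The function names are made up for this example, and the code only enumerates the intervals; it does not build the child table or the suffix link table.

```python
def lcp_array(text: str, sa: list[int]) -> list[int]:
    """lcp[i] = length of the longest common prefix of suffixes sa[i-1] and sa[i]; lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for pos, suffix in enumerate(sa):
        rank[suffix] = pos
    lcp = [0] * n
    h = 0  # number of characters already known to match
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix preceding suffix i in sorted order
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # suffix i + 1 shares at least h - 1 characters
        else:
            h = 0
    return lcp


def lcp_intervals(lcp: list[int]) -> list[tuple[int, int, int]]:
    """Report all lcp-intervals as (lcp-value, left, right), children before parents."""
    n = len(lcp)
    if n < 2:
        return []
    intervals = []
    stack = [(0, 0)]  # (lcp-value, left boundary); the root interval has lcp-value 0
    for i in range(1, n):
        lb = i - 1
        # Close every open interval whose lcp-value exceeds lcp[i].
        while lcp[i] < stack[-1][0]:
            value, lb = stack.pop()
            intervals.append((value, lb, i - 1))
        # Open a new interval; it inherits the left boundary of the last closed one.
        if lcp[i] > stack[-1][0]:
            stack.append((lcp[i], lb))
    # Close the intervals that extend to the end of the suffix array.
    while stack:
        value, lb = stack.pop()
        intervals.append((value, lb, n - 1))
    return intervals


sa = [5, 3, 1, 0, 4, 2]              # suffix array of "banana"
lcp = lcp_array("banana", sa)
print(lcp)                           # [0, 1, 3, 0, 0, 2]
print(lcp_intervals(lcp))            # [(3, 1, 2), (1, 0, 2), (2, 4, 5), (0, 0, 5)]
```

In the output, (3, 1, 2) is the interval of the suffixes ana and anana, which share the prefix "ana"; it is nested inside (1, 0, 2), whose suffixes share only "a". Without a sentinel character appended to the text, some suffix tree nodes have no interval of their own.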

Suffix array (Wikipedia)

In computer science, a suffix array is a sorted array of all suffixes of a string. It is a data structure used in, among others, full-text indices, data-compression algorithms, and the field of bibliometrics.
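
As an illustration of the full-text index use case, the sketch below finds all occurrences of a pattern by binary search over a suffix array. The function name is made up for this example and the suffix array is built by naive sorting; each search makes O(log n) comparisons of at most m characters for a pattern of length m.

```python
def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions of pattern in text, using suffix array sa."""
    m = len(pattern)

    def prefix(k: int) -> str:
        # First m characters of the k-th smallest suffix.
        return text[sa[k]:sa[k] + m]

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


text = "banana"
sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive construction
print(sa)                                  # [5, 3, 1, 0, 4, 2]
print(find_occurrences(text, sa, "ana"))   # [1, 3]
```

All suffixes that start with the pattern are adjacent in the suffix array, which is why two binary searches are enough to delimit them.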

Suffix array
Type: Array
Invented by: Manber & Myers (1990)
Time complexity in big O notation (average and worst case): space O(n), construction O(n)

Suffix arrays were introduced by Manber & Myers (1990) as a simple, space efficient alternative to suffix trees. They had independently been discovered by Gaston Gonnet in 1987 under the name PAT array (Gonnet, Baeza-Yates & Snider 1992).

Li, Li & Huo (2016) gave the first in-place O(n)-time suffix array construction algorithm that is optimal in both time and space, where in-place means that the algorithm needs only O(1) additional space beyond the input string and the output suffix array.

Enhanced suffix arrays (ESAs) are suffix arrays with additional tables that reproduce the full functionality of suffix trees while preserving the same time and memory complexity. The suffix array for a subset of all suffixes of a string is called a sparse suffix array. Multiple probabilistic algorithms have been developed to minimize the additional memory usage, including one that is optimal in both time and memory.
