Suffix tree

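The suffix tree data structure was one of the first linear-time solutions for the longest common substring problem. The concept was introduced by P. Weiner in 1973; E.M. McCreight described a more space-economical construction in 1976, and E. Ukkonen gave an on-line construction in 1995. A suffix tree for an n-character string S is a Patricia trie (a trie in which every chain of single-child nodes is compressed into one edge) containing all n suffixes of S.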

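Once the tree is built, a large text can be searched and common substrings extracted very quickly: testing whether a pattern of length m occurs in the text takes time proportional to m, regardless of the length of the text. Variants of the LZ77 compression scheme, such as LZSS, can use it. Suffix trees are useful for string matching applications, such as those that arise when working with DNA sequences.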
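The search described above can be illustrated with a minimal sketch. It builds the tree naively in quadratic time by inserting one suffix after another, which is not the linear-time construction of McCreight or Ukkonen, and it assumes a terminator character "$" that does not occur in the input. The names Node, build_suffix_tree, contains and count_leaves are hypothetical and chosen only for this illustration.

```python
class Node:
    """One node of a compressed suffix tree (illustrative layout only)."""

    def __init__(self, start=0, end=0):
        self.start = start    # incoming edge label is text[start:end]
        self.end = end
        self.children = {}    # first character of an edge label -> child Node


def build_suffix_tree(s, terminator="$"):
    """Naive O(n^2) construction: insert every suffix, splitting edges as needed."""
    if terminator in s:
        raise ValueError("terminator must not occur in the input string")
    text = s + terminator
    root = Node()
    for i in range(len(text)):
        node, pos = root, i
        while pos < len(text):
            child = node.children.get(text[pos])
            if child is None:
                # No edge starts with this character: attach a new leaf.
                node.children[text[pos]] = Node(pos, len(text))
                break
            # Walk along the edge label while it agrees with the suffix.
            edge_len = child.end - child.start
            k = 0
            while k < edge_len and text[child.start + k] == text[pos + k]:
                k += 1
            if k == edge_len:
                node, pos = child, pos + k
                continue
            # Mismatch inside the edge: split it at offset k.
            mid = Node(child.start, child.start + k)
            node.children[text[pos]] = mid
            child.start += k
            mid.children[text[child.start]] = child
            mid.children[text[pos + k]] = Node(pos + k, len(text))
            break
    return text, root


def contains(text, root, pattern):
    """Return True if pattern labels a path starting at the root."""
    node, pos = root, 0
    while pos < len(pattern):
        child = node.children.get(pattern[pos])
        if child is None:
            return False
        label = text[child.start:child.end]
        chunk = pattern[pos:pos + len(label)]
        if not label.startswith(chunk):
            return False
        pos += len(chunk)
        node = child
    return True


def count_leaves(node):
    """Each leaf corresponds to exactly one suffix of the terminated text."""
    if not node.children:
        return 1
    return sum(count_leaves(c) for c in node.children.values())


text, root = build_suffix_tree("banana")
print(contains(text, root, "nan"))   # True
print(contains(text, root, "nab"))   # False
print(count_leaves(root))            # 7: six suffixes of "banana" plus "$"
```

The work done by contains depends only on the length of the pattern, which is the property that makes the structure attractive for searching long texts. Because the terminator is stored in the tree, a pattern that itself contains "$" can also match; a production implementation would need to rule that out.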

Each node in a typical suffix tree implementation stores the following information: the label of its incoming edge, a substring of the source string represented by its start and end positions; its child nodes, often as a linked list consisting of a pointer to the first child and, in each child, a pointer to the next sibling; and a suffix link, pointing to the node for the immediate suffix of the string represented by the current node (the same string with its first character removed). Suffix links are a key feature for linear-time construction of the tree, since they let the algorithm move from a node directly to the node for the next shorter suffix instead of descending again from the root.
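The fields listed above might be laid out as in the following sketch. The class name SuffixTreeNode, its field names and the helper leaf_strings are assumptions made for illustration, not a layout taken from any particular implementation. The small tree for the string "aa" followed by an assumed terminator "$" is assembled by hand, so that the suffix link of its single internal node is visible.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(eq=False)  # identity comparison; the links form cycles
class SuffixTreeNode:
    start: int                                       # edge label is text[start:end]
    end: int
    first_child: Optional["SuffixTreeNode"] = None   # head of the child list
    next_sibling: Optional["SuffixTreeNode"] = None  # next entry in the parent's list
    suffix_link: Optional["SuffixTreeNode"] = None   # node for the string minus its first character

    def label(self, text: str) -> str:
        return text[self.start:self.end]

    def children(self) -> Iterator["SuffixTreeNode"]:
        child = self.first_child
        while child is not None:
            yield child
            child = child.next_sibling

    def add_child(self, child: "SuffixTreeNode") -> None:
        # Prepend to the linked list of children.
        child.next_sibling = self.first_child
        self.first_child = child


def leaf_strings(text: str, node: SuffixTreeNode, prefix: str = "") -> Iterator[str]:
    """Yield the string spelled out on the path from the root to each leaf."""
    here = prefix + node.label(text)
    if node.first_child is None:
        yield here
    for child in node.children():
        yield from leaf_strings(text, child, here)


# Hand-built tree for "aa$"; its suffixes are "aa$", "a$" and "$".
text = "aa$"
root = SuffixTreeNode(0, 0)          # empty label
internal = SuffixTreeNode(0, 1)      # path "a", shared by "aa$" and "a$"
root.add_child(internal)
root.add_child(SuffixTreeNode(2, 3))       # leaf for "$"
internal.add_child(SuffixTreeNode(1, 3))   # leaf for "aa$"
internal.add_child(SuffixTreeNode(2, 3))   # leaf for "a$"
internal.suffix_link = root          # "a" minus its first character is the empty string

print(sorted(leaf_strings(text, root)))
```

Storing only start and end positions keeps each edge label at constant size however long the substring is, which is what keeps the total size of the tree linear in the length of the text.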

The large amount of information at each node makes the suffix tree very memory-intensive: common implementations consume some twenty times the memory of the source text. The suffix array reduces this requirement to a factor of four, and efforts have continued to find smaller indexing structures.
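For comparison, a suffix array stores only the starting positions of the suffixes in sorted order, so it needs one integer per character of text; the factor of four quoted above presumably corresponds to 4-byte indices over 1-byte characters. The sketch below builds the array by plain sorting, which is simple but not an efficient construction, and then locates a pattern by binary search. The function names build_suffix_array and find_occurrences are hypothetical.

```python
def build_suffix_array(text):
    """Starting positions of all suffixes of text, in lexicographic order.

    Sorting whole suffixes is easy to read but slow for long texts.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions at which pattern occurs in text."""
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[first:lo])


text = "banana"
sa = build_suffix_array(text)
print(sa)                                  # [5, 3, 1, 0, 4, 2]
print(find_occurrences(text, sa, "ana"))   # [1, 3]
```

Each binary search performs a logarithmic number of string comparisons, so a lookup is somewhat slower than in a suffix tree, in exchange for the much smaller index.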

References

  • E.M. McCreight. (1976). A space-economical suffix tree construction algorithm. Journal of the ACM 23(2):262-272.
  • E. Ukkonen. (1995). On-line construction of suffix trees. Algorithmica 14(3):249-260.

External links

  • Suffix Trees by Lloyd Allison
  • Suffix Trees links collection by Mark Nelson
  • Fast String Searching With Suffix Trees by Mark Nelson
  • NIST's Dictionary of Algorithms and Data Structures: Suffix Tree




    This article is from Wikipedia. All text is available
    under the terms of the GNU Free Documentation License.
    © 2008 Chamas Enterprises Inc.