Suffix trays and suffix trists: Structures for faster text indexing

Richard Cole, Tsvi Kopelowitz, Moshe Lewenstein

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Suffix trees and suffix arrays are two of the most widely used data structures for text indexing. Each uses linear space and can be constructed in linear time [3,5,6,7]. However, when answering queries, the former takes O(m log |Σ|) time, where m is the query size and |Σ| is the alphabet size, while the latter takes O(m + log n) time, where n is the text size. We propose a novel way of combining the two into what we call a suffix tray. The space and construction time remain linear, and the query time improves to O(m + log |Σ|). We also consider the online version of indexing, where the indexing structure is kept up to date as the text is updated online, and queries are answered in tandem. Here we suggest a suffix trist, a cross between a suffix tree and a suffix list. It supports queries in O(m + log |Σ|) time. The space and text update time of a suffix trist are the same as for the suffix tree or the suffix list.
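
To make the suffix array side of the comparison concrete, here is a minimal Python sketch of pattern lookup by binary search over a suffix array. It is not the suffix tray from the paper, and it does not reach the O(m + log n) bound quoted in the abstract: the construction is a naive sort, and the search compares up to m characters at each of the O(log n) steps, giving O(m log n). The stronger bound needs additional longest-common-prefix information that is omitted here. The function names build_suffix_array and find_occurrences are hypothetical, chosen only for this illustration.

```python
from __future__ import annotations


def build_suffix_array(text: str) -> list[int]:
    """Naive construction: sort suffix start positions lexicographically.

    This is far from linear time; it is only meant to keep the sketch short.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions of pattern in text.

    Uses two binary searches over the suffix array, comparing the
    length-m prefix of each probed suffix against the pattern.
    """
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with the pattern.
    return sorted(sa[start:lo])
```

For example, with text = "banana" and sa = build_suffix_array(text), the call find_occurrences(text, sa, "ana") returns the two start positions 1 and 3. The log n factor multiplying m in this sketch is the kind of cost that the bounds in the abstract avoid.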

Original language: English (US)
Title of host publication: Automata, Languages and Programming - 33rd International Colloquium, ICALP 2006, Proceedings
Pages: 358-369
Number of pages: 12
Volume: 4051 LNCS
DOI: https://doi.org/10.1007/11786986_32
State: Published - 2006
Event: 33rd International Colloquium on Automata, Languages and Programming, ICALP 2006 - Venice, Italy
Duration: Jul 10 2006 to Jul 14 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4051 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 33rd International Colloquium on Automata, Languages and Programming, ICALP 2006
Country: Italy
City: Venice
Period: 7/10/06 to 7/14/06

Fingerprint

Text Indexing
Suffix
Data structures
Suffix Tree
Query
Indexing
Linear Time
Update
Suffix Array
Linear Space
Text

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Cole, R., Kopelowitz, T., & Lewenstein, M. (2006). Suffix trays and suffix trists: Structures for faster text indexing. In Automata, Languages and Programming - 33rd International Colloquium, ICALP 2006, Proceedings (Vol. 4051 LNCS, pp. 358-369). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4051 LNCS). https://doi.org/10.1007/11786986_32

@inproceedings{b83fbc1799a5431d8e01260215fb6cc9,
title = "Suffix trays and suffix trists: Structures for faster text indexing",
author = "Richard Cole and Tsvi Kopelowitz and Moshe Lewenstein",
year = "2006",
doi = "10.1007/11786986_32",
language = "English (US)",
isbn = "3540359044",
volume = "4051 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "358--369",
booktitle = "Automata, Languages and Programming - 33rd International Colloquium, ICALP 2006, Proceedings",

}

TY  - GEN
T1  - Suffix trays and suffix trists
T2  - Structures for faster text indexing
AU  - Cole, Richard
AU  - Kopelowitz, Tsvi
AU  - Lewenstein, Moshe
PY  - 2006
Y1  - 2006
UR  - http://www.scopus.com/inward/record.url?scp=33746367784&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33746367784&partnerID=8YFLogxK
U2  - 10.1007/11786986_32
DO  - 10.1007/11786986_32
M3  - Conference contribution
AN  - SCOPUS:33746367784
SN  - 3540359044
SN  - 9783540359043
VL  - 4051 LNCS
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 358
EP  - 369
BT  - Automata, Languages and Programming - 33rd International Colloquium, ICALP 2006, Proceedings
ER  -