Generalized substring selectivity estimation

Zhiyuan Chen, Flip Korn, Nick Koudas, S. Muthukrishnan

    Research output: Contribution to journalArticle

    Abstract

    An approach to selectivity estimation of generalized Boolean substring queries with a focus on conjunctive multidimensional and Boolean queries was presented. The set hashing, a Monte Carlo technique, was used to succinctly represent the set of tuples containing a given substring as a signature vector of hash values. The analysis showed that using only linear storage, a large number of cross-counts were generated including those for complex co-occurrences of substrings.

    Original languageEnglish (US)
    Pages (from-to)98-132
    Number of pages35
    JournalJournal of Computer and System Sciences
    Volume66
    Issue number1
    DOIs
    StatePublished - Jul 8 2003

      Fingerprint

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Networks and Communications
    • Computational Theory and Mathematics
    • Applied Mathematics

    Cite this