Detecting false matches in string matching algorithms

Shanmugavelayutham Muthukrishnan

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Consider a text string of length n, a pattern string of length m, and a match vector of length n which declares each location in the text to be either a mismatch (the pattern does not occur beginning at that location in the text) or a potential match (the pattern may occur beginning at that location in the text). Some of the potential matches could be false, i.e., the pattern may not occur beginning at some location in the text declared to be a potential match. We investigate the complexity of two problems in this context, namely, checking if there is any false match, and identifying all the false matches in the match vector. We present an algorithm on the CRCW PRAM that checks if there exists any false match in O(1) time using O(n) processors. Since string matching takes Ω(log log m) time on the CRCW PRAM, checking for false matches is provably simpler than string matching. As an important application, we use this simple algorithm to convert the Karp-Rabin Monte Carlo type string matching algorithm into a Las Vegas type algorithm without asymptotic loss in complexity. We also present an efficient algorithm for identifying all the false matches and, as a consequence, show that string matching algorithms take Ω(log log m) time even given the flexibility to output a few false matches. In addition, we give a sequential algorithm for checking using three heads on a 2-way deterministic finite state automaton (DFA) in linear time and another on a 1-way DFA with a fixed number of heads.
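The Monte Carlo to Las Vegas conversion mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the checker below is a naive sequential scan costing O(nm) in the worst case, whereas the paper's contribution is a much cheaper check. The function names (`false_matches`, `karp_rabin_vector`, `las_vegas_match`), the fixed Mersenne-prime modulus, and the choice to re-randomize the hash base rather than the modulus are all assumptions made for illustration.

```python
import random


def false_matches(text: str, pattern: str, match_vector: list[bool]) -> list[int]:
    """Return each position flagged as a potential match where the pattern does not occur.

    Naive check (illustrative only): compares the pattern at every flagged position.
    """
    m = len(pattern)
    return [i for i, flagged in enumerate(match_vector)
            if flagged and text[i:i + m] != pattern]


def karp_rabin_vector(text: str, pattern: str, base: int, modulus: int) -> list[bool]:
    """Monte Carlo step: flag position i when the rolling hash of text[i:i+m]
    equals the hash of the pattern. True matches are always flagged; hash
    collisions may additionally flag false matches."""
    n, m = len(text), len(pattern)
    vector = [False] * n
    if m == 0 or m > n:
        return vector
    high = pow(base, m - 1, modulus)  # weight of the character leaving the window
    p_hash = t_hash = 0
    for k in range(m):
        p_hash = (p_hash * base + ord(pattern[k])) % modulus
        t_hash = (t_hash * base + ord(text[k])) % modulus
    for i in range(n - m + 1):
        vector[i] = (t_hash == p_hash)
        if i + m < n:
            # Slide the window one character to the right.
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % modulus
    return vector


def las_vegas_match(text: str, pattern: str,
                    modulus: int = (1 << 61) - 1, rng=None) -> list[bool]:
    """Las Vegas wrapper: rerun the Monte Carlo matcher with a fresh random base
    until the checker reports no false match, so the result is always correct."""
    rng = rng or random.Random()
    while True:
        base = rng.randrange(256, modulus)
        vector = karp_rabin_vector(text, pattern, base, modulus)
        if not false_matches(text, pattern, vector):
            return vector
```

For example, `las_vegas_match("abracadabra", "abra")` flags exactly positions 0 and 7. The running time is random but the output is never wrong, which is the Las Vegas property; preserving the asymptotic complexity of Karp-Rabin additionally requires a checker as cheap as the one the abstract describes, which this sketch does not attempt.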

    Original language: English (US)
    Title of host publication: Combinatorial Pattern Matching - 4th Annual Symposium, CPM 1993, Proceedings
    Editors: Zvi Galil, Maxime Crochemore, Alberto Apostolico, Udi Manber
    Publisher: Springer-Verlag
    Pages: 164-178
    Number of pages: 15
    ISBN (Print): 9783540567646
    State: Published - Jan 1 1993
    Event: Conference of the European Society for Fuzzy Logic and Technology, EUSFLAT 2017 and 16th International Workshop on Intuitionistic Fuzzy Sets and Generalized Nets, IWIFSGN 2017 - Warsaw, Poland
    Duration: Sep 11 2017 to Sep 15 2017

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 684 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: Conference of the European Society for Fuzzy Logic and Technology, EUSFLAT 2017 and 16th International Workshop on Intuitionistic Fuzzy Sets and Generalized Nets, IWIFSGN 2017
    Country: Poland
    City: Warsaw
    Period: 9/11/17 to 9/15/17

    Fingerprint

    String searching algorithms
    String Algorithms
    String Matching
    Matching Algorithm
    Deterministic Finite Automata
    Strings
    Sequential Algorithm
    False
    Convert
    Linear Time
    Efficient Algorithms
    Flexibility
    Text
    Output

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Science (all)

    Cite this

    Muthukrishnan, S. (1993). Detecting false matches in string matching algorithms. In Z. Galil, M. Crochemore, A. Apostolico, & U. Manber (Eds.), Combinatorial Pattern Matching - 4th Annual Symposium, CPM 1993, Proceedings (pp. 164-178). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 684 LNCS). Springer-Verlag.
