The string edit distance matching problem with moves

Graham Cormode, Shanmugavelayutham Muthukrishnan

    Research output: Contribution to journal › Article

    Abstract

    The edit distance between two strings S and R is defined to be the minimum number of character inserts, deletes, and changes needed to convert R to S. Given a text string t of length n, and a pattern string p of length m, informally, the string edit distance matching problem is to compute the smallest edit distance between p and substrings of t. We relax the problem so that: (a) we allow an additional operation, namely, substring moves; and (b) we allow approximation of this string edit distance. Our result is a near-linear-time deterministic algorithm to produce a factor of O(log n log* n) approximation to the string edit distance with moves. This is the first known significantly subquadratic algorithm for a string edit distance problem in which the distance involves nontrivial alignments. Our results are obtained by embedding strings into an L1 vector space using a simplified parsing technique, which we call edit-sensitive parsing (ESP).
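
    To make the definitions above concrete, here is a minimal Python sketch. It is not the authors' algorithm. `edit_distance` is the textbook quadratic dynamic program for the unit-cost edit distance defined above. `match_distance` is the standard variant that lets the pattern p align against any substring of t, which is the exact matching problem without moves or approximation. `qgram_l1` is a deliberately simple stand-in for an L1 embedding that uses q-gram counts. All three function names are hypothetical. `qgram_l1` is not ESP and carries none of its approximation guarantees.

```python
from collections import Counter


def edit_distance(r: str, s: str) -> int:
    """Unit-cost edit distance: fewest single-character inserts, deletes,
    and changes that turn r into s. O(|r| * |s|) time, O(|s|) space."""
    prev = list(range(len(s) + 1))  # cost of turning "" into each prefix of s
    for i, rc in enumerate(r, start=1):
        cur = [i] + [0] * len(s)  # turning r[:i] into "" costs i deletes
        for j, sc in enumerate(s, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete rc
                cur[j - 1] + 1,            # insert sc
                prev[j - 1] + (rc != sc),  # change rc to sc (free if equal)
            )
        prev = cur
    return prev[-1]


def match_distance(p: str, t: str) -> int:
    """Smallest edit distance between p and any substring of t (the empty
    substring included). Same recurrence as above, but an alignment may
    start and end anywhere in t at no extra cost."""
    # col[i] = best cost of aligning p[:i] with a substring of t ending here
    col = list(range(len(p) + 1))  # before reading t: only "" is available
    best = col[-1]
    for tc in t:
        new = [0] * (len(p) + 1)  # empty prefix of p may start anywhere, free
        for i, pc in enumerate(p, start=1):
            new[i] = min(
                col[i] + 1,               # tc is an extra character in the substring
                new[i - 1] + 1,           # pc is missing from the substring
                col[i - 1] + (pc != tc),  # align pc with tc
            )
        col = new
        best = min(best, col[-1])
    return best


def qgram_l1(a: str, b: str, q: int = 2) -> int:
    """L1 distance between q-gram count vectors: a simple L1 embedding
    (NOT edit-sensitive parsing)."""
    ca = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    cb = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    return sum(abs(ca[g] - cb[g]) for g in ca.keys() | cb.keys())


print(edit_distance("kitten", "sitting"))  # 3
print(match_distance("abd", "xxabcxx"))    # 1 (best substring is "abc")
print(edit_distance("abcdef", "defabc"), qgram_l1("abcdef", "defabc"))  # 6 2
```

    Both dynamic programs take O(nm) time, which is the quadratic cost that motivates the subquadratic result above. The last line hints at why a count-vector embedding into L1 suits substring moves. Swapping the two halves of "abcdef" costs 6 plain edits, but it only disturbs the q-grams that straddle the cut, so the q-gram vector changes by 2.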

    Original language: English (US)
    Article number: 1219947
    Journal: ACM Transactions on Algorithms
    Volume: 3
    Issue number: 1
    DOIs: 10.1145/1186810.1186812
    State: Published - Feb 1 2007

    Keywords

    • Approximate pattern matching
    • Data streams
    • Edit distance
    • Embedding
    • Similarity search
    • String matching

    ASJC Scopus subject areas

    • Mathematics (miscellaneous)

    Cite this

    The string edit distance matching problem with moves. / Cormode, Graham; Muthukrishnan, Shanmugavelayutham.

    In: ACM Transactions on Algorithms, Vol. 3, No. 1, 1219947, 01.02.2007.

    Cormode, Graham ; Muthukrishnan, Shanmugavelayutham. / The string edit distance matching problem with moves. In: ACM Transactions on Algorithms. 2007 ; Vol. 3, No. 1.

    @article{aebd0b340b244d63985bc26ef0c9d0c6,
    title = "The string edit distance matching problem with moves",
    keywords = "Approximate pattern matching, Data streams, Edit distance, Embedding, Similarity search, String matching",
    author = "Graham Cormode and Shanmugavelayutham Muthukrishnan",
    year = "2007",
    month = "2",
    day = "1",
    doi = "10.1145/1186810.1186812",
    language = "English (US)",
    volume = "3",
    journal = "ACM Transactions on Algorithms",
    issn = "1549-6325",
    publisher = "Association for Computing Machinery (ACM)",
    number = "1",

    }

    TY - JOUR

    T1 - The string edit distance matching problem with moves

    AU - Cormode, Graham

    AU - Muthukrishnan, Shanmugavelayutham

    PY - 2007/2/1

    Y1 - 2007/2/1

    KW - Approximate pattern matching

    KW - Data streams

    KW - Edit distance

    KW - Embedding

    KW - Similarity search

    KW - String matching

    UR - http://www.scopus.com/inward/record.url?scp=33847272670&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=33847272670&partnerID=8YFLogxK

    U2 - 10.1145/1186810.1186812

    DO - 10.1145/1186810.1186812

    M3 - Article

    AN - SCOPUS:33847272670

    VL - 3

    JO - ACM Transactions on Algorithms

    JF - ACM Transactions on Algorithms

    SN - 1549-6325

    IS - 1

    M1 - 1219947

    ER -