### Abstract

The edit distance between two strings S and R is defined to be the minimum number of character inserts, deletes, and changes needed to convert R to S. Given a text string t of length n, and a pattern string p of length m, informally, the string edit distance matching problem is to compute the smallest edit distance between p and substrings of t. We relax the problem so that: (a) we allow an additional operation, namely, substring moves; and (b) we allow approximation of this string edit distance. Our result is a near-linear time deterministic algorithm to produce a factor of O(log n log* n) approximation to the string edit distance with moves. This is the first known significantly subquadratic algorithm for a string edit distance problem in which the distance involves nontrivial alignments. Our results are obtained by embedding strings into $L_1$ vector space using a simplified parsing technique, which we call edit-sensitive parsing (ESP).
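To make the two definitions in the abstract concrete, the following is a minimal sketch of the classical quadratic-time dynamic programs for the standard problem, without substring moves. It is not the paper's near-linear ESP algorithm and does not compute the approximation described above; the function names `edit_distance` and `best_substring_distance` are hypothetical and chosen only for illustration.

```python
def edit_distance(r: str, s: str) -> int:
    """Minimum number of character inserts, deletes, and changes turning r into s."""
    # prev[j] = distance between the current prefix of r and s[:j]
    prev = list(range(len(s) + 1))
    for i, rc in enumerate(r, start=1):
        curr = [i]  # turning r[:i] into the empty string takes i deletes
        for j, sc in enumerate(s, start=1):
            curr.append(min(
                prev[j] + 1,               # delete rc
                curr[j - 1] + 1,           # insert sc
                prev[j - 1] + (rc != sc),  # change rc to sc (free on a match)
            ))
        prev = curr
    return prev[-1]


def best_substring_distance(p: str, t: str) -> int:
    """Smallest edit distance between p and any substring of t."""
    # The first row is all zeros: a match may start at any position of t for free.
    prev = [0] * (len(t) + 1)
    for i, pc in enumerate(p, start=1):
        curr = [i]
        for j, tc in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,
                curr[j - 1] + 1,
                prev[j - 1] + (pc != tc),
            ))
        prev = curr
    # The match may also end at any position of t, so take the row minimum.
    return min(prev)
```

Both functions run in O(mn) time for strings of lengths m and n, which is the quadratic cost that the abstract's near-linear approximation algorithm is contrasted with.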

| Original language | English (US) |
|---|---|
| Article number | 1219947 |
| Journal | ACM Transactions on Algorithms |
| Volume | 3 |
| Issue number | 1 |
| DOIs | https://doi.org/10.1145/1186810.1186812 |
| State | Published - Feb 1 2007 |

### Keywords

- Approximate pattern matching
- Data streams
- Edit distance
- Embedding
- Similarity search
- String matching

### ASJC Scopus subject areas

- Mathematics (miscellaneous)

### Cite this

Cormode, G., & Muthukrishnan, S. (2007). The string edit distance matching problem with moves. *ACM Transactions on Algorithms*, *3*(1), [1219947]. https://doi.org/10.1145/1186810.1186812

Research output: Contribution to journal › Article

TY - JOUR

T1 - The string edit distance matching problem with moves

AU - Cormode, Graham

AU - Muthukrishnan, Shanmugavelayutham

PY - 2007/2/1

Y1 - 2007/2/1

AB - The edit distance between two strings S and R is defined to be the minimum number of character inserts, deletes, and changes needed to convert R to S. Given a text string t of length n, and a pattern string p of length m, informally, the string edit distance matching problem is to compute the smallest edit distance between p and substrings of t. We relax the problem so that: (a) we allow an additional operation, namely, substring moves; and (b) we allow approximation of this string edit distance. Our result is a near-linear time deterministic algorithm to produce a factor of O(log n log* n) approximation to the string edit distance with moves. This is the first known significantly subquadratic algorithm for a string edit distance problem in which the distance involves nontrivial alignments. Our results are obtained by embedding strings into L1 vector space using a simplified parsing technique, which we call edit-sensitive parsing (ESP).

KW - Approximate pattern matching

KW - Data streams

KW - Edit distance

KW - Embedding

KW - Similarity search

KW - String matching

UR - http://www.scopus.com/inward/record.url?scp=33847272670&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33847272670&partnerID=8YFLogxK

U2 - 10.1145/1186810.1186812

DO - 10.1145/1186810.1186812

M3 - Article

AN - SCOPUS:33847272670

VL - 3

JO - ACM Transactions on Algorithms

JF - ACM Transactions on Algorithms

SN - 1549-6325

IS - 1

M1 - 1219947

ER -