Approximate nearest neighbors and sequence comparison with block operations

Shanmugavelayutham Muthukrishnan, Süleyman Cenk Sahinalp

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    We study sequence nearest neighbors (SNN). Let D be a database of n sequences; we would like to preprocess D so that given any on-line query sequence Q we can quickly find a sequence S in D for which d(S, Q) ≤ d(T, Q) for any other sequence T in D. Here d(S, Q) denotes the distance between sequences S and Q, defined to be the minimum number of edit operations needed to transform one into the other (all edit operations will be reversible so that d(S, T) = d(T, S) for any two sequences T and S). These operations correspond to the notion of similarity between sequences that we wish to capture in a given application. Natural edit operations include character edits (inserts, replacements, deletes, etc.), block edits (moves, copies, deletes, reversals) and block numerical transformations (scaling by an additive or a multiplicative constant). The SNN problem arises in many applications. We present the first known efficient algorithm for "approximate" nearest neighbor search for sequences with preprocessing time and space polynomial in the size of D and query time near-linear in the size of Q. We assume the distance d(S, T) between two sequences S and T is the minimum number of character edits and block operations needed to transform one into the other. The approximation factor we achieve is O(log ℓ (log* ℓ)²), where ℓ is the size of the longest sequence in D. In addition, we give an algorithm for exactly computing the distance between two sequences when the allowed edit operations are character replacements and block reversals. The time and space requirements of the algorithm are near-linear; previously known approaches take at least quadratic time.
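
    To make the problem statement concrete, the following is a minimal brute-force sketch of exact SNN, not the algorithm of the paper. It assumes d is the plain character edit distance (inserts, deletes, replacements at unit cost) and ignores block operations entirely. The names edit_distance and nearest_neighbor are hypothetical, chosen for illustration only.

```python
from typing import Sequence


def edit_distance(s: str, t: str) -> int:
    """Character edit distance with unit-cost insert, delete, and replace.

    Assumption: block operations (moves, copies, reversals) are NOT modeled.
    """
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty prefix of t
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from s
                curr[j - 1] + 1,         # insert b into s
                prev[j - 1] + (a != b),  # replace a by b, or keep on a match
            ))
        prev = curr
    return prev[-1]


def nearest_neighbor(database: Sequence[str], query: str) -> str:
    """Exact nearest neighbor by linear scan: a sequence S in the database
    minimizing d(S, Q). Ties go to the earliest such sequence."""
    return min(database, key=lambda s: edit_distance(s, query))


if __name__ == "__main__":
    D = ["banana", "bandana", "cabana"]
    print(nearest_neighbor(D, "bananas"))
```

    This baseline scans all of D and spends time quadratic in the sequence lengths on each comparison, which is the kind of query cost that preprocessing D is meant to avoid.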

    Original language: English (US)
    Title of host publication: Proceedings of the 32nd Annual ACM Symposium on Theory of Computing, STOC 2000
    Pages: 416-424
    Number of pages: 9
    DOIs: https://doi.org/10.1145/335305.335353
    State: Published - Dec 1 2000
    Event: 32nd Annual ACM Symposium on Theory of Computing, STOC 2000 - Portland, OR, United States
    Duration: May 21 2000 to May 23 2000

    Publication series

    Name: Proceedings of the Annual ACM Symposium on Theory of Computing
    ISSN (Print): 0737-8017

    Conference

    Conference: 32nd Annual ACM Symposium on Theory of Computing, STOC 2000
    Country: United States
    City: Portland, OR
    Period: 5/21/00 to 5/23/00

    Fingerprint

    Polynomials
    Nearest neighbor search

    ASJC Scopus subject areas

    • Software

    Cite this

    Muthukrishnan, S., & Sahinalp, S. C. (2000). Approximate nearest neighbors and sequence comparison with block operations. In Proceedings of the 32nd Annual ACM Symposium on Theory of Computing, STOC 2000 (pp. 416-424). (Proceedings of the Annual ACM Symposium on Theory of Computing). https://doi.org/10.1145/335305.335353

    Muthukrishnan, S & Sahinalp, SC 2000, Approximate nearest neighbors and sequence comparison with block operations. in Proceedings of the 32nd Annual ACM Symposium on Theory of Computing, STOC 2000. Proceedings of the Annual ACM Symposium on Theory of Computing, pp. 416-424, 32nd Annual ACM Symposium on Theory of Computing, STOC 2000, Portland, OR, United States, 5/21/00. https://doi.org/10.1145/335305.335353
    Muthukrishnan S, Sahinalp SC. Approximate nearest neighbors and sequence comparison with block operations. In Proceedings of the 32nd Annual ACM Symposium on Theory of Computing, STOC 2000. 2000. p. 416-424. (Proceedings of the Annual ACM Symposium on Theory of Computing). https://doi.org/10.1145/335305.335353
    Muthukrishnan, Shanmugavelayutham ; Sahinalp, Süleyman Cenk. / Approximate nearest neighbors and sequence comparison with block operations. Proceedings of the 32nd Annual ACM Symposium on Theory of Computing, STOC 2000. 2000. pp. 416-424 (Proceedings of the Annual ACM Symposium on Theory of Computing).
    @inproceedings{e1cd03ebc5504409b29843755c062fee,
    title = "Approximate nearest neighbors and sequence comparison with block operations",
    abstract = "We study sequence nearest neighbors (SNN). Let D be a database of n sequences; we would like to preprocess D so that given any on-line query sequence Q we can quickly find a sequence S in D for which d(S, Q) ≤ d(T, Q) for any other sequence T in D. Here d(S, Q) denotes the distance between sequences S and Q, defined to be the minimum number of edit operations needed to transform one into the other (all edit operations will be reversible so that d(S, T) = d(T, S) for any two sequences T and S). These operations correspond to the notion of similarity between sequences that we wish to capture in a given application. Natural edit operations include character edits (inserts, replacements, deletes, etc.), block edits (moves, copies, deletes, reversals) and block numerical transformations (scaling by an additive or a multiplicative constant). The SNN problem arises in many applications. We present the first known efficient algorithm for {"}approximate{"} nearest neighbor search for sequences with preprocessing time and space polynomial in the size of D and query time near-linear in the size of Q. We assume the distance d(S, T) between two sequences S and T is the minimum number of character edits and block operations needed to transform one into the other. The approximation factor we achieve is O(log ℓ (log* ℓ)²), where ℓ is the size of the longest sequence in D. In addition, we give an algorithm for exactly computing the distance between two sequences when the allowed edit operations are character replacements and block reversals. The time and space requirements of the algorithm are near-linear; previously known approaches take at least quadratic time.",
    author = "Muthukrishnan, Shanmugavelayutham and Sahinalp, {S{\"u}leyman Cenk}",
    year = "2000",
    month = "12",
    day = "1",
    doi = "10.1145/335305.335353",
    language = "English (US)",
    isbn = "1581131844",
    series = "Proceedings of the Annual ACM Symposium on Theory of Computing",
    pages = "416--424",
    booktitle = "Proceedings of the 32nd Annual ACM Symposium on Theory of Computing, STOC 2000",

    }

    TY - GEN

    T1 - Approximate nearest neighbors and sequence comparison with block operations

    AU - Muthukrishnan, Shanmugavelayutham

    AU - Sahinalp, Süleyman Cenk

    PY - 2000/12/1

    Y1 - 2000/12/1

    AB - We study sequence nearest neighbors (SNN). Let D be a database of n sequences; we would like to preprocess D so that given any on-line query sequence Q we can quickly find a sequence S in D for which d(S, Q) ≤ d(T, Q) for any other sequence T in D. Here d(S, Q) denotes the distance between sequences S and Q, defined to be the minimum number of edit operations needed to transform one into the other (all edit operations will be reversible so that d(S, T) = d(T, S) for any two sequences T and S). These operations correspond to the notion of similarity between sequences that we wish to capture in a given application. Natural edit operations include character edits (inserts, replacements, deletes, etc.), block edits (moves, copies, deletes, reversals) and block numerical transformations (scaling by an additive or a multiplicative constant). The SNN problem arises in many applications. We present the first known efficient algorithm for "approximate" nearest neighbor search for sequences with preprocessing time and space polynomial in the size of D and query time near-linear in the size of Q. We assume the distance d(S, T) between two sequences S and T is the minimum number of character edits and block operations needed to transform one into the other. The approximation factor we achieve is O(log ℓ (log* ℓ)²), where ℓ is the size of the longest sequence in D. In addition, we give an algorithm for exactly computing the distance between two sequences when the allowed edit operations are character replacements and block reversals. The time and space requirements of the algorithm are near-linear; previously known approaches take at least quadratic time.

    UR - http://www.scopus.com/inward/record.url?scp=0033705069&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=0033705069&partnerID=8YFLogxK

    U2 - 10.1145/335305.335353

    DO - 10.1145/335305.335353

    M3 - Conference contribution

    SN - 1581131844

    SN - 9781581131840

    T3 - Proceedings of the Annual ACM Symposium on Theory of Computing

    SP - 416

    EP - 424

    BT - Proceedings of the 32nd Annual ACM Symposium on Theory of Computing, STOC 2000

    ER -