Permutation editing and matching via embeddings

Graham Cormode, Shanmugavelayutham Muthukrishnan, Süleyman Cenk Sahinalp

    Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

    Abstract

    If the genetic maps of two species are modelled as permutations of (homologous) genes, the number of chromosomal rearrangements (deletions, block moves, inversions, etc.) needed to transform one such permutation into another can be used as a measure of their evolutionary distance. Motivated by such scenarios, we study the problems of computing distances between permutations, matching permutations in sequences, and finding the most similar permutation in a collection (nearest neighbor). We adopt a general approach: embed the relevant permutation distances into well-known vector spaces in an approximately distance-preserving manner, and solve the resulting problems on the well-known spaces. Our results are as follows: We present the first known approximately distance-preserving embeddings of these permutation distances into well-known spaces. Using these embeddings, we obtain several results, including the first known efficient solution for approximately solving nearest neighbor problems with permutations and the first known algorithms for finding permutation distances in the data stream model. We consider a novel class of problems called permutation matching problems, which are similar to string matching problems except that the pattern is a permutation (rather than a string), and present linear or near-linear time algorithms for approximately solving permutation matching problems; in contrast, the corresponding string problems take significantly longer.
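
As a rough illustration of the embedding idea described in the abstract, the Python sketch below maps a permutation to a 0/1 vector recording which pairs of elements are adjacent, then compares two permutations by Hamming distance. This is a toy under stated assumptions, not the construction from the paper (the abstract does not specify the embeddings); the names `adjacency_vector` and `hamming` are hypothetical. A single inversion of a contiguous block breaks at most two adjacencies and creates at most two, so it changes at most four coordinates, which means the Hamming distance divided by four is a lower bound on the number of inversions separating the two permutations.

```python
def adjacency_vector(perm):
    """Embed a permutation of 0..n-1 as a 0/1 vector over unordered pairs.

    Coordinate (a, b) with a < b is 1 iff a and b are neighbours in perm.
    Illustrative assumption only; not the embedding defined in the paper.
    """
    n = len(perm)
    vec = [0] * (n * (n - 1) // 2)
    for x, y in zip(perm, perm[1:]):
        lo, hi = min(x, y), max(x, y)
        # Position of the pair (lo, hi) in row-major upper-triangle order.
        idx = lo * n - lo * (lo + 1) // 2 + (hi - lo - 1)
        vec[idx] = 1
    return vec


def hamming(u, v):
    """Number of coordinates in which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))


if __name__ == "__main__":
    p = [0, 1, 2, 3, 4]
    q = [0, 3, 2, 1, 4]  # p with the block 1..3 inverted
    print(hamming(adjacency_vector(p), adjacency_vector(q)))  # 4
```

Once permutations live in a Hamming space like this, generic tools for that space (for example approximate nearest-neighbor search or streaming sketches) can in principle be applied to the vectors instead of the permutations themselves, which is the spirit of the general approach stated above.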

    Original language: English (US)
    Title of host publication: Automata, Languages and Programming - 28th International Colloquium, ICALP 2001, Proceedings
    Pages: 481-492
    Number of pages: 12
    State: Published - Dec 1 2001
    Event: 28th International Colloquium on Automata, Languages and Programming, ICALP 2001 - Crete, Greece
    Duration: Jul 8 2001 to Jul 12 2001

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 2076 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Other

    Other: 28th International Colloquium on Automata, Languages and Programming, ICALP 2001
    Country: Greece
    City: Crete
    Period: 7/8/01 to 7/12/01

    Fingerprint

    Permutation
    Vector spaces
    Genes
    Matching Problem
    Nearest Neighbor
    Strings
    String Matching
    Linear-time Algorithm
    Rearrangement
    Data Streams
    Deletion
    Vector space
    Inversion
    Transform
    Gene
    Scenarios
    Computing

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Science (all)

    Cite this

    Cormode, G., Muthukrishnan, S., & Sahinalp, S. C. (2001). Permutation editing and matching via embeddings. In Automata, Languages and Programming - 28th International Colloquium, ICALP 2001, Proceedings (pp. 481-492). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2076 LNCS).

    @inproceedings{ad08b22d85e745be9a25395b663be8a4,
    title = "Permutation editing and matching via embeddings",
    abstract = "If the genetic maps of two species are modelled as permutations of (homologous) genes, the number of chromosomal rearrangements (deletions, block moves, inversions, etc.) needed to transform one such permutation into another can be used as a measure of their evolutionary distance. Motivated by such scenarios, we study the problems of computing distances between permutations, matching permutations in sequences, and finding the most similar permutation in a collection (nearest neighbor). We adopt a general approach: embed the relevant permutation distances into well-known vector spaces in an approximately distance-preserving manner, and solve the resulting problems on the well-known spaces. Our results are as follows: We present the first known approximately distance-preserving embeddings of these permutation distances into well-known spaces. Using these embeddings, we obtain several results, including the first known efficient solution for approximately solving nearest neighbor problems with permutations and the first known algorithms for finding permutation distances in the data stream model. We consider a novel class of problems called permutation matching problems, which are similar to string matching problems except that the pattern is a permutation (rather than a string), and present linear or near-linear time algorithms for approximately solving permutation matching problems; in contrast, the corresponding string problems take significantly longer.",
    author = "Graham Cormode and Shanmugavelayutham Muthukrishnan and Sahinalp, {S{\"u}leyman Cenk}",
    year = "2001",
    month = "12",
    day = "1",
    language = "English (US)",
    isbn = "3540422870",
    series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
    pages = "481--492",
    booktitle = "Automata, Languages and Programming - 28th International Colloquium, ICALP 2001, Proceedings",

    }

    TY - GEN

    T1 - Permutation editing and matching via embeddings

    AU - Cormode, Graham

    AU - Muthukrishnan, Shanmugavelayutham

    AU - Sahinalp, Süleyman Cenk

    PY - 2001/12/1

    Y1 - 2001/12/1

    N2 - If the genetic maps of two species are modelled as permutations of (homologous) genes, the number of chromosomal rearrangements (deletions, block moves, inversions, etc.) needed to transform one such permutation into another can be used as a measure of their evolutionary distance. Motivated by such scenarios, we study the problems of computing distances between permutations, matching permutations in sequences, and finding the most similar permutation in a collection (nearest neighbor). We adopt a general approach: embed the relevant permutation distances into well-known vector spaces in an approximately distance-preserving manner, and solve the resulting problems on the well-known spaces. Our results are as follows: We present the first known approximately distance-preserving embeddings of these permutation distances into well-known spaces. Using these embeddings, we obtain several results, including the first known efficient solution for approximately solving nearest neighbor problems with permutations and the first known algorithms for finding permutation distances in the data stream model. We consider a novel class of problems called permutation matching problems, which are similar to string matching problems except that the pattern is a permutation (rather than a string), and present linear or near-linear time algorithms for approximately solving permutation matching problems; in contrast, the corresponding string problems take significantly longer.

    UR - http://www.scopus.com/inward/record.url?scp=84879509047&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84879509047&partnerID=8YFLogxK

    M3 - Conference contribution

    AN - SCOPUS:84879509047

    SN - 3540422870

    SN - 9783540422877

    T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

    SP - 481

    EP - 492

    BT - Automata, Languages and Programming - 28th International Colloquium, ICALP 2001, Proceedings

    ER -