Periodicity testing with sublinear samples and space

Funda Ergun, Shanmugavelayutham Muthukrishnan, Cenk Sahinalp

    Research output: Contribution to journal (Article)

    Abstract

    In this work, we are interested in periodic trends in long data streams in the presence of computational constraints. To this end, we present algorithms for discovering periodic trends in the combinatorial property testing model in a data stream S of length n using o(n) samples and space. In accordance with the property testing model, we first explore the notion of being close to periodic by defining three different notions of self-distance through relaxing different notions of exact periodicity. An input S is then called approximately periodic if it exhibits a small self-distance (with respect to any one of the self-distances defined). We show that even though the different definitions of exact periodicity are equivalent, the resulting definitions of self-distance and approximate periodicity are not; we also show that these self-distances are constant approximations of each other. Afterwards, we present algorithms that distinguish between the two cases where S is exactly periodic and S is far from periodic with only a constant probability of error. Our algorithms sample only O(√n log² n) (or O(√n log⁴ n), depending on the self-distance) positions and use as much space. They can also find, using o(n) samples and space, the largest/smallest period, and/or all of the approximate periods of S. These algorithms can also be viewed as working on streaming inputs where each data item is seen once and in order, storing only a sublinear (O(√n log² n) or O(√n log⁴ n)) size sample from which periodicities are identified.
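
    The abstract does not spell out its three self-distances, so the following is only a minimal sketch of the general idea, under stated assumptions: exact periodicity is taken to mean S[i] = S[i + p] for every valid i, and the illustrative distance simply counts positions where that equality fails. The function names, the mismatch-count measure, and the naive random-access estimator are all hypothetical and are not the paper's definitions or its sampling scheme.

```python
import random


def is_exact_period(s, p):
    """Return True if s[i] == s[i + p] for every valid i, i.e. p is an exact period."""
    return all(s[i] == s[i + p] for i in range(len(s) - p))


def shift_distance(s, p):
    """Count positions i with s[i] != s[i + p].

    Illustrative stand-in for a self-distance; zero exactly when p is an exact period.
    """
    return sum(s[i] != s[i + p] for i in range(len(s) - p))


def sampled_shift_fraction(s, p, num_samples, rng):
    """Estimate the fraction of mismatching positions from random probes.

    Assumes random access to s and a single fixed candidate period p,
    which is simpler than the streaming setting described in the abstract.
    """
    m = len(s) - p
    probes = (rng.randrange(m) for _ in range(num_samples))
    hits = sum(s[i] != s[i + p] for i in probes)
    return hits / num_samples
```

    With these assumptions, shift_distance("abcabxabca", 3) counts two mismatches (the single corrupted symbol is compared once against each neighbouring copy), while any exactly periodic input yields zero for both the exact count and the sampled estimate.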

    Original language: English (US)
    Article number: 43
    Journal: ACM Transactions on Algorithms
    Volume: 6
    Issue number: 2
    DOI: 10.1145/1721837.1721859
    State: Published - Mar 1 2010

    Fingerprint

    Periodicity
    Testing
    Property Testing
    Data Streams
    Streaming
    Sample Size
    Approximation
    Model
    Trends

    Keywords

    • Combinatorial property testing
    • Periodicity

    ASJC Scopus subject areas

    • Mathematics (miscellaneous)

    Cite this

    Periodicity testing with sublinear samples and space. / Ergun, Funda; Muthukrishnan, Shanmugavelayutham; Sahinalp, Cenk.

    In: ACM Transactions on Algorithms, Vol. 6, No. 2, 43, 01.03.2010.

    Ergun, Funda ; Muthukrishnan, Shanmugavelayutham ; Sahinalp, Cenk. / Periodicity testing with sublinear samples and space. In: ACM Transactions on Algorithms. 2010 ; Vol. 6, No. 2.
    @article{b2c778bcf75a4ee19fc5efe889e489f2,
    title = "Periodicity testing with sublinear samples and space",
    keywords = "Combinatorial property testing, Periodicity",
    author = "Funda Ergun and Shanmugavelayutham Muthukrishnan and Cenk Sahinalp",
    year = "2010",
    month = "3",
    day = "1",
    doi = "10.1145/1721837.1721859",
    language = "English (US)",
    volume = "6",
    journal = "ACM Transactions on Algorithms",
    issn = "1549-6325",
    publisher = "Association for Computing Machinery (ACM)",
    number = "2",

    }

    TY - JOUR

    T1 - Periodicity testing with sublinear samples and space

    AU - Ergun, Funda

    AU - Muthukrishnan, Shanmugavelayutham

    AU - Sahinalp, Cenk

    PY - 2010/3/1

    Y1 - 2010/3/1

    KW - Combinatorial property testing

    KW - Periodicity

    UR - http://www.scopus.com/inward/record.url?scp=77950813642&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=77950813642&partnerID=8YFLogxK

    U2 - 10.1145/1721837.1721859

    DO - 10.1145/1721837.1721859

    M3 - Article

    VL - 6

    JO - ACM Transactions on Algorithms

    JF - ACM Transactions on Algorithms

    SN - 1549-6325

    IS - 2

    M1 - 43

    ER -