Pan-private algorithms via statistics on sketches

Darakhshan Mir, Shanmugavelayutham Muthukrishnan, Aleksandar Nikolov, Rebecca N. Wright

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Consider fully dynamic data, where we track data as it gets inserted and deleted. There are well-developed notions of private data analysis with dynamic data, for example, using differential privacy. We want to go beyond privacy, and consider privacy together with security, formulated recently as pan-privacy by Dwork et al. (ICS 2010). Informally, pan-privacy preserves differential privacy while computing desired statistics on the data, even if the internal memory of the algorithm is compromised (say, by a malicious break-in, by insider curiosity, or by fiat of the government or law). We study pan-private algorithms for basic analyses, like estimating distinct count, moments, and heavy hitter count, with fully dynamic data. We present the first known pan-private algorithms for these problems in the fully dynamic model. Our algorithms rely on sketching techniques popular in streaming: in some cases, we add suitable noise to a previously known sketch, using a novel approach of calibrating noise to the underlying problem structure and the projection matrix of the sketch; in other cases, we maintain certain statistics on sketches; in yet others, we define novel sketches. We also present the first known lower bounds explicitly for pan-privacy, showing our results to be nearly optimal for these problems. Our lower bounds are stronger than those implied by differential privacy or dynamic data streaming alone and hold even if unbounded memory and/or unbounded processing time are allowed. The lower bounds use a noisy decoding argument and exploit a connection between pan-private algorithms and data sanitization.
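
The idea of adding noise to a linear sketch over fully dynamic data can be illustrated with a toy example. The sketch below is not the authors' algorithm: the class name `NoisyLinearSketch`, its parameters, and the choice of an AMS-style random sign projection with Laplace noise are assumptions made for illustration, and `noise_scale` is a placeholder that is not calibrated to any privacy parameter, so no pan-privacy guarantee is claimed. It only shows that noise can be placed in the stored state before any data arrives, and that insertions and deletions remain simple linear updates.

```python
import random


class NoisyLinearSketch:
    """Toy sketch y = A x + noise over the universe {0, ..., universe_size - 1}.

    Hypothetical illustration only; the noise scale is NOT calibrated
    to a privacy budget.
    """

    def __init__(self, universe_size, rows, noise_scale, seed=0):
        rng = random.Random(seed)
        # Random +/-1 projection matrix A with shape rows x universe_size.
        self.signs = [[rng.choice((-1, 1)) for _ in range(universe_size)]
                      for _ in range(rows)]
        self.noise_scale = noise_scale
        # Noise is drawn before any update, so the stored state is never
        # the exact (noise-free) sketch of the data.
        self.state = [self._laplace(rng, noise_scale) for _ in range(rows)]

    @staticmethod
    def _laplace(rng, scale):
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace-distributed with scale `scale`.
        if scale == 0:
            return 0.0
        return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

    def update(self, item, delta):
        # delta = +1 for an insertion, -1 for a deletion.
        for r, row in enumerate(self.signs):
            self.state[r] += delta * row[item]

    def estimate_f2(self):
        # E[y_r^2] = F2 + Var(noise) = F2 + 2 * scale^2, so subtract the bias.
        mean_sq = sum(v * v for v in self.state) / len(self.state)
        return mean_sq - 2.0 * self.noise_scale ** 2


sketch = NoisyLinearSketch(universe_size=100, rows=64, noise_scale=1.0, seed=42)
sketch.update(7, +1)   # insert item 7
sketch.update(7, +1)   # insert item 7 again
sketch.update(13, +1)  # insert item 13
sketch.update(7, -1)   # delete one copy of item 7
print(sketch.estimate_f2())  # noisy estimate of the second moment
```

Because the state is linear in the data, a deletion exactly cancels a matching insertion, which is what makes sketches of this kind usable in the fully dynamic model.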

    Original language: English (US)
    Title of host publication: PODS'11 - Proceedings of the 30th Symposium on Principles of Database Systems
    Pages: 37-48
    Number of pages: 12
    DOI: https://doi.org/10.1145/1989284.1989290
    State: Published - Jul 15 2011
    Event: 30th Symposium on Principles of Database Systems, PODS'11 - Athens, Greece
    Duration: May 13 2011 to May 15 2011

    Publication series

    Name: Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

    Conference

    Conference: 30th Symposium on Principles of Database Systems, PODS'11
    Country: Greece
    City: Athens
    Period: 5/13/11 to 5/15/11

    Fingerprint

    Statistics
    Data storage equipment
    Decoding
    Dynamic models
    Processing

    Keywords

    • Differential privacy
    • Pan-privacy

    ASJC Scopus subject areas

    • Software
    • Information Systems
    • Hardware and Architecture

    Cite this

    Mir, D., Muthukrishnan, S., Nikolov, A., & Wright, R. N. (2011). Pan-private algorithms via statistics on sketches. In PODS'11 - Proceedings of the 30th Symposium on Principles of Database Systems (pp. 37-48). (Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems). https://doi.org/10.1145/1989284.1989290

    @inproceedings{25d81ef2b6d747b28b62e51466c36784,
    title = "Pan-private algorithms via statistics on sketches",
    keywords = "Differential privacy, Pan-privacy",
    author = "Darakhshan Mir and Shanmugavelayutham Muthukrishnan and Aleksandar Nikolov and Wright, {Rebecca N.}",
    year = "2011",
    month = "7",
    day = "15",
    doi = "10.1145/1989284.1989290",
    language = "English (US)",
    isbn = "9781450306607",
    series = "Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems",
    pages = "37--48",
    booktitle = "PODS'11 - Proceedings of the 30th Symposium on Principles of Database Systems",

    }

    TY - GEN

    T1 - Pan-private algorithms via statistics on sketches

    AU - Mir, Darakhshan

    AU - Muthukrishnan, Shanmugavelayutham

    AU - Nikolov, Aleksandar

    AU - Wright, Rebecca N.

    PY - 2011/7/15

    Y1 - 2011/7/15

    KW - Differential privacy

    KW - Pan-privacy

    UR - http://www.scopus.com/inward/record.url?scp=79960182242&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=79960182242&partnerID=8YFLogxK

    U2 - 10.1145/1989284.1989290

    DO - 10.1145/1989284.1989290

    M3 - Conference contribution

    AN - SCOPUS:79960182242

    SN - 9781450306607

    T3 - Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

    SP - 37

    EP - 48

    BT - PODS'11 - Proceedings of the 30th Symposium on Principles of Database Systems

    ER -