Vaccine: Using contextual integrity for data leakage detection

Yan Shvartzshnaider, Thomas Wies, Zvonimir Pavlinovic, Lakshminarayanan, Prateek Mittal, Ananth Balashankar, Helen Nissenbaum

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Modern enterprises rely on Data Leakage Prevention (DLP) systems to enforce privacy policies that prevent unintentional flow of sensitive information to unauthorized entities. However, these systems operate based on rule sets that are limited to syntactic analysis and therefore completely ignore the semantic relationships between participants involved in the information exchanges. For similar reasons, these systems cannot enforce complex privacy policies that require temporal reasoning about events that have previously occurred. To address these limitations, we advocate a new design methodology for DLP systems centered on the notion of Contextual Integrity (CI). We use the CI framework to abstract real-world communication exchanges into formally defined information flows where privacy policies describe sequences of admissible flows. CI allows us to decouple (1) the syntactic extraction of flows from information exchanges, and (2) the enforcement of privacy policies on these flows. We applied this approach to build VACCINE, a DLP auditing system for emails. VACCINE uses state-of-the-art techniques in natural language processing to extract flows from email text. It also provides a declarative language for describing privacy policies. These policies are automatically compiled to operational rules that the system uses for detecting data leakages. We evaluated VACCINE on the Enron email corpus and show that it improves over the state of the art both in terms of the expressivity of the policies that DLP systems can enforce as well as its precision in detecting data leakages.
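
The separation the abstract describes, flows extracted first and policies enforced on them afterwards, can be illustrated with a minimal Python sketch. It is not VACCINE's implementation or its policy language. The `Flow` fields follow the general CI vocabulary (sender, recipient, subject, attribute), and the `time` field, the `consent_before_sharing` norm, the `audit` helper, and the sample data are all hypothetical names chosen for illustration. The example norm needs temporal reasoning: a flow is admissible only if an earlier flow has already occurred.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Flow:
    """One information flow in the CI sense (field names are illustrative)."""
    sender: str
    recipient: str
    subject: str
    attribute: str
    time: int  # position of the message in the audited sequence


# A policy looks at the current flow and the earlier flows and decides
# whether the current flow is admissible.
Policy = Callable[[Flow, List[Flow]], bool]


def consent_before_sharing(flow: Flow, history: List[Flow]) -> bool:
    """Hypothetical norm: a subject's 'salary' may reach a third party only
    after the subject has sent a 'consent' flow to the sender."""
    if flow.attribute != "salary" or flow.recipient == flow.subject:
        return True  # the norm does not restrict this flow
    return any(
        past.attribute == "consent"
        and past.sender == flow.subject
        and past.recipient == flow.sender
        for past in history
    )


def audit(flows: Iterable[Flow], policy: Policy) -> List[Flow]:
    """Return the flows that the policy rejects, checked in time order."""
    history: List[Flow] = []
    violations: List[Flow] = []
    for flow in sorted(flows, key=lambda f: f.time):
        if not policy(flow, history):
            violations.append(flow)
        history.append(flow)
    return violations


# Flows as an extraction step might have produced them from three emails
# (made-up data, not taken from the Enron corpus).
flows = [
    Flow("hr", "auditor", "alice", "salary", time=1),  # no consent yet
    Flow("alice", "hr", "alice", "consent", time=2),
    Flow("hr", "auditor", "alice", "salary", time=3),  # consent on record
]
violations = audit(flows, consent_before_sharing)
print([f.time for f in violations])  # → [1]
```

Because `audit` only ever sees `Flow` values, the extraction step (here hard-coded) could be replaced without touching the policy, which is the decoupling the abstract argues for.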

Original language: English (US)
Title of host publication: The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019
Publisher: Association for Computing Machinery, Inc
Pages: 1702-1712
Number of pages: 11
ISBN (Electronic): 9781450366748
DOI: https://doi.org/10.1145/3308558.3313655
State: Published - May 13 2019
Event: 2019 World Wide Web Conference, WWW 2019 - San Francisco, United States
Duration: May 13 2019 to May 17 2019

Publication series

Name: The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019

Conference

Conference: 2019 World Wide Web Conference, WWW 2019
Country: United States
City: San Francisco
Period: 5/13/19 to 5/17/19

Keywords

  • Contextual Integrity
  • Data Leakage Detection
  • DLP
  • Privacy

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software

Cite this

Shvartzshnaider, Y., Wies, T., Pavlinovic, Z., Lakshminarayanan, Mittal, P., Balashankar, A., & Nissenbaum, H. (2019). Vaccine: Using contextual integrity for data leakage detection. In The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019 (pp. 1702-1712). (The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019). Association for Computing Machinery, Inc. https://doi.org/10.1145/3308558.3313655

Vaccine : Using contextual integrity for data leakage detection. / Shvartzshnaider, Yan; Wies, Thomas; Pavlinovic, Zvonimir; Lakshminarayanan; Mittal, Prateek; Balashankar, Ananth; Nissenbaum, Helen.

The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019. Association for Computing Machinery, Inc, 2019. p. 1702-1712 (The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019).

Shvartzshnaider, Y, Wies, T, Pavlinovic, Z, Lakshminarayanan, Mittal, P, Balashankar, A & Nissenbaum, H 2019, Vaccine: Using contextual integrity for data leakage detection. in The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019. The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019, Association for Computing Machinery, Inc, pp. 1702-1712, 2019 World Wide Web Conference, WWW 2019, San Francisco, United States, 5/13/19. https://doi.org/10.1145/3308558.3313655

Shvartzshnaider Y, Wies T, Pavlinovic Z, Lakshminarayanan, Mittal P, Balashankar A et al. Vaccine: Using contextual integrity for data leakage detection. In The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019. Association for Computing Machinery, Inc. 2019. p. 1702-1712. (The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019). https://doi.org/10.1145/3308558.3313655

Shvartzshnaider, Yan ; Wies, Thomas ; Pavlinovic, Zvonimir ; Lakshminarayanan ; Mittal, Prateek ; Balashankar, Ananth ; Nissenbaum, Helen. / Vaccine : Using contextual integrity for data leakage detection. The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019. Association for Computing Machinery, Inc, 2019. pp. 1702-1712 (The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019).

@inproceedings{2c28fc347e604f689e50c015926b19ef,
title = "Vaccine: Using contextual integrity for data leakage detection",
abstract = "Modern enterprises rely on Data Leakage Prevention (DLP) systems to enforce privacy policies that prevent unintentional flow of sensitive information to unauthorized entities. However, these systems operate based on rule sets that are limited to syntactic analysis and therefore completely ignore the semantic relationships between participants involved in the information exchanges. For similar reasons, these systems cannot enforce complex privacy policies that require temporal reasoning about events that have previously occurred. To address these limitations, we advocate a new design methodology for DLP systems centered on the notion of Contextual Integrity (CI). We use the CI framework to abstract real-world communication exchanges into formally defined information flows where privacy policies describe sequences of admissible flows. CI allows us to decouple (1) the syntactic extraction of flows from information exchanges, and (2) the enforcement of privacy policies on these flows. We applied this approach to build VACCINE, a DLP auditing system for emails. VACCINE uses state-of-the-art techniques in natural language processing to extract flows from email text. It also provides a declarative language for describing privacy policies. These policies are automatically compiled to operational rules that the system uses for detecting data leakages. We evaluated VACCINE on the Enron email corpus and show that it improves over the state of the art both in terms of the expressivity of the policies that DLP systems can enforce as well as its precision in detecting data leakages.",
keywords = "Contextual Integrity, Data Leakage Detection, DLP, Privacy",
author = "Yan Shvartzshnaider and Thomas Wies and Zvonimir Pavlinovic and Lakshminarayanan and Prateek Mittal and Ananth Balashankar and Helen Nissenbaum",
year = "2019",
month = "5",
day = "13",
doi = "10.1145/3308558.3313655",
language = "English (US)",
series = "The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019",
publisher = "Association for Computing Machinery, Inc",
pages = "1702--1712",
booktitle = "The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019",

}

TY - GEN

T1 - Vaccine

T2 - Using contextual integrity for data leakage detection

AU - Shvartzshnaider, Yan

AU - Wies, Thomas

AU - Pavlinovic, Zvonimir

AU - Lakshminarayanan,

AU - Mittal, Prateek

AU - Balashankar, Ananth

AU - Nissenbaum, Helen

PY - 2019/5/13

Y1 - 2019/5/13

AB - Modern enterprises rely on Data Leakage Prevention (DLP) systems to enforce privacy policies that prevent unintentional flow of sensitive information to unauthorized entities. However, these systems operate based on rule sets that are limited to syntactic analysis and therefore completely ignore the semantic relationships between participants involved in the information exchanges. For similar reasons, these systems cannot enforce complex privacy policies that require temporal reasoning about events that have previously occurred. To address these limitations, we advocate a new design methodology for DLP systems centered on the notion of Contextual Integrity (CI). We use the CI framework to abstract real-world communication exchanges into formally defined information flows where privacy policies describe sequences of admissible flows. CI allows us to decouple (1) the syntactic extraction of flows from information exchanges, and (2) the enforcement of privacy policies on these flows. We applied this approach to build VACCINE, a DLP auditing system for emails. VACCINE uses state-of-the-art techniques in natural language processing to extract flows from email text. It also provides a declarative language for describing privacy policies. These policies are automatically compiled to operational rules that the system uses for detecting data leakages. We evaluated VACCINE on the Enron email corpus and show that it improves over the state of the art both in terms of the expressivity of the policies that DLP systems can enforce as well as its precision in detecting data leakages.

KW - Contextual Integrity

KW - Data Leakage Detection

KW - DLP

KW - Privacy

UR - http://www.scopus.com/inward/record.url?scp=85066901764&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85066901764&partnerID=8YFLogxK

U2 - 10.1145/3308558.3313655

DO - 10.1145/3308558.3313655

M3 - Conference contribution

T3 - The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019

SP - 1702

EP - 1712

BT - The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019

PB - Association for Computing Machinery, Inc

ER -