Anomaly pattern detection in categorical datasets

Kaustav Das, Jeff Schneider, Daniel Neill

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We propose a new method for detecting patterns of anomalies in categorical datasets. We assume that anomalies are generated by some underlying process which affects only a particular subset of the data. Our method consists of two steps: we first use a "local anomaly detector" to identify individual records with anomalous attribute values, and then detect patterns where the number of anomalous records is higher than expected. Given the set of anomalies flagged by the local anomaly detector, we search over all subsets of the data defined by any set of fixed values of a subset of the attributes, in order to detect self-similar patterns of anomalies. We wish to detect any such subset of the test data which displays a significant increase in anomalous activity as compared to the normal behavior of the system (as indicated by the training data). We perform significance testing to determine if the number of anomalies in any subset of the test data is significantly higher than expected, and propose an efficient algorithm to perform this test over all such subsets of the data. We show that this algorithm is able to accurately detect anomalous patterns in real-world hospital, container shipping and network intrusion data.
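The two-step procedure described in the abstract can be illustrated with a small brute-force sketch. It is not the authors' algorithm. The local detector used here is a deliberately simple stand-in: it flags a record if any attribute value is rare in the training data. The significance test is a one-sided binomial test against a smoothed per-subset anomaly rate from training data. The `max_arity` limit on how many attributes are fixed, all function names, and the hospital/symptom records are hypothetical. The sketch does not reproduce the paper's efficient search, and it applies no correction for testing many subsets.

```python
from collections import Counter
from itertools import combinations
from math import comb


def fit_value_frequencies(train, alpha=1.0):
    """Return freq(j, value): Laplace-smoothed frequency of `value` for attribute j in training data."""
    n = len(train)
    counts = [Counter(r[j] for r in train) for j in range(len(train[0]))]

    def freq(j, value):
        # One extra pseudo-category reserves probability mass for values never seen in training.
        return (counts[j][value] + alpha) / (n + alpha * (len(counts[j]) + 1))

    return freq


def local_flags(records, freq, threshold):
    """Hypothetical local detector: flag a record if any of its attribute values is rarer than `threshold`."""
    return [min(freq(j, v) for j, v in enumerate(r)) < threshold for r in records]


def binomial_tail(k, n, p):
    """One-sided P(X >= k) for X ~ Binomial(n, p): evidence of more anomalies than expected."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def matches(record, attrs, values):
    """True if `record` has the fixed `values` on attribute indices `attrs`."""
    return all(record[j] == v for j, v in zip(attrs, values))


def count_flagged(records, flags, attrs, values):
    """Return (number of flagged records, number of records) inside the subset."""
    k = n = 0
    for r, f in zip(records, flags):
        if matches(r, attrs, values):
            n += 1
            k += f  # bool counts as 0 or 1
    return k, n


def search_patterns(train, test, threshold=0.05, max_arity=2, alpha_level=0.01):
    """Scan subsets fixing 1..max_arity attributes; return significant ones, most significant first."""
    freq = fit_value_frequencies(train)
    train_flags = local_flags(train, freq, threshold)
    test_flags = local_flags(test, freq, threshold)
    results = []
    for arity in range(1, max_arity + 1):
        for attrs in combinations(range(len(train[0])), arity):
            # Each value combination observed in the test data on `attrs` defines one candidate subset.
            for values in {tuple(r[j] for j in attrs) for r in test}:
                k, n = count_flagged(test, test_flags, attrs, values)
                k0, n0 = count_flagged(train, train_flags, attrs, values)
                p0 = (k0 + 1) / (n0 + 2)  # smoothed baseline anomaly rate for this subset
                p_value = binomial_tail(k, n, p0)
                if p_value < alpha_level:
                    results.append((dict(zip(attrs, values)), k, n, p_value))
    return sorted(results, key=lambda t: t[3])


# Hypothetical toy data: (hospital, symptom) records.
train = ([("H1", "cough")] * 30 + [("H2", "cough")] * 30 + [("H3", "cough")] * 30
         + [("H1", "fever")] * 4 + [("H2", "fever")] * 3 + [("H3", "fever")] * 3)
test = ([("H1", "cough")] * 30 + [("H2", "cough")] * 30 + [("H3", "cough")] * 10
        + [("H3", "rash")] * 15 + [("H1", "fever")] * 5)

results = search_patterns(train, test)
for pattern, k, n, p in results:
    print(pattern, f"{k}/{n} flagged", f"p={p:.3g}")
```

On this toy data, the detector flags only the 15 "rash" records. The scan reports three subsets: hospital H3, symptom rash, and their conjunction. Hospital H3 ranks first because its training baseline is well estimated. The unseen subsets fall back to the uninformative smoothed rate of 0.5.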

Original language: English (US)
Title of host publication: KDD 2008 - Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages: 169-176
Number of pages: 8
DOI: 10.1145/1401890.1401915
State: Published - Dec 1 2008
Event: 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008 - Las Vegas, NV, United States
Duration: Aug 24 2008 to Aug 27 2008

Keywords

  • Anomaly Detection
  • Machine Learning
  • Pattern Detection

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Das, K., Schneider, J., & Neill, D. (2008). Anomaly pattern detection in categorical datasets. In KDD 2008 - Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 169-176). https://doi.org/10.1145/1401890.1401915

Anomaly pattern detection in categorical datasets. / Das, Kaustav; Schneider, Jeff; Neill, Daniel.

KDD 2008 - Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008. p. 169-176.

Das, K, Schneider, J & Neill, D 2008, Anomaly pattern detection in categorical datasets. in KDD 2008 - Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 169-176, 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008, Las Vegas, NV, United States, 8/24/08. https://doi.org/10.1145/1401890.1401915
Das K, Schneider J, Neill D. Anomaly pattern detection in categorical datasets. In KDD 2008 - Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008. p. 169-176. https://doi.org/10.1145/1401890.1401915
Das, Kaustav ; Schneider, Jeff ; Neill, Daniel. / Anomaly pattern detection in categorical datasets. KDD 2008 - Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008. pp. 169-176
@inproceedings{ff29657257fe4c618f1d783f31ce5edb,
title = "Anomaly pattern detection in categorical datasets",
keywords = "Anomaly Detection, Machine Learning, Pattern Detection",
author = "Kaustav Das and Jeff Schneider and Daniel Neill",
year = "2008",
month = "12",
day = "1",
doi = "10.1145/1401890.1401915",
language = "English (US)",
isbn = "9781605581934",
pages = "169--176",
booktitle = "KDD 2008 - Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}

TY  - GEN
T1  - Anomaly pattern detection in categorical datasets
AU  - Das, Kaustav
AU  - Schneider, Jeff
AU  - Neill, Daniel
PY  - 2008/12/1
Y1  - 2008/12/1
KW  - Anomaly Detection
KW  - Machine Learning
KW  - Pattern Detection
UR  - http://www.scopus.com/inward/record.url?scp=65449143380&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=65449143380&partnerID=8YFLogxK
U2  - 10.1145/1401890.1401915
DO  - 10.1145/1401890.1401915
M3  - Conference contribution
SN  - 9781605581934
SP  - 169
EP  - 176
BT  - KDD 2008 - Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
ER  - 