Fast Generalized Subset Scan for anomalous pattern detection

Edward McFowland, Skyler Speakman, Daniel Neill

Research output: Contribution to journal › Article

Abstract

We propose Fast Generalized Subset Scan (FGSS), a new method for detecting anomalous patterns in general categorical data sets. We frame the pattern detection problem as a search over subsets of data records and attributes, maximizing a nonparametric scan statistic over all such subsets. We prove that the nonparametric scan statistics possess a novel property that allows for efficient optimization over the exponentially many subsets of the data without an exhaustive search, enabling FGSS to scale to massive and high-dimensional data sets. We evaluate the performance of FGSS in three real-world application domains (customs monitoring, disease surveillance, and network intrusion detection), and demonstrate that FGSS can successfully detect and characterize relevant patterns in each domain. As compared to three other recently proposed detection algorithms, FGSS substantially decreased run time and improved detection power for massive multivariate data sets.
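
The subset search described in the abstract can be illustrated with a minimal sketch. The abstract does not name the scan statistic or the property that avoids exhaustive search, so everything specific here is an assumption: the Higher Criticism statistic stands in for "a nonparametric scan statistic", each record is assumed to carry one empirical p-value per attribute, and the function names (`higher_criticism`, `best_record_subset`, `brute_force`) are hypothetical. For a fixed significance level `alpha` and a fixed set of attributes, records are ranked by how many of their p-values fall at or below `alpha`, and only the prefixes of that ranking are scored. The search over attribute subsets and over values of `alpha` is omitted.

```python
import math
from itertools import combinations


def higher_criticism(n_alpha, n, alpha):
    """Excess of significant p-values over the n * alpha expected under the null."""
    if n == 0:
        return 0.0
    return (n_alpha - n * alpha) / math.sqrt(n * alpha * (1.0 - alpha))


def best_record_subset(pvalues, alpha):
    """Score only the prefixes of the priority ordering (n subsets, not 2^n).

    pvalues: list of rows, one row per record, one p-value per attribute.
    Returns (best score, sorted indices of the records in the best subset).
    """
    m = len(pvalues[0])
    # Priority of a record: number of its p-values significant at level alpha.
    counts = [sum(p <= alpha for p in row) for row in pvalues]
    order = sorted(range(len(pvalues)), key=lambda i: counts[i], reverse=True)

    best_score, best_k = 0.0, 0  # the empty subset scores 0
    n_alpha = 0
    for k, i in enumerate(order, start=1):
        n_alpha += counts[i]
        score = higher_criticism(n_alpha, k * m, alpha)
        if score > best_score:
            best_score, best_k = score, k
    return best_score, sorted(order[:best_k])


def brute_force(pvalues, alpha):
    """Exhaustive search over every record subset, for comparison only."""
    m = len(pvalues[0])
    best = 0.0
    for k in range(1, len(pvalues) + 1):
        for subset in combinations(range(len(pvalues)), k):
            n_alpha = sum(p <= alpha for i in subset for p in pvalues[i])
            best = max(best, higher_criticism(n_alpha, k * m, alpha))
    return best


# Hypothetical p-values: 5 records, 3 attributes each.
pvalues = [
    [0.01, 0.03, 0.60],
    [0.02, 0.04, 0.01],
    [0.50, 0.70, 0.90],
    [0.04, 0.80, 0.30],
    [0.45, 0.55, 0.65],
]
score, subset = best_record_subset(pvalues, alpha=0.05)
print(subset, round(score, 3))
print(math.isclose(score, brute_force(pvalues, alpha=0.05)))
```

In this sketch the prefix search is exact because every record contributes the same number of p-values: among subsets of a given size, the one with the most significant p-values scores highest, and that subset is always a prefix of the ranking. The exhaustive version is included only to check the two agree on small inputs.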

Original language: English (US)
Pages (from-to): 1533-1561
Number of pages: 29
Journal: Journal of Machine Learning Research
Volume: 14
State: Published - Jun 1 2013

Keywords

  • Anomaly detection
  • Bayesian networks
  • Knowledge discovery
  • Pattern detection
  • Scan statistics

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Software
  • Statistics and Probability
  • Artificial Intelligence

Cite this

Fast Generalized Subset Scan for anomalous pattern detection. / McFowland, Edward; Speakman, Skyler; Neill, Daniel.

In: Journal of Machine Learning Research, Vol. 14, 01.06.2013, p. 1533-1561.

@article{8b180ddae88d456e8086b3bfb7741b6f,
title = "Fast Generalized Subset Scan for anomalous pattern detection",
abstract = "We propose Fast Generalized Subset Scan (FGSS), a new method for detecting anomalous patterns in general categorical data sets. We frame the pattern detection problem as a search over subsets of data records and attributes, maximizing a nonparametric scan statistic over all such subsets. We prove that the nonparametric scan statistics possess a novel property that allows for efficient optimization over the exponentially many subsets of the data without an exhaustive search, enabling FGSS to scale to massive and high-dimensional data sets. We evaluate the performance of FGSS in three real-world application domains (customs monitoring, disease surveillance, and network intrusion detection), and demonstrate that FGSS can successfully detect and characterize relevant patterns in each domain. As compared to three other recently proposed detection algorithms, FGSS substantially decreased run time and improved detection power for massive multivariate data sets.",
keywords = "Anomaly detection, Bayesian networks, Knowledge discovery, Pattern detection, Scan statistics",
author = "Edward McFowland and Skyler Speakman and Daniel Neill",
year = "2013",
month = "6",
day = "1",
language = "English (US)",
volume = "14",
pages = "1533--1561",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Microtome Publishing",

}

TY - JOUR

T1 - Fast Generalized Subset Scan for anomalous pattern detection

AU - McFowland, Edward

AU - Speakman, Skyler

AU - Neill, Daniel

PY - 2013/6/1

Y1 - 2013/6/1

AB - We propose Fast Generalized Subset Scan (FGSS), a new method for detecting anomalous patterns in general categorical data sets. We frame the pattern detection problem as a search over subsets of data records and attributes, maximizing a nonparametric scan statistic over all such subsets. We prove that the nonparametric scan statistics possess a novel property that allows for efficient optimization over the exponentially many subsets of the data without an exhaustive search, enabling FGSS to scale to massive and high-dimensional data sets. We evaluate the performance of FGSS in three real-world application domains (customs monitoring, disease surveillance, and network intrusion detection), and demonstrate that FGSS can successfully detect and characterize relevant patterns in each domain. As compared to three other recently proposed detection algorithms, FGSS substantially decreased run time and improved detection power for massive multivariate data sets.

KW - Anomaly detection

KW - Bayesian networks

KW - Knowledge discovery

KW - Pattern detection

KW - Scan statistics

UR - http://www.scopus.com/inward/record.url?scp=84880176997&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84880176997&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84880176997

VL - 14

SP - 1533

EP - 1561

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

SN - 1532-4435

ER -