Fast subset scan for multivariate event detection

Daniel Neill, Edward McFowland, Huanian Zheng

Research output: Contribution to journal (Article)

Abstract

We present new subset scan methods for multivariate event detection in massive space-time datasets. We extend the recently proposed 'fast subset scan' framework from univariate to multivariate data, enabling computationally efficient detection of irregular space-time clusters even when the numbers of spatial locations and data streams are large. For two variants of the multivariate subset scan, we demonstrate that the scan statistic can be efficiently optimized over proximity-constrained subsets of locations and over all subsets of the monitored data streams, enabling timely detection of emerging events and accurate characterization of the affected locations and streams. Using our new fast search algorithms, we perform an empirical comparison of the Subset Aggregation and Kulldorff multivariate subset scans on synthetic data and real-world disease surveillance tasks, demonstrating tradeoffs between the detection and characterization performance of the two methods.
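The abstract does not spell out the search procedure, so what follows is only a rough sketch under stated assumptions. It assumes the expectation-based Poisson (EBP) score and the linear-time subset scanning (LTSS) property that the univariate fast subset scan builds on: sort elements by the priority c_i / b_i, and the highest-scoring subset is one of the top-k prefixes. Several further pieces are our own assumptions, not necessarily the procedure used in the paper:

- For the Subset Aggregation variant, the sketch alternates LTSS over locations (streams fixed) with LTSS over streams (locations fixed).
- The proximity constraint is modelled as each center plus its k-1 nearest neighbours.
- The Kulldorff variant is modelled as a sum of per-stream scores, so for fixed locations every stream with a positive score is kept.

All function names, the `counts[m][i]` / `baselines[m][i]` layout, the `coords` input and the `max_iter` cap are hypothetical.

```python
import math


def ebp_score(c, b):
    """Expectation-based Poisson log-likelihood ratio for total count c and total baseline b > 0."""
    if c <= b:
        return 0.0
    return c * math.log(c / b) + b - c


def ltss_maximize(counts, baselines):
    """Maximize ebp_score over all subsets via LTSS: sort by priority c_i / b_i
    (baselines must be positive) and score only the top-k prefixes."""
    order = sorted(range(len(counts)), key=lambda i: counts[i] / baselines[i], reverse=True)
    best, best_k, c_sum, b_sum = 0.0, 0, 0.0, 0.0
    for k, i in enumerate(order, start=1):
        c_sum += counts[i]
        b_sum += baselines[i]
        score = ebp_score(c_sum, b_sum)
        if score > best:
            best, best_k = score, k
    return best, sorted(order[:best_k])  # positions in the input lists


def subset_aggregation_scan(counts, baselines, locations, max_iter=20):
    """Hypothetical Subset Aggregation search: counts[m][i], baselines[m][i] for stream m, location i.
    Alternates LTSS over `locations` (streams fixed) and LTSS over all streams (locations fixed);
    neither step can lower the score, so this reaches a local optimum, not necessarily the global one."""
    n_streams = len(counts)
    streams, locs, best = list(range(n_streams)), [], 0.0
    for _ in range(max_iter):
        # Aggregate each candidate location over the current streams, then scan locations.
        c = [sum(counts[m][i] for m in streams) for i in locations]
        b = [sum(baselines[m][i] for m in streams) for i in locations]
        _, idx = ltss_maximize(c, b)
        new_locs = [locations[j] for j in idx]
        if not new_locs:
            break
        # Aggregate each stream over the chosen locations, then scan streams.
        c = [sum(counts[m][i] for i in new_locs) for m in range(n_streams)]
        b = [sum(baselines[m][i] for i in new_locs) for m in range(n_streams)]
        score, new_streams = ltss_maximize(c, b)
        if (new_locs, new_streams) == (locs, streams):
            break
        locs, streams, best = new_locs, new_streams, score
    return best, locs, (streams if locs else [])


def proximity_constrained_scan(counts, baselines, coords, k):
    """Assumed proximity constraint: search only within each center's k nearest locations (itself included)."""
    best = (0.0, [], [])
    for center in range(len(coords)):
        neighbours = sorted(range(len(coords)), key=lambda j: math.dist(coords[j], coords[center]))[:k]
        result = subset_aggregation_scan(counts, baselines, sorted(neighbours))
        if result[0] > best[0]:
            best = result
    return best


def kulldorff_streams(counts, baselines, locs):
    """Kulldorff-style variant, assumed here to sum per-stream EBP scores: for fixed
    locations, every stream whose own score is positive is included."""
    per_stream = [ebp_score(sum(counts[m][i] for i in locs), sum(baselines[m][i] for i in locs))
                  for m in range(len(counts))]
    return sum(per_stream), [m for m, s in enumerate(per_stream) if s > 0]
```

Consider two streams with counts [[12, 11, 4, 5], [5, 4, 6, 5]] and every baseline equal to 5. For this input, `subset_aggregation_scan(counts, baselines, [0, 1, 2, 3])` returns locations [0, 1] and stream [0], with a score of about 6.157, because only the first stream is elevated at the first two locations. Alternating the two LTSS searches keeps each step exact while avoiding a joint search over all location-stream pairs. The trade-off is that the result depends on the starting stream set.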

Original language: English (US)
Pages (from-to): 2185-2208
Number of pages: 24
Journal: Statistics in Medicine
Volume: 32
Issue number: 13
DOI: 10.1002/sim.5675
State: Published - Jun 15 2013

Fingerprint

Event Detection
Subset
Data Streams
Space-time
Scan Statistic
Multivariate Data
Synthetic Data
Surveillance
Fast Algorithm
Proximity
Search Algorithm
Univariate
Irregular
Aggregation
Trade-offs
Datasets
Demonstrate

Keywords

  • Algorithms
  • Disease surveillance
  • Event detection
  • Scan statistics
  • Spatial scan

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability

Cite this

Fast subset scan for multivariate event detection. / Neill, Daniel; McFowland, Edward; Zheng, Huanian.

In: Statistics in Medicine, Vol. 32, No. 13, 15.06.2013, p. 2185-2208.


Neill, Daniel; McFowland, Edward; Zheng, Huanian. / Fast subset scan for multivariate event detection. In: Statistics in Medicine. 2013; Vol. 32, No. 13. pp. 2185-2208.

@article{ed18d6c84f4b41ea82356f9113fa71d6,
title = "Fast subset scan for multivariate event detection",
abstract = "We present new subset scan methods for multivariate event detection in massive space-time datasets. We extend the recently proposed 'fast subset scan' framework from univariate to multivariate data, enabling computationally efficient detection of irregular space-time clusters even when the numbers of spatial locations and data streams are large. For two variants of the multivariate subset scan, we demonstrate that the scan statistic can be efficiently optimized over proximity-constrained subsets of locations and over all subsets of the monitored data streams, enabling timely detection of emerging events and accurate characterization of the affected locations and streams. Using our new fast search algorithms, we perform an empirical comparison of the Subset Aggregation and Kulldorff multivariate subset scans on synthetic data and real-world disease surveillance tasks, demonstrating tradeoffs between the detection and characterization performance of the two methods.",
keywords = "Algorithms, Disease surveillance, Event detection, Scan statistics, Spatial scan",
author = "Daniel Neill and Edward McFowland and Huanian Zheng",
year = "2013",
month = "6",
day = "15",
doi = "10.1002/sim.5675",
language = "English (US)",
volume = "32",
pages = "2185--2208",
journal = "Statistics in Medicine",
issn = "0277-6715",
publisher = "John Wiley and Sons Ltd",
number = "13",

}

TY  - JOUR
T1  - Fast subset scan for multivariate event detection
AU  - Neill, Daniel
AU  - McFowland, Edward
AU  - Zheng, Huanian
PY  - 2013/6/15
Y1  - 2013/6/15
AB  - We present new subset scan methods for multivariate event detection in massive space-time datasets. We extend the recently proposed 'fast subset scan' framework from univariate to multivariate data, enabling computationally efficient detection of irregular space-time clusters even when the numbers of spatial locations and data streams are large. For two variants of the multivariate subset scan, we demonstrate that the scan statistic can be efficiently optimized over proximity-constrained subsets of locations and over all subsets of the monitored data streams, enabling timely detection of emerging events and accurate characterization of the affected locations and streams. Using our new fast search algorithms, we perform an empirical comparison of the Subset Aggregation and Kulldorff multivariate subset scans on synthetic data and real-world disease surveillance tasks, demonstrating tradeoffs between the detection and characterization performance of the two methods.
KW  - Algorithms
KW  - Disease surveillance
KW  - Event detection
KW  - Scan statistics
KW  - Spatial scan
UR  - http://www.scopus.com/inward/record.url?scp=84877634608&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84877634608&partnerID=8YFLogxK
U2  - 10.1002/sim.5675
DO  - 10.1002/sim.5675
M3  - Article
VL  - 32
SP  - 2185
EP  - 2208
JO  - Statistics in Medicine
JF  - Statistics in Medicine
SN  - 0277-6715
IS  - 13
ER  -