Fast elastic peak detection for mass spectrometry data mining

Xin Zhang, Dennis E. Shasha, Yang Song, Jason T L Wang

Research output: Contribution to journal (Article)

Abstract

We study a data mining problem concerning elastic peak detection in 2D liquid chromatography-mass spectrometry (LC-MS) data. These data can be modeled as time series, in which the X-axis represents time points and the Y-axis represents intensity values. A peak occurs in a set of 2D LC-MS data when the sum of the intensity values in a sliding time window exceeds a user-determined threshold. The elastic peak detection problem is to locate all peaks across multiple window sizes of interest in the data set. We propose a new data structure, called a Shifted Aggregation Tree or AggTree for short, and use it to find the peaks. Our method, called PeakID, solves the elastic peak detection problem in 2D LC-MS data, yielding neither false positives nor false negatives. The method works by first constructing an AggTree in a bottom-up manner from the given data set, and then searching the AggTree for the peaks in a top-down manner. We describe a state-space algorithm for finding the topology and structure of an efficient AggTree to be used by PeakID. Our experimental results demonstrate the superiority of the proposed method over other methods on both synthetic and real-world data.
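
To make the problem statement concrete, the following is a minimal brute-force sketch of elastic peak detection as defined in the abstract: it reports every sliding window, over several window sizes, whose intensity sum exceeds a threshold. It is not the AggTree or PeakID method, which the abstract does not specify in detail; it is only a naive baseline built on prefix sums. The function name `elastic_peaks_naive`, the per-window-size `thresholds` mapping, and the sample intensities are assumptions made for illustration.

```python
from itertools import accumulate


def elastic_peaks_naive(intensities, thresholds):
    """Return (window_size, start_index, window_sum) for each window over threshold.

    `thresholds` maps each window size of interest to its threshold.
    This interface is hypothetical and is not taken from the paper.
    """
    # prefix[i] is the sum of intensities[0:i], so a window sum is one subtraction.
    prefix = [0, *accumulate(intensities)]
    n = len(intensities)
    peaks = []
    for w in sorted(thresholds):
        limit = thresholds[w]
        # range() is empty when the window is longer than the series.
        for start in range(n - w + 1):
            total = prefix[start + w] - prefix[start]
            if total > limit:  # strict comparison: the sum must exceed the threshold
                peaks.append((w, start, total))
    return peaks


# Made-up intensity values indexed by time point.
intensities = [1, 2, 9, 8, 1, 0, 7, 1]
peaks = elastic_peaks_naive(intensities, {2: 15, 3: 18})
print(peaks)  # [(2, 2, 17), (3, 1, 19)]
```

This baseline examines every window of every size, so its cost grows with the series length times the number of window sizes. A structure such as the AggTree described above is intended to avoid that exhaustive scan while still reporting exactly the same peaks.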

Original language: English (US)
Article number: 5645627
Pages (from-to): 634-648
Number of pages: 15
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 24
Issue number: 4
DOI: 10.1109/TKDE.2010.238
State: Published - 2012

Fingerprint

Liquid chromatography
Mass spectrometry
Data mining
Data structures
Time series
Agglomeration
Topology

Keywords

  • algorithms and data structures
  • bioinformatics
  • computational proteomics
  • Knowledge discovery from LC-MS data
  • time series data mining

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Information Systems
  • Computer Science Applications

Cite this

Fast elastic peak detection for mass spectrometry data mining. / Zhang, Xin; Shasha, Dennis E.; Song, Yang; Wang, Jason T L.

In: IEEE Transactions on Knowledge and Data Engineering, Vol. 24, No. 4, 5645627, 2012, p. 634-648.

@article{0ca2244e4da14c78a0e70ddd3d97656c,
title = "Fast elastic peak detection for mass spectrometry data mining",
keywords = "algorithms and data structures, bioinformatics, computational proteomics, Knowledge discovery from LC-MS data, time series data mining",
author = "Xin Zhang and Shasha, {Dennis E.} and Yang Song and Wang, {Jason T L}",
year = "2012",
doi = "10.1109/TKDE.2010.238",
language = "English (US)",
volume = "24",
pages = "634--648",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "4",

}

TY - JOUR

T1 - Fast elastic peak detection for mass spectrometry data mining

AU - Zhang, Xin

AU - Shasha, Dennis E.

AU - Song, Yang

AU - Wang, Jason T L

PY - 2012

Y1 - 2012

KW - algorithms and data structures

KW - bioinformatics

KW - computational proteomics

KW - Knowledge discovery from LC-MS data

KW - time series data mining

UR - http://www.scopus.com/inward/record.url?scp=84863259465&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863259465&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2010.238

DO - 10.1109/TKDE.2010.238

M3 - Article

VL - 24

SP - 634

EP - 648

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

IS - 4

M1 - 5645627

ER -