Density estimation with adaptive sparse grids for large data sets

Benjamin Peherstorfer, Dirk Pflüger, Hans Joachim Bungartz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Nonparametric density estimation is a fundamental problem of statistics and data mining. Even though kernel density estimation is the most widely used method, its performance depends strongly on the choice of the kernel bandwidth, and it can become computationally expensive for large data sets. We present an adaptive sparse-grid-based density estimation method which discretizes the estimated density function on basis functions centered at grid points rather than on kernels centered at the data points. Thus, the cost of evaluating the estimated density function is independent of the number of data points. We give details on how to estimate density functions on sparse grids and develop a cross-validation technique for the parameter selection. We show numerical results to confirm that our sparse-grid-based method is well suited for large data sets, and, finally, employ our method for the classification of astronomical objects to demonstrate that it is competitive with current kernel-based density estimation approaches with respect to classification accuracy and runtime.
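The central idea of the abstract, representing the density on basis functions attached to grid points so that evaluation cost depends on the grid size rather than on the number of data points, can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: it assumes a uniform full grid of hat functions on [0, 1] in place of an adaptive sparse grid, and it assumes the coefficients are obtained by an L2 projection of the empirical distribution with an optional ridge term. The names `hat_basis`, `fit_grid_density`, `evaluate_density`, `n_grid`, and `lam` are hypothetical.

```python
import numpy as np


def hat_basis(x, centers, h):
    """Evaluate piecewise-linear hat functions; result has shape (len(x), len(centers))."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.clip(1.0 - np.abs(x[:, None] - centers[None, :]) / h, 0.0, None)


def fit_grid_density(samples, n_grid=33, lam=0.0):
    """Fit grid coefficients on [0, 1] (hypothetical helper, illustration only).

    Solves (R + lam * I) alpha = b, where R is the mass matrix of the hat
    functions and b_i is the mean of the i-th hat function over the samples.
    The ridge term lam * I is an assumption of this sketch.
    """
    centers = np.linspace(0.0, 1.0, n_grid)
    h = centers[1] - centers[0]

    # Mass matrix R_ij = integral over [0, 1] of phi_i * phi_j.
    R = np.zeros((n_grid, n_grid))
    idx = np.arange(n_grid)
    R[idx, idx] = 2.0 * h / 3.0          # interior diagonal entries
    R[0, 0] = R[-1, -1] = h / 3.0        # boundary hats are half hats
    R[idx[:-1], idx[1:]] = h / 6.0       # superdiagonal
    R[idx[1:], idx[:-1]] = h / 6.0       # subdiagonal

    # The data enter only here, once, while building the right-hand side.
    b = hat_basis(samples, centers, h).mean(axis=0)

    alpha = np.linalg.solve(R + lam * np.eye(n_grid), b)
    return centers, h, alpha


def evaluate_density(x, centers, h, alpha):
    """Evaluate the fitted density; the cost depends on len(centers), not on the data size."""
    return hat_basis(x, centers, h) @ alpha
```

After fitting, the samples are no longer needed: the estimate is fully described by `n_grid` coefficients, whereas a kernel density estimate has to visit every data point on each evaluation.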

Original language: English (US)
Title of host publication: SIAM International Conference on Data Mining 2014, SDM 2014
Editors: Mohammed J. Zaki, Arindam Banerjee, Srinivasan Parthasarathy, Pang Ning-Tan, Zoran Obradovic, Chandrika Kamath
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 443-451
Number of pages: 9
Volume: 1
ISBN (Electronic): 9781510811515
DOI: https://doi.org/10.1137/1.9781611973440.51
State: Published - Jan 1 2014
Event: 14th SIAM International Conference on Data Mining, SDM 2014 - Philadelphia, United States
Duration: Apr 24 2014 to Apr 26 2014

Other

Other: 14th SIAM International Conference on Data Mining, SDM 2014
Country: United States
City: Philadelphia
Period: 4/24/14 to 4/26/14

Fingerprint

Probability density function
Data mining
Statistics
Bandwidth
Costs

ASJC Scopus subject areas

  • Computer Science Applications
  • Software

Cite this

Peherstorfer, B., Pflüger, D., & Bungartz, H. J. (2014). Density estimation with adaptive sparse grids for large data sets. In M. J. Zaki, A. Banerjee, S. Parthasarathy, P. Ning-Tan, Z. Obradovic, & C. Kamath (Eds.), SIAM International Conference on Data Mining 2014, SDM 2014 (Vol. 1, pp. 443-451). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611973440.51

Density estimation with adaptive sparse grids for large data sets. / Peherstorfer, Benjamin; Pflüger, Dirk; Bungartz, Hans Joachim.

SIAM International Conference on Data Mining 2014, SDM 2014. ed. / Mohammed J. Zaki; Arindam Banerjee; Srinivasan Parthasarathy; Pang Ning-Tan; Zoran Obradovic; Chandrika Kamath. Vol. 1 Society for Industrial and Applied Mathematics Publications, 2014. p. 443-451.

Peherstorfer, B, Pflüger, D & Bungartz, HJ 2014, Density estimation with adaptive sparse grids for large data sets. in MJ Zaki, A Banerjee, S Parthasarathy, P Ning-Tan, Z Obradovic & C Kamath (eds), SIAM International Conference on Data Mining 2014, SDM 2014. vol. 1, Society for Industrial and Applied Mathematics Publications, pp. 443-451, 14th SIAM International Conference on Data Mining, SDM 2014, Philadelphia, United States, 4/24/14. https://doi.org/10.1137/1.9781611973440.51
Peherstorfer B, Pflüger D, Bungartz HJ. Density estimation with adaptive sparse grids for large data sets. In Zaki MJ, Banerjee A, Parthasarathy S, Ning-Tan P, Obradovic Z, Kamath C, editors, SIAM International Conference on Data Mining 2014, SDM 2014. Vol. 1. Society for Industrial and Applied Mathematics Publications. 2014. p. 443-451 https://doi.org/10.1137/1.9781611973440.51
Peherstorfer, Benjamin ; Pflüger, Dirk ; Bungartz, Hans Joachim. / Density estimation with adaptive sparse grids for large data sets. SIAM International Conference on Data Mining 2014, SDM 2014. editor / Mohammed J. Zaki ; Arindam Banerjee ; Srinivasan Parthasarathy ; Pang Ning-Tan ; Zoran Obradovic ; Chandrika Kamath. Vol. 1 Society for Industrial and Applied Mathematics Publications, 2014. pp. 443-451
@inproceedings{52437209871d4b2e84aed04c22275f2d,
title = "Density estimation with adaptive sparse grids for large data sets",
abstract = "Nonparametric density estimation is a fundamental problem of statistics and data mining. Even though kernel density estimation is the most widely used method, its performance depends strongly on the choice of the kernel bandwidth, and it can become computationally expensive for large data sets. We present an adaptive sparse-grid-based density estimation method which discretizes the estimated density function on basis functions centered at grid points rather than on kernels centered at the data points. Thus, the cost of evaluating the estimated density function is independent of the number of data points. We give details on how to estimate density functions on sparse grids and develop a cross-validation technique for the parameter selection. We show numerical results to confirm that our sparse-grid-based method is well suited for large data sets, and, finally, employ our method for the classification of astronomical objects to demonstrate that it is competitive with current kernel-based density estimation approaches with respect to classification accuracy and runtime.",
author = "Benjamin Peherstorfer and Dirk Pfl{\"u}ger and Bungartz, {Hans Joachim}",
year = "2014",
month = "1",
day = "1",
doi = "10.1137/1.9781611973440.51",
language = "English (US)",
volume = "1",
pages = "443--451",
editor = "Zaki, {Mohammed J.} and Arindam Banerjee and Srinivasan Parthasarathy and Pang Ning-Tan and Zoran Obradovic and Chandrika Kamath",
booktitle = "SIAM International Conference on Data Mining 2014, SDM 2014",
publisher = "Society for Industrial and Applied Mathematics Publications",

}

TY - GEN

T1 - Density estimation with adaptive sparse grids for large data sets

AU - Peherstorfer, Benjamin

AU - Pflüger, Dirk

AU - Bungartz, Hans Joachim

PY - 2014/1/1

Y1 - 2014/1/1

N2 - Nonparametric density estimation is a fundamental problem of statistics and data mining. Even though kernel density estimation is the most widely used method, its performance depends strongly on the choice of the kernel bandwidth, and it can become computationally expensive for large data sets. We present an adaptive sparse-grid-based density estimation method which discretizes the estimated density function on basis functions centered at grid points rather than on kernels centered at the data points. Thus, the cost of evaluating the estimated density function is independent of the number of data points. We give details on how to estimate density functions on sparse grids and develop a cross-validation technique for the parameter selection. We show numerical results to confirm that our sparse-grid-based method is well suited for large data sets, and, finally, employ our method for the classification of astronomical objects to demonstrate that it is competitive with current kernel-based density estimation approaches with respect to classification accuracy and runtime.

AB - Nonparametric density estimation is a fundamental problem of statistics and data mining. Even though kernel density estimation is the most widely used method, its performance depends strongly on the choice of the kernel bandwidth, and it can become computationally expensive for large data sets. We present an adaptive sparse-grid-based density estimation method which discretizes the estimated density function on basis functions centered at grid points rather than on kernels centered at the data points. Thus, the cost of evaluating the estimated density function is independent of the number of data points. We give details on how to estimate density functions on sparse grids and develop a cross-validation technique for the parameter selection. We show numerical results to confirm that our sparse-grid-based method is well suited for large data sets, and, finally, employ our method for the classification of astronomical objects to demonstrate that it is competitive with current kernel-based density estimation approaches with respect to classification accuracy and runtime.

UR - http://www.scopus.com/inward/record.url?scp=84921664859&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84921664859&partnerID=8YFLogxK

U2 - 10.1137/1.9781611973440.51

DO - 10.1137/1.9781611973440.51

M3 - Conference contribution

VL - 1

SP - 443

EP - 451

BT - SIAM International Conference on Data Mining 2014, SDM 2014

A2 - Zaki, Mohammed J.

A2 - Banerjee, Arindam

A2 - Parthasarathy, Srinivasan

A2 - Ning-Tan, Pang

A2 - Obradovic, Zoran

A2 - Kamath, Chandrika

PB - Society for Industrial and Applied Mathematics Publications

ER -