A statistical perspective of sampling scores for linear regression

Siheng Chen, Rohan Varma, Aarti Singh, Jelena Kovacevic

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

In this paper, we consider the statistical problem of learning a linear model from noisy samples. Existing work has focused on approximating the least-squares solution by using leverage-based scores as an importance sampling distribution. However, neither finite-sample statistical guarantees nor computationally efficient optimal sampling strategies have been proposed. To evaluate the statistical properties of different sampling strategies, we propose a simple yet effective estimator that is amenable to theoretical analysis and useful in multitask linear regression. We derive the exact mean square error of the proposed estimator for any given set of sampling scores. By minimizing the mean square error, we propose optimal sampling scores for both the estimator and the predictor, and show that they depend on the noise-to-signal ratio. Numerical simulations agree well with the theoretical analysis.
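For readers unfamiliar with the leverage-based importance sampling that the abstract attributes to existing work, the following is a minimal, pure-Python sketch of that standard "leveraging" approach to least squares. It is not the estimator proposed in this paper. The function names (`gram`, `solve`, `leverage_scores`, `leveraged_ls`) are hypothetical, and the specific choices (sampling rows with replacement with probability π_i = h_i / d, where h_i is the leverage score of row i and d the number of features, then reweighting each sampled row by 1 / (m π_i)) are assumptions of this sketch.

```python
import random


def gram(rows, weights=None):
    """Return the weighted Gram matrix sum_i w_i * x_i x_i^T."""
    d = len(rows[0])
    if weights is None:
        weights = [1.0] * len(rows)
    G = [[0.0] * d for _ in range(d)]
    for w, x in zip(weights, rows):
        for a in range(d):
            for b in range(d):
                G[a][b] += w * x[a] * x[b]
    return G


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy; A is untouched
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = s / M[r][r]
    return z


def leverage_scores(X):
    """Leverage h_i = x_i^T (X^T X)^{-1} x_i, i.e. the diagonal of the hat matrix."""
    G = gram(X)
    return [sum(xa * za for xa, za in zip(x, solve(G, x))) for x in X]


def leveraged_ls(X, y, m, rng):
    """Hypothetical helper: sample m rows by leverage, then solve reweighted least squares."""
    d = len(X[0])
    h = leverage_scores(X)
    probs = [hi / d for hi in h]  # sums to 1 when X has full column rank
    idx = rng.choices(range(len(X)), weights=probs, k=m)  # sampling with replacement
    rows = [X[i] for i in idx]
    w = [1.0 / (m * probs[i]) for i in idx]  # importance-sampling weights
    G = gram(rows, w)
    rhs = [sum(wi * r[a] * y[i] for wi, r, i in zip(w, rows, idx)) for a in range(d)]
    return solve(G, rhs)


if __name__ == "__main__":
    rng = random.Random(0)
    beta = [0.5, -2.0, 3.0]  # illustrative true coefficients (intercept first)
    X = [[1.0, rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
    y = [sum(b * v for b, v in zip(beta, x)) + rng.gauss(0, 0.5) for x in X]
    print(leveraged_ls(X, y, m=50, rng=rng))
```

With noiseless responses, any sample whose rows span the feature space recovers the coefficients exactly, so differences between sampling distributions only appear once noise is added to `y`, which is the regime the paper's mean-square-error analysis addresses.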

Original language: English (US)
Title of host publication: Proceedings - ISIT 2016; 2016 IEEE International Symposium on Information Theory
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1556-1560
Number of pages: 5
Volume: 2016-August
ISBN (Electronic): 9781509018062
DOI: https://doi.org/10.1109/ISIT.2016.7541560
State: Published - Aug 10 2016
Event: 2016 IEEE International Symposium on Information Theory, ISIT 2016 - Barcelona, Spain
Duration: Jul 10 2016 to Jul 15 2016

Other

Other: 2016 IEEE International Symposium on Information Theory, ISIT 2016
Country: Spain
City: Barcelona
Period: 7/10/16 to 7/15/16

Fingerprint

Linear regression
Sampling Strategy
Sampling
Estimator
Mean square error
Theoretical Analysis
Least-squares Solution
Sampling Distribution
Importance Sampling
Optimal Strategy
Leverage
Statistical property
Predictors
Linear Model
Numerical Simulation
Signal to noise ratio
Evaluate
Computer simulation

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Modeling and Simulation
  • Applied Mathematics

Cite this

Chen, S., Varma, R., Singh, A., & Kovacevic, J. (2016). A statistical perspective of sampling scores for linear regression. In Proceedings - ISIT 2016; 2016 IEEE International Symposium on Information Theory (Vol. 2016-August, pp. 1556-1560). [7541560] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ISIT.2016.7541560

@inproceedings{224a24a7441a40e783fa270c4d830b16,
title = "A statistical perspective of sampling scores for linear regression",
author = "Siheng Chen and Rohan Varma and Aarti Singh and Jelena Kovacevic",
year = "2016",
month = "8",
day = "10",
doi = "10.1109/ISIT.2016.7541560",
language = "English (US)",
volume = "2016-August",
pages = "1556--1560",
booktitle = "Proceedings - ISIT 2016; 2016 IEEE International Symposium on Information Theory",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - A statistical perspective of sampling scores for linear regression

AU - Chen, Siheng

AU - Varma, Rohan

AU - Singh, Aarti

AU - Kovacevic, Jelena

PY - 2016/8/10

Y1 - 2016/8/10

UR - http://www.scopus.com/inward/record.url?scp=84986000858&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84986000858&partnerID=8YFLogxK

U2 - 10.1109/ISIT.2016.7541560

DO - 10.1109/ISIT.2016.7541560

M3 - Conference contribution

AN - SCOPUS:84986000858

VL - 2016-August

SP - 1556

EP - 1560

BT - Proceedings - ISIT 2016; 2016 IEEE International Symposium on Information Theory

PB - Institute of Electrical and Electronics Engineers Inc.

ER -