Sample selection bias correction theory

Corinna Cortes, Mehryar Mohri, Michael Riley, Afshin Rostamizadeh

Research output: Contribution to journal › Article

Abstract

This paper presents a theoretical analysis of sample selection bias correction. The sample bias correction technique commonly used in machine learning consists of reweighting the cost of an error on each training point of a biased sample to more closely reflect the unbiased distribution. This relies on weights derived by various estimation techniques based on finite samples. We analyze the effect of an error in that estimation on the accuracy of the hypothesis returned by the learning algorithm for two estimation techniques: a cluster-based estimation technique and kernel mean matching. We also report the results of sample bias correction experiments with several data sets using these techniques. Our analysis is based on the novel concept of distributional stability which generalizes the existing concept of point-based stability. Much of our work and proof techniques can be used to analyze other importance weighting techniques and their effect on accuracy when using a distributionally stable algorithm.
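The abstract's two estimators are easy to make concrete. First, the cluster-based technique: cluster the biased sample together with an unlabeled sample from the unbiased distribution, estimate each cluster's probability under both, and weight every training point by the ratio. The sketch below illustrates that general idea, not the paper's exact procedure; the synthetic data, the names (X_biased, X_unlab, cluster_weights), and the choice of scikit-learn's KMeans and Ridge are all assumptions for the demo.

```python
# Sketch: cluster-based importance-weight estimation, then cost-weighted training.
# Assumes a labeled biased sample (X_biased, y_biased) and an unlabeled sample
# X_unlab from the unbiased distribution; the data below is synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_unlab = rng.normal(size=(500, 2))                            # unbiased sample
keep = rng.random(500) < 1.0 / (1.0 + np.exp(-X_unlab[:, 0]))  # selection depends on x
X_biased = X_unlab[keep]                                       # biased training inputs
y_biased = X_biased @ np.array([1.0, -2.0]) + rng.normal(scale=0.1, size=len(X_biased))

def cluster_weights(X_b, X_u, n_clusters=10, seed=0):
    """w(x) ~ Pr_unbiased(cluster of x) / Pr_biased(cluster of x)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    km.fit(np.vstack([X_b, X_u]))
    c_b, c_u = km.predict(X_b), km.predict(X_u)
    w = np.ones(len(X_b))
    for c in range(n_clusters):
        p_b, p_u = np.mean(c_b == c), np.mean(c_u == c)
        if p_b > 0:
            w[c_b == c] = p_u / p_b    # reweight toward the unbiased distribution
    return w

# Reweight the cost of an error on each training point and fit.
w = cluster_weights(X_biased, X_unlab)
model = Ridge().fit(X_biased, y_biased, sample_weight=w)
```

Kernel mean matching, the second estimator analyzed, instead picks weights that align the weighted kernel mean of the biased sample with the kernel mean of the unbiased one. This leads to a quadratic program in the weights β: minimize ½βᵀKβ − κᵀβ under box and normalization constraints. The sketch below continues from the code above and assumes cvxpy is available as the QP solver; the constants B and eps are illustrative defaults, not values from the paper.

```python
# Sketch: kernel mean matching (KMM) as a quadratic program, solved with cvxpy.
# K is the kernel matrix on the biased sample; kappa compares each biased point
# to the unbiased sample's kernel mean. B and eps are illustrative constants.
import cvxpy as cp
from sklearn.metrics.pairwise import rbf_kernel

n, m = len(X_biased), len(X_unlab)
K = rbf_kernel(X_biased, X_biased) + 1e-8 * np.eye(n)   # jitter for numerical PSD-ness
kappa = (n / m) * rbf_kernel(X_biased, X_unlab).sum(axis=1)

B, eps = 1000.0, (np.sqrt(n) - 1) / np.sqrt(n)          # box bound, slack on sum(beta)
beta = cp.Variable(n)
objective = cp.Minimize(0.5 * cp.quad_form(beta, cp.psd_wrap(K)) - kappa @ beta)
constraints = [beta >= 0, beta <= B, cp.abs(cp.sum(beta) - n) <= n * eps]
cp.Problem(objective, constraints).solve()
kmm_weights = beta.value                                 # importance weights for training
```

Either weight vector then plugs into any learner that accepts per-example costs, which is the setting the paper's distributional-stability analysis covers: it bounds how much an error in the estimated weights can move the hypothesis returned by the learning algorithm.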

Fingerprint

Sample selection
Selection bias
Bias correction
Learning algorithms
Learning systems
Weighting
Theoretical analysis
Machine learning
Kernels
Costs
Experiments
Generalization

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Sample selection bias correction theory. / Cortes, Corinna; Mohri, Mehryar; Riley, Michael; Rostamizadeh, Afshin.

In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 5254 LNAI, 2008, p. 38-53.

Research output: Contribution to journal › Article

@article{66ede6aba4b147069910bc834b173545,
title = "Sample selection bias correction theory",
abstract = "This paper presents a theoretical analysis of sample selection bias correction. The sample bias correction technique commonly used in machine learning consists of reweighting the cost of an error on each training point of a biased sample to more closely reflect the unbiased distribution. This relies on weights derived by various estimation techniques based on finite samples. We analyze the effect of an error in that estimation on the accuracy of the hypothesis returned by the learning algorithm for two estimation techniques: a cluster-based estimation technique and kernel mean matching. We also report the results of sample bias correction experiments with several data sets using these techniques. Our analysis is based on the novel concept of distributional stability which generalizes the existing concept of point-based stability. Much of our work and proof techniques can be used to analyze other importance weighting techniques and their effect on accuracy when using a distributionally stable algorithm.",
author = "Corinna Cortes and Mehryar Mohri and Michael Riley and Afshin Rostamizadeh",
year = "2008",
doi = "10.1007/978-3-540-87987-9_8",
language = "English (US)",
volume = "5254 LNAI",
pages = "38--53",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer Verlag",
}

TY  - JOUR
T1  - Sample selection bias correction theory
AU  - Cortes, Corinna
AU  - Mohri, Mehryar
AU  - Riley, Michael
AU  - Rostamizadeh, Afshin
PY  - 2008
Y1  - 2008
N2  - This paper presents a theoretical analysis of sample selection bias correction. The sample bias correction technique commonly used in machine learning consists of reweighting the cost of an error on each training point of a biased sample to more closely reflect the unbiased distribution. This relies on weights derived by various estimation techniques based on finite samples. We analyze the effect of an error in that estimation on the accuracy of the hypothesis returned by the learning algorithm for two estimation techniques: a cluster-based estimation technique and kernel mean matching. We also report the results of sample bias correction experiments with several data sets using these techniques. Our analysis is based on the novel concept of distributional stability which generalizes the existing concept of point-based stability. Much of our work and proof techniques can be used to analyze other importance weighting techniques and their effect on accuracy when using a distributionally stable algorithm.
AB  - This paper presents a theoretical analysis of sample selection bias correction. The sample bias correction technique commonly used in machine learning consists of reweighting the cost of an error on each training point of a biased sample to more closely reflect the unbiased distribution. This relies on weights derived by various estimation techniques based on finite samples. We analyze the effect of an error in that estimation on the accuracy of the hypothesis returned by the learning algorithm for two estimation techniques: a cluster-based estimation technique and kernel mean matching. We also report the results of sample bias correction experiments with several data sets using these techniques. Our analysis is based on the novel concept of distributional stability which generalizes the existing concept of point-based stability. Much of our work and proof techniques can be used to analyze other importance weighting techniques and their effect on accuracy when using a distributionally stable algorithm.
UR  - http://www.scopus.com/inward/record.url?scp=56749103116&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=56749103116&partnerID=8YFLogxK
U2  - 10.1007/978-3-540-87987-9_8
DO  - 10.1007/978-3-540-87987-9_8
M3  - Article
AN  - SCOPUS:56749103116
VL  - 5254 LNAI
SP  - 38
EP  - 53
JO  - Lecture Notes in Computer Science
JF  - Lecture Notes in Computer Science
SN  - 0302-9743
ER  -