Abstract
Positive-Unlabeled (PU) learning scenarios are a class of semi-supervised learning in which only a fraction of the data is labeled, and all available labels are positive. The goal is to assign correct (positive and negative) labels to as much data as possible. Several important learning problems fall into the PU-learning domain, since in many cases the cost or infeasibility of obtaining negative examples is prohibitive. In addition to the positive-negative disparity, the overall cost of labeling these datasets typically leads to situations where unlabeled examples greatly outnumber labeled ones. Accordingly, we perform several experiments, on both synthetic and real-world datasets, examining the performance of state-of-the-art PU-learning algorithms when there is significant bias in the labeling process. We propose novel PU algorithms and demonstrate that they outperform the current state of the art on a variety of benchmarks. Lastly, we present a methodology for removing the costly parameter-tuning step in a popular PU algorithm.
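To make the PU setting described above concrete, here is a minimal sketch (illustrative only, not the paper's algorithms or benchmarks): synthetic two-cluster data, a biased labeling process that reveals labels for only some positives, and the common naive baseline that treats every unlabeled example as negative. All variable names, cluster locations, and labeling propensities are hypothetical assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: two well-separated Gaussian clusters in 2-D.
n = 1000
X = np.vstack([rng.normal(2.0, 1.0, (n // 2, 2)),    # true positives
               rng.normal(-2.0, 1.0, (n // 2, 2))])  # true negatives
y_true = np.concatenate([np.ones(n // 2), np.zeros(n // 2)])

# PU setting: only some positives carry a label (s = True); every other
# example, positive or negative, is unlabeled (s = False).
# Labeling bias: positives lying far to the "right" are likelier to be labeled.
propensity = np.where(X[:, 0] > 2.0, 0.5, 0.1)
s = (y_true == 1) & (rng.random(n) < propensity)

# Naive PU baseline: treat every unlabeled example as negative and
# classify by nearest class centroid.
mu_pos = X[s].mean(axis=0)
mu_neg = X[~s].mean(axis=0)
pred = (np.linalg.norm(X - mu_pos, axis=1)
        < np.linalg.norm(X - mu_neg, axis=1)).astype(float)

accuracy = (pred == y_true).mean()
```

Because the labeled positives are a biased sample, `mu_pos` is pulled toward the over-labeled region; the kind of labeling bias studied in the paper degrades exactly this sort of naive baseline.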
| Original language | English (US) |
|---|---|
| Title of host publication | Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 639-645 |
| Number of pages | 7 |
| ISBN (Print) | 9781467384926 |
| DOIs | |
| State | Published - Jan 29 2016 |
| Event | 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015 - Atlantic City, United States. Duration: Nov 14 2015 → Nov 17 2015 |
Other

| Other | 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015 |
|---|---|
| Country | United States |
| City | Atlantic City |
| Period | 11/14/15 → 11/17/15 |
Keywords
- Machine Learning
- Positive-Unlabeled Learning
- Semi-Supervised Learning
ASJC Scopus subject areas
- Electrical and Electronic Engineering
- Computer Science Applications
Cite this
Youngs, Noah; Shasha, Dennis; Bonneau, Richard. Positive-Unlabeled Learning in the Face of Labeling Bias. In: Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015. Institute of Electrical and Electronics Engineers Inc., 2016. p. 639-645. 7395727.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
DOI: 10.1109/ICDMW.2015.207