Positive-Unlabeled Learning in the Face of Labeling Bias

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Positive-Unlabeled (PU) learning is a class of semi-supervised learning problems in which only a fraction of the data is labeled and all available labels are positive. The goal is to assign correct (positive or negative) labels to as much of the data as possible. Several important learning problems fall into the PU-learning domain, because in many cases obtaining negative examples is prohibitively costly or infeasible. In addition to this positive-negative disparity, the overall cost of labeling such datasets typically means that unlabeled examples greatly outnumber labeled ones. Accordingly, we perform several experiments on both synthetic and real-world datasets, examining the performance of state-of-the-art PU-learning algorithms when there is significant bias in the labeling process. We propose novel PU algorithms and demonstrate that they outperform the current state of the art on a variety of benchmarks. Lastly, we present a methodology for removing the costly parameter-tuning step in a popular PU algorithm.

Original language: English (US)
Title of host publication: Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 639-645
Number of pages: 7
ISBN (Print): 9781467384926
DOI: 10.1109/ICDMW.2015.207
State: Published - Jan 29 2016
Event: 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015 - Atlantic City, United States
Duration: Nov 14 2015 to Nov 17 2015

Keywords

  • Machine Learning
  • Positive-Unlabeled Learning
  • Semi-Supervised Learning

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Science Applications

Cite this

Youngs, N., Shasha, D., & Bonneau, R. (2016). Positive-Unlabeled Learning in the Face of Labeling Bias. In Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015 (pp. 639-645). [7395727] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDMW.2015.207

Positive-Unlabeled Learning in the Face of Labeling Bias. / Youngs, Noah; Shasha, Dennis; Bonneau, Richard.

Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015. Institute of Electrical and Electronics Engineers Inc., 2016. p. 639-645 7395727.

Youngs, N, Shasha, D & Bonneau, R 2016, Positive-Unlabeled Learning in the Face of Labeling Bias. in Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015, 7395727, Institute of Electrical and Electronics Engineers Inc., pp. 639-645, 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015, Atlantic City, United States, 11/14/15. https://doi.org/10.1109/ICDMW.2015.207
Youngs N, Shasha D, Bonneau R. Positive-Unlabeled Learning in the Face of Labeling Bias. In Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015. Institute of Electrical and Electronics Engineers Inc. 2016. p. 639-645. 7395727 https://doi.org/10.1109/ICDMW.2015.207
Youngs, Noah ; Shasha, Dennis ; Bonneau, Richard. / Positive-Unlabeled Learning in the Face of Labeling Bias. Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015. Institute of Electrical and Electronics Engineers Inc., 2016. pp. 639-645
@inproceedings{e1553f36a71b4141ab7bf9aea041976d,
title = "Positive-Unlabeled Learning in the Face of Labeling Bias",
abstract = "Positive-Unlabeled (PU) learning scenarios are a class of semi-supervised learning where only a fraction of the data is labeled, and all available labels are positive. The goal is to assign correct (positive and negative) labels to as much data as possible. Several important learning problems fall into the PU-learning domain, as in many cases the cost and feasibility of obtaining negative examples is prohibitive. In addition to the positive-negative disparity the overall cost of labeling these datasets typically leads to situations where the number of unlabeled examples greatly outnumbers the labeled. Accordingly, we perform several experiments, on both synthetic and real-world datasets, examining the performance of state of the art PU-learning algorithms when there is significant bias in the labeling process. We propose novel PU algorithms and demonstrate that they outperform the current state of the art on a variety of benchmarks. Lastly, we present a methodology for removing the costly parameter-tuning step in a popular PU algorithm.",
keywords = "Machine Learning, Positive-Unlabeled Learning, Semi-Supervised Learning",
author = "Noah Youngs and Dennis Shasha and Richard Bonneau",
year = "2016",
month = "1",
day = "29",
doi = "10.1109/ICDMW.2015.207",
language = "English (US)",
isbn = "9781467384926",
pages = "639--645",
booktitle = "Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Positive-Unlabeled Learning in the Face of Labeling Bias

AU - Youngs, Noah

AU - Shasha, Dennis

AU - Bonneau, Richard

PY - 2016/1/29

Y1 - 2016/1/29

AB - Positive-Unlabeled (PU) learning scenarios are a class of semi-supervised learning where only a fraction of the data is labeled, and all available labels are positive. The goal is to assign correct (positive and negative) labels to as much data as possible. Several important learning problems fall into the PU-learning domain, as in many cases the cost and feasibility of obtaining negative examples is prohibitive. In addition to the positive-negative disparity the overall cost of labeling these datasets typically leads to situations where the number of unlabeled examples greatly outnumbers the labeled. Accordingly, we perform several experiments, on both synthetic and real-world datasets, examining the performance of state of the art PU-learning algorithms when there is significant bias in the labeling process. We propose novel PU algorithms and demonstrate that they outperform the current state of the art on a variety of benchmarks. Lastly, we present a methodology for removing the costly parameter-tuning step in a popular PU algorithm.

KW - Machine Learning

KW - Positive-Unlabeled Learning

KW - Semi-Supervised Learning

UR - http://www.scopus.com/inward/record.url?scp=84964720236&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84964720236&partnerID=8YFLogxK

U2 - 10.1109/ICDMW.2015.207

DO - 10.1109/ICDMW.2015.207

M3 - Conference contribution

AN - SCOPUS:84964720236

SN - 9781467384926

SP - 639

EP - 645

BT - Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015

PB - Institute of Electrical and Electronics Engineers Inc.

ER -