Identifying unreliable and adversarial workers in crowdsourced labeling tasks

Srikanth Jagabathula, Lakshminarayanan Subramanian, Ashwin Venkataraman

Research output: Contribution to journal › Article

Abstract

We study the problem of identifying unreliable and adversarial workers in crowdsourcing systems where workers (or users) provide labels for tasks (or items). Most existing studies assume that worker responses follow specific probabilistic models; however, recent evidence shows the presence of workers adopting non-random or even malicious strategies. To account for such workers, we suppose that workers comprise a mixture of honest and adversarial workers. Honest workers may be reliable or unreliable, and they provide labels according to an unknown but explicit probabilistic model. Adversaries adopt labeling strategies different from those of honest workers, whether probabilistic or not. We propose two reputation algorithms to identify unreliable honest workers and adversarial workers from only their responses. Our algorithms assume that honest workers are in the majority, and they classify workers with outlier label patterns as adversaries. Theoretically, we show that our algorithms successfully identify unreliable honest workers, workers adopting deterministic strategies, and worst-case sophisticated adversaries who can adopt arbitrary labeling strategies to degrade the accuracy of the inferred task labels. Empirically, we show that filtering out outliers using our algorithms can significantly improve the accuracy of several state-of-the-art label aggregation algorithms in real-world crowdsourcing datasets.
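As a rough illustration of the filter-then-aggregate idea described in the abstract (identify workers whose label patterns are outliers under an honest-majority assumption, drop them, then aggregate the remaining labels), the Python sketch below scores each worker by disagreement with the plurality label and removes the most atypical workers before a majority vote. The function names, the disagreement-rate score, and the keep_frac parameter are hypothetical simplifications for exposition; they are not the paper's two reputation algorithms.

import numpy as np

def filter_outlier_workers(labels, keep_frac=0.7):
    # labels: (n_workers, n_tasks) matrix with entries in {-1, +1},
    # and 0 where a worker did not label a task.
    n_workers, _ = labels.shape

    # Plurality label per task over all workers (0 entries drop out of the sum).
    plurality = np.sign(labels.sum(axis=0))

    # Disagreement rate of each worker with the plurality, over tasks they answered.
    answered = labels != 0
    disagreements = ((labels != plurality) & answered).sum(axis=1)
    rates = disagreements / np.maximum(answered.sum(axis=1), 1)

    # Treat workers with the most atypical label patterns as outliers and drop them,
    # relying on the honest-majority assumption stated in the abstract.
    n_keep = max(1, int(keep_frac * n_workers))
    return np.argsort(rates)[:n_keep]

def aggregate(labels, keep):
    # Majority vote restricted to the retained workers.
    return np.sign(labels[keep].sum(axis=0))

# Toy usage: 8 honest-but-noisy workers and 3 adversaries who always flip the truth.
rng = np.random.default_rng(0)
truth = rng.choice([-1, 1], size=20)
honest = truth * rng.choice([1, -1], size=(8, 20), p=[0.8, 0.2])
adversaries = -np.tile(truth, (3, 1))
labels = np.vstack([honest, adversaries])

keep = filter_outlier_workers(labels)
print("accuracy after filtering:", (aggregate(labels, keep) == truth).mean())

In this toy example the adversarial rows disagree with the plurality on nearly every task, so they receive the highest outlier scores and are removed before aggregation; the paper's algorithms pursue the same goal with stronger guarantees against sophisticated adversaries.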

Original language: English (US)
Pages (from-to): 1-67
Number of pages: 67
Journal: Journal of Machine Learning Research
Volume: 18
State: Published - Sep 1, 2017

Keywords

  • Adversary
  • Crowdsourcing
  • Outliers
  • Reputation

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence

Cite this

Jagabathula, S., Subramanian, L., & Venkataraman, A. (2017). Identifying unreliable and adversarial workers in crowdsourced labeling tasks. Journal of Machine Learning Research, 18, 1-67.
