Producing knowledge by admitting ignorance: Enhancing data quality through an “I don’t know” option in citizen science

Marina Torre, Shinnosuke Nakayama, Tyrone J. Tolbert, Maurizio Porfiri

Research output: Contribution to journal › Article

Abstract

The “noisy labeler problem” in crowdsourced data has attracted great attention in recent years, with important ramifications in citizen science, where non-experts must produce high-quality data. Particularly relevant to citizen science is dynamic task allocation, in which the level of agreement among labelers can be progressively updated through the information-theoretic notion of entropy. Under dynamic task allocation, we hypothesized that providing volunteers with an “I don’t know” option would contribute to enhancing data quality, by introducing further, useful information about the level of agreement among volunteers. We investigated the influence of an “I don’t know” option on the data quality in a citizen science project that entailed classifying the image of a highly polluted canal into “threat” or “no threat” to the environment. Our results show that an “I don’t know” option can enhance accuracy, compared to the case without the option; such an improvement mostly affects the true negative rather than the true positive rate. In an information-theoretic sense, these seemingly meaningless blank votes constitute a meaningful piece of information to help enhance accuracy of data in citizen science.
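The abstract says that, under dynamic task allocation, agreement among labelers is updated through entropy. The Python below is a minimal sketch of that idea. It assumes Shannon entropy (in bits) over the vote counts for a single image, and it counts “I don’t know” as a third category. It shows how blank votes can keep an image in circulation when a two-option tally would treat it as settled. The function names, the label strings, and the stopping rule (`threshold`, `min_votes`, `max_votes`) are hypothetical illustrations. They are not the procedure or parameters used in the study.

```python
import math
from collections import Counter

# Hypothetical label strings; the project's options were "threat",
# "no threat", and "I don't know".
THREAT, NO_THREAT, IDK = "threat", "no threat", "I don't know"


def vote_entropy(votes):
    """Shannon entropy, in bits, of the empirical distribution of votes.

    Each distinct label, including "I don't know", is treated as its own
    category (an assumption of this sketch, not the paper's stated formula).
    """
    counts = Counter(votes)
    total = len(votes)
    if total == 0:
        return 0.0
    # H = sum_k p_k * log2(1 / p_k), with p_k = n_k / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def needs_more_votes(votes, threshold=0.5, min_votes=3, max_votes=10):
    """Hypothetical dynamic-allocation rule: keep showing an image to new
    volunteers until disagreement falls to `threshold` bits or below,
    after at least `min_votes` and at most `max_votes` votes."""
    if len(votes) >= max_votes:
        return False  # budget exhausted; stop allocating this image
    if len(votes) < min_votes:
        return True   # too few votes to judge agreement yet
    return vote_entropy(votes) > threshold


if __name__ == "__main__":
    settled = [THREAT] * 4                # unanimous two-option tally
    hesitant = [THREAT] * 4 + [IDK] * 3   # same tally plus blank votes
    print(round(vote_entropy(settled), 3))    # 0.0
    print(round(vote_entropy(hesitant), 3))   # 0.985
    print(needs_more_votes(settled), needs_more_votes(hesitant))  # False True
```

With four “threat” votes and nothing else, the entropy is zero and the hypothetical rule stops asking. Adding three “I don’t know” votes raises the entropy to about 0.985 bits, so the image would be sent to more volunteers. The abstract does not state whether the study weighted blank votes this way.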

Original language: English (US)
Article number: e0211907
Journal: PLoS ONE
Volume: 14
Issue number: 2
DOIs: 10.1371/journal.pone.0211907
State: Published - Feb 1 2019

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)

Cite this

Producing knowledge by admitting ignorance: Enhancing data quality through an “I don’t know” option in citizen science. / Torre, Marina; Nakayama, Shinnosuke; Tolbert, Tyrone J.; Porfiri, Maurizio.

In: PLoS ONE, Vol. 14, No. 2, e0211907, 01.02.2019.

@article{0abc00b2d7df47079ad689dc557fc352,
title = "Producing knowledge by admitting ignorance: Enhancing data quality through an “I don’t know” option in citizen science",
abstract = "The “noisy labeler problem” in crowdsourced data has attracted great attention in recent years, with important ramifications in citizen science, where non-experts must produce high-quality data. Particularly relevant to citizen science is dynamic task allocation, in which the level of agreement among labelers can be progressively updated through the information-theoretic notion of entropy. Under dynamic task allocation, we hypothesized that providing volunteers with an “I don’t know” option would contribute to enhancing data quality, by introducing further, useful information about the level of agreement among volunteers. We investigated the influence of an “I don’t know” option on the data quality in a citizen science project that entailed classifying the image of a highly polluted canal into “threat” or “no threat” to the environment. Our results show that an “I don’t know” option can enhance accuracy, compared to the case without the option; such an improvement mostly affects the true negative rather than the true positive rate. In an information-theoretic sense, these seemingly meaningless blank votes constitute a meaningful piece of information to help enhance accuracy of data in citizen science.",
author = "Marina Torre and Shinnosuke Nakayama and Tolbert, {Tyrone J.} and Maurizio Porfiri",
year = "2019",
month = "2",
day = "1",
doi = "10.1371/journal.pone.0211907",
language = "English (US)",
volume = "14",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "2",
pages = "e0211907",

}

TY - JOUR

T1 - Producing knowledge by admitting ignorance: Enhancing data quality through an “I don’t know” option in citizen science

AU - Torre, Marina

AU - Nakayama, Shinnosuke

AU - Tolbert, Tyrone J.

AU - Porfiri, Maurizio

PY - 2019/2/1

Y1 - 2019/2/1

AB - The “noisy labeler problem” in crowdsourced data has attracted great attention in recent years, with important ramifications in citizen science, where non-experts must produce high-quality data. Particularly relevant to citizen science is dynamic task allocation, in which the level of agreement among labelers can be progressively updated through the information-theoretic notion of entropy. Under dynamic task allocation, we hypothesized that providing volunteers with an “I don’t know” option would contribute to enhancing data quality, by introducing further, useful information about the level of agreement among volunteers. We investigated the influence of an “I don’t know” option on the data quality in a citizen science project that entailed classifying the image of a highly polluted canal into “threat” or “no threat” to the environment. Our results show that an “I don’t know” option can enhance accuracy, compared to the case without the option; such an improvement mostly affects the true negative rather than the true positive rate. In an information-theoretic sense, these seemingly meaningless blank votes constitute a meaningful piece of information to help enhance accuracy of data in citizen science.

UR - http://www.scopus.com/inward/record.url?scp=85062166548&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062166548&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0211907

DO - 10.1371/journal.pone.0211907

M3 - Article

C2 - 30811452

AN - SCOPUS:85062166548

VL - 14

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 2

M1 - e0211907

ER -