Statistical considerations for crowdsourced perceptual ratings of human speech productions

Daniel Fernández, Daphna Harel, Panos Ipeirotis, Tara McAllister Byun

Research output: Contribution to journal › Article

Abstract

Crowdsourcing has become a major tool for scholarly research since its introduction to the academic sphere in 2008. However, unlike in traditional laboratory settings, it is nearly impossible to control the conditions under which workers on crowdsourcing platforms complete tasks. In the study of communication disorders, crowdsourcing has provided a novel solution to the collection of perceptual ratings of human speech production. Such ratings allow researchers to gauge whether a treatment yields meaningful change in how human listeners perceive disordered speech. This paper explores some statistical considerations of crowdsourced data, with a specific focus on collecting perceptual ratings of human speech productions. Random effects models are applied to crowdsourced perceptual ratings collected in both a continuous and a binary fashion. A simulation study is conducted to test the reliability of the proposed models under differing numbers of workers and tasks. Finally, this methodology is applied to a data set from the study of communication disorders.
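The paper's model specification is not reproduced on this record page, but the abstract suggests the general shape of the analysis. Below is a minimal sketch, assuming crossed random intercepts for workers and speech productions (items), of how such a model could be fit to simulated continuous ratings with Python's statsmodels; the sample sizes, effect standard deviations, and library choice are illustrative assumptions, not the authors' implementation.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate crowdsourced continuous ratings: each of n_workers raters
# scores each of n_items speech productions (sizes are illustrative).
rng = np.random.default_rng(seed=1)
n_workers, n_items = 30, 40
worker = np.repeat(np.arange(n_workers), n_items)
item = np.tile(np.arange(n_items), n_workers)
worker_eff = rng.normal(0.0, 0.5, n_workers)  # assumed worker-level SD
item_eff = rng.normal(0.0, 1.0, n_items)      # assumed item-level SD
rating = (3.0 + worker_eff[worker] + item_eff[item]
          + rng.normal(0.0, 0.8, n_workers * n_items))

df = pd.DataFrame({"rating": rating,
                   "worker": worker.astype(str),
                   "item": item.astype(str),
                   "one": 1})  # constant grouping column

# statsmodels fits crossed random effects by treating the whole data set
# as a single group (re_formula="0" drops the group-level intercept) and
# declaring each crossed factor as a variance component.
model = smf.mixedlm("rating ~ 1", df, re_formula="0", groups="one",
                    vc_formula={"worker": "0 + C(worker)",
                                "item": "0 + C(item)"})
result = model.fit()
print(result.summary())  # worker and item variance components

For binary ratings, a logistic analogue with the same crossed structure could be fit instead (for example with statsmodels' BinomialBayesMixedGLM), and looping the simulation and fit over grids of n_workers and n_items would mirror, in spirit, the kind of simulation study the abstract describes.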

Original language: English (US)
Journal: Journal of Applied Statistics
DOI: 10.1080/02664763.2018.1547692
State: Accepted/In press - Jan 1 2018

Keywords

  • Amazon Mechanical Turk
  • communication disorders
  • crowdsourcing
  • random effects models
  • reliability
  • validity

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Statistical considerations for crowdsourced perceptual ratings of human speech productions. / Fernández, Daniel; Harel, Daphna; Ipeirotis, Panos; McAllister Byun, Tara.

In: Journal of Applied Statistics, 01.01.2018.

Research output: Contribution to journal › Article

@article{b4b00cce649e42f2969af184ec7e81bd,
title = "Statistical considerations for crowdsourced perceptual ratings of human speech productions",
abstract = "Crowdsourcing has become a major tool for scholarly research since its introduction to the academic sphere in 2008. However, unlike in traditional laboratory settings, it is nearly impossible to control the conditions under which workers on crowdsourcing platforms complete tasks. In the study of communication disorders, crowdsourcing has provided a novel solution to the collection of perceptual ratings of human speech production. Such ratings allow researchers to gauge whether a treatment yields meaningful change in how human listeners perceive disordered speech. This paper explores some statistical considerations of crowdsourced data, with a specific focus on collecting perceptual ratings of human speech productions. Random effects models are applied to crowdsourced perceptual ratings collected in both a continuous and a binary fashion. A simulation study is conducted to test the reliability of the proposed models under differing numbers of workers and tasks. Finally, this methodology is applied to a data set from the study of communication disorders.",
keywords = "Amazon Mechanical Turk, communication disorders, crowdsourcing, random effects models, reliability, validity",
author = "Daniel Fern{\'a}ndez and Daphna Harel and Panos Ipeirotis and {McAllister Byun}, Tara",
year = "2018",
month = "1",
day = "1",
doi = "10.1080/02664763.2018.1547692",
language = "English (US)",
journal = "Journal of Applied Statistics",
issn = "0266-4763",
publisher = "Routledge",

}
