Deriving gradient measures of child speech from crowdsourced ratings

Research output: Contribution to journal › Article

Abstract

Recent research has demonstrated that perceptual ratings aggregated across multiple non-expert listeners can reveal gradient degrees of contrast between sounds that listeners might transcribe identically. Aggregated ratings have been found to correlate strongly with acoustic gold standard measures both when individual raters use a continuous rating scale such as visual analog scaling (Munson et al., 2012) and when individual raters provide binary ratings (McAllister Byun, Halpin, & Szeredi, 2015). In light of evidence that inexperienced listeners use continuous scales less consistently than experienced listeners, this study investigated the relative merits of binary versus continuous rating scales when aggregating responses over large numbers of naive listeners recruited through online crowdsourcing. Stimuli were words produced by children in treatment for misarticulation of North American English /r/. Each listener rated the same 40 tokens twice: once using visual analog scaling (VAS) and once using a binary rating scale. The gradient rhoticity of each item was then estimated using (a) VAS click location, averaged across raters; and (b) the proportion of raters who assigned the “correct /r/” label to each item in the binary rating task (p̂). First, we validate these two measures of rhoticity against each other and against an acoustic gold standard. Second, we explore the range of variability in individual response patterns that underlie these group-level data. Third, we integrate statistical, theoretical, and practical considerations to offer guidelines for determining which measure to use in a given situation.
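The two item-level measures described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the data layout (`ratings[rater][item]`), the rescaling of VAS clicks to the range 0 to 1, the use of Pearson's r as the validation statistic, and all function names are hypothetical. Because p̂ is simply the mean of 0/1 labels, one averaging function yields both measures.

```python
from math import nan, sqrt
from statistics import mean
from typing import Dict, List, Sequence

# Hypothetical data layout (an assumption, not the study's format):
# ratings[rater_id][item_id] = response.
# VAS responses are assumed rescaled to 0.0 (no /r/) .. 1.0 (correct /r/);
# binary responses are 1 for a "correct /r/" label and 0 otherwise.
Ratings = Dict[str, Dict[str, float]]


def item_means(ratings: Ratings) -> Dict[str, float]:
    """Average each item's responses across every rater who rated it.

    On VAS data this is the mean click location; on 0/1 binary data it is
    p-hat, the proportion of raters who chose the "correct /r/" label.
    """
    per_item: Dict[str, List[float]] = {}
    for responses in ratings.values():
        for item, value in responses.items():
            per_item.setdefault(item, []).append(value)
    return {item: mean(values) for item, values in per_item.items()}


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns NaN if either input has zero variance."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return nan
    return sxy / sqrt(sxx * syy)


def agreement(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Correlate two item-level measures over the items they share."""
    items = sorted(a.keys() & b.keys())
    return pearson_r([a[i] for i in items], [b[i] for i in items])


def rater_vs_group(ratings: Ratings) -> Dict[str, float]:
    """Correlate each rater's responses with the group-level item means."""
    group = item_means(ratings)
    return {rater: agreement(responses, group) for rater, responses in ratings.items()}


if __name__ == "__main__":
    # Made-up toy responses for illustration only; these are not study data.
    vas = {"r1": {"w1": 0.9, "w2": 0.2, "w3": 0.6},
           "r2": {"w1": 0.8, "w2": 0.1, "w3": 0.4}}
    binary = {"r1": {"w1": 1, "w2": 0, "w3": 1},
              "r2": {"w1": 1, "w2": 0, "w3": 0}}
    vas_mean = item_means(vas)
    p_hat = item_means(binary)
    print("VAS mean:", vas_mean)
    print("p-hat:", p_hat)
    print("VAS vs p-hat r:", agreement(vas_mean, p_hat))
    print("per-rater r:", rater_vs_group(vas))
```

The same `agreement` function could be pointed at a dictionary of acoustic values per item; depending on which direction the acoustic measure runs, close agreement with perceived rhoticity may appear as a strong negative correlation. The per-rater correlation is only one simple way to inspect individual variability and is not necessarily the analysis the article reports.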

Original language: English (US)
Pages: 91-102
Number of pages: 12
Journal: Journal of Communication Disorders
Volume: 64
DOI: 10.1016/j.jcomdis.2016.07.001
State: Published - Nov 1 2016

Keywords

  • Covert contrast
  • Crowdsourcing
  • Research methods
  • Speech perception
  • Speech rating
  • Speech sound disorders
  • Visual analog scaling

ASJC Scopus subject areas

  • Experimental and Cognitive Psychology
  • Linguistics and Language
  • Cognitive Neuroscience
  • Speech and Hearing
  • LPN and LVN

Cite this

Deriving gradient measures of child speech from crowdsourced ratings. / McAllister Byun, Tara; Harel, Daphna; Halpin, Peter F.; Szeredi, Daniel.

In: Journal of Communication Disorders, Vol. 64, 01.11.2016, p. 91-102.

@article{f8b0745939774f7ab8f1dede1c95c039,
title = "Deriving gradient measures of child speech from crowdsourced ratings",
abstract = "Recent research has demonstrated that perceptual ratings aggregated across multiple non-expert listeners can reveal gradient degrees of contrast between sounds that listeners might transcribe identically. Aggregated ratings have been found to correlate strongly with acoustic gold standard measures both when individual raters use a continuous rating scale such as visual analog scaling (Munson et al., 2012) and when individual raters provide binary ratings (McAllister Byun, Halpin, & Szeredi, 2015). In light of evidence that inexperienced listeners use continuous scales less consistently than experienced listeners, this study investigated the relative merits of binary versus continuous rating scales when aggregating responses over large numbers of naive listeners recruited through online crowdsourcing. Stimuli were words produced by children in treatment for misarticulation of North American English /r/. Each listener rated the same 40 tokens twice: once using visual analog scaling (VAS) and once using a binary rating scale. The gradient rhoticity of each item was then estimated using (a) VAS click location, averaged across raters; and (b) the proportion of raters who assigned the “correct /r/” label to each item in the binary rating task ($\hat{p}$). First, we validate these two measures of rhoticity against each other and against an acoustic gold standard. Second, we explore the range of variability in individual response patterns that underlie these group-level data. Third, we integrate statistical, theoretical, and practical considerations to offer guidelines for determining which measure to use in a given situation.",
keywords = "Covert contrast, Crowdsourcing, Research methods, Speech perception, Speech rating, Speech sound disorders, Visual analog scaling",
author = "{McAllister Byun}, Tara and Daphna Harel and Halpin, {Peter F.} and Daniel Szeredi",
year = "2016",
month = "11",
day = "1",
doi = "10.1016/j.jcomdis.2016.07.001",
language = "English (US)",
volume = "64",
pages = "91--102",
journal = "Journal of Communication Disorders",
issn = "0021-9924",
publisher = "Elsevier Inc.",

}

TY  - JOUR
T1  - Deriving gradient measures of child speech from crowdsourced ratings
AU  - McAllister Byun, Tara
AU  - Harel, Daphna
AU  - Halpin, Peter F.
AU  - Szeredi, Daniel
PY  - 2016/11/1
Y1  - 2016/11/1
N2  - Recent research has demonstrated that perceptual ratings aggregated across multiple non-expert listeners can reveal gradient degrees of contrast between sounds that listeners might transcribe identically. Aggregated ratings have been found to correlate strongly with acoustic gold standard measures both when individual raters use a continuous rating scale such as visual analog scaling (Munson et al., 2012) and when individual raters provide binary ratings (McAllister Byun, Halpin, & Szeredi, 2015). In light of evidence that inexperienced listeners use continuous scales less consistently than experienced listeners, this study investigated the relative merits of binary versus continuous rating scales when aggregating responses over large numbers of naive listeners recruited through online crowdsourcing. Stimuli were words produced by children in treatment for misarticulation of North American English /r/. Each listener rated the same 40 tokens twice: once using visual analog scaling (VAS) and once using a binary rating scale. The gradient rhoticity of each item was then estimated using (a) VAS click location, averaged across raters; and (b) the proportion of raters who assigned the “correct /r/” label to each item in the binary rating task (p̂). First, we validate these two measures of rhoticity against each other and against an acoustic gold standard. Second, we explore the range of variability in individual response patterns that underlie these group-level data. Third, we integrate statistical, theoretical, and practical considerations to offer guidelines for determining which measure to use in a given situation.
AB  - Recent research has demonstrated that perceptual ratings aggregated across multiple non-expert listeners can reveal gradient degrees of contrast between sounds that listeners might transcribe identically. Aggregated ratings have been found to correlate strongly with acoustic gold standard measures both when individual raters use a continuous rating scale such as visual analog scaling (Munson et al., 2012) and when individual raters provide binary ratings (McAllister Byun, Halpin, & Szeredi, 2015). In light of evidence that inexperienced listeners use continuous scales less consistently than experienced listeners, this study investigated the relative merits of binary versus continuous rating scales when aggregating responses over large numbers of naive listeners recruited through online crowdsourcing. Stimuli were words produced by children in treatment for misarticulation of North American English /r/. Each listener rated the same 40 tokens twice: once using visual analog scaling (VAS) and once using a binary rating scale. The gradient rhoticity of each item was then estimated using (a) VAS click location, averaged across raters; and (b) the proportion of raters who assigned the “correct /r/” label to each item in the binary rating task (p̂). First, we validate these two measures of rhoticity against each other and against an acoustic gold standard. Second, we explore the range of variability in individual response patterns that underlie these group-level data. Third, we integrate statistical, theoretical, and practical considerations to offer guidelines for determining which measure to use in a given situation.
KW  - Covert contrast
KW  - Crowdsourcing
KW  - Research methods
KW  - Speech perception
KW  - Speech rating
KW  - Speech sound disorders
KW  - Visual analog scaling
UR  - http://www.scopus.com/inward/record.url?scp=85004125587&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85004125587&partnerID=8YFLogxK
U2  - 10.1016/j.jcomdis.2016.07.001
DO  - 10.1016/j.jcomdis.2016.07.001
M3  - Article
C2  - 27481555
AN  - SCOPUS:85004125587
VL  - 64
SP  - 91
EP  - 102
JO  - Journal of Communication Disorders
JF  - Journal of Communication Disorders
SN  - 0021-9924
ER  -