Crowdsourced data collection for public health: A comparison with nationally representative, population tobacco use data

John D. Kraemer, Andrew A. Strasser, Eric N. Lindblom, Raymond Niaura, Darren Mays

Research output: Contribution to journal > Article

Abstract

Introduction: Internet-based crowdsourcing is increasingly used for social and behavioral research in public health; however, the potential generalizability of crowdsourced data remains unclear. This study assessed the population representativeness of Internet-based crowdsourced data.

Methods: A total of 3999 U.S. young adults ages 18 to 30 years were recruited in 2016 through Internet-based crowdsourcing to complete measures taken from the 2012–2013 National Adult Tobacco Survey (NATS). Post-hoc sampling weights were created using procedures similar to the NATS. Weighted analyses were conducted in 2016 to compare crowdsourced and publicly available 2012–2013 NATS data on demographics, tobacco use, and measures of tobacco perceptions and product warning label exposure.

Results: Those in the crowdsourced sample were less likely to report an annual household income of $50,000 or greater, and e-cigarette, waterpipe, and cigar use were more prevalent in the crowdsourced sample. High proportions of both samples indicated cigarette smoking is very harmful and very addictive. Comparable proportions of non-smokers and smokers reported cigarette warning label exposure; however, the likelihood of reporting that smoking is very harmful by frequency of warning label exposure was lower among smokers in the crowdsourced sample.

Conclusions: Our findings indicate that crowdsourced samples may differ demographically and may not produce generalizable estimates of tobacco use prevalence relative to population data after post-hoc sample weighting. However, correlational analyses in crowdsourced samples may reasonably approximate population data. Future studies can build from this work by testing additional methodological strategies to improve crowdsourced sampling strategies.

Original language: English (US)
Pages (from-to): 93-99
Number of pages: 7
Journal: Preventive Medicine
Volume: 102
DOI: 10.1016/j.ypmed.2017.07.006
State: Published - Sep 1 2017

Fingerprint

Tobacco Use
Tobacco Products
Crowdsourcing
Public Health
Internet
Tobacco
Population
Smoking
Behavioral Research
Annual Reports
Young Adult
Demography
Weights and Measures
Surveys and Questionnaires

Keywords

  • Crowdsourcing
  • Tobacco control
  • Young adult

ASJC Scopus subject areas

  • Epidemiology
  • Public Health, Environmental and Occupational Health

Cite this

Crowdsourced data collection for public health: A comparison with nationally representative, population tobacco use data. / Kraemer, John D.; Strasser, Andrew A.; Lindblom, Eric N.; Niaura, Raymond; Mays, Darren.

In: Preventive Medicine, Vol. 102, 01.09.2017, p. 93-99.

Kraemer, John D.; Strasser, Andrew A.; Lindblom, Eric N.; Niaura, Raymond; Mays, Darren. / Crowdsourced data collection for public health: A comparison with nationally representative, population tobacco use data. In: Preventive Medicine. 2017; Vol. 102. pp. 93-99.
@article{1c860f4830264e1195e139436a944bf9,
title = "Crowdsourced data collection for public health: A comparison with nationally representative, population tobacco use data",
abstract = "Introduction Internet-based crowdsourcing is increasingly used for social and behavioral research in public health, however the potential generalizability of crowdsourced data remains unclear. This study assessed the population representativeness of Internet-based crowdsourced data. Methods A total of 3999 U.S. young adults ages 18 to 30 years were recruited in 2016 through Internet-based crowdsourcing to complete measures taken from the 2012–2013 National Adult Tobacco Survey (NATS). Post-hoc sampling weights were created using procedures similar to the NATS. Weighted analyses were conducted in 2016 to compare crowdsourced and publicly-available 2012–2013 NATS data on demographics, tobacco use, and measures of tobacco perceptions and product warning label exposure. Results Those in the crowdsourced sample were less likely to report an annual household income of \$50,000 or greater, and e-cigarette, waterpipe, and cigar use were more prevalent in the crowdsourced sample. High proportions of both samples indicated cigarette smoking is very harmful and very addictive. Comparable proportions of non-smokers and smokers reported cigarette warning label exposure, however the likelihood of reporting that smoking is very harmful by frequency of warning label exposure was lower among smokers in the crowdsourced sample. Conclusions Our findings indicate that crowdsourced samples may differ demographically and may not produce generalizable estimates of tobacco use prevalence relative to population data after post-hoc sample weighting. However, correlational analyses in crowdsourced samples may reasonably approximate population data. Future studies can build from this work by testing additional methodological strategies to improve crowdsourced sampling strategies.",
keywords = "Crowdsourcing, Tobacco control, Young adult",
author = "Kraemer, {John D.} and Strasser, {Andrew A.} and Lindblom, {Eric N.} and Niaura, Raymond and Mays, Darren",
year = "2017",
month = "9",
day = "1",
doi = "10.1016/j.ypmed.2017.07.006",
language = "English (US)",
volume = "102",
pages = "93--99",
journal = "Preventive Medicine",
issn = "0091-7435",
publisher = "Academic Press Inc.",
}

TY - JOUR

T1 - Crowdsourced data collection for public health

T2 - A comparison with nationally representative, population tobacco use data

AU - Kraemer, John D.

AU - Strasser, Andrew A.

AU - Lindblom, Eric N.

AU - Niaura, Raymond

AU - Mays, Darren

PY - 2017/9/1

Y1 - 2017/9/1

AB - Introduction Internet-based crowdsourcing is increasingly used for social and behavioral research in public health, however the potential generalizability of crowdsourced data remains unclear. This study assessed the population representativeness of Internet-based crowdsourced data. Methods A total of 3999 U.S. young adults ages 18 to 30 years were recruited in 2016 through Internet-based crowdsourcing to complete measures taken from the 2012–2013 National Adult Tobacco Survey (NATS). Post-hoc sampling weights were created using procedures similar to the NATS. Weighted analyses were conducted in 2016 to compare crowdsourced and publicly-available 2012–2013 NATS data on demographics, tobacco use, and measures of tobacco perceptions and product warning label exposure. Results Those in the crowdsourced sample were less likely to report an annual household income of $50,000 or greater, and e-cigarette, waterpipe, and cigar use were more prevalent in the crowdsourced sample. High proportions of both samples indicated cigarette smoking is very harmful and very addictive. Comparable proportions of non-smokers and smokers reported cigarette warning label exposure, however the likelihood of reporting that smoking is very harmful by frequency of warning label exposure was lower among smokers in the crowdsourced sample. Conclusions Our findings indicate that crowdsourced samples may differ demographically and may not produce generalizable estimates of tobacco use prevalence relative to population data after post-hoc sample weighting. However, correlational analyses in crowdsourced samples may reasonably approximate population data. Future studies can build from this work by testing additional methodological strategies to improve crowdsourced sampling strategies.

KW - Crowdsourcing

KW - Tobacco control

KW - Young adult

UR - http://www.scopus.com/inward/record.url?scp=85023618571&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85023618571&partnerID=8YFLogxK

U2 - 10.1016/j.ypmed.2017.07.006

DO - 10.1016/j.ypmed.2017.07.006

M3 - Article

VL - 102

SP - 93

EP - 99

JO - Preventive Medicine

JF - Preventive Medicine

SN - 0091-7435

ER -