Crowdsourcing for Food Purchase Receipt Annotation via Amazon Mechanical Turk

A Feasibility Study

Wenhua Lu, Alexandra Guttentag, Brian D. Elbel, Kamila Kiszko, Courtney Abrams, Thomas Kirchner

Research output: Contribution to journal › Article

Abstract

BACKGROUND: The decisions that individuals make about the food and beverage products they purchase and consume directly influence their energy intake and dietary quality and may lead to excess weight gain and obesity. However, gathering and interpreting data on food and beverage purchase patterns can be difficult. Leveraging novel sources of data on food and beverage purchase behavior can provide us with a more objective understanding of food consumption behaviors.

OBJECTIVE: Food and beverage purchase receipts often include time-stamped location information, which, when associated with product purchase details, can provide a useful behavioral measurement tool. The purpose of this study was to assess the feasibility, reliability, and validity of processing data from fast-food restaurant receipts using crowdsourcing via Amazon Mechanical Turk (MTurk).

METHODS: Between 2013 and 2014, receipts (N=12,165) from consumer purchases were collected at 60 different locations of five fast-food restaurant chains in New Jersey and New York City, USA (ie, Burger King, KFC, McDonald's, Subway, and Wendy's). Data containing the restaurant name, location, receipt ID, food items purchased, price, and other information were manually entered into an MS Access database and checked for accuracy by a second reviewer; this was considered the gold standard. To assess the feasibility of coding receipt data via MTurk, a prototype set of receipts (N=196) was selected. For each receipt, 5 turkers were asked to (1) identify the receipt identifier and the name of the restaurant and (2) indicate whether a beverage was listed in the receipt; if yes, they were to categorize the beverage as cold (eg, soda or energy drink) or hot (eg, coffee or tea). Interturker agreement for specific questions (eg, restaurant name and beverage inclusion) and agreement between turker consensus responses and the gold standard values in the manually entered dataset were calculated.

RESULTS: Among the 196 receipts completed by turkers, the interturker agreement was 100% (196/196) for restaurant names (eg, Burger King, McDonald's, and Subway), 98.5% (193/196) for beverage inclusion (ie, hot, cold, or none), 92.3% (181/196) for types of hot beverage (eg, hot coffee or hot tea), and 87.2% (171/196) for types of cold beverage (eg, Coke or bottled water). When compared with the gold standard data, the agreement level was 100% (196/196) for restaurant name, 99.5% (195/196) for beverage inclusion, and 99.5% (195/196) for beverage types.

CONCLUSIONS: Our findings indicated high interrater agreement for questions across difficulty levels (eg, single- vs binary- vs multiple-choice items). Compared with traditional methods for coding receipt data, MTurk can produce excellent-quality data in a lower-cost, more time-efficient manner.
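The abstract reports two different measures (interturker agreement and agreement between the turker consensus and the gold standard) without spelling out how each was computed. The sketch below shows one plausible way to compute both, assuming that "interturker agreement" means all 5 turkers gave the same answer and that the "turker consensus" is a simple majority vote, with ties treated as no consensus. The helper names (`consensus`, `unanimous`, `pct`, `summarize`), the tie rule, and the toy data are hypothetical illustrations, not the authors' code or study data.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical helpers: the abstract does not define these rules exactly.

def consensus(labels: List[str]) -> Optional[str]:
    """Majority-vote answer for one receipt/question; None if empty or tied for first place."""
    if not labels:
        return None
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # assumed tie rule: no consensus
    return ranked[0][0]

def unanimous(labels: List[str]) -> bool:
    """Assumed definition of interturker agreement: every turker gave the same answer."""
    return len(labels) > 0 and len(set(labels)) == 1

def pct(k: int, n: int) -> str:
    """Format a count in the abstract's style, e.g. '98.5% (193/196)'."""
    return f"{100 * k / n:.1f}% ({k}/{n})"

def summarize(responses: Dict[str, List[str]], gold: Dict[str, str]) -> Tuple[str, str]:
    """Return (interturker agreement, consensus-vs-gold agreement) over all receipts."""
    n = len(responses)
    agreed = sum(unanimous(answers) for answers in responses.values())
    matched = sum(consensus(answers) == gold[rid] for rid, answers in responses.items())
    return pct(agreed, n), pct(matched, n)

# Toy example (invented for illustration): beverage inclusion coded as hot/cold/none.
responses = {
    "r1": ["cold"] * 5,
    "r2": ["hot", "hot", "hot", "hot", "cold"],
    "r3": ["none"] * 5,
}
gold = {"r1": "cold", "r2": "hot", "r3": "none"}
print(summarize(responses, gold))  # ('66.7% (2/3)', '100.0% (3/3)')
print(pct(193, 196))               # 98.5% (193/196)
```

Under these assumptions the two measures can diverge: a receipt on which four of five turkers agree counts against unanimity but can still yield a correct majority answer, which is one way interturker agreement can fall below agreement with the gold standard.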

Original language: English (US)
Pages (from-to): e12047
Journal: Journal of Medical Internet Research
Volume: 21
Issue number: 4
DOI: 10.2196/12047
State: Published - Apr 5 2019

Keywords

  • Amazon Mechanical Turk
  • crowdsourcing
  • feasibility
  • food purchase receipt
  • reliability
  • validity

ASJC Scopus subject areas

  • Health Informatics

Cite this

Crowdsourcing for Food Purchase Receipt Annotation via Amazon Mechanical Turk: A Feasibility Study. / Lu, Wenhua; Guttentag, Alexandra; Elbel, Brian D.; Kiszko, Kamila; Abrams, Courtney; Kirchner, Thomas.

In: Journal of Medical Internet Research, Vol. 21, No. 4, 05.04.2019, p. e12047.

Lu, Wenhua; Guttentag, Alexandra; Elbel, Brian D.; Kiszko, Kamila; Abrams, Courtney; Kirchner, Thomas. / Crowdsourcing for Food Purchase Receipt Annotation via Amazon Mechanical Turk: A Feasibility Study. In: Journal of Medical Internet Research. 2019; Vol. 21, No. 4. p. e12047.

@article{a084c947a41f494e93ab06d533460294,
title = "Crowdsourcing for Food Purchase Receipt Annotation via Amazon Mechanical Turk: A Feasibility Study",
abstract = "BACKGROUND: The decisions that individuals make about the food and beverage products they purchase and consume directly influence their energy intake and dietary quality and may lead to excess weight gain and obesity. However, gathering and interpreting data on food and beverage purchase patterns can be difficult. Leveraging novel sources of data on food and beverage purchase behavior can provide us with a more objective understanding of food consumption behaviors. OBJECTIVE: Food and beverage purchase receipts often include time-stamped location information, which, when associated with product purchase details, can provide a useful behavioral measurement tool. The purpose of this study was to assess the feasibility, reliability, and validity of processing data from fast-food restaurant receipts using crowdsourcing via Amazon Mechanical Turk (MTurk). METHODS: Between 2013 and 2014, receipts (N=12,165) from consumer purchases were collected at 60 different locations of five fast-food restaurant chains in New Jersey and New York City, USA (ie, Burger King, KFC, McDonald's, Subway, and Wendy's). Data containing the restaurant name, location, receipt ID, food items purchased, price, and other information were manually entered into an MS Access database and checked for accuracy by a second reviewer; this was considered the gold standard. To assess the feasibility of coding receipt data via MTurk, a prototype set of receipts (N=196) was selected. For each receipt, 5 turkers were asked to (1) identify the receipt identifier and the name of the restaurant and (2) indicate whether a beverage was listed in the receipt; if yes, they were to categorize the beverage as cold (eg, soda or energy drink) or hot (eg, coffee or tea). Interturker agreement for specific questions (eg, restaurant name and beverage inclusion) and agreement between turker consensus responses and the gold standard values in the manually entered dataset were calculated. 
RESULTS: Among the 196 receipts completed by turkers, the interturker agreement was 100{\%} (196/196) for restaurant names (eg, Burger King, McDonald's, and Subway), 98.5{\%} (193/196) for beverage inclusion (ie, hot, cold, or none), 92.3{\%} (181/196) for types of hot beverage (eg, hot coffee or hot tea), and 87.2{\%} (171/196) for types of cold beverage (eg, Coke or bottled water). When compared with the gold standard data, the agreement level was 100{\%} (196/196) for restaurant name, 99.5{\%} (195/196) for beverage inclusion, and 99.5{\%} (195/196) for beverage types. CONCLUSIONS: Our findings indicated high interrater agreement for questions across difficulty levels (eg, single- vs binary- vs multiple-choice items). Compared with traditional methods for coding receipt data, MTurk can produce excellent-quality data in a lower-cost, more time-efficient manner.",
keywords = "Amazon Mechanical Turk, crowdsourcing, feasibility, food purchase receipt, reliability, validity",
author = "Wenhua Lu and Alexandra Guttentag and Elbel, {Brian D.} and Kamila Kiszko and Courtney Abrams and Thomas Kirchner",
year = "2019",
month = "4",
day = "5",
doi = "10.2196/12047",
language = "English (US)",
volume = "21",
pages = "e12047",
journal = "Journal of Medical Internet Research",
issn = "1439-4456",
publisher = "JMIR Publications Inc.",
number = "4",

}

TY - JOUR

T1 - Crowdsourcing for Food Purchase Receipt Annotation via Amazon Mechanical Turk

T2 - A Feasibility Study

AU - Lu, Wenhua

AU - Guttentag, Alexandra

AU - Elbel, Brian D.

AU - Kiszko, Kamila

AU - Abrams, Courtney

AU - Kirchner, Thomas

PY - 2019/4/5

Y1 - 2019/4/5

AB - BACKGROUND: The decisions that individuals make about the food and beverage products they purchase and consume directly influence their energy intake and dietary quality and may lead to excess weight gain and obesity. However, gathering and interpreting data on food and beverage purchase patterns can be difficult. Leveraging novel sources of data on food and beverage purchase behavior can provide us with a more objective understanding of food consumption behaviors. OBJECTIVE: Food and beverage purchase receipts often include time-stamped location information, which, when associated with product purchase details, can provide a useful behavioral measurement tool. The purpose of this study was to assess the feasibility, reliability, and validity of processing data from fast-food restaurant receipts using crowdsourcing via Amazon Mechanical Turk (MTurk). METHODS: Between 2013 and 2014, receipts (N=12,165) from consumer purchases were collected at 60 different locations of five fast-food restaurant chains in New Jersey and New York City, USA (ie, Burger King, KFC, McDonald's, Subway, and Wendy's). Data containing the restaurant name, location, receipt ID, food items purchased, price, and other information were manually entered into an MS Access database and checked for accuracy by a second reviewer; this was considered the gold standard. To assess the feasibility of coding receipt data via MTurk, a prototype set of receipts (N=196) was selected. For each receipt, 5 turkers were asked to (1) identify the receipt identifier and the name of the restaurant and (2) indicate whether a beverage was listed in the receipt; if yes, they were to categorize the beverage as cold (eg, soda or energy drink) or hot (eg, coffee or tea). Interturker agreement for specific questions (eg, restaurant name and beverage inclusion) and agreement between turker consensus responses and the gold standard values in the manually entered dataset were calculated. 
RESULTS: Among the 196 receipts completed by turkers, the interturker agreement was 100% (196/196) for restaurant names (eg, Burger King, McDonald's, and Subway), 98.5% (193/196) for beverage inclusion (ie, hot, cold, or none), 92.3% (181/196) for types of hot beverage (eg, hot coffee or hot tea), and 87.2% (171/196) for types of cold beverage (eg, Coke or bottled water). When compared with the gold standard data, the agreement level was 100% (196/196) for restaurant name, 99.5% (195/196) for beverage inclusion, and 99.5% (195/196) for beverage types. CONCLUSIONS: Our findings indicated high interrater agreement for questions across difficulty levels (eg, single- vs binary- vs multiple-choice items). Compared with traditional methods for coding receipt data, MTurk can produce excellent-quality data in a lower-cost, more time-efficient manner.

KW - Amazon Mechanical Turk

KW - crowdsourcing

KW - feasibility

KW - food purchase receipt

KW - reliability

KW - validity

UR - http://www.scopus.com/inward/record.url?scp=85064326098&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85064326098&partnerID=8YFLogxK

U2 - 10.2196/12047

DO - 10.2196/12047

M3 - Article

VL - 21

SP - e12047

JO - Journal of Medical Internet Research

JF - Journal of Medical Internet Research

SN - 1439-4456

IS - 4

ER -