Text classification for automatic detection of e-cigarette use and use for smoking cessation from Twitter: A feasibility pilot

Yin Aphinyanaphongs, Armine Lulejian, Duncan Penfold Brown, Richard Bonneau, Paul Krebs

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Rapid increases in e-cigarette use and potential exposure to harmful byproducts have shifted public health focus to e-cigarettes as a possible drug of abuse. Effective surveillance of use and prevalence would allow appropriate regulatory responses. An ideal surveillance system would collect usage data in real time, focus on populations of interest, include populations unable to take surveys, support a broad range of questions, and enable geolocation analysis. Social media streams may provide such a system. To realize this use case, a foundational question is whether we can detect e-cigarette use at all. This work reports two pilot tasks that use text classification to automatically identify tweets indicating e-cigarette use and/or e-cigarette use for smoking cessation. We build and define a dataset for each task and compare the performance of four state-of-the-art classifiers and a keyword search on each. Our results demonstrate excellent classifier performance, with an area under the curve (AUC) of up to 0.90 and 0.94 in the two categories. These promising initial results form the foundation for further studies to realize the ideal surveillance solution.
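The abstract compares learned classifiers against a keyword search and reports results as area under the curve. The record does not name the classifiers or the keywords, so the following is only a minimal sketch under stated assumptions: a hypothetical keyword list (`ECIG_KEYWORDS`), a hypothetical hit/miss scorer (`keyword_score`), and AUC computed in its pairwise (Mann-Whitney) form, where AUC is the probability that a randomly chosen positive tweet scores higher than a randomly chosen negative one, with ties counted as one half. The tweets and labels are invented for illustration and are not data from the paper.

```python
from typing import Iterable, Sequence

# Hypothetical keyword list for illustration only; the paper's keywords are not given here.
ECIG_KEYWORDS = ("e-cig", "ecig", "vape", "vaping", "e-cigarette")


def keyword_score(tweet: str, keywords: Iterable[str] = ECIG_KEYWORDS) -> float:
    """Keyword baseline: 1.0 if any keyword occurs in the lowercased tweet, else 0.0."""
    text = tweet.lower()
    return 1.0 if any(k in text for k in keywords) else 0.0


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented toy annotations: 1 = the tweet indicates the author uses e-cigarettes.
tweets = [
    "Just switched to my new vape, loving it",
    "Quit smoking thanks to my e-cig",
    "Traffic is terrible today",
    "New vape shop opening downtown next week",  # keyword hit, but not personal use
    "Coffee first, then work",
]
labels = [1, 1, 0, 0, 0]

scores = [keyword_score(t) for t in tweets]
print(f"keyword-baseline AUC: {roc_auc(labels, scores):.3f}")  # prints 0.833
```

Because a keyword search only produces hit or miss, its ROC curve has a single operating point, and an advertisement that merely mentions a vape shop counts as a false positive. A probabilistic classifier instead ranks every tweet on a continuous score, which is one reason AUC is a natural metric for comparing it against the keyword baseline.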

Original language: English (US)
Title of host publication: Pacific Symposium on Biocomputing 2016, PSB 2016
Publisher: World Scientific Publishing Co. Pte Ltd
Pages: 480-491
Number of pages: 12
State: Published - 2016
Event: 21st Pacific Symposium on Biocomputing, PSB 2016, Big Island, United States
Duration: Jan 4 2016 to Jan 8 2016

Other

Other: 21st Pacific Symposium on Biocomputing, PSB 2016
Country: United States
City: Big Island
Period: 1/4/16 to 1/8/16

Fingerprint

Classifiers
Public health
Byproducts

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Biomedical Engineering

Cite this

Aphinyanaphongs, Y., Lulejian, A., Brown, D. P., Bonneau, R., & Krebs, P. (2016). Text classification for automatic detection of e-cigarette use and use for smoking cessation from Twitter: A feasibility pilot. In Pacific Symposium on Biocomputing 2016, PSB 2016 (pp. 480-491). World Scientific Publishing Co. Pte Ltd.

Text classification for automatic detection of e-cigarette use and use for smoking cessation from Twitter: A feasibility pilot. / Aphinyanaphongs, Yin; Lulejian, Armine; Brown, Duncan Penfold; Bonneau, Richard; Krebs, Paul.

Pacific Symposium on Biocomputing 2016, PSB 2016. World Scientific Publishing Co. Pte Ltd, 2016. p. 480-491.

Aphinyanaphongs, Y, Lulejian, A, Brown, DP, Bonneau, R & Krebs, P 2016, Text classification for automatic detection of e-cigarette use and use for smoking cessation from Twitter: A feasibility pilot. in Pacific Symposium on Biocomputing 2016, PSB 2016. World Scientific Publishing Co. Pte Ltd, pp. 480-491, 21st Pacific Symposium on Biocomputing, PSB 2016, Big Island, United States, 1/4/16.
Aphinyanaphongs Y, Lulejian A, Brown DP, Bonneau R, Krebs P. Text classification for automatic detection of e-cigarette use and use for smoking cessation from Twitter: A feasibility pilot. In Pacific Symposium on Biocomputing 2016, PSB 2016. World Scientific Publishing Co. Pte Ltd. 2016. p. 480-491
Aphinyanaphongs, Yin ; Lulejian, Armine ; Brown, Duncan Penfold ; Bonneau, Richard ; Krebs, Paul. / Text classification for automatic detection of e-cigarette use and use for smoking cessation from Twitter: A feasibility pilot. Pacific Symposium on Biocomputing 2016, PSB 2016. World Scientific Publishing Co. Pte Ltd, 2016. pp. 480-491
@inproceedings{2492246103e14d478e7a8f0828c9b5dd,
title = "Text classification for automatic detection of e-cigarette use and use for smoking cessation from {Twitter}: A feasibility pilot",
abstract = "Rapid increases in e-cigarette use and potential exposure to harmful byproducts have shifted public health focus to e-cigarettes as a possible drug of abuse. Effective surveillance of use and prevalence would allow appropriate regulatory responses. An ideal surveillance system would collect usage data in real time, focus on populations of interest, include populations unable to take surveys, support a broad range of questions, and enable geolocation analysis. Social media streams may provide such a system. To realize this use case, a foundational question is whether we can detect e-cigarette use at all. This work reports two pilot tasks that use text classification to automatically identify tweets indicating e-cigarette use and/or e-cigarette use for smoking cessation. We build and define a dataset for each task and compare the performance of four state-of-the-art classifiers and a keyword search on each. Our results demonstrate excellent classifier performance, with an area under the curve (AUC) of up to 0.90 and 0.94 in the two categories. These promising initial results form the foundation for further studies to realize the ideal surveillance solution.",
author = "Yin Aphinyanaphongs and Armine Lulejian and Brown, {Duncan Penfold} and Richard Bonneau and Paul Krebs",
year = "2016",
language = "English (US)",
pages = "480--491",
booktitle = "Pacific Symposium on Biocomputing 2016, PSB 2016",
publisher = "World Scientific Publishing Co. Pte Ltd",
address = "Singapore",

}

TY  - GEN
T1  - Text classification for automatic detection of e-cigarette use and use for smoking cessation from Twitter: A feasibility pilot
AU  - Aphinyanaphongs, Yin
AU  - Lulejian, Armine
AU  - Brown, Duncan Penfold
AU  - Bonneau, Richard
AU  - Krebs, Paul
PY  - 2016
Y1  - 2016
N2  - Rapid increases in e-cigarette use and potential exposure to harmful byproducts have shifted public health focus to e-cigarettes as a possible drug of abuse. Effective surveillance of use and prevalence would allow appropriate regulatory responses. An ideal surveillance system would collect usage data in real time, focus on populations of interest, include populations unable to take surveys, support a broad range of questions, and enable geolocation analysis. Social media streams may provide such a system. To realize this use case, a foundational question is whether we can detect e-cigarette use at all. This work reports two pilot tasks that use text classification to automatically identify tweets indicating e-cigarette use and/or e-cigarette use for smoking cessation. We build and define a dataset for each task and compare the performance of four state-of-the-art classifiers and a keyword search on each. Our results demonstrate excellent classifier performance, with an area under the curve (AUC) of up to 0.90 and 0.94 in the two categories. These promising initial results form the foundation for further studies to realize the ideal surveillance solution.
AB  - Rapid increases in e-cigarette use and potential exposure to harmful byproducts have shifted public health focus to e-cigarettes as a possible drug of abuse. Effective surveillance of use and prevalence would allow appropriate regulatory responses. An ideal surveillance system would collect usage data in real time, focus on populations of interest, include populations unable to take surveys, support a broad range of questions, and enable geolocation analysis. Social media streams may provide such a system. To realize this use case, a foundational question is whether we can detect e-cigarette use at all. This work reports two pilot tasks that use text classification to automatically identify tweets indicating e-cigarette use and/or e-cigarette use for smoking cessation. We build and define a dataset for each task and compare the performance of four state-of-the-art classifiers and a keyword search on each. Our results demonstrate excellent classifier performance, with an area under the curve (AUC) of up to 0.90 and 0.94 in the two categories. These promising initial results form the foundation for further studies to realize the ideal surveillance solution.
UR  - http://www.scopus.com/inward/record.url?scp=85012206641&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85012206641&partnerID=8YFLogxK
M3  - Conference contribution
C2  - 26776211
AN  - SCOPUS:85012206641
SP  - 480
EP  - 491
BT  - Pacific Symposium on Biocomputing 2016, PSB 2016
PB  - World Scientific Publishing Co. Pte Ltd
ER  - 