Extracting signals from news streams for disease outbreak prediction

Sunandan Chakraborty, Lakshminarayanan Subramanian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The emergence of digital news provides new opportunities for information extraction. Proper characterization of unstructured news can help identify signals that may drive variations in many observable phenomena, such as disease outbreaks. In this paper, we propose a method to extract such signals from a large corpus of news events and to identify a subset of signals that are closely related to the observed phenomenon. We show how the words appearing in a large news corpus can be represented, and how latent features can be extracted from them to build predictive models. We build and evaluate such a system specifically for characterizing and predicting disease outbreaks in India. We focus on 5 diseases prevalent in India, and experiments show that our model can predict outbreaks 2 to 4 weeks in advance, with an average precision of around 0.80 and recall of around 0.65. We also compare our model with an LDA-based baseline model, over which it shows an improvement of around 5-14% across the different diseases.
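The abstract does not say how news-derived signals are matched to outbreaks. As a rough illustration only, the Python sketch below shows one generic approach: correlate each word's weekly frequency with weekly case counts shifted 2 to 4 weeks ahead, keep the words whose best lagged correlation clears a threshold, and score flagged outbreak weeks with precision and recall. This is not the authors' method (which extracts latent features and is compared against an LDA-based baseline); the function names, the 0.5 threshold and the toy data are hypothetical assumptions.

```python
from math import sqrt

# Hypothetical sketch: lagged-correlation signal selection for weekly series.
# Not the method from the paper; names, threshold and data are illustrative.


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def lagged_correlation(word_series, case_series, lag):
    """Correlate word frequency in week t with case counts in week t + lag."""
    if len(word_series) != len(case_series):
        raise ValueError("series must cover the same weeks")
    if not 0 < lag < len(case_series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return pearson(word_series[:-lag], case_series[lag:])


def select_signals(word_counts, case_series, lags=(2, 3, 4), threshold=0.5):
    """Keep words whose best lagged correlation with cases reaches the threshold.

    Returns {word: (best_lag, best_r)}. The default lags mirror the 2-4 week
    horizon in the abstract; the threshold is an arbitrary assumption.
    """
    selected = {}
    for word, series in word_counts.items():
        best_lag, best_r = max(
            ((lag, lagged_correlation(series, case_series, lag)) for lag in lags),
            key=lambda pair: pair[1],
        )
        if best_r >= threshold:
            selected[word] = (best_lag, best_r)
    return selected


def precision_recall(predicted_weeks, outbreak_weeks):
    """Precision and recall of flagged outbreak weeks against observed ones."""
    hits = len(predicted_weeks & outbreak_weeks)
    precision = hits / len(predicted_weeks) if predicted_weeks else 0.0
    recall = hits / len(outbreak_weeks) if outbreak_weeks else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy weekly data: "fever" mentions lead reported cases by exactly 2 weeks.
    cases = [0, 0, 1, 5, 9, 3, 1, 0, 0, 2, 8, 4]
    words = {
        "fever": [1, 5, 9, 3, 1, 0, 0, 2, 8, 4, 0, 0],
        "cricket": [3] * 12,  # constant series, carries no signal
    }
    print(select_signals(words, cases))  # only "fever" is kept, at lag 2
    print(precision_recall({3, 4, 10}, {3, 4, 5, 10, 11}))  # (1.0, 0.6)
```

In the toy run, the constant "cricket" series is discarded and "fever" is kept with a best lag of 2 weeks. Scanning a large vocabulary this way will also surface spurious correlations, so any selected signals need evaluation on held-out weeks.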

Original language: English (US)
Title of host publication: 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1300-1304
Number of pages: 5
ISBN (Electronic): 9781509045457
DOI: 10.1109/GlobalSIP.2016.7906051
State: Published - Apr 19 2017
Event: 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Washington, United States
Duration: Dec 7 2016 - Dec 9 2016

ASJC Scopus subject areas

  • Signal Processing
  • Computer Networks and Communications

Cite this

Chakraborty, S., & Subramanian, L. (2017). Extracting signals from news streams for disease outbreak prediction. In 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Proceedings (pp. 1300-1304). [7906051] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/GlobalSIP.2016.7906051

Extracting signals from news streams for disease outbreak prediction. / Chakraborty, Sunandan; Subramanian, Lakshminarayanan.

2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2017. p. 1300-1304 7906051.

Chakraborty, S & Subramanian, L 2017, Extracting signals from news streams for disease outbreak prediction. in 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Proceedings, 7906051, Institute of Electrical and Electronics Engineers Inc., pp. 1300-1304, 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016, Washington, United States, 12/7/16. https://doi.org/10.1109/GlobalSIP.2016.7906051

Chakraborty S, Subramanian L. Extracting signals from news streams for disease outbreak prediction. In 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2017. p. 1300-1304. 7906051 https://doi.org/10.1109/GlobalSIP.2016.7906051

Chakraborty, Sunandan; Subramanian, Lakshminarayanan. / Extracting signals from news streams for disease outbreak prediction. 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2017. pp. 1300-1304

@inproceedings{c304232c015d4cbe80052996d5c396ae,
title = "Extracting signals from news streams for disease outbreak prediction",
abstract = "The emergence of digital news provides new opportunities for information extraction. Proper characterization of unstructured news can help identify signals that may drive variations in many observable phenomena, such as disease outbreaks. In this paper, we propose a method to extract such signals from a large corpus of news events and to identify a subset of signals that are closely related to the observed phenomenon. We show how the words appearing in a large news corpus can be represented, and how latent features can be extracted from them to build predictive models. We build and evaluate such a system specifically for characterizing and predicting disease outbreaks in India. We focus on 5 diseases prevalent in India, and experiments show that our model can predict outbreaks 2 to 4 weeks in advance, with an average precision of around 0.80 and recall of around 0.65. We also compare our model with an LDA-based baseline model, over which it shows an improvement of around 5-14{\%} across the different diseases.",
author = "Sunandan Chakraborty and Lakshminarayanan Subramanian",
year = "2017",
month = "4",
day = "19",
doi = "10.1109/GlobalSIP.2016.7906051",
language = "English (US)",
pages = "1300--1304",
booktitle = "2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - Extracting signals from news streams for disease outbreak prediction

AU - Chakraborty, Sunandan

AU - Subramanian, Lakshminarayanan

PY - 2017/4/19

Y1 - 2017/4/19

N2 - The emergence of digital news provides new opportunities for information extraction. Proper characterization of unstructured news can help identify signals that may drive variations in many observable phenomena, such as disease outbreaks. In this paper, we propose a method to extract such signals from a large corpus of news events and to identify a subset of signals that are closely related to the observed phenomenon. We show how the words appearing in a large news corpus can be represented, and how latent features can be extracted from them to build predictive models. We build and evaluate such a system specifically for characterizing and predicting disease outbreaks in India. We focus on 5 diseases prevalent in India, and experiments show that our model can predict outbreaks 2 to 4 weeks in advance, with an average precision of around 0.80 and recall of around 0.65. We also compare our model with an LDA-based baseline model, over which it shows an improvement of around 5-14% across the different diseases.

AB - The emergence of digital news provides new opportunities for information extraction. Proper characterization of unstructured news can help identify signals that may drive variations in many observable phenomena, such as disease outbreaks. In this paper, we propose a method to extract such signals from a large corpus of news events and to identify a subset of signals that are closely related to the observed phenomenon. We show how the words appearing in a large news corpus can be represented, and how latent features can be extracted from them to build predictive models. We build and evaluate such a system specifically for characterizing and predicting disease outbreaks in India. We focus on 5 diseases prevalent in India, and experiments show that our model can predict outbreaks 2 to 4 weeks in advance, with an average precision of around 0.80 and recall of around 0.65. We also compare our model with an LDA-based baseline model, over which it shows an improvement of around 5-14% across the different diseases.

UR - http://www.scopus.com/inward/record.url?scp=85019170631&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85019170631&partnerID=8YFLogxK

U2 - 10.1109/GlobalSIP.2016.7906051

DO - 10.1109/GlobalSIP.2016.7906051

M3 - Conference contribution

AN - SCOPUS:85019170631

SP - 1300

EP - 1304

BT - 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

ER -