Real-Time understanding of humanitarian crises via targeted information retrieval

K. T. Pham, P. Sattigeri, A. Dhurandhar, A. C. Jacob, M. Vukovic, P. Chataigner, Juliana Freire, A. Mojsilovic, K. R. Varshney

Research output: Contribution to journal (Article)

Abstract

Humanitarian relief agencies must assess humanitarian crises occurring in the world to prioritize the aid that can be offered. While the rapidly growing availability of relevant information enables better decisions to be made, it also creates an important challenge: How to find, collect, and categorize this information in a timely manner. To address the problem, we propose a targeted retrieval system that automates these tasks. The system uses historical data collected and labeled by subject matter experts to train a classifier that identifies relevant content. Using this classifier, it deploys a focused crawler to locate and retrieve data at scale. The system also incorporates feedback from subject matter experts to adapt to new concepts and information sources. A novel component of the system is an algorithm for re-crawling that improves the crawler efficiency in retrieving recent data. Our preliminary result shows that the algorithm can increase the freshness of collected data while simultaneously decreasing crawling effort. Furthermore, we show that focused crawling outperforms general crawling in this domain. Our initial prototype has received positive feedback from analysts at the Assessment Capacities Project, a humanitarian response agency.
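The abstract's pairing of a relevance classifier with a focused crawler can be illustrated with a minimal sketch. This is not the authors' implementation: the in-memory TOY_WEB graph, the keyword-based relevance function standing in for the trained classifier, and the names focused_crawl, budget, and threshold are all assumptions made for illustration. The re-crawling algorithm is not sketched, since the abstract gives no detail about it.

```python
import heapq

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP instead.
TOY_WEB = {
    "seed": ("humanitarian news portal", ["a", "b", "c"]),
    "a": ("flood displaces thousands, relief aid needed", ["d"]),
    "b": ("football scores and transfer rumours", ["e"]),
    "c": ("drought and food crisis worsen in the region", ["f"]),
    "d": ("refugee camp faces cholera outbreak, aid delayed", []),
    "e": ("celebrity gossip of the week", []),
    "f": ("earthquake relief: shelters opened for displaced families", []),
}

# Assumed stand-in for the trained classifier: a fixed keyword list.
CRISIS_TERMS = {"flood", "relief", "aid", "drought", "crisis", "refugee",
                "cholera", "earthquake", "displaced", "humanitarian"}


def relevance(text):
    """Return the fraction of crisis terms that occur in the text."""
    words = set(text.lower().replace(",", " ").replace(":", " ").split())
    return len(words & CRISIS_TERMS) / len(CRISIS_TERMS)


def focused_crawl(seed, budget, threshold=0.15):
    """Visit at most `budget` pages, following links from relevant pages first."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    visited, relevant = [], []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        text, links = TOY_WEB[url]
        visited.append(url)
        score = relevance(text)
        if score >= threshold:
            relevant.append(url)
        # Each outgoing link inherits the score of the page that links to it.
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return visited, relevant


visited, relevant = focused_crawl("seed", budget=6)
print(visited)   # pages fetched, in crawl order
print(relevant)  # pages the stand-in classifier accepted
```

With a budget of six pages, the sketch fetches every page except "e", which is reachable only through the off-topic page "b". This is the sense in which a focused crawler spends its effort on links that are more likely to lead to relevant content.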

Original language: English (US)
Article number: 8167726
Pages (from-to): 71-712
Number of pages: 642
Journal: IBM Journal of Research and Development
Volume: 61
Issue number: 6
DOIs: https://doi.org/10.1147/JRD.2017.2722799
State: Published - Nov 1 2017

Fingerprint

Information retrieval
Classifiers
Feedback
Availability

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Pham, K. T., Sattigeri, P., Dhurandhar, A., Jacob, A. C., Vukovic, M., Chataigner, P., ... Varshney, K. R. (2017). Real-Time understanding of humanitarian crises via targeted information retrieval. IBM Journal of Research and Development, 61(6), 71-712. [8167726]. https://doi.org/10.1147/JRD.2017.2722799

Real-Time understanding of humanitarian crises via targeted information retrieval. / Pham, K. T.; Sattigeri, P.; Dhurandhar, A.; Jacob, A. C.; Vukovic, M.; Chataigner, P.; Freire, Juliana; Mojsilovic, A.; Varshney, K. R.

In: IBM Journal of Research and Development, Vol. 61, No. 6, 8167726, 01.11.2017, p. 71-712.

Pham, KT, Sattigeri, P, Dhurandhar, A, Jacob, AC, Vukovic, M, Chataigner, P, Freire, J, Mojsilovic, A & Varshney, KR 2017, 'Real-Time understanding of humanitarian crises via targeted information retrieval', IBM Journal of Research and Development, vol. 61, no. 6, 8167726, pp. 71-712. https://doi.org/10.1147/JRD.2017.2722799

Pham KT, Sattigeri P, Dhurandhar A, Jacob AC, Vukovic M, Chataigner P et al. Real-Time understanding of humanitarian crises via targeted information retrieval. IBM Journal of Research and Development. 2017 Nov 1;61(6):71-712. 8167726. https://doi.org/10.1147/JRD.2017.2722799

Pham, K. T.; Sattigeri, P.; Dhurandhar, A.; Jacob, A. C.; Vukovic, M.; Chataigner, P.; Freire, Juliana; Mojsilovic, A.; Varshney, K. R. / Real-Time understanding of humanitarian crises via targeted information retrieval. In: IBM Journal of Research and Development. 2017; Vol. 61, No. 6. pp. 71-712.
@article{718839da077949d9a1b20f9813e39e32,
title = "Real-Time understanding of humanitarian crises via targeted information retrieval",
abstract = "Humanitarian relief agencies must assess humanitarian crises occurring in the world to prioritize the aid that can be offered. While the rapidly growing availability of relevant information enables better decisions to be made, it also creates an important challenge: How to find, collect, and categorize this information in a timely manner. To address the problem, we propose a targeted retrieval system that automates these tasks. The system uses historical data collected and labeled by subject matter experts to train a classifier that identifies relevant content. Using this classifier, it deploys a focused crawler to locate and retrieve data at scale. The system also incorporates feedback from subject matter experts to adapt to new concepts and information sources. A novel component of the system is an algorithm for re-crawling that improves the crawler efficiency in retrieving recent data. Our preliminary result shows that the algorithm can increase the freshness of collected data while simultaneously decreasing crawling effort. Furthermore, we show that focused crawling outperforms general crawling in this domain. Our initial prototype has received positive feedback from analysts at the Assessment Capacities Project, a humanitarian response agency.",
author = "Pham, {K. T.} and P. Sattigeri and A. Dhurandhar and Jacob, {A. C.} and M. Vukovic and P. Chataigner and Juliana Freire and A. Mojsilovic and Varshney, {K. R.}",
year = "2017",
month = "11",
day = "1",
doi = "10.1147/JRD.2017.2722799",
language = "English (US)",
volume = "61",
pages = "71--712",
journal = "IBM Journal of Research and Development",
issn = "0018-8646",
publisher = "IBM Corporation",
number = "6",

}

TY - JOUR
T1 - Real-Time understanding of humanitarian crises via targeted information retrieval
AU - Pham, K. T.
AU - Sattigeri, P.
AU - Dhurandhar, A.
AU - Jacob, A. C.
AU - Vukovic, M.
AU - Chataigner, P.
AU - Freire, Juliana
AU - Mojsilovic, A.
AU - Varshney, K. R.
PY - 2017/11/1
Y1 - 2017/11/1

N2 - Humanitarian relief agencies must assess humanitarian crises occurring in the world to prioritize the aid that can be offered. While the rapidly growing availability of relevant information enables better decisions to be made, it also creates an important challenge: How to find, collect, and categorize this information in a timely manner. To address the problem, we propose a targeted retrieval system that automates these tasks. The system uses historical data collected and labeled by subject matter experts to train a classifier that identifies relevant content. Using this classifier, it deploys a focused crawler to locate and retrieve data at scale. The system also incorporates feedback from subject matter experts to adapt to new concepts and information sources. A novel component of the system is an algorithm for re-crawling that improves the crawler efficiency in retrieving recent data. Our preliminary result shows that the algorithm can increase the freshness of collected data while simultaneously decreasing crawling effort. Furthermore, we show that focused crawling outperforms general crawling in this domain. Our initial prototype has received positive feedback from analysts at the Assessment Capacities Project, a humanitarian response agency.

AB - Humanitarian relief agencies must assess humanitarian crises occurring in the world to prioritize the aid that can be offered. While the rapidly growing availability of relevant information enables better decisions to be made, it also creates an important challenge: How to find, collect, and categorize this information in a timely manner. To address the problem, we propose a targeted retrieval system that automates these tasks. The system uses historical data collected and labeled by subject matter experts to train a classifier that identifies relevant content. Using this classifier, it deploys a focused crawler to locate and retrieve data at scale. The system also incorporates feedback from subject matter experts to adapt to new concepts and information sources. A novel component of the system is an algorithm for re-crawling that improves the crawler efficiency in retrieving recent data. Our preliminary result shows that the algorithm can increase the freshness of collected data while simultaneously decreasing crawling effort. Furthermore, we show that focused crawling outperforms general crawling in this domain. Our initial prototype has received positive feedback from analysts at the Assessment Capacities Project, a humanitarian response agency.

UR - http://www.scopus.com/inward/record.url?scp=85038819553&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85038819553&partnerID=8YFLogxK
U2 - 10.1147/JRD.2017.2722799
DO - 10.1147/JRD.2017.2722799
M3 - Article
AN - SCOPUS:85038819553
VL - 61
SP - 71
EP - 712
JO - IBM Journal of Research and Development
JF - IBM Journal of Research and Development
SN - 0018-8646
IS - 6
M1 - 8167726
ER -