Creating full individual-level location timelines from sparse social media data

Nabeel Abdur Rehman, Kunal Relia, Rumi Chunara

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

In many domain applications, a continuous timeline of human locations is critical, for example for understanding where a disease may spread or how traffic flows. While data sources such as GPS trackers or Call Data Records are temporally rich, they are expensive, often not publicly available, or collected only in select locations, which restricts their wide use. Conversely, geo-located social media data are publicly and freely available, but their sparse nature presents challenges, especially for full timeline inference. We propose a stochastic framework, Intermediate Location Computing (ILC), which uses prior knowledge about human mobility patterns to predict every missing location in an individual’s social media timeline. We compare ILC with a state-of-the-art RNN baseline as well as methods that are optimized for next-location prediction only. For three major cities, ILC predicts the top-1 location for all missing locations in a timeline, at 1- and 2-hour resolution, with up to 77.2% accuracy (up to 6% better accuracy than all compared methods). ILC also outperforms the RNN in low-data settings: both with a very small number of users (under 50) and with more users but sparser timelines. In general, the RNN model needs a higher number of users to achieve the same performance as ILC. Overall, this work illustrates the tradeoff between prior knowledge of heuristics and more data, for the important societal problem of filling in entire timelines using freely available but sparse social media data.
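
To make the task in the abstract concrete, the sketch below sets up a toy version of it: a timeline is divided into hourly slots, only a few slots have an observed location, every remaining slot is filled in, and top-1 accuracy is measured on the filled slots. This is not ILC and not one of the paper's baselines. The fill rule (most frequent location for that hour of day, with an overall most-frequent fallback) is a naive heuristic assumed purely for illustration, and the names `fill_timeline` and `top1_accuracy`, the slot encoding, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def fill_timeline(observed, n_slots, slots_per_day=24):
    """Fill every unobserved slot of a sparse timeline.

    observed: dict mapping slot index -> location label (the sparse posts).
    Illustrative heuristic only (NOT the paper's ILC method): use the most
    frequent observed location for the same time of day, else the user's
    overall most frequent location.
    """
    if not observed:
        raise ValueError("need at least one observed location")

    by_time_of_day = defaultdict(Counter)
    overall = Counter()
    for slot, loc in observed.items():
        by_time_of_day[slot % slots_per_day][loc] += 1
        overall[loc] += 1
    fallback = overall.most_common(1)[0][0]

    filled = []
    for slot in range(n_slots):
        if slot in observed:
            # Keep observed locations unchanged.
            filled.append(observed[slot])
            continue
        counts = by_time_of_day.get(slot % slots_per_day)
        filled.append(counts.most_common(1)[0][0] if counts else fallback)
    return filled


def top1_accuracy(predicted, truth, slots):
    """Fraction of the given slots where the single predicted label is correct."""
    slots = list(slots)
    return sum(predicted[s] == truth[s] for s in slots) / len(slots)


# Toy data (assumed): two days at 1-hour resolution, "work" from 09:00 to 17:59.
truth = ["work" if 9 <= s % 24 <= 17 else "home" for s in range(48)]
observed = {8: "home", 10: "work", 20: "home", 32: "home", 34: "work"}

filled = fill_timeline(observed, n_slots=48)
missing = [s for s in range(48) if s not in observed]
accuracy = top1_accuracy(filled, truth, missing)
```

Accuracy is computed only over the slots that had to be inferred, which mirrors the abstract's framing of predicting "all missing locations" rather than only the next one. A 2-hour resolution would correspond to `slots_per_day=12` under this encoding.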

Original language: English (US)
Title of host publication: 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2018
Editors: Li Xiong, Roberto Tamassia, Kashani Farnoush Banaei, Ralf Hartmut Guting, Erik Hoel
Publisher: Association for Computing Machinery
Pages: 379-388
Number of pages: 10
ISBN (Electronic): 9781450358897
DOI: https://doi.org/10.1145/3274895.3274982
State: Published - Nov 6 2018
Event: 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2018 - Seattle, United States
Duration: Nov 6 2018 to Nov 9 2018

Other

Other: 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2018
Country: United States
City: Seattle
Period: 11/6/18 to 11/9/18

Keywords

  • Social media
  • Society
  • Sparse data
  • Spatial Information

ASJC Scopus subject areas

  • Earth-Surface Processes
  • Computer Science Applications
  • Modeling and Simulation
  • Computer Graphics and Computer-Aided Design
  • Information Systems

Cite this

Rehman, N. A., Relia, K., & Chunara, R. (2018). Creating full individual-level location timelines from sparse social media data. In L. Xiong, R. Tamassia, K. F. Banaei, R. H. Guting, & E. Hoel (Eds.), 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2018 (pp. 379-388). Association for Computing Machinery. https://doi.org/10.1145/3274895.3274982

@inproceedings{0bbb462ad65b455e8e11373290d7170a,
title = "Creating full individual-level location timelines from sparse social media data",
abstract = "In many domain applications, a continuous timeline of human locations is critical; for example for understanding possible locations where a disease may spread, or the flow of traffic. While data sources such as GPS trackers or Call Data Records are temporally-rich, they are expensive, often not publicly available or garnered only in select locations, restricting their wide use. Conversely, geo-located social media data are publicly and freely available, but present challenges especially for full timeline inference due to their sparse nature. We propose a stochastic framework, Intermediate Location Computing (ILC) which uses prior knowledge about human mobility patterns to predict every missing location from an individual’s social media timeline. We compare ILC with a state-of-the-art RNN baseline as well as methods that are optimized for next-location prediction only. For three major cities, ILC predicts the top 1 location for all missing locations in a timeline, at 1 and 2-hour resolution, with up to 77.2{\%} accuracy (up to 6{\%} better accuracy than all compared methods). Specifically, ILC also outperforms the RNN in settings of low data; both cases of very small number of users (under 50), as well as settings with more users, but with sparser timelines. In general, the RNN model needs a higher number of users to achieve the same performance as ILC. Overall, this work illustrates the tradeoff between prior knowledge of heuristics and more data, for an important societal problem of filling in entire timelines using freely available, but sparse social media data.",
keywords = "Social media, Society, Sparse data, Spatial Information",
author = "Rehman, {Nabeel Abdur} and Kunal Relia and Rumi Chunara",
year = "2018",
month = "11",
day = "6",
doi = "10.1145/3274895.3274982",
language = "English (US)",
pages = "379--388",
editor = "Li Xiong and Roberto Tamassia and Banaei, {Kashani Farnoush} and Guting, {Ralf Hartmut} and Erik Hoel",
booktitle = "26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2018",
publisher = "Association for Computing Machinery",

}

TY - GEN
T1 - Creating full individual-level location timelines from sparse social media data
AU - Rehman, Nabeel Abdur
AU - Relia, Kunal
AU - Chunara, Rumi
PY - 2018/11/6
Y1 - 2018/11/6
AB - In many domain applications, a continuous timeline of human locations is critical; for example for understanding possible locations where a disease may spread, or the flow of traffic. While data sources such as GPS trackers or Call Data Records are temporally-rich, they are expensive, often not publicly available or garnered only in select locations, restricting their wide use. Conversely, geo-located social media data are publicly and freely available, but present challenges especially for full timeline inference due to their sparse nature. We propose a stochastic framework, Intermediate Location Computing (ILC) which uses prior knowledge about human mobility patterns to predict every missing location from an individual’s social media timeline. We compare ILC with a state-of-the-art RNN baseline as well as methods that are optimized for next-location prediction only. For three major cities, ILC predicts the top 1 location for all missing locations in a timeline, at 1 and 2-hour resolution, with up to 77.2% accuracy (up to 6% better accuracy than all compared methods). Specifically, ILC also outperforms the RNN in settings of low data; both cases of very small number of users (under 50), as well as settings with more users, but with sparser timelines. In general, the RNN model needs a higher number of users to achieve the same performance as ILC. Overall, this work illustrates the tradeoff between prior knowledge of heuristics and more data, for an important societal problem of filling in entire timelines using freely available, but sparse social media data.
KW - Social media
KW - Society
KW - Sparse data
KW - Spatial Information
UR - http://www.scopus.com/inward/record.url?scp=85058658957&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058658957&partnerID=8YFLogxK
U2 - 10.1145/3274895.3274982
DO - 10.1145/3274895.3274982
M3 - Conference contribution
SP - 379
EP - 388
BT - 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2018
A2 - Xiong, Li
A2 - Tamassia, Roberto
A2 - Banaei, Kashani Farnoush
A2 - Guting, Ralf Hartmut
A2 - Hoel, Erik
PB - Association for Computing Machinery
ER -