Analysis of Traffic Crashes Involving Pedestrians Using Big Data: Investigation of Contributing Factors and Identification of Hotspots

Kun Xie, Kaan Ozbay, Abdullah Kurkcu, Hong Yang

Research output: Contribution to journal › Article

Abstract

This study explores the potential of big data for advancing pedestrian risk analysis, including the investigation of contributing factors and the identification of hotspots. Massive amounts of data on Manhattan from a variety of sources were collected, integrated, and processed, including taxi trips, subway turnstile counts, traffic volumes, road network, land use, sociodemographic, and social media data. The whole study area was uniformly split into grid cells, which served as the basic geographical units of analysis. This cell-structured framework makes it easy to incorporate rich and diversified data into risk analysis. The cost of each crash, weighted by injury severity, was assigned to the cells based on their distance from the crash site using a kernel density function. A tobit model was developed to relate grid-cell-specific contributing factors to crash costs, which are left-censored at zero. The potential for safety improvement (PSI), obtained by subtracting the crash cost of "similar" sites estimated by the tobit model from the actual crash cost, was used as a measure to identify and rank pedestrian crash hotspots. The proposed hotspot identification method takes into account two important factors that are generally ignored: injury severity and the effects of exposure indicators. Big data, on the one hand, enable more precise estimation of the effects of risk factors by providing richer data for modeling and, on the other hand, enable large-scale hotspot identification at higher resolution than conventional methods based on census tracts or traffic analysis zones.
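
The abstract outlines three steps: kernel-based assignment of severity-weighted crash costs to grid cells, a tobit model for cell costs left-censored at zero, and a PSI ranking of cells. The Python sketch below illustrates those steps under assumptions the abstract does not specify, so it should not be read as the authors' implementation. It assumes a quartic (biweight) kernel with a hypothetical bandwidth `h`, cell centroids as the distance reference, and per-crash normalization of kernel weights (so each crash's full cost is distributed and the kernel's normalizing constant cancels). It also assumes that the cost of "similar" sites is the unconditional censored mean E[y | x] = Φ(xβ/σ)·xβ + σ·φ(xβ/σ). All function names, covariates, and coefficient values are hypothetical; β and σ would come from fitting the model (for example by maximizing `tobit_loglik` with a numerically stable log-CDF), which is omitted here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]  # (column, row) index of a square grid cell

def quartic_kernel(d: float, h: float) -> float:
    """Unnormalized quartic (biweight) kernel: 1 at d = 0, 0 for d >= h."""
    if d >= h:
        return 0.0
    u = d / h
    return (1.0 - u * u) ** 2

def assign_crash_costs(crashes: Sequence[Tuple[float, float, float]],
                       cell_size: float, n_cols: int, n_rows: int,
                       h: float) -> Dict[Cell, float]:
    """Spread each crash's severity-weighted cost over nearby cells.

    crashes holds (x, y, cost) tuples in the same planar units as cell_size.
    Weights are normalized per crash (an assumption), so the cell shares of
    a crash sum to its original cost.
    """
    costs: Dict[Cell, float] = {}
    for x, y, cost in crashes:
        weights: Dict[Cell, float] = {}
        for i in range(n_cols):
            for j in range(n_rows):
                cx, cy = (i + 0.5) * cell_size, (j + 0.5) * cell_size
                w = quartic_kernel(math.hypot(x - cx, y - cy), h)
                if w > 0.0:
                    weights[(i, j)] = w
        if not weights:
            # Bandwidth reaches no centroid: keep the cost in the crash's own cell.
            weights[(int(x // cell_size), int(y // cell_size))] = 1.0
        total = sum(weights.values())
        for cell, w in weights.items():
            costs[cell] = costs.get(cell, 0.0) + cost * w / total
    return costs

def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def linear_index(x: Sequence[float], beta: Sequence[float]) -> float:
    return sum(b * v for b, v in zip(beta, x))

def tobit_loglik(ys: Sequence[float], xs: Sequence[Sequence[float]],
                 beta: Sequence[float], sigma: float) -> float:
    """Log-likelihood of a tobit model with left-censoring at zero."""
    ll = 0.0
    for y, x in zip(ys, xs):
        xb = linear_index(x, beta)
        if y <= 0.0:
            # Censored cell: probability that the latent cost is <= 0.
            ll += math.log(normal_cdf(-xb / sigma))
        else:
            # Uncensored cell: normal density of the observed cost.
            ll += math.log(normal_pdf((y - xb) / sigma) / sigma)
    return ll

def tobit_expected_cost(x: Sequence[float], beta: Sequence[float],
                        sigma: float) -> float:
    """E[y | x] for y = max(0, x.beta + e) with e ~ N(0, sigma^2)."""
    xb = linear_index(x, beta)
    z = xb / sigma
    return normal_cdf(z) * xb + sigma * normal_pdf(z)

def potential_for_safety_improvement(actual: Dict[Cell, float],
                                     expected: Dict[Cell, float]) -> Dict[Cell, float]:
    """PSI per cell: actual crash cost minus the model-estimated cost."""
    return {cell: actual.get(cell, 0.0) - est for cell, est in expected.items()}

def rank_hotspots(psi: Dict[Cell, float], top_k: int) -> List[Tuple[Cell, float]]:
    """Cells sorted by descending PSI; the first entries are the hotspots."""
    return sorted(psi.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

if __name__ == "__main__":
    # Hypothetical toy data: one crash of cost 10 at the center of a 3 x 3 grid.
    actual = assign_crash_costs([(150.0, 150.0, 10.0)], 100.0, 3, 3, h=150.0)
    # Hypothetical fitted coefficients and one exposure covariate per cell.
    beta, sigma = [0.5], 1.0
    expected = {cell: tobit_expected_cost([1.0], beta, sigma) for cell in actual}
    print(rank_hotspots(potential_for_safety_improvement(actual, expected), 3))
```

Under these assumptions, a positive PSI flags a cell whose observed crash cost exceeds what the model predicts for cells with the same covariates, and sorting cells by PSI yields the hotspot ranking.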

Original language: English (US)
Journal: Risk Analysis
DOI: 10.1111/risa.12785
State: Accepted/In press, 2017

Fingerprint

Costs and Cost Analysis
Risk analysis
Costs
Railroads
Social Media
Subways
Wounds and Injuries
Censuses
Land use
Probability density function
Safety
Pedestrians
Big data
Grid Cells

Keywords

  • Big data
  • Grid cell analysis
  • Pedestrian risk

ASJC Scopus subject areas

  • Safety, Risk, Reliability and Quality
  • Physiology (medical)

Cite this

Analysis of Traffic Crashes Involving Pedestrians Using Big Data: Investigation of Contributing Factors and Identification of Hotspots. / Xie, Kun; Ozbay, Kaan; Kurkcu, Abdullah; Yang, Hong.

In: Risk Analysis, 2017.

@article{bcdc165f1a1144288a6dca8df432ef42,
title = "Analysis of Traffic Crashes Involving Pedestrians Using Big Data: Investigation of Contributing Factors and Identification of Hotspots",
abstract = "This study aims to explore the potential of using big data in advancing the pedestrian risk analysis including the investigation of contributing factors and the hotspot identification. Massive amounts of data of Manhattan from a variety of sources were collected, integrated, and processed, including taxi trips, subway turnstile counts, traffic volumes, road network, land use, sociodemographic, and social media data. The whole study area was uniformly split into grid cells as the basic geographical units of analysis. The cell-structured framework makes it easy to incorporate rich and diversified data into risk analysis. The cost of each crash, weighted by injury severity, was assigned to the cells based on the relative distance to the crash site using a kernel density function. A tobit model was developed to relate grid-cell-specific contributing factors to crash costs that are left-censored at zero. The potential for safety improvement (PSI) that could be obtained by using the actual crash cost minus the cost of {"}similar{"} sites estimated by the tobit model was used as a measure to identify and rank pedestrian crash hotspots. The proposed hotspot identification method takes into account two important factors that are generally ignored, i.e., injury severity and effects of exposure indicators. Big data, on the one hand, enable more precise estimation of the effects of risk factors by providing richer data for modeling, and on the other hand, enable large-scale hotspot identification with higher resolution than conventional methods based on census tracts or traffic analysis zones.",
keywords = "Big data, Grid cell analysis, Pedestrian risk",
author = "Kun Xie and Kaan Ozbay and Abdullah Kurkcu and Hong Yang",
year = "2017",
doi = "10.1111/risa.12785",
language = "English (US)",
journal = "Risk Analysis",
issn = "0272-4332",
publisher = "Wiley-Blackwell",

}

TY  - JOUR
T1  - Analysis of Traffic Crashes Involving Pedestrians Using Big Data: Investigation of Contributing Factors and Identification of Hotspots
AU  - Xie, Kun
AU  - Ozbay, Kaan
AU  - Kurkcu, Abdullah
AU  - Yang, Hong
PY  - 2017
Y1  - 2017
N2  - This study aims to explore the potential of using big data in advancing the pedestrian risk analysis including the investigation of contributing factors and the hotspot identification. Massive amounts of data of Manhattan from a variety of sources were collected, integrated, and processed, including taxi trips, subway turnstile counts, traffic volumes, road network, land use, sociodemographic, and social media data. The whole study area was uniformly split into grid cells as the basic geographical units of analysis. The cell-structured framework makes it easy to incorporate rich and diversified data into risk analysis. The cost of each crash, weighted by injury severity, was assigned to the cells based on the relative distance to the crash site using a kernel density function. A tobit model was developed to relate grid-cell-specific contributing factors to crash costs that are left-censored at zero. The potential for safety improvement (PSI) that could be obtained by using the actual crash cost minus the cost of "similar" sites estimated by the tobit model was used as a measure to identify and rank pedestrian crash hotspots. The proposed hotspot identification method takes into account two important factors that are generally ignored, i.e., injury severity and effects of exposure indicators. Big data, on the one hand, enable more precise estimation of the effects of risk factors by providing richer data for modeling, and on the other hand, enable large-scale hotspot identification with higher resolution than conventional methods based on census tracts or traffic analysis zones.
AB  - This study aims to explore the potential of using big data in advancing the pedestrian risk analysis including the investigation of contributing factors and the hotspot identification. Massive amounts of data of Manhattan from a variety of sources were collected, integrated, and processed, including taxi trips, subway turnstile counts, traffic volumes, road network, land use, sociodemographic, and social media data. The whole study area was uniformly split into grid cells as the basic geographical units of analysis. The cell-structured framework makes it easy to incorporate rich and diversified data into risk analysis. The cost of each crash, weighted by injury severity, was assigned to the cells based on the relative distance to the crash site using a kernel density function. A tobit model was developed to relate grid-cell-specific contributing factors to crash costs that are left-censored at zero. The potential for safety improvement (PSI) that could be obtained by using the actual crash cost minus the cost of "similar" sites estimated by the tobit model was used as a measure to identify and rank pedestrian crash hotspots. The proposed hotspot identification method takes into account two important factors that are generally ignored, i.e., injury severity and effects of exposure indicators. Big data, on the one hand, enable more precise estimation of the effects of risk factors by providing richer data for modeling, and on the other hand, enable large-scale hotspot identification with higher resolution than conventional methods based on census tracts or traffic analysis zones.
KW  - Big data
KW  - Grid cell analysis
KW  - Pedestrian risk
UR  - http://www.scopus.com/inward/record.url?scp=85016436528&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85016436528&partnerID=8YFLogxK
U2  - 10.1111/risa.12785
DO  - 10.1111/risa.12785
M3  - Article
AN  - SCOPUS:85016436528
JO  - Risk Analysis
JF  - Risk Analysis
SN  - 0272-4332
ER  - 