Recurrent Neural Networks for Multivariate Time Series with Missing Values

Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, Yan Liu

Research output: Contribution to journal (Article)

Abstract

Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. In time series prediction and other related tasks, it has been noted that missing values and their missing patterns are often correlated with the target labels, a phenomenon known as informative missingness. There is very limited work on exploiting the missing patterns for effective imputation and improved prediction performance. In this paper, we develop novel deep learning models, namely GRU-D, as one of the early attempts. GRU-D is based on the Gated Recurrent Unit (GRU), a state-of-the-art recurrent neural network. It takes two representations of missing patterns, i.e., masking and time interval, and effectively incorporates them into a deep model architecture so that it not only captures the long-term temporal dependencies in time series, but also utilizes the missing patterns to achieve better prediction results. Experiments on time series classification tasks with real-world clinical datasets (MIMIC-III, PhysioNet) and synthetic datasets demonstrate that our models achieve state-of-the-art performance and provide useful insights for better understanding and utilization of missing values in time series analysis.
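
The two representations of missing patterns named in the abstract, masking and time interval, can be illustrated with a small sketch. The abstract does not define them precisely, so the definitions below are assumptions: the mask is 1 where a value is observed and 0 where it is missing, and the time interval of a variable is the time elapsed since that variable was last observed. The function name `masking_and_intervals`, the use of NaN to mark missing entries, and the toy data are hypothetical and chosen only for illustration. The GRU-D model itself is not implemented here.

```python
import numpy as np


def masking_and_intervals(x, timestamps):
    """Compute masking and time-interval arrays for one multivariate series.

    x          : (T, D) array-like, NaN marks a missing value (assumed convention)
    timestamps : (T,) array-like of observation times, non-decreasing
    Returns (mask, delta), both of shape (T, D).
    """
    x = np.asarray(x, dtype=float)
    s = np.asarray(timestamps, dtype=float)

    # mask[t, d] = 1.0 if variable d is observed at step t, else 0.0
    mask = (~np.isnan(x)).astype(float)

    # delta[t, d] = time since variable d was last observed (0 at the first step)
    delta = np.zeros_like(x)
    for t in range(1, x.shape[0]):
        gap = s[t] - s[t - 1]
        # If the variable was observed at t-1 the interval restarts at `gap`;
        # otherwise the previous interval keeps accumulating.
        delta[t] = gap + (1.0 - mask[t - 1]) * delta[t - 1]
    return mask, delta


if __name__ == "__main__":
    # Toy series: 3 time steps, 2 variables, irregular timestamps.
    series = [[1.0, np.nan],
              [np.nan, np.nan],
              [3.0, 4.0]]
    times = [0.0, 1.0, 3.0]
    mask, delta = masking_and_intervals(series, times)
    print(mask)
    print(delta)
```

In this sketch both arrays have the same shape as the input, so they could be fed to a recurrent model alongside the (imputed) values; how GRU-D combines them is described in the paper, not here.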

Original language: English (US)
Article number: 6085
Journal: Scientific Reports
Volume: 8
Issue number: 1
DOI: 10.1038/s41598-018-24271-9
State: Published - Dec 1 2018

Fingerprint

Recurrent neural networks
Time series
Time series analysis
Health care
Labels
Experiments

ASJC Scopus subject areas

  • General

Cite this

Recurrent Neural Networks for Multivariate Time Series with Missing Values. / Che, Zhengping; Purushotham, Sanjay; Cho, Kyunghyun; Sontag, David; Liu, Yan.

In: Scientific Reports, Vol. 8, No. 1, 6085, 01.12.2018.

Che, Zhengping ; Purushotham, Sanjay ; Cho, Kyunghyun ; Sontag, David ; Liu, Yan. / Recurrent Neural Networks for Multivariate Time Series with Missing Values. In: Scientific Reports. 2018 ; Vol. 8, No. 1.
@article{e73a87ae51c24c47bab728e5b211bf32,
title = "Recurrent Neural Networks for Multivariate Time Series with Missing Values",
author = "Zhengping Che and Sanjay Purushotham and Kyunghyun Cho and David Sontag and Yan Liu",
year = "2018",
month = "12",
day = "1",
doi = "10.1038/s41598-018-24271-9",
language = "English (US)",
volume = "8",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",
number = "1",

}

TY - JOUR

T1 - Recurrent Neural Networks for Multivariate Time Series with Missing Values

AU - Che, Zhengping

AU - Purushotham, Sanjay

AU - Cho, Kyunghyun

AU - Sontag, David

AU - Liu, Yan

PY - 2018/12/1

Y1 - 2018/12/1

UR - http://www.scopus.com/inward/record.url?scp=85045746406&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85045746406&partnerID=8YFLogxK

U2 - 10.1038/s41598-018-24271-9

DO - 10.1038/s41598-018-24271-9

M3 - Article

VL - 8

JO - Scientific Reports

JF - Scientific Reports

SN - 2045-2322

IS - 1

M1 - 6085

ER -