Query-efficient imitation learning for end-to-end simulated driving

Jiakai Zhang, Kyunghyun Cho

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

One way to approach end-to-end autonomous driving is to learn a policy that maps from a sensory input, such as an image frame from a front-facing camera, to a driving action, by imitating an expert driver, or a reference policy. This can be done by supervised learning, where a policy is tuned to minimize the difference between the predicted and ground-truth actions. A policy trained in this way, however, is known to suffer from unexpected behaviors due to the mismatch between the states reachable by the reference policy and those reachable by the trained policy. More advanced algorithms for imitation learning, such as DAgger, address this issue by iteratively collecting training examples from both the reference and trained policies. These algorithms often require a large number of queries to the reference policy, which is undesirable as the reference policy is often expensive. In this paper, we propose an extension of DAgger, called SafeDAgger, that is query-efficient and more suitable for end-to-end autonomous driving. We evaluate the proposed SafeDAgger in a car racing simulator and show that it indeed requires fewer queries to the reference policy. We also observe a significant speed-up in convergence, which we conjecture to be due to the effect of automated curriculum learning.
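
To make the idea concrete, below is a minimal sketch of the query-efficient loop the abstract describes: a learned safety policy decides, state by state, whether the primary policy can act on its own or whether the expensive reference policy must be queried, and only queried states are added to the aggregated dataset. All names here (policy, safety, reference, env, the act/fit/is_safe/step interfaces) are illustrative assumptions for this sketch, not the authors' released code.

    # Illustrative SafeDAgger-style training loop (hypothetical names throughout).
    # `policy`, `safety`, and `reference` are assumed to expose small duck-typed
    # interfaces; `env` is assumed to be a driving simulator whose step() returns
    # the next observation and a done flag.

    def safedagger_sketch(policy, safety, reference, env, n_iterations=10):
        """Aggregate training data while querying the reference policy
        only in states the safety policy flags as unsafe."""
        dataset = []                                # aggregated (obs, ref_action) pairs
        for _ in range(n_iterations):
            obs, done = env.reset(), False
            while not done:
                action = policy.act(obs)
                if safety.is_safe(obs, action):
                    # Safe state: drive with the learned policy, no query made.
                    obs, done = env.step(action)
                else:
                    # Unsafe state: query the expensive reference policy, record
                    # its action, and let it take over for this step.
                    ref_action = reference.act(obs)
                    dataset.append((obs, ref_action))
                    obs, done = env.step(ref_action)
            policy.fit(dataset)                     # retrain on all data, as in DAgger
            safety.fit(dataset, policy)             # refit the safety predictor
        return policy

In plain DAgger, every visited state is labeled by the reference policy; the safety gate above is what cuts the number of queries. And because the reference takes over exactly in the hard states, training examples tend to arrive in order of difficulty, which is consistent with the automated-curriculum effect the authors conjecture.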

Original language: English (US)
Title of host publication: 31st AAAI Conference on Artificial Intelligence, AAAI 2017
Publisher: AAAI Press
Pages: 2891-2897
Number of pages: 7
State: Published - 2017
Event: 31st AAAI Conference on Artificial Intelligence, AAAI 2017 - San Francisco, United States
Duration: Feb 4, 2017 - Feb 10, 2017


Fingerprint

  • Facings
  • Supervised learning
  • Curricula
  • Railroad cars
  • Simulators
  • Cameras

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Zhang, J., & Cho, K. (2017). Query-efficient imitation learning for end-to-end simulated driving. In 31st AAAI Conference on Artificial Intelligence, AAAI 2017 (pp. 2891-2897). AAAI Press.
