Hierarchical RL using an ensemble of proprioceptive periodic policies

Kenneth Marino, Abhinav Gupta, Arthur Szlam, Robert Fergus

Research output: Contribution to conference › Paper

Abstract

In this work we introduce a simple, robust approach to hierarchically training an agent in the setting of sparse reward tasks. The agent is split into a low-level and a high-level policy. The low-level policy only accesses internal, proprioceptive dimensions of the state observation. The low-level policies are trained with a simple reward that encourages changing the values of the non-proprioceptive dimensions. Furthermore, it is induced to be periodic with the use of a “phase function.” The high-level policy is trained using a sparse, task-dependent reward, and operates by choosing which of the low-level policies to run at any given time. Using this approach, we solve difficult maze and navigation tasks with sparse rewards using the MuJoCo Ant and Humanoid agents and show improvement over recent hierarchical methods.
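The two-level structure the abstract describes can be sketched in a few lines: low-level policies see only proprioceptive dimensions plus a cyclic phase feature (which induces periodic behavior), their training reward is the change in the external dimensions, and a high-level policy picks which low-level policy runs. This is a minimal illustrative sketch, not the authors' implementation; all dimensions, the period, and the linear "policies" below are placeholder assumptions.

```python
import numpy as np

# Hypothetical sizes -- placeholders, not taken from the paper.
PROPRIO_DIM = 8    # internal (joint-level) dims seen by low-level policies
EXTERN_DIM = 2     # external dims (e.g. x, y position) used only for reward
NUM_LOW_LEVEL = 4  # size of the low-level policy ensemble
PERIOD = 10        # phase period, in timesteps

def phase_feature(t, period=PERIOD):
    """Cyclic 'phase function' input that induces periodic low-level behavior."""
    phase = (t % period) / period
    return np.array([np.sin(2 * np.pi * phase), np.cos(2 * np.pi * phase)])

def low_level_reward(extern_before, extern_after):
    """Simple low-level reward: encourage changing the non-proprioceptive dims."""
    return float(np.linalg.norm(extern_after - extern_before))

class HierarchicalAgent:
    """High-level policy chooses which low-level policy to run at a given time."""
    def __init__(self, low_level_policies):
        self.low_level_policies = low_level_policies

    def act(self, obs, t, high_level_choice):
        # The low-level policy sees only proprioceptive dims + the phase feature.
        proprio = obs[:PROPRIO_DIM]
        inp = np.concatenate([proprio, phase_feature(t)])
        return self.low_level_policies[high_level_choice](inp)

# Toy low-level "policies": fixed linear maps standing in for trained networks.
rng = np.random.default_rng(0)
policies = [lambda x, W=rng.standard_normal((3, PROPRIO_DIM + 2)): W @ x
            for _ in range(NUM_LOW_LEVEL)]
agent = HierarchicalAgent(policies)
obs = rng.standard_normal(PROPRIO_DIM + EXTERN_DIM)
action = agent.act(obs, t=3, high_level_choice=1)
```

In the paper's setting the high-level choice would itself be produced by a policy trained on the sparse task reward; here it is passed in directly to keep the sketch short.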

Original language: English (US)
State: Published - Jan 1 2019
Event: 7th International Conference on Learning Representations, ICLR 2019 - New Orleans, United States
Duration: May 6 2019 – May 9 2019

Conference

Conference: 7th International Conference on Learning Representations, ICLR 2019
Country: United States
City: New Orleans
Period: 5/6/19 – 5/9/19


ASJC Scopus subject areas

  • Education
  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics

Cite this

Marino, K., Gupta, A., Szlam, A., & Fergus, R. (2019). Hierarchical RL using an ensemble of proprioceptive periodic policies. Paper presented at 7th International Conference on Learning Representations, ICLR 2019, New Orleans, United States.

Hierarchical RL using an ensemble of proprioceptive periodic policies. / Marino, Kenneth; Gupta, Abhinav; Szlam, Arthur; Fergus, Robert.

2019. Paper presented at 7th International Conference on Learning Representations, ICLR 2019, New Orleans, United States.


Marino, K, Gupta, A, Szlam, A & Fergus, R 2019, 'Hierarchical RL using an ensemble of proprioceptive periodic policies', Paper presented at 7th International Conference on Learning Representations, ICLR 2019, New Orleans, United States, 5/6/19 - 5/9/19.
Marino K, Gupta A, Szlam A, Fergus R. Hierarchical RL using an ensemble of proprioceptive periodic policies. 2019. Paper presented at 7th International Conference on Learning Representations, ICLR 2019, New Orleans, United States.
Marino, Kenneth ; Gupta, Abhinav ; Szlam, Arthur ; Fergus, Robert. / Hierarchical RL using an ensemble of proprioceptive periodic policies. Paper presented at 7th International Conference on Learning Representations, ICLR 2019, New Orleans, United States.
@conference{8767d9b3be454a578f569bd46ec29876,
title = "Hierarchical RL using an ensemble of proprioceptive periodic policies",
abstract = "In this work we introduce a simple, robust approach to hierarchically training an agent in the setting of sparse reward tasks. The agent is split into a low-level and a high-level policy. The low-level policy only accesses internal, proprioceptive dimensions of the state observation. The low-level policies are trained with a simple reward that encourages changing the values of the non-proprioceptive dimensions. Furthermore, it is induced to be periodic with the use of a “phase function.” The high-level policy is trained using a sparse, task-dependent reward, and operates by choosing which of the low-level policies to run at any given time. Using this approach, we solve difficult maze and navigation tasks with sparse rewards using the MuJoCo Ant and Humanoid agents and show improvement over recent hierarchical methods.",
author = "Kenneth Marino and Abhinav Gupta and Arthur Szlam and Robert Fergus",
year = "2019",
month = "1",
day = "1",
language = "English (US)",
note = "7th International Conference on Learning Representations, ICLR 2019 ; Conference date: 06-05-2019 Through 09-05-2019",

}

TY - CONF

T1 - Hierarchical RL using an ensemble of proprioceptive periodic policies

AU - Marino, Kenneth

AU - Gupta, Abhinav

AU - Szlam, Arthur

AU - Fergus, Robert

PY - 2019/1/1

Y1 - 2019/1/1

N2 - In this work we introduce a simple, robust approach to hierarchically training an agent in the setting of sparse reward tasks. The agent is split into a low-level and a high-level policy. The low-level policy only accesses internal, proprioceptive dimensions of the state observation. The low-level policies are trained with a simple reward that encourages changing the values of the non-proprioceptive dimensions. Furthermore, it is induced to be periodic with the use of a “phase function.” The high-level policy is trained using a sparse, task-dependent reward, and operates by choosing which of the low-level policies to run at any given time. Using this approach, we solve difficult maze and navigation tasks with sparse rewards using the MuJoCo Ant and Humanoid agents and show improvement over recent hierarchical methods.

AB - In this work we introduce a simple, robust approach to hierarchically training an agent in the setting of sparse reward tasks. The agent is split into a low-level and a high-level policy. The low-level policy only accesses internal, proprioceptive dimensions of the state observation. The low-level policies are trained with a simple reward that encourages changing the values of the non-proprioceptive dimensions. Furthermore, it is induced to be periodic with the use of a “phase function.” The high-level policy is trained using a sparse, task-dependent reward, and operates by choosing which of the low-level policies to run at any given time. Using this approach, we solve difficult maze and navigation tasks with sparse rewards using the MuJoCo Ant and Humanoid agents and show improvement over recent hierarchical methods.

UR - http://www.scopus.com/inward/record.url?scp=85071158559&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85071158559&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:85071158559

ER -