Intrinsic motivation and automatic curricula via asymmetric self-play

Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, Robert Fergus

Research output: Contribution to conference › Paper

Abstract

We describe a simple scheme that allows an agent to learn about its environment in an unsupervised manner. Our scheme pits two versions of the same agent, Alice and Bob, against one another. Alice proposes a task for Bob to complete; and then Bob attempts to complete the task. In this work we will focus on two kinds of environments: (nearly) reversible environments and environments that can be reset. Alice will “propose” the task by doing a sequence of actions and then Bob must undo or repeat them, respectively. Via an appropriate reward structure, Alice and Bob automatically generate a curriculum of exploration, enabling unsupervised training of the agent. When Bob is deployed on an RL task within the environment, this unsupervised training reduces the number of supervised episodes needed to learn, and in some cases converges to a higher reward.
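To make the scheme concrete, the sketch below walks through one self-play episode in the reversible setting. The environment interface (env.reset, env.step, env.states_equal), the agent objects (alice, bob), and the scaling constant are illustrative assumptions rather than the authors' implementation; the abstract only specifies that the reward structure should push Alice toward tasks at the frontier of Bob's ability, and one plausible choice consistent with that is to reward Alice when Bob needs more steps than she used while penalizing Bob for every step he takes.

def self_play_episode(env, alice, bob, max_steps=50, scale=0.01):
    """One hedged sketch of an Alice/Bob self-play episode in a (nearly)
    reversible environment. Alice acts for t_alice steps and then signals
    STOP; Bob must return the environment to Alice's starting state.
    The interface and the 'scale' constant are hypothetical."""
    start_state = env.reset()

    # Alice's turn: "propose" a task by acting, then signalling STOP.
    state, t_alice = start_state, 0
    while t_alice < max_steps:
        action, stop = alice.act(state, start_state)
        if stop:
            break
        state = env.step(action)
        t_alice += 1
    target_state = start_state  # In the reversible setting, Bob must undo Alice's actions.

    # Bob's turn: reach the target state within the remaining step budget.
    t_bob = 0
    while t_bob < max_steps - t_alice and not env.states_equal(state, target_state):
        action = bob.act(state, target_state)
        state = env.step(action)
        t_bob += 1

    # Internal (self-play) rewards: Alice gains when Bob struggles, Bob loses
    # reward with every extra step, so Alice is driven toward tasks Bob can
    # only just solve, yielding an automatic curriculum of exploration.
    reward_alice = scale * max(0, t_bob - t_alice)
    reward_bob = -scale * t_bob
    return reward_alice, reward_bob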

Original language: English (US)
State: Published - Jan 1 2018
Event: 6th International Conference on Learning Representations, ICLR 2018 - Vancouver, Canada
Duration: Apr 30 2018 - May 3 2018

Conference

Conference: 6th International Conference on Learning Representations, ICLR 2018
Country: Canada
City: Vancouver
Period: 4/30/18 - 5/3/18

Fingerprint

  • Intrinsic motivation
  • Curriculum
  • Reward

ASJC Scopus subject areas

  • Language and Linguistics
  • Education
  • Computer Science Applications
  • Linguistics and Language

Cite this

Sukhbaatar, S., Lin, Z., Kostrikov, I., Synnaeve, G., Szlam, A., & Fergus, R. (2018). Intrinsic motivation and automatic curricula via asymmetric self-play. Paper presented at 6th International Conference on Learning Representations, ICLR 2018, Vancouver, Canada.
