Policy regret in repeated games

Raman Arora, Michael Dinitz, Teodor V. Marinov, Mehryar Mohri

Research output: Contribution to journal › Conference article

Abstract

The notion of policy regret in online learning is a well-defined performance measure for the common scenario of adaptive adversaries, which more traditional quantities such as external regret do not take into account. We revisit the notion of policy regret and first show that there are online learning settings in which policy regret and external regret are incompatible: any sequence of play that achieves a favorable regret with respect to one definition must do poorly with respect to the other. We then focus on the game-theoretic setting where the adversary is a self-interested agent. In that setting, we show that external regret and policy regret are not in conflict and, in fact, that a wide class of algorithms can ensure a favorable regret with respect to both definitions, so long as the adversary is also using such an algorithm. We also show that the sequence of play of no-policy-regret algorithms converges to a policy equilibrium, a new notion of equilibrium that we introduce. Relating this back to external regret, we show that coarse correlated equilibria, which no-external-regret players converge to, are a strict subset of policy equilibria. Thus, in game-theoretic settings, every sequence of play with no external regret also admits no policy regret, but the converse does not hold.
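
For reference, the two regret notions compared in the abstract can be sketched as follows, using the standard definitions from the online-learning literature; the notation below is ours and is not quoted from the paper. Against an adaptive adversary, the loss at round t may depend on the learner's entire history of actions, \ell_t : \mathcal{A}^t \to [0, 1]. External regret fixes the realized history and substitutes the comparator action only in the current round,

  R_T^{\mathrm{ext}} = \sum_{t=1}^{T} \ell_t(a_1, \dots, a_t) - \min_{a \in \mathcal{A}} \sum_{t=1}^{T} \ell_t(a_1, \dots, a_{t-1}, a),

whereas policy regret compares against the counterfactual in which the fixed action is played from the first round on, so that the adversary's adaptation to that constant sequence is taken into account:

  R_T^{\mathrm{pol}} = \sum_{t=1}^{T} \ell_t(a_1, \dots, a_t) - \min_{a \in \mathcal{A}} \sum_{t=1}^{T} \ell_t(a, \dots, a).

The incompatibility result stated above says that there are settings in which no sequence of play keeps both of these quantities small simultaneously, while in the game-theoretic setting with self-interested opponents the two can be controlled together.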

Original language: English (US)
Pages (from-to): 6732-6741
Number of pages: 10
Journal: Advances in Neural Information Processing Systems
Volume: 2018-December
State: Published - Jan 1 2018
Event: 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada
Duration: Dec 2 2018 - Dec 8 2018

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

Arora, R., Dinitz, M., Marinov, T. V., & Mohri, M. (2018). Policy regret in repeated games. Advances in Neural Information Processing Systems, 2018-December, 6732-6741.

@article{489d1bb2c30948028c0c55889cc8616b,
title = "Policy regret in repeated games",
abstract = "The notion of policy regret in online learning is a well defined performance measure for the common scenario of adaptive adversaries, which more traditional quantities such as external regret do not take into account. We revisit the notion of policy regret and first show that there are online learning settings in which policy regret and external regret are incompatible: any sequence of play that achieves a favorable regret with respect to one definition must do poorly with respect to the other. We then focus on the game-theoretic setting where the adversary is a self-interested agent. In that setting, we show that external regret and policy regret are not in conflict and, in fact, that a wide class of algorithms can ensure a favorable regret with respect to both definitions, so long as the adversary is also using such an algorithm. We also show that the sequence of play of no-policy regret algorithms converges to a policy equilibrium, a new notion of equilibrium that we introduce. Relating this back to external regret, we show that coarse correlated equilibria, which no-external regret players converge to, are a strict subset of policy equilibria. Thus, in game-theoretic settings, every sequence of play with no external regret also admits no policy regret, but the converse does not hold.",
author = "Raman Arora and Michael Dinitz and Marinov, {Teodor V.} and Mehryar Mohri",
year = "2018",
month = "1",
day = "1",
language = "English (US)",
volume = "2018-December",
pages = "6732--6741",
journal = "Advances in Neural Information Processing Systems",
issn = "1049-5258",

}
