### Abstract

The notion of policy regret in online learning is a well-defined performance measure for the common scenario of adaptive adversaries, which more traditional quantities such as external regret do not take into account. We revisit the notion of policy regret and first show that there are online learning settings in which policy regret and external regret are incompatible: any sequence of play that achieves a favorable regret with respect to one definition must do poorly with respect to the other. We then focus on the game-theoretic setting where the adversary is a self-interested agent. In that setting, we show that external regret and policy regret are not in conflict and, in fact, that a wide class of algorithms can ensure a favorable regret with respect to both definitions, so long as the adversary is also using such an algorithm. We also show that the sequence of play of no-policy regret algorithms converges to a policy equilibrium, a new notion of equilibrium that we introduce. Relating this back to external regret, we show that coarse correlated equilibria, which no-external regret players converge to, are a strict subset of policy equilibria. Thus, in game-theoretic settings, every sequence of play with no external regret also admits no policy regret, but the converse does not hold.
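To make the distinction between the two regret notions concrete, here is a minimal Python sketch. It is a toy illustration and not taken from the paper: the memory-1 adversary `toy_loss`, the two-action set, the `start` state, and all function names are assumptions chosen for this example. External regret compares against the best fixed action while holding the adversary's realized reactions fixed; policy regret compares against the best fixed action while letting the adversary react to that fixed action instead.

```python
from typing import Callable, Sequence

# Hypothetical memory-1 adversary: the loss in a round depends on the
# player's previous action and current action (both in {0, 1}).
Loss = Callable[[int, int], float]


def toy_loss(prev: int, cur: int) -> float:
    # Assumed toy loss: the adversary charges 1 if the player's previous
    # action was 0, and playing action 1 costs an extra 0.25 right now.
    return (1.0 if prev == 0 else 0.0) + 0.25 * cur


def realized_loss(plays: Sequence[int], loss: Loss, start: int = 0) -> float:
    # Total loss of a sequence of play; `start` stands in for the
    # (assumed) previous action before round 1.
    total, prev = 0.0, start
    for a in plays:
        total += loss(prev, a)
        prev = a
    return total


def external_regret(plays: Sequence[int], actions: Sequence[int],
                    loss: Loss, start: int = 0) -> float:
    # Comparator: best fixed action, with the adversary's reactions
    # (here, the realized previous actions) held fixed.
    prevs = [start] + list(plays[:-1])
    best = min(sum(loss(p, a) for p in prevs) for a in actions)
    return realized_loss(plays, loss, start) - best


def policy_regret(plays: Sequence[int], actions: Sequence[int],
                  loss: Loss, start: int = 0) -> float:
    # Comparator: best fixed action, with the adversary reacting to
    # that fixed action in every round.
    T = len(plays)
    best = min(realized_loss([a] * T, loss, start) for a in actions)
    return realized_loss(plays, loss, start) - best


if __name__ == "__main__":
    T, actions = 100, [0, 1]
    for name, plays in [("always 0", [0] * T), ("always 1", [1] * T)]:
        print(name,
              "external:", external_regret(plays, actions, toy_loss),
              "policy:", policy_regret(plays, actions, toy_loss))
```

In this toy setting, always playing 0 has zero external regret but policy regret that grows linearly in the horizon, while always playing 1 has zero policy regret but linearly growing external regret. This only illustrates how the two measures can pull in opposite directions; it is not the construction used in the paper.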

Original language | English (US) |
---|---|
Pages (from-to) | 6732-6741 |
Number of pages | 10 |
Journal | Advances in Neural Information Processing Systems |
Volume | 2018-December |
State | Published - Jan 1 2018 |
Event | 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada; Duration: Dec 2 2018 → Dec 8 2018 |

### ASJC Scopus subject areas

- Computer Networks and Communications
- Information Systems
- Signal Processing

### Cite this

*Advances in Neural Information Processing Systems*, *2018-December*, 6732-6741.

**Policy regret in repeated games.** / Arora, Raman; Dinitz, Michael; Marinov, Teodor V.; Mohri, Mehryar.

Research output: Contribution to journal › Conference article

*Advances in Neural Information Processing Systems*, vol. 2018-December, pp. 6732-6741.

TY - JOUR

T1 - Policy regret in repeated games

AU - Arora, Raman

AU - Dinitz, Michael

AU - Marinov, Teodor V.

AU - Mohri, Mehryar

PY - 2018/1/1

Y1 - 2018/1/1

AB - The notion of policy regret in online learning is a well-defined performance measure for the common scenario of adaptive adversaries, which more traditional quantities such as external regret do not take into account. We revisit the notion of policy regret and first show that there are online learning settings in which policy regret and external regret are incompatible: any sequence of play that achieves a favorable regret with respect to one definition must do poorly with respect to the other. We then focus on the game-theoretic setting where the adversary is a self-interested agent. In that setting, we show that external regret and policy regret are not in conflict and, in fact, that a wide class of algorithms can ensure a favorable regret with respect to both definitions, so long as the adversary is also using such an algorithm. We also show that the sequence of play of no-policy regret algorithms converges to a policy equilibrium, a new notion of equilibrium that we introduce. Relating this back to external regret, we show that coarse correlated equilibria, which no-external regret players converge to, are a strict subset of policy equilibria. Thus, in game-theoretic settings, every sequence of play with no external regret also admits no policy regret, but the converse does not hold.

UR - http://www.scopus.com/inward/record.url?scp=85064803183&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85064803183&partnerID=8YFLogxK

M3 - Conference article

VL - 2018-December

SP - 6732

EP - 6741

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

SN - 1049-5258

ER -