Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals

Yunhan Huang, Quanyan Zhu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper studies reinforcement learning (RL) under malicious falsification of cost signals and introduces a quantitative framework of attack models to understand the vulnerabilities of RL. Focusing on Q-learning, we show that Q-learning algorithms converge under stealthy attacks and bounded falsifications of cost signals. We characterize the relation between the falsified cost and the Q-factors, as well as the policy learned by the learning agent, which provides fundamental limits on feasible offensive and defensive moves. We propose a robust region in terms of the cost, within which the adversary can never achieve the targeted policy. We provide conditions on the falsified cost that can mislead the agent into learning the adversary's favored policy. A numerical case study of water reservoir control is provided to show the potential hazards of RL in learning-based control systems and to corroborate the results.
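To make the setting concrete, the sketch below runs tabular Q-learning on a toy Markov decision process in which an adversary adds a bounded perturbation to every cost the agent observes, steering the greedy policy toward a target action. This is a minimal illustration of the attack model, not the paper's construction: the MDP, the bound BOUND, and the helper falsify_cost are assumptions introduced here for exposition.

import numpy as np

# Toy tabular MDP; all sizes, costs, and dynamics here are illustrative.
rng = np.random.default_rng(0)
n_states, n_actions = 4, 2
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over next states
true_cost = rng.uniform(0.0, 1.0, size=(n_states, n_actions))     # genuine cost signal

TARGET_ACTION = 1  # action the adversary wants the agent to prefer (assumption)
BOUND = 0.5        # bound on the per-step falsification magnitude (assumption)

def falsify_cost(c, a):
    """Bounded falsification: make the target action look cheaper and
    every other action look costlier, each by at most BOUND."""
    return c - BOUND if a == TARGET_ACTION else c + BOUND

# Standard Q-learning for cost minimization; the agent only sees falsified costs.
Q = np.zeros((n_states, n_actions))
gamma, s = 0.9, 0
for t in range(1, 50_000):
    # epsilon-greedy exploration; with costs, the greedy action minimizes Q
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(Q[s].argmin())
    c_observed = falsify_cost(true_cost[s, a], a)
    s_next = rng.choice(n_states, p=P[s, a])
    alpha = 1.0 / (1.0 + 0.001 * t)  # diminishing step size
    Q[s, a] += alpha * (c_observed + gamma * Q[s_next].min() - Q[s, a])
    s = s_next

print("policy learned under falsified costs:", Q.argmin(axis=1))

With this perturbation pattern the learned greedy policy typically collapses to the adversary's target action in every state, illustrating how a bounded falsification can redirect what the agent learns; the paper's robust region characterizes when such a bound is too small for the attack to ever succeed.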

Original language: English (US)
Title of host publication: Decision and Game Theory for Security - 10th International Conference, GameSec 2019, Proceedings
Editors: Tansu Alpcan, Yevgeniy Vorobeychik, John S. Baras, György Dán
Publisher: Springer
Pages: 217-237
Number of pages: 21
ISBN (Print): 9783030324292
DOI: 10.1007/978-3-030-32430-8_14
State: Published - Jan 1 2019
Event: 10th International Conference on Decision and Game Theory for Security, GameSec 2019 - Stockholm, Sweden
Duration: Oct 30 2019 – Nov 1 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11836 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th International Conference on Decision and Game Theory for Security, GameSec 2019
Country: Sweden
City: Stockholm
Period: 10/30/19 – 11/1/19

Keywords

  • Adversarial learning
  • Cybersecurity
  • Deception and counterdeception
  • Q-learning
  • Reinforcement learning

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Huang, Y., & Zhu, Q. (2019). Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals. In T. Alpcan, Y. Vorobeychik, J. S. Baras, & G. Dán (Eds.), Decision and Game Theory for Security - 10th International Conference, GameSec 2019, Proceedings (pp. 217-237). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11836 LNCS). Springer. https://doi.org/10.1007/978-3-030-32430-8_14
