### Abstract

In this chapter, we consider a class of two-player nonzero-sum stochastic games with incomplete information, inspired by recent applications of game theory in network security. We develop fully distributed reinforcement learning algorithms that require each player to have only a minimal amount of information about the other player. At each time step, each player is either in an active mode or in a sleep mode. A player in active mode updates its strategy and its estimates of unknown quantities using a specific pure or hybrid learning pattern. The players' intelligence and rationality are captured by a weighted linear combination of different learning patterns. We use stochastic approximation techniques to show that, under appropriate conditions, the pure or hybrid learning schemes with random updates can be studied through their deterministic ordinary differential equation (ODE) counterparts. Convergence to state-independent equilibria is analyzed for two special classes of games, namely games with two actions and potential games. The results are applied to network security games between an intruder and an administrator, where the noncooperative behaviors are well characterized by the features of distributed hybrid learning.
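The abstract describes each active player updating both an estimate of its own payoffs and its mixed strategy, with the strategy update formed as a weighted combination of learning patterns. The sketch below illustrates that structure under loudly stated assumptions; it is not the chapter's algorithm. We assume a single-state (repeated) two-player, two-action game with payoffs in [0, 1], and we pick two pure patterns ourselves: Boltzmann-Gibbs (logit) smoothing and a replicator-type update, mixed with a hypothetical weight `w`. The names `run`, `hybrid_step`, `estimate_step`, `boltzmann_gibbs`, and the parameters `p_active`, `tau`, and the step-size exponents are all illustrative. We also assume a sleeping player still plays its current strategy but does not update. The decreasing step sizes (estimates on a faster time scale than strategies, each with a divergent sum and a summable square) are the usual kind of condition under which a stochastic-approximation recursion can be analyzed through an ODE.

```python
import math
import random


def boltzmann_gibbs(r_hat, tau):
    """Logit (Boltzmann-Gibbs) smoothed best response to estimated payoffs."""
    m = max(r_hat)  # shift by the max for numerical stability
    z = [math.exp((r - m) / tau) for r in r_hat]
    s = sum(z)
    return [v / s for v in z]


def estimate_step(r_hat, action, payoff, lam):
    """Payoff reinforcement: only the estimate of the played action moves."""
    r = list(r_hat)
    r[action] += lam * (payoff - r[action])
    return r


def hybrid_step(x, r_hat, mu, w, tau):
    """Hypothetical hybrid pattern: weight w on a Boltzmann-Gibbs pattern,
    weight 1 - w on a replicator-type pattern, both driven by r_hat."""
    bg = boltzmann_gibbs(r_hat, tau)
    avg = sum(xa * ra for xa, ra in zip(x, r_hat))  # estimated mean payoff
    new = [xa + mu * (w * (ba - xa) + (1 - w) * xa * (ra - avg))
           for xa, ba, ra in zip(x, bg, r_hat)]
    # With payoffs in [0, 1] and mu <= 1 the update stays on the simplex;
    # clipping and renormalizing only guard against rounding error.
    new = [max(v, 0.0) for v in new]
    s = sum(new)
    return [v / s for v in new]


def run(U1, U2, steps=5000, p_active=0.7, w=0.5, tau=0.1, seed=0):
    """Simulate both learners on a single-state 2x2 game (an assumption).

    U1[a][b] and U2[a][b] are the payoffs in [0, 1] to players 1 and 2
    when player 1 plays a and player 2 plays b.
    """
    rng = random.Random(seed)
    x = [[0.5, 0.5], [0.5, 0.5]]      # mixed strategies
    r_hat = [[0.5, 0.5], [0.5, 0.5]]  # estimated payoff of each own action
    n = [0, 0]                        # local clocks: active updates so far
    for _ in range(steps):
        # Assumption: a sleeping player still plays its current strategy.
        a = [0 if rng.random() < x[i][0] else 1 for i in range(2)]
        u = [U1[a[0]][a[1]], U2[a[0]][a[1]]]  # player i uses only u[i]
        for i in range(2):
            if rng.random() >= p_active:
                continue  # sleep mode: strategy and estimates unchanged
            lam = (n[i] + 1) ** -0.6  # faster time scale for estimates
            mu = (n[i] + 1) ** -0.8   # slower time scale for strategies
            r_hat[i] = estimate_step(r_hat[i], a[i], u[i], lam)
            x[i] = hybrid_step(x[i], r_hat[i], mu, w, tau)
            n[i] += 1
    return x, r_hat


if __name__ == "__main__":
    # Hypothetical intruder (rows: attack, wait) vs. administrator
    # (columns: monitor, idle) payoffs; the numbers are illustrative only.
    U_intruder = [[0.1, 0.9], [0.5, 0.5]]
    U_admin = [[0.9, 0.1], [0.4, 0.6]]
    x, r_hat = run(U_intruder, U_admin)
    print("intruder strategy:", x[0])
    print("administrator strategy:", x[1])
```

Setting `w=1.0` or `w=0.0` reduces the update to a single pure pattern, while intermediate values give a hybrid. The demo only prints the final strategies for the made-up matrix; no claim is made here about where they settle.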

| Original language | English (US) |
|---|---|
| Title of host publication | Reinforcement Learning and Approximate Dynamic Programming for Feedback Control |
| Publisher | John Wiley and Sons |
| Pages | 303-329 |
| Number of pages | 27 |
| ISBN (Print) | 9781118104200 |
| DOI | 10.1002/9781118453988.ch14 |
| State | Published: Feb 7, 2013 |

### Keywords

- Games and learning algorithms, attacker/IDS
- Hybrid learning in games, network security
- Multiagent games, learning and control
- New paradigm of hybrid learning CODIPAS-RL
- Players' information limits, of payoff functions

### ASJC Scopus subject areas

- Engineering (all)

### Cite this

Zhu, Quanyan; Tembine, Hamidou; Başar, Tamer. **Hybrid Learning in Stochastic Games and Its Application in Network Security.** In *Reinforcement Learning and Approximate Dynamic Programming for Feedback Control* (pp. 303-329). John Wiley and Sons, 2013. https://doi.org/10.1002/9781118453988.ch14

TY - CHAP

T1 - Hybrid Learning in Stochastic Games and Its Application in Network Security

AU - Zhu, Quanyan

AU - Tembine, Hamidou

AU - Başar, Tamer

PY - 2013/2/7

Y1 - 2013/2/7

KW - Games and learning algorithms, attacker/IDS

KW - Hybrid learning in games, network security

KW - Multiagent games, learning and control

KW - New paradigm of hybrid learning CODIPAS-RL

KW - Players' information limits, of payoff functions

UR - http://www.scopus.com/inward/record.url?scp=84886348633&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84886348633&partnerID=8YFLogxK

U2 - 10.1002/9781118453988.ch14

DO - 10.1002/9781118453988.ch14

M3 - Chapter

SN - 9781118104200

SP - 303

EP - 329

BT - Reinforcement Learning and Approximate Dynamic Programming for Feedback Control

PB - John Wiley and Sons

ER -