Approximate dynamic programming for optimal stationary control with control-dependent noise

Research output: Contribution to journal › Article

Abstract

This brief studies the stochastic optimal control problem via reinforcement learning and approximate/adaptive dynamic programming (ADP). A policy iteration algorithm is derived in the presence of both additive and multiplicative noise using Itô calculus. The expectation of the approximated cost matrix is guaranteed to converge to the solution of some algebraic Riccati equation that gives rise to the optimal cost value. Moreover, the covariance of the approximated cost matrix can be reduced by increasing the length of the time interval between two consecutive iterations. Finally, a numerical example is given to illustrate the efficiency of the proposed ADP methodology.
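For a concrete picture of the recursion behind the abstract, the sketch below shows a minimal model-based policy iteration for a linear system with control-dependent (multiplicative) noise, dx = (Ax + Bu) dt + (Cx + Du) dw and cost E ∫ (x'Qx + u'Ru) dt. It is not the brief's data-driven ADP scheme, which avoids knowledge of the system matrices; the matrices A, B, C, D, Q, R, the initial gain, and the function names are illustrative assumptions. Each iteration evaluates the current gain through a generalized Lyapunov equation, then improves it, so that the cost matrices P converge to the stabilizing solution of the associated stochastic algebraic Riccati equation.

import numpy as np

# Hedged sketch, not the authors' data-driven ADP scheme: model-based
# policy iteration for the stochastic LQR
#     dx = (A x + B u) dt + (C x + D u) dw,  cost E ∫ (x'Qx + u'Ru) dt.
# All matrices and the initial gain below are illustrative assumptions.

def generalized_lyapunov(Ak, Gk, Qk):
    # Solve Ak'P + P Ak + Gk'P Gk + Qk = 0 for symmetric P by
    # vectorizing the linear operator with Kronecker products.
    n = Ak.shape[0]
    I = np.eye(n)
    M = np.kron(I, Ak.T) + np.kron(Ak.T, I) + np.kron(Gk.T, Gk.T)
    P = np.linalg.solve(M, -Qk.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

def policy_iteration(A, B, C, D, Q, R, K, iters=25):
    # Kleinman-style loop: evaluate the gain K, then improve it.
    for _ in range(iters):
        Ak, Gk = A - B @ K, C - D @ K          # closed-loop drift / noise
        P = generalized_lyapunov(Ak, Gk, Q + K.T @ R @ K)
        # Improvement accounts for control-dependent noise via D'PD, D'PC.
        K = np.linalg.solve(R + D.T @ P @ D, B.T @ P + D.T @ P @ C)
    return P, K

A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # toy system, assumed
B = np.array([[0.0], [1.0]])               # mean-square stabilizable
C = 0.1 * np.eye(2)                        # state-noise gain
D = np.array([[0.0], [0.1]])               # control-noise gain
Q, R = np.eye(2), np.eye(1)
P, K = policy_iteration(A, B, C, D, Q, R, K=np.zeros((1, 2)))
print("cost matrix P:\n", P, "\noptimal gain K:\n", K)

Running the script prints the converged P and K for this toy model. In the data-driven setting of the brief, the analogous quantities are estimated from measured state and input trajectories, which is where the expectation and covariance statements in the abstract come into play.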

Original language: English (US)
Article number: 6026952
Pages (from-to): 2392-2398
Number of pages: 7
Journal: IEEE Transactions on Neural Networks
Volume: 22
Issue number: 12 PART 2
DOIs: 10.1109/TNN.2011.2165729
State: Published - Dec 2011

Fingerprint

  • Dynamic programming
  • Noise
  • Costs and Cost Analysis
  • Costs
  • Riccati equations
  • Calculi
  • Reinforcement learning
  • Learning
  • Efficiency

Keywords

  • Approximate dynamic programming
  • control-dependent noise
  • optimal stationary control
  • stochastic systems

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Software
  • Medicine (all)

Cite this

Approximate dynamic programming for optimal stationary control with control-dependent noise. / Jiang, Yu; Jiang, Zhong-Ping.

In: IEEE Transactions on Neural Networks, Vol. 22, No. 12 PART 2, 6026952, 12.2011, p. 2392-2398.

Research output: Contribution to journal › Article

@article{b64f20c88b914b7b9f23468df4e4b187,
title = "Approximate dynamic programming for optimal stationary control with control-dependent noise",
abstract = "This brief studies the stochastic optimal control problem via reinforcement learning and approximate/adaptive dynamic programming (ADP). A policy iteration algorithm is derived in the presence of both additive and multiplicative noise using It{\^o} calculus. The expectation of the approximated cost matrix is guaranteed to converge to the solution of some algebraic Riccati equation that gives rise to the optimal cost value. Moreover, the covariance of the approximated cost matrix can be reduced by increasing the length of the time interval between two consecutive iterations. Finally, a numerical example is given to illustrate the efficiency of the proposed ADP methodology.",
keywords = "Approximate dynamic programming, control-dependent noise, optimal stationary control, stochastic systems",
author = "Yu Jiang and Zhong-Ping Jiang",
year = "2011",
month = "12",
doi = "10.1109/TNN.2011.2165729",
language = "English (US)",
volume = "22",
pages = "2392--2398",
journal = "IEEE Transactions on Neural Networks",
issn = "1045-9227",
publisher = "IEEE Computational Intelligence Society",
number = "12 PART 2",

}

TY - JOUR

T1 - Approximate dynamic programming for optimal stationary control with control-dependent noise

AU - Jiang, Yu

AU - Jiang, Zhong-Ping

PY - 2011/12

Y1 - 2011/12

N2 - This brief studies the stochastic optimal control problem via reinforcement learning and approximate/adaptive dynamic programming (ADP). A policy iteration algorithm is derived in the presence of both additive and multiplicative noise using Itô calculus. The expectation of the approximated cost matrix is guaranteed to converge to the solution of some algebraic Riccati equation that gives rise to the optimal cost value. Moreover, the covariance of the approximated cost matrix can be reduced by increasing the length of the time interval between two consecutive iterations. Finally, a numerical example is given to illustrate the efficiency of the proposed ADP methodology.

AB - This brief studies the stochastic optimal control problem via reinforcement learning and approximate/adaptive dynamic programming (ADP). A policy iteration algorithm is derived in the presence of both additive and multiplicative noise using Itô calculus. The expectation of the approximated cost matrix is guaranteed to converge to the solution of some algebraic Riccati equation that gives rise to the optimal cost value. Moreover, the covariance of the approximated cost matrix can be reduced by increasing the length of the time interval between two consecutive iterations. Finally, a numerical example is given to illustrate the efficiency of the proposed ADP methodology.

KW - Approximate dynamic programming

KW - control-dependent noise

KW - optimal stationary control

KW - stochastic systems

UR - http://www.scopus.com/inward/record.url?scp=83655167263&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=83655167263&partnerID=8YFLogxK

U2 - 10.1109/TNN.2011.2165729

DO - 10.1109/TNN.2011.2165729

M3 - Article

C2 - 21954203

AN - SCOPUS:83655167263

VL - 22

SP - 2392

EP - 2398

JO - IEEE Transactions on Neural Networks

JF - IEEE Transactions on Neural Networks

SN - 1045-9227

IS - 12 PART 2

M1 - 6026952

ER -