Percentile performance criteria for limiting average Markov decision processes

Jerzy A. Filar, Dmitry Krass, Keith Ross

    Research output: Contribution to journal (Article)

    Abstract

    In this paper we address the following basic feasibility problem for infinite-horizon Markov decision processes (MDP's): can a policy be found that achieves a specified value (target) of the long-run limiting average reward at a specified probability level (percentile)? Related optimization problems of maximizing the target for a specified percentile and vice versa are also considered. We present a complete (and discrete) classification of both the maximal achievable target levels and of their corresponding percentiles. We also provide an algorithm for computing a deterministic policy corresponding to any feasible target-percentile pair. Next we consider similar problems for an MDP with multiple rewards and/or constraints. This case presents some difficulties and leads to several open problems. An LP-based formulation provides constructive solutions for most cases.
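To make the percentile criterion concrete, the sketch below evaluates it for one fixed stationary policy. It is an illustration only and is not the algorithm from the paper. It assumes the policy is already fixed, so the MDP reduces to a finite Markov chain, and that the caller supplies the chain's recurrent classes. The limiting average reward then equals the gain of whichever recurrent class the chain ends up in, so the probability of meeting a target is the total probability of reaching classes whose gain is at least the target. All names (`percentile_of_target`, `P`, `r`, `recurrent_classes`) and the example numbers are hypothetical.

```python
import numpy as np


def percentile_of_target(P, r, start, target, recurrent_classes):
    """Return P(limiting average reward >= target) from `start`.

    Assumptions (not from the paper): P is the transition matrix induced by
    one fixed stationary policy, r the per-state reward under that policy,
    and recurrent_classes lists every recurrent class of the chain.
    """
    n = P.shape[0]
    recurrent = {s for cls in recurrent_classes for s in cls}
    transient = [s for s in range(n) if s not in recurrent]
    prob = 0.0
    for cls in recurrent_classes:
        cls = list(cls)
        k = len(cls)
        # Stationary distribution of the class: pi P_c = pi, sum(pi) = 1.
        Pc = P[np.ix_(cls, cls)]
        A = np.vstack([Pc.T - np.eye(k), np.ones(k)])
        b = np.zeros(k + 1)
        b[-1] = 1.0
        pi, *_ = np.linalg.lstsq(A, b, rcond=None)
        gain = float(pi @ r[cls])  # long-run average reward inside the class
        # Probability that the chain started at `start` ends up in this class.
        if start in cls:
            reach = 1.0
        elif start in recurrent:
            reach = 0.0
        else:
            Q = P[np.ix_(transient, transient)]
            R = P[np.ix_(transient, cls)].sum(axis=1)
            h = np.linalg.solve(np.eye(len(transient)) - Q, R)
            reach = float(h[transient.index(start)])
        if gain >= target - 1e-12:
            prob += reach
    return prob


# Hypothetical 4-state chain: state 0 is transient, {1, 2} and {3} are recurrent.
P = np.array([
    [0.0, 0.7, 0.0, 0.3],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
r = np.array([0.0, 4.0, 6.0, 1.0])
classes = [[1, 2], [3]]

# Class {1, 2} has gain 5 and is reached with probability 0.7;
# class {3} has gain 1 and is reached with probability 0.3.
p_target_3 = percentile_of_target(P, r, start=0, target=3.0, recurrent_classes=classes)
```

In this example the target-percentile pair (3, 0.7) is feasible for the given policy, while (3, 0.8) is not. Answering the feasibility question over all policies, as the paper does, requires more than this single-policy evaluation.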

    Original language: English (US)
    Pages (from-to): 2-10
    Number of pages: 9
    Journal: IEEE Transactions on Automatic Control
    Volume: 40
    Issue number: 1
    DOI: 10.1109/9.362904
    State: Published - Jan 1995

    ASJC Scopus subject areas

    • Control and Systems Engineering
    • Electrical and Electronic Engineering

    Cite this

    Percentile performance criteria for limiting average Markov decision processes. / Filar, Jerzy A.; Krass, Dmitry; Ross, Keith.

    In: IEEE Transactions on Automatic Control, Vol. 40, No. 1, 01.1995, p. 2-10.

    @article{0866fbde10e54c44a2e7d15a0496d2a1,
    title = "Percentile performance criteria for limiting average Markov decision processes",
    author = "Filar, {Jerzy A.} and Dmitry Krass and Keith Ross",
    year = "1995",
    month = "1",
    doi = "10.1109/9.362904",
    language = "English (US)",
    volume = "40",
    pages = "2--10",
    journal = "IEEE Transactions on Automatic Control",
    issn = "0018-9286",
    publisher = "Institute of Electrical and Electronics Engineers Inc.",
    number = "1",
    }

    TY - JOUR

    T1 - Percentile performance criteria for limiting average Markov decision processes

    AU - Filar, Jerzy A.

    AU - Krass, Dmitry

    AU - Ross, Keith

    PY - 1995/1

    Y1 - 1995/1

    AB - In this paper we address the following basic feasibility problem for infinite-horizon Markov decision processes (MDP's): can a policy be found that achieves a specified value (target) of the long-run limiting average reward at a specified probability level (percentile)? Related optimization problems of maximizing the target for a specified percentile and vice versa are also considered. We present a complete (and discrete) classification of both the maximal achievable target levels and of their corresponding percentiles. We also provide an algorithm for computing a deterministic policy corresponding to any feasible target-percentile pair. Next we consider similar problems for an MDP with multiple rewards and/or constraints. This case presents some difficulties and leads to several open problems. An LP-based formulation provides constructive solutions for most cases.

    UR - http://www.scopus.com/inward/record.url?scp=0029219995&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=0029219995&partnerID=8YFLogxK

    U2 - 10.1109/9.362904

    DO - 10.1109/9.362904

    M3 - Article

    AN - SCOPUS:0029219995

    VL - 40

    SP - 2

    EP - 10

    JO - IEEE Transactions on Automatic Control

    JF - IEEE Transactions on Automatic Control

    SN - 0018-9286

    IS - 1

    ER -