Randomized and past-dependent policies for Markov decision processes with multiple constraints

Keith Ross

    Research output: Contribution to journal › Article

    Abstract

    The Markov decision problem of locating a policy to maximize the long-run average reward subject to K long-run average cost constraints is considered. It is assumed that the state and action spaces are finite and the law of motion is unichain, that is, every pure policy gives rise to a Markov chain with one recurrent class. It is first proved that there exists an optimal stationary policy with a degree of randomization no greater than K; consequently, it is never necessary to randomize in more than K states. A linear program produces the optimal policy with limited randomization. For the special case of a single constraint, we also address the problem of finding optimal nonrandomized, but nonstationary, policies. We show that a round-robin type policy is optimal, and conjecture the same for a steering policy that depends on the entire past history of the process, but whose implementation requires essentially no more storage than that of a pure policy.
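    The linear program itself is not reproduced on this page. The sketch below illustrates the standard occupation-measure formulation for constrained average-reward unichain MDPs, the kind of program the abstract refers to; it is not necessarily the paper's exact formulation, and all model data (P, r, c, d) are hypothetical toy values.

    # Occupation-measure LP for a constrained average-reward MDP (sketch).
    # Assumes a unichain model; the dense random transitions below make the
    # chain irreducible, so the assumption holds for this toy instance.
    import numpy as np
    from scipy.optimize import linprog

    S, A, K = 3, 2, 1                            # states, actions, constraints
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, j] = Pr(next = j | s, a)
    r = rng.uniform(size=(S, A))                 # reward r(s, a)
    c = rng.uniform(size=(S, A))                 # cost c(s, a), single constraint
    d = 0.6                                      # long-run average cost bound

    # Variables x(s, a) >= 0 are long-run state-action frequencies.
    # Maximize sum_{s,a} x(s,a) r(s,a) subject to
    #   balance:    sum_a x(j,a) = sum_{s,a} x(s,a) P[s,a,j]   for every j
    #   normalize:  sum_{s,a} x(s,a) = 1
    #   constraint: sum_{s,a} x(s,a) c(s,a) <= d
    n = S * A
    A_eq = np.zeros((S + 1, n))
    for j in range(S):
        for s in range(S):
            for a in range(A):
                A_eq[j, s * A + a] = (1.0 if s == j else 0.0) - P[s, a, j]
    A_eq[S, :] = 1.0                             # normalization row
    b_eq = np.append(np.zeros(S), 1.0)

    res = linprog(-r.reshape(n),                 # linprog minimizes, so negate
                  A_ub=c.reshape(1, n), b_ub=[d],
                  A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    assert res.success, res.message              # fails if the bound d is infeasible

    # Recover a stationary policy pi(a | s) = x(s, a) / sum_a' x(s, a').
    x = res.x.reshape(S, A)
    pi = x / np.maximum(x.sum(axis=1, keepdims=True), 1e-12)
    print("optimal constrained average reward:", -res.fun)
    print("policy (rows = states):\n", pi.round(3))

    A basic optimal solution of this LP randomizes in at most K states, matching the theorem stated in the abstract. For K = 1 the single randomized state splits its probability between two actions, and that split can be realized without randomization by alternating between the two actions in the right proportion, which is the round-robin idea the abstract discusses.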

    Original language: English (US)
    Pages (from-to): 474-477
    Number of pages: 4
    Journal: Operations Research
    Volume: 37
    Issue number: 3
    State: Published - May 1989

    Fingerprint

    Markov processes
    Costs
    Markov decision process

    ASJC Scopus subject areas

    • Management Science and Operations Research

    Cite this

    Randomized and past-dependent policies for Markov decision processes with multiple constraints. / Ross, Keith.

    In: Operations Research, Vol. 37, No. 3, 05.1989, pp. 474-477.

    Research output: Contribution to journal › Article

    @article{61b2efe81c994603b32af43b90f0af73,
    title = "Randomized and past-dependent policies for Markov decision processes with multiple constraints",
    abstract = "The Markov decision problem of locating a policy to maximize the long-run average reward subject to K long-run average cost constraints is considered. It is assumed that the state and action spaces are finite and the law of motion is unichain, that is, every pure policy gives rise to a Markov chain with one recurrent class. It is first proved that there exists an optimal stationary policy with a degree of randomization no greater than K; consequently, it is never necessary to randomize in more than K state. A linear program produces the optimal policy with limited randomization. For the special case of a single constraint, we also address the problem of finding optimal nonrandomized, but nonstationary, policies. We show that a round-robin type policy is optimal, and conjecture the same for a steering policy that depends on the entire past history of the process, but whose implementation requires essentially no more storage than that of a pure policy.",
    author = "Keith Ross",
    year = "1989",
    month = may,
    language = "English (US)",
    volume = "37",
    pages = "474--477",
    journal = "Operations Research",
    issn = "0030-364X",
    publisher = "INFORMS Institute for Operations Research and the Management Sciences",
    number = "3",
    }

    TY - JOUR

    T1 - Randomized and past-dependent policies for Markov decision processes with multiple constraints

    AU - Ross, Keith

    PY - 1989/5

    Y1 - 1989/5

    AB - The Markov decision problem of locating a policy to maximize the long-run average reward subject to K long-run average cost constraints is considered. It is assumed that the state and action spaces are finite and the law of motion is unichain, that is, every pure policy gives rise to a Markov chain with one recurrent class. It is first proved that there exists an optimal stationary policy with a degree of randomization no greater than K; consequently, it is never necessary to randomize in more than K states. A linear program produces the optimal policy with limited randomization. For the special case of a single constraint, we also address the problem of finding optimal nonrandomized, but nonstationary, policies. We show that a round-robin type policy is optimal, and conjecture the same for a steering policy that depends on the entire past history of the process, but whose implementation requires essentially no more storage than that of a pure policy.

    UR - http://www.scopus.com/inward/record.url?scp=0024664332&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=0024664332&partnerID=8YFLogxK

    M3 - Article

    VL - 37

    SP - 474

    EP - 477

    JO - Operations Research

    JF - Operations Research

    SN - 0030-364X

    IS - 3

    ER -