DISCRETE-TIME EQUIVALENCE FOR CONSTRAINED SEMI-MARKOV DECISION PROCESSES.

Frederick J. Beutler, Keith Ross

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    A continuous-time average-reward Markov-decision-process problem is most easily solved in terms of an equivalent discrete-time Markov decision process (DMDP). Customary hypotheses include that the process is a Markov jump process with denumerable state space and bounded transition rates, that actions are chosen at the jump points of the process, and that the policies considered are deterministic. An analogous uniformization result is derived which is applicable to a semi-Markov decision process (SMDP) under a (possibly) randomized stationary policy. For each stationary policy governing an SMDP meeting certain hypotheses, a past-dependent policy on a suitably constructed DMDP is specified. The new policy carries the same average reward on the DMDP as the original policy on the SMDP. The discrete-time reduction is applied to optimization on an SMDP subject to a hard constraint, for which the optimal policy has been shown to be stationary and possibly randomized at no more than a single state. Under some convexity conditions on the reward, cost, and action space, it is shown that a nonrandomized policy is optimal for the constrained problem.
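    For background, here is a brief sketch, not drawn from the paper itself, of the classical uniformization construction that this result generalizes, together with the generic form of a hard-constrained average-reward problem; the symbols q, \Lambda, \tilde p, r, c, and \alpha are illustrative and need not match the authors' notation.

    % Classical uniformization (illustrative sketch, not the paper's construction).
    % For a Markov jump process with action-dependent rates q(j | i, a) and a
    % uniform bound \sum_{j \ne i} q(j \mid i, a) \le \Lambda < \infty, an
    % equivalent DMDP uses the transition probabilities
    \[
      \tilde p(j \mid i, a) =
      \begin{cases}
        q(j \mid i, a)/\Lambda, & j \ne i, \\
        1 - \sum_{k \ne i} q(k \mid i, a)/\Lambda, & j = i,
      \end{cases}
    \]
    % and per-period rewards \tilde r(i, a) = r(i, a)/\Lambda, so the discrete-time
    % average reward equals 1/\Lambda times the continuous-time average reward per
    % unit time and both problems share the same optimal policies.
    %
    % A hard-constrained average-reward problem of the kind treated here reads
    \[
      \max_{\pi}\ \liminf_{t \to \infty} \frac{1}{t}\,
        \mathbb{E}^{\pi}\!\left[\int_0^t r\bigl(x(s), a(s)\bigr)\, ds\right]
      \quad \text{s.t.} \quad
      \limsup_{t \to \infty} \frac{1}{t}\,
        \mathbb{E}^{\pi}\!\left[\int_0^t c\bigl(x(s), a(s)\bigr)\, ds\right] \le \alpha ,
    \]
    % where r is a reward rate, c a cost rate, and \alpha the constraint level.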

    Original language: English (US)
    Title of host publication: Proceedings of the IEEE Conference on Decision and Control
    Publisher: IEEE
    Pages: 1122-1123
    Number of pages: 2
    State: Published - 1985

    ASJC Scopus subject areas

    • Chemical Health and Safety
    • Control and Systems Engineering
    • Safety, Risk, Reliability and Quality

    Cite this

    Beutler, F. J., & Ross, K. (1985). DISCRETE-TIME EQUIVALENCE FOR CONSTRAINED SEMI-MARKOV DECISION PROCESSES. In Proceedings of the IEEE Conference on Decision and Control (pp. 1122-1123). IEEE.
