UNIFORMIZATION FOR SEMI-MARKOV DECISION PROCESSES UNDER STATIONARY POLICIES.

Frederick J. Beutler, Keith Ross

    Research output: Contribution to journal › Article

    Abstract

    Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in the original process. Since these lead to discrepancies in the average rewards for stationary policies, uniformization can be accepted as valid only for simple policies. We generalize uniformization to yield consistent results for stationary policies also. These results are applied to constrained optimization of SMDP, in which stationary (randomized) policies appear naturally.
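    A note for orientation (not part of the published abstract): a sketch of the classical data transformation behind uniformization, in notation assumed here rather than taken from the paper. An SMDP with transition probabilities p(j | s, a), expected sojourn times y(s, a), and expected per-transition rewards r(s, a) is replaced by a discrete-time chain with

    \tilde{r}(s,a) = \frac{r(s,a)}{y(s,a)}, \qquad
    \tilde{p}(j \mid s,a) =
    \begin{cases}
    \dfrac{\eta}{y(s,a)}\, p(j \mid s,a), & j \neq s, \\[4pt]
    1 - \dfrac{\eta}{y(s,a)}\bigl(1 - p(s \mid s,a)\bigr), & j = s,
    \end{cases}

    where the constant \eta satisfies 0 < \eta \le y(s,a) / (1 - p(s \mid s,a)) whenever p(s | s, a) < 1, so that the diagonal entries remain nonnegative. Under a simple policy the transformed chain attains the same long-run average reward as the SMDP. The self-loops at j = s are the virtual jumps mentioned above: a stationary randomized policy re-randomizes its action at each such virtual decision epoch, whereas the original SMDP holds its action until a real transition, and this is the source of the reward discrepancies the paper addresses.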

    Original language: English (US)
    Pages (from-to): 644-656
    Number of pages: 13
    Journal: Journal of Applied Probability
    Volume: 24
    Issue number: 3
    State: Published - Sep 1987

    ASJC Scopus subject areas

    • Mathematics (all)
    • Statistics and Probability

    Cite this

    UNIFORMIZATION FOR SEMI-MARKOV DECISION PROCESSES UNDER STATIONARY POLICIES. / Beutler, Frederick J.; Ross, Keith.

    In: Journal of Applied Probability, Vol. 24, No. 3, 09.1987, p. 644-656.

    @article{fae124a7577c4488a134d26e32c7a80e,
    title = "UNIFORMIZATION FOR SEMI-MARKOV DECISION PROCESSES UNDER STATIONARY POLICIES.",
    abstract = "Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in the original process. Since these lead to discrepancies in the average rewards for stationary policies, uniformization can be accepted as valid only for simple policies. We generalize uniformization to yield consistent results for stationary policies also. These results are applied to constrained optimization of SMDP, in which stationary (randomized) policies appear naturally.",
    author = "Beutler, {Frederick J.} and Keith Ross",
    year = "1987",
    month = "9",
    language = "English (US)",
    volume = "24",
    pages = "644--656",
    journal = "Journal of Applied Probability",
    issn = "0021-9002",
    publisher = "University of Sheffield",
    number = "3",

    }

    TY - JOUR

    T1 - UNIFORMIZATION FOR SEMI-MARKOV DECISION PROCESSES UNDER STATIONARY POLICIES.

    AU - Beutler, Frederick J.

    AU - Ross, Keith

    PY - 1987/9

    Y1 - 1987/9

    N2 - Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in the original process. Since these lead to discrepancies in the average rewards for stationary policies, uniformization can be accepted as valid only for simple policies. We generalize uniformization to yield consistent results for stationary policies also. These results are applied to constrained optimization of SMDP, in which stationary (randomized) policies appear naturally.

    AB - Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in the original process. Since these lead to discrepancies in the average rewards for stationary policies, uniformization can be accepted as valid only for simple policies. We generalize uniformization to yield consistent results for stationary policies also. These results are applied to constrained optimization of SMDP, in which stationary (randomized) policies appear naturally.

    UR - http://www.scopus.com/inward/record.url?scp=0023420571&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=0023420571&partnerID=8YFLogxK

    M3 - Article

    AN - SCOPUS:0023420571

    VL - 24

    SP - 644

    EP - 656

    JO - Journal of Applied Probability

    JF - Journal of Applied Probability

    SN - 0021-9002

    IS - 3

    ER -