TARGET-LEVEL CRITERION IN MARKOV DECISION-PROCESSES

Cited by: 38
Authors
BOUAKIZ, M
KEBIR, Y
Affiliations
[1] LOYOLA UNIV,DEPT MANAGEMENT SCI,CHICAGO,IL 60611
[2] LOYOLA UNIV,DEPT MATH SCI,CHICAGO,IL 60611
Keywords
MARKOV DECISION PROCESSES; TARGET-LEVEL CRITERION; FIXED POINTS; DYNAMIC PROGRAMMING; SUCCESSIVE APPROXIMATIONS;
DOI
10.1007/BF02193458
Chinese Library Classification
C93 [Management Science]; O22 [Operations Research];
Discipline classification codes
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
Abstract
The Markov decision process is studied under the criterion of maximizing the probability that the total discounted reward exceeds a target level. We focus on the dynamic programming equations of the model. We give various properties of the optimal return operator and, for the infinite planning-horizon model, we characterize the optimal value function as a maximal fixed point of this operator. Various turnpike results relating the finite-horizon and infinite-horizon models are also given.
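To make the target-level criterion concrete, the following is a minimal sketch of a finite-horizon dynamic programming recursion for it. It uses one common formulation, in which the state is augmented with the remaining target x and the target is rescaled to (x - r) / beta after each step; this is not necessarily the exact notation or operator used in the paper. The toy MDP (`P`, `R`, `BETA`) and the function name `max_prob_exceed` are hypothetical and chosen only for illustration.

```python
from functools import lru_cache

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
# P[s][a] lists (next_state, probability); R[s][a] is the immediate reward.
P = {
    0: {0: [(0, 1.0)], 1: [(0, 0.5), (1, 0.5)]},
    1: {0: [(1, 1.0)], 1: [(0, 1.0)]},
}
R = {
    0: {0: 1.0, 1: 0.0},
    1: {0: 4.0, 1: 0.0},
}
BETA = 0.5  # discount factor (assumed value)


@lru_cache(maxsize=None)
def max_prob_exceed(n, s, x):
    """Maximal probability that the n-step total discounted reward,
    starting from state s, strictly exceeds the target level x."""
    if n == 0:
        # No rewards left to collect: the total is 0, which exceeds x only if x < 0.
        return 1.0 if x < 0 else 0.0
    best = 0.0
    for a in P[s]:
        # After earning R[s][a], the remaining discounted rewards must exceed
        # the rescaled residual target.
        residual = (x - R[s][a]) / BETA
        value = sum(p * max_prob_exceed(n - 1, s2, residual) for s2, p in P[s][a])
        best = max(best, value)
    return best


if __name__ == "__main__":
    for target in (1.0, 1.75, 2.0):
        print(target, max_prob_exceed(2, 0, target))
```

In this toy example the best first action depends on the target: for a low target the safe action (stay in state 0) succeeds with certainty, while for a target above 1.5 only the risky action has any chance of success. This dependence of the decision on the remaining target is what distinguishes the criterion from expected-reward maximization.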
Pages: 1-15
Number of pages: 15