A two-state partially observable Markov decision process with uniformly distributed observations

Cited by: 19

Authors:

GrosfeldNir, A [1]

Affiliations:

[1] NORTHWESTERN UNIV, EVANSTON, IL 60208

Source:

OPERATIONS RESEARCH | 1996 / Vol. 44 / Issue 03

Keywords:

DOI:

10.1287/opre.44.3.458

Chinese Library Classification (CLC) number:

C93 [Management Science];

Discipline classification code:

12 ; 1201 ; 1202 ; 120202 ;

Abstract:

A controller observes a production system periodically over time. If the system is in the GOOD state during one period, there is a constant probability that it will deteriorate and be in the BAD state during the next period (and remain there). The true state of the system is unobservable and can only be inferred from observations (quality of output). Two actions are available: CONTINUE or REPLACE (for a fixed cost). The objective is to maximize the expected discounted value of the total future income. For both the finite- and infinite-horizon problems, the optimal policy is of a CONTROL LIMIT (CLT) type: continue if the good-state probability exceeds the CLT, and replace otherwise. The computation of the CLT involves a functional equation. An analytical solution for this equation is as yet unknown. For uniformly distributed observations we obtain the infinite-horizon CLT analytically. We also show that the finite-horizon CLTs, as a function of the time remaining, are not necessarily monotone, which is counterintuitive.
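As a rough illustration of the mechanics the abstract describes, the following Python sketch shows one Bayesian update of the good-state probability followed by a control-limit decision. It is not the paper's derivation: the observation intervals (GOOD output uniform on [0.5, 1.5], BAD output uniform on [0, 1]), the function names, and the numeric values of the deterioration probability and the CLT are all hypothetical choices made only for this example.

```python
def uniform_pdf(y, lo, hi):
    """Density of a uniform distribution on [lo, hi] evaluated at y."""
    return 1.0 / (hi - lo) if lo <= y <= hi else 0.0


def update_belief(p_good, y, r, good_range=(0.5, 1.5), bad_range=(0.0, 1.0)):
    """Return the probability that the system is GOOD in the next period.

    p_good: prior probability that the system is GOOD in the current period.
    y:      observed output quality for the current period.
    r:      constant per-period probability of deteriorating from GOOD to BAD.
    The two ranges are hypothetical supports of the uniform observations.
    """
    like_good = uniform_pdf(y, *good_range)
    like_bad = uniform_pdf(y, *bad_range)
    evidence = p_good * like_good + (1.0 - p_good) * like_bad
    if evidence == 0.0:
        raise ValueError("observation is impossible under both states")
    # Bayes' rule: probability the system was GOOD while producing y.
    posterior = p_good * like_good / evidence
    # BAD is absorbing, so the system is GOOD next period only if it was
    # GOOD now and did not deteriorate.
    return posterior * (1.0 - r)


def control_limit_action(p_good, clt):
    """Control-limit rule: continue if the good-state probability exceeds clt."""
    return "CONTINUE" if p_good > clt else "REPLACE"


if __name__ == "__main__":
    r, clt = 0.1, 0.5   # hypothetical deterioration probability and CLT
    belief = 0.8
    for y in (1.2, 0.7, 0.3):
        belief = update_belief(belief, y, r)
        print(y, round(belief, 4), control_limit_action(belief, clt))
```

With these assumed supports, an observation above 1 can only come from the GOOD state and one below 0.5 only from the BAD state, while an observation in the overlap [0.5, 1] leaves the posterior equal to the prior before deterioration is applied. The value of the CLT itself is taken as given here; computing it is the subject of the paper.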

Pages: 458-463

Page count: 6

References: 14 in total

[1] STRUCTURAL RESULTS FOR PARTIALLY OBSERVABLE MARKOV DECISION-PROCESSES [J].