A partially observable Markov decision process (POMDP) can be informally defined as a world in which an agent can take actions and gain rewards. The world has a set of possible world states S. This set can be finite (e.g. mine sweeper) or infinite (e. g. a robotic car in a parking lot). The world is only partially observable, so if we try to program an agent to act in this world, the robot does not know the state of the entire world, rather it only gets observations that partially reveal the state of the world. The set of possible observations is typically called Ω. The agent in a POMDP can take actions. The actions set is usually called A. The actions affect the world and result in rewards which also depend on the state of the world. (Technically, for every world state s and action a there is a reward R(s, a). R(s, a) is a real number. Also, for every world state s and action a there is a probability distribution of new possible world states that result after taking action a when the world is in state s.)
You are currently browsing the monthly archive for May 2015.