Welcome
Welcome to emdp’s documentation!
Easy MDPs implemented in a Gym-like interface, with access to the transition dynamics.
Background
MDP
A Markov Decision Process (MDP) is defined as a five-tuple \((\mathcal{S}, \mathcal{A}, r, P, \gamma)\), where \(\mathcal{S}\) is a set of states, \(\mathcal{A}\) is a set of actions, \(r:\mathcal{S}\times\mathcal{A} \to \mathbb{R}\) is a reward function, \(P:\mathcal{S}\times\mathcal{A} \to \Pr(\mathcal{S})\) is a state transition function that maps each state-action pair to a probability distribution over next states, and \(\gamma \in (0,1]\) is a discount factor.
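For a finite MDP, the tuple above can be written down directly as arrays. The following is a minimal sketch in plain NumPy and does not use emdp's own API: the two-state, two-action example, the array layout `P[s, a, s']` and `r[s, a]`, and the helper names `is_valid_mdp` and `step` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical two-state, two-action MDP; all numbers are illustrative.
# P[s, a, s2] is the probability of reaching state s2 after taking
# action a in state s, so each P[s, a, :] is a distribution over states.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions out of state 0
    [[0.0, 1.0], [0.5, 0.5]],   # transitions out of state 1
])

# r[s, a] is the reward for taking action a in state s.
r = np.array([
    [0.0, 1.0],
    [2.0, 0.0],
])

gamma = 0.9  # discount factor, must lie in (0, 1]


def is_valid_mdp(P, r, gamma):
    """Check that (P, r, gamma) is consistent with the definition above."""
    if P.ndim != 3 or P.shape[0] != P.shape[2]:
        return False
    if r.shape != P.shape[:2]:
        return False
    nonnegative = bool(np.all(P >= 0))
    normalized = bool(np.allclose(P.sum(axis=-1), 1.0))
    return nonnegative and normalized and 0 < gamma <= 1


def step(P, r, state, action, rng):
    """Sample a next state from P[state, action] and return it with the reward."""
    next_state = rng.choice(P.shape[-1], p=P[state, action])
    return int(next_state), float(r[state, action])
```

A usage sketch under the same assumptions: `is_valid_mdp(P, r, gamma)` confirms that every row of `P` sums to one, and `step(P, r, 0, 1, np.random.default_rng(0))` draws one transition from state 0 under action 1. Storing `P` explicitly is what "access to transition dynamics" amounts to here: the same array can be sampled from or used for exact planning.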