Welcome

Welcome to emdp’s documentation!

Easy MDPs implemented with a Gym-like interface that also exposes the transition dynamics.
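To illustrate what "a Gym-like interface with access to transition dynamics" means, here is a minimal sketch of a tabular environment. The class name `TabularEnv` and its attributes (`P`, `R`, `gamma`, `p0`) are hypothetical and are not emdp's actual API. The sketch assumes the classic Gym convention that `step` returns `(observation, reward, done, info)`. Unlike a black-box simulator, the transition table `P` and reward table `R` are plain attributes that planning code can read directly.

```python
import random
from typing import List, Optional, Tuple


class TabularEnv:
    """Hypothetical Gym-style environment over a finite MDP (illustration only).

    P[s][a][s2] is the probability of moving from state s to state s2 under
    action a; R[s][a] is the reward for taking action a in state s.
    """

    def __init__(self, P: List[List[List[float]]], R: List[List[float]],
                 gamma: float, p0: List[float], seed: Optional[int] = None):
        self.P = P            # transition dynamics, exposed directly
        self.R = R            # reward table, exposed directly
        self.gamma = gamma    # discount factor
        self.p0 = p0          # initial-state distribution
        self.rng = random.Random(seed)
        self.state: Optional[int] = None

    def reset(self) -> int:
        # Sample the starting state from the initial distribution p0.
        self.state = self._sample(self.p0)
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        # Reward depends on the current state and the chosen action.
        reward = self.R[self.state][action]
        # Sample the next state from P(. | state, action).
        self.state = self._sample(self.P[self.state][action])
        # This sketch has no terminal states, so done is always False.
        return self.state, reward, False, {}

    def _sample(self, probs: List[float]) -> int:
        # Draw an index according to the given probability weights.
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Usage: two states; action 0 moves to state 0, action 1 moves to state 1
# and earns a reward of 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
R = [[0.0, 1.0],
     [0.0, 1.0]]
env = TabularEnv(P, R, gamma=0.9, p0=[1.0, 0.0], seed=0)
start = env.reset()
next_state, reward, done, info = env.step(1)
```

Because `env.P` and `env.R` are ordinary attributes, the same object can drive both sampled rollouts and exact dynamic-programming computations.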

Background

MDP

A Markov Decision Process (MDP) is defined as a five-tuple \((\mathcal{S}, \mathcal{A}, r, P, \gamma)\), where \(\mathcal{S}\) is a set of states, \(\mathcal{A}\) is a set of actions, \(r: \mathcal{S}\times\mathcal{A} \to \mathbb{R}\) is a reward function, \(P: \mathcal{S}\times\mathcal{A} \to \Pr(\mathcal{S})\) is a state transition function that maps each state-action pair to a probability distribution over next states, and \(\gamma \in (0,1]\) is a discount factor.
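For finite \(\mathcal{S}\) and \(\mathcal{A}\), the five-tuple can be stored as plain tables. In the sketch below, `P[s][a][s2]` holds the transition probability and `r[s][a]` holds the reward. Access to \(P\) makes exact computations possible. This example evaluates a fixed stochastic policy \(\pi\) by iterating the Bellman expectation equation \(V(s) = \sum_a \pi(a \mid s)\,[\,r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\,V(s')\,]\). The function name `evaluate_policy` and its parameters are hypothetical and not part of emdp. The sketch assumes \(\gamma < 1\) so that the iteration is guaranteed to converge. With \(\gamma = 1\), convergence generally requires terminal states, which this sketch does not model.

```python
from typing import List


def evaluate_policy(P: List[List[List[float]]], r: List[List[float]],
                    gamma: float, pi: List[List[float]],
                    tol: float = 1e-10, max_iters: int = 10_000) -> List[float]:
    """Hypothetical iterative policy evaluation for a tabular MDP (sketch).

    Assumes gamma < 1; pi[s][a] is the probability of action a in state s.
    """
    n_states, n_actions = len(P), len(P[0])
    # Sanity check: each P(. | s, a) must be a probability distribution.
    for s in range(n_states):
        for a in range(n_actions):
            if abs(sum(P[s][a]) - 1.0) > 1e-9:
                raise ValueError(f"P[{s}][{a}] does not sum to 1")
    V = [0.0] * n_states
    for _ in range(max_iters):
        # One synchronous sweep of the Bellman expectation backup.
        V_new = [
            sum(
                pi[s][a] * (r[s][a] + gamma * sum(P[s][a][s2] * V[s2]
                                                  for s2 in range(n_states)))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:  # stop once values change by less than tol
            break
    return V


# Usage: action 1 always moves to state 1 and earns reward 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
r = [[0.0, 1.0],
     [0.0, 1.0]]
always_action_1 = [[0.0, 1.0], [0.0, 1.0]]
V = evaluate_policy(P, r, gamma=0.9, pi=always_action_1)
```

Under this policy every step earns reward 1, so both state values approach \(1/(1-\gamma) = 10\) for \(\gamma = 0.9\).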

Contents

Indices and tables