API#
Subpackages#
Submodules#
emdp.actions module#
emdp.analytic module#
Tools to get analytic solutions from MDPs.
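We can compute \(v_\pi(s)\) recursively by solving the system of Bellman equations below [Bellman1957]:
\[\begin{split}\begin{align} v_\pi(s) &= \sum_{a} \left[ \pi(a|s) \left( r(s,a) + \gamma \sum_{s'} p(s'|s,a) v_\pi(s') \right) \right] \\ &=\sum_a \pi(a|s)r(s,a) + \gamma \sum_{s'} \left[ \left(\sum_a \pi(a|s)p(s'|s,a)\right) v_\pi(s') \right] \\ &=r_\pi(s) + \gamma \sum_{s'} p_\pi(s'|s) v_\pi(s') \end{align}\end{split}\]
These equations can also be written in matrix form with \(\mathbf{v}_\pi, \mathbf{r}_\pi \in \mathbb{R}^{|\mathcal{S}|}\) and \(\mathbf{p}_\pi \in \mathbb{R}^{|\mathcal{S}|\times|\mathcal{S}|}\):
\[\begin{split}\begin{align} \mathbf{v}_\pi &= \mathbf{r}_\pi + \gamma \mathbf{p}_\pi \mathbf{v}_\pi \\ &= (\mathbf{I} - \gamma \mathbf{p}_\pi)^{-1} \mathbf{r}_\pi \\ &= \Phi \mathbf{r}_\pi \end{align}\end{split}\]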
- [Bellman1957] Bellman, Richard. 1957. “A Markovian Decision Process.” Journal of Mathematics and Mechanics: 679–684.
- emdp.analytic.calculate_P_pi(P, pi)[source]#
Calculates the transition matrix \(\mathbf{p}_\pi\) under policy \(\pi\), where \(p_\pi(s,s') := \Pr(s'|s, a\sim\pi)\). It is represented as a matrix of shape \(|\mathcal{S}|\times|\mathcal{S}|\).
\[p_\pi(s,s') = \sum_a \pi(a|s) p(s'|s, a)\]
where \(s\) and \(s'\) are the states before and after taking action \(a\).
- Parameters
P (np.ndarray) – transition matrix of size \(|\mathcal{S}|\times|\mathcal{A}|\times|\mathcal{S}|\)
pi (np.ndarray) – matrix of size \(|\mathcal{S}|\times|\mathcal{A}|\) indicating the policy
- Returns
a matrix of size \(|\mathcal{S}|\times|\mathcal{S}|\)
- Return type
np.ndarray
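The sum over actions above can be sketched in plain NumPy. This is an illustration of the formula only, not the library's implementation; the helper name `p_pi_sketch` and the toy two-state MDP are assumptions made for this example.

```python
import numpy as np


def p_pi_sketch(P, pi):
    """Hypothetical stand-in: p_pi(s, s') = sum_a pi(a|s) p(s'|s, a)."""
    # P has shape (S, A, S) and pi has shape (S, A); contract over the action axis.
    return np.einsum("sa,sat->st", pi, P)


# Toy MDP (assumed): action 0 stays in the current state, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, 0, 1] = 1.0
P[1, 1, 0] = 1.0

# Uniform policy in state 0, always "stay" in state 1.
pi = np.array([[0.5, 0.5],
               [1.0, 0.0]])

P_pi = p_pi_sketch(P, pi)
print(P_pi)  # rows: [0.5, 0.5] and [0.0, 1.0]
```

Each row of the result is a probability distribution over next states, so the rows sum to one whenever `P` and `pi` are properly normalised.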
- emdp.analytic.calculate_R_pi(R, pi)[source]#
Calculates the expected reward \(r_\pi\) under policy \(\pi\), which is represented as a vector of length \(|\mathcal{S}|\).
\[r_\pi(s) = \sum_a \pi(a|s) r(s,a)\]
- Parameters
R (np.ndarray) – reward matrix of size \(|\mathcal{S}|\times|\mathcal{A}|\)
pi (np.ndarray) – matrix of size \(|\mathcal{S}|\times|\mathcal{A}|\) indicating the policy
- Returns
a vector of length \(|\mathcal{S}|\)
- Return type
np.ndarray
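As a rough illustration of the expected-reward formula, the following NumPy sketch weights each reward by the policy probability and sums over actions. The helper name `r_pi_sketch` and the numbers are assumptions for this example, not part of emdp.

```python
import numpy as np


def r_pi_sketch(R, pi):
    """Hypothetical stand-in: r_pi(s) = sum_a pi(a|s) r(s, a)."""
    # R and pi both have shape (S, A); the result has shape (S,).
    return (pi * R).sum(axis=1)


# Assumed toy rewards r(s, a) for two states and two actions.
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])
pi = np.array([[0.5, 0.5],
               [1.0, 0.0]])

R_pi = r_pi_sketch(R, pi)
print(R_pi)  # [0.5 2. ]
```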
- emdp.analytic.calculate_V_pi(P, R, pi, gamma)[source]#
Calculates the state-value function \(\mathbf{v}_\pi\) from the successor representation using the analytic form:
\[\mathbf{v}_\pi = (\mathbf{I} - \gamma \mathbf{p}_\pi)^{-1} \mathbf{r}_\pi\]
where \(p_\pi(s,t) = \sum_a \pi(a|s) p(t|s, a)\) and \(r_\pi(s) = \sum_a \pi(a|s) r(s,a)\).
See also:
emdp.analytic.calculate_P_pi() and emdp.analytic.calculate_R_pi().
Note
We can compute \(v_\pi(s)\) recursively by solving the system of Bellman equations below [Bellman1957]:
\[\begin{split}\begin{align} v_\pi(s) &= \sum_{a} \left[ \pi(a|s) \left( r(s,a) + \gamma \sum_{s'} p(s'|s,a) v_\pi(s') \right) \right] \\ &=\sum_a \pi(a|s)r(s,a) + \gamma \sum_{s'} \left[ \left(\sum_a \pi(a|s)p(s'|s,a)\right) v_\pi(s') \right] \\ &=r_\pi(s) + \gamma \sum_{s'} p_\pi(s'|s) v_\pi(s') \end{align}\end{split}\]
These equations can also be written in matrix form with \(\mathbf{v}_\pi, \mathbf{r}_\pi \in \mathbb{R}^{|\mathcal{S}|}\) and \(\mathbf{p}_\pi \in \mathbb{R}^{|\mathcal{S}|\times|\mathcal{S}|}\):
\[\begin{split}\begin{align} \mathbf{v}_\pi &= \mathbf{r}_\pi + \gamma \mathbf{p}_\pi \mathbf{v}_\pi \\ &= (\mathbf{I} - \gamma \mathbf{p}_\pi)^{-1} \mathbf{r}_\pi \\ &= \Phi \mathbf{r}_\pi \end{align}\end{split}\]
where \(\Phi := (\mathbf{I} - \gamma \mathbf{p}_\pi)^{-1}\) is the successor representation.
- Parameters
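P (np.ndarray) – transition matrix of size \(|\mathcal{S}|\times|\mathcal{A}|\times|\mathcal{S}|\)
R (np.ndarray) – reward matrix of size \(|\mathcal{S}|\times|\mathcal{A}|\)
pi (np.ndarray) – policy matrix of size \(|\mathcal{S}|\times|\mathcal{A}|\)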
gamma (float) – discount factor
- Returns
state-value vector under policy \(\pi\).
- Return type
np.ndarray
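The matrix form of the Bellman equations can be checked numerically. The sketch below is a minimal NumPy illustration under assumed toy inputs; `v_pi_sketch` is a hypothetical name, and solving the linear system with `np.linalg.solve` instead of forming the inverse is a choice made here, not necessarily what emdp does.

```python
import numpy as np


def v_pi_sketch(P, R, pi, gamma):
    """Hypothetical stand-in: solve (I - gamma * p_pi) v = r_pi for v."""
    P_pi = np.einsum("sa,sat->st", pi, P)   # (S, S) transitions under pi
    R_pi = (pi * R).sum(axis=1)             # (S,) expected reward under pi
    identity = np.eye(P_pi.shape[0])
    return np.linalg.solve(identity - gamma * P_pi, R_pi)


# Assumed toy MDP: action 0 stays, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, 0, 1] = 1.0
P[1, 1, 0] = 1.0
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])
pi = np.array([[0.5, 0.5],
               [1.0, 0.0]])
gamma = 0.9

V = v_pi_sketch(P, R, pi, gamma)

# The solution should satisfy the Bellman equation v = r_pi + gamma * p_pi v.
P_pi = np.einsum("sa,sat->st", pi, P)
R_pi = (pi * R).sum(axis=1)
bellman_residual = V - (R_pi + gamma * P_pi @ V)
print(V, np.abs(bellman_residual).max())
```

State 1 is absorbing under this policy with reward 2 per step, so its value is 2 / (1 - 0.9) = 20, which gives a quick sanity check on the result.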
- emdp.analytic.calculate_V_pi_from_successor_representation(Phi, R_pi)[source]#
Calculates the state-value vector \(\mathbf{v}_\pi\) from the successor representation \(\Phi\) and the expected reward \(\mathbf{r}_\pi\).
See also:
emdp.analytic.calculate_V_pi()
- Parameters
Phi (np.ndarray) – successor representation of size \(|\mathcal{S}|\times|\mathcal{S}|\)
R_pi (np.ndarray) – expected reward of size \(|\mathcal{S}|\)
- Returns
value function of size \(|\mathcal{S}|\)
- Return type
np.ndarray
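One reason to keep \(\Phi\) separate from the reward is that it can be reused: once \(\Phi\) is known for a policy, evaluating a new reward vector is a single matrix-vector product. The sketch below illustrates this with NumPy under assumed toy values; `v_from_sr_sketch` is a hypothetical name, not the emdp function.

```python
import numpy as np


def v_from_sr_sketch(Phi, R_pi):
    """Hypothetical stand-in: v_pi = Phi @ r_pi."""
    return Phi @ R_pi


# Assumed policy-induced transition matrix and discount factor.
P_pi = np.array([[0.5, 0.5],
                 [0.0, 1.0]])
gamma = 0.9
Phi = np.linalg.inv(np.eye(2) - gamma * P_pi)

# The same Phi evaluates two different (assumed) expected-reward vectors.
V_a = v_from_sr_sketch(Phi, np.array([0.5, 2.0]))
V_b = v_from_sr_sketch(Phi, np.array([1.0, 0.0]))
print(V_a, V_b)
```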
- emdp.analytic.calculate_successor_representation(P_pi, gamma)[source]#
Calculates the successor representation \(\Phi\):
\[\Phi := (\mathbf{I} - \gamma \mathbf{p}_\pi)^{-1}\]
See also:
emdp.analytic.calculate_V_pi()
- Parameters
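P_pi (np.ndarray) – transition matrix under policy \(\pi\) of size \(|\mathcal{S}|\times|\mathcal{S}|\)
gamma (float) – discount factor
- Returns
successor representation \(\Phi\) of size \(|\mathcal{S}|\times|\mathcal{S}|\)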
- Return type
np.ndarray
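For \(0 \le \gamma < 1\) and a row-stochastic \(\mathbf{p}_\pi\), the inverse above equals the discounted sum \(\sum_{t \ge 0} \gamma^t \mathbf{p}_\pi^t\), so \(\Phi(s, s')\) can be read as the expected discounted number of visits to \(s'\) when starting from \(s\). The following NumPy sketch compares the two under assumed toy values; `successor_representation_sketch` is a hypothetical name, not the library's code.

```python
import numpy as np


def successor_representation_sketch(P_pi, gamma):
    """Hypothetical stand-in: Phi = (I - gamma * p_pi)^{-1}."""
    return np.linalg.inv(np.eye(P_pi.shape[0]) - gamma * P_pi)


# Assumed policy-induced transition matrix and discount factor.
P_pi = np.array([[0.5, 0.5],
                 [0.0, 1.0]])
gamma = 0.9

Phi = successor_representation_sketch(P_pi, gamma)

# Truncated Neumann series: sum over t of gamma^t * P_pi^t.
series = np.zeros_like(P_pi)
term = np.eye(2)
for _ in range(500):
    series += term
    term = gamma * (term @ P_pi)

print(np.abs(Phi - series).max())  # close to zero
print(Phi.sum(axis=1))             # each row sums to 1 / (1 - gamma)
```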
emdp.common module#
emdp.exceptions module#
emdp.torch_analytic module#
Tools to get analytic solutions from MDPs.
These functions are written in torch and are therefore differentiable.
emdp.utils module#
- emdp.utils.convert_int_rep_to_onehot(state, vector_size: int)[source]#
Converts the integer representation of a state (or states) to a one-hot representation.
Examples
>>> convert_int_rep_to_onehot(1,5)
array([0, 1, 0, 0, 0])
>>> convert_int_rep_to_onehot(np.array([1,2]),5)
array([[0, 1, 0, 0, 0], [0, 0, 1, 0, 0]])
- Parameters
state – int representation of state (or states).
vector_size (int) – size of onehot representation.
- Returns
onehot representation of state (or states).
- Return type
np.ndarray
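One common way to get the behaviour documented above is to index into an identity matrix. The sketch below is an assumption about how such a conversion could be written, not the emdp source; `int_to_onehot_sketch` is a hypothetical name.

```python
import numpy as np


def int_to_onehot_sketch(state, vector_size):
    """Hypothetical stand-in: row `state` of the identity matrix is its one-hot code."""
    # Works for a single int as well as for an integer array of states.
    return np.eye(vector_size, dtype=int)[state]


single = int_to_onehot_sketch(1, 5)
batch = int_to_onehot_sketch(np.array([1, 2]), 5)
print(single)  # [0 1 0 0 0]
print(batch)   # one row per state
```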
- emdp.utils.convert_onehot_to_int(state: numpy.ndarray)[source]#
Converts the one-hot representation of a state (or states) to an index (or indices).
Examples
>>> convert_onehot_to_int(np.array([0,0,0,1,0]))
3
>>> convert_onehot_to_int(np.array([[0,0,0,1,0],[0,1,0,0,0]]))
array([3, 1])
- Parameters
state (np.ndarray) – onehot representation of state (or states).
- Returns
index (or indices).
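The reverse conversion can be sketched with `np.argmax` along the last axis, which recovers the position of the single 1 in each one-hot vector. This is an illustrative assumption rather than the emdp implementation; `onehot_to_int_sketch` is a hypothetical name.

```python
import numpy as np


def onehot_to_int_sketch(state):
    """Hypothetical stand-in: the index of the 1 along the last axis."""
    return np.argmax(state, axis=-1)


single = onehot_to_int_sketch(np.array([0, 0, 0, 1, 0]))
batch = onehot_to_int_sketch(np.array([[0, 0, 0, 1, 0],
                                       [0, 1, 0, 0, 0]]))
print(int(single))     # 3
print(batch.tolist())  # [3, 1]
```

Note that `np.argmax` returns 0 for an all-zero vector, so this sketch assumes its inputs are valid one-hot encodings.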