Successor Representation#
we can compute \(v_\pi(s)\) recursively by solving the system of Bellman equations below [Bellman1957]:
\[\begin{split}\begin{align}
v_\pi(s) &= \sum_{a} \left[
\pi(a|s) \left( r(s,a) + \gamma \sum_{s'} p(s'|s,a) v_\pi(s') \right)
\right] \\
&=\sum_a \pi(a|s)r(s,a) + \gamma \sum_{s'} \left[ \left(\sum_a \pi(a|s)p(s'|s,a)\right) v_\pi(s') \right] \\
&=r_\pi(s) + \gamma \sum_{s'} p_\pi(s'|s) v_\pi(s')
\end{align}\end{split}\]
These equations can also be written in matrix form with \(\mathbf{v}_\pi, \mathbf{r}_\pi \in \mathbb{R}^{|\mathcal{S}|}\) and \(\mathbf{p}_\pi \in \mathbb{R}^{|S|\times|S|}\):
\[\begin{split}\begin{align}
\mathbf{v}_\pi &= \mathbf{r}_\pi + \gamma \mathbf{p}_\pi \mathbf{v}_\pi \\
&= (\mathbf{I} - \gamma \mathbf{p}_\pi)^{-1} \mathbf{r}_\pi \\
&= \Phi \mathbf{r}_\pi
\end{align}\end{split}\]
- Bellman1957
Bellman, Richard. 1957. “A Markovian Decision Process.” Journal of mathematics and mechanics: 679–684.
see also: emdp.analytic