emdp.gridworld package#
Submodules#
emdp.gridworld.builder_tools module#
Utilities to help build more complex grid worlds.
- class emdp.gridworld.builder_tools.TransitionMatrixBuilder(grid_size, action_space=4, has_terminal_state=True)[source]#
Bases: object
Builder object to build a transition matrix for a grid world.
- property P#
Returns a new array with the transition matrix built so far.
- Parameters
nocopy (bool, optional) – Defaults to False.
- Returns
the transition model matrix
- Return type
np.array
- add_grid(terminal_states: Optional[List[int]] = None, p_success: float = 1)[source]#
Adds a grid so that you cannot walk off the edges of the grid.
- Parameters
terminal_states (List[int], optional) – Terminal states. Defaults to [].
p_success (float, optional) – Defaults to 1.
- Raises
ValueError –
- add_wall_at(tuple_location)[source]#
Adds a blockade at this position.
- Parameters
tuple_location – (x,y) location of the wall
- Returns
- add_wall_between(start: Tuple[int, int], end: Tuple[int, int])[source]#
Adds a wall between the starting and ending location
- Parameters
start (Tuple[int,int]) – tuple (x,y) representing the starting position of the wall
end (Tuple[int,int]) – tuple (x,y) representing the ending position of the wall
- Raises
ValueError –
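As a rough illustration of what a wall between two points involves, the sketch below lists the cells on a straight line from start to end. The helper name `cells_between` is hypothetical, and the rule that a diagonal wall raises ValueError is an assumption; the documentation above does not state what triggers the error.

```python
from typing import List, Tuple


def cells_between(start: Tuple[int, int], end: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Hypothetical helper: cells on a straight wall from start to end, inclusive."""
    (x0, y0), (x1, y1) = start, end
    if x0 == x1:
        # Same first coordinate: walk along the second coordinate.
        lo, hi = sorted((y0, y1))
        return [(x0, y) for y in range(lo, hi + 1)]
    if y0 == y1:
        # Same second coordinate: walk along the first coordinate.
        lo, hi = sorted((x0, x1))
        return [(x, y0) for x in range(lo, hi + 1)]
    # Assumption: walls that are not axis-aligned are rejected.
    raise ValueError("start and end must share a row or a column")


wall = cells_between((0, 2), (3, 2))
```

Each returned cell could then be passed to a single-cell call such as add_wall_at.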
- emdp.gridworld.builder_tools.build_simple_grid_world_with_terminal_states(reward_spec, size, p_success=1, gamma=0.99, seed=2017, start_state=0)[source]#
A simple size x size grid world where the agent's actions have a probability p_success of executing correctly. Rewards are given by a dict whose keys are (x, y) positions and whose values are the magnitudes of the rewards. Upon reaching a state with a reward, every action gives a reward. The episode then goes to an absorbing state and terminates.
- Parameters
reward_spec – Reward specification
size – Size of the gridworld (grid world will be size x size)
p_success – The probability the action is successful.
gamma – The discount factor.
seed – Seed for the GridWorldMDP object.
start_state – The index of the starting state.
- Returns
- emdp.gridworld.builder_tools.build_simple_grid_world_without_terminal_states(reward_spec, size, p_success=1, gamma=0.99, seed=2017, start_state=0)[source]#
A simple size x size grid world where the agent's actions have a probability p_success of executing correctly. Rewards are given by a dict whose keys are (x, y) positions and whose values are the magnitudes of the rewards. Upon reaching a state with a reward, every action gives a reward. The episode does not terminate.
- Parameters
reward_spec – Reward specification
size – Size of the gridworld (grid world will be size x size)
p_success – The probability the action is successful.
gamma – The discount factor.
seed – Seed for the GridWorldMDP object.
start_state – The index of the starting state.
- Returns
- emdp.gridworld.builder_tools.create_reward_matrix(state_space, size, reward_spec: Dict[Tuple[int, int], float], action_space=4)[source]#
Abstraction to create reward matrices.
- Parameters
state_space (int) – Size of the state space, \(|\mathcal{S}|\).
size (int) – size of the grid world (width or height).
reward_spec (Dict[Tuple[int,int], float]) – the reward specification.
action_space (int) – the size of the action space
- Returns
the reward matrix.
- Return type
np.ndarray
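A minimal sketch of how a reward specification might be turned into a \(|S| \times |A|\) matrix. The function name `sketch_reward_matrix` is hypothetical and the row-major indexing of (x, y) positions is an assumption, so this is not necessarily how create_reward_matrix itself is implemented.

```python
import numpy as np


def sketch_reward_matrix(state_space, size, reward_spec, action_space=4):
    """Hypothetical stand-in: every action in a rewarding state yields that reward."""
    R = np.zeros((state_space, action_space))
    for (x, y), reward in reward_spec.items():
        # Assumption: states are indexed row-major, so (x, y) maps to x * size + y.
        R[x * size + y, :] = reward
    return R


# A 5 x 5 grid plus one absorbing state gives 26 states.
R = sketch_reward_matrix(state_space=26, size=5, reward_spec={(4, 4): 1.0, (0, 4): -1.0})
```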
emdp.gridworld.env module#
A simple grid world environment
- class emdp.gridworld.env.GridWorldMDP(P, R, gamma, p0, terminal_states: List[Tuple[int, int]], size: int, seed=1337, skip_check=False, convert_terminal_states_to_ints=False)[source]#
Bases: emdp.common.MDP
Note
If terminal_states is not empty then there will be an absorbing state, so the actual number of states will be \(size^2 + 1\). If there is a terminal state, it should be the last one.
- Parameters
P (np.ndarray) – state transition matrix \(P: \mathcal{S}\times\mathcal{A}\times\mathcal{S}\mapsto\mathbb{R}\), the shape is \(|S| \times |A| \times |S|\).
R (np.ndarray) – reward matrix \(r: \mathcal{S}\times \mathcal{A}\mapsto \mathbb{R}\), the shape is \(|S| \times |A|\).
gamma (float) – discount factor \(\gamma\)
p0 (np.ndarray) – initial starting distribution \(p_0\). The array shape is \(|\mathcal{S}|=size\times size\).
terminal_states (List[Tuple[int,int]]) – Must be a list of (x,y) tuples. use skip_terminal_state_conversion if giving ints
size (int) – the size of the grid world (i.e there are \(size \times size + 1 = |\mathcal{S}|\) states in total).
seed (int, optional) – the random seed for simulations. Defaults to 1337.
skip_check (bool, optional) – _description_. Defaults to False.
convert_terminal_states_to_ints (bool, optional) – _description_. Defaults to False.
- flatten_state(state)[source]#
Flatten state (row, col) into a one-hot vector.
See also: emdp.gridworld.helper_utilities.flatten_state()
- Parameters
state (Tuple[int,int]) – (row, col) pair
- Returns
one-hot vector of shape (size * size)
- Return type
np.ndarray
- unflatten_state(onehot) → Tuple[int, int][source]#
Unflatten a one-hot state vector into a (row, col) pair
See also: emdp.gridworld.helper_utilities.unflatten_state()
- Parameters
onehot (np.ndarray) – one-hot vector of shape (size * size)
- Returns
(row, col) pair
- Return type
Tuple[int,int]
emdp.gridworld.helper_utilities module#
- emdp.gridworld.helper_utilities.is_P_valid_stochastic(P: numpy.ndarray) → bool[source]#
Returns True if the transition model P is a valid stochastic transition model. \(P\) is a valid stochastic transition model if
\[\sum_{s'\in\mathcal{S}} Pr(s'|s,a) = 1\]
for every state \(s\) and action \(a\).
- Parameters
P (np.ndarray) – transition model.
- Return type
bool
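The condition above amounts to summing over the last axis of a \(|S| \times |A| \times |S|\) array. A minimal sketch, assuming that axis layout; the name `rows_sum_to_one` and the tolerance are illustrative and not part of the library:

```python
import numpy as np


def rows_sum_to_one(P, atol=1e-8):
    """Hypothetical check: every P[s, a, :] sums to one within a tolerance."""
    return bool(np.allclose(P.sum(axis=-1), 1.0, atol=atol))


# Uniform transitions over 3 states for 2 actions: valid.
P_ok = np.full((3, 2, 3), 1.0 / 3.0)

# Inflate one entry so that one (s, a) row no longer sums to one: invalid.
P_bad = P_ok.copy()
P_bad[0, 0, 0] = 0.9

ok, bad = rows_sum_to_one(P_ok), rows_sum_to_one(P_bad)
```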
- emdp.gridworld.helper_utilities.flatten_state(state, size, state_space)[source]#
Flatten state as (row, col) pair into a one-hot vector.
Example
>>> flatten_state((1,2), 3, 9)
array([0, 0, 0, 0, 0, 1, 0, 0, 0], dtype=int32)
- Parameters
state (Tuple[int, int]) – (row, col) pair
size (int) – width (number of columns) of the grid world.
state_space (int) – size of the state space, i.e. \(|\mathcal{S}|\).
- Returns
one-hot representation of the state.
- Return type
np.ndarray
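The documented example is consistent with row-major indexing, where (row, col) maps to index row * size + col. The sketch below reproduces it under that assumption; `sketch_flatten_state` is a hypothetical stand-in, not the library function.

```python
import numpy as np


def sketch_flatten_state(state, size, state_space):
    """Hypothetical stand-in: one-hot encode a (row, col) pair."""
    row, col = state
    onehot = np.zeros(state_space, dtype=np.int32)
    # Assumption: row-major layout, so (1, 2) on a width-3 grid is index 5.
    onehot[row * size + col] = 1
    return onehot


vec = sketch_flatten_state((1, 2), 3, 9)
```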
- emdp.gridworld.helper_utilities.unflatten_state(onehot: numpy.ndarray, size, has_absorbing_state: bool) → Tuple[int, int][source]#
Unflatten a one-hot vector into a (row, col) pair.
Examples
>>> unflatten_state(np.array([0,0,0,1]), 2, False)
(1, 1)
>>> unflatten_state(np.array([0, 0, 0, 0, 0, 1, 0, 0, 0]), 3, False)
(1, 2)
- Parameters
onehot (np.ndarray) – one hot representation of a state
size (int) – size of the grid world
has_absorbing_state (bool) – whether the grid world has an absorbing state
- Returns
(row, col) pair
- Return type
Tuple[int,int]
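The inverse mapping can be sketched with argmax and divmod. This assumes row-major indexing and that the absorbing state, when present, occupies the last entry of the vector; `sketch_unflatten_state` is hypothetical, and how the library treats a vector whose hot entry is the absorbing state is not covered here.

```python
import numpy as np


def sketch_unflatten_state(onehot, size, has_absorbing_state):
    """Hypothetical stand-in: recover (row, col) from a one-hot vector."""
    if has_absorbing_state:
        # Assumption: the absorbing state is the final entry, so drop it.
        onehot = onehot[:-1]
    index = int(np.argmax(onehot))
    row, col = divmod(index, size)
    return (row, col)


a = sketch_unflatten_state(np.array([0, 0, 0, 1]), 2, False)
b = sketch_unflatten_state(np.array([0, 0, 0, 0, 0, 1, 0, 0, 0]), 3, False)
c = sketch_unflatten_state(np.array([0, 0, 1, 0, 0]), 2, True)
```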
- emdp.gridworld.helper_utilities.get_state_after_executing_action(action, state, grid_size)[source]#
Gets the state after executing an action
- Parameters
action –
state –
grid_size –
- Returns
- emdp.gridworld.helper_utilities.check_can_take_action(action, state, grid_size)[source]#
Checks if you can take an action in a state.
- Parameters
action –
state –
grid_size –
- Returns
- emdp.gridworld.helper_utilities.get_possible_actions(state, grid_size)[source]#
Gets all possible actions at a given state.
- Parameters
state (_type_) – _description_
grid_size (_type_) – _description_
- Returns
_description_
- Return type
_type_
- emdp.gridworld.helper_utilities.build_simple_grid(size=5, terminal_states: Optional[List] = None, p_success=1)[source]#
Builds a simple grid where an agent can move LEFT, RIGHT, UP or DOWN and actions succeed with probability p_success. A terminal state is added if len(terminal_states) > 0, in which case the returned matrix has size \((|S|+1)\times|A|\times(|S|+1)\). Moving into walls does nothing.
Examples
Builds a simple 5x5 grid world where there is a terminal state at (0, 4). The probability of successfully executing the action is 0.9. This function returns the transition matrix.
>>> grid = build_simple_grid(size=5, terminal_states=[(0, 4)], p_success=0.9)
>>> print(grid.shape)
(26, 4, 26)
- Parameters
size (int, optional) – size of the grid world. Defaults to 5. \(|S| = size \times size\)
terminal_states (list, optional) – the location of terminal states: a list of (x, y) tuples. Defaults to [].
p_success (float, optional) – the probability that an action will be successful. Defaults to 1.
- Raises
- Returns
the transition matrix of the given grid world. The shape is \(\left(|S|+1,|A|,|S|+1\right)\), or \(\left(|S|,|A|,|S|\right)\) if there is no terminal state.
- Return type
np.ndarray
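To make the shape and the role of p_success concrete, here is a self-contained sketch of a grid transition matrix without terminal states. The action ordering in `MOVES` and the choice to spread the failure probability evenly over the other three moves are assumptions made for illustration; the library's own encoding and failure model may differ.

```python
import numpy as np

# Hypothetical action encoding as (d_row, d_col): LEFT, RIGHT, UP, DOWN.
MOVES = [(0, -1), (0, 1), (-1, 0), (1, 0)]


def sketch_simple_grid(size=5, p_success=1.0):
    """Hypothetical stand-in: |S| x |A| x |S| transitions, no terminal state."""
    n_states, n_actions = size * size, len(MOVES)
    P = np.zeros((n_states, n_actions, n_states))
    for row in range(size):
        for col in range(size):
            s = row * size + col
            # Where each move lands; a move off the grid leaves the agent in place.
            landing = []
            for d_row, d_col in MOVES:
                r, c = row + d_row, col + d_col
                inside = 0 <= r < size and 0 <= c < size
                landing.append(r * size + c if inside else s)
            for a in range(n_actions):
                for other, s_next in enumerate(landing):
                    if other == a:
                        P[s, a, s_next] += p_success
                    else:
                        # Assumption: failures are split evenly over the other moves.
                        P[s, a, s_next] += (1.0 - p_success) / (n_actions - 1)
    return P


P = sketch_simple_grid(size=5, p_success=0.9)
```

With p_success=1 the sketch is deterministic, and every P[s, a, :] sums to one for any value of p_success between 0 and 1.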
emdp.gridworld.plotting module#
- class emdp.gridworld.plotting.GridWorldPlotter(grid_size, has_absorbing_state=True)[source]#
Bases: object
Utility to plot gridworlds.
- Parameters
grid_size (int) – size of the gridworld
has_absorbing_state (bool, optional) – boolean representing if the gridworld has an absorbing state
- static from_mdp(mdp: emdp.gridworld.env.GridWorldMDP)[source]#
- plot_environment(ax, wall_locs=None, plot_grid=False)[source]#
Plots the environment with walls.
- Parameters
ax – The axes to plot this on
wall_locs (List[Tuple[int,int]]) – Locations of the walls for plotting them in a different color. The locations are given as a list of (row, col) tuples.
plot_grid (bool) – Boolean to plot the grid.
- Returns
ax: The axes of the final plot.
imshow_ax: The final plot.
- Return type
Tuple
- plot_heatmap(ax, trajectories, dont_unflatten=False, wall_locs=None)[source]#
Plots a state-visitation heatmap with walls.
- Parameters
ax – The axes to plot this on.
trajectories – a list of trajectories. Each trajectory is a list of states (numpy arrays). These states should be obtained by using the mdp.step() operation. By default the states are automatically unflattened into (x,y) pairs; to prevent this conversion, use dont_unflatten.
dont_unflatten – will not automatically unflatten the trajectories into (x,y) pairs. (!) this assumes you have already unflattened them!
wall_locs – Locations of the walls for plotting them in a different color.
- Returns
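The quantity behind a state-visitation heatmap is a per-cell visit count. The sketch below computes such counts from trajectories that are already (row, col) pairs, which corresponds to the dont_unflatten case; `visitation_counts` is a hypothetical helper and no plotting is performed.

```python
import numpy as np


def visitation_counts(trajectories, grid_size):
    """Hypothetical helper: count how often each cell appears in the trajectories."""
    counts = np.zeros((grid_size, grid_size), dtype=int)
    for trajectory in trajectories:
        for row, col in trajectory:
            counts[row, col] += 1
    return counts


trajectories = [
    [(0, 0), (0, 1), (1, 1)],
    [(0, 0), (1, 0), (1, 1)],
]
counts = visitation_counts(trajectories, grid_size=2)
```

An array like this could be handed to an image-style plot on the given axes.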
- plot_trajectories(ax, trajectories, dont_unflatten=False, jitter_scale=1)[source]#
Plots individual trajectory paths with some jitter.
- Parameters
ax – The axes to plot this on
trajectories – a list of trajectories. Each trajectory is a list of states (numpy arrays). These states should be obtained by using the mdp.step() operation. By default the states are automatically unflattened into (x,y) pairs; to prevent this conversion, use dont_unflatten.
dont_unflatten – will not automatically unflatten the trajectories into (x,y) pairs. (!) this assumes you have already unflattened them!
- Returns
emdp.gridworld.txt_utilities module#
Utilities to help load gridworlds from a text file.
- emdp.gridworld.txt_utilities.build_gridworld_from_char_matrix(char_matrix, p_success=1, seed=2017, gamma=1, skip_checks=False, transition_matrix_builder_cls=<class 'emdp.gridworld.builder_tools.TransitionMatrixBuilder'>) → Tuple[emdp.gridworld.env.GridWorldMDP, List[Tuple[int, int]]][source]#
A parser to build a gridworld from a text file. Each grid has ONE start and goal location. A reward of +1 is positioned at the goal location.
Examples
>>> char_matrix = get_char_matrix(['#####', '#  g#', '#   #', '#s# #', '#####'])
>>> mdp, wall_locs = build_gridworld_from_char_matrix(char_matrix)
>>> mdp, wall_locs
(<emdp.gridworld.env.GridWorldMDP at 0x7fb4a67cb640>, [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (1, 0), (1, 4), (2, 0), (2, 4), (3, 0), (3, 2), (3, 4), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4)])
- Parameters
char_matrix – Matrix of characters.
p_success – Probability that the action is successful.
seed – The seed for the GridWorldMDP object.
skip_checks – Skips assertion checks.
transition_matrix_builder_cls – The transition matrix builder to use.
- Returns
MDP object, wall locations as a list of (row, col) tuples.
- Return type
Tuple[GridWorldMDP, List[Tuple[int,int]]]
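The wall locations in the example can be reproduced by a plain scan of the character matrix. The sketch below infers the character meanings from that example ('#' for a wall, 's' for the start, 'g' for the goal); `scan_char_matrix` is a hypothetical helper and does not build an MDP.

```python
def scan_char_matrix(char_matrix):
    """Hypothetical helper: collect wall, start and goal positions as (row, col)."""
    walls, start, goal = [], None, None
    for row, line in enumerate(char_matrix):
        for col, char in enumerate(line):
            if char == '#':
                walls.append((row, col))
            elif char == 's':
                start = (row, col)
            elif char == 'g':
                goal = (row, col)
    return walls, start, goal


lines = ['#####', '#  g#', '#   #', '#s# #', '#####']
char_matrix = [list(line) for line in lines]
walls, start, goal = scan_char_matrix(char_matrix)
```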
- emdp.gridworld.txt_utilities.get_char_matrix(raw_file)[source]#
Examples
>>> get_char_matrix(['#####', '#  g#', '#   #', '#s# #', '#####'])
[['#', '#', '#', '#', '#'], ['#', ' ', ' ', 'g', '#'], ['#', ' ', ' ', ' ', '#'], ['#', 's', '#', ' ', '#'], ['#', '#', '#', '#', '#']]
- Parameters
raw_file – Either an opened Python file object or a list of strings containing the lines.
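Because both a file object and a list of strings can be iterated line by line, one function can serve both input types. A minimal sketch under that assumption; `sketch_get_char_matrix` is hypothetical, and the library's handling of newlines or trailing whitespace may differ.

```python
import io


def sketch_get_char_matrix(raw_file):
    """Hypothetical stand-in: split each line into a list of characters."""
    # Lines read from a file keep their trailing newline, so strip it.
    return [list(line.rstrip('\n')) for line in raw_file]


lines = ['#####', '#  g#', '#   #', '#s# #', '#####']
from_list = sketch_get_char_matrix(lines)

# An in-memory text stream stands in for an opened file here.
from_file = sketch_get_char_matrix(io.StringIO('\n'.join(lines) + '\n'))
```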