emdp.gridworld package#

Submodules#

emdp.gridworld.builder_tools module#

Utilities to help build more complex grid worlds.

class emdp.gridworld.builder_tools.TransitionMatrixBuilder(grid_size, action_space=4, has_terminal_state=True)[source]#

Bases: object

Builder object to build a transition matrix for a grid world

property P#

Returns a new array with the transition matrix built so far.

Parameters

nocopy (bool, optional) – Defaults to False.

Returns

the transition model matrix

Return type

np.array

add_grid(terminal_states: Optional[List[int]] = None, p_success: float = 1)[source]#

Adds a grid so that you can't walk off the edges of the grid.

Parameters
  • terminal_states (List[int], optional) – Terminal states. Defaults to [].

  • p_success (float, optional) – Defaults to 1.

Raises

ValueError

add_wall_at(tuple_location)[source]#

Adds a blockade at this position.

Parameters

tuple_location – (x,y) location of the wall

add_wall_between(start: Tuple[int, int], end: Tuple[int, int])[source]#

Adds a wall between the starting and ending location

Parameters
  • start (Tuple[int,int]) – tuple (x,y) representing the starting position of the wall

  • end (Tuple[int,int]) – tuple (x,y) representing the ending position of the wall

Raises

ValueError
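The reference above does not say which cells a wall covers or when ValueError is raised. The following is a minimal sketch, not the library's implementation, assuming that walls are axis-aligned and include both endpoints; `cells_between` is a hypothetical helper name.

```python
from typing import List, Tuple


def cells_between(start: Tuple[int, int], end: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Enumerate the cells on a straight wall from start to end, inclusive."""
    (x0, y0), (x1, y1) = start, end
    if x0 != x1 and y0 != y1:
        # Assumption: only horizontal or vertical walls are supported.
        raise ValueError("wall must be horizontal or vertical")
    if x0 == x1:
        lo, hi = sorted((y0, y1))
        return [(x0, y) for y in range(lo, hi + 1)]
    lo, hi = sorted((x0, x1))
    return [(x, y0) for x in range(lo, hi + 1)]


wall = cells_between((1, 0), (1, 3))
print(wall)  # [(1, 0), (1, 1), (1, 2), (1, 3)]
```

Each returned cell could then be passed to add_wall_at, if the builder is used one cell at a time.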

emdp.gridworld.builder_tools.build_simple_grid_world_with_terminal_states(reward_spec, size, p_success=1, gamma=0.99, seed=2017, start_state=0)[source]#

A simple size x size grid world where each of the agent's actions executes correctly with probability p_success. Rewards are given by a dict whose keys are (x,y) positions and whose values are the magnitudes of the rewards. Upon reaching a state with a reward, every action gives a reward. The episode then goes to an absorbing state and terminates.

Parameters
  • reward_spec – Reward specification

  • size – Size of the gridworld (grid world will be size x size)

  • p_success – The probability the action is successful.

  • gamma – The discount factor.

  • seed – Seed for the GridWorldMDP object.

  • start_state – The index of the starting state.

Returns

emdp.gridworld.builder_tools.build_simple_grid_world_without_terminal_states(reward_spec, size, p_success=1, gamma=0.99, seed=2017, start_state=0)[source]#

A simple size x size grid world where each of the agent's actions executes correctly with probability p_success. Rewards are given by a dict whose keys are (x,y) positions and whose values are the magnitudes of the rewards. Upon reaching a state with a reward, every action gives a reward. The episode does not terminate.

Parameters
  • reward_spec – Reward specification

  • size – Size of the gridworld (grid world will be size x size)

  • p_success – The probability the action is successful.

  • gamma – The discount factor.

  • seed – Seed for the GridWorldMDP object.

  • start_state – The index of the starting state.

Returns

emdp.gridworld.builder_tools.create_reward_matrix(state_space, size, reward_spec: Dict[Tuple[int, int], float], action_space=4)[source]#

Abstraction to create reward matrices.

Parameters
  • state_space (int) – Size of the state space, \(|\mathcal{S}|\).

  • size (int) – size of the grid world (width or height).

  • reward_spec (Dict[Tuple[int,int], float]) – the reward specification.

  • action_space (int) – the size of the action space

Returns

the reward matrix.

Return type

np.ndarray
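As a rough illustration of how a reward specification might expand into an \(|\mathcal{S}| \times |\mathcal{A}|\) table, here is a pure-Python sketch. It is not the library's code: `sketch_reward_matrix` is a hypothetical name, nested lists stand in for np.ndarray, and the row-major mapping from position to state index is an assumption taken from the flatten_state example in this package.

```python
from typing import Dict, List, Tuple


def sketch_reward_matrix(state_space: int, size: int,
                         reward_spec: Dict[Tuple[int, int], float],
                         action_space: int = 4) -> List[List[float]]:
    """Build a |S| x |A| reward table as nested lists."""
    R = [[0.0] * action_space for _ in range(state_space)]
    for (row, col), value in reward_spec.items():
        state = row * size + col  # assumed row-major state index
        # Every action taken in a rewarding state yields the reward.
        R[state] = [float(value)] * action_space
    return R


# 5x5 grid plus one absorbing state, with a reward of 1 at position (0, 4).
R = sketch_reward_matrix(state_space=26, size=5, reward_spec={(0, 4): 1.0})
print(R[4])  # [1.0, 1.0, 1.0, 1.0]
```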

emdp.gridworld.env module#

A simple grid world environment

class emdp.gridworld.env.GridWorldMDP(P, R, gamma, p0, terminal_states: List[Tuple[int, int]], size: int, seed=1337, skip_check=False, convert_terminal_states_to_ints=False)[source]#

Bases: emdp.common.MDP

Note

If terminal_states is not empty, there will be an absorbing state, so the actual number of states will be \(size^2 + 1\). If there is a terminal state, it should be the last one.

Parameters
  • P (np.ndarray) – state transition matrix \(P: \mathcal{S}\times\mathcal{A}\times\mathcal{S}\mapsto\mathbb{R}\), the shape is \(|S| \times |A| \times |S|\).

  • R (np.ndarray) – reward matrix \(r: \mathcal{S}\times \mathcal{A}\mapsto \mathbb{R}\), the shape is \(|S| \times |A|\).

  • gamma (float) – discount factor \(\gamma\)

  • p0 (np.ndarray) – initial starting distribution \(p_0\). The array shape is \(|\mathcal{S}|=size\times size\).

  • terminal_states (List[Tuple[int,int]]) – Must be a list of (x,y) tuples. Use skip_terminal_state_conversion if passing ints.

  • size (int) – the size of the grid world (i.e., there are \(size \times size + 1 = |\mathcal{S}|\) states in total).

  • seed (int, optional) – the random seed for simulations. Defaults to 1337.

  • skip_check (bool, optional) – _description_. Defaults to False.

  • convert_terminal_states_to_ints (bool, optional) – _description_. Defaults to False.
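The array shapes implied by the note and parameter descriptions can be worked out with a small helper. This is only a sketch: `expected_shapes` is a hypothetical function, not part of emdp, and it assumes the action space has four actions by default.

```python
from typing import Tuple


def expected_shapes(size: int, n_actions: int = 4,
                    has_terminal_state: bool = True) -> Tuple[tuple, tuple]:
    """Return the shapes of P and R for a size x size grid world."""
    # One extra absorbing state is appended when terminal states exist.
    n_states = size * size + (1 if has_terminal_state else 0)
    return (n_states, n_actions, n_states), (n_states, n_actions)


p_shape, r_shape = expected_shapes(size=5)
print(p_shape, r_shape)  # (26, 4, 26) (26, 4)
```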

flatten_state(state)[source]#

Flatten state (row, col) into a one-hot vector.

see also: emdp.gridworld.helper_utilities.flatten_state()

Parameters

state (Tuple[int,int]) – (row, col) pair

Returns

one-hot vector of shape (size * size)

Return type

np.ndarray

reset()[source]#

set_current_state_to(tuple_state)[source]#

step(action)[source]#

Parameters

action – An integer representing the action taken.

Returns

unflatten_state(onehot) Tuple[int, int][source]#

Unflatten a one-hot state vector into a (row, col) pair

see also: emdp.gridworld.helper_utilities.unflatten_state()

Parameters

onehot (np.ndarray) – one-hot vector of shape (size * size)

Returns

(row, col) pair

Return type

Tuple[int,int]

emdp.gridworld.helper_utilities module#

emdp.gridworld.helper_utilities.is_P_valid_stochastic(P: numpy.ndarray) bool[source]#

Returns True if the transition model P is a valid stochastic transition model. \(P\) is a valid stochastic transition model if, for every state \(s\) and action \(a\),

\[\sum_{s'\in\mathcal{S}} Pr(s'|s,a) = 1\]

Parameters

P (np.ndarray) – transition model.

Return type

bool
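The condition above can be checked directly by summing over next states for every state-action pair. The sketch below uses nested lists and the standard library so that it stands alone; `rows_sum_to_one` and its tolerance are hypothetical, and the library function itself takes an np.ndarray.

```python
import math
from typing import Sequence


def rows_sum_to_one(P: Sequence[Sequence[Sequence[float]]], tol: float = 1e-8) -> bool:
    """True if, for every state s and action a, the sum over s' of P[s][a][s'] is 1."""
    return all(
        math.isclose(sum(next_probs), 1.0, abs_tol=tol)
        for per_action in P
        for next_probs in per_action
    )


# Two states, one action: state 0 moves to state 1 with probability 0.9.
valid = [[[0.1, 0.9]], [[0.0, 1.0]]]
invalid = [[[0.5, 0.4]], [[0.0, 1.0]]]  # first row sums to 0.9
print(rows_sum_to_one(valid), rows_sum_to_one(invalid))  # True False
```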

emdp.gridworld.helper_utilities.flatten_state(state, size, state_space)[source]#

Flatten state as (row, col) pair into a one-hot vector.

Example

>>> flatten_state((1,2), 3, 9)
array([0, 0, 0, 0, 0, 1, 0, 0, 0], dtype=int32)
Parameters
  • state (Tuple[int, int]) – (row, col) pair

  • size (int) – width (number of columns) of the grid world.

  • state_space (int) – size of the state space, i.e. \(|\mathcal{S}|\).

Returns

one-hot representation of the state.

Return type

np.ndarray
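The example above is consistent with a row-major layout, where the cell (row, col) maps to index row * size + col. A minimal sketch under that assumption, using a plain list in place of np.ndarray and the hypothetical name `one_hot_state`:

```python
from typing import List, Tuple


def one_hot_state(state: Tuple[int, int], size: int, state_space: int) -> List[int]:
    """One-hot encode a (row, col) pair, assuming row-major indexing."""
    row, col = state
    index = row * size + col  # position of the cell in the flat vector
    vec = [0] * state_space
    vec[index] = 1
    return vec


vec = one_hot_state((1, 2), 3, 9)
print(vec)  # [0, 0, 0, 0, 0, 1, 0, 0, 0]
```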

emdp.gridworld.helper_utilities.unflatten_state(onehot: numpy.ndarray, size, has_absorbing_state: bool) Tuple[int, int][source]#

Unflatten a one-hot vector into a (row, col) pair.

Examples

>>> unflatten_state(np.array([0,0,0,1]), 2, False)
(1, 1)
>>> unflatten_state(np.array([0, 0, 0, 0, 0, 1, 0, 0, 0]), 3, False)
(1, 2)
Parameters
  • onehot (np.ndarray) – one hot representation of a state

  • size (int) – size of the grid world

  • has_absorbing_state (bool) – whether the grid world has an absorbing state

Returns

(row, col) pair

Return type

Tuple[int,int]
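Both examples above agree with inverting a row-major index using integer division and remainder. The sketch below assumes that layout and ignores the absorbing state, whose handling is not described here; `grid_position` is a hypothetical name.

```python
from typing import Sequence, Tuple


def grid_position(onehot: Sequence[int], size: int) -> Tuple[int, int]:
    """Recover (row, col) from a one-hot vector, assuming row-major indexing."""
    index = list(onehot).index(1)  # position of the single 1 entry
    return divmod(index, size)     # (index // size, index % size)


print(grid_position([0, 0, 0, 1], 2))                 # (1, 1)
print(grid_position([0, 0, 0, 0, 0, 1, 0, 0, 0], 3))  # (1, 2)
```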

emdp.gridworld.helper_utilities.get_state_after_executing_action(action, state, grid_size)[source]#

Gets the state after executing an action

Parameters
  • action

  • state

  • grid_size

Returns

emdp.gridworld.helper_utilities.check_can_take_action(action, state, grid_size)[source]#

Checks if you can take an action in a state.

Parameters
  • action

  • state

  • grid_size

emdp.gridworld.helper_utilities.get_possible_actions(state, grid_size)[source]#

Gets all possible actions at a given state.

Parameters
  • state (_type_) – _description_

  • grid_size (_type_) – _description_

Returns

_description_

Return type

_type_

emdp.gridworld.helper_utilities.build_simple_grid(size=5, terminal_states: Optional[List] = None, p_success=1)[source]#

Builds a simple grid where an agent can move LEFT, RIGHT, UP or DOWN and actions succeed with probability p_success. A terminal state is added if len(terminal_states) > 0, in which case the function returns a matrix of size \((|S|+1)\times|A|\times(|S|+1)\).

Moving into walls does nothing.

Examples

Builds a simple 5x5 grid world where there is a terminal state at (0, 4). The probability of successfully executing the action is 0.9. This function returns the transition matrix.

>>> grid = build_simple_grid(size=5, terminal_states=[(0, 4)], p_success=0.9)
>>> print(grid.shape)
(26, 4, 26)
Parameters
  • size (int, optional) – size of the grid world. Defaults to 5. \(|S| = size \times size\)

  • terminal_states (list, optional) – the location of terminal states: a list of (x, y) tuples. Defaults to [].

  • p_success (float, optional) – the probability that an action will be successful. Defaults to 1.

Raises

InvalidActionError

Returns

the transition matrix of the given grid world. The shape is \(\left(|S|+1,|A|,|S|+1\right)\), or \(\left(|S|,|A|,|S|\right)\) if there is no terminal state.

Return type

np.ndarray
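To make the shape and the role of the absorbing state concrete, here is a standalone sketch of one way such a transition tensor could be assembled. It is not the library's implementation. The action ordering, the treatment of terminal locations as (row, col) pairs, and the rule that a failed action is replaced uniformly by one of the other three actions are all assumptions, and `sketch_simple_grid` is a hypothetical name.

```python
from typing import List, Optional, Tuple

# Assumed action encoding as (d_row, d_col): LEFT, RIGHT, UP, DOWN.
MOVES = [(0, -1), (0, 1), (-1, 0), (1, 0)]


def sketch_simple_grid(size: int = 5,
                       terminal_states: Optional[List[Tuple[int, int]]] = None,
                       p_success: float = 1.0) -> List[List[List[float]]]:
    terminal = set(terminal_states or [])
    n_cells = size * size
    n_states = n_cells + (1 if terminal else 0)  # extra absorbing state
    n_actions = len(MOVES)
    P = [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]

    def move(row: int, col: int, action: int) -> int:
        d_row, d_col = MOVES[action]
        new_row, new_col = row + d_row, col + d_col
        if 0 <= new_row < size and 0 <= new_col < size:
            return new_row * size + new_col
        return row * size + col  # moving into the boundary does nothing

    for row in range(size):
        for col in range(size):
            s = row * size + col
            for a in range(n_actions):
                if (row, col) in terminal:
                    P[s][a][n_cells] = 1.0  # terminal cells lead to the absorbing state
                    continue
                P[s][a][move(row, col, a)] += p_success
                for other in range(n_actions):
                    if other != a:
                        # Assumption: failure mass is split evenly over the other actions.
                        P[s][a][move(row, col, other)] += (1.0 - p_success) / (n_actions - 1)
    if terminal:
        for a in range(n_actions):
            P[n_cells][a][n_cells] = 1.0  # the absorbing state loops to itself
    return P


P = sketch_simple_grid(size=5, terminal_states=[(0, 4)], p_success=0.9)
shape = (len(P), len(P[0]), len(P[0][0]))
print(shape)  # (26, 4, 26)
```

Every row of the resulting tensor sums to one, which is the condition is_P_valid_stochastic checks.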

emdp.gridworld.helper_utilities.add_walls()[source]#

emdp.gridworld.plotting module#

class emdp.gridworld.plotting.GridWorldPlotter(grid_size, has_absorbing_state=True)[source]#

Bases: object

Utility to plot gridworlds

Parameters
  • grid_size (int) – size of the gridworld

  • has_absorbing_state (bool, optional) – boolean representing if the gridworld has an absorbing state

static from_mdp(mdp: emdp.gridworld.env.GridWorldMDP)[source]#

plot_environment(ax, wall_locs=None, plot_grid=False)[source]#

Plots the environment with walls.

Parameters
  • ax – The axes to plot this on

  • wall_locs (List[Tuple[int,int]]) – Locations of the walls for plotting them in a different color. The locations are given as a list of (row, col) tuples.

  • plot_grid (bool) – Boolean to plot the grid.

Returns

ax: The axes of the final plot.

imshow_ax: The final plot.

Return type

Tuple

plot_grid(ax)[source]#

Plots the skeleton of the grid world

Parameters

ax

Returns

plot_heatmap(ax, trajectories, dont_unflatten=False, wall_locs=None)[source]#

Plots a state-visitation heatmap with walls.
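A state-visitation heatmap rests on counting how often each cell appears across trajectories. The sketch below shows only that counting step, assuming the trajectories are already unflattened into (row, col) pairs, as with dont_unflatten. Whether the plotter normalizes or otherwise transforms the counts is not stated here, and `visitation_counts` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def visitation_counts(trajectories: Iterable[Iterable[Tuple[int, int]]],
                      grid_size: int) -> List[List[int]]:
    """Count visits to each cell of a grid_size x grid_size grid."""
    counts = [[0] * grid_size for _ in range(grid_size)]
    for trajectory in trajectories:
        for row, col in trajectory:
            counts[row][col] += 1
    return counts


counts = visitation_counts(
    [[(0, 0), (0, 1), (1, 1)], [(0, 0), (1, 0), (1, 1)]], grid_size=2)
print(counts)  # [[2, 1], [1, 2]]
```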

Parameters
  • ax – The axes to plot this on.

  • trajectories – a list of trajectories. Each trajectory is a list of states (numpy arrays). These states should be obtained by using the mdp.step() operation and are automatically unflattened into (x,y) pairs. To prevent this automatic conversion, use dont_unflatten.

  • dont_unflatten – if set, the trajectories are not automatically unflattened into (x,y) pairs. This assumes you have already unflattened them.

  • wall_locs – Locations of the walls for plotting them in a different color.

Returns

plot_trajectories(ax, trajectories, dont_unflatten=False, jitter_scale=1)[source]#

Plots individual trajectory paths with some jitter.

Parameters
  • ax – The axes to plot this on

  • trajectories – a list of trajectories. Each trajectory is a list of states (numpy arrays). These states should be obtained by using the mdp.step() operation and are automatically unflattened into (x,y) pairs. To prevent this automatic conversion, use dont_unflatten.

  • dont_unflatten – if set, the trajectories are not automatically unflattened into (x,y) pairs. This assumes you have already unflattened them.

Returns

unflat_trajectories(trajectories)[source]#

Returns a generator where the trajectories have been unflattened.

Parameters

trajectories

Returns

emdp.gridworld.txt_utilities module#

Utilities to help load gridworlds from a text file.

emdp.gridworld.txt_utilities.build_gridworld_from_char_matrix(char_matrix, p_success=1, seed=2017, gamma=1, skip_checks=False, transition_matrix_builder_cls=<class 'emdp.gridworld.builder_tools.TransitionMatrixBuilder'>) Tuple[emdp.gridworld.env.GridWorldMDP, List[Tuple[int, int]]][source]#

A parser to build a gridworld from a text file. Each grid has exactly one start location and one goal location. A reward of +1 is positioned at the goal location.

Examples

>>> char_matrix = get_char_matrix(['#####',
...                                '#  g#',
...                                '#   #',
...                                '#s# #',
...                                '#####'])
>>> mdp, wall_locs = build_gridworld_from_char_matrix(char_matrix)
>>> mdp, wall_locs
(<emdp.gridworld.env.GridWorldMDP at 0x7fb4a67cb640>,
[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4),
 (1, 0), (1, 4), (2, 0), (2, 4), (3, 0),
 (3, 2), (3, 4), (4, 0), (4, 1), (4, 2),
 (4, 3), (4, 4)])
Parameters
  • char_matrix – Matrix of characters.

  • p_success – Probability that the action is successful.

  • seed – The seed for the GridWorldMDP object.

  • gamma – The discount factor.

  • skip_checks – Skips assertion checks.

  • transition_matrix_builder_cls – The transition matrix builder to use.

Returns

MDP object, wall locations as a list of (row, col) tuples.

Return type

Tuple[GridWorldMDP, List[Tuple[int,int]]]
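The wall locations in the example above match the positions of '#' characters read row by row. The sketch below reproduces that scan in plain Python. It assumes 's' marks the start and 'g' the goal, which the example suggests but this reference does not state outright, and `locate` is a hypothetical helper rather than part of emdp.

```python
from typing import List, Tuple


def locate(char_matrix: List[List[str]], symbol: str) -> List[Tuple[int, int]]:
    """Return the (row, col) positions of every cell holding the given symbol."""
    return [(r, c)
            for r, row in enumerate(char_matrix)
            for c, ch in enumerate(row)
            if ch == symbol]


char_matrix = [list(line) for line in ['#####', '#  g#', '#   #', '#s# #', '#####']]
wall_locs = locate(char_matrix, '#')
start = locate(char_matrix, 's')
goal = locate(char_matrix, 'g')
print(start, goal)  # [(3, 1)] [(1, 3)]
```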

emdp.gridworld.txt_utilities.get_char_matrix(raw_file)[source]#

Examples

>>> get_char_matrix(['#####',
...                  '#  g#',
...                  '#   #',
...                  '#s# #',
...                  '#####'])
[['#', '#', '#', '#', '#'],
['#', ' ', ' ', 'g', '#'],
['#', ' ', ' ', ' ', '#'],
['#', 's', '#', ' ', '#'],
['#', '#', '#', '#', '#']]
Parameters

raw_file – Either an opened Python file object or a list of strings containing the lines.
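Since the input may be either a file object or a list of strings, one plausible approach is to iterate over it line by line and split each line into characters. The following is a sketch of that idea only, not the library's code. `to_char_matrix` is a hypothetical name, and stripping trailing newlines is an assumption about how file input is normalized.

```python
import io
from typing import Iterable, List


def to_char_matrix(lines: Iterable[str]) -> List[List[str]]:
    """Split each line into characters, dropping trailing newlines."""
    return [list(line.rstrip('\n')) for line in lines]


from_list = to_char_matrix(['#g#', '#s#'])
from_file = to_char_matrix(io.StringIO('#g#\n#s#\n'))  # behaves like an open text file
print(from_list == from_file)  # True
```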

Module contents#