emdp.chainworld package

Submodules

emdp.chainworld.env module

emdp.chainworld.env.build_chain_MDP(n_states=3, p_success=1, reward_spec=[(1, 1, 5)], starting_distribution=array([0, 0, 1]), terminal_states=[0], gamma=0.9, seed=1337, return_MDP=True)

A simple chain world with n_states states and 2 actions. Actions can fail with probability 1 - p_success.
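
To make the failure probability concrete, here is a minimal sketch of a transition tensor for such a chain. It is not the library's implementation: the function name `chain_transitions`, the action indices `LEFT = 0` and `RIGHT = 1`, and the rule that a failed action leaves the agent in place are all assumptions made for illustration. The package may define failure differently, for example as moving in the opposite direction.

```python
import numpy as np

LEFT, RIGHT = 0, 1  # hypothetical action indices, chosen for this sketch


def chain_transitions(n_states, p_success, terminal_states):
    """Return P with shape (n_states, 2, n_states); P[s, a, s2] = Pr(s2 | s, a)."""
    P = np.zeros((n_states, 2, n_states))
    for s in range(n_states):
        if s in terminal_states:
            P[s, :, s] = 1.0  # terminal states are absorbing
            continue
        left = max(s - 1, 0)              # the ends of the chain act as walls
        right = min(s + 1, n_states - 1)
        for action, target in ((LEFT, left), (RIGHT, right)):
            P[s, action, target] += p_success
            # Assumption: a failed action leaves the agent where it is.
            P[s, action, s] += 1.0 - p_success
    return P


P = chain_transitions(n_states=7, p_success=0.9, terminal_states=[0, 6])
# Every (state, action) pair must define a probability distribution.
assert np.allclose(P.sum(axis=2), 1.0)
```

With p_success=1 the chain becomes deterministic, which matches the default in the signature.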

Note

You probably want your terminal state to be separate from the state where the reward is obtained.

Example of how to use:

# A 7-state MDP where the agent starts in the middle.
# The two ends are absorbing states (given by terminal_states).
# If the agent reaches the state before a terminal state, it can get a reward:
# if the agent is at the left of the world and takes the action LEFT, it gets -1,
# otherwise it gets nothing;
# if the agent is at the right of the world and takes the action RIGHT, it gets +1,
# otherwise it gets nothing.
build_chain_MDP(n_states=7, p_success=0.9,
                reward_spec=[(5, RIGHT, +1), (1, LEFT, -1)],
                starting_distribution=np.array([0,0,0,1,0,0,0]),
                terminal_states=[0, 6], gamma=0.9)

Parameters
  • n_states – the number of states in the chain world.

  • p_success – the probability of successfully executing an action.

  • reward_spec – a list of tuples of the form (state, action, reward): the reward is obtained when the given action is taken in the given state.

  • starting_distribution – a distribution over starting states.

  • terminal_states – a list of integers representing the terminal states

  • gamma – the discount factor of the MDP.

  • seed – the seed for the random number generator.

  • return_MDP – if True, return an MDP object; otherwise, return the components needed to create one.
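
The reward_spec and starting_distribution arguments can be illustrated with a short sketch. It assumes, based on the default value and the usage example, that each reward_spec entry is a three-element tuple read as (state, action, reward). The helper `rewards_from_spec` and the action indices `LEFT = 0` and `RIGHT = 1` are hypothetical names introduced here, not part of the package.

```python
import numpy as np

LEFT, RIGHT = 0, 1  # hypothetical action indices, chosen for this sketch


def rewards_from_spec(n_states, reward_spec):
    """Build R with shape (n_states, 2) from (state, action, reward) triples."""
    R = np.zeros((n_states, 2))
    for state, action, reward in reward_spec:
        R[state, action] = reward
    return R


# Same reward specification as in the usage example.
R = rewards_from_spec(7, [(5, RIGHT, +1), (1, LEFT, -1)])

# A starting distribution has one non-negative entry per state and sums to 1.
start = np.array([0, 0, 0, 1, 0, 0, 0])
assert start.shape == (7,)
assert np.all(start >= 0) and np.isclose(start.sum(), 1.0)
```

Every (state, action) pair that is not listed in the specification keeps a reward of zero, which corresponds to the "otherwise it gets nothing" comments in the example.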

Returns

An MDP object if return_MDP is True; otherwise, the components needed to create one.

Module contents