createGridWorld
Create a two-dimensional grid world for reinforcement learning
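Description
GW = createGridWorld(m,n) creates a grid world GW of size m-by-n with the default actions ['N';'S';'E';'W'].
GW = createGridWorld(m,n,moves) creates a grid world GW of size m-by-n with the actions specified by moves.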
Examples
Create Grid World Environment
For this example, consider a 5-by-5 grid world with the following rules:
A 5-by-5 grid world bounded by borders, with 4 possible actions (North = 1, South = 2, East = 3, West = 4).
The agent begins from cell [2,1] (second row, first column).
The agent receives reward +10 if it reaches the terminal state at cell [5,5] (blue).
The environment contains a special jump from cell [2,4] to cell [4,4] with +5 reward.
The agent is blocked by obstacles in cells [3,3], [3,4], [3,5] and [4,3] (black cells).
All other actions result in -1 reward.
First, create a GridWorld object using the createGridWorld function.
GW = createGridWorld(5,5)
GW = 
  GridWorld with properties:

                GridSize: [5 5]
            CurrentState: "[1,1]"
                  States: [25x1 string]
                 Actions: [4x1 string]
                       T: [25x25x4 double]
                       R: [25x25x4 double]
          ObstacleStates: [0x1 string]
          TerminalStates: [0x1 string]
    ProbabilityTolerance: 8.8818e-16
Now, set the initial, terminal and obstacle states.
GW.CurrentState = '[2,1]';
GW.TerminalStates = '[5,5]';
GW.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];
Update the state transition matrix for the obstacle states and set the jump rule over the obstacle states.
updateStateTranstionForObstacles(GW)
GW.T(state2idx(GW,"[2,4]"),:,:) = 0;
GW.T(state2idx(GW,"[2,4]"),state2idx(GW,"[4,4]"),:) = 1;
Next, define the rewards in the reward transition matrix.
nS = numel(GW.States);
nA = numel(GW.Actions);
GW.R = -1*ones(nS,nS,nA);
GW.R(state2idx(GW,"[2,4]"),state2idx(GW,"[4,4]"),:) = 5;
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;
Now, use rlMDPEnv to create a grid world environment using the GridWorld object GW.
env = rlMDPEnv(GW)
env = 
  rlMDPEnv with properties:

       Model: [1x1 rl.env.GridWorld]
    ResetFcn: []
You can visualize the grid world environment using the plot function.
plot(env)
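Once the environment exists, its observation and action specifications can be inspected before an agent is built. The following is a minimal sketch, assuming Reinforcement Learning Toolbox is installed; the variable names obsInfo and actInfo are illustrative, and the code assumes that both specifications of a grid world environment are finite sets exposing an Elements property.

```matlab
% Sketch only: requires Reinforcement Learning Toolbox.
GW = createGridWorld(5,5);
env = rlMDPEnv(GW);

% Query the specifications that an agent would be built from.
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Assumption: both specifications are finite sets with an Elements property.
fprintf('Observations: %d, actions: %d\n', ...
    numel(obsInfo.Elements), numel(actInfo.Elements));
```

If the assumption holds, the two counts should match numel(GW.States) and numel(GW.Actions) respectively.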
Input Arguments
m
— Number of rows of the grid world
scalar
Number of rows of the grid world, specified as a scalar.
n
— Number of columns of the grid world
scalar
Number of columns of the grid world, specified as a scalar.
moves
— Action names
'Standard' (default) | 'Kings'
Action names, specified as either 'Standard' or 'Kings'. When moves is set to:
- 'Standard', the actions are ['N';'S';'E';'W'].
- 'Kings', the actions are ['N';'S';'E';'W';'NE';'NW';'SE';'SW'].
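The effect of the moves argument can be seen by creating the same grid with both settings and comparing the resulting properties. This is a small sketch, assuming Reinforcement Learning Toolbox is installed; the 3-by-4 grid size and the names GWs and GWk are arbitrary choices for illustration.

```matlab
% Sketch only: requires Reinforcement Learning Toolbox.
GWs = createGridWorld(3,4);           % moves defaults to 'Standard'
GWk = createGridWorld(3,4,'Kings');   % adds the four diagonal moves

% Show the action names for each setting.
disp(GWs.Actions')
disp(GWk.Actions')

% The third dimension of T and R follows the number of actions.
disp(size(GWs.T))
disp(size(GWk.T))
```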
Output Arguments
GW
— Two-dimensional grid world
GridWorld object
Two-dimensional grid world, returned as a GridWorld object with the properties listed below. For more information, see Create Custom Grid World Environments.
GridSize
— Size of the grid world
[m,n] vector
Size of the grid world, specified as an [m,n] vector.
CurrentState
— Name of the current state
string
Name of the current state, specified as a string.
Actions
— Action names
string vector
Action names, specified as a string vector. The length of the Actions vector is determined by the moves argument. Actions is a string vector of length:
- Four, if moves is specified as 'Standard'.
- Eight, if moves is specified as 'Kings'.
T
— State transition matrix
3-D array
State transition matrix, specified as a 3-D array, which determines the possible movements of the agent in an environment. The state transition matrix T is a probability matrix that indicates how likely the agent is to move from the current state s to any possible next state s' by performing action a.
T is given by
$$T(s,s',a) = \Pr(s' \mid s,a)$$
T is:
- A K-by-K-by-4 array, if moves is specified as 'Standard'. Here, K = m*n.
- A K-by-K-by-8 array, if moves is specified as 'Kings'.
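Because each slice T(:,:,a) holds the probabilities of moving from a current state (row) to a next state (column) under action a, every row of every slice would be expected to sum to 1. The sketch below checks this on a freshly created grid world and lists the reachable next states for one state-action pair. It assumes Reinforcement Learning Toolbox is installed and that the default T of a new grid world is a valid probability matrix; the 3-by-3 size and the variable names are illustrative.

```matlab
% Sketch only: requires Reinforcement Learning Toolbox.
GW = createGridWorld(3,3);
nA = numel(GW.Actions);

% Assumption: in a fresh grid world every row of T(:,:,a) sums to 1.
for a = 1:nA
    rowSums = sum(GW.T(:,:,a), 2);
    fprintf('Action %s: max |row sum - 1| = %g\n', ...
        GW.Actions(a), max(abs(rowSums - 1)));
end

% List the next states with nonzero probability from cell [2,2] under action N.
s  = state2idx(GW,"[2,2]");
aN = find(GW.Actions == "N");
nextStates = GW.States(GW.T(s,:,aN) > 0);
disp(nextStates)
```

Running the same check after editing T by hand, as in the jump rule of the example, is a quick way to catch rows that no longer sum to 1.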
R
— Reward transition matrix
3-D array
Reward transition matrix, specified as a 3-D array, which determines how much reward the agent receives after performing an action in the environment. R has the same shape and size as the state transition matrix T. The reward transition matrix R is given by
$$r = R(s,s',a)$$
where r is the reward for moving from state s to state s' by performing action a.
R is:
- A K-by-K-by-4 array, if moves is specified as 'Standard'. Here, K = m*n.
- A K-by-K-by-8 array, if moves is specified as 'Kings'.
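Since R and T share the same K-by-K-by-(number of actions) layout, the two can be combined element-wise. As an illustration, the expected immediate reward for taking action a in state s is the sum over next states s' of T(s,s',a)*R(s,s',a). The sketch below computes this for a small grid world; it assumes Reinforcement Learning Toolbox is installed, and the 3-by-3 size, the reward values, and the name expR are hypothetical choices rather than toolbox defaults.

```matlab
% Sketch only: requires Reinforcement Learning Toolbox.
GW = createGridWorld(3,3);
GW.TerminalStates = "[3,3]";

nS = numel(GW.States);
nA = numel(GW.Actions);

% Illustrative rewards: -1 per move, +10 for entering the terminal state.
GW.R = -1*ones(nS,nS,nA);
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;

% Expected immediate reward: sum over next states (dimension 2) of T.*R.
expR = squeeze(sum(GW.T .* GW.R, 2));   % nS-by-nA matrix

disp(array2table(expR, 'RowNames', cellstr(GW.States), ...
    'VariableNames', cellstr(GW.Actions)))
```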
ObstacleStates
— State names that cannot be reached in the grid world
string vector
State names that cannot be reached in the grid world, specified as a string vector.
TerminalStates
— Terminal state names in the grid world
string vector
Terminal state names in the grid world, specified as a string vector.
Version History
Introduced in R2019a