update
Update the state of an optimizer object and a set of learnable parameters using the gradient value
Since R2022a
Syntax
[newFcnAppx,newOptimizer] = update(optimizer,fcnAppx,grad)
[newPars,newOptimizer] = update(optimizer,params,grad)
Description
[newFcnAppx,newOptimizer] = update(optimizer,fcnAppx,grad) updates the internal state of optimizer and the learnable parameters of fcnAppx according to the gradient value grad. It returns the updated approximator newFcnAppx and the updated optimizer newOptimizer.
[newPars,newOptimizer] = update(optimizer,params,grad) updates the internal state of optimizer and the parameter set params according to the gradient value grad. It returns the updated parameter set newPars and the updated optimizer newOptimizer.
Examples
Update Function Approximator and Optimizer State Using Gradient Value
For this example, create a value function critic and update its parameters.
First, create a finite-set observation specification for a scalar that can have four different values.
obsInfo = rlFiniteSetSpec(1:4);
Create a table object. Table values are initialized to zero by default.
table = rlTable(obsInfo);
Assign a different value to each table entry.
table.Table = [1 -1 -10 100]';
Create the critic.
critic = rlValueFunction(table,obsInfo);
Create an optimizer object.
opt = rlOptimizer(rlOptimizerOptions(Algorithm="sgdm",LearnRate=0.2))
opt = 
  rlSGDMOptimizer with properties:

                   Momentum: 0.9000
                  LearnRate: 0.2000
     L2RegularizationFactor: 1.0000e-04
          GradientThreshold: Inf
    GradientThresholdMethod: "l2norm"
For this example, assume a gradient value for the set of parameters equal to {dlarray([0.1 0.2 0.3 0.4]')}.
Update the parameter set, and display the updated optimizer and parameter set.
[newCritic,newOpt] = update(opt,critic,{dlarray([0.1 0.2 0.3 0.4]')})
newCritic = 
  rlValueFunction with properties:

    ObservationInfo: [1x1 rl.util.rlFiniteSetSpec]
      Normalization: "none"
          UseDevice: "cpu"
         Learnables: {[4x1 dlarray]}
              State: {}
newOpt = 
  rlSGDMOptimizer with properties:

                   Momentum: 0.9000
                  LearnRate: 0.2000
     L2RegularizationFactor: 1.0000e-04
          GradientThreshold: Inf
    GradientThresholdMethod: "l2norm"
Display the learnable parameters of the updated critic.
newCritic.Learnables{1}
ans = 
  4x1 dlarray

    0.9800
   -1.0400
  -10.0598
   99.9180
You can then continue to update both the critic and the optimizer object in the same way. For an example of how to use update in a custom training loop, see Train Reinforcement Learning Policy Using Custom Training Loop and Create and Train Custom PG Agent.
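As a variation, you could also obtain the gradient from the critic itself instead of supplying it manually. The following is a sketch only; it assumes the Reinforcement Learning Toolbox gradient function with the "output-parameters" syntax, and the observation value 2 is chosen arbitrarily for illustration.
% Gradient of the sum of the critic outputs with respect to its learnable
% parameters, evaluated at the observation value 2.
gradFromCritic = gradient(newCritic,"output-parameters",{2});
% Apply another update using the previously returned critic and optimizer.
[newCritic2,newOpt2] = update(newOpt,newCritic,gradFromCritic);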
Update Learnable Parameter Set and Optimizer State Using Gradient Value
Create a default optimizer object.
opt = rlOptimizer
opt = 
  rlADAMOptimizer with properties:

           GradientDecayFactor: 0.9000
    SquaredGradientDecayFactor: 0.9990
                       Epsilon: 1.0000e-08
                     LearnRate: 0.0100
        L2RegularizationFactor: 1.0000e-04
             GradientThreshold: Inf
       GradientThresholdMethod: "l2norm"
For this example, assume a parameter set given by the two-element cell array {1 -1} and a gradient value of {0.1 0.1}.
Update the parameter set, and display the updated optimizer and parameter set.
[pars,opt] = update(opt,{1 -1},{0.1 0.1})
pars=1×2 cell array
    {[0.9900]}    {[-1.0100]}

opt = 
  rlADAMOptimizer with properties:

           GradientDecayFactor: 0.9000
    SquaredGradientDecayFactor: 0.9990
                       Epsilon: 1.0000e-08
                     LearnRate: 0.0100
        L2RegularizationFactor: 1.0000e-04
             GradientThreshold: Inf
       GradientThresholdMethod: "l2norm"
You can then continue to update both the parameter set and the optimizer object in the same way. For an example of how to use update in a custom training loop, see Train Reinforcement Learning Policy Using Custom Training Loop and Create and Train Custom PG Agent.
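Because the optimizer object carries its internal state (such as the ADAM moment estimates) between calls, in a custom loop you pass the returned optimizer back into the next call to update. The following sketch (not part of the original example; the loss and its analytic gradient are chosen only for illustration) runs several gradient steps on the scalar loss (p - 3)^2 using the parameter-set form of update.
opt = rlOptimizer;                      % default ADAM optimizer
pars = {0};                             % initial parameter value
for k = 1:500
    grad = {2*(pars{1} - 3)};           % analytic gradient of (p - 3)^2
    [pars,opt] = update(opt,pars,grad); % reuse the returned optimizer state
end
pars{1}                                 % moves toward the minimizer, 3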
Input Arguments
optimizer — Optimizer object to update
rlADAMOptimizer object | rlSGDMOptimizer object | rlRMSPropOptimizer object
Optimizer object to update, specified as an rlADAMOptimizer, rlSGDMOptimizer, or rlRMSPropOptimizer object. The runEpisode function uses the update method of this object to update the learnable parameters of an actor or critic.
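For reference, a sketch of creating each optimizer type with rlOptimizer and rlOptimizerOptions follows; the "adam" and "rmsprop" algorithm names are assumptions inferred from the corresponding object names, while "sgdm" is the value used in the example above.
adamOpt = rlOptimizer(rlOptimizerOptions(Algorithm="adam"));     % rlADAMOptimizer
sgdmOpt = rlOptimizer(rlOptimizerOptions(Algorithm="sgdm"));     % rlSGDMOptimizer
rmspOpt = rlOptimizer(rlOptimizerOptions(Algorithm="rmsprop"));  % rlRMSPropOptimizer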
fcnAppx — Function approximator object to update
function approximator object
Function approximator object to update, specified as one of the following:
rlValueFunction object — Value function critic
rlQValueFunction object — Q-value function critic
rlVectorQValueFunction object — Multi-output Q-value function critic with a discrete action space
rlContinuousDeterministicActor object — Deterministic policy actor with a continuous action space
rlDiscreteCategoricalActor object — Stochastic policy actor with a discrete action space
rlContinuousGaussianActor object — Stochastic policy actor with a continuous action space
rlContinuousDeterministicTransitionFunction object — Continuous deterministic transition function for a model-based agent
rlContinuousGaussianTransitionFunction object — Continuous Gaussian transition function for a model-based agent
rlContinuousDeterministicRewardFunction object — Continuous deterministic reward function for a model-based agent
rlContinuousGaussianRewardFunction object — Continuous Gaussian reward function for a model-based agent
rlIsDoneFunction object — Is-done function for a model-based agent
params — Parameter set to update
cell array
Parameter set to update, specified as a cell array.
Example: {1 2 -1 -3}
grad — Value of the gradient
cell array
Value of the gradient, specified as a cell array with elements consistent in size and data type with the learnable parameters of fcnAppx or with params.
Specifically, each element of grad contains the gradient of a loss function with respect to a group of learnable parameters of fcnAppx.
The numerical array in each cell has dimensions D-by-LB-by-LS, where:
D corresponds to the dimensions of the input channel of fcnAppx.
LB is the batch size (length of a batch of independent inputs).
LS is the sequence length (length of the sequence of inputs along the time dimension) for a recurrent neural network. If fcnAppx does not use a recurrent neural network (which is the case for environment function approximators, as they do not support recurrent neural networks), then LS = 1.
The gradient is calculated using the whole history of LS inputs, and all the LB gradients with respect to the independent input sequences are added together in grad. Therefore, grad always has the same size as the output of getLearnableParameters.
For more information on input and output formats for recurrent neural networks, see the Algorithms section of lstmLayer.
Example: {0.2 -0.1 0 -0.01}
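As an illustration (a sketch, not taken from this page), one way to build a gradient cell array with the required sizes is to query getLearnableParameters and create a matching cell array.
pars = getLearnableParameters(fcnAppx);   % cell array of learnable parameters
% Build a gradient cell array whose elements match the parameters in size and type.
grad = cellfun(@(p) zeros(size(p),'like',p), pars, 'UniformOutput', false);
[newFcnAppx,newOptimizer] = update(optimizer,fcnAppx,grad);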
Output Arguments
newFcnAppx — Updated function approximator object
function approximator object
Updated function approximator object, returned as a function approximator object of the same type and configuration as fcnAppx.
newOptimizer — Updated optimizer object
rlADAMOptimizer object | rlSGDMOptimizer object | rlRMSPropOptimizer object
Updated optimizer object, returned as an rlADAMOptimizer, rlSGDMOptimizer, or rlRMSPropOptimizer object. The object implements an optimization algorithm.
newPars — Updated parameter set
cell array
Updated parameter set, returned as a cell array of the same dimension and type as params.
Version History
Introduced in R2022a
See Also
Functions
rlOptimizer | syncParameters | runEpisode | evaluate | dlfeval | dlaccelerate | getValue | getAction | getMaxQValue | setup | cleanup
Objects