Policies and Value Functions

Define policy and value function approximators, such as actors and critics

A reinforcement learning policy is a mapping from the current environment observation to a probability distribution over the actions to take. During training, the agent tunes the parameters of its policy approximator to maximize the long-term reward.

Reinforcement Learning Toolbox™ software provides approximator objects for actors and critics. The actor implements the policy that selects the action to take based on the current observation. The critic implements the value (or Q-value) function that estimates the value (the cumulative long-term reward) of the current policy. Depending on your application and selected agent, you can define policy and value function approximators using different approximation models, such as deep neural networks, linear basis functions, or look-up tables. For more information, see Create Policies and Value Functions.
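
As a minimal sketch, the following example creates a value function critic from a small deep neural network and queries it for the value of a sample observation. The observation dimension and layer sizes are illustrative assumptions, not taken from this page.

    % Illustrative observation spec and layer sizes (assumptions for this sketch)
    obsInfo = rlNumericSpec([4 1]);              % continuous 4-element observation

    % Small network mapping the observation to a scalar state value
    net = [
        featureInputLayer(obsInfo.Dimension(1))
        fullyConnectedLayer(16)
        reluLayer
        fullyConnectedLayer(1)
        ];
    net = dlnetwork(net);

    critic = rlValueFunction(net,obsInfo);       % value function approximator object

    v = getValue(critic,{rand(4,1)});            % estimated value of a sample observation

The same pattern applies to the other approximator objects; only the constructor and the network input and output shapes change.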

Blocks

Policy - Reinforcement learning policy

Functions

rlTable - Value table or Q table
rlValueFunction - Value function approximator object for reinforcement learning agents
rlQValueFunction - Q-Value function approximator object for reinforcement learning agents
rlVectorQValueFunction - Vector Q-value function approximator for reinforcement learning agents
rlContinuousDeterministicActor - Deterministic actor with a continuous action space for reinforcement learning agents
rlDiscreteCategoricalActor - Stochastic categorical actor with a discrete action space for reinforcement learning agents
rlContinuousGaussianActor - Stochastic Gaussian actor with a continuous action space for reinforcement learning agents
rlOptimizerOptions - Optimization options for actors and critics
rlMaxQPolicy - Policy object to generate discrete max-Q actions for custom training loops and application deployment (see the sketch after this list)
rlEpsilonGreedyPolicy - Policy object to generate discrete epsilon-greedy actions for custom training loops
rlDeterministicActorPolicy - Policy object to generate continuous deterministic actions for custom training loops and application deployment
rlAdditiveNoisePolicy - Policy object to generate continuous noisy actions for custom training loops
rlStochasticActorPolicy - Policy object to generate stochastic actions for custom training loops and application deployment
quadraticLayer - Quadratic layer for actor or critic network
scalingLayer - Scaling layer for actor or critic network
softplusLayer - Softplus layer for actor or critic network
featureInputLayer - Feature input layer
reluLayer - Rectified Linear Unit (ReLU) layer
tanhLayer - Hyperbolic tangent (tanh) layer
fullyConnectedLayer - Fully connected layer
lstmLayer - Long short-term memory (LSTM) layer
softmaxLayer - Softmax layer
getActor - Get actor from reinforcement learning agent
setActor - Set actor of reinforcement learning agent
getCritic - Get critic from reinforcement learning agent
setCritic - Set critic of reinforcement learning agent
getLearnableParameters - Obtain learnable parameter values from agent, function approximator, or policy object
setLearnableParameters - Set learnable parameter values of agent, function approximator, or policy object
getModel - Get function approximator model from actor or critic
setModel - Set function approximation model for actor or critic
getAction - Obtain action from agent, actor, or policy object given environment observations
getValue - Obtain estimated value from a critic given environment observations and actions
getMaxQValue - Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations
evaluate - Evaluate function approximator object given observation (or observation-action) input data
gradient - Evaluate gradient of function approximator object given observation and action input data
accelerate - Option to accelerate computation of gradient for approximator object based on neural network
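
As a second sketch, the snippet below combines several of the objects listed above: a vector Q-value critic for a discrete action space, and the rlMaxQPolicy and rlEpsilonGreedyPolicy objects built from it. The observation and action specifications and the layer sizes are again illustrative assumptions.

    % Illustrative specs and layer sizes (assumptions for this sketch)
    obsInfo = rlNumericSpec([4 1]);                  % continuous 4-element observation
    actInfo = rlFiniteSetSpec([-1 0 1]);             % three discrete actions

    % Network mapping the observation to one Q-value per discrete action
    net = [
        featureInputLayer(obsInfo.Dimension(1))
        fullyConnectedLayer(16)
        reluLayer
        fullyConnectedLayer(numel(actInfo.Elements))
        ];
    net = dlnetwork(net);

    critic = rlVectorQValueFunction(net,obsInfo,actInfo);

    greedyPolicy  = rlMaxQPolicy(critic);            % always picks the max-Q action
    explorePolicy = rlEpsilonGreedyPolicy(critic);   % occasionally explores at random

    act = getAction(greedyPolicy,{rand(4,1)});       % action for a sample observation

Policy objects such as these are useful for custom training loops and for deployment, because they can be queried with getAction without carrying a full agent object.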

Topics