Policies and Value Functions

Define policy and value function approximators, such as actors and critics

A reinforcement learning policy is a mapping from the current environment observation to a probability distribution over the actions to take. During training, the agent tunes the parameters of its policy approximator to maximize the long-term reward.

Reinforcement Learning Toolbox™ software provides approximator objects for actors and critics. The actor implements the policy that selects the action to take based on the current observation. The critic implements the value (or Q-value) function that estimates the value (the cumulative long-term reward) of the current policy. Depending on your application and selected agent, you can define policy and value function approximators using different approximation models, such as deep neural networks, linear basis functions, or look-up tables. For more information, see Create Policies and Value Functions.
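
As a minimal sketch, the following example creates a value function critic from a small deep neural network and queries it for the value of a sample observation. The observation dimension and layer sizes are illustrative assumptions, not taken from this page.

    % Illustrative observation spec and layer sizes (assumptions for this sketch)
    obsInfo = rlNumericSpec([4 1]);              % continuous 4-element observation

    % Small network mapping the observation to a scalar state value
    net = [
        featureInputLayer(obsInfo.Dimension(1))
        fullyConnectedLayer(16)
        reluLayer
        fullyConnectedLayer(1)
        ];
    net = dlnetwork(net);

    critic = rlValueFunction(net,obsInfo);       % value function approximator object

    v = getValue(critic,{rand(4,1)});            % estimated value of a sample observation

The same pattern applies to the other approximator objects; only the constructor and the network input and output shapes change.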

Blocks

Policy - Reinforcement learning policy

Functions

rlTable - Value table or Q table
rlValueFunction - Value function approximator object for reinforcement learning agents
rlQValueFunction - Q-Value function approximator object for reinforcement learning agents
rlVectorQValueFunction - Vector Q-value function approximator for reinforcement learning agents
rlContinuousDeterministicActor - Deterministic actor with a continuous action space for reinforcement learning agents
rlDiscreteCategoricalActor - Stochastic categorical actor with a discrete action space for reinforcement learning agents
rlContinuousGaussianActor - Stochastic Gaussian actor with a continuous action space for reinforcement learning agents
rlOptimizerOptions - Optimization options for actors and critics
rlMaxQPolicy - Policy object to generate discrete max-Q actions for custom training loops and application deployment (see the sketch after this list)
rlEpsilonGreedyPolicy - Policy object to generate discrete epsilon-greedy actions for custom training loops
rlDeterministicActorPolicy - Policy object to generate continuous deterministic actions for custom training loops and application deployment
rlAdditiveNoisePolicy - Policy object to generate continuous noisy actions for custom training loops
rlStochasticActorPolicy - Policy object to generate stochastic actions for custom training loops and application deployment
quadraticLayer - Quadratic layer for actor or critic network
scalingLayer - Scaling layer for actor or critic network
softplusLayer - Softplus layer for actor or critic network
featureInputLayer - Feature input layer
reluLayer - Rectified Linear Unit (ReLU) layer
tanhLayer - Hyperbolic tangent (tanh) layer
fullyConnectedLayer - Fully connected layer
lstmLayer - Long short-term memory (LSTM) layer
softmaxLayer - Softmax layer
getActor - Get actor from reinforcement learning agent
setActor - Set actor of reinforcement learning agent
getCritic - Get critic from reinforcement learning agent
setCritic - Set critic of reinforcement learning agent
getLearnableParameters - Obtain learnable parameter values from agent, function approximator, or policy object
setLearnableParameters - Set learnable parameter values of agent, function approximator, or policy object
getModel - Get function approximator model from actor or critic
setModel - Set function approximation model for actor or critic
getAction - Obtain action from agent, actor, or policy object given environment observations
getValue - Obtain estimated value from a critic given environment observations and actions
getMaxQValue - Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations
evaluate - Evaluate function approximator object given observation (or observation-action) input data
gradient - Evaluate gradient of function approximator object given observation and action input data
accelerate - Option to accelerate computation of gradient for approximator object based on neural network
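
As a second sketch, the snippet below combines several of the objects listed above: a vector Q-value critic for a discrete action space, and the rlMaxQPolicy and rlEpsilonGreedyPolicy objects built from it. The observation and action specifications and the layer sizes are again illustrative assumptions.

    % Illustrative specs and layer sizes (assumptions for this sketch)
    obsInfo = rlNumericSpec([4 1]);                  % continuous 4-element observation
    actInfo = rlFiniteSetSpec([-1 0 1]);             % three discrete actions

    % Network mapping the observation to one Q-value per discrete action
    net = [
        featureInputLayer(obsInfo.Dimension(1))
        fullyConnectedLayer(16)
        reluLayer
        fullyConnectedLayer(numel(actInfo.Elements))
        ];
    net = dlnetwork(net);

    critic = rlVectorQValueFunction(net,obsInfo,actInfo);

    greedyPolicy  = rlMaxQPolicy(critic);            % always picks the max-Q action
    explorePolicy = rlEpsilonGreedyPolicy(critic);   % occasionally explores at random

    act = getAction(greedyPolicy,{rand(4,1)});       % action for a sample observation

Policy objects such as these are useful for custom training loops and for deployment, because they can be queried with getAction without carrying a full agent object.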

Topics