
rlContinuousDeterministicRewardFunction

Deterministic reward function approximator object for neural network-based environment

    Description

    When creating a neural network-based environment using rlNeuralNetworkEnvironment, you can specify the reward function approximator using an rlContinuousDeterministicRewardFunction object. Do so when you do not know a ground-truth reward signal for your environment but you expect the reward signal to be deterministic.

    The reward function approximator object uses a deep neural network as its internal approximation model to predict the reward signal for the environment from one of the following input combinations.

    • Observations, actions, and next observations

    • Observations and actions

    • Actions and next observations

    • Next observations

    To specify a stochastic reward function, use an rlContinuousGaussianRewardFunction object.
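    As an illustration of the last input combination (next observations only), the following sketch builds a reward approximator whose network has a single input channel. It assumes a hypothetical environment with a four-element observation and a scalar action; the layer names "nextObs" and "reward" are chosen for illustration only. Passing empty cell arrays for the unused observation and action channels mirrors the predict call used in the example later on this page.

```matlab
% Hypothetical specifications: 4-element observation, scalar action.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1]);

% Network with one input channel (next observation) and a scalar output.
layers = [featureInputLayer(obsInfo.Dimension(1),Name="nextObs")
    fullyConnectedLayer(16,Name="fc1")
    reluLayer(Name="relu1")
    fullyConnectedLayer(1,Name="reward")];
net = dlnetwork(layers);

% Only NextObservationInputNames is specified for this combination.
rwdFcnAppx = rlContinuousDeterministicRewardFunction(net,obsInfo,actInfo, ...
    NextObservationInputNames="nextObs");

% The current observation and the action do not affect the reward,
% so pass empty cell arrays for those channels.
nxtobs = rand(obsInfo.Dimension);
reward = predict(rwdFcnAppx,{},{},{nxtobs})
```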

    Creation

    Description


    rwdFcnAppx = rlContinuousDeterministicRewardFunction(net,observationInfo,actionInfo,Name=Value) creates the deterministic reward function approximator object rwdFcnAppx using the deep neural network net and sets the ObservationInfo and ActionInfo properties.

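    When creating a reward function, you must specify the names of the deep neural network inputs using one of the following combinations of name-value arguments, which correspond to the input combinations listed in the description.

    • ObservationInputNames, ActionInputNames, and NextObservationInputNames

    • ObservationInputNames and ActionInputNames

    • ActionInputNames and NextObservationInputNames

    • NextObservationInputNames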

    You can also set the UseDevice property using an optional name-value argument. For example, to use a GPU for prediction, specify UseDevice="gpu".
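    The following sketch shows the first combination, in which the network receives the current observation, the action, and the next observation, and also sets UseDevice explicitly as a name-value argument. The specification sizes and the layer names "obs", "act", and "nextObs" are assumptions made for illustration.

```matlab
% Hypothetical specifications: 4-element observation, scalar action.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1]);

% One input layer per channel, concatenated into a common path.
lg = layerGraph(featureInputLayer(4,Name="obs"));
lg = addLayers(lg,featureInputLayer(1,Name="act"));
lg = addLayers(lg,featureInputLayer(4,Name="nextObs"));
lg = addLayers(lg,[concatenationLayer(1,3,Name="concat")
    fullyConnectedLayer(32,Name="fc1")
    reluLayer(Name="relu1")
    fullyConnectedLayer(1,Name="reward")]);
lg = connectLayers(lg,"obs","concat/in1");
lg = connectLayers(lg,"act","concat/in2");
lg = connectLayers(lg,"nextObs","concat/in3");
net = dlnetwork(lg);

% All three input-name arguments, plus the device as a name-value argument.
rwdFcnAppx = rlContinuousDeterministicRewardFunction(net,obsInfo,actInfo, ...
    ObservationInputNames="obs", ...
    ActionInputNames="act", ...
    NextObservationInputNames="nextObs", ...
    UseDevice="cpu");

% Predict a reward from random current observation, action, and next observation.
reward = predict(rwdFcnAppx,{rand(4,1)},{rand(1,1)},{rand(4,1)})
```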

    Input Arguments


    net

    Deep neural network with a scalar output value, specified as a dlnetwork object.

    The input layer names for this network must match the input names specified using the ObservationInputNames, ActionInputNames, and NextObservationInputNames name-value arguments. The dimensions of the input layers must match the dimensions of the corresponding observation and action specifications in ObservationInfo and ActionInfo, respectively. Next observation inputs use the observation specifications in ObservationInfo.
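    Before creating the approximator, it can help to check these requirements directly on the network. The sketch below, under the assumption of a four-element observation, a scalar action, and illustrative layer names "obs", "act", and "reward", compares the dlnetwork input names with the names you plan to pass as name-value arguments and checks the input and output sizes.

```matlab
% Hypothetical specifications.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1]);

% Two-input network with a scalar output (layer names are illustrative).
lg = layerGraph(featureInputLayer(obsInfo.Dimension(1),Name="obs"));
lg = addLayers(lg,featureInputLayer(actInfo.Dimension(1),Name="act"));
lg = addLayers(lg,[concatenationLayer(1,2,Name="concat")
    fullyConnectedLayer(32,Name="fc")
    reluLayer(Name="relu")
    fullyConnectedLayer(1,Name="reward")]);
lg = connectLayers(lg,"obs","concat/in1");
lg = connectLayers(lg,"act","concat/in2");
net = dlnetwork(lg);

% The names you intend to pass as ObservationInputNames and ActionInputNames.
obsNames = "obs";
actNames = "act";
assert(all(ismember([obsNames actNames],string(net.InputNames))))

% Input layer sizes must match the specification dimensions.
layerNames = {net.Layers.Name};
obsLayer = net.Layers(strcmp(layerNames,"obs"));
assert(obsLayer.InputSize == obsInfo.Dimension(1))

% The network output must be a scalar.
outLayer = net.Layers(strcmp(layerNames,"reward"));
assert(outLayer.OutputSize == 1)
```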

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: ObservationInputNames="velocity"

    ObservationInputNames

    Observation input layer names, specified as a string or string array. Specify ObservationInputNames when you expect the reward signal to depend on the current environment observation.

    The number of observation input names must match the length of ObservationInfo and the order of the names must match the order of the specifications in ObservationInfo.

    ActionInputNames

    Action input layer names, specified as a string or string array. Specify ActionInputNames when you expect the reward signal to depend on the current action value.

    The number of action input names must match the length of ActionInfo and the order of the names must match the order of the specifications in ActionInfo.

    NextObservationInputNames

    Next observation input layer names, specified as a string or string array. Specify NextObservationInputNames when you expect the reward signal to depend on the next environment observation.

    The number of next observation input names must match the length of ObservationInfo and the order of the names must match the order of the specifications in ObservationInfo.
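    The ordering rule matters when the environment has more than one observation channel. In this sketch, a hypothetical environment has two observation channels (a 2-element position and a 3-element velocity) and a scalar action; the layer names are illustrative. The names in ObservationInputNames are listed in the same order as the elements of obsInfo, and the observation data passed to predict follows that same order. Passing an empty cell array for the unused next observation channel is assumed to behave as in the predict call shown in the example on this page.

```matlab
% Two hypothetical observation channels and one action channel.
obsInfo = [rlNumericSpec([2 1]) rlNumericSpec([3 1])];
actInfo = rlNumericSpec([1 1]);

% One input layer per channel (names are illustrative).
lg = layerGraph(featureInputLayer(2,Name="position"));
lg = addLayers(lg,featureInputLayer(3,Name="velocity"));
lg = addLayers(lg,featureInputLayer(1,Name="force"));
lg = addLayers(lg,[concatenationLayer(1,3,Name="concat")
    fullyConnectedLayer(32,Name="fc1")
    reluLayer(Name="relu1")
    fullyConnectedLayer(1,Name="reward")]);
lg = connectLayers(lg,"position","concat/in1");
lg = connectLayers(lg,"velocity","concat/in2");
lg = connectLayers(lg,"force","concat/in3");
net = dlnetwork(lg);

% Name order follows obsInfo: obsInfo(1) is "position", obsInfo(2) is "velocity".
rwdFcnAppx = rlContinuousDeterministicRewardFunction(net,obsInfo,actInfo, ...
    ObservationInputNames=["position","velocity"], ...
    ActionInputNames="force");

% One cell per observation channel, in the same order as obsInfo.
reward = predict(rwdFcnAppx,{rand(2,1),rand(3,1)},{rand(1,1)},{})
```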

    Properties


    ObservationInfo

    This property is read-only.

    Observation specifications, specified as a reinforcement learning specification object or an array of specification objects defining properties such as dimensions, data types, and names of the observation signals.

    You can extract the observation specifications from an existing environment or agent using getObservationInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.

    ActionInfo

    This property is read-only.

    Action specifications, specified as a reinforcement learning specification object or an array of specification objects defining properties such as dimensions, data types, and names of the action signals.

    You can extract the action specifications from an existing environment or agent using getActionInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.
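    As a sketch of constructing the specifications manually, the following code creates continuous observation and action specifications with rlNumericSpec and compares their dimensions with those extracted from the predefined cart-pole environment used in the example on this page. The signal names and the action limits of -10 and 10 are illustrative assumptions, not values taken from that environment.

```matlab
% Four continuous observation signals (name is illustrative).
obsInfo = rlNumericSpec([4 1]);
obsInfo.Name = "states";

% One continuous action signal with illustrative limits.
actInfo = rlNumericSpec([1 1],LowerLimit=-10,UpperLimit=10);
actInfo.Name = "force";

% Compare with the specifications extracted from an existing environment.
env = rlPredefinedEnv("CartPole-Continuous");
envObsInfo = getObservationInfo(env);
envActInfo = getActionInfo(env);
assert(isequal(obsInfo.Dimension,envObsInfo.Dimension))
assert(isequal(actInfo.Dimension,envActInfo.Dimension))
```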

    UseDevice

    Computation device used to perform operations such as gradient computation, parameter updates, and prediction during training and simulation, specified as either "cpu" or "gpu".

    The "gpu" option requires both Parallel Computing Toolbox™ software and a CUDA®-enabled NVIDIA® GPU. For more information on supported GPUs, see GPU Computing Requirements (Parallel Computing Toolbox).

    You can use gpuDevice (Parallel Computing Toolbox) to query or select a local GPU device to be used with MATLAB®.

    Training or simulating a network on a GPU involves device-specific numerical round-off errors. These errors can produce different results compared to performing the same operations using a CPU.
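    Because UseDevice is not read-only, you can also change it after creating the object. This sketch, which assumes a small single-input network with illustrative layer names, switches the approximator to the GPU only when canUseGPU reports that a supported GPU is available, and otherwise keeps the default CPU device.

```matlab
% Hypothetical specifications and a minimal next-observation reward network.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1]);
net = dlnetwork([featureInputLayer(4,Name="nextObs")
    fullyConnectedLayer(1,Name="reward")]);

rwdFcnAppx = rlContinuousDeterministicRewardFunction(net,obsInfo,actInfo, ...
    NextObservationInputNames="nextObs");

% Display the current computation device.
disp(rwdFcnAppx.UseDevice)

% Use the GPU only if Parallel Computing Toolbox and a supported GPU are available.
if canUseGPU
    rwdFcnAppx.UseDevice = "gpu";
end
```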

    Object Functions

    rlNeuralNetworkEnvironment    Environment model with deep neural network transition models
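    The reward function approximator is typically passed to rlNeuralNetworkEnvironment together with a transition function and an is-done function. The sketch below assumes a four-element observation and a scalar action, uses illustrative layer names, builds the transition model with rlContinuousDeterministicTransitionFunction, and uses a hypothetical anonymous is-done function that never ends an episode early. It shows only how the pieces fit together; the networks are untrained.

```matlab
% Hypothetical specifications.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1]);

% Transition network: (observation, action) -> next observation.
trLg = layerGraph(featureInputLayer(4,Name="obs"));
trLg = addLayers(trLg,featureInputLayer(1,Name="act"));
trLg = addLayers(trLg,[concatenationLayer(1,2,Name="concat")
    fullyConnectedLayer(64,Name="fc1")
    reluLayer(Name="relu1")
    fullyConnectedLayer(4,Name="nextObs")]);
trLg = connectLayers(trLg,"obs","concat/in1");
trLg = connectLayers(trLg,"act","concat/in2");
tsnFcnAppx = rlContinuousDeterministicTransitionFunction(dlnetwork(trLg), ...
    obsInfo,actInfo, ...
    ObservationInputNames="obs", ...
    ActionInputNames="act", ...
    NextObservationOutputNames="nextObs");

% Reward network: next observation -> scalar reward.
rwNet = dlnetwork([featureInputLayer(4,Name="nextObs")
    fullyConnectedLayer(16,Name="fc1")
    reluLayer(Name="relu1")
    fullyConnectedLayer(1,Name="reward")]);
rwdFcnAppx = rlContinuousDeterministicRewardFunction(rwNet,obsInfo,actInfo, ...
    NextObservationInputNames="nextObs");

% Hypothetical is-done rule: episodes never terminate early.
isDoneFcn = @(obs,act,nextObs) false;

env = rlNeuralNetworkEnvironment(obsInfo,actInfo,tsnFcnAppx,rwdFcnAppx,isDoneFcn);
```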

    Examples


    Create an environment interface and extract observation and action specifications. Alternatively, you can create specifications using rlNumericSpec and rlFiniteSetSpec.

    env = rlPredefinedEnv("CartPole-Continuous");
    obsInfo = getObservationInfo(env);
    actInfo = getActionInfo(env);

    To approximate the reward function, create a deep neural network. For this example, the network has two input channels, one for the current action and one for the next observations. The single output channel contains a scalar, which represents the value of the predicted reward.

    Define each network path as an array of layer objects. Get the dimensions of the observation and action spaces from the environment specifications, and specify a name for the input layers, so you can later explicitly associate them with the appropriate environment channel.

    actionPath = featureInputLayer( ...
        actInfo.Dimension(1), ...
        Name="action");
    
    nextStatePath = featureInputLayer( ...
        obsInfo.Dimension(1), ...
        Name="nextState");
    
    commonPath = [concatenationLayer(1,2,Name="concat")
        fullyConnectedLayer(64,Name="FC1")
        reluLayer(Name="CriticRelu1")
        fullyConnectedLayer(64,Name="FC2")
        reluLayer(Name="CriticCommonRelu2")
        fullyConnectedLayer(64,Name="FC3")
        reluLayer(Name="CriticCommonRelu3")
        fullyConnectedLayer(1,Name="reward")];
    
    net = layerGraph(nextStatePath);
    net = addLayers(net,actionPath);
    net = addLayers(net,commonPath);
    
    net = connectLayers(net,"nextState","concat/in1");
    net = connectLayers(net,"action","concat/in2");
    
    plot(net)


    Create a dlnetwork object and display the number of weights.

    net = dlnetwork(net);
    summary(net);
       Initialized: true
    
       Number of learnables: 8.7k
    
       Inputs:
          1   'nextState'   4 features
          2   'action'      1 features
    

    Create a deterministic reward function object.

    rwdFcnAppx = rlContinuousDeterministicRewardFunction(...
        net,obsInfo,actInfo,...
        ActionInputNames="action", ...
        NextObservationInputNames="nextState");

    Using this reward function object, you can predict the reward value based on the current action and the next observation. For example, predict the reward for a random action and next observation. Because in this example only the action and the next observation influence the reward, use an empty cell array for the current observation.

    act = rand(actInfo.Dimension);
    nxtobs = rand(obsInfo.Dimension);
    reward = predict(rwdFcnAppx,{}, {act}, {nxtobs})
    reward = single
        0.1034
    

    To predict the reward, you can also use evaluate.

    reward_ev = evaluate(rwdFcnAppx, {act,nxtobs} )
    reward_ev = 1x1 cell array
        {[0.1034]}
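    Because the approximator is deterministic, repeated predictions for the same inputs should match, and evaluate should return the same value as predict, as the outputs above suggest. The following self-contained sketch rebuilds a smaller network with the same input names as the example ("action" and "nextState"; the hidden-layer size of 16 is an arbitrary choice) and checks both properties. It assumes that evaluate returns plain numeric data inside the cell array, as in the output shown above.

```matlab
% Specifications matching the cart-pole example dimensions.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1]);

% Smaller two-input reward network (hidden size is arbitrary).
lg = layerGraph(featureInputLayer(4,Name="nextState"));
lg = addLayers(lg,featureInputLayer(1,Name="action"));
lg = addLayers(lg,[concatenationLayer(1,2,Name="concat")
    fullyConnectedLayer(16,Name="fc1")
    reluLayer(Name="relu1")
    fullyConnectedLayer(1,Name="reward")]);
lg = connectLayers(lg,"nextState","concat/in1");
lg = connectLayers(lg,"action","concat/in2");
rwdFcnAppx = rlContinuousDeterministicRewardFunction(dlnetwork(lg), ...
    obsInfo,actInfo, ...
    ActionInputNames="action", ...
    NextObservationInputNames="nextState");

act = rand(actInfo.Dimension);
nxtobs = rand(obsInfo.Dimension);

% Same inputs, same reward: the approximator has no random component.
r1 = predict(rwdFcnAppx,{},{act},{nxtobs});
r2 = predict(rwdFcnAppx,{},{act},{nxtobs});
assert(isequal(r1,r2))

% evaluate returns the same value wrapped in a cell array.
rEv = evaluate(rwdFcnAppx,{act,nxtobs});
assert(abs(double(r1) - double(rEv{1})) < 1e-6)
```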
    
    

    Version History

    Introduced in R2022a