cleanup

Clean up reinforcement learning environment after running multiple simulations

    Description

    When you define a custom training loop for reinforcement learning, you can simulate an agent or policy against an environment using the runEpisode function. Use the cleanup function to clean up the environment after running simulations using multiple calls to runEpisode.

    To clean up the environment after each simulation, you can configure runEpisode to automatically call the cleanup function at the end of each episode.

    cleanup(env) cleans up the specified reinforcement learning environment after running multiple simulations using runEpisode.
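
    For example, a minimal sketch of both patterns is shown below, assuming that env and policy already exist. The CleanupPostSim name-value argument used for automatic cleanup is an assumption about the runEpisode option; check the runEpisode reference page for your release.

    % Manual cleanup after a batch of simulations.
    setup(env)
    for i = 1:10
        out = runEpisode(env,policy,MaxSteps=300);
    end
    cleanup(env)

    % Automatic cleanup at the end of each episode (assumed CleanupPostSim option).
    out = runEpisode(env,policy,MaxSteps=300,CleanupPostSim=true);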

    Examples

    Create a reinforcement learning environment and extract its observation and action specifications.

    env = rlPredefinedEnv("CartPole-Discrete");
    obsInfo = getObservationInfo(env);
    actInfo = getActionInfo(env);

    Create a deep neural network and use it as the approximation model for a discrete categorical actor.

    actorNetwork = [...
        featureInputLayer(obsInfo.Dimension(1),...
            Normalization="none",Name="state")
        fullyConnectedLayer(24,Name="fc1")
        reluLayer(Name="relu1")
        fullyConnectedLayer(24,Name="fc2")
        reluLayer(Name="relu2")
        fullyConnectedLayer(2,Name="output")
        softmaxLayer(Name="actionProb")];
    actorNetwork = dlnetwork(actorNetwork);
    
    actor = rlDiscreteCategoricalActor(actorNetwork,obsInfo,actInfo);

    Create a policy object using the function approximator.

    policy = rlStochasticActorPolicy(actor);

    Create an experience buffer.

    buffer = rlReplayMemory(obsInfo,actInfo);

    Set up the environment for running multiple simulations. For this example, configure the simulations to log any errors rather than send them to the Command Window.

    setup(env,StopOnError="off")

    Simulate multiple episodes using the environment and policy. After each episode, append the experiences to the buffer. For this example, run 100 episodes.

    for i=1:100
        output = runEpisode(env,policy,MaxSteps=300);
        append(buffer,output.AgentData.Experiences)
    end

    Clean up the environment.

    cleanup(env)

    Sample a mini-batch of experiences from the buffer. For this example, sample 10 experiences.

    batch = sample(buffer,10);

    You can then learn from the sampled experiences and update the policy and actor.
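
    The details of that update depend on your algorithm. As an illustration only, the following sketch evaluates the current actor on the sampled observations and outlines where a custom loss and gradient step would go. It assumes that batch carries the same experience fields as output.AgentData.Experiences (Observation, Action, and Reward); adapt the commented steps to the update rule you are implementing.

    % Evaluate the current actor on the sampled observations. The
    % Observation field is passed through unchanged, so this works with
    % whatever batched layout sample returns.
    prob = evaluate(actor,batch.Observation);

    % Compute your algorithm's loss from prob{1}, batch.Action, and
    % batch.Reward (for example, a policy-gradient loss), and obtain the
    % corresponding parameter gradients, for example using dlfeval and
    % dlgradient on getModel(actor).

    % Apply the gradient step to the actor parameters and rebuild the
    % policy from the updated actor.
    params = getLearnableParameters(actor);
    % ... update params using the computed gradients ...
    actor = setLearnableParameters(actor,params);
    policy = rlStochasticActorPolicy(actor);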

    Input Arguments

    env — Reinforcement learning environment, specified as an environment object, such as a predefined MATLAB environment created using rlPredefinedEnv or a SimulinkEnvWithAgent object created using rlSimulinkEnv.

    If env is a SimulinkEnvWithAgent object and the associated Simulink model is configured to use fast restart, then cleanup terminates the model compilation.
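
    For example, the following sketch shows this workflow for a Simulink environment; the model and block names are hypothetical, and the UseFastRestart property name is an assumption about how the environment enables fast restart.

    % Create a Simulink environment (hypothetical model and agent block names).
    env = rlSimulinkEnv("myModel","myModel/RL Agent",obsInfo,actInfo);
    env.UseFastRestart = "on";   % assumed property for enabling fast restart

    setup(env)
    for i = 1:10
        out = runEpisode(env,policy,MaxSteps=500);
    end
    cleanup(env)   % also terminates the fast-restart model compilation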

    Version History

    Introduced in R2022a