My DDPG agent model is generating the same output from every simulation.

I trained an rlDDPGAgent (a biped walking robot) and simulated it, but it always produces the same walking gait in every simulation.
However, I need different gaits to acquire a variety of sensor data. The code below sets up the agent and training options. I used the msra-walking-robot-master code from the MATLAB GitHub.
% Create DDPG agent and training options for walking robot example
%
% Copyright 2019 The MathWorks, Inc.
%% DDPG Agent Options
agentOptions = rlDDPGAgentOptions;
agentOptions.SampleTime = Ts;
agentOptions.DiscountFactor = 0.99;
agentOptions.MiniBatchSize = 128;
agentOptions.ExperienceBufferLength = 1e6;
agentOptions.TargetSmoothFactor = 1e-3;
agentOptions.NoiseOptions.MeanAttractionConstant = 5;
agentOptions.NoiseOptions.Variance = 0.4;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-5;
%% Training Options
trainingOptions = rlTrainingOptions;
trainingOptions.MaxEpisodes = 5000;
trainingOptions.MaxStepsPerEpisode = Tf/Ts;
trainingOptions.ScoreAveragingWindowLength = 100;
trainingOptions.StopTrainingCriteria = 'AverageReward';
trainingOptions.StopTrainingValue = 110;
trainingOptions.SaveAgentCriteria = 'EpisodeReward';
trainingOptions.SaveAgentValue = 150;
trainingOptions.Plots = 'training-progress';
trainingOptions.Verbose = true;
if useParallel
    trainingOptions.Parallelization = 'async';
    trainingOptions.ParallelizationOptions.StepsUntilDataIsSent = 32;
end
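Note that Ts, Tf, and useParallel must already exist in the workspace when this options script runs; in the walking-robot repo they come from robotParametersRL and the training script. For a self-contained run, placeholder definitions might look like this (illustrative values, not the repo's):
% Assumed to be defined elsewhere (e.g. by robotParametersRL) --
% the values below are placeholders only:
Ts = 0.025;          % agent sample time (s)
Tf = 10;             % episode duration (s)
useParallel = true;  % enable parallel training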
The code below is the training code:
% Walking Robot -- DDPG Agent Training Script (2D)
% Copyright 2019 The MathWorks, Inc.
warning off parallel:gpu:device:DeviceLibsNeedsRecompiling %don't show the warning
%% SET UP ENVIRONMENT
% Speedup options
useFastRestart = true;
useGPU = false;
useParallel = true;
% Create the observation info
numObs = 31;
observationInfo = rlNumericSpec([numObs 1]);
observationInfo.Name = 'observations';
% create the action info
numAct = 6;
actionInfo = rlNumericSpec([numAct 1],'LowerLimit',-1,'UpperLimit', 1);
actionInfo.Name = 'foot_torque';
% Environment
mdl = 'walkingRobotRL2D';
load_system(mdl);
blk = [mdl,'/RL Agent'];
env = rlSimulinkEnv(mdl,blk,observationInfo,actionInfo);
env.ResetFcn = @(in)walkerResetFcn(in,upper_leg_length/100,lower_leg_length/100,h/100,'2D');
if ~useFastRestart
    env.UseFastRestart = 'off';
end
%% CREATE NEURAL NETWORKS
createDDPGNetworks;
%% CREATE AND TRAIN AGENT
createDDPGOptions;
agent = rlDDPGAgent(actor,critic,agentOptions);
trainingResults = train(agent,env,trainingOptions)
%% SAVE AGENT
reset(agent); % Clears the experience buffer
curDir = pwd;
saveDir = 'savedAgents';
cd(saveDir)
save(['trainedAgent_2D_' datestr(now,'mm_DD_YYYY_HHMM')],'agent','trainingResults','trainingOptions'); % save() takes variable names, so 'trainingOptions.MaxEpisodes' would error; save the whole options object instead
cd(curDir)
The code below is the simulation code:
% Simulates the walking robot model
%% Setup
clc; close all;
robotParametersRL
% Create the observation info
numObs = 31;
observationInfo = rlNumericSpec([numObs 1]);
observationInfo.Name = 'observations';
% create the action info
numAct = 6;
actionInfo = rlNumericSpec([numAct 1],'LowerLimit',-1,'UpperLimit', 1);
actionInfo.Name = 'foot_torque';
% Environment
mdl = 'walkingRobotRL2D';
load_system(mdl);
blk = [mdl,'/RL Agent'];
env = rlSimulinkEnv(mdl,blk,observationInfo,actionInfo);
load trainedAgent_2D_04_25_2024_1541_5000 %load agent
%action = getAction(agent);
simOpts = rlSimulationOptions;
simOpts.MaxSteps = 1000;
simOpts.NumSimulations = 3;
%plot(env);
reset(env);
experience = sim(env,agent,simOpts);
The result of the code always shows the same gait. Is there any way to get a different output from every simulation?
Thank you so much!

Accepted Answer

Ronit
Ronit on 22 May 2024
Hello,
A DDPG policy is deterministic: given the same initial state it always produces the same action sequence, so identical simulations reproduce identical trajectories. To generate different gaits from each simulation with your trained Deep Deterministic Policy Gradient (DDPG) agent, you can introduce variability in the initial conditions of the simulation or apply noise to the action output during simulation (not during training). This approach can help in exploring a range of behaviours from the trained model.
Here's an example of how to do it:
1. Update the 'walkerResetFcn'
function in = walkerResetFcn(in,upper_leg_length,lower_leg_length,init_height,dim)
% Increase the max displacement and speed for more variability
max_displacement_x = 0.1;  % was 0.05
max_speed_x = 0.1;         % was 0.05
max_displacement_y = 0.05; % was 0.025
max_speed_y = 0.05;        % was 0.025
% ... (rest of the original reset logic unchanged) ...
end
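For context, a minimal sketch of what such a reset function could look like end to end. This is an assumption about the repo's structure: the names init_dx, init_vx, init_dy, init_vy are placeholders, so keep whatever variables the original walkerResetFcn actually sets on the Simulink.SimulationInput object:
function in = walkerResetFcn(in,upper_leg_length,lower_leg_length,init_height,dim)
% Sketch only: init_dx/init_vx/init_dy/init_vy are placeholder names,
% not necessarily the variables used by the original repo.

% Widened randomization ranges for more gait variability
max_displacement_x = 0.1;
max_speed_x        = 0.1;
max_displacement_y = 0.05;
max_speed_y        = 0.05;

% Draw fresh random initial displacement and speed each episode
dx = max_displacement_x*(2*rand - 1);
vx = max_speed_x*(2*rand - 1);
dy = max_displacement_y*(2*rand - 1);
vy = max_speed_y*(2*rand - 1);

% Pass them into the model through the SimulationInput object
in = in.setVariable('init_dx',dx);
in = in.setVariable('init_vx',vx);
in = in.setVariable('init_dy',dy);
in = in.setVariable('init_vy',vy);
end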
2. Add a custom simulation loop with action noise
% Define simulation parameters
numSteps = 1000;     % Number of steps per simulation
numSimulations = 3;  % Number of simulations
for simIdx = 1:numSimulations
    % Reset the environment
    observation = reset(env);
    for stepIdx = 1:numSteps
        % Generate action from the agent
        action = getAction(agent, observation);
        % Add exploration noise to the action
        noise = randn(size(action))*0.1; % Adjust noise level as needed
        noisyAction = action + noise;
        % Ensure the action stays within bounds
        noisyAction = max(min(noisyAction, actionInfo.UpperLimit), actionInfo.LowerLimit);
        % Step the environment using the noisy action
        [observation, reward, isDone, info] = step(env, noisyAction);
        % Optionally, break the loop if the episode is done
        if isDone
            break;
        end
    end
end
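A small caveat: randn draws from MATLAB's global random stream, which starts from the same default seed in every fresh MATLAB session, so the same noise sequence (and hence the same gait) can repeat across sessions. Re-seeding once before the simulations avoids this:
% Re-seed the global RNG so each session produces a different
% noise realization
rng('shuffle');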
This will help in generating different results in every simulation.
Hope this helps!
2 Comments
Anna
Anna on 28 May 2024
Thank you so much! I am working on this problem with your solution, but I got an error: "rl.agent.AbstractPolicy/getAction (line 103): Invalid observation type or size."
I couldn't clearly understand what "observation" should be here. The observation I defined is:
obsInfo = getObservationInfo(env);
obsInfo(1)
ans =
  rlNumericSpec with properties:
     LowerLimit: -Inf
     UpperLimit: Inf
           Name: "observations"
    Description: [0×0 string]
      Dimension: [31 1]
       DataType: "double"
Could you help me with how to change the observation? I appreciate your help!!!
Anna
Anna on 29 May 2024
I resolved that problem by changing this:
action = getAction(agent, observation);
to this:
action = getAction(agent, {rand(observationInfo.Dimension)});
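(The underlying issue is that getAction expects the observation wrapped in a cell array whose element matches the 31x1 observation spec. With the real state it would look like the line below, where observation is assumed to hold the current 31x1 measurement vector:)
% observation: 31x1 double with the current state (assumed variable name)
action = getAction(agent, {observation});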
However, it kept showing an error about the step function, which says "step function is not supported in Simulink environment."
So, I changed the model itself: I added an "Add constant" block and typed "noiseStd * randn(size(action))" into it.
This worked! Finally my model is giving me a different gait in every simulation.
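For anyone reproducing this, the final workflow is roughly the sketch below. Here noiseStd is an assumed base-workspace variable that the new noise block inside the model reads; the name and value are illustrative:
% Sketch of the final run script (noiseStd is an assumed workspace
% variable consumed by the noise block added inside walkingRobotRL2D)
rng('shuffle');   % re-seed so each session gets different noise
noiseStd = 0.1;   % exploration noise level read by the model
simOpts = rlSimulationOptions('MaxSteps',1000,'NumSimulations',3);
experience = sim(env,agent,simOpts);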
I am so thankful for your help, Ronit!!!


More Answers (1)

Mubashir Rasool
Mubashir Rasool on 18 May 2024
I am facing the same issue: my DDPG agent is not generating the desired results. I think the main thing in DDPG design is reward function selection. Let's wait for some professional response.
