
Difference between RL Agent training plot and result plot

sungho park on 17 Jan 2022
Hi, the first graph below shows the action during training, and the second one shows the action after training, which is just constant.
Can you please help me?
%% Create observation specification
obsInfo = rlNumericSpec([3 1]);
obsInfo.Name = 'observations';
numObs = obsInfo.Dimension(1);
%% Create action specification
actInfo = rlNumericSpec([1 1],'LowerLimit',-50,'UpperLimit',50);
%actInfo = rlNumericSpec([1 1]);
actInfo.Name = 'current';
numActions = actInfo.Dimension(1);
%% Create the environment (mdl and param are assumed to be defined earlier in the script)
blk = [mdl '/RL Agent'];
env = rlSimulinkEnv(mdl,blk,obsInfo,actInfo);
env.ResetFcn = @(in)setVariable(in,'current',5,'Workspace',mdl);
env.UseFastRestart = 'off';
Ts = param.dt;
Tf = param.end_time;
rng(0)
%% Create DDPG Agent
statePath = [
    featureInputLayer(numObs,'Normalization','none','Name','observations')
    fullyConnectedLayer(200,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(200,'Name','CriticStateFC2')];
actionPath = [
    featureInputLayer(1,'Normalization','none','Name','action')
    fullyConnectedLayer(200,'Name','CriticActionFC1','BiasLearnRateFactor',0)];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
figure
plot(criticNetwork)
criticOpts = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
%% Create the critic representation using the specified deep neural
% network and options
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation', ...
    {'observations'},'Action',{'action'},criticOpts);
%% Create the actor
actorNetwork = [
    featureInputLayer(numObs,'Normalization','none','Name','observations')
    fullyConnectedLayer(400,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(300,'Name','ActorFC2')
    reluLayer('Name','ActorRelu2')
    fullyConnectedLayer(1,'Name','ActorFC3')
    tanhLayer('Name','ActorTanh')
    scalingLayer('Name','ActorScaling','Scale',max(actInfo.UpperLimit))
    ];
actorOpts = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo, ...
    'Observation',{'observations'},'Action',{'ActorScaling'},actorOpts);
%% Create the DDPG agent option
agentOpts = rlDDPGAgentOptions(...
    'SampleTime',Ts,...
    'TargetSmoothFactor',1e-3,...
    'ExperienceBufferLength',1e6,...
    'SaveExperienceBufferWithAgent',true,...
    'DiscountFactor',0.99,...
    'ResetExperienceBufferBeforeTraining',false,...
    'MiniBatchSize',256);
agentOpts.NoiseOptions.Variance = 0.2;
agentOpts.NoiseOptions.VarianceDecayRate = 0;
agent = rlDDPGAgent(actor,critic,agentOpts);
%% Train Agent
maxepisodes = 10;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',maxepisodes,...
    'MaxStepsPerEpisode',maxsteps,...
    'ScoreAveragingWindowLength',5,...
    'Verbose',true,...
    'Plots','training-progress',...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',5000,...
    'SaveAgentCriteria','EpisodeReward',...
    'SaveAgentValue',5000);
doTraining = true;
if doTraining
    trainingStats = train(agent,env,trainOpts);
end
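For reference, a post-training action plot like the second one described above can be produced by simulating the trained agent, roughly as follows (a minimal sketch reusing env, agent, Tf and Ts from the script; the plotted field name 'current' follows from actInfo.Name set earlier):
% Simulate the trained agent for one episode and plot the applied action
simOpts = rlSimulationOptions('MaxSteps',ceil(Tf/Ts));
experience = sim(env,agent,simOpts);
figure
plot(experience.Action.current)   % action channel was named 'current' above
title('Action after training')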
1 comment
Emmanouil Tzorakoleftherakis on 24 Jan 2023
How long did you train for? If you only trained for 10 episodes, you should give it more time.
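In code, that suggestion amounts to raising the episode budget and continuing training with the same agent, along these lines (a sketch reusing the variables from the question; the episode count is an illustrative assumption, not a value from the thread):
trainOpts.MaxEpisodes = 2000;        % far more than the original 10 episodes
trainOpts.StopTrainingValue = 5000;  % adjust to the reward scale of your model
% The experience buffer carries over because the question's options set
% SaveExperienceBufferWithAgent = true and ResetExperienceBufferBeforeTraining = false.
trainingStats = train(agent,env,trainOpts);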


Answers (0)
