
RL: Continuous action space, but within a desired range (use PPO)

34 views (last 30 days)
Peijie on 26 Oct 2023
Answered: Nicolas CRETIN on 19 Jul 2024 at 17:56
I am now trying to use PPO for RL training with a continuous action space.
However, I want to ensure that the output of my actor always stays within the upper and lower bounds I set. In my environment I'm using the following code; my actor and critic networks are shown below.
% observation info
obsInfo = rlNumericSpec([n_Pd+n_Pg+1, 1]);
% action info (note: the property names are 'LowerLimit'/'UpperLimit')
actInfo = rlNumericSpec([n_Pg, 1], ...
    'LowerLimit', Pgmin, ...
    'UpperLimit', Pgmax);
Actor network
%% Actor Network
% Input path layers
inPath = [featureInputLayer(numObservations,'Normalization','none','Name','observation')
fullyConnectedLayer(128,'Name','ActorFC1')
reluLayer('Name','ActorRelu1')
fullyConnectedLayer(128,'Name','ActorFC2')
reluLayer('Name', 'ActorRelu2')
fullyConnectedLayer(numActions,'Name','Action')
];
% Path layers for mean value
meanPath = [
    tanhLayer('Name','tanhMean')
    fullyConnectedLayer(numActions,'Name','fcMean')
    scalingLayer('Name','ActorScaling','Scale',actInfo.UpperLimit)
    ];
% Path layers for standard deviations
% Using softplus layer to make them non negative
sdevPath = [
    tanhLayer('Name','tanhStdv')
    fullyConnectedLayer(numActions,'Name','fcStdv')
    softplusLayer('Name','Splus')
    ];
% Add layers to network object
actorNetwork = layerGraph(inPath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,sdevPath);
% Connect layers
actorNetwork = connectLayers(actorNetwork,"Action","tanhMean/in");
actorNetwork = connectLayers(actorNetwork,"Action","tanhStdv/in");
actorNetwork = dlnetwork(actorNetwork);
% figure(2)
% plot(layerGraph(actorNetwork))
% Setting Actor
actorOptions = rlOptimizerOptions('LearnRate',0.1,'GradientThreshold',inf);
actor = rlContinuousGaussianActor(actorNetwork,obsInfo,actInfo, ...
"ActionMeanOutputNames","ActorScaling", ...
"ActionStandardDeviationOutputNames","Splus");
Critic network
%% Critic Network
criticNetwork = [
featureInputLayer(numObservations,'Normalization','none','Name','observation')
fullyConnectedLayer(128,'Name','CriticFC1')
reluLayer('Name','CriticRelu1')
fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = dlnetwork(criticNetwork);
% Setting Critic
criticOptions = rlOptimizerOptions('LearnRate',0.1,'GradientThreshold',inf);
critic = rlValueFunction(criticNetwork,obsInfo);
The rest of the code (agent creation and training):
%% Create PPO Agent
% Setting PPO Agent Options
agentOptions = rlPPOAgentOptions(...
'SampleTime',Ts,...
'ActorOptimizerOptions',actorOptions,...
'CriticOptimizerOptions',criticOptions,...
'ExperienceHorizon',600,...
'ClipFactor',0.02,...
'EntropyLossWeight',0.01,...
'MiniBatchSize',300, ...
'AdvantageEstimateMethod','gae',...
'GAEFactor',0.95,...
'DiscountFactor',0.99);
% Create Agent
agent = rlPPOAgent(actor,critic,agentOptions);
%% Train Agent
maxepisodes = 10000;
maxsteps = ceil(Nt/Ts);
trainingOptions = rlTrainingOptions(...
'MaxEpisodes',maxepisodes,...
'MaxStepsPerEpisode',maxsteps,...
'StopOnError',"on",...
'Plots',"training-progress",...
'StopTrainingCriteria',"AverageReward",...
'StopTrainingValue',-14500,...
'SaveAgentCriteria',"EpisodeReward",...
'SaveAgentValue',-14500);
% train? 1-train; 0-not train
doTraining = 1;
if doTraining
% Train the agent.
trainingStats = train(agent,env,trainingOptions);
save('XXX.mat','agent')
else
% Load the pretrained agent for the example.
load('XXX.mat','agent')
end
THANKS!

Answers (2)

Emmanouil Tzorakoleftherakis on 27 Oct 2023
You can always clip the agent output on the environment side. PPO is stochastic, so the upper and lower limits are not guaranteed to be respected with the current implementation.
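As a sketch of this suggestion, the clipping can live at the top of the environment's step method, before the action is applied to the plant. The class skeleton and property names here are hypothetical; `Pgmin`/`Pgmax` are the limit vectors from the question:

```matlab
function [NextObs,Reward,IsDone,Info] = step(this,Action)
    % The Gaussian PPO policy can sample outside the action spec's
    % bounds, so clip element-wise to the generator limits first.
    Action = min(max(Action, this.Pgmin), this.Pgmax);
    % ... compute NextObs, Reward, IsDone from the clipped Action ...
end
```

The agent still learns on the unclipped action it sampled; only the value seen by the plant is saturated.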
  2 comments
Peijie on 4 Jul 2024 at 1:32
Please, has this algorithm been updated in the R2024a release? Thank you.
Emmanouil Tzorakoleftherakis on 4 Jul 2024 at 5:54
Updated with respect to what? What I mentioned above is still true.



Nicolas CRETIN on 19 Jul 2024 at 17:56
Hello,
I'm quite a beginner in this field, but I faced the same issue, and I worked around it with a sigmoid layer followed by a scaling layer.
The sigmoid layer outputs a value between 0 and 1, which you can then rescale with a linear function into the desired range. This applies only to the mean (action) path of the actor network, unless you also want to scale your standard deviation.
But I'm quite surprised that Emmanouil said there is no way to do it. Did I miss a side effect or something?
Hope it helps.
Regards, Nicolas
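A minimal sketch of that idea applied to the mean path of the actor in the question, assuming the `Pgmin`/`Pgmax` bound vectors and `numActions` from there. `scalingLayer` computes `Scale.*X + Bias`, so the sigmoid's (0,1) output is mapped affinely onto [Pgmin, Pgmax]:

```matlab
% Mean path bounded to [Pgmin, Pgmax]: sigmoid squashes to (0,1),
% then an affine scaling layer maps that into the physical range.
meanPath = [
    fullyConnectedLayer(numActions,'Name','fcMean')
    sigmoidLayer('Name','sigMean')
    scalingLayer('Name','ActorScaling', ...
        'Scale',Pgmax - Pgmin, ...
        'Bias',Pgmin)
    ];
```

Note that this bounds only the mean: the action actually sampled by the Gaussian PPO policy can still fall outside the range because the noise is unbounded, which is consistent with Emmanouil's point above.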

Products


Version

R2022b
