DDPG multiple action noise variance error
Hi,
I am working on developing an adaptive PID for a water tank level controller shown here:
The outputs of the RL Agent block are the 3 controller gains. As the 3 gains have very different ranges of values, I thought it was a good idea to use a different variance for every action, as suggested on the rlDDPGAgentOptions page.
However, when I initiate training, I get the following error:
Caused by:
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
For 'Output Port 1' of 'rlwatertankAdaptivePID/RL Agent/AgentWrapper', the 'outputImpl' method of the System object
'rl.simulink.blocks.AgentWrapper' returned a value whose size [3x3], does not match the value returned by the 'getOutputSizeImpl' method. Either
change the size of the value returned by 'outputImpl', or change the size returned by 'getOutputSizeImpl'.
I defined the agent options as follows:
%% Specify DDPG agent options
agentOptions = rlDDPGAgentOptions;
agentOptions.SampleTime = Ts;
agentOptions.DiscountFactor = 0.9;
agentOptions.MiniBatchSize = 128;
agentOptions.ExperienceBufferLength = 1e6;
agentOptions.TargetSmoothFactor = 5e-3;
% Because the actions cover very different ranges, the variance is
% defined individually for each action [kp, ki, kd]:
% kp = [-6, 6],     range = 12
% ki = [-0.2, 0.2], range = 0.4
% kd = [-2, 2],     range = 4
% The guideline is that Variance*sqrt(Ts) should lie between 1% and 10%
% of the action range
agentOptions.NoiseOptions.MeanAttractionConstant = 0.15;
agentOptions.NoiseOptions.Variance = [0.8, 0.02, 0.2];
%agentOptions.NoiseOptions.Variance = 0.2;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-4;
How do I work around this?
Note: If I only specify one variance it works fine, but the exploration and the achieved results are not good.
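For reference, a minimal sketch of the variance rule quoted in the code comments, assuming an illustrative sample time of Ts = 0.1 s. One likely cause of the [3x3] size, assuming the action spec is rlNumericSpec([3 1]), is that a 1-by-3 row vector of variances broadcasts against the 3-by-1 noise sample into a 3-by-3 matrix, so shaping the variance as a column vector may avoid the size mismatch:

```matlab
% Sketch (untested): derive per-action variance from the guideline that
% Variance*sqrt(Ts) should fall between 1% and 10% of each action's range.
Ts = 0.1;                            % assumed sample time for illustration
ranges = [12; 0.4; 4];               % kp, ki, kd ranges as a column vector
sigma  = 0.05 * ranges / sqrt(Ts);   % aim for ~5% of each range
% With a [3 1] action specification, a column vector (semicolons, not
% commas) keeps the noise the same size as the action:
% agentOptions.NoiseOptions.Variance = sigma;
```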
3 comments
张 冠宇
on 18 Nov 2021
May I ask how I can get 3 actions (kp, ki, kd)? Should I set it as follows?
actInfo = rlNumericSpec([3 1]);
or [1 3], or some other setting? I ask because I get the error:
Input data dimensions must match the dimensions specified in the corresponding observation and action info
specifications.
obsInfo = rlNumericSpec([3 1],... % rlNumericSpec: continuous actions/observations; rlFiniteSetSpec: discrete actions/observations
'LowerLimit',[-inf -inf -inf]',...
'UpperLimit',[ inf inf inf]');
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error, error, and measured height';
numObservations = obsInfo.Dimension(1); % get the observation dimension
actInfo = rlNumericSpec([3 1]);
actInfo.Name = 'flow';
numActions = actInfo.Dimension(1);
% build the environment interface object
env = rlSimulinkEnv('Load_Freq_Ctrl_rl2','Load_Freq_Ctrl_rl2/RL Agent',...
obsInfo,actInfo);
% set a custom reset function to randomize the model's reference value
env.ResetFcn = @(in)localResetFcn(in);
% specify the simulation time Tf and the agent sample time Ts, in seconds
Ts = 0.2;
Tf = 30;
% fix the random generator seed for reproducibility
rng(0)
% create the DDPG agent
statePath = [
imageInputLayer([numObservations 1 1],'Normalization','none','Name','State')
fullyConnectedLayer(50,'Name','CriticStateFC1')
reluLayer('Name','CriticRelu1')
fullyConnectedLayer(25,'Name','CriticStateFC2')];
actionPath = [
imageInputLayer([3 1 1],'Normalization','none','Name','Action')
fullyConnectedLayer(25,'Name','CriticActionFC1')];
commonPath = [
additionLayer(2,'Name','add')
reluLayer('Name','CriticCommonRelu')
fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
% view the critic network configuration
figure
plot(criticNetwork)
% specify the critic representation options using rlRepresentationOptions
criticOpts = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},criticOpts);
% create the actor
actorNetwork = [
imageInputLayer([numObservations 1 1],'Normalization','none','Name','State')
fullyConnectedLayer(3, 'Name','actorFC')
tanhLayer('Name','actorTanh')
fullyConnectedLayer(3,'Name','Action')
];
actorOptions = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},actorOptions);
% create the agent
agentOpts = rlDDPGAgentOptions(...
'SampleTime',Ts,...
'TargetSmoothFactor',1e-3,...
'DiscountFactor',1.0, ...
'MiniBatchSize',64, ...
'ExperienceBufferLength',1e6);
agentOpts.NoiseOptions.Variance = 0.3;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
agent = rlDDPGAgent(actor,critic,agentOpts);
% train the agent
maxepisodes = 5000;
maxsteps = ceil(Tf/Ts);% 'SaveAgentCriteria',"EpisodeReward",'SaveAgentValue',100',
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'ScoreAveragingWindowLength',5, ...
'Verbose',false, ...
'Plots','training-progress',...
'StopTrainingCriteria','EpisodeCount',...
'StopTrainingValue',2000); % 155 works well
% set to true to run training
doTraining = true;
trainingStats = train(agent,env,trainOpts);
simOpts = rlSimulationOptions('MaxSteps',maxsteps,'StopOnError','on');
experiences = sim(env,agent,simOpts);
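Regarding the dimension error in the script above: it usually means the data reaching the agent does not match the specs. Since both specs here are declared as [3 1], a minimal sketch (untested, with a hypothetical Simulink signal) of keeping everything column-shaped:

```matlab
% Sketch (untested): both specs are 3-by-1 column vectors, so the signals
% entering the RL Agent block must be 3-by-1 as well.
obsInfo = rlNumericSpec([3 1]);   % observations: 3-by-1 column
actInfo = rlNumericSpec([3 1]);   % actions (e.g. kp, ki, kd): 3-by-1 column
% In Simulink, a 1-by-3 signal (e.g. from a Mux) can be converted with a
% Reshape block whose output dimensions are set to [3 1] before it reaches
% the agent's observation port.
```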
Thank you.
勇刚 张
on 30 Mar 2022
Why does the deep network use imageInputLayer() instead of featureInputLayer() as the input layer?
The upper and lower limits in actInfo should also be redeclared, just like in obsInfo; remember to use column vectors.
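A minimal sketch of the suggestion above, assuming illustrative gain limits borrowed from the original question (the actual limits depend on the model):

```matlab
% Sketch (untested): featureInputLayer takes a column-vector observation
% directly, and actInfo gets explicit column-vector limits like obsInfo.
actInfo = rlNumericSpec([3 1], ...
    'LowerLimit', [-6; -0.2; -2], ...   % assumed gain ranges, column vectors
    'UpperLimit', [ 6;  0.2;  2]);
statePath = [
    featureInputLayer(3, 'Normalization', 'none', 'Name', 'State')
    fullyConnectedLayer(50, 'Name', 'CriticStateFC1')];
```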
Good luck
Answers (0)