This is my code for training a DDPG agent on a Simulink/Simscape PMSM model:
%% Set up the environment
clc;
mdl = 'rlpmsmSimscapeModel';
open_system(mdl)
env = rlSimulinkEnv(mdl,[mdl '/RL Agent']);
obsInfo = getObservationInfo(env);
numObservations = obsInfo.Dimension(1);
actInfo = getActionInfo(env);
numActions = actInfo.Dimension(1);
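% Quick sanity check I added while debugging: print the spec dimensions so
% they can be compared against the observation/action signal widths in the
% model (the error below complains about exactly this kind of mismatch).
disp(obsInfo.Dimension)
disp(actInfo.Dimension)
% validateEnvironment(env) % optional: resets and briefly simulates the env to catch size/type mismatches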
%%
Ts = 0.02; % agent sample time (s)
Tf = 1;    % episode duration (s)
rng(0)
%% Initialize the agent
statePath = [
    imageInputLayer([numObservations 1 1],'Normalization','none','Name','State')
    fullyConnectedLayer(50,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(25,'Name','CriticStateFC2')];
actionPath = [
    imageInputLayer([numActions 1 1],'Normalization','none','Name','Action')
    fullyConnectedLayer(25,'Name','CriticActionFC1','BiasLearnRateFactor',0)];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
figure
plot(criticNetwork)
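% Optional check (assumes Deep Learning Toolbox is available): analyzeNetwork
% lists every layer's activation size, which makes it easy to see whether a
% [numObservations 1 1] input actually propagates through the critic graph.
% analyzeNetwork(criticNetwork)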
criticOptions = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
% rlRepresentation is deprecated; rlQValueRepresentation is its replacement
% for critics (assuming R2020a or later, which the
% rlDeterministicActorRepresentation call below already requires).
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
    'Observation',{'State'},'Action',{'Action'},criticOptions);
actorNetwork = [
    imageInputLayer([numObservations 1 1],'Normalization','none','Name','State')
    fullyConnectedLayer(3,'Name','actorFC')
    tanhLayer('Name','actorTanh')
    fullyConnectedLayer(numActions,'Name','Action')];
actorOptions = rlRepresentationOptions('LearnRate',2e-04,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation',{'State'},'Action',{'Action'},actorOptions); % actorOptions was defined above but never passed in
agentOptions = rlDDPGAgentOptions(...
'SampleTime',Ts,...
'TargetSmoothFactor',1e-3,...
'DiscountFactor',1.0, ...
'ExperienceBufferLength',1e6,...
'MiniBatchSize',64);
agentOptions.NoiseOptions.Variance = 0.3;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-5;
agent = rlDDPGAgent(actor,critic,agentOptions);
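% Pre-training check I added (a minimal sketch): push one spec-shaped dummy
% observation through the agent. If this call already fails with "Invalid
% observation type or size", the mismatch is between the network input layers
% and the environment specs rather than in the Simulink signals themselves.
dummyObs = {zeros(obsInfo.Dimension)};
dummyAct = getAction(agent,dummyObs);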
%% Set the training parameters
maxepisodes = 2000;
maxsteps = ceil(Tf/Ts);
trainingOptions = rlTrainingOptions(...
'MaxEpisodes',maxepisodes,...
'MaxStepsPerEpisode',maxsteps,...
'ScoreAveragingWindowLength',5,...
'Verbose',false,...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',800,...
'SaveAgentCriteria','EpisodeReward',...
'SaveAgentValue',-400);
%% Parallel training settings
% These must be set on the same object that is passed to train(); the original
% code assigned them to an unrelated "trainOpts" struct, so they were ignored.
trainingOptions.UseParallel = true;
trainingOptions.ParallelizationOptions.Mode = "async";
trainingOptions.ParallelizationOptions.DataToSendFromWorkers = "Gradients";
trainingOptions.ParallelizationOptions.StepsUntilDataIsSent = -1; % -1: send at the end of each episode
%% Train the agent
trainingStats = train(agent,env,trainingOptions);
%% Simulate the trained agent and show the result
simOptions = rlSimulationOptions('MaxSteps',500);
experience = sim(env,agent,simOptions);
totalReward = sum(experience.Reward);
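% Display the total reward collected in the final simulation run.
fprintf('Total reward of the final simulation: %g\n',totalReward)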
% bdclose(mdl)
When I run it, training fails with the following warning and error:

Warning: Error occurred while executing the listener callback for event EpisodeFinished defined for class rl.env.SimulinkEnvWithAgent:
Error using rl.policy.AbstractPolicy/getAction (line 258)
Invalid observation type or size.
Error in rl.agent.rlDDPGAgent/evaluateQ0Impl (line 141)
action = getAction(this, observation);
Error in rl.agent.AbstractAgent/evaluateQ0 (line 275)
q0 = evaluateQ0Impl(this,observation);
Error in rl.train.TrainingManager/update (line 134)
q0 = evaluateQ0(this.Agents(idx),epinfo(idx).InitialObservation);
Error in rl.train.TrainingManager>@(info)update(this,info) (line 437)
trainer.FinishedEpisodeFcn = @(info) update(this,info);
Error in rl.train.Trainer/notifyEpisodeFinishedAndCheckStopTrain (line 56)
stopTraining = this.FinishedEpisodeFcn(info);
Error in rl.train.SeriesTrainer>iUpdateEpisodeFinished (line 31)
notifyEpisodeFinishedAndCheckStopTrain(this,ed.Data);
Error in rl.train.SeriesTrainer>@(src,ed)iUpdateEpisodeFinished(this,ed) (line 17)
@(src,ed) iUpdateEpisodeFinished(this,ed));
Error in rl.env.AbstractEnv/notifyEpisodeFinished (line 324)
notify(this,'EpisodeFinished',ed);
Error in rl.env.SimulinkEnvWithAgent/executeSimsWrapper/nestedSimFinishedBC (line 222)
notifyEpisodeFinished(this,...
Error in rl.env.SimulinkEnvWithAgent>@(src,ed)nestedSimFinishedBC(ed) (line 232)
simlist(1) = event.listener(this.SimMgr,'SimulationFinished' ,@(src,ed) nestedSimFinishedBC(ed));
Error in Simulink.SimulationManager/handleSimulationOutputAvailable
Error in Simulink.SimulationManager>@(varargin)obj.handleSimulationOutputAvailable(varargin{:})
Error in MultiSim.internal.SimulationRunnerSerial/executeImplSingle
Error in MultiSim.internal.SimulationRunnerSerial/executeImpl
Error in Simulink.SimulationManager/executeSims
Error in Simulink.SimulationManagerEngine/executeSims
Error in rl.env.SimulinkEnvWithAgent/executeSimsWrapper (line 244)
executeSims(this.SimEngine,simfh,in);
Error in rl.env.SimulinkEnvWithAgent/simWrapper (line 267)
simouts = executeSimsWrapper(this,in,simfh,simouts,opts);
Error in rl.env.SimulinkEnvWithAgent/simWithPolicyImpl (line 424)
simouts = simWrapper(env,policy,simData,in,opts);
Error in rl.env.AbstractEnv/simWithPolicy (line 82)
[experiences,varargout{1:(nargout-1)}] = simWithPolicyImpl(this,policy,opts,varargin{:});
Error in rl.task.SeriesTrainTask/runImpl (line 33)
[varargout{1},varargout{2}] = simWithPolicy(this.Env,this.Agent,simOpts);
Error in rl.task.Task/run (line 21)
[varargout{1:nargout}] = runImpl(this);
Error in rl.task.TaskSpec/internal_run (line 166)
[varargout{1:nargout}] = run(task);
Error in rl.task.TaskSpec/runDirect (line 170)
[this.Outputs{1:getNumOutputs(this)}] = internal_run(this);
Error in rl.task.TaskSpec/runScalarTask (line 194)
runDirect(this);
Error in rl.task.TaskSpec/run (line 69)
runScalarTask(task);
Error in rl.train.SeriesTrainer/run (line 24)
run(seriestaskspec);
Error in rl.train.TrainingManager/train (line 421)
run(trainer);
Error in rl.train.TrainingManager/run (line 211)
train(this);
Error in rl.agent.AbstractAgent/train (line 78)
TrainingStatistics = run(trainMgr);
Error in simulink_pmsm_ddpg (line 75)
trainingStats = train(agent,env,trainingOptions);
Caused by:
Error using rl.representation.rlAbstractRepresentation/validateInputData (line 525)
Input data dimensions must match the dimensions specified in the corresponding observation and action info specifications.
> In rl.env/AbstractEnv/notifyEpisodeFinished (line 324)
In rl.env.SimulinkEnvWithAgent.executeSimsWrapper/nestedSimFinishedBC (line 222)
In rl.env.SimulinkEnvWithAgent>@(src,ed)nestedSimFinishedBC(ed) (line 232)
In Simulink/SimulationManager/handleSimulationOutputAvailable
In Simulink.SimulationManager>@(varargin)obj.handleSimulationOutputAvailable(varargin{:})
In MultiSim.internal/SimulationRunnerSerial/executeImplSingle
In MultiSim.internal/SimulationRunnerSerial/executeImpl
In Simulink/SimulationManager/executeSims
In Simulink/SimulationManagerEngine/executeSims
In rl.env/SimulinkEnvWithAgent/executeSimsWrapper (line 244)
In rl.env/SimulinkEnvWithAgent/simWrapper (line 267)
In rl.env/SimulinkEnvWithAgent/simWithPolicyImpl (line 424)
In rl.env/AbstractEnv/simWithPolicy (line 82)
In rl.task/SeriesTrainTask/runImpl (line 33)
In rl.task/Task/run (line 21)
In rl.task/TaskSpec/internal_run (line 166)
In rl.task/TaskSpec/runDirect (line 170)
In rl.task/TaskSpec/runScalarTask (line 194)
In rl.task/TaskSpec/run (line 69)
In rl.train/SeriesTrainer/run (line 24)
In rl.train/TrainingManager/train (line 421)
In rl.train/TrainingManager/run (line 211)
In rl.agent.AbstractAgent/train (line 78)
In simulink_pmsm_ddpg (line 75)
Error using rl.env.AbstractEnv/simWithPolicy (line 82)
An error occurred while simulating "rlpmsmSimscapeModel" with the agent "agent".
Error in rl.task.SeriesTrainTask/runImpl (line 33)
[varargout{1},varargout{2}] = simWithPolicy(this.Env,this.Agent,simOpts);
Error in rl.task.Task/run (line 21)
[varargout{1:nargout}] = runImpl(this);
Error in rl.task.TaskSpec/internal_run (line 166)
[varargout{1:nargout}] = run(task);
Error in rl.task.TaskSpec/runDirect (line 170)
[this.Outputs{1:getNumOutputs(this)}] = internal_run(this);
Error in rl.task.TaskSpec/runScalarTask (line 194)
runDirect(this);
Error in rl.task.TaskSpec/run (line 69)
runScalarTask(task);
Error in rl.train.SeriesTrainer/run (line 24)
run(seriestaskspec);
Error in rl.train.TrainingManager/train (line 421)
run(trainer);
Error in rl.train.TrainingManager/run (line 211)
train(this);
Error in rl.agent.AbstractAgent/train (line 78)
TrainingStatistics = run(trainMgr);
Error in simulink_pmsm_ddpg (line 75)
trainingStats = train(agent,env,trainingOptions);
Caused by:
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
Invalid observation type or size.
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
Input data dimensions must match the dimensions specified in the corresponding observation and action info specifications.