Problems tuning PID controller parameters using reinforcement learning
Hello, I am currently learning about reinforcement learning and want to use it to generate my PID controller parameters. But when I build the Simulink model and start training, the reward keeps going to 0, and when I look at the actions generated by the agent in Simulink, they keep going to null. I have no idea why this is happening and am looking for some help. Thank you in advance for your assistance.

clc; clear; close all;

% Open the Simulink model containing the RL Agent block
open_system('BLDCRL')

% Observation: three unbounded continuous signals
obsInfo = rlNumericSpec([3 1], ...
    'LowerLimit',[-inf -inf -inf]', ...
    'UpperLimit',[ inf  inf  inf]');
numObservations = obsInfo.Dimension(1);

% Action: the three PID gains produced by the agent
actInfo = rlNumericSpec([3 1]);
numActions = actInfo.Dimension(1);

% Create the environment from the Simulink model
env = rlSimulinkEnv('BLDCRL','BLDCRL/RL Agent',obsInfo,actInfo);
env.ResetFcn = @localResetFcn;

Ts = 5;    % agent sample time (s)
Tf = 100;  % simulation stop time (s)
rng(0)     % for reproducibility

% Critic network: state and action paths joined by an addition layer
statePath = [
    featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(500,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(500,'Name','CriticStateFC2')];
actionPath = [
    featureInputLayer(numActions,'Normalization','none','Name','Action')
    fullyConnectedLayer(500,'Name','ActionFC1')];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','CriticOutput')];

criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'ActionFC1','add/in2');
% figure
% plot(criticNetwork)

criticOpts = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);
% Newer-API equivalent, kept commented out for reference:
% critic = rlQValueFunction(criticNetwork,obsInfo,actInfo, ...
%     'ActionInputNames',{'Action'},'ObservationInputNames',{'State'},'UseDevice','gpu');
critic = rlRepresentation(criticNetwork,obsInfo,actInfo, ...
    'Observation',{'State'},'Action',{'Action'},criticOpts);

% Actor network: maps observations to the three gains (output is unbounded)
actorNetwork = [
    featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(500,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(500,'Name','ActorFC2')
    reluLayer('Name','ActorRelu2')
    fullyConnectedLayer(numActions,'Name','Action')];
actorOptions = rlRepresentationOptions('LearnRate',1e-5,'GradientThreshold',1);
actor = rlRepresentation(actorNetwork,obsInfo,actInfo, ...
    'Observation',{'State'},'Action',{'Action'},actorOptions);
% Newer-API equivalent, kept commented out for reference:
% actor = rlContinuousDeterministicActor(actorNetwork,obsInfo,actInfo, ...
%     'ObservationInputNames',{'State'},'UseDevice','gpu');
% act = getAction(actor,{rand(obsInfo.Dimension)});
% act{1}

% DDPG agent options
agentOpts = rlDDPGAgentOptions( ...
    'SampleTime',Ts, ...
    'TargetSmoothFactor',1e-3, ...
    'DiscountFactor',1.0, ...
    'MiniBatchSize',64, ...
    'ExperienceBufferLength',1e6);
agentOpts.NoiseOptions.Variance = 0.3;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
agent = rlDDPGAgent(actor,critic,agentOpts);

% Training options
maxepisodes = 500;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',maxepisodes, ...
    'MaxStepsPerEpisode',maxsteps, ...
    'ScoreAveragingWindowLength',5, ...
    'Verbose',false, ...
    'Plots','training-progress', ...
    'StopTrainingCriteria','AverageReward', ...
    'StopTrainingValue',998);

trainingStats = train(agent,env,trainOpts);
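
Note: the script sets env.ResetFcn = @localResetFcn, but the function itself is not shown above. For completeness, here is a minimal sketch of what such a reset function might look like, assuming it randomizes the speed reference at the start of each episode; the block path 'BLDCRL/Desired Speed' and the reference range are hypothetical, not from the original post:

function in = localResetFcn(in)
% Sketch only: randomize the speed reference for each training episode.
% The block path and reference range below are assumptions.
blk = 'BLDCRL/Desired Speed';
ref = 1000 + 500*(rand - 0.5);   % random reference around 1000 rpm (assumed units)
in = setBlockParameter(in,blk,'Value',num2str(ref));
end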
Answers (1)
Divyanshu
on 26 Dec 2024
If you are following an official Reinforcement Learning Toolbox example, make sure that the Reinforcement Learning Toolbox is properly installed on your system along with MATLAB, because the error indicates that the model file 'BLDCRL' is missing.
However, if this is custom code, make sure the model 'BLDCRL' is on the MATLAB path.
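A quick way to check both conditions from the MATLAB command line (the model name is taken from the question):

% exist returns 4 when a Simulink model or library file named BLDCRL
% is on the MATLAB path; 0 means it cannot be found
exist('BLDCRL','file')
which('BLDCRL')   % prints the full path of the resolved model file

% ver lists installed products; Reinforcement Learning Toolbox should appear
ver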