Singularity problems with RL designer for control moment gyro control
5 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
Hello,
I am trying to find a RL solution to control a control moment gyroscope using the reinforcement learning designer. The control moment gyroscope is in the inner loop. The outer loop consists of a 1DOF spacecraft attitude dynamics model (but is not very relevant to the RL control problem). To give you a general idea of the dynamic system:
Inside the 'SGCMG RL control and dynamics' block, I implemented the control as well as the dynamics of the control moment gyro. In the 'SGCMG RL control law block' is the RL agent. The action is the angular velocity of the control moment gyro's gimbal angle. The observation consists of the angular velocity of the spacecraft (s/c) and the difference between commanded torque and total torque acting on the s/c. Finally, the reward function consists of the error state, cos(delta) which is a measure of proximity to singularity of the system and the use of the actuator. These are all penalties and it is therefore more of a cost function than a reward function.
As for the RL process, I initiated the environment as follows:
%% Clear vars
close all
clear
clc
%% Define the system and input
Md=0.01; %disturbance moment Nm
J=400; %intertia kgm2
%% Initial conditions
delta0=deg2rad(0); %initial gimbal angle of control moment gyro rad
omega0=0; %initiate angular velocity of the spacecraft
theta0=deg2rad(20); %initial value for theta1
%% Create environment object from simulink model
mdl = 'RL_SGCMG_simp';
open_system(mdl)
obsInfo=rlNumericSpec([1 2]);
obsInfo.Name='observation';
actInfo = rlNumericSpec([1 1],'LowerLimit',-10^20,'UpperLimit',10^20);
actInfo.Name='action';
agentBlk=[mdl '/SGCMG RL control and dynamics/SGCMG RL control law/RL Agent'];
env=rlSimulinkEnv(mdl,agentBlk,obsInfo,actInfo);
%% Simulatation with trained agent --> run after agent is trained in RL designer
sim('RL_SGCMG_simpel');
%%
figure(1)
plot(tout,rad2deg(delta(1,:)))
xlabel('Time [s]')
ylabel('delta [deg]')
figure(2)
plot(tout,theta)
xlabel('Time [s]')
ylabel('theta [deg]')
Next, I imported the environment into the reinforcement learning designer and used a default DDPG agent. My main problem at this moment is that the action will go to its maximum value, after 1 or 2 timesteps into the episode and won't change (by a lot). For instance, with a lowerlimit, upperlimit of -10^20; 10^20, the penalty for the actuator will go straight to the maximum value (deltadot*10^-5=10^15 for the current reward function). Note that this value will be subtracted before going into the reward port of the RL block. If I do not implement a lower and upper limit on the action, it will go to infinity after some timesteps and singularity problems kick in. I would say, that with this severe penalty for the actuator (action) its value wouldn't go all the way up. It's almost like my RL algorithm is trying to minimize the reward... However, if I delete the reward function alltogether (no input for the RL agent at all), the same thing happens. So it cannot be the reward function I'd say.
Would that mean there's a problem with the default DDPG agent? The default actor and critic NNs are, respectively:
Allthough I'm pretty experienced in Matlab and Simulink, RL and NNs are relatively new to me. I already tried to play with the exploration setting (stand. dev.) but this won't solve the problem. I was hoping anybody with a bit more experience in RL could spot the problem and help me on my way.
With kind regards,
Noah
2 comentarios
Sam Chak
el 17 de Jul. de 2022
Editada: Sam Chak
el 17 de Jul. de 2022
Hi @Noah Stam
Although I don't know what your RL does when you have the circled block takes care of the attitude system, if the governing mathematics in the circled block is rigorously proven to 100% stable, does your SGCMG output () go to the infinity and beyond as well?
Or is this a momentum dumping problem for the CMG? When it gets saturated (cannot spin faster than the hardware limit), RL gives an infinite value (NaN) at the Output?
Respuestas (0)
Ver también
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!