Reinforcement learning agent always gets stuck

Eugen Fekete on 6 Feb 2025
Edited: Gayathri on 10 Feb 2025
I'm working on creating an agent that can learn the sine function as a simple warm-up, but I'm already stuck. The problem is that after a few iterations, the agent reaches a certain level and then hits a ceiling and won't learn further no matter how long I let the learning process run. My code:
obs_info = rlNumericSpec([1 1], LowerLimit=-pi, UpperLimit=pi);
obs_info.Name = "Sinus Value";
act_info = rlNumericSpec([1 1], LowerLimit=-1, UpperLimit=1);
act_info.Name = "Predicted Value";
reset_fcn_handle = @()reset_train();
step_fcn_handle = @(action, portfolio)step_train( ...
    action, portfolio);
sinus_train_env = rlFunctionEnv( ...
    obs_info, act_info, step_fcn_handle, reset_fcn_handle);
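function [initial_observation, portfolio] = reset_train()
    % Draw a random angle in [-pi, pi] and remember it for the next step.
    initial_observation = 2*pi*rand(1)-pi;
    portfolio = struct;
    portfolio.LastValue = initial_observation;
end
function [next_observation, reward, is_done, portfolioOut] = step_train( ...
        action, portfolio)
    % The target is the sine of the angle the agent observed last.
    expected_prediction = sin(portfolio.LastValue);
    % Reward peaks at 1 for a perfect prediction and decays with the error.
    reward = 1 / 100 / (0.01 + abs(action - expected_prediction));
    next_observation = 2*pi*rand(1)-pi;
    portfolioOut = portfolio;
    portfolioOut.LastValue = next_observation;
    % Episodes never end early; they run until Max Episode Length.
    is_done = false;
end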
I use the Reinforcement Learning Designer to construct the agent. The "Compatible algorithm" is set to TD3 (default option) and the number of hidden units is 32. The hyperparameters and Exploration Model settings:
For the training Max Episode Length = 1000, Average Window Length = 5, Stopping Criteria = AverageReward, Stopping Value = 900. The result after 30 minutes:
The result after an hour of training:
I tried to modify the reward function and let it run for two hours:
reward = 1 / (0.01 + abs(action - expected_prediction));
The result after 30 minutes of training:
Second try at modifying the reward function:
if (abs(action-expected_prediction) > 0.05)
    reward = -1;
else
    reward = 1;
end
The result:
As you can see, none of the results show a sine wave. No matter how long I let it run (I even let it run overnight), the result is always one of the images above and the learning process always gets stuck at a certain level.
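To put the settings above in perspective, here is a minimal sketch in base MATLAB (no toolbox needed) that tabulates the three reward variants from the question for a few absolute errors, and works out what the stopping value of 900 implies for the first variant. The error values in `err` are illustrative picks, and the final figure assumes the error is the same at every step, which a real agent will not satisfy exactly.

```matlab
% Absolute errors |action - sin(x)| to evaluate (illustrative values).
err = [0 0.001 0.01 0.05 0.1 0.5 1];

% Reward variants taken from the question.
r_scaled  = 1 ./ 100 ./ (0.01 + err);   % first attempt, at most 1 per step
r_inverse = 1 ./ (0.01 + err);          % second attempt, at most 100 per step
r_binary  = ones(size(err));            % third attempt: +1 inside the 0.05 band
r_binary(err > 0.05) = -1;              % and -1 outside it

disp(table(err.', r_scaled.', r_inverse.', r_binary.', ...
    VariableNames=["AbsError" "Scaled" "Inverse" "Binary"]));

% Mean per-step reward needed to reach the stopping value.
max_steps      = 1000;                        % Max Episode Length
stopping_value = 900;                         % Stopping Value (AverageReward)
needed_reward  = stopping_value / max_steps;  % 0.9 per step

% Invert r = 0.01 / (0.01 + e), assuming a constant error e at every step.
needed_error = 0.01 / needed_reward - 0.01;
fprintf("Constant error needed: %.4f\n", needed_error);
```

Under that assumption the first reward only reaches the stopping value when the prediction is within roughly 0.0011 of the true sine at every step, and the table shows how quickly that reward falls off once the error grows past a few hundredths.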

Answers (1)

Gayathri on 10 Feb 2025
Edited: Gayathri on 10 Feb 2025
I understand that your reinforcement learning agent is having trouble learning the sine function. You can try the following steps to address the issue.
  • The reward function is crucial for guiding the agent's learning. Your current reward functions are either sharply peaked around zero error or binary, so they may not provide enough gradient information for effective learning. Consider using a continuous reward function that smoothly penalizes errors, as shown below.
reward = -(action - expected_prediction)^2;
  • Normalize the input and output of your neural networks. Since the sine function outputs values between -1 and 1, ensure that the network's output layer (action) is appropriately bounded.
  • Try reducing the learning rate to 0.001 or 0.00001. A small learning rate takes tiny steps, which improves stability but slows down training.
  • Also, you can try adjusting the "Gradient Decay" parameter.
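The first two suggestions could look roughly like the sketch below: a variant of the step function that uses the quadratic reward and returns the next observation scaled to [-1, 1]. The name `step_train_quadratic`, the `state` argument, and the division by `pi` are assumptions made for illustration, not part of the original code. If you scale the observation this way, the reset function and the limits in `obs_info` would need to change to match.

```matlab
% Quick check with a hand-built state (base MATLAB, no toolbox needed).
state = struct("LastValue", pi/2);                  % sin(pi/2) = 1
[~, r_perfect] = step_train_quadratic(1, state);    % exact prediction
[~, r_off]     = step_train_quadratic(0.5, state);  % prediction off by 0.5
disp([r_perfect r_off]);

function [next_observation, reward, is_done, state_out] = ...
        step_train_quadratic(action, state)
    % state.LastValue holds the last angle in radians (unscaled).
    expected_prediction = sin(state.LastValue);

    % Smooth penalty: zero for a perfect prediction, growing with the error.
    reward = -(action - expected_prediction)^2;

    % Draw the next angle and hand the agent a value scaled to [-1, 1].
    next_angle = 2*pi*rand(1) - pi;
    next_observation = next_angle / pi;

    state_out = state;
    state_out.LastValue = next_angle;
    is_done = false;
end
```

With this reward the best possible episode return is 0 rather than 1000, so a stopping value chosen for the original reward would no longer apply and would need to be replaced by a small negative number.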
For more information on training with the Reinforcement Learning Designer, please refer to the Reinforcement Learning Designer documentation.
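If you prefer to set these options from the command line instead of the app, a minimal sketch might look like the following. It requires Reinforcement Learning Toolbox. The learning rate and decay values are illustrative only, and it assumes that the "Gradient Decay" setting in the app corresponds to the `GradientDecayFactor` of the Adam optimizer and that a single `rlOptimizerOptions` object is accepted for both TD3 critics.

```matlab
% Specs as defined in the question.
obs_info = rlNumericSpec([1 1], LowerLimit=-pi, UpperLimit=pi);
act_info = rlNumericSpec([1 1], LowerLimit=-1, UpperLimit=1);

% Optimizer settings with an explicit learning rate (illustrative values).
actor_opts  = rlOptimizerOptions(LearnRate=1e-3);
critic_opts = rlOptimizerOptions(LearnRate=1e-3);

% Assumed equivalent of the app's "Gradient Decay" parameter (Adam).
actor_opts.OptimizerParameters.GradientDecayFactor  = 0.9;
critic_opts.OptimizerParameters.GradientDecayFactor = 0.9;

agent_opts = rlTD3AgentOptions( ...
    ActorOptimizerOptions=actor_opts, ...
    CriticOptimizerOptions=critic_opts);

% Default networks with 32 hidden units, as chosen in the question.
init_opts = rlAgentInitializationOptions(NumHiddenUnit=32);
agent = rlTD3Agent(obs_info, act_info, init_opts, agent_opts);
```

Changing one value at a time and comparing the training curves makes it easier to tell which setting actually helps.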

Release

R2024a
