I'm working on creating an agent that can learn the sine function as a simple warm-up, but I'm already stuck. After a few iterations the agent improves to a certain level, then plateaus and stops learning no matter how long I let training run. My code:
obs_info = rlNumericSpec([1 1], LowerLimit=-pi, UpperLimit=pi);
obs_info.Name = "Sinus Value";
act_info = rlNumericSpec([1 1], LowerLimit=-1, UpperLimit=1);
act_info.Name = "Predicted Value";
reset_fcn_handle = @()reset_train();
step_fcn_handle = @(action, portfolio)step_train(action, portfolio);
sinus_train_env = rlFunctionEnv( ...
obs_info, act_info, step_fcn_handle, reset_fcn_handle);
function [initial_observation, portfolio] = reset_train()
    % Start each episode at a random angle in [-pi, pi)
    initial_observation = 2*pi*rand(1) - pi;
    portfolio.LastValue = initial_observation;
end
function [next_observation, reward, is_done, portfolioOut] = step_train(action, portfolio)
    % Reward is highest when the action matches sin() of the last observed angle
    expected_prediction = sin(portfolio.LastValue);
    reward = 1 / 100 / (0.01 + abs(action - expected_prediction));
    % Draw the next random angle; episodes never terminate on their own
    next_observation = 2*pi*rand(1) - pi;
    is_done = false;
    portfolioOut = portfolio;
    portfolioOut.LastValue = next_observation;
end
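Before opening the Designer, the environment can be sanity-checked from the command line; this is just a quick check I sketched (validateEnvironment, reset, and step are the standard toolbox calls for a MATLAB environment object):
% Quick sanity check of the custom environment (not part of the original setup)
validateEnvironment(sinus_train_env);              % verifies the specs against reset/step
obs0 = reset(sinus_train_env);                     % random angle in [-pi, pi)
[obs1, reward1, done1] = step(sinus_train_env, 0); % predict 0 for the first angle
fprintf("next obs = %.3f, reward = %.3f, done = %d\n", obs1, reward1, done1);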
I use the Reinforcement Learning Designer to construct the agent. The "Compatible algorithm" is set to TD3 (default option) and the number of hidden units is 32. The hyperparameters and Exploration Model settings:
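For reference, the command-line equivalent of that Designer setup should be roughly the following sketch (default TD3 options; the Exploration Model values from the Designer screenshot are not reproduced here):
% Approximate command-line equivalent of the Designer setup
init_opts = rlAgentInitializationOptions(NumHiddenUnit=32);
agent = rlTD3Agent(obs_info, act_info, init_opts);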
For training I use Max Episode Length = 1000, Average Window Length = 5, Stopping Criteria = AverageReward, and Stopping Value = 900. The result after 30 minutes:
The result after an hour of training:
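For reference, those training settings correspond roughly to this command-line setup (a sketch using rlTrainingOptions; everything not listed stays at its default):
% Command-line equivalent of the training settings (sketch)
train_opts = rlTrainingOptions( ...
    MaxStepsPerEpisode=1000, ...
    ScoreAveragingWindowLength=5, ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=900);
training_stats = train(agent, sinus_train_env, train_opts);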
I tried to modify the reward function and let it run for two hours:
reward = 1 / (0.01 + abs(action - expected_prediction));
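The only difference to the original reward is the factor of 100, which a quick plot of both shapes over the prediction error makes explicit (illustrative snippet):
% Comparing the original and modified reward as a function of the prediction error
err = linspace(0, 2, 200);              % |action - sin(x)| can range over [0, 2]
r_original = 1/100 ./ (0.01 + err);     % peaks at 1 for a perfect prediction
r_modified = 1 ./ (0.01 + err);         % peaks at 100 for a perfect prediction
plot(err, r_original, err, r_modified);
legend("1/100/(0.01+|e|)", "1/(0.01+|e|)");
xlabel("|action - sin(x)|"); ylabel("reward");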
The result after 30 minutes of training:
Second try at modifying the reward function:
if (abs(action-expected_prediction) > 0.05)
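Written out in full, this thresholded reward looks roughly like the following; only the condition is shown above, so the branch values here are illustrative placeholders:
% Thresholded ("sparse") reward; branch values are illustrative placeholders
if (abs(action - expected_prediction) > 0.05)
    reward = 0;    % prediction outside the +/-0.05 tolerance band
else
    reward = 1;    % prediction within tolerance
end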
The result:
As you can see, none of the results show a sine wave. No matter how long I let it run (I even let it run overnight), the result is always one of the images above and the learning process always gets stuck at a certain level.