Why do I get a different action result each time with the same sample observations after deploying a trained RL policy?
de y on 23 Feb 2021
Edited: liang zhang on 2 Mar 2022
            load("agent0218_300016_40000.mat","agent");
obsInfo = getObservationInfo(agent);
actInfo = getActionInfo(agent);
ResetHandle = @() myResetFunction(test_sss);
StepHandle = @(Action,LoggedSignals) myStepFunction(Action,LoggedSignals,test_sss);
envT = rlFunctionEnv(obsInfo,actInfo,StepHandle,ResetHandle);
simOpts = rlSimulationOptions('MaxSteps',size(test_sss,1));
experience = sim(envT,agent,simOpts);
ac3=squeeze(experience.Action.bs.Data);
%******************************************************************************
%******************************************************************************
generatePolicyFunction(agent);
%******************************************************************************
%******************************************************************************
for iii=1:size(ac3,1)
    observation1=test_sss{iii,:};
    action1(iii,1) = evaluatePolicy(observation1);
end
sum(abs(ac3-action1))
0 comments
Accepted Answer
Emmanouil Tzorakoleftherakis on 23 Feb 2021
Which agent are you using? Some agents are stochastic, meaning the output is sampled from a probability distribution, so by construction they won't give you the same result every time.
Another possible reason is the reset function. It looks like you are saving simulation data and then running inference again, but every time you call 'sim', the reset function is called first. If anything in it randomizes initial conditions or parameters, then you are not comparing against the same data.
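One way to separate these two effects is to take 'sim' (and with it the reset function) out of the loop and query the agent directly on the same fixed observations. A minimal sketch, assuming test_sss holds one observation per row as in the question; note that the generated evaluatePolicy uses the greedy policy by default, so for a stochastic agent the two columns can still differ even with a pinned seed:

rng(0);  % pin the RNG so any stochastic sampling is at least repeatable
load("agent0218_300016_40000.mat","agent");
generatePolicyFunction(agent);  % writes evaluatePolicy.m to the current folder

n = size(test_sss,1);
actAgent  = zeros(n,1);
actPolicy = zeros(n,1);
for iii = 1:n
    obs = test_sss{iii,:};
    a = getAction(agent,{obs});
    if iscell(a), a = a{1}; end  % getAction may return a cell array, depending on release
    actAgent(iii,1)  = a;            % action from the agent object
    actPolicy(iii,1) = evaluatePolicy(obs);  % action from the generated policy
end
max(abs(actAgent - actPolicy))  % ~0 means the agent and the generated policy agree

If this already shows a mismatch with the reset function out of the picture, the difference comes from the agent's own sampling or exploration, not from the data.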
1 comment
liang zhang on 2 Mar 2022
Edited: liang zhang on 2 Mar 2022
I also encountered the same problem when I used a DDPG agent for verification. My reset function doesn't randomize any initial conditions or parameters. Could it be that the trained DDPG agent also adds its own noise? Shouldn't a trained agent be a fixed set of neural network parameters?
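In case it helps: a trained DDPG agent is indeed a fixed set of network weights, but depending on the release the agent object can still apply its exploration noise model when asked for an action. A hedged sketch, assuming a release where agent objects expose the UseExplorationPolicy property (newer Reinforcement Learning Toolbox versions); disabling it makes 'sim' request greedy actions:

load("agent0218_300016_40000.mat","agent");
agent.UseExplorationPolicy = false;  % assumption: property exists in this release; greedy actions, no noise
rng(0);                              % also pin the seed in case the reset randomizes anything
experience = sim(envT,agent,simOpts);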