Why does reinforcement learning give different actions from sim() and getAction()?
Shuyue Li
on 7 Sept 2023
Answered: Emmanouil Tzorakoleftherakis
on 25 Sept 2023
Hi MATLAB Reinforcement Learning team,
I have a well-trained PPO actor-critic agent. With UseExplorationPolicy set to 0, I obtained actions from both sim() and getAction(), with no random settings in the environment. Both calls use the same observations and the same agent.
However, the actions from sim() and getAction() are different, even though each one is reproducible on its own.
So I would like to know how sim() generates actions. Does the action come from the actor network? If so, why are the results different when the network is the same?
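Where the action comes from can be illustrated with a conceptual sketch. It assumes a stochastic PPO actor whose network outputs a Gaussian mean and standard deviation (continuous actions) or one probability per action (discrete actions). With UseExplorationPolicy set to false, the agent is expected to pick the most likely action, which is deterministic; with exploration on, it draws a random sample. The values of mu, sigma and p below are made up, and the code is plain base MATLAB, not the toolbox's internal implementation.

```matlab
% Conceptual sketch only, NOT Reinforcement Learning Toolbox internals.
% mu, sigma and p are hypothetical stand-ins for actor-network outputs
% for a single observation.

% Continuous action with a Gaussian policy
mu    = [0.4; -1.2];   % hypothetical mean
sigma = [0.3;  0.5];   % hypothetical standard deviation

greedyAction  = @(mu, sigma) mu;                            % exploration off: most likely action
sampledAction = @(mu, sigma) mu + sigma .* randn(size(mu)); % exploration on: draw from N(mu, sigma^2)

% Greedy actions are identical on every call
aGreedy1 = greedyAction(mu, sigma);
aGreedy2 = greedyAction(mu, sigma);

% Sampled actions only repeat when the generator is reset to the same seed
rng(1);
aSample1 = sampledAction(mu, sigma);
rng(1);
aSample2 = sampledAction(mu, sigma);

% Discrete action with three possible choices
p = [0.25 0.5 0.25];                          % hypothetical action probabilities
[~, aDiscreteGreedy] = max(p);                % index of the most likely action
aDiscreteSample = find(rand <= cumsum(p), 1); % inverse-CDF sample of an index

fprintf('Greedy actions identical: %d\n', isequal(aGreedy1, aGreedy2));
fprintf('Seeded samples identical: %d\n', isequal(aSample1, aSample2));
fprintf('Greedy discrete action: %d, sampled: %d\n', aDiscreteGreedy, aDiscreteSample);
```

Under this assumption, two greedy calls on the same observation should agree, so one thing worth checking when they do not is whether sim() and getAction() really see the same observation (for example, what the reset function returns at the first step).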
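Code:
saved_agent.UseExplorationPolicy = false;             % greedy (non-exploratory) actions
actoraction = getAction(saved_agent,{testobstate});   % cell array holding the action for testobstate
% User-defined reset and step functions built around testData and testobstate
ResetHandleT = @() myResetFunctionCNsim(testData,testobstate);
StepHandleT = @(Action,StockSaved) myStepFunctionCNsim(Action,StockSaved,testData,testobstate);
envT = rlFunctionEnv(observationInfo,actionInfo,StepHandleT,ResetHandleT);
experience = sim(envT,saved_agent,simOpts);           % simOpts defined elsewhere; actions are logged in experience.Action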
I look forward to your reply.
Sincerely,
Shuyue
Answers (1)
Emmanouil Tzorakoleftherakis
on 25 Sept 2023
Hi,
Which release are you using? We tried R2023a and R2023b with UseExplorationPolicy = 0, and getAction and sim returned the same results. A reproduction model would be great.
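A minimal reproduction of the kind requested could look like the sketch below. It assumes the Reinforcement Learning Toolbox in a release where PPO agents have the UseExplorationPolicy property. The observation and action specs, the observation value obs0, and the one-step environment are hypothetical; the sketch uses a default-initialized agent instead of the trained one. The script feeds the same fixed observation to getAction and to the first step of sim, then prints both actions.

```matlab
% Reproduction sketch: assumes Reinforcement Learning Toolbox.
% Specs, obs0 and the one-step environment are hypothetical placeholders.
rng(0);                                           % reproducible network initialization

obsInfo = rlNumericSpec([4 1]);                   % hypothetical 4-D observation
actInfo = rlNumericSpec([1 1], 'LowerLimit', -2, 'UpperLimit', 2);  % hypothetical scalar action

agent = rlPPOAgent(obsInfo, actInfo);             % default actor and critic
agent.UseExplorationPolicy = false;               % greedy actions

obs0 = [0.1; -0.2; 0.3; 0.05];                    % fixed test observation

% One-step environment: reset always returns obs0, and the first step ends the episode
resetFcn = @() deal(obs0, struct());
stepFcn  = @(action, loggedSignals) deal(obs0, 0, true, loggedSignals);
env = rlFunctionEnv(obsInfo, actInfo, stepFcn, resetFcn);

% Action returned by getAction for obs0
aGet = getAction(agent, {obs0});
aGet = double(aGet{1});

% Action logged by sim for the first (and only) step
simOpts = rlSimulationOptions('MaxSteps', 1);
experience = sim(env, agent, simOpts);
actNames = fieldnames(experience.Action);         % action channel name depends on actInfo
aSim = squeeze(double(experience.Action.(actNames{1}).Data));
aSim = aSim(1);

fprintf('getAction: %.6f   sim: %.6f   difference: %.3g\n', aGet, aSim, abs(aGet - aSim));
```

Replacing agent, obsInfo, actInfo and the two anonymous functions with saved_agent, observationInfo, actionInfo and the real reset and step functions turns this into a check of the original setup.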