Number of look-ahead steps in DDPG Agent Options

I want to know how the parameter "NumStepsToLookAhead" in rlDDPGAgentOptions from the Reinforcement Learning Toolbox of MATLAB R2019b works.
  1. Is the look-ahead applied to the target networks? (i.e., is the critic objective modified from {r + gamma*Qt - Q} to {r + sum(gamma^i * Qt) - Q}?)
  2. Or is the look-ahead applied to the reward sampling itself? (i.e., is the reward "r" of each sample replaced with "r_t + gamma*r_t+1 + gamma^2*r_t+2 + ..."?)
Any help is highly appreciated.

Answers (1)

Anh Tran
Anh Tran on 1 Mar 2020

1 vote

I am not sure what you mean by reward sampling. "NumStepsToLookAhead" in rlDDPGAgentOptions changes the critic's target values in step 5 of the DDPG training algorithm. Let g be the discount factor and N the number of look-ahead steps; the critic target is then

y_t = R_t + g*R_t+1 + ... + g^(N-1)*R_t+N-1 + g^N * Qt(S_t+N, mu_t(S_t+N))

where Qt is the target critic and mu_t is the target actor.
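To make the N-step target concrete, here is a minimal Python sketch (not toolbox code; the function name and inputs are illustrative) of a target computed this way, given the N instantaneous rewards and the target critic's bootstrap value:

```python
def n_step_target(rewards, g, n, q_target_value):
    """N-step critic target:
    y = sum_{i=0}^{N-1} g^i * R_{t+i}  +  g^N * Qt(S_{t+N}, mu_t(S_{t+N})).

    rewards:        the N instantaneous rewards R_t ... R_{t+N-1}
    g:              discount factor
    q_target_value: target critic's value at the state N steps ahead
    """
    assert len(rewards) == n
    discounted_rewards = sum(g**i * r for i, r in enumerate(rewards))
    return discounted_rewards + g**n * q_target_value

# Example with N = 3 and discount 0.99 (hypothetical numbers):
y = n_step_target([1.0, 0.5, 0.25], g=0.99, n=3, q_target_value=10.0)
```

With N = 1 this reduces to the standard one-step target R_t + g*Qt.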

4 comments

Thanks for the reply. I have one more query regarding the above equation. Is R_t here the instant reward, or the future discounted reward at that instant? As per standard notation, R denotes the cumulative discounted reward and r denotes the instantaneous scalar reward.
Anh Tran
Anh Tran on 2 Mar 2020
It is the instant reward. The future discounted reward would be:

G_t = R_t + g*R_t+1 + g^2*R_t+2 + ...

I used R_t to be consistent with the DDPG training algorithm in the documentation.
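As an illustration of the distinction (a hypothetical Python sketch, not part of the toolbox), the instant reward is a single sample while the discounted return accumulates the whole sequence:

```python
def discounted_return(rewards, g):
    """Future discounted return over a finite reward sequence:
    G_t = R_t + g*R_{t+1} + g^2*R_{t+2} + ..."""
    return sum(g**k * r for k, r in enumerate(rewards))

rewards = [1.0, 1.0, 1.0]
instant_reward = rewards[0]            # R_t = 1.0
g_t = discounted_return(rewards, 0.5)  # 1 + 0.5 + 0.25 = 1.75
```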
Thanks for your help!
Dingshan Sun
Dingshan Sun on 1 Sep 2022
Could you give a hint as to how R_t, R_t+1, R_t+2, ..., R_t+n-1 can be obtained in an online off-policy algorithm, especially for DRL methods that use an experience replay buffer?


Asked: 21 Feb 2020

Commented: 1 Sep 2022

