Number of look-ahead steps in DDPG Agent Options

I want to know how the parameter "NumStepsToLookAhead" in rlDDPGAgentOptions from the Reinforcement Learning Toolbox of MATLAB R2019b works.
  1. Is the look-ahead applied to the target networks? (i.e., is the critic objective modified from {r + gamma*Qt - Q} to {r + sum(gamma^i * Qt) - Q}?)
  2. Or is the look-ahead applied to the reward sampling itself? (i.e., is the reward "r" of each sample replaced with "r_t + gamma*r_t+1 + gamma^2*r_t+2 + ..."?)
Any help is highly appreciated.

Answers (1)

Anh Tran
Anh Tran on 1 Mar 2020

1 vote

I am not sure what you mean by reward sampling. "NumStepsToLookAhead" in rlDDPGAgentOptions changes the critic's target values in step 5 of the DDPG training algorithm. Let g be the discount factor and N the number of look-ahead steps; the critic target is then

y_t = R_t + g*R_t+1 + ... + g^(N-1)*R_t+N-1 + g^N * Qt(S_t+N, mu_t(S_t+N))

where Qt is the target critic and mu_t is the target actor.
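To make the N-step target concrete, here is a minimal Python sketch (not toolbox code; the function name and inputs are illustrative) of a target computed this way, given the N instantaneous rewards and the target critic's bootstrap value:

```python
def n_step_target(rewards, g, n, q_target_value):
    """N-step critic target:
    y = sum_{i=0}^{N-1} g^i * R_{t+i}  +  g^N * Qt(S_{t+N}, mu_t(S_{t+N})).

    rewards:        the N instantaneous rewards R_t ... R_{t+N-1}
    g:              discount factor
    q_target_value: target critic's value at the state N steps ahead
    """
    assert len(rewards) == n
    discounted_rewards = sum(g**i * r for i, r in enumerate(rewards))
    return discounted_rewards + g**n * q_target_value

# Example with N = 3 and discount 0.99 (hypothetical numbers):
y = n_step_target([1.0, 0.5, 0.25], g=0.99, n=3, q_target_value=10.0)
```

With N = 1 this reduces to the standard one-step target R_t + g*Qt.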

4 comments

Thanks for the reply. I have one more query regarding the above equation. Is R_t here the instant reward, or the future discounted reward at that instant? As per standard notation, R denotes the cumulative discounted reward and r denotes the instantaneous scalar reward.
Anh Tran
Anh Tran on 2 Mar 2020
It is the instant reward. The future discounted reward would be:

G_t = R_t + g*R_t+1 + g^2*R_t+2 + ...

I used R_t to be consistent with the DDPG training algorithm in the documentation.
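As an illustration of the distinction (a hypothetical Python sketch, not part of the toolbox), the instant reward is a single sample while the discounted return accumulates the whole sequence:

```python
def discounted_return(rewards, g):
    """Future discounted return over a finite reward sequence:
    G_t = R_t + g*R_{t+1} + g^2*R_{t+2} + ..."""
    return sum(g**k * r for k, r in enumerate(rewards))

rewards = [1.0, 1.0, 1.0]
instant_reward = rewards[0]            # R_t = 1.0
g_t = discounted_return(rewards, 0.5)  # 1 + 0.5 + 0.25 = 1.75
```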
Thanks for your help!
Dingshan Sun
Dingshan Sun on 1 Sep 2022
Could you give a hint as to how R_t, R_t+1, R_t+2, ..., R_t+n-1 can be obtained in an online off-policy algorithm, especially for DRL methods that use an experience replay buffer?


Asked: 21 Feb 2020

Commented: 1 Sep 2022

