How to compute the gradients of a SAC agent for custom training? In addition, are the target critics updated automatically by MATLAB, given that agent = rlSACAgent()?

6 views (last 30 days)
I'm trying to train multiple SAC agents using parallel computing, and I don't know how to compute the gradients of the agents using the dlfeval function, given that I have already created a minibatchqueue for data processing. In addition, since the agents were created as agent = rlSACAgent(actor1,[critic1,critic2],agentOpts), should I introduce the target critics myself, or are they handled internally by MATLAB when I specify the smoothing factor tau or the target-critic update frequency? And how can I update them?

Answers (1)

praguna manvi
praguna manvi on 4 Sep 2024
Edited: praguna manvi on 4 Sep 2024
The critic and actor networks are updated internally using the “train” function for agents defined as:
agent = rlSACAgent(actor,[critic1,critic2],agentOpts);
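Note that you do not construct or pass the target critics yourself: the agent creates them internally, and the agent options control how they are soft-updated. A minimal sketch, assuming actor1, critic1, and critic2 already exist:

agentOpts = rlSACAgentOptions( ...
    "TargetSmoothFactor",1e-3, ...     % soft-update factor tau
    "TargetUpdateFrequency",1);        % update targets every learning step
agent = rlSACAgent(actor1,[critic1,critic2],agentOpts);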
You can find an example of training an rlSACAgent in this documentation:
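As a rough sketch of that built-in workflow (env is assumed to be an existing environment, the option values are placeholders, and "UseParallel" enables the parallel training you mention):

trainOpts = rlTrainingOptions( ...
    "MaxEpisodes",1000, ...
    "MaxStepsPerEpisode",500, ...
    "UseParallel",true);               % toolbox manages the parallel workers
trainStats = train(agent,env,trainOpts);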
For custom training, you can refer to this documentation, which outlines the required functions:
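One point that matters for your question: in a fully custom loop the toolbox no longer maintains the target critics for you, so you keep your own copies and soft-update them with the smoothing factor tau. A sketch, assuming criticTarget1 is a copy of critic1 that you created and maintain yourself:

tau = 1e-3;                                      % smoothing factor
params       = getLearnableParameters(critic1);
targetParams = getLearnableParameters(criticTarget1);
for k = 1:numel(params)
    % Polyak averaging: target <- tau*online + (1-tau)*target
    targetParams{k} = tau*params{k} + (1-tau)*targetParams{k};
end
criticTarget1 = setLearnableParameters(criticTarget1,targetParams);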
Typically, you could use the “getValue” or “getAction” functions to extract outputs, calculate the loss, and compute gradients with “dlgradient”. Here is a link to another example of custom training using sampled minibatch experiences:
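For the gradient computation itself, one concrete route is to extract the underlying dlnetwork with “getModel”, evaluate a loss function through “dlfeval”, and differentiate with “dlgradient”. The sketch below assumes obsBatch, actBatch, and targetBatch are dlarray minibatches read from your minibatchqueue, that avgGrad and avgSqGrad were initialized to [] before the loop, and that iteration is your loop counter; the mse loss is only an illustrative TD-regression loss, not the full SAC objective:

criticNet = getModel(critic1);                    % underlying dlnetwork
[loss,grads] = dlfeval(@criticLoss,criticNet,obsBatch,actBatch,targetBatch);
[criticNet,avgGrad,avgSqGrad] = adamupdate( ...
    criticNet,grads,avgGrad,avgSqGrad,iteration); % Adam update step
critic1 = setModel(critic1,criticNet);            % write weights back

function [loss,grads] = criticLoss(net,obsBatch,actBatch,targetBatch)
    % Forward pass is traced by dlfeval; input order must match net.InputNames
    q = forward(net,obsBatch,actBatch);
    loss = mse(q,targetBatch);                    % TD-target regression loss
    grads = dlgradient(loss,net.Learnables);      % gradients w.r.t. weights
end

The same pattern applies to the actor: evaluate the action (for example with “getAction” or a forward pass on the actor network) inside the loss function and differentiate with respect to the actor's learnable parameters.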
