Centralized vs Decentralized Training for Multi Agent Reinforcement Learning

What exactly are the differences between centralized and decentralized training for multi-agent reinforcement learning? Is centralized learning the same as the CTDE (centralized training and decentralized execution) paradigm that appears in much of the multi-agent RL literature? When I run centralized training, the main difference I notice is that all agents appear to receive the same Q0 value, which I believe means they have the same critic. Both methods are used in the tutorials, so I'm trying to get a clearer picture of what the differences are and when to use one versus the other.

Accepted Answer

Ashu on 31 Jul 2023
Hi Kyle,
I understand that you want to know the difference between "centralized" and "decentralized" learning strategies in Reinforcement Learning.
In MATLAB, the terms "centralized" and "decentralized" refer to different learning strategies for agent groups. Let's explore the differences between these two strategies:
1. Decentralized Training:
  • In decentralized training, each agent collects its own set of experiences during the episodes and learns independently from those experiences.
  • Agents maintain their own critics (value functions) and policies, which are updated based on their own experiences.
  • There is no sharing of experiences or learning updates between agents.
  • This approach is suitable when agents have distinct roles or objectives and should learn independently without coordination.
2. Centralized Training:
  • In centralized training, agents share the collected experiences and learn from them together.
  • All agents within a specific agent group (as defined by `AgentGroups`) share the same critic (value function) and policy.
  • The critic is updated based on the collective experiences of all agents in the group, allowing them to learn from a shared knowledge base.
  • Policies are shared among agents to promote coordination and collaboration.
  • This approach is useful when agents need to coordinate their actions and learn from a common perspective, such as in cooperative tasks or when there is a need for centralized decision-making.
'AgentGroups' and 'LearningStrategy' must be used together to specify whether agent groups learn in a centralized manner or decentralized manner.
For example, you can use the following command to configure training for three agent groups with different learning strategies. The agents with indices [1,2] and [3,5] learn in a centralized manner, while agent 4 learns in a decentralized manner.
% Three groups: agents [1,2] and [3,5] are centralized, agent 4 is decentralized
trainOpts = rlMultiAgentTrainingOptions(AgentGroups={[1,2],4,[3,5]}, ...
    LearningStrategy=["centralized","decentralized","centralized"])
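As a further illustration, here is a minimal sketch contrasting a fully decentralized and a fully centralized configuration for three agents. The agent count, the variable names, and the `MaxEpisodes` value are assumptions made for this example. The commented `train` call uses hypothetical `agentA`, `agentB`, `agentC`, and `env` variables that you would need to create yourself.

```matlab
% Assumption: three agents, indexed 1..3 in the order they are passed to train.

% Every agent in its own group, each learning only from its own experiences
decentralizedOpts = rlMultiAgentTrainingOptions( ...
    AgentGroups={1,2,3}, ...
    LearningStrategy=["decentralized","decentralized","decentralized"], ...
    MaxEpisodes=500);

% All three agents in one group, sharing experiences and learning together
centralizedOpts = rlMultiAgentTrainingOptions( ...
    AgentGroups={[1,2,3]}, ...
    LearningStrategy="centralized", ...
    MaxEpisodes=500);

% Hypothetical usage; agentA, agentB, agentC, and env are not defined here:
% results = train([agentA, agentB, agentC], env, centralizedOpts);
```

Switching between the two options objects is the only change needed to compare the strategies on the same environment, which makes it easy to check which one suits your task.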
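The CTDE (centralized training and decentralized execution) paradigm is related to, but not necessarily identical to, the "centralized" learning strategy. In the multi-agent RL literature, CTDE usually means that training can use information from all agents (for example, a critic that sees joint observations and actions), while during execution or deployment each agent acts on its own observations without communication or coordination. With the "centralized" strategy in MATLAB, the agents in a group share their experiences and learn from them together, which is consistent with the identical Q0 values you observed.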
When to use centralized or decentralized training depends on the problem and the desired behavior of the agents. If coordination and collaboration are essential, centralized training can be beneficial. On the other hand, if agents have distinct roles or should act independently, decentralized training is more appropriate.
Please refer to the documentation for 'rlMultiAgentTrainingOptions' to learn more about the usage of the 'centralized' and 'decentralized' learning strategies.
I hope this information is helpful.
  3 Comments
Lin on 22 Jul 2024
Hello Ashu:
Can you provide some references for MATLAB multi-agent examples?
Thank you!
Yiwen Zhang on 16 Oct 2024
Edited: Yiwen Zhang on 16 Oct 2024
Hello @Ashu:
I have tried centralized training, and I extracted the actor and critic neural networks from every agent. I found that all the actor networks share the same parameters, as do all the critic networks. Does each actor or critic use the mini-batches of all agents to update itself?
For example, if there are 3 agents and each has a mini-batch size of 128, are 128*3 samples used for each actor or critic update?
Another question: what is the input of the critic network? Is it the state space of each agent, or some kind of joint state space?
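
The parameter comparison described in this comment can be sketched as follows. This is only an illustration under stated assumptions: two default PPO agents stand in for agents from one centralized group, the observation and action specifications are made up, and `sameParams` is a hypothetical helper, not a toolbox function. Copying the critic with `setCritic` merely imitates a shared critic so that the check has something to detect.

```matlab
% Assumption: illustrative specs; replace with those of your environment
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([1 2 3]);

% Two default agents with independently initialized networks
agentA = rlPPOAgent(obsInfo, actInfo);
agentB = rlPPOAgent(obsInfo, actInfo);

% Compare critic parameters before any sharing (typically not identical)
before = sameParams(getCritic(agentA), getCritic(agentB));

% Imitate a shared critic by copying agentA's critic into agentB
agentB = setCritic(agentB, getCritic(agentA));
after = sameParams(getCritic(agentA), getCritic(agentB));

fprintf("Critics identical before copy: %d, after copy: %d\n", before, after);

function tf = sameParams(approxA, approxB)
    % Hypothetical helper: true if both approximators hold equal parameters
    pA = getLearnableParameters(approxA);
    pB = getLearnableParameters(approxB);
    tf = numel(pA) == numel(pB);
    for k = 1:numel(pA)
        if ~tf, break; end
        a = pA{k};
        b = pB{k};
        % Parameters may be dlarray objects depending on the release
        if isa(a, "dlarray"), a = extractdata(a); end
        if isa(b, "dlarray"), b = extractdata(b); end
        tf = isequal(a, b);
    end
end
```

The same helper can be applied to `getActor(agentA)` and `getActor(agentB)` for agents trained in a centralized group, to confirm whether the actors also hold identical parameters.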

Release

R2022a
