Many real-world multi-agent scenarios can be naturally modeled as partially observable cooperative multi-agent reinforcement learning (MARL) problems. Agents act in a decentralized manner, each receiving only a partial, local observation. Uncertainty about the environmental state and the actions of other agents can hinder coordination, particularly during decentralized sequential execution, leading to catastrophic miscoordination and suboptimal policies.
Successfully coordinating such multi-agent systems often requires agents to reach consensus. The most common way to accomplish this is information sharing: equipping agents with information-sharing capabilities alleviates many challenges, such as partial observability and non-stationarity. Recent methods either exchange messages explicitly via communication protocols or share information implicitly through behavior modeling and centralized training.
Sharing information explicitly depends heavily on the existence of a communication channel, which in real-world scenarios may be limited by transmission bandwidth, cost, and delay. Implicit sharing requires learning either a behavior prediction model (behavior modeling) or a centralized learner (centralized training with decentralized execution, CTDE). Behavior modeling makes the environment appear stationary to each agent, but an inaccurate prediction model can significantly bias the agent's behavior. CTDE lets information flow freely during centralized training but provides no mechanism for information sharing during decentralized execution.
In this paper published in Machine Intelligence Research, researchers from the Institute of Automation, Chinese Academy of Sciences, propose dual-channel consensus (DuCC) to enhance multi-agent coordination. DuCC enables agents to establish a shared understanding of the environmental state by training the consensus representations that agents infer from the same state to be similar, and those inferred from different states to be distinct. Agents can thus overcome the limitations of partial observability and assess the current situation consistently (cognitive consistency), facilitating effective coordination. Furthermore, agents independently infer consensus representations during decentralized execution and use them as an additional factor in decision-making. This enables timely information sharing during decentralized execution without explicit communication. The method is flexible and can be integrated with various existing MARL algorithms to enhance coordination.
Specifically, the researchers learn consensus representations that realize DuCC via contrastive representation learning, which employs contrastive loss functions to pull similar instances together and push dissimilar ones apart, making it well-suited to establishing the common knowledge DuCC requires. DuCC is achieved in three steps. First, they introduce consensus inference models that map local observations to latent consensus representations capturing decision-relevant environmental information. Second, they design an inner-agent contrastive representation learning objective that captures slow-changing environmental features and maintains slow information dynamics within each agent over time (inner-agent consensus), together with an inter-agent contrastive representation learning objective that aligns inner-agent consensus across multiple agents (inter-agent consensus), thus realizing cognitive consistency. DuCC is considered achieved when the learned consensus representations satisfy both objectives. Finally, they incorporate the consensus representations into MARL algorithms to enhance multi-agent coordination.
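To make the two objectives concrete, here is a minimal PyTorch-style sketch of InfoNCE-style contrastive losses over consensus representations. The tensor layout, temperature, and positive-pair construction (temporally adjacent representations within an agent; same-timestep representations across agents) are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.1):
    # Generic InfoNCE: row k of `positive` is the positive for row k of
    # `anchor`; all other rows in the batch serve as negatives.
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature        # (B, B) similarities
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)

# Hypothetical layout: z[t, i] is agent i's consensus representation at
# timestep t, so z has shape (T, n_agents, d).
def inner_agent_loss(z):
    # Positives: temporally adjacent representations of the SAME agent,
    # encouraging slow, stable dynamics over time (slow features).
    T, n, d = z.shape
    return info_nce(z[:-1].reshape(-1, d), z[1:].reshape(-1, d))

def inter_agent_loss(z):
    # Positives: representations of two DIFFERENT agents at the SAME
    # timestep, aligning inner-agent consensus across the team.
    T, n, d = z.shape
    i, j = torch.randperm(n)[:2]                        # two distinct agents
    return info_nce(z[:, i], z[:, j])
```

During centralized training, such auxiliary losses would typically be added to the usual RL objective.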
The researchers evaluate their method on the StarCraft multi-agent challenge (SMAC) and Google research football (GRF). The results demonstrate that DuCC significantly improves the performance of various MARL algorithms; in particular, combining it with QMIX (DuCC-QMIX) outperforms state-of-the-art MARL algorithms. The researchers also design an individual consensus metric and a group consensus metric to illustrate the effectiveness of the two contrastive representation learning objectives in developing DuCC. Visualizations of the consensus representations show that the method effectively helps existing MARL algorithms achieve consensus.
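The summary does not give the metric formulas. Purely as an assumption, one plausible instantiation scores temporal stability within each agent and alignment across agents via average cosine similarity, reusing the (T, n_agents, d) layout from the sketch above:

```python
import torch
import torch.nn.functional as F

def individual_consensus(z):
    # Mean cosine similarity between each agent's representations at
    # consecutive timesteps; higher means slower, more stable dynamics.
    a = F.normalize(z[:-1], dim=-1)
    b = F.normalize(z[1:], dim=-1)
    return (a * b).sum(-1).mean()

def group_consensus(z):
    # Mean pairwise cosine similarity across agents at each timestep;
    # higher means stronger alignment (cognitive consistency).
    zn = F.normalize(z, dim=-1)                  # (T, n, d)
    sims = zn @ zn.transpose(1, 2)               # (T, n, n)
    T, n, _ = z.shape
    off_diag = sims.sum() - sims.diagonal(dim1=1, dim2=2).sum()
    return off_diag / (T * n * (n - 1))
```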
The contributions of this paper are three-fold: 1) The researchers propose enhancing multi-agent coordination via dual-channel consensus (DuCC), which comprises inner-agent and inter-agent consensus. 2) They design two contrastive representation learning objectives to develop DuCC and propose two metrics to evaluate their effectiveness. 3) They demonstrate that their method can be easily integrated with existing MARL algorithms to enhance their performance. Additionally, they provide valuable insights into the method's effectiveness by visualizing the learned DuCC representations and the DuCC-guided strategies.
Section 2 reviews work related to multi-agent coordination through information sharing and to contrastive representation learning. The authors distinguish their method by its two aims: capturing slow features and maintaining slow information dynamics within each agent over time (inner-agent consensus), and aligning inner-agent consensus across multiple agents (inter-agent consensus), in order to realize multi-agent coordination. Unlike other methods, theirs uses self-supervised contrastive representation learning to learn continuous representations for both inner-agent and inter-agent consensus.
Section 3 introduces preliminaries in four parts: decentralized partially observable Markov decision processes (Dec-POMDPs), consensus learning, slow features, and representation learning.
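For reference, the standard Dec-POMDP formalism is the tuple below; the paper's exact notation may differ.

```latex
% Standard Dec-POMDP tuple (the paper's exact notation may differ):
%   N: set of n agents            S: state space
%   A^i: action set of agent i    P(s' | s, a): transition function
%   R(s, a): shared team reward   Omega^i: observation set of agent i
%   O: observation function       gamma: discount factor
\[
  \mathcal{M} \;=\; \bigl\langle \mathcal{N},\, \mathcal{S},\,
  \{\mathcal{A}^i\}_{i=1}^{n},\, P,\, R,\,
  \{\Omega^i\}_{i=1}^{n},\, O,\, \gamma \bigr\rangle
\]
```

Each agent conditions its policy only on its own action-observation history while the team optimizes the shared reward.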
In Section 4, the researchers formally define DuCC and describe how to develop and use it to enhance multi-agent coordination within the MARL framework. They begin by defining the two types of consensus that together form DuCC: temporally extended consensus within each agent (inner-agent consensus) and mutual consensus across agents (inter-agent consensus). Next, they develop consensus inference models that map local observations to latent consensus representations capturing decision-relevant environmental information. They then introduce an inner-agent contrastive representation learning objective that captures slow features and maintains slow information dynamics within each agent over time, and an inter-agent contrastive representation learning objective that aligns inner-agent consensus across agents, forming inter-agent consensus. Finally, they discuss how to leverage DuCC to enhance multi-agent coordination, as sketched below.
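As an illustration of that final step, one simple way to condition an agent's decisions on its inferred consensus representation is to concatenate it with a recurrent trajectory encoding before the Q-value head. This is a hypothetical sketch under those assumptions, not the paper's actual architecture; the class name, layer sizes, and wiring are illustrative.

```python
import torch
import torch.nn as nn

class ConsensusConditionedAgent(nn.Module):
    # Hypothetical agent network, NOT the paper's exact architecture:
    # a consensus inference model maps the local observation to a latent
    # consensus representation z, which is concatenated with a recurrent
    # trajectory encoding h to produce per-action Q-values.
    def __init__(self, obs_dim, n_actions, hid=64, z_dim=32):
        super().__init__()
        self.consensus = nn.Sequential(              # consensus inference model
            nn.Linear(obs_dim, hid), nn.ReLU(), nn.Linear(hid, z_dim))
        self.rnn = nn.GRUCell(obs_dim, hid)          # trajectory encoder
        self.q_head = nn.Linear(hid + z_dim, n_actions)

    def forward(self, obs, h):
        z = self.consensus(obs)                      # inferred locally, no communication
        h = self.rnn(obs, h)
        q = self.q_head(torch.cat([h, z], dim=-1))
        return q, h, z                               # z also feeds the contrastive losses
```

Because z is computed from the local observation alone, agents can use it during fully decentralized execution, while the two contrastive objectives shape it during centralized training.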
Section 5 presents experiments. The researchers evaluate on two challenging environments, the StarCraft multi-agent challenge (SMAC) and Google research football (GRF), to answer the following questions: 1) Can the method improve performance by enhancing multi-agent coordination? 2) How does it compare to various implicit and explicit coordination methods? 3) Do the two contrastive representation learning objectives effectively develop consensus? 4) How can the method be integrated with various MARL algorithms? 5) What is the role of each contrastive representation learning objective?
This paper proposes enhancing multi-agent coordination via dual-channel consensus (DuCC), which comprises temporally extended consensus within each agent (inner-agent consensus) and mutual consensus across agents (inter-agent consensus). The researchers design two contrastive representation learning objectives to develop both types of consensus simultaneously, along with two metrics to evaluate their effectiveness. Extensive experiments on the StarCraft multi-agent challenge and Google research football demonstrate that the method outperforms state-of-the-art MARL algorithms and can be flexibly combined with various MARL algorithms to enhance their performance. Finally, visualizations of DuCC and of DuCC-guided strategies provide a better understanding of the method.
See the article:
Enhancing Multi-agent Coordination via Dual-channel Consensus
Machine Intelligence Research (published 16 March 2024)
http://doi.org/10.1007/s11633-023-1464-2