News Release

MA3C: Enhancing communication robustness in multi-agent learning through adaptable auxiliary multi-agent adversary generation

Peer-Reviewed Publication

Higher Education Press

Figure 1


The overall relationship between the attacker and the ego system. The black solid arrows indicate the direction of data flow, the red solid ones indicate the direction of gradient flow, and the red dotted ones indicate attack actions applied by the attacker to specific communication channels.


Credit: Lei YUAN, Feng CHEN, Zongzhang ZHANG, Yang YU

Communication is a fundamental aspect of promoting effective coordination in Cooperative Multi-Agent Reinforcement Learning (MARL). While existing research focuses on enhancing the efficiency of agent communication, it often overlooks the challenges associated with real-world communication scenarios. In reality, communication is often subject to various sources of noise and potential attacks, making the robustness of communication-based policies a crucial and pressing issue that demands further exploration.

To address these problems, a research team led by Yang Yu from LAMDA, Nanjing University published their new research on 15 December 2024 in Frontiers of Computer Science, a journal co-published by Higher Education Press and Springer Nature.
The team posits that a robust communication-based policy should withstand scenarios in which every message channel may be perturbed to different degrees at any time, and that an ego system trained with auxiliary adversaries can handle this challenge. They therefore propose an adaptable method, Multi-Agent Auxiliary Adversaries Generation for robust Communication, dubbed MA3C, to obtain a robust communication-based policy.

In the research, they model the message adversary training process as a cooperative MARL problem, where each adversary observes the local state of one message sender and then outputs stochastic actions as perturbations for each message sent to other teammates. Since the adversaries coordinate to minimize the ego system's return, any cooperative MARL approach can be used to train the attacker system. Moreover, to alleviate the overfitting that arises from using a single attacker, they introduce an attacker population learning paradigm, which yields a set of attackers with high attacking quality and behavioral diversity. The ego system and the attacker are then trained in an alternating way to obtain a robust communication-based policy.
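The scheme above can be illustrated with a toy sketch. Everything here is a hypothetical simplification for exposition, not the authors' implementation: `MessageAdversary`, the bounded-noise action, and the one-line team return are all invented stand-ins, and the ego-side policy update of the alternating loop is omitted.

```python
import random

random.seed(0)

class MessageAdversary:
    """One adversary per message sender: it reads the sender's local state
    and emits a bounded stochastic perturbation for each outgoing channel.
    (Illustrative assumption: additive uniform noise with a fixed scale.)"""

    def __init__(self, scale):
        self.scale = scale

    def act(self, local_state, message):
        # Stochastic action: bounded additive noise on every message channel.
        return [m + random.uniform(-self.scale, self.scale) for m in message]


def episode_return(received, intended):
    """Toy ego-team return: decays as received messages drift from intent."""
    return -sum(abs(r - i) for r, i in zip(received, intended))


# A population of attackers with different perturbation ranges
# (a crude stand-in for the paper's quality/diversity objective).
population = [MessageAdversary(scale=s) for s in (0.1, 0.5, 1.0)]

intended = [1.0, -1.0, 0.5]   # the message a sender intends to transmit
state = [0.0, 0.0]            # the sender's (toy) local state

# Attacker phase of the alternating loop: pick the population member that
# most reduces the ego return; the ego phase would then update its policy
# against that attacker (policy update omitted in this sketch).
worst = min(population,
            key=lambda adv: episode_return(adv.act(state, intended), intended))
print(f"strongest attacker scale: {worst.scale}")
```

In the actual method each adversary's action is optimized with a cooperative MARL algorithm rather than sampled blindly, but the sketch shows the data flow: per-sender adversaries perturb messages, the perturbed messages drive the ego team's return, and the two sides are trained in alternation.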

Extensive experiments are conducted on various cooperative multi-agent benchmarks that require communication for coordination, including Hallway, two maps from the StarCraft Multi-Agent Challenge (SMAC), a newly created environment called Gold Panner (GP), and Traffic Junction. The experimental results show that MA3C outperforms multiple baselines. Further results demonstrate its strong generalization across various perturbation ranges, and show that the learned policy can transfer its robustness to new tasks after fine-tuning with a few samples.
For future work, developing an autonomous paradigm such as curriculum learning to find the boundary of communication ability is a valuable direction, and designing efficient and effective MARL communication methods for open-environment scenarios remains challenging but of great value.
DOI: 10.1007/s11704-023-2733-5


Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.