RDHNet: Addressing Rotational and Permutational Symmetries in Continuous Multi-Agent Systems

Dongzi Wang, Lilan Huang, Muning Wen, Xiao Teng, TENG LI, Minglong Li

OpenReview PDF

Submitted: 2024-09-26 | Updated: 2024-11-18

TL;DR

Rotational invariance is used to compress the redundant representation space and improve learning efficiency in MARL.

Abstract

Keywords

Multi-agent, Reinforcement Learning, Symmetry

Reviews and Discussion

Review

Rating: 5 | Confidence: 3 | 2024-10-20

The paper presents RDHNet, a novel approach for addressing rotational symmetries in multi-agent reinforcement learning (MARL) systems with continuous action spaces. Rotational symmetry in MARL introduces redundant state representations, which can hinder learning efficiency. RDHNet introduces a rotation-invariant architecture that utilizes relative coordinate systems and hypernetworks to enhance its ability to model complex multi-agent dynamics.
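The idea of a rotation-invariant relative coordinate system can be illustrated with a minimal sketch. This is not the paper's architecture: the function names, the 2-D setting, and the choice of the ego agent's velocity as the reference direction are all assumptions made for illustration (the ego velocity must be non-zero for the heading to be defined).

```python
import math

def rotate(p, theta):
    """Rotate a 2-D point p about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def relative_features(ego_pos, ego_vel, others):
    """Express other agents' positions in an ego-centred frame.

    Assumption (hypothetical): the frame's x-axis is aligned with the
    ego agent's velocity, which must be non-zero.
    """
    heading = math.atan2(ego_vel[1], ego_vel[0])
    feats = []
    for q in others:
        # Translate so the ego agent sits at the origin...
        d = (q[0] - ego_pos[0], q[1] - ego_pos[1])
        # ...then undo the ego heading so the result ignores global rotation.
        feats.append(rotate(d, -heading))
    return feats

def rotate_scene(theta, ego_pos, ego_vel, others):
    """Apply one global rotation to every quantity in the scene."""
    return (rotate(ego_pos, theta),
            rotate(ego_vel, theta),
            [rotate(q, theta) for q in others])
```

Under these assumptions, every globally rotated copy of a scene maps to the same feature vector, so a network fed these features never has to learn the rotated copies separately.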

Strengths

The authors formalize the symmetry problem in MARL and distinguish between permutation and rotational symmetry. They propose a novel RDHNet architecture, which extracts relative directional and positional information, compressing the redundant representations caused by symmetry. The empirical results demonstrate the superiority of the proposed method over the baselines.

Weaknesses

The authors use a coordinate transformation to deal with the redundancy problem. Could this transformation distort the meaning of the original observation and thereby affect action selection?
The authors should explain in more detail why the coordinate transformation reduces redundancy.
When the number of agents in the environment changes, can the original network structure still be applied?
Does the increased network complexity affect learning efficiency?

Questions

See the weaknesses above.

Review

Rating: 3 | Confidence: 4 | 2024-10-22

This paper presents RDHNet to address rotational and permutational symmetry in continuous MARL. The authors propose a rotation-invariant network for continuous action spaces, which utilizes relative coordinates between agents and uses a hypernetwork to enhance the fitting capability of the model. Experiments on cooperative navigation and predator-prey tasks demonstrate the effectiveness of the proposed algorithm.

Strengths

The paper deals with an important problem of aggregating real-world rules in MARL algorithms. The writing is easy to follow.

Weaknesses

The main contribution proposed by the authors is a method to handle continuous transformations. However, this seems to be only a minor technical detail in achieving symmetry. Also, while the authors claim "They neither consider nor can be applied to continuous random rotational symmetry, which is precisely the focus of our work and is more aligned with real-world scenarios", invariance to continuous transformations has already been studied in [1, 2]. So I wonder what the authors' contributions are.
The proposed method lacks theoretical guarantees and seems largely empirical. I would recommend that the authors add formal analysis of the proposed method.
MARL should be modeled as a Markov game or a Dec-POMDP, not an MDP as stated in Section 3. The problem described by the authors seems more like a Dec-POMDP, i.e., cooperative MARL. This should be stated explicitly.
The authors could consider evaluating their method on some real-world tasks instead of toy simulations to better demonstrate its applicability in "real-world scenarios".

Minor: please check for typos; for example, in Sec. 4.3, ALGORITHM INPLEMENTATION should be ALGORITHM IMPLEMENTATION. Please also check for grammatical errors.

[1] Equivariant Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning. ICML 2024.

[2] Boosting Sample Efficiency and Generalization in Multi-agent Reinforcement Learning via Equivariance. arXiv 2024.

Questions

Please see the "Weaknesses" section.

Review

Rating: 1 | Confidence: 3 | 2024-10-26

The authors propose a network architecture for multi-agent RL problems in which absolute coordinates are automatically converted to rotation-invariant features.

Strengths

At a very high level, the authors investigate an important problem: bisimulation, or how to compute similarity between different states to find representations in which equivalent states are merged.

Weaknesses

Almost the whole paper is applicable only to domains in which the state is described through coordinates, which is a very restrictive assumption. Moreover, if that is the case for the application of interest, it seems trivial to me to simply change the state representation to use rotation-invariant coordinates instead of having a dedicated layer perform this translation.
I suggest the authors focus instead on developing an architecture able to autonomously identify equivalent states (one that is not only applicable in navigation domains).
A much more extensive experimental evaluation will also be needed, as well as comparisons against other approaches that compute state similarity.

Questions

No specific question.

Withdrawal Notice

2024-11-18

I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.