PaperHub
6.0 / 10
Poster · 3 reviewers
Ratings: 5, 6, 7 (min 5, max 7, std 0.8)
Confidence: 3.3
Correctness: 3.0
Contribution: 3.0
Presentation: 3.3
NeurIPS 2024

Value-Based Deep Multi-Agent Reinforcement Learning with Dynamic Sparse Training

OpenReview · PDF
Submitted: 2024-05-15 · Updated: 2024-11-06

Abstract

Deep Multi-agent Reinforcement Learning (MARL) relies on neural networks with numerous parameters in multi-agent scenarios, often incurring substantial computational overhead. Consequently, there is an urgent need to expedite training and enable model compression in MARL. This paper proposes the utilization of dynamic sparse training (DST), a technique proven effective in deep supervised learning tasks, to alleviate the computational burdens in MARL training. However, a direct adoption of DST fails to yield satisfactory MARL agents, leading to breakdowns in value learning within deep sparse value-based MARL models. Motivated by this challenge, we introduce an innovative Multi-Agent Sparse Training (MAST) framework aimed at simultaneously enhancing the reliability of learning targets and the rationality of sample distribution to improve value learning in sparse models. Specifically, MAST incorporates the Soft Mellowmax Operator with a hybrid TD-($\lambda$) schema to establish dependable learning targets. Additionally, it employs a dual replay buffer mechanism to enhance the distribution of training samples. Building upon these aspects, MAST utilizes gradient-based topology evolution to exclusively train multiple MARL agents using sparse networks. Our comprehensive experimental investigation across various value-based MARL algorithms on multiple benchmarks demonstrates, for the first time, significant reductions in redundancy of up to $20\times$ in Floating Point Operations (FLOPs) for both training and inference, with less than 3% performance degradation.
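
For readers unfamiliar with the operators named above, the standard Mellowmax operator and $\lambda$-return they build on are sketched below; this is background only, and MAST's Soft Mellowmax operator and hybrid TD($\lambda$) target are variants whose exact forms are given in the paper:

$$
\mathrm{mm}_{\omega}(Q)(s) \;=\; \frac{1}{\omega}\log\Big(\frac{1}{|A|}\sum_{a\in A}\exp\big(\omega\,Q(s,a)\big)\Big),
\qquad
G_t^{\lambda} \;=\; (1-\lambda)\sum_{n=1}^{\infty}\lambda^{\,n-1}\,G_t^{(n)},
$$

where $G_t^{(n)}$ denotes the $n$-step bootstrapped return.
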
Keywords
Multi-Agent Reinforcement Learning · Dynamic Sparse Training · Value Learning

Reviews and Discussion

Review
Rating: 5

The paper introduces the Multi-Agent Sparse Training (MAST) framework to address computational overhead in Multi-agent Reinforcement Learning (MARL) by enhancing value learning through the Soft Mellowmax Operator with a hybrid TD-(λ) schema and a dual replay buffer mechanism. MAST achieves significant reductions in computational redundancy with minimal performance degradation.

Strengths

  • The problem of applying sparse training to MARL is an interesting topic.
  • The paper is well-written and easy to understand.
  • Comprehensive evaluation and ablation results provide a good understanding of the method and design decisions.

Weaknesses

  • This work focuses only on one benchmark (StarCraft II); applying it to other benchmarks would give a better idea of the generalizability of the approach.
  • How does the proposed technique compare to single-agent RL sparse training work such as "Sokar et al., 2022"? Are there existing methods that are already able to achieve high performance with the techniques proposed in this work? The authors also mentioned that "Wang et al., 2019" also prunes agent networks throughout training, so it would be good to also compare to this work to see the computation reduction and performance comparison.

Questions

  • Is Figure 7 for illustration purposes (i.e., not using real data)? If so, I would suggest using real data to illustrate the idea and better show the significance of the issue.
  • How does this technique generalize to test environments other than StarCraft II?
  • How does the proposed approach compare to single-agent sparse training methods other than RLx2?

Limitations

More analysis of the method's limitations would be helpful.

Author Response

Thanks for your time and effort in reviewing our paper! Please find our responses to your comments below. We will be happy to answer any further questions you may have.

Weaknesses

W1: This work focuses only on one benchmark (StarCraft II); applying it to other benchmarks would give a better idea of the generalizability of the approach.

We validate the generalizability of MAST in two main ways:

  • Other Benchmarks: We conducted a comprehensive performance evaluation of MAST across various tasks in the SMAC benchmark. Additional experiments on the multi-agent MuJoCo (MAMuJoCo) benchmark (Peng et al., 2021) are also included in Appendix B.9.
  • Other Algorithms: MAST is designed as a versatile sparse training framework for value decomposition-based MARL algorithms. We integrate MAST with state-of-the-art value-based deep MARL algorithms, including QMIX, WQMIX, and RES. Additionally, we apply MAST to a hybrid value-based and policy-based algorithm, FACMAC (Peng et al., 2021). Results are presented in Section 4 and Appendix B.

This comprehensive evaluation demonstrates the effectiveness and generalizability of MAST across different benchmarks and algorithms.

W2: How does the proposed technique compare to single-agent RL sparse training work such as "Sokar et al., 2022"? Are there existing methods that are already able to achieve high performance with the techniques proposed in this work? The authors also mentioned that "Wang et al., 2019" also prunes agent networks throughout training, so it would be good to also compare to this work to see the computation reduction and performance comparison.

  • The single-agent RL sparse training work in (Sokar et al., 2022) uses SET (Mocanu et al., 2018) for topology evolution, but does not improve value learning under sparse models, resulting in low-sparsity RL models. In our experiments, the SET baseline can be viewed as the MARL version of that in (Sokar et al., 2022). As shown in Table 1, applying SET alone is insufficient to achieve high sparsity levels in MARL scenarios. We will annotate SET in Table 1 as (Sokar et al., 2022) in our revision.

  • The algorithm in (Wang et al., 2019) fails to maintain sparsity throughout training and only achieves a final model sparsity of 80%, which is lower than our results. Additionally, their experiments are limited to a two-agent environment, PredatorPrey-v2 in MuJoCo (Todorov et al., 2012). Therefore, we did not include a direct comparison with this work in our paper.

Questions

Q1: Is Figure 7 for illustration purposes (i.e., not using real data)? If so, I would suggest using real data to illustrate the idea and better show the significance of the issue.

Thank you for your suggestion. Figure 7 was originally intended for illustration purposes. We will update Figure 7 with real data in our revision to better demonstrate the significance of the issue. Additionally, the effectiveness of the dual buffer in improving training data distribution is validated in the ablation study in Appendix B.7.
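
As an illustration only, a dual-buffer sampler of the kind described in the paper might be structured as in the sketch below; the class name, buffer sizes, and mixing ratio are placeholders rather than the actual configuration used by MAST:

```python
import random
from collections import deque

class DualReplayBuffer:
    """Sketch of a dual replay buffer: a large long-term buffer plus a small
    recent buffer; training batches mix samples from both so that recently
    collected, closer-to-on-policy data is better represented."""

    def __init__(self, long_capacity=5000, recent_capacity=500, recent_ratio=0.5):
        self.long_buffer = deque(maxlen=long_capacity)
        self.recent_buffer = deque(maxlen=recent_capacity)
        self.recent_ratio = recent_ratio

    def add(self, episode):
        # Every episode enters both buffers; the small buffer evicts faster,
        # so it only retains recent data.
        self.long_buffer.append(episode)
        self.recent_buffer.append(episode)

    def sample(self, batch_size):
        n_recent = int(batch_size * self.recent_ratio)
        n_long = batch_size - n_recent
        batch = random.sample(list(self.recent_buffer), min(n_recent, len(self.recent_buffer)))
        batch += random.sample(list(self.long_buffer), min(n_long, len(self.long_buffer)))
        return batch
```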

Q2: How does this technique generalize to test environments other than StarCraft II?

We have conducted a comprehensive performance evaluation of MAST across various tasks in the SMAC benchmark. Additionally, we performed experiments on the multi-agent MuJoCo (MAMuJoCo) benchmark (Peng et al., 2021). Please refer to Appendix B.9 for details.

Q3: How does the proposed approach compare to single-agent sparse training methods other than RLx2?

We have compared our algorithm with several single-agent sparse training methods, including SET, RigL, and RLx2. Specifically, SET and RigL were originally developed for deep supervised learning and later adapted for single-agent sparse training in DRL as shown in (Sokar et al., 2022) and (Graesser et al., 2022), respectively. Our results demonstrate that MAST significantly outperforms these baselines in the MARL setting. For detailed comparisons, please refer to Table 1 in Section 4.1.

Limitations

L1: More analysis of the method's limitations would be helpful.

Our paper does address the limitations of the MAST framework, including the challenge of managing multiple hyperparameters. This discussion is provided in Appendix A.6.


We are grateful for your constructive suggestions, which have significantly guided our improvements. We hope our response addresses your concerns. If so, we kindly ask you to consider raising your score. We will also be happy to answer any further questions you may have. Thank you very much!

Comment

Dear Reviewer,

Thank you for your time and effort in reviewing our paper.

We hope our response has adequately addressed your concerns. If you feel that our rebuttal has clarified the issues raised, we kindly ask you to consider adjusting your score accordingly. Should you have any further questions or need additional clarification, we would be more than happy to discuss them with you.

Thank you once again for your valuable feedback.


Review
Rating: 6

The paper presents a significant advancement in the field of MARL by introducing the MAST framework, which aims at improving the reliability of training targets and the rationality of the sample distribution. Overall, this paper is well-written and easy to follow, on a very interesting research direction with promising results.

Strengths

This paper did thorough research on finding the reasons for the poor performance of previous sparsification methods, with theoretical contributions and possible solutions, and introduced two novel designs to address them.

The paper provides solid theoretical underpinnings for the proposed methods.

The experimental results look very promising.

Weaknesses

An ablation study would be good to tell to what extent the two designs contribute to the performance improvement.

The overhead brought by the new design was not discussed.

Questions

What kind of sparsification techniques are used in MAST? Also, are mixing networks sparsified? The agent network in QMIX is relatively small; how is it possible to reach 95% sparsity while maintaining comparable or even better results than the dense networks?

Limitations

The authors did not address the limitations but I don't see a direct or potential negative societal impact from this work.

Author Response

Thank you for your time and effort in reviewing our paper! Please find our responses to your comments below. We will be happy to answer any further questions you may have.


Weaknesses

W1: An ablation study would be good to tell to what extent the two designs contribute to the performance improvement.

We have provided an ablation study for our proposed techniques, including the Hybrid TD($\lambda$) mechanism, Soft Mellowmax Operator, and Dual Buffers, in Appendix B.7. Our findings indicate that all three components contribute significantly to the overall performance improvement.

W2: The overhead brought by the new design was not discussed.

  • MAST incorporates novel TD targets with a dual buffer mechanism to train ultra-sparse MARL models. The designed TD targets can be efficiently computed by our algorithm, and the dual replay buffer mechanism does not introduce additional computational overhead compared to a same-size single buffer. Detailed FLOP calculations are provided in Appendix B.4.2.
  • Additionally, MAST uses gradient-based topology evolution to train sparse MARL agents exclusively. The sparse network topology evolves every 10,000 gradient update steps, making the overhead from this evolution negligible compared to the gradient updates.
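
As a rough back-of-the-envelope illustration of why this overhead is negligible (the total step count below is an assumed placeholder, not a number from the paper):

```python
total_grad_steps = 2_000_000   # assumed training length, for illustration only
evolution_interval = 10_000    # topology evolution frequency stated above

n_evolutions = total_grad_steps // evolution_interval
print(f"{n_evolutions} topology evolutions over {total_grad_steps} gradient updates "
      f"({n_evolutions / total_grad_steps:.2%} of update steps)")
# -> 200 topology evolutions over 2000000 gradient updates (0.01% of update steps)
```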

Following the reviewer's advice, we will include this discussion in our revision.

Questions

Q1: What kind of sparsification techniques are used in MAST? Also, are mixing networks sparsified? The agent network in QMIX is relatively small; how is it possible to reach 95% sparsity while maintaining comparable or even better results than the dense networks?

  • MAST employs the RigL method (Evci et al., 2020), which enhances the optimization of sparse neural networks by leveraging weight magnitude and gradient information to jointly optimize model parameters and connectivity. The mixing network is also sparsified to a specified degree, as illustrated in Fig. 3.
  • Sparse networks often have fewer parameters, which can make them easier to train under certain conditions. This can lead to improved performance compared to dense networks, as demonstrated in our experiments. Similar observations have also been reported in the literature (Evci et al., 2020; Tan et al., 2023).
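
For concreteness, a minimal single-layer sketch of a RigL-style drop-and-grow step is shown below. It is a simplification under several assumptions: it omits the cosine annealing of the drop fraction, per-layer sparsity allocation, and the periodic scheduling discussed above, and the function name and default values are placeholders rather than the paper's implementation.

```python
import numpy as np

def rigl_update(weights, mask, dense_grad, drop_fraction=0.3):
    """One simplified RigL-style topology update for a single weight matrix.

    weights    : current weights (zero where the mask is zero); updated in place
    mask       : binary mask of active connections; updated in place
    dense_grad : gradient of the loss w.r.t. all weights, including pruned ones
    """
    abs_w = np.abs(weights).ravel()
    abs_g = np.abs(dense_grad).ravel()
    flat_mask = mask.ravel().copy()

    active = np.flatnonzero(flat_mask == 1)
    inactive = np.flatnonzero(flat_mask == 0)
    n_change = int(drop_fraction * len(active))

    # Drop: deactivate the active connections with the smallest weight magnitude.
    drop = active[np.argsort(abs_w[active])[:n_change]]
    mask.flat[drop] = 0
    weights.flat[drop] = 0.0

    # Grow: activate an equal number of previously inactive connections with the
    # largest gradient magnitude; newly grown weights start from zero.
    grow = inactive[np.argsort(-abs_g[inactive])[:n_change]]
    mask.flat[grow] = 1
    weights.flat[grow] = 0.0

    return weights, mask
```

In a full training run, such an update would be applied to each layer of the agent and mixing networks at the evolution interval mentioned in the response to W2 above, with ordinary masked gradient updates in between.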

Limitations

L1: The authors did not address the limitations.

Our paper does address the limitations of the MAST framework, including the challenge of managing multiple hyperparameters. This discussion can be found in Appendix A.6.


We are grateful for your constructive suggestions, which have significantly guided our improvements. We hope our response addresses your concerns. If so, we kindly ask you to consider raising your score. We will also be happy to answer any further questions you may have. Thank you very much!

Comment

Dear Reviewer,

Thank you for your time and effort in reviewing our paper.

We hope our response has adequately addressed your concerns. If you feel that our rebuttal has clarified the issues raised, we kindly ask you to consider adjusting your score accordingly. Should you have any further questions or need additional clarification, we would be more than happy to discuss them with you.

Thank you once again for your valuable feedback.

Comment

I thank the authors for their response. I'd like to maintain my original rating of this paper.

Comment

Thank you very much for your response. We really appreciate your time and effort in reviewing our paper.

Review
Rating: 7

This paper introduces dynamic sparse training (DST) to the Deep Multi-Agent Reinforcement Learning (MARL) setting for the first time in the literature. Furthermore, it shows that directly applying DST algorithms to MARL does not lead to optimal results. Consequently, it proposes a new framework named Multi-Agent Sparse Training (MAST) which enhances the DST RigL algorithm with a hybrid TD-(λ) schema and a dual replay buffer mechanism in order to successfully cope with the challenging MARL setting. An extensive empirical validation is performed, showing that MAST can reduce the computational requirements by up to 20x at virtually no loss in performance.

Strengths

  • This is an original paper which introduces for the first time DST to MARL.
  • The paper solves the inherent problems and the suboptimal behavior of directly applying DST to MARL by proposing a new framework MAST which is specially designed for MARL.
  • The paper is clear and well written. The source code is provided for easy reproducibility.
  • The extensive empirical validation shows the superiority of the proposed framework in comparison with the most natural baselines, as there is no other DST method specially designed for MARL.
  • The paper is likely to have a fair impact on the sparse training and multi-agent reinforcement learning communities.

Weaknesses

  • To the best of my understanding, there seem to be no striking weak points.

Questions

Q1) While the theoretical reduction in terms of computational resources is impressive, can you comment on the real wall-clock running time? I know that this is not possible when simulating sparsity with binary masks, but have you considered using some truly sparse implementation of the neural networks? While I am not sure how easy it would be to do this for the GRU layer, there exist some sparse MLP implementations for supervised learning (e.g., Curci et al., Truly Sparse Neural Networks at Scale, arXiv:2102.01732, 2021) which may be easy to adapt to the MAST framework. This may allow you to design an experiment where you seriously scale up the neural network (in terms of the number of neurons) for very large state or action spaces, which, as you mentioned, is a typical challenge in MARL.

Q2) (minor) I suggest adopting a uniform citation style in order to improve related work chronological readability. Currently, some of the references are cited using the year of the first preprint release on arXiv, while others are cited using the official publication year.

Limitations

n/a

Author Response

Thanks for your time and effort in reviewing our paper! Please find our responses to your comments below. We will be happy to answer any further questions you may have.


Questions

Q1: While the theoretical reduction in terms of computational resources is impressive, can you comment on the real wall-clock running time? I know that this is not possible when simulating sparsity with binary masks, but have you considered using some truly sparse implementation of the neural networks? While I am not sure how easy it would be to do this for the GRU layer, there exist some sparse MLP implementations for supervised learning (e.g., Curci et al., Truly Sparse Neural Networks at Scale, arXiv:2102.01732, 2021) which may be easy to adapt to the MAST framework. This may allow you to design an experiment where you seriously scale up the neural network (in terms of the number of neurons) for very large state or action spaces, which, as you mentioned, is a typical challenge in MARL.

Our primary objective is to exploit the computational redundancy in training MARL agents. Thus, we design the MAST framework to achieve this goal and aid various MARL algorithms specifically in sparse training scenarios. Existing works also focus on algorithmic FLOP reduction, such as (Sokar et al., 2022; Graesser et al., 2022; Tan et al., 2023). Our work, together with these contributions, helps pave the way for more efficient and scalable multi-agent systems.

As sparse methods evolve in tandem with hardware co-design, the anticipated translation of FLOP reduction into wall-clock speedups becomes increasingly viable. Recent developments, such as specialized software kernels and dedicated hardware solutions, e.g., DeepSparse (NeuralMagic, 2021) and Cerebras CS-2 (Lie et al., 2022), as well as (Curci et al., 2021), signify promising strides toward realizing the benefits of unstructured sparsity during both training and inference stages. We believe this will be a very interesting future direction.

(NeuralMagic, 2021) NeuralMagic. DeepSparse, 2021. URL https://github.com/neuralmagic/deepsparse.
(Lie et al., 2022) Lie, S. Harnessing the Power of Sparsity for Large GPT AI Models. https://www.cerebras.net/blog/harnessing-the-power-of-sparsity-forlarge-gpt-ai-models, 2022.

Q2: (minor) I suggest adopting a uniform citation style in order to improve related work chronological readability. Currently, some of the references are cited using the year of the first preprint release on arXiv, while others are cited using the official publication year.

Thank you for your suggestion. We will adopt a uniform citation style in our revision to improve the chronological readability of related works.


We are grateful for your constructive suggestions, which have significantly guided our improvements. We hope our response addresses your concerns. If so, we kindly ask you to consider raising your score. We will also be happy to answer any further questions you may have. Thank you very much!

Comment

Dear authors,

Thank you for your answers. I will keep my original rating (accept).

Best wishes,

Comment

Thank you very much for your response. We really appreciate your time and effort in reviewing our paper.

Final Decision

A somewhat complex extension to multi-agent RL training, but thoroughly executed. I concur with the reviewers' consensus that this paper can be accepted: while the two more lukewarm reviewers did not explicitly respond to the rebuttal, my read of their concerns and the authors' response is that all major points are addressed when taking the (extensive) appendix into account.