PaperHub
Rating: 5.7/10 (Poster; 3 reviewers; min 5, max 6, std 0.5)
Individual ratings: 6, 5, 6
Confidence: 3.0
Correctness: 2.7 · Contribution: 2.7 · Presentation: 2.7
NeurIPS 2024

Autonomous Agents for Collaborative Task under Information Asymmetry

OpenReview · PDF
Submitted: 2024-05-14 · Updated: 2024-11-06
TL;DR

This paper proposes iAgents, a new LLM multi-agent framework in which agents collaborate on behalf of humans in a mirrored agent network and deal with information asymmetry problems.

Abstract

Keywords
autonomous agent · social network · large language model

Reviews and Discussion

Review
6

This paper focuses on the cooperation of LLM-based agents under the information asymmetry condition, which is a practical problem in the real world. It provides a clear definition of this new scenario. It proposes the InfoNav and mixed memory methods to improve the capability of agents. It constructs a new benchmark, which is the first to evaluate the agent collaboration task under information asymmetry scenarios. The results show the effectiveness of the proposed methods, and further analyses and discussions are provided.

Strengths

  1. The information asymmetry scenario is interesting, and also practical in the real world. The authors also provide a great preliminary definition.
  2. The proposed methods are intuitive for solving this task, especially for the InfoNav and Mixed Memory.
  3. The proposed benchmark can contribute to the development of this field. It extends the tree structure into a graph structure, which is interesting for further research.

Weaknesses

  1. I think the evaluation should compare iAgents with other methods, rather than only comparing iAgents with different LLM cores. The ablated model can be considered one of them, but I think naive methods could be added as baselines, such as constructing a shared memory for all agents.
  2. The font size in figures can be enlarged to make them prettier.

Questions

  1. I'm curious about the amount of memory in each agent, i.e., the number of words in each agent's fuzzy memory. The average and variance would be helpful if they can be calculated, because the amount of memory can greatly influence the retrieval result.
  2. For the memory retrieval process, could you try including all the memory context as part of the prompt to obtain the target memory entity? There are many long-context LLMs that can replace the conventional retrieval process (text embedding → cosine similarity → top-k ranking) with prompting methods. You could try GPT-4, since it supports 128k contexts.
  3. Could you provide more insights and analyses on the memory part of the LLM-based agent?
  4. By the way, I think this communication task is highly related to the memory of LLM-based agents, which [1] also discusses. Maybe you can check it as a reference or provide some analysis connecting your task and that paper? I'm willing to improve my rating if the authors can address my concerns.

Refs:

[1] Zhang, Z., Bo, X., Ma, C., Li, R., Chen, X., Dai, Q., ... & Wen, J. R. (2024). A survey on the memory mechanism of large language model based agents. arXiv preprint arXiv:2404.13501.

Limitations

The authors have discussed the limitations in section 6.

Comment

Discussion 1: Could you provide more insights and analyses on the memory part of the LLM-based agent?

Reply to Discussion 1: As mentioned above, the design of Mixed Memory aims to address information interaction between humans and agents. Unlike previous multi-agent systems that primarily address knowledge-intensive problems, iAgents emphasize solving information-intensive problems in environments with information asymmetry. The former requires the LLM itself to possess extensive knowledge to decompose complex issues and distribute them to various agents. In contrast, the latter requires that agents can obtain, update, and exchange accurate and objective information in real time, which is precisely the goal of the iAgents memory design.

  1. As stated in lines 151-158 of the paper, Mixed Memory provides two different granularity levels of memory. Distinct Memory offers fine-grained, cross-session, objective and truthful memory retrieval to ensure accurate answers, while Fuzzy Memory provides coarse-grained, session-wise memory retrieval of information summaries, relaxing retrieval query conditions to offer a more comprehensive context. The combination of these two can help agents obtain more accurate, objective, and comprehensive information.
  2. Moreover, an effective memory mechanism not only depends on the construction of candidates in memory but also requires reasonable queries. iAgents allows agents to observe previous retrieval queries and results, and by combining infoNav to track the progress of overall collaborative tasks, it enables reactive adjustments to achieve better queries (lines 159-162).
  3. The memory format of Mixed Memory aligns with the design of iAgents, which uses dialogues as the source of human information. It can easily store dialogue information in a structured database or an ANN database, and it can also generalize to other forms of human information. Our latest version of iAgents has integrated LlamaIndex and a file management system, allowing various file formats to be used as human information to construct Mixed Memory, since Mixed Memory depends only on information granularity and retrieval methods, not on the format of the information content.
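To make the two granularity levels described above concrete, here is a minimal, hypothetical sketch of a combined distinct/fuzzy lookup. The class, method names, and the keyword-matching retrieval are illustrative stand-ins, not the paper's implementation (which uses ANN and structured queries):

```python
# Illustrative sketch of a two-granularity memory: "distinct" memory keeps
# exact messages for precise answers, while "fuzzy" memory keeps per-session
# aggregates for broader context. All names here are hypothetical.
class MixedMemory:
    def __init__(self):
        self.distinct: list[str] = []      # fine-grained raw messages
        self.fuzzy: dict[str, str] = {}    # session id -> session summary

    def write(self, session: str, message: str) -> None:
        self.distinct.append(message)
        # A real system would summarize the session with an LLM; here we
        # simply concatenate messages as a stand-in for the summary.
        self.fuzzy[session] = (self.fuzzy.get(session, "") + " " + message).strip()

    def retrieve(self, keyword: str) -> dict:
        # Combine exact matches (accuracy) with session summaries (context).
        exact = [m for m in self.distinct if keyword.lower() in m.lower()]
        context = [s for s in self.fuzzy.values() if keyword.lower() in s.lower()]
        return {"distinct": exact, "fuzzy": context}

mem = MixedMemory()
mem.write("s1", "Monica booked the restaurant for 7pm.")
mem.write("s1", "Chandler will bring the wine.")
mem.write("s2", "Phoebe is rehearsing a new song.")
print(mem.retrieve("restaurant")["distinct"])   # the exact matching message(s)
```

The design choice mirrored here is that a fine-grained hit answers the query precisely, while the session-level entry supplies surrounding context when the exact query wording misses.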

Based on the experimental results, the problem setting of the FriendsTV dataset is relatively simple and does not require complex reasoning about the asymmetric environment; answering questions given the correct context is not difficult (while the Schedule dataset focuses more on reasoning). Therefore, locating the correct information is crucial, which aligns with your judgment. For specific details, please refer to our ablation study in Section 6.2. Additionally, from a case-by-case analysis, errors often arise from the agent generating inaccurate queries, leading to the retrieval of incorrect context, which then affects subsequent information interaction and reasoning.

Comment

Discussion 2: can check it as a reference or provide some analysis between your task and this paper

Reply to Discussion 2: Based on the framework presented in the paper "A Survey on the Memory Mechanism of Large Language Model based Agents," the following conclusions can be drawn about Mixed Memory:

  1. On Why We Need the Mixed Memory: From the cognitive psychology perspective, iAgents require Mixed Memory because each round of communication involves a ReAct [1] process that combines reasoning and acting, necessitating working memory [2] for support. However, unlike approaches that aim to equip agents with human-like memory so that they can replace humans, our approach focuses on creating a society where agents and humans coexist harmoniously. Agents do not replace humans but serve them, and the value of agents lies in the information possessed by humans. Therefore, the design of Mixed Memory ensures that human information sources are accessed by agents only when they are authorized to do so. From the self-evolution perspective, Mixed Memory provides agents with substantial intermediate decision information for their decision trajectories, facilitating subsequent optimization based on feedback. The latest version of iAgents incorporates feedback functionality, allowing humans to rate each collaborative result from iAgents. This feedback, along with the entire trajectory, is stored in a database to enable cross-trial optimization. From the agent application perspective, iAgents cannot function without Mixed Memory (unless all information fits within the LLM context, as with small datasets like Schedule and NP). The goal of iAgents is to exchange information to solve problems, and this information resides within Mixed Memory.
  2. On How to Implement the Mixed Memory: The information sources of Mixed Memory include both Inside-trial Information and External Knowledge. Agents improve their current queries by observing previous queries and results, which falls under Inside-trial Information. The retrieved memory, sourced from human information, constitutes External Knowledge. Mixed Memory takes the form of textual data and supports operations such as reading, writing, and managing memory. It reads memory through ANN and structured queries, writes memory in real time when the user sends new messages, and supports session-wise summarization and file management operations.
  3. On How to Evaluate the Mixed Memory: We evaluate the effectiveness of Mixed Memory indirectly by observing the final performance of iAgents on InformativeBench. Please refer to Figure 5b for our ablation study on Mixed Memory.
  4. On the Limitations & Future Directions of Mixed Memory: Many perspectives mentioned in the survey align with ours, including advancing the research on Parametric Memory rather than relying solely on external memory. iAgents aim to experiment with Parametric Memory in future work due to the significant information loss associated with retrieval-based methods. Issues such as inaccurate queries, chunking, similarity metrics, and embedding models contribute to this loss. From a privacy perspective, we aspire for iAgents to be deployed on each user's device, enabling retrieval results without traversing specific information. Parametric Memory aligns with this need, as the forward pass in a network does not expose readable information. We aim to design components like LLM as Hard Disk or Agent as Hard Disk, which allow information retrieval directly through natural language, eliminating the errors introduced by the query/similarity/topk process. Additionally, Mixed Memory exemplifies Memory in LLM-based Multi-agent Applications, where iAgents, combined with the infoNav mechanism, achieve synchronization of information and memory among agents. Finally, Memory-based Lifelong Learning is also a goal for iAgents. The information humans possess is not limited to a few conversations required for a task, nor to 128k tokens. We need to further enhance memory design to enable each agent to accurately understand the lifelong information of its human users.

[1] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). React: Synergizing reasoning and acting in language models.

[2] Baddeley, A. (1992). Working memory.

Author Response

Thank you very much for your outstanding insights and suggestions regarding agent memory! The reference you shared, "A Survey on the Memory Mechanism of Large Language Model Based Agents," has been particularly enlightening, and we will include it in our related works. Due to rebuttal length limits, we only reply to questions and leave the discussion (like insights on memory/analysis on the connection between this paper and survey) in official comments.

Firstly, we would like to clarify that this paper places relatively little emphasis on memory. As you have observed, our research aims to shift the focus of multi-agent research from studying a single entity to examining each individual. This requires us to address two main issues: first, the information exchange among agents, specifically the problem of information asymmetry, which is the primary focus of this paper; second, the information exchange between agents and humans, which involves memory, RAG, and other technologies. The latter is a huge topic and the objective of our next research phase; in this paper, we have only made an initial attempt at it.

Q1: evaluation should compare iAgent with other methods

A1: From the very beginning, we considered how to set the baseline. However, we found that,

  1. For the information asymmetry tasks in InformativeBench, the fully ablated iAgents is the most appropriate baseline.
  2. When we tried to modify other multi-agent frameworks to run InformativeBench, these frameworks essentially became iAgents.
  3. We also considered setting up information symmetry, such as constructing a shared memory for all agents as you mentioned, but this cannot serve as a baseline due to different settings (information symmetry/asymmetry). Our research focuses on solving collaboration problems under the premise of information asymmetry, rather than observing the changes before and after information becomes asymmetric. Our related work includes such studies that focus on observing changes [1].
  4. Moreover, we are the first to address this problem and have established the first benchmark. As such, there are no other suitable task-solving multi-agent systems for information asymmetry to serve as baselines (as of the submission time of NeurIPS 2024). We also anticipate that InformativeBench will drive development in this field, making iAgents the baseline for subsequent work.
  5. We use different LLM backends because InformativeBench is a benchmark, and we want to observe the performance of current state-of-the-art LLMs on this benchmark.

[1] Zhou, X., Su, Z., Eisape, T., Kim, H., & Sap, M. (2024). Is this the real life? is this just fantasy? the misleading success of simulating social interactions with llms. arXiv preprint arXiv:2403.05020.


Q2: font size too small

A2: Thank you! We will adjust and improve all the figures and tables in the paper for better reading.


Q3: amount of memory in each agent

A3: We present statistics of memory for all characters and for the main characters (Ross, Rachel, Joey, Monica, Chandler, and Phoebe). The full memory files are included in the submitted software, under the path iAgents/memory/summary_by_scene_v3. The metrics exhibit significant variance because fuzzy memory summarizes information within a session, represented in the FriendsTV dataset as aggregation by scenes within episodes. Many scenes are very brief (possibly consisting of only one or two lines of dialogue), while some main scenes contain much more content, resulting in high variance.

| Metric | All Characters | Main Characters |
|---|---|---|
| Avg #messages in a fuzzy memory | 4.76 | 4.85 |
| Var #messages in a fuzzy memory | 22.05 | 20.52 |
| Avg #words in messages from a fuzzy memory | 79.85 | 81.85 |
| Var #words in messages from a fuzzy memory | 5287.19 | 5097.42 |
| Avg #words of a fuzzy memory | 69.73 | 71.07 |
| Var #words of a fuzzy memory | 1490.69 | 1493.61 |
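For reference, averages and variances like those above can be computed with a short script; the per-session message lists below are illustrative stand-ins for the released memory files, not real data.

```python
# Compute average and (population) variance of message counts and word
# counts per fuzzy-memory session. The sample sessions are illustrative.
from statistics import mean, pvariance

# Each inner list holds the messages of one fuzzy-memory session (scene).
sessions = [
    ["Hey Ross!", "Did you see the exhibit?"],
    ["Dinner at 7.", "Bring wine.", "Monica's place."],
    ["One-line scene."],
]

msg_counts = [len(s) for s in sessions]
word_counts = [sum(len(m.split()) for m in s) for s in sessions]

print("avg #messages per session:", round(mean(msg_counts), 2))
print("var #messages per session:", round(pvariance(msg_counts), 2))
print("avg #words per session:", round(mean(word_counts), 2))
print("var #words per session:", round(pvariance(word_counts), 2))
```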

Q4: could it have a try on considering all the memory context as a part of the prompt

A4: We can indeed place the entire memory into the long-context of LLM. The issue is that the design of iAgents requires both agents to retrieve memory in each round of the conversation, which consumes a significant amount of input tokens. Please refer to Appendix E, the cost section. Even with retrieval, completing a FriendsTV sample requires over 45k tokens. If we do not perform retrieval and directly use all memory, the cost will increase by several orders of magnitude. Additionally, since real human information cannot be limited to 128k tokens, external databases are indispensable. Therefore, such an experiment was not designed from the beginning.

Although it is difficult to complete this experiment due to cost constraints, it is very important: it could compare the information loss that occurs in retrieval pipelines with the information loss that occurs in long-context reasoning [1]. In fact, you can refer to our experiments on the two other datasets, NP and Schedule, in InformativeBench. These two datasets do not require memory construction, as the agents acquire less information and all information is directly placed in the context, which aligns with the experimental setup you mentioned. From the ablation experiments (Figure 5), we can see that under the premise of information asymmetry, we do not even need to examine the LLM's reasoning ability in long contexts. Its reasoning ability in short contexts (context examples of NP and Schedule can be seen in Figures 10, 11, 13, and 14, where the information seen by each agent usually does not exceed 1k tokens) is poor and needs improvement.

[1] Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2024). Lost in the middle: How language models use long contexts.
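For concreteness, the conventional retrieval pipeline being compared against (text embedding → cosine similarity → top-k ranking) can be sketched as follows; the character-hash embedding is a toy stand-in for a real embedding model, and all names are illustrative.

```python
# Toy sketch of embedding-based retrieval: embed texts, score candidates by
# cosine similarity to the query embedding, and return the top-k. The
# "embedding" here is a deterministic stand-in, not a real model.
import math

def embed(text: str, dim: int = 8) -> list[float]:
    # Character-hash bucket counts (stand-in for a learned embedding).
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: str, memories: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]

memories = [
    "Ross talked about the museum exhibit.",
    "Rachel scheduled a meeting on Friday.",
    "Joey ordered two pizzas for dinner.",
]
print(top_k("When is Rachel's meeting?", memories, k=1))
```

The long-context alternative discussed above would instead concatenate all of `memories` into the prompt and let the LLM locate the answer directly, trading retrieval error for token cost.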

I hope the above response addresses your concerns. If you have any further questions, please let us know. Thank you!

Comment

Thanks for the detailed rebuttal by the authors. I would like to raise my score to 6.

Comment

Thank you!

Review
5

This paper studies the asymmetry of information handled by agents that represent users, i.e., each agent can only access the information of its own human user, not that of others. To address this issue, the authors propose Informative Multi-Agent Systems (iAgents) and a benchmark called InformativeBench.

Strengths

  1. The paper is clearly written and easy to follow.
  2. The paper is well-motivated and studies the more practical use case of LLM agents as a society.
  3. The proposed benchmark can encourage further studies on similar problems.

Weaknesses

  1. The ablation studies are not very complete and some baselines are missing. Specifically, it seems to me that InfoNav benefits the most through the recursive communication module. However, the ablation only compares the results with and without communication modules. Have the authors experimented with other naive baselines such as each agent is simply an LLM that only maintains its own memory with recursive communication?
  2. Why does the performance of InformativeAgents in Figure 5 not align with the performance of GPT-3.5 in Table 1?
  3. It is not very clear how the dataset, e.g., FriendsTV, is collected. Appendix H.1.2 seems to contain only the post-processing pipeline of the raw dataset.

Questions

  1. Could the authors provide the number of tokens needed for each base model to run InformativeBench?

Limitations

Yes.

Author Response

Thank you very much for your careful review. Below is a detailed point-by-point response addressing your main concerns.

Q1: InfoNav benefits the most through the recursive communication

A1: InfoNav and recursive communication are two parallel designs within iAgents, and there is no situation where one benefits from another. The former refers to the mechanism of how agents communicate with each other, while the latter refers to whether the agents' communication can spread within a social network. Additionally, please refer to Figure 5, where recursive communication on the NP dataset brought about a 3% improvement, which is less than the 12% improvement brought by InfoNav. On the FriendsTV dataset, recursive communication showed a greater improvement than InfoNav. Therefore, these two designs perform differently on different datasets and one design does not depend on the other. We have analyzed this in our paper (lines 221-225). The original intent of the recursive communication design is to demonstrate the scalability of iAgents, enabling active and widespread communication in large social networks. Hence, in a large social network dataset like FriendsTV, it brings significant performance improvements, while for the other two datasets that focus more on reasoning, InfoNav is indispensable (lines 226-232).


Q2: ablation studies are not very complete

A2: Ablation experiments in Figure 5 are divided into two parts, Figure 5a and Figure 5b. The former specifically shows the ablation of the infoNav mechanism, while the latter shows the ablation of other mechanisms. This is because the experimental settings differ for different datasets (see section G in appendix):

  1. For all datasets, iAgents have the infoNav mechanism enabled, so a complete ablation experiment was conducted on all datasets.

  2. For the NP and Schedule datasets, as the information obtained by each agent is limited and can be stored directly as the LLM's in-context information, iAgents did not activate the mixed memory mechanism, making the ablation of memory unnecessary.

  3. For the Schedule dataset, due to the setup of questions and social relationships in this dataset, recursive communication does not bring additional information. Therefore, iAgents did not enable the recursive communication mechanism, making the ablation of recursive communication unnecessary.

So the ablation experiments of iAgents on all datasets are complete.


Q3: Have the authors experimented with other naive baselines such as each agent is simply an LLM that only maintains its own memory with recursive communication?

A3: I do not quite understand this baseline. A baseline with an agent equipped with memory and recursive communication but without the InfoNav communication mechanism is exactly the first set of experiments from the left in Figure 5a, which is the ablation of InfoNav on FriendsTV (35.71 -> 34.92). We have analyzed it in the paper (lines 221-225). If you can provide a more specific description, we can better address your concerns.


Q4: Why does the performance of InformativeAgents in Figure 5 not align with the performance of GPT-3.5 in Table 1?

A4: The performance of iAgents in Figure 5 is (from left to right):

Figure 5a: 35.71 (FriendsTV), 51.00 (NP), 36.67 (ScheduleEasy), 18.00 (ScheduleMedium), 12.25 (ScheduleHard)

Figure 5b: 35.71 (FriendsTV), 35.71 (FriendsTV), 35.71 (FriendsTV), 51.00 (NP)

The performance of iAgents using GPT 3.5 in Table 1 is (from left to right):

51.00 (NP), 35.71 (FriendsTV), 36.67 (ScheduleEasy), 18.00 (ScheduleMedium), 12.25 (ScheduleHard)

and the numbers are completely aligned. We would be grateful if you can specify which particular number is misaligned, and we will recheck it. If the misunderstanding arises from the inconsistent ordering of datasets in Table 1 and Figure 5, we will further optimize and unify the dataset ordering in the tables and figures.


Q5: It is not very clear how the dataset, e.g., FriendsTV, is collected

A5: As you mentioned, we provide the complete pipeline for constructing the FriendsTV dataset within InformativeBench from the original FriendsQA dataset in Appendix H and the submitted code. Regarding the collection of the original data: since FriendsTV is constructed based on FriendsQA, please refer to the original FriendsQA paper [1]. We will add all details, including the collection methods from the original paper, in the camera-ready version. In brief:

  1. The context, questions, and answers were manually annotated through crowdsourcing by the authors of the original paper. This was a remarkable project that spanned several years [2]. We are very grateful for the contributions of the authors of the original data paper and cited them in the paper.

  2. The original Friends script is publicly available online and can be accessed through multiple channels, such as Kaggle [3].

[1] Yang, Z., & Choi, J. D. (2019, September). FriendsQA: Open-domain question answering on TV show transcripts.

[2] https://www.emorynlp.org/projects/character-mining

[3] https://www.kaggle.com/datasets/gopinath15/friends-netflix-script-data


Q6: Could the authors provide the number of tokens needed for each base model to run InformativeBench?

A6: Please refer to Appendix E, the section on costs, and Appendix F, the fifth point in the section on limitations. We have provided the average token consumption of all models; the numbers vary little across different backend LLMs. iAgents handle information-intensive tasks that consume more input tokens. The unit price of input tokens is significantly lower than that of output tokens, and with the continuous development of long-context LLMs, the cost of iAgents will decrease progressively.

I hope the above response addresses your concerns. If you have any further questions, please let us know. Thank you!

Review
6

The paper presents an innovative approach to addressing the challenge of information asymmetry in multi-agent systems (MAS), a barrier to effective collaboration in various tasks. The paper introduces iAgents (Informative Multi-Agent Systems), designed to navigate and mitigate information asymmetry by enhancing the communication and information exchange capabilities of the agents within a system.

Strengths

To me, this paper features the following strengths:

  1. The InfoNav mechanism for guiding agent communication towards effective information exchange is well-conceived. This structured approach to agent reasoning and communication is an important contribution.
  2. The development of InformativeBench as a benchmark for evaluating task-solving ability under information asymmetry is remarkable, which provides a standardized way to measure the effectiveness of relevant systems.
  3. The experiments are well conducted, demonstrating that iAgents can handle complex networks and large volumes of information efficiently to some extent.

Weaknesses

  1. While the paper mentions several limitations of previous multi-agent system approaches (especially regarding the ability to handle information asymmetry), a more detailed comparative analysis of iAgents with existing methods would strengthen the argument for its superior performance.
  2. The proposed mechanism lacks theoretical foundations or analysis showing, in principle, that iAgents improves the agents' ability to exchange information in the face of asymmetry under certain assumptions or in specific situations.

Questions

  1. How does the newly proposed iAgents mechanism ensure data privacy during information exchange?
  2. Is it possible to further develop any kind of theoretical analysis of the proposed iAgent system with InfoNav and Mixed Memory?

Limitations

Please refer to the above weakness and question section.

Author Response

Thank you for your thorough review and feedback. Below is a detailed point-by-point response addressing your main concerns. Due to rebuttal length limits, we only reply to questions and leave the discussion (like a theoretical discussion on iAgents) in official comments.

Q1: a more detailed comparative analysis of iAgents with existing methods would strengthen the argument for its superior performance.

A1: As you have pointed out, this paper raises the "limitations of previous multi-agent system approaches regarding information asymmetry," and it is exactly these limitations that prevent us from using previous multi-agent systems as baselines in our InformativeBench. Specifically,

  1. Previous multi-agent systems work under the assumption of information sharing, meaning that information is symmetric among all agents. These multi-agent systems cannot function in the asymmetric information tasks of InformativeBench.

  2. Of course, we can make the necessary modifications to the previous multi-agent systems, such as allowing each agent to observe only partial information and encouraging them to actively query information and exchange information with each other so they can run in InformativeBench. However, the system modified in this way is essentially iAgents.

  3. Furthermore, for the NP and Schedule datasets, iAgents do not need to use Mixed Memory (lines 633-634). Therefore, in the ablation experiments on these two datasets (lines 226-232), iAgents without the infoNav mechanism are essentially Vanilla Communicative Agents [1,2] baselines without Mixed Memory and InfoNav. As shown in Figure 5a, compared to such a baseline, iAgents exhibit performance increases ranging from 15% to 26%.

  4. Moreover, we are the first to address this problem and have established the first benchmark. As such, there are no other suitable task-solving multi-agent systems for information asymmetry to serve as baselines (as of the submission time of NeurIPS 2024).

[1] Li, G., Hammoud, H., Itani, H., Khizbullin, D., & Ghanem, B. (2023). CAMEL: Communicative agents for "mind" exploration of large language model society.

[2] Qian, C., Cong, X., Yang, C., Chen, W., Su, Y., Xu, J., ... & Sun, M. (2023). Communicative agents for software development.


Q2: How does the newly proposed iAgents mechanism ensure data privacy during information exchange?

A2: For iAgents, privacy is an unavoidable issue. Please refer to our experiments and discussions on Privacy in Section 6.4 "Analysis on Real World Concern" (lines 277-288) and the discussions in Appendix F "Limitations" (lines 507-521). In summary, we discuss the privacy and security issues of iAgents at three levels:

  1. Privacy Level L1: Users accept providing the necessary information to use iAgents for the whole cooperation process. As mentioned in our paper, absolute privacy protection equates to non-cooperation. Thus, there is a trade-off between the level of privacy protection and the degree of automation in cooperation. If users fully accept iAgents accessing their personal information, they can achieve maximum efficiency in automated cooperation with agents. Even so, iAgents can still offer users settings to control access permissions, such as allowing iAgents to access only specific information authorized by the user for different communication objects. This will balance the user's privacy and cooperation efficiency to the greatest extent (lines 282-288).

  2. Privacy Level L2: Users accept using iAgents for automated communication but wish to keep their personal information private. Under this privacy level, the distributed design of iAgents allows users to deploy private agents on their own devices (edge-side) to handle information exchange between humans and agents, while the information exchange between agents can be handled by cloud-side agents driven by large-scale LLMs. This cloud-edge design paradigm allows iAgents to handle privacy issues flexibly.

  3. Privacy Level L3: Users want iAgents to protect privacy to the maximum extent throughout the process, both in terms of accessing personal information and communication between agents. For this strictest requirement, based on Level L2, we can have the communication between agents completed by agents deployed on private servers or solely by agents driven by small on-device models. Since iAgents are designed for information-intensive tasks with asymmetric information rather than knowledge-intensive tasks, the success rate of task completion mainly depends on whether information acquisition and interaction are sufficient, rather than the knowledge memory ability of LLMs themselves. Therefore, it can be implemented with small edge-side LLMs. Additionally, in Section 6.4 of the paper, we also attempted some prompt experiments where iAgents can securely complete cooperative tasks using vague references without leaking additional information.

I hope the above response addresses your concerns. If you have any further questions, please let us know. Thank you!

Comment

Thank you very much for your very detailed responses to my concerns and the potential discussions. I currently have no further questions. I would maintain my score and advocate for acceptance.

Comment

Thank you!

Comment

Discussion: Is it possible to further develop any kind of theoretical analysis

Reply to Discussion: We are delighted that you are interested in the theoretical foundations and developments of multi-agent collaboration under information asymmetry! Because we need pages to introduce the new problem, new benchmark, and new methods, we do not have space in the main body of the paper to provide a detailed theoretical exposition. Instead, we have cited relevant foundational literature and provided brief explanations (lines 32-34, 79-85, 116-140, 107-110). Here, we offer a more detailed introduction to the theoretical foundations of iAgents:

  1. iAgents are a class of communicative agents [1,2], modeling the communication between agents as a Markov Decision Process. The agent's actions consist of generating each utterance in the communication, and the state represents the progress of the current task (lines 116-140). For any given agent, its environment comprises the responses of the other agents it is communicating with, which is why information asymmetry arises: each agent can only partially observe the environment, since it perceives the utterances of other agents but not the entirety of the information they possess.

  2. Furthermore, we model the agents' communication as a ReAct [3] process (lines 100-106), incorporating reasoning and acting into communicative agents. Thus, like ReAct, the theoretical foundation of iAgents is rooted in cognitive science, including inner speech [4], strategization [5], and working memory [6]. Building on ReAct, iAgents introduces the process of reasoning and acting into two types of information interactions (lines 107-110): interactions between agents and humans and interactions among agents themselves.

  3. The above points cover the theoretical foundation of iAgents. As for the issue of information asymmetry, its theoretical basis can be traced to two origins. One comes from Agent Modeling Agent [7] research in the field of Multi-Agent Reinforcement Learning (MARL), where agents, under the constraints of a partially observable environment, model the intentions of other agents to maximize their own utility despite imperfect information. The other derives from the theory of mind [8] (lines 32-34), where agents learn to model the high-order mental states of other agents. iAgents draw on research from both fields, proposing not only that agents model other agents but also introducing the infoNav mechanism, which explicitly maintains the communication state between agents and fosters effective collaboration under conditions of information asymmetry.
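As a toy illustration of maintaining an explicit communication state, progress can be sketched as a plan of named information slots that get filled as utterances arrive. The slot names and fill logic are hypothetical simplifications, not the paper's InfoNav implementation:

```python
# Toy sketch: track communication progress as a plan of unfilled
# information slots, in the spirit of an explicit communication state.
# Slot names and the fill logic are hypothetical simplifications.
class InfoPlan:
    def __init__(self, slots: list[str]):
        self.slots = {name: None for name in slots}

    def fill(self, name: str, value: str) -> None:
        if name in self.slots:
            self.slots[name] = value

    def missing(self) -> list[str]:
        return [n for n, v in self.slots.items() if v is None]

    def complete(self) -> bool:
        return not self.missing()

plan = InfoPlan(["meeting_time", "meeting_place"])
plan.fill("meeting_time", "Friday 3pm")   # learned from another agent's utterance
print(plan.missing())                     # slots still to be asked about
plan.fill("meeting_place", "Central Perk")
print(plan.complete())
```

The point of such an explicit state is that the remaining unfilled slots directly suggest what the agent should ask next, rather than leaving progress implicit in the dialogue history.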

[1] Li, G., Hammoud, H., Itani, H., Khizbullin, D., & Ghanem, B. (2023). CAMEL: Communicative agents for "mind" exploration of large language model society.

[2] Qian, C., Cong, X., Yang, C., Chen, W., Su, Y., Xu, J., ... & Sun, M. (2023). Communicative agents for software development.

[3] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). React: Synergizing reasoning and acting in language models.

[4] Alderson-Day, B., & Fernyhough, C. (2015). Inner speech: Development, cognitive functions, phenomenology, and neurobiology.

[5] Fernyhough, C. (2010). Vygotsky, Luria, and the social brain. Self and social regulation: Social interaction and the development of social understanding and executive functions.

[6] Baddeley, A. (1992). Working memory.

[7] Raileanu, R., Denton, E., Szlam, A., & Fergus, R. (2018, July). Modeling others using oneself in multi-agent reinforcement learning.

[8] Premack, D., & Woodruff, G. (1978). Does the chimpanzee have a theory of mind?

Final Decision

This paper introduces an innovative approach to addressing the challenge of information asymmetry in multi-agent systems (MAS), particularly in scenarios where each agent can only access the information of its own human user, with limited access to the information of others. The proposed framework, iAgents, leverages InfoNav and Mixed Memory mechanisms to enhance agent communication and information exchange under these constraints. The authors also introduce InformativeBench, the first benchmark tailored to evaluating MAS under information asymmetry. The iAgents framework represents a significant step forward in the design of multi-agent systems capable of operating under information asymmetry.

The proposed approach is intuitive and innovative. It closely aligns with practical applications and presents greater challenges than knowledge-intensive tasks. The experiments conducted are robust, demonstrating the effectiveness of iAgents in handling complex networks and large volumes of information efficiently. The introduction of InformativeBench as a benchmark for evaluating task-solving ability under information asymmetry is a notable contribution. The paper also provides a solid theoretical basis for the proposed framework. A thorough discussion of the theoretical side was also posted during the rebuttal phase.

The novelty of the approach, combined with its solid theoretical grounding and practical applicability, makes this paper a valuable contribution to the field. Therefore, I recommend its acceptance.