Counterfactual Effect Decomposition in Multi-Agent Sequential Decision Making
We address the challenge of explaining the total counterfactual effect of an agent’s action on the outcome of a realized scenario in multi-agent Markov decision processes.
Abstract
Reviews and Discussion
This paper proposes an approach to decomposing the counterfactual effect of the actions taken by agents comprising a multi-agent Markov decision process into the effect arising directly from the other agents' resultant behaviours, and the effect coming from the resultant environmental dynamics. This provides a way to explain counterfactual outcomes in multi-agent sequential decision making settings.
Strengths
The paper is well-written and covers a topic that I see will be very useful in the study of multi-agent systems. Using causal inference methods to understand the importance of agent decisions in complex multi-agent systems is something that will help scientists to reason about such systems. The authors also provide a good discussion of the relation of their approach to a variety of pre-existing works and notations of counterfactual effect decomposition (causal mediation formula etc.).
Weaknesses
- Some of the notation could be better explained. For example (and sorry if I missed this explanation somewhere in the paper), for someone like me (perhaps an example of someone who is not fully a member of this particular research subfield), it wasn't clear to me why it is necessary to write and why someone couldn't instead just write . Having consulted other papers (e.g., some of the references in the current paper), it does indeed seem like the former is standard notation, but it might benefit the reader, and make the paper more accessible to those not directly in this research subfield, to explain what exactly the subscripts mean in these expectations and why it is necessary to both "condition" on and include a subscript in some of these expectations.
- Both experiments considered only two interacting agents simulated over relatively short time horizons, and the approach isn't demonstrated on larger-scale systems. I appreciate that this is explicitly discussed as a limitation in the Discussion section, however.
Questions
- Could you please clarify the notation I mentioned in Weakness 1) above? (Just to be sure I understood the paper as I think I did!)
- To what extent does the complexity of the agents themselves contribute to the complexity of computing these scores? Some multi-agent systems consist of large populations of boundedly rational agents making decisions according to relatively simple rules, or according to "satisficing" criteria. This is the opposite extreme to the examples considered in this paper, which consist of a very small number of agents that nonetheless have good domain expertise/cognitive capabilities (e.g., your LLM-assisted RL agents). Does using agents with simpler cognitive rules mean it is easier to scale up these scores to larger numbers of agents?
Thank you for your valuable feedback and positive score. We are glad to see that you find our paper well-written and useful for multi-agent systems, and our discussion of related work good. Please find below our response to your comments and questions. For references, please see our general comment.
Response to Comments and Questions
Some of the notation could be better explained. For example ... in some of these expectations. Could you please clarify the notation I mentioned in Weakness 1) above? (Just to be sure I understood the paper as I think I did!)
Thank you for your feedback (from Weakness 1) and your question. Our intention is indeed to make the paper accessible to researchers who may not be fully immersed in the intersection of counterfactual reasoning and decision-making. While we did explain the meaning of the subscript in Section 2.3 of the original manuscript, we recognize that we did not explicitly clarify the notation as you pointed out. This was an oversight, and we have now addressed it in the revised manuscript. Thank you for bringing this to our attention.
Regarding your clarification question, denotes the counterfactual probability of taking the value in the modified model , conditioned on having been realized in . This is equivalent to . We choose the former notation because we believe it allows for a clearer distinction between quantities such as (e.g., in Definition 3.1) and (e.g., in Definition 3.2).
Additionally, addressing a related comment from Reviewer N9FM, we have also included a table in Appendix B summarizing all the causal quantities and key notations introduced in our paper to further enhance its clarity and accessibility.
Both experiments considered only two interacting agents simulated over relatively short time horizons, and the approach isn't demonstrated on larger-scale systems. I appreciate that this is explicitly discussed as a limitation in the Discussion section, however.
Thank you for your feedback. We recognize that evaluating and improving the scalability of our approach would be important for its practical applicability in systems with larger numbers of agents and longer horizons. However, we view this as an independent research challenge that, while important, lies somewhat outside the main scope of this paper. Our primary focus here is to provide an interpretable solution to the problem of counterfactual effect decomposition in multi-agent MDPs.
Although there are plenty of interesting multi-agent settings with only a few agents, e.g., AI assistants or AI supervision, we believe that empirically evaluating, within our framework, the mitigation strategies against computational complexity proposed in our Appendix constitutes a meaningful future research direction.
Lastly, as a point of clarification, our Gridworld environment involves three agents: an LLM Planner and two RL actors.
To what extent does the complexity of the agents ... it is easier to scale up these scores to larger numbers of agents?
This is a very interesting point of discussion. The answer to your question is yes, provided that increased agent capability implies increased inference time. The reasoning is the following: our approach to estimating counterfactual effects involves sampling trajectories from the posterior distribution and then averaging the values of the response variable across these trajectories. Sampling a trajectory from the posterior distribution generally requires prompting each agent once for every counterfactual state in which it needs to act.
For reference, in the Gridworld environment, more than % of the time required to sample one counterfactual trajectory is spent on the inference of the LLM agent, while the remaining % is shared between the two RL agents (raw values and additional details on the compute time of our experiments are included in the README file of our code). Consequently, if we were to use an LLM agent with reduced cognitive capabilities, and hence shorter inference time, the scalability of our approach in this experiment would improve significantly.
We have added this discussion to Appendix I of our revised manuscript. Thank you for the insightful observation.
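The cost argument above can be made concrete with a minimal sketch. Everything in it is an illustrative assumption rather than the authors' code: `CountingAgent`, the toy additive transition, and the use of a plain forward rollout in place of a true posterior sampler are all hypothetical. It only shows that a sample-and-average estimator queries every agent once per time step of every sampled trajectory, so per-query inference time multiplies directly into the total cost.

```python
import random
from statistics import mean


class CountingAgent:
    """Hypothetical agent wrapper that counts how often its policy is queried."""

    def __init__(self, policy):
        self.policy = policy
        self.calls = 0

    def act(self, state, rng):
        self.calls += 1
        return self.policy(state, rng)


def sample_trajectory(agents, initial_state, horizon, rng):
    """Roll out one trajectory; every agent is queried once per time step."""
    state, trajectory = initial_state, []
    for _ in range(horizon):
        actions = [agent.act(state, rng) for agent in agents]
        state = state + sum(actions)  # toy transition, purely illustrative
        trajectory.append(state)
    return trajectory


def estimate_expected_response(agents, initial_state, horizon, num_samples, seed=0):
    """Monte Carlo estimate: average the response (final state) over samples."""
    rng = random.Random(seed)
    responses = [
        sample_trajectory(agents, initial_state, horizon, rng)[-1]
        for _ in range(num_samples)
    ]
    return mean(responses)


# Three toy agents that each pick 0 or 1 uniformly at random (an assumption).
agents = [CountingAgent(lambda s, rng: rng.choice([0, 1])) for _ in range(3)]
estimate = estimate_expected_response(agents, initial_state=0, horizon=5, num_samples=200)
total_calls = sum(agent.calls for agent in agents)  # agents * horizon * samples
```

In this sketch the number of policy queries is the product of the number of agents, the horizon, and the number of samples, which is why a slow agent such as an LLM planner can dominate the running time.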
Conclusion
We thank you again for your comments and questions. We would be happy to answer anything else in addition.
Thanks for your response. I think adding a table to summarise notation is a good idea and will help readability a lot. And I agree that the additional discussion point is interesting and will be helpful for the readers, so thank you for adding this. I'm happy to maintain my positive score with these additions.
Thank you for your kind response and maintaining your positive score. We are glad to see that you find our additions helpful for the readers.
This paper presents a causal effect attribution approach that decomposes the total counterfactual effect into an agent-specific effect and an inverse state transition effect within the framework of a multi-agent Markov Decision Process (MDP). The authors further refine this decomposition at a granular level to attribute causal effects to individual agents and states. Comprehensive simulations are provided to illustrate the proposed decomposition formula, with detailed discussions on future directions.
Overall, the paper is well-written and well-structured, with ideas that are direct and easy to follow, supplemented by clear examples. While there are minor concerns regarding the degree of innovation and scalability, this work nonetheless advances the understanding of causal effects in complex multi-agent sequential decision-making systems, which I believe will be valuable for guiding future research in decision-making.
Strengths
- The motivation for this work is clear, with significant relevance to causal attribution in multi-agent MDPs.
- The theoretical decomposition is accompanied by detailed explanations, and I appreciate the authors' efforts in providing sufficient context in Sections 4 and 5.
Weaknesses
- The work appears to be a direct extension of Triantafyllou et al. (2024) in decomposing the agent-specific effect and Janzing et al. (2024) in decomposing the reverse state-specific effect. The main innovation of this work, as I understand, is the extension to a multi-agent MDP system. However, it is not clear which specific challenges of the multi-agent MDP framework are addressed that go beyond these previous works. For instance, if the primary challenge arises from the sequential decision-making nature of MDPs, it would be beneficial to further emphasize how the authors handle these complexities.
- The paper addresses the computational complexity of counterfactual effect decomposition with a large number of agents and time steps, and I appreciate the discussion in Appendix H on approximation and scalability improvements. However, the simulations in the main paper are limited to two agents. It would be informative to see how the method performs with more agents (e.g., or ) and how the results might differ. Additionally, has the performance of the approximation methods in Appendix H been empirically evaluated, or is it based solely on theoretical considerations? Some empirical analysis of the time-accuracy tradeoff would strengthen the discussion.
Questions
- In Lines 242-250, the authors discuss connections between this work and mediation analysis in causal inference, where the total effect is typically decomposed into direct and indirect effects. In the multi-agent MDP setting, however, the total effect is expressed as a difference (rather than a sum) between the ASE and SSE. Could the authors elaborate on how this decomposition aligns with classical mediation analysis? Is the similarity primarily conceptual (i.e., decomposing into two parts), or is there a more detailed mapping between single-stage mediation analysis and the multi-agent MDP framework?
- The proposed causal explanation framework relies on counterfactual estimation under a fixed action assignment sequence, which in turn requires accurate and consistent counterfactual estimates with minimal variance. In many real-world applications involving only observational data, would it be overly challenging to apply this decomposition using off-policy evaluation methods? I am curious to hear the authors’ thoughts on the feasibility of such approaches.
Thank you for your valuable feedback and positive score. We are glad to see that you find our paper well-written and well-structured, and our work clearly motivated and with significant relevance to causal attribution. Please find below our response to your comments and questions. For references, please see our general comment.
Response to Comments and Questions
The work appears to be a direct extension ... it would be beneficial to further emphasize how the authors handle these complexities.
Thank you for your comment. Let us address it step by step.
Total agent-specific effect. Our approach to decomposing the total ASE, termed ASE-SV, builds on the notion of agent-specific effects introduced by [5]. Fortunately, ASE is already well-defined in the MMDP setting, allowing us to leverage this concept directly. However, it is crucial to clarify that simply summing the agent-specific effects of individual agents does not yield the total ASE. In fact, our experiments in the Sepsis environment reveal discrepancies of up to 95% in certain scenarios. To derive an efficient attribution method, we formulated the problem as a cooperative game [7]. This formulation enabled us to define Shapley value in the setting of agent-specific effects. Furthermore, in Section 5 and Appendix E, we explain and formally define in this context a set of well-known desirable properties uniquely satisfied by Shapley value.
Reverse state-specific effect. Our approach to decomposing r-SSE, termed r-SSE-ICC, builds on the concept of intrinsic causal contributions (ICC) introduced by [8]. ICC is an information-theoretic measure quantifying how much an observed variable contributes to the uncertainty of another variable in a general SCM. We argue that extending ICC to attribute the uncertainty in a complex counterfactual effect, such as r-SSE, to the state variables of an MMDP-SCM is not a straightforward process. A challenge specific to MMDP-SCMs, for instance, is that action variables, and not only state variables, may directly or indirectly influence the variation in r-SSE.
General comment on contributions. We acknowledge that the theoretical results in this paper, i.e., Theorem 3.3 and Theorem 4.2, are not challenging to prove with the appropriate technical background. However, we view our work more as foundational, addressing a newly proposed and complex problem that is highly relevant to accountability in multi-agent AI systems. The primary contribution of this work lies in integrating concepts from a wide range of fields, including multi-agent sequential decision making (MMDPs), counterfactual reasoning (ASE), mediation analysis (Theorem 3.3), cooperative game theory (ASE-SV), information theory (r-SSE-ICC) and more, to tackle this novel problem. Finally, the empirical evaluation of our proposed solution demonstrates that it is interpretable and aligns with standard intuitions. This validates the potential of our approach to inform future research in this area.
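The Shapley-value attribution mentioned under "Total agent-specific effect" can be illustrated with a small, generic sketch. It is not the paper's ASE-SV definition: the two players, the `effects` table, and the function name `shapley_values` are hypothetical, and the coalition values are chosen only so that the individual effects do not add up to the joint effect. The sketch computes exact Shapley values by averaging marginal contributions over all orderings of the players.

```python
from itertools import permutations
from math import isclose


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for ordering in orderings:
        coalition = frozenset()
        for player in ordering:
            with_player = coalition | {player}
            totals[player] += value(with_player) - value(coalition)
            coalition = with_player
    return {p: totals[p] / len(orderings) for p in players}


# Hypothetical coalition effects: the two individual effects sum to 0.5,
# while the joint effect of both agents is 1.0.
effects = {
    frozenset(): 0.0,
    frozenset({"A"}): 0.25,
    frozenset({"B"}): 0.25,
    frozenset({"A", "B"}): 1.0,
}

phi = shapley_values(["A", "B"], effects.__getitem__)
naive_sum = effects[frozenset({"A"})] + effects[frozenset({"B"})]
total_effect = effects[frozenset({"A", "B"})]
```

Here the attributions in `phi` sum to `total_effect` (the efficiency property), whereas `naive_sum` does not. Enumerating all orderings is exponential in the number of players, so this form is only practical for a handful of agents.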
The paper addresses the computational complexity ... time-accuracy tradeoff would strengthen the discussion.
First, we would like to thank you for taking the time to review the (optional) Appendix and for your feedback. We recognize that evaluating and improving the scalability of our approach would be important for its practical applicability in systems with larger numbers of agents and longer horizons. However, we view this as an independent research challenge that, while important, lies somewhat outside the main scope of this paper. Our primary focus here is to provide an interpretable solution to the problem of counterfactual effect decomposition in multi-agent MDPs.
To address the second part of your question, the mitigation strategies proposed in our Appendix are grounded in theoretical considerations and have not yet been tested empirically within our setting. Although there are plenty of interesting multi-agent settings with only a few agents, e.g., AI assistants or AI supervision, we see testing these strategies empirically as a meaningful future research direction.
Lastly, as a point of clarification, our Gridworld environment does involve three agents: an LLM Planner and two RL actors.
In Lines 242-250, ... mapping between single-stage mediation analysis and the multi-agent MDP framework?
Thank you for your question. To address it, let us first clarify a potential misunderstanding. Formally, according to the causal mediation formula, the total causal effect of a treatment on outcome , relative to a reference value , is equal to the difference, and not the sum, between the natural direct effect of on , relative to , and the natural indirect effect of on relative to . The latter is also referred to as the reverse transition in the literature [4].
In our paper, we draw a connection between our proposed explanation formula and the standard causal mediation formula, based on the fact that they both follow a similar principle in decomposing total effects. However, we would like to emphasize that we do not claim the existence of a direct mapping between the individual components of the two approaches. The causal mediation formula analyzes the propagation of an effect through causal paths. In contrast, our approach analyzes the propagation of an effect through its influence on the agents' behavior and the environment dynamics of an MMDP. Since the two frameworks are fundamentally distinct, we do not believe a direct mapping between their components exists.
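For concreteness, the causal mediation formula discussed above can be written out as follows. The symbols are assumptions introduced here for illustration and are not necessarily the paper's notation: a treatment $X$ with reference value $x_0$ and alternative value $x_1$, a mediator $Z$, and an outcome $Y$, following the standard statement of the formula. The last line restates the decomposition of Theorem 3.3 as it is described in this discussion, to show the shared "difference, not sum" structure; no component-wise correspondence is implied.

```latex
\begin{align*}
\mathrm{TE}_{x_0,x_1}(Y)  &= \mathbb{E}[Y_{x_1}] - \mathbb{E}[Y_{x_0}], \\
\mathrm{NDE}_{x_0,x_1}(Y) &= \mathbb{E}[Y_{x_1, Z_{x_0}}] - \mathbb{E}[Y_{x_0}], \\
\mathrm{NIE}_{x_1,x_0}(Y) &= \mathbb{E}[Y_{x_1, Z_{x_0}}] - \mathbb{E}[Y_{x_1}]
  \quad \text{(reverse transition)}, \\
\mathrm{TE}_{x_0,x_1}(Y)  &= \mathrm{NDE}_{x_0,x_1}(Y) - \mathrm{NIE}_{x_1,x_0}(Y), \\[4pt]
\mathrm{TCFE} &= \text{tot-ASE} - \text{r-SSE}.
\end{align*}
```

Subtracting the third line from the second cancels the nested term $\mathbb{E}[Y_{x_1, Z_{x_0}}]$ and leaves the first line, which is why the indirect effect enters with a minus sign once it is taken in the reverse direction.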
The proposed causal explanation framework relies on ... I am curious to hear the authors’ thoughts on the feasibility of such approaches.
Thank you for your question. It is true that causal assumptions such as weak noise monotonicity, although necessary for counterfactual identifiability, may restrict the applicability of effect decomposition approaches in more practical domains where only observational data are available. This is a common trade-off in counterfactual reasoning, as highlighted in the references from our Discussion section.
In response to your comment and similar ones from other reviewers, we have conducted additional experiments to evaluate the robustness of our results to the noise monotonicity assumption. A concise summary of these experiments is included in our General Comment (Additional Experiments Evaluating Robustness to Noise Monotonicity), with the complete set of results detailed in Appendix M of the revised manuscript.
The results of these experiments show that our conclusions from Section 6.2 are rather robust to violations of noise monotonicity (especially conclusions related to the average percentage decomposition). This is encouraging, as it provides evidence that our causal explanation formula can yield valuable insights in simulated domains where: (a) effect sizes and randomness reflect real-world settings [9], (b) only observational data are available, and (c) our theoretical assumptions might not hold, as often happens in practice.
We would like to conclude by acknowledging that the accuracy of counterfactual estimates can be unstable in scenarios where causal assumptions do not hold. Thus, we advocate for performing thorough sensitivity analyses of such assumptions before applying approaches like ours outside of simulations, especially in high-stakes real-world domains. We deem this to be an exciting as well as meaningful future research direction.
Conclusion
We thank you again for your comments and questions. We would be happy to answer anything else in addition.
I appreciate the authors' detailed responses to my questions. The responses helped alleviate my concerns about practical application limitations, such as those related to decision-making levels or scenarios involving a larger number of agents (though I still hold some reservations, as this is a complex challenge that cannot be fully addressed in a single paper). Nevertheless, this paper provides a novel perspective on decomposing total counterfactuals, supported by comprehensive studies in both simulations and real data analysis. The majority of my questions have been addressed, and I therefore maintain my positive score for this paper.
Thank you for your response. We are glad to see that our responses helped alleviate your concerns about practical applications and addressed most of your questions.
Please let us know if there is any other concern that we could address, which could potentially help increase your score. We ask mainly because you mentioned that your concerns were alleviated and most of your questions were addressed. From your review (especially the second question) and your high confidence score, we gather that you have a good understanding of the field and of our paper. Any additional feedback would be highly appreciated.
We understand that you have already spent a substantial amount of time reviewing this paper, and we appreciate that. If you decide not to respond further, that is entirely understandable. Thank you again for your valuable feedback.
This paper proposes new counterfactual measures that can be used to explain the observed behaviors of multi-agent systems. First, the authors proposed the total counterfactual effect, which measures the difference in the expected reward if every agent took a different action compared to the observed trajectories (as a baseline). The authors then decomposed the total counterfactual effect into two components: one measuring the effect that propagates through all subsequent agents’ actions, called the total agent-specific effect, and another related to the effect that propagates through the state transitions, called the reverse state-specific effect. Building on the method of path analysis in structural causality, the authors then further decomposed the total agent-specific effect into a series of more refined agent-specific effects. Counterfactual measures based on the Shapley value are proposed to attribute the effect to each individual agent. Similarly, the authors also proposed more refined counterfactual measures, based on the notion of intrinsic causal contributions, to decompose the reverse state-specific effects. These proposed counterfactual measures could be seen as explanations evaluating the influence of underlying causal mechanisms on the difference in the observed outcome.
Strengths
- The proposed measures seem reasonable. All parametric assumptions have been explicitly stated. The proposed methods seem technically sound.
- Comprehensive simulations were performed. The results support the proposed algorithms.
Weaknesses
- The identification of the proposed counterfactual measures relies on the additional parametric assumption of weak monotonicity. This assumption is restrictive and may limit the paper's application to many practical domains.
- The paper is quite notation dense. It would be best improved by including a table summarizing all the proposed measures and necessary notations.
- The introduction of each new counterfactual measure could be more gentle. This might be challenging for readers unfamiliar with counterfactual notations and structural causality. I suggest including a detailed example following each definition of the proposed counterfactual measure.
Questions
- Can the authors provide more detailed examples for each proposed measure? For instance, can the authors explain these measures and the decomposition in the context of Figure 1 or Figure 2a? What causal mechanisms are these metrics evaluating, and what are their implications?
Thank you for your valuable feedback and positive score. We are glad to see that you find our approach reasonable and technically sound, and our simulations comprehensive and supporting our algorithms. Please find below our response to your comments and questions. For references, please see our general comment.
Response to Comments and Questions
The identification of the proposed counterfactual measures relies on the additional parametric assumption of weak monotonicity. This assumption is restrictive and may limit the paper's application to many practical domains.
We acknowledge that causal assumptions such as weak noise monotonicity, although necessary for counterfactual identifiability, may restrict the applicability of our effect decomposition approach in more practical domains. This is a common trade-off in counterfactual reasoning, as highlighted in the references from our Discussion section.
We have conducted additional experiments to evaluate the robustness of our results to the noise monotonicity assumption. A concise summary of these experiments is included in our General Comment (Additional Experiments Evaluating Robustness to Noise Monotonicity), with the complete set of results detailed in Appendix M of the revised manuscript.
The paper is quite notation dense. It would be best improved by including a table summarizing all the proposed measures and necessary notations.
Thank you for your suggestion. In response, we have added a table in Appendix B of the revised manuscript that summarizes all the introduced causal quantities along with the key notations used in the paper. Additionally, we have included a reference to this table in the main text to guide readers more effectively.
The introduction of each new counterfactual measure ... of the proposed counterfactual measure. Can the authors provide more detailed examples for each proposed measure? For instance, can the authors explain these measures and the decomposition in the context of Figure 1 or Figure 2a? What causal mechanisms are these metrics evaluating, and what are their implications?
Thank you for your suggestion. We would like to kindly point out that our (submitted) manuscript already explains in Section 6.1 all the introduced counterfactual measures, alongside the metrics they are evaluating, in the context of Figure 2a, particularly in the paragraph titled Counterfactual effects. Additionally, in the subsequent paragraphs of the same section we present the outputs of our decomposition formula for the considered simulated scenario.
In response to a related comment made by Reviewer NMck, we have included in our revised manuscript a new paragraph at the end of Section 3, where we revisit and highlight the subtleties of all causal quantities from that section.
Conclusion
We thank you again for your comments and questions. We would be happy to answer anything else in addition.
Thank you once again for your valuable feedback. As the deadline for revising our paper is approaching, we would like to check if you had any additional comments or questions. We hope that our rebuttal addresses your concerns and we would be happy to answer any further questions you might have.
I thank the authors for their responses. My concerns regarding the Noise Monotonicity assumption remain. This paper could be most improved by employing a partial identification approach (Manski, 1990). Indeed, there exists a canonical family of generative models that can represent all counterfactual distributions in a causal model (Zhang et al., 2022). A reliable bound over target counterfactual effects can be obtained by optimizing over the feasible region over candidate canonical models compatible with the observational data. Due to these reasons, I am keeping my current score.
- Zhang, Junzhe, Jin Tian, and Elias Bareinboim. "Partial counterfactual identification from observational and experimental data." International conference on machine learning. PMLR, 2022.
Thank you for your response and insightful suggestion.
Thank you once more for your valuable feedback. Following up on your suggestion, we have carefully reviewed the paper you provided. The proposed approach is indeed compatible with our setting, where endogenous variables are assumed discrete and finite. However, deriving exact bounds in our context, following this approach, appears to be infeasible, particularly in environments with large state and/or action spaces, such as Sepsis. This highlights the need for efficient algorithms to approximate bounds on our target counterfactual effects.
One potential solution to this challenge is the MCMC algorithm proposed by Zhang et al. (2022). While this approach seems reasonable and promising, to the best of our knowledge, it has not been empirically tested in more complex frameworks like the MMDP-SCM described in our paper. Without further investigation, we cannot say with certainty whether this method would scale effectively and thus be able to support our decomposition formula in practice.
Nonetheless, we find the work of Zhang et al. (2022) to be a compelling alternative to our effect estimation approach in scenarios where: (a) assuming Noise Monotonicity might not be practical, and (b) informative bounds over our target counterfactual effects can be drawn efficiently from observational data.
In the next iteration of our paper, we plan to include this discussion in Section 7. Thank you once again for your insightful feedback and for pointing us toward this valuable direction.
This paper proposes a causal explanation framework for multi-agent MDPs, by decomposing the total counterfactual effect of an intervention into a part that influences the agents’ behavior and a part that depends on environment dynamics.
Strengths
- Exposition is clear.
- Good use of the sepsis example, which helps contextualize the abstract math throughout the text.
- Principled development of mathematical tools to achieve the desired effect decomposition.
Weaknesses
- The mathematical notation is quite dense and at times a little hard to grasp. For example, Def 2.1, 3.1, 3.2 all look similar but clearly they are different. The authors included textual descriptions of these mathematical definitions (individually) which is great, but I would appreciate a paragraph comparing these definitions and highlighting the differences (for example, Y vs Y_{a_{i,t}}).
Questions
- Percentage conversion: Theorem 3.3 suggests the "total counterfactual effect" can be decomposed into "total agent-specific effect" minus "reverse state-specific effect". How are the two parts converted to a percentage contribution, for example in the pie chart in Fig 1b? It might be useful to show the raw effect values as well.
- L370: cells denoted with stars are delivery locations - where are these?
Thank you for your valuable feedback and positive score. We are glad to see that you find the exposition of our paper clear and our development of mathematical tools principled. Please find below our response to your comments and questions. For references, please see our general comment.
Response to Comments and Questions
The mathematical notation is quite dense and at times a little hard to grasp. For example, Def 2.1, 3.1, 3.2 all look similar but clearly they are different. The authors included textual descriptions of these mathematical definitions (individually) which is great, but I would appreciate a paragraph comparing these definitions and highlighting the differences (for example, Y vs Y_{a_{i,t}}).
Thank you for your feedback. In response, we have added a dedicated paragraph in Section 3 of the revised manuscript to incorporate your suggestion. We would also like to point out that Section 6.1 contains a similar discussion, which uses the Gridworld environment to provide a visual comparison of these definitions and better illustrate their subtleties.
Additionally, based on a related comment from Reviewer N9FM, we have also included a table in Appendix B summarizing all the introduced causal quantities and key notations to further enhance the clarity and accessibility of the paper.
Percentage conversion: Theorem 3.3 suggests the "total counterfactual effect" can be decomposed into "total agent-specific effect" minus "reverse state-specific effect". How are the two parts converted to a percentage contribution, for example in the pie chart in Fig 1b? It might be useful to show the raw effect values as well.
Thank you for your suggestion. In this simulated scenario, the total counterfactual effect (TCFE) is equal to 0.82. The total agent-specific effect (tot-ASE) is approximately 0.374, and the negative of the reverse state-specific effect (r-SSE) is approximately 0.446. Consequently, our method attributes % of the TCFE to how the two agents would have responded to the intervention, while the remaining % is attributed to the intervention's influence on the patient state. Following your feedback, we have updated the manuscript to include the raw effect values as well.
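The percentage conversion can be reproduced from the raw values quoted above with a few lines of arithmetic. This is a sketch under an assumption that the response does not spell out: each percentage is taken to be the corresponding component (tot-ASE, or the negative of r-SSE) divided by their sum, which equals the TCFE by Theorem 3.3. The function name `percentage_decomposition` is hypothetical.

```python
from math import isclose


def percentage_decomposition(tot_ase, neg_r_sse):
    """Share of each component in the total effect, in percent.

    Assumes TCFE = tot_ase + neg_r_sse and that both components
    have the same sign, so the shares are meaningful percentages.
    """
    total = tot_ase + neg_r_sse
    return 100 * tot_ase / total, 100 * neg_r_sse / total


# Raw values quoted in the response (both approximate).
tot_ase = 0.374
neg_r_sse = 0.446  # negative of the reverse state-specific effect

agent_share, state_share = percentage_decomposition(tot_ase, neg_r_sse)
tcfe = tot_ase + neg_r_sse  # matches the quoted TCFE of 0.82 up to rounding
```

With these inputs, `agent_share` is about 45.6 and `state_share` about 54.4. If the two components had opposite signs, a share-of-total reading would no longer be a valid percentage split, so the same-sign assumption matters.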
L370: cells denoted with stars are delivery locations - where are these?
In Figure 2a, delivery locations, denoted with stars, are located in the leftmost column of the Gridworld.
Conclusion
We thank you again for your comments and questions. We would be happy to answer anything else in addition.
Thank you for the response. I think the edits to Section 3 and the percentage conversion help improve the clarity of the paper. I'll maintain my already positive rating.
Thank you for your response and for maintaining your positive score. We are glad to see that you find that our additions improve the clarity of the paper.
The paper investigates the counterfactual effect of an agent's action within a multi-agent Markov decision process (MDP) environment by decomposing it into two components: effects propagating through subsequent actions and effects propagating through subsequent states. For the first component, the study introduces agent-specific effects to assess the contribution of each agent. For the second component, the reverse state-specific effect (r-SSE), the paper applies the concept of intrinsic causal contributions (ICC) to measure the impact of each state.
This decomposition is evaluated across two experimental environments, with results providing insights into the distinct roles that agents and the environment play in shaping the overall effect. The paper discusses the implications and highlights the contributions of each factor in influencing outcomes, which is helpful in the decision-making process.
Strengths
- This paper addresses an interesting and meaningful topic by explaining the counterfactual effect in MMDPs through a two-part decomposition, which enables a clear, separate bi-level evaluation of each component's contribution.
- The paper is well-written, and the proposed causal explanation formula is concise. Modifying the state-specific effect into a reverse state-specific effect, which can be combined with agent-specific effects, offers a reasonable way to evaluate counterfactual effects.
- The experimental results seem promising, particularly in terms of supporting causal interpretation.
Weaknesses
- The explanation of the decomposition of TCFE, especially for r-SSE, lacks clarity and is somewhat difficult to follow, since r-SSE is a reversed version of the state-specific effect. Including a toy example or a causal graph for Theorem 3.3 would improve comprehension.
- The decision-making process in the Gridworld experiment (Section 6.1) is not clearly conveyed. A simplified trajectory or causal graph (such as a plot similar to Figure 1(a) or Table 1 in Appendix I) would help clarify the process.
- While the experimental results allow for reasonable causal interpretation, there is limited evidence supporting their robustness or reliability. The results are reasonable under the specific settings, but can the reported percentages be considered reliable?
Questions
- In Figure 1(b), what does the unannotated section of the pie chart represent?
- Adding a causal graph to illustrate ASE and r-SSE would improve clarity and enhance understanding of the counterfactual effect decomposition.
- In Figures 3(a) and 3(b), why is the trust parameter allowed to be 0 in (b) but not in (a)?
- While the identifiability of TCFE is discussed under the condition of noise monotonicity, what about the identifiability of r-SSE and ASE?
- For the proposed decomposition of TCFE, is there a way to verify the validity of the results? Specifically, how reliable are the percentage contributions assigned to each agent or state transition? Could simulations with known ground-truth causal graphs be included to demonstrate their reliability? By comparing the estimated contributions of each agent and state with the ground truth, the causal explanation formula and subsequent algorithm would be more compelling.
Thank you for your valuable feedback. We are glad to see that you find the problem addressed in our paper interesting and meaningful, and our proposed solution reasonable. We are also happy to see that you find our paper well-written and our experimental results promising. Please find below our response to your comments and questions. For references, please see our general comment.
Response to Comments and Questions
The decision-making process in the Gridworld experiment (Section 6.1) is not clearly conveyed. A simplified trajectory or causal graph—such as a plot similar to Figure 1(a) or Table 1 in Appendix I—would help clarify the process.
Thank you for your suggestion. Appendix K.1 of our paper provides a detailed textual depiction of both trajectories from Fig. 2a. This depiction includes the Planner's observations and instructions, the Reporter's messages to the Planner, as well as the actions taken, rewards granted and penalties incurred by both actors throughout each trajectory. In response to your feedback, we have added an explicit reference to these depictions in the caption of Figure 2a in the revised manuscript to enhance clarity.
In Figure 1(b), what does the unannotated section of the pie chart represent?
It represents the contribution score assigned by our decomposition approach to the patient state at step 15 for the depicted simulated scenario. It was omitted to avoid overcrowding the figure.
In Figures 3(a) and 3(b), why is the trust parameter allowed to be 0 in (b) but not in (a)?
When the trust parameter is set to 0, all of the AI's actions are effectively overridden by the clinician. In this scenario, intervening on the AI's actions would not yield any counterfactual effect on the trajectory's outcome. For that reason, we omit this case from Figure 3(a).
While the identifiability of TCFE is discussed under the condition of noise monotonicity, what about the identifiability of r-SSE and ASE?
Assuming exogeneity alongside noise monotonicity guarantees counterfactual identifiability in general, including for more complex counterfactual effects such as r-SSE and ASE. This is formally established in Lemma 4.4 and discussed in detail in Appendix H of [5]. To clarify this point and avoid potential confusion, we have revised the paragraph "Assumptions and counterfactual identifiability" in our manuscript to explicitly highlight this aspect. Thank you for pointing it out.
Adding a causal graph to illustrate ASE and r-SSE would improve clarity and enhance understanding of the counterfactual effect decomposition.
Thank you for your suggestion. In response, we have added in Appendix N of our revised manuscript a figure that graphically illustrates all counterfactual estimates appearing in Definitions 2.1 (TCFE), 3.1 (tot-ASE), 3.2 (SSE) and Equation 2 (r-SSE) using the Sepsis example from the introduction section of [5] and the causal graph defined therein. Does this sufficiently address your concern? If not, we would like to kindly ask you for additional clarifications so that we can better implement your feedback.
For the proposed decomposition of TCFE, is there a way to verify the validity of the results? Specifically, how reliable are the percentage contributions assigned to each agent or state transition? Could simulations with known ground-truth causal graphs be included to demonstrate their reliability? By comparing the estimated contributions of each agent and state with the ground truth, the causal explanation formula and subsequent algorithm would be more compelling.
Thank you for your feedback. In response, we have conducted additional experiments to support the reliability and validity of our empirical results. A concise summary of these experiments is included in our General Comment (Additional Experiments Evaluating Reliability), with the complete set of results detailed in Appendix L of the revised manuscript.
Conclusion
We thank you again for your comments and questions. We would be happy to answer anything else in addition.
Thank you so much for your hard work and reply.
- The additional experimental results, including the comparisons with true values and standard errors in the Appendix, demonstrate that the estimations are close to the ground truth and robust.
- I was wondering if Figure 11 (b) and (c) should have distinct names to better reflect their differences? Perhaps you could add a few words in the paper to explain the meaning of the different colors. This would help readers understand the figures more easily.
Thank you for your prompt reply and constructive feedback. We are glad to see that you find that our additional experimental results effectively demonstrate the robustness of our estimations. Please find our answers to your questions and comments below.
I was wondering if Figure 11 (b) and (c) should have distinct names to better reflect their differences?
There was a typo in the caption of (c), which we have now corrected. Thank you for pointing it out.
Perhaps you could add a few words in the paper to explain the meaning of the different colors. This would help readers understand the figures more easily.
In response to your feedback, we have added in the caption of the figure a detailed description of the two-step Sepsis example as well as an explanation of all necessary symbols and colors. Additionally, we have included a reference to this appendix in the main body of the paper, to enhance its clarity and accessibility.
We thank you once more for your feedback. We would be happy to answer anything else in addition.
Thank you so much for your updated comments and I really appreciate the contribution the authors offer. My questions have been clarified, so I would like to raise my score to 8.
This is awesome :) thank you so much for raising your score and even more so for your very kind words. Your feedback has helped us improve our paper, and we sincerely appreciate that.
This paper studied counterfactual effect decomposition in multi-agent MDPs under the structural causal model framework. Specifically, it first shows that the total counterfactual effect of an agent's action can be decomposed into the agent-specific effect and the reverse state-specific effect. Second, it proposes an axiomatic framework based on agent-specific effects to attribute the total effect to individual agents, and introduces a method for attributing the reverse state-specific effect to state variables based on their "intrinsic" contributions.
Strengths
- This paper bridges reinforcement learning with the causal reasoning community, extending several well-established concepts (such as natural direct/indirect effect decomposition, intrinsic causal contributions, and flow-based attribution methods) into multi-agent MDPs.
- It provides insights into the distinct roles that agents and the environment play in influencing effects, demonstrated through real-world examples.
Weaknesses
Novelty: The paper examines the counterfactual effect of a single agent’s action. However, it would be more impactful and interesting to investigate the estimands for multiple agents' actions in a multi-agent MDP, especially under varying interaction structures. Additionally, under the single-agent action setting, Theorems 3.3 and 4.2 are straightforward extensions of the causal mediation formula (Pearl, 2001) and intrinsic causal contributions (Janzing et al., 2024).
Experiment: I find the validity of conclusions such as the causal explanation formula (lines 411 and 412) questionable, given that both TCFE and tot-ASE are estimated quantities. I also find there is no experiment to demonstrate the estimation accuracy. It would be more insightful to first use simulations to assess whether the estimation algorithm accurately approximates the true values.
Assumption: The interpretation of the key assumption for identification, noise monotonicity, is unclear. A discussion on the conditions under which it is valid, and when it may not hold, would clarify its applicability. Additionally, an illustration using the sepsis example would be helpful.
Questions
Please see Weaknesses.
Thank you for your valuable feedback. We are glad to see that you find our approach insightful. Please find below our response to your comments and questions. For references, please see our general comment.
Response to Comments and Questions
The paper examines the counterfactual effect of a single agent’s action. However, it would be more impactful and interesting to investigate the estimands for multiple agents' actions in a multi-agent MDP, especially under varying interaction structures.
This is an interesting point of discussion. We view the extension of our current methodology to decomposing the total counterfactual effect of multiple agents' actions as a meaningful and impactful direction for future work. Our current approach can be straightforwardly extended to explain the effect of multiple actions taken by different agents at the same time-step. However, the situation becomes more complex when the actions have not been taken simultaneously. It is not obvious how to define the potential response of an agent, across time, to a (natural) multi-step intervention that might also be forcing some of the agent's future actions. Addressing this would require extending the definitions of agent- and state-specific effects to explicitly incorporate this temporal dimension. We see our work as a foundation for addressing these challenges, providing a systematic framework for extending effect decomposition to more complex counterfactuals in multi-agent decision making systems.
Additionally, we agree that investigating the interpretability of effect decomposition approaches under varying interaction structures is a critical avenue for exploration. To this end, our experiments already include two environments featuring heterogeneous agents and diverse interaction protocols. In the Gridworld environment, two actors trained with reinforcement learning follow the instructions of an LLM planner, who observes the environment through a Reporter module. In contrast, the Sepsis environment (formulated as a turn-based MMDP [2,3]) features an AI agent primarily acting in the environment under the supervision of a human clinician, who can interrupt the process at any time and override the AI's decision.
Additionally, under the single-agent action setting, Theorems 3.3 and 4.2 are straightforward extensions of the causal mediation formula (Pearl, 2001) and intrinsic causal contributions (Janzing et al., 2024).
Let us address your comment step by step.
Causal explanation formula. Even though Theorem 3.3 from our paper can be viewed as an extension of Theorem 3 from [4], as discussed in the paragraph Connection to prior work from our paper, we respectfully disagree with the characterization of this extension as "straightforward". As highlighted in the introduction section of our paper, prior work in mediation analysis has largely focused on effect propagation through causal paths. In contrast, Theorem 3.3 goes beyond this by explaining the total counterfactual effect of an agent's action through its influence on the state transitions and subsequent agents’ actions in an MMDP.
Reverse state-specific effect. Our approach to decomposing r-SSE, termed r-SSE-ICC, builds on the concept of intrinsic causal contributions (ICC) introduced by [8]. ICC is an information-theoretic measure quantifying how much an observed variable contributes to the uncertainty of another variable in a general SCM. We argue that extending ICC to address the attribution of uncertainty in a complex counterfactual effect, such as r-SSE, to the state variables of an MMDP-SCM is not a straightforward process. A challenge specific to MMDP-SCMs, for instance, is that, in addition to state variables, action variables may also directly or indirectly influence the variation in r-SSE.
General comment on contributions. We acknowledge that the theoretical results in this paper, i.e., Theorem 3.3 and Theorem 4.2, are not challenging to prove with the appropriate technical background. However, we view our work more as foundational, addressing a newly proposed and complex problem that is highly relevant to accountability in multi-agent AI systems. The primary contribution of this work lies in integrating concepts from a wide range of fields, including multi-agent sequential decision making (MMDPs), counterfactual reasoning (ASE), mediation analysis (Theorem 3.3), cooperative game theory (ASE-SV), information theory (r-SSE-ICC) and more, to tackle this novel problem. Finally, the empirical evaluation of our proposed solution demonstrates that it is interpretable and aligns with standard intuitions. This validates the potential of our approach to inform future research in this area.
I find the validity of conclusions such as the causal explanation formula (lines 411 and 412) questionable, given that both TCFE and tot-ASE are estimated quantities. I also find there is no experiment to demonstrate the estimation accuracy. It would be more insightful to first use simulations to assess whether the estimation algorithm accurately approximates the true values.
Thank you for your feedback. In response, we have conducted additional experiments to strengthen the validity of our conclusions and empirical findings. A concise summary of these experiments is included in our General Comment (Additional Experiments Evaluating Reliability), with the complete set of results detailed in Appendix L of the revised manuscript.
The interpretation of the key assumption for identification, noise monotonicity, is unclear. A discussion on the conditions under which it is valid, and when it may not hold, would clarify its applicability. Additionally, an illustration using the sepsis example would be helpful.
Thank you for your feedback. Please find our answers to your comments below.
Interpretation. Assuming that a causal model satisfies noise monotonicity imposes a restriction on the model's counterfactual distribution. We begin by providing some intuition behind the imposed restriction through the Sepsis example from our paper, and then continue with a formal discussion.
Imagine a scenario where a patient receives antibiotics, and we observe high glucose levels. We are interested in the following hypothetical question: "Had the patient been treated with mechanical ventilation in addition to the antibiotics, would their glucose levels have become normal?". Under the noise monotonicity assumption, this counterfactual is plausible only if the probability of observing low or normal glucose levels increases when mechanical ventilation is added to an antibiotics treatment regimen, and at the same time the probability of observing low glucose levels in the hypothetical scenario does not exceed the probability of the observed scenario.
Formally, consider a structural causal model (SCM) , where is a categorical variable, e.g., can be the patient treatment and can represent the level of glucose from the above example, and let there be a total ordering on . Suppose we have observed and we wish to compute the counterfactual probability of taking a value had been set to . Under the noise monotonicity assumption, the counterfactual probability is strictly positive only if the following two conditions hold: (a) the probability of conditioned on exceeds the probability of conditioned on , and (b) the probability of conditioned on exceeds the probability of conditioned on (Lemma 4.4 in [5]). A similar reasoning applies for cases where or .
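As a rough illustration of the restriction described above, the following sketch assumes a single uniform noise variable and an inverse-CDF mechanism over the chosen total ordering, which is one standard way to obtain a mechanism that is monotone in its noise. The function names and the glucose probabilities are hypothetical and are not taken from the paper or from [5].

```python
from itertools import accumulate


def noise_interval(probs, y):
    """Interval of uniform noise values that inverse-CDF sampling maps to category index y."""
    cdf = [0.0] + list(accumulate(probs))
    return cdf[y], cdf[y + 1]


def counterfactual_prob(p_obs, p_cf, y_obs, y_cf):
    """P(outcome would have been y_cf | observed y_obs), assuming a shared uniform noise.

    p_obs and p_cf list the outcome probabilities, in the chosen total ordering,
    under the observed and the counterfactual treatment. y_obs must have
    positive probability under p_obs.
    """
    lo_o, hi_o = noise_interval(p_obs, y_obs)
    lo_c, hi_c = noise_interval(p_cf, y_cf)
    overlap = max(0.0, min(hi_o, hi_c) - max(lo_o, lo_c))
    return overlap / (hi_o - lo_o)


# Hypothetical glucose distributions, ordered low < normal < high.
p_antibiotics = [0.1, 0.3, 0.6]
p_with_ventilation = [0.2, 0.5, 0.3]
LOW, NORMAL, HIGH = 0, 1, 2

to_normal = counterfactual_prob(p_antibiotics, p_with_ventilation, HIGH, NORMAL)
to_low = counterfactual_prob(p_antibiotics, p_with_ventilation, HIGH, LOW)
```

With these made-up numbers, the counterfactual "low" outcome receives zero probability because its noise interval does not overlap the interval consistent with the observed "high" outcome, while "normal" receives positive probability.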
Validity conditions. We acknowledge that causal assumptions such as noise monotonicity, albeit necessary for counterfactual identifiability, may indeed restrict the applicability of our effect decomposition approach in more practical domains. This is a common trade-off in counterfactual reasoning, as highlighted in the references from our Discussion section. Similar to the standard monotonicity assumption [6], or related causal assumptions for counterfactual identifiability [1], noise monotonicity cannot be verified from observational data. Thus, a discussion about when it is valid or not in this context is not possible.
Additional experiments. We have conducted additional experiments to evaluate the robustness of our empirical findings to the noise monotonicity assumption. A concise summary of these experiments is included in our General Comment (Additional Experiments Evaluating Robustness to Noise Monotonicity), with the complete set of results detailed in Appendix M of the revised manuscript.
Conclusion
We thank you again for your comments and questions. We would be happy to answer anything else in addition.
Thanks for the response. Is it possible to test noise monotonicity using observational data, for example, within the sensitivity analysis framework?
Thank you for your question. In our sensitivity analysis for noise monotonicity (Additional Experiments Evaluating Robustness to Noise Monotonicity and Appendix M) we have essentially only used the observational data distribution. We do not assume access to the correct causal model, interventional or counterfactual distribution at any point of this experiment.
Thanks for the response. Considering the limited novelty in the contributions, particularly on the multi-agent aspect, I will maintain my weak reject score. However, I do not hold a strong opinion on acceptance or rejection.
Thank you for your response and also thank you once more for your valuable feedback.
We thank all the reviewers for their valuable feedback and insightful questions. We have updated our manuscript to incorporate the majority of your suggestions, including new experiments. We note that all changes in the revised manuscript are highlighted in blue. Please find our answers to your questions and comments below.
Additional Experiments Evaluating Reliability
We have conducted additional experiments to support the reliability of our empirical findings. First, for the Gridworld environment, we computed the ground-truth contribution scores for all agents and state variables as determined by our causal explanation formula. These ground-truth scores were then compared against the estimates we had obtained using posterior sampling in the original submission. Second, for the Sepsis environment, we repeated the experiment across 10 different seeds and analyzed the standard error distributions of all estimated counterfactual quantities.
The detailed description of the new experiments and the complete set of results are included in Appendix L of the revised manuscript. Below, we provide a concise overview of our findings:
Gridworld experiment. The relatively simple dynamics of the Gridworld environment and underlying causal model allowed us to compute the exact contribution scores as determined by the proposed causal explanation formula. In Appendix L.1, we present a side-by-side comparison of these ground-truth scores with the estimates we had obtained using posterior sampling in the original submission. Notably, the ground-truth values consistently fall within the standard error bounds of the estimated quantities.
Sepsis experiment. Exactly computing the values of all estimated counterfactual quantities in the Sepsis experiment is significantly more challenging than in the Gridworld setting. To assess the reliability of our findings in this experiment, we repeated it across 9 additional seeds. Appendix L.2 presents box plots for all estimated counterfactual quantities, illustrating their empirical standard error distributions across the 10 seeds. The standard errors are shown relative to the absolute means. These visualizations reveal minimal variability in the estimates of our causal explanation formula across seeds, with only a very small number of outliers. These results reinforce the reliability of our findings and support the robustness of our effect decomposition approach in the Sepsis experiment.
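The two reliability checks described above can be sketched as follows. This is one plausible reading, assuming each estimate is a sample mean and the standard error is the usual standard error of that mean; the helper names and the sample values are hypothetical, and the actual procedure is the one described in Appendix L of the revised manuscript.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error(samples):
    """Standard error of the sample mean (needs at least two samples)."""
    return stdev(samples) / sqrt(len(samples))


def relative_standard_error(samples):
    """Standard error divided by the absolute mean (the mean must be non-zero)."""
    return standard_error(samples) / abs(mean(samples))


def within_standard_error(ground_truth, samples):
    """Check whether ground_truth lies within one standard error of the sample mean."""
    return abs(ground_truth - mean(samples)) <= standard_error(samples)


# Hypothetical per-seed samples for a single estimated counterfactual quantity.
samples_by_seed = {
    0: [0.80, 0.84, 0.82, 0.81],
    1: [0.79, 0.83, 0.85, 0.82],
}
relative_errors = {seed: relative_standard_error(s) for seed, s in samples_by_seed.items()}
```

The values in `relative_errors` are what a box plot across seeds would summarize, and `within_standard_error` mirrors the Gridworld comparison against exactly computed scores.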
Additional Experiments Evaluating Robustness to Noise Monotonicity
In the Sepsis experiment, we assume noise monotonicity with respect to some chosen total ordering. To assess the robustness of our empirical results to this choice, we repeated the experiment across 5 additional total orderings.
The results of these additional experiments show that our conclusions from Section 6.2 are rather robust to violations of noise monotonicity (especially conclusions related to the average percentage decomposition). This is encouraging, as it provides evidence that our causal explanation formula can yield valuable insights in simulated domains where: (a) effect sizes and randomness reflect real-world settings [9], (b) only observational data are available, and (c) our theoretical assumptions might not hold, as often happens in practice.
The detailed description of the new experiments and the complete set of results are included in Appendix M of the revised manuscript.
[1] Oberst, Michael, and David Sontag. "Counterfactual off-policy evaluation with gumbel-max structural causal models." International Conference on Machine Learning. 2019.
[2] Sidford, Aaron, et al. "Solving discounted stochastic two-player games with near-optimal time and sample complexity." International Conference on Artificial Intelligence and Statistics. 2020.
[3] Jia, Zeyu, Lin F. Yang, and Mengdi Wang. "Feature-based q-learning for two-player stochastic games." arXiv preprint arXiv:1906.00423. 2019.
[4] Pearl, Judea. "Direct and Indirect Effects." Conference on Uncertainty in Artificial Intelligence. 2001.
[5] Triantafyllou, Stelios, et al. "Agent-Specific Effects: A Causal Effect Propagation Analysis in Multi-Agent MDPs." International Conference on Machine Learning. 2024.
[6] Pearl, Judea. "Probabilities Of Causation: Three Counterfactual Interpretations And Their Identification." Synthese. 1999.
[7] Balcan, Maria-Florina, Ariel D. Procaccia, and Yair Zick. "Learning cooperative games." International Joint Conference on Artificial Intelligence. 2015.
[8] Janzing, Dominik, et al. "Quantifying intrinsic causal contributions via structure preserving interventions." International Conference on Artificial Intelligence and Statistics. 2024.
[9] Tran, Allen, Aurelien Bibaut, and Nathan Kallus. "Inferring the Long-Term Causal Effects of Long-Term Treatments from Short-Term Experiments." International Conference on Machine Learning. 2024.
This paper studied counterfactual effect decomposition in multi-agent MDPs under the structural causal model framework. The authors nicely bridged reinforcement learning with the causal reasoning community, extending several well-established concepts into multi-agent MDPs. The paper is well written and organized. However, concerns remain from reviewers regarding the novelty, contribution, and utility of this work. As raised by several reviewers, this paper is a direct extension of Triantafyllou et al. (2024) and Janzing et al. (2024) with the effect defined on a more granular level. The considered Shapley value is already well established and the proposed attribution method is similar to the causal mediation formula (Pearl, 2001). I appreciate the authors' efforts in connecting all these dots, but the utility of the proposed effect decomposition is not quite clear for multi-agent MDPs. Considering that this work focuses on the framework of sequential decision making, it is crucial to justify how the proposed method is useful besides interpretation of effects. There are other comments regarding the strong assumption of noise monotonicity, which further limits the practical utility of this work.
Overall, this paper was rather borderline; we'd encourage the authors to take into consideration all the feedback provided by the reviewers to further strengthen their manuscript for resubmission.
Additional Comments on Reviewer Discussion
Although some of the reviewers' concerns were resolved during the rebuttal and some reviewers are satisfied with it, the main concerns regarding the novelty, contribution, and utility of this work remain. With more careful reading and comparison with existing literature, as raised by several reviewers, this paper is a direct extension of Triantafyllou et al. (2024) and Janzing et al. (2024) with the effect defined on a more granular level. The novelty and contribution are marginal compared to these works; more importantly, the current paper did not justify and detail how the proposed method is useful for sequential decision making. Considering these factors, this paper is not quite ready for publication.
Reject