ADAM: An Embodied Causal Agent in Open-World Environments
Abstract
Reviews and Discussion
The paper proposes a Minecraft agent called Adam that relies on a combination of (M)LLM inference and causal discovery. At the core of the method is a causal graph that represents the agent's expertise on the environment logic. The causal graph is proposed by an LLM and each relation in the graph is confirmed/disproved by environment interactions. The proposed method obtains diamonds faster and more reliably than prior methods.
Strengths
- the paper is well structured and clearly written
- the combination of LLM-prompting with CD seems to be quite novel
- the intervention-based refinement of the causal graph seems meaningful and practical
Weaknesses
- unsupported claims: the authors claim their method has "excellent interpretability" and that their agent "closely aligns with human gameplay"; yet, I cannot find any empirical evidence supporting these claims.
- a runtime/memory analysis (be it theoretical or empirical) is completely missing, i.e., it is not clear at what cost the claimed SOTA results come
- the results presented in figure 1 are based on a modified causal graph claiming that this removes prior knowledge from LLMs; yet, it is unclear in how far this claim is true; it would be interesting to see an analysis akin to appendix A for the modified environment
- the choice of the acronym Adam is at best unfortunate as it coincides with one of the most influential machine learning papers (https://arxiv.org/abs/1412.6980) and could be mistaken for an attempt to tap unwitting citations as both paper titles start with "Adam: ..."
- insufficient reproducibility due to missing source code
Questions
- ad interpretability claim: how does the interpretability of the presented method relate to the interpretability of baseline methods, e.g., voyager? isn't the interpretability of the method compromised by the lack of interpretability of the used LLMs?
- the proposed method is quite complex as it comprises 4 modules, each consisting of several submodules, and the presented ablation studies seem insufficient to rigorously justify such a complicated system; so how was the method designed? how much inspiration was drawn from prior work?
- Adam is claimed to be a "generalizable framework"; why not back this claim with a complementary application in another environment?
- what are possible limitations of the method? (the paper does not mention any)
- what do the error bars in tables 1, 2, 3 mean?
What are possible limitations of the method? (the paper does not mention any)
- Open-source MLLMs are currently insufficiently accurate to provide detailed information for complex tasks, as discussed in my response 7 to Reviewer GNQs. Existing MLLMs primarily assist in observing peripheral environments, which has been validated through our ablation experiments. However, human feedback remains necessary for Minecraft tasks requiring high accuracy, as also noted by VOYAGER [1]. To our knowledge, relying solely on vision (without metadata) for complex tasks like building remains unfeasible, and we are committed to advancing in this direction, including modeling causal relationships in such tasks. ADAM's decision-making process relies on its memory, the causal graph it constructs, and the reasoning ability of LLMs, with MLLM playing only a supporting role.
What do the error bars in tables 1, 2, 3 mean?
- The error bars in tables 1, 2, and 3 represent the standard deviation of the steps over multiple runs of the experiment. They provide a measure of the variability of the results and allow for a more accurate comparison of the performance of our method with that of the baseline methods.
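For concreteness, here is a minimal sketch of how such a value can be computed (the run data below are hypothetical, not our actual measurements):

```python
# Minimal sketch: mean ± standard deviation of step counts over repeated runs.
import statistics

# Hypothetical step counts from five independent runs of one task.
steps_per_run = [29, 41, 33, 38, 30]

mean_steps = statistics.mean(steps_per_run)
std_steps = statistics.stdev(steps_per_run)  # sample standard deviation

print(f"{mean_steps:.0f} ± {std_steps:.0f} steps")  # -> "34 ± 5 steps"
```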
We hope the above response addresses your concerns. If you find our revisions and responses helpful, we would greatly appreciate your consideration in raising the score to support our work. Please let us know if you have any further questions.
[1] Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., ... & Anandkumar, A. (2023). Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291.
[2] PrismarineJS. Prismarinejs/mineflayer, 2023a. URL https://github.com/PrismarineJS/mineflayer.
[3] Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Hong, B., ... & Gui, T. (2023). The rise and potential of large language model based agents: A survey. arXiv preprint arXiv:2309.07864.
Thank you for your answer. I especially appreciate the link to the code repo and adjusted my score.
Thank you for your affirmation! We will continue to contribute to the open source community. If you have any other questions, please feel free to ask.
Questions:
How does the interpretability of the presented method relate to the interpretability of baseline methods, e.g., voyager?
- ADAM's interpretability is rooted in its decision-making process, which relies on a causal graph, as previously discussed. Methods like VOYAGER [1] do not emphasize transparency or interpretability as strengths. Their interpretability is inherently limited because their knowledge relies heavily on the static prior knowledge of LLMs, rather than being acquired through interactive exploration and intervention-based causal discovery.
Isn't the interpretability of the method compromised by the lack of interpretability of the used LLMs?
- ADAM's interpretability is primarily facilitated by its causal graph. The causal graph does not rely on the static prior knowledge embedded in the LLM, but rather is acquired through empirical knowledge gathered from continuous interactions with the environment, reasoning, and experimentation. The usage of LLMs in ADAM does not affect the correctness and interpretability of the causal graph obtained through our causal algorithm. Therefore, "the lack of interpretability of the used LLMs" will not affect ADAM's interpretability, which remains independent of LLMs. In contrast, methods that rely solely on LLMs are affected by their lack of transparency.
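To make this concrete, here is a toy sketch (an illustration only; the item names and edges are hypothetical, not ADAM's actual data structures) of how an explicit causal graph yields decision traces that do not depend on the LLM:

```python
# Hypothetical item-dependency edges, each assumed to have been confirmed
# by interventions in the environment rather than taken from LLM priors.
prerequisites = {
    "planks": ["wood_log"],
    "crafting_table": ["planks"],
    "wooden_pickaxe": ["planks", "crafting_table"],
}

def explain(item: str, depth: int = 0) -> None:
    """Print the full causal chain behind obtaining `item`."""
    print("  " * depth + item)
    for parent in prerequisites.get(item, []):
        explain(parent, depth + 1)

explain("wooden_pickaxe")
# wooden_pickaxe
#   planks
#     wood_log
#   crafting_table
#     planks
#       wood_log
```

Every edge in such a graph can be inspected or challenged directly, which is what we mean by interpretability that is independent of the LLM.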
The proposed method is quite complex. The presented ablation studies seem insufficient to rigorously justify such a complicated system; so how was the method designed?
- Each module in our proposed system serves a distinct function, forming a closed loop as shown in Figure 2 (Line 122). The Interaction Module enables the agent to execute actions from the action space and processes the agent's observable information into formatted records. The Causal Model Module constructs the causal graph, the Controller Module selects actions and completes tasks in the environment, and the Perception Module provides visual observation descriptions.
Our ablation experiments include:
- The ablation of the Perception Module (Table 3, Line 448).
- The ablation of the LLM-based CD (Table 4, Line 493) and intervention-based CD (Figure 8, Line 499) in the Causal Model Module.
- The ablation of the TM (Temporal Modeling) and SD (Subgraph Decomposition) (Line 311) used in the Causal Model Module (Table 1, Line 384).
As for the other two modules, they are the backbone of embodied exploration. Ablating these modules would render the agent incapable of interacting with the environment, thereby precluding any discussion of performance.
How much inspiration was drawn from prior work?
- We indeed drew inspiration from previous works, mainly in the development of the Controller module. The collaboration among Planner, Actor, and Memory is a well-established design in embodied agents, as outlined in the survey [3], where Brain (planning ability and memory), Perception, and Action form the general workflow of these agents. This design corresponds to the structure of our Controller module. Our novel contribution lies in enhancing this design to better accommodate the construction and utilization of causal graphs, as demonstrated in our experiments (Figure 1, Line 68). We hope that the clarifications provided above, the explanation of our module design's rationale, and the supplementary code materials will lead to a re-evaluation of our paper's soundness. Please feel free to ask if you have any further questions.
Adam is claimed to be a "generalizable framework"; why not back this claim with a complementary application in another environment?
- ADAM can be generalized, and we have provided a detailed process for generalizing our approach to other domains in "Appendix F, GENERALIZATION (L 961)". Our initial intention was to use Minecraft as one of the most representative and challenging environments, with the detailed explanation in this scenario serving as a typical example for broader transfer. We have revised the relevant statement in the updated version for clarity.
Responses to Reviewer yx6R
Thank you for your thoughtful review and constructive suggestions. We are delighted for your recognition of the novelty of our method and the clarity of our writing. We provide detailed replies to your comments and hope we can resolve your major concerns.
Concerns on Weaknesses:
unsupported claims: their agent "closely aligns with human gameplay;"
- We believe there may be a misunderstanding. Our point is that ADAM's observation space 'closely aligns with human gameplay' without the use of metadata, not that ADAM's gameplay level matches that of human players. Please refer to Line 103: "We tackle the limitations of existing embodied agents. Our ADAM achieves excellent generalization capability without relying on prior knowledge or omniscient metadata like other LLM-based agents, while exhibits human-like exploration as a general framework." Unlike methods such as VOYAGER [1], which use metadata inaccessible to human players to guide decision-making, ADAM employs a visual approach, making its process more similar to that of human players. We have modified the relevant descriptions in the updated version to clarify it.
unsupported claims: "excellent interpretability"
- ADAM's interpretability is based on its explicit representation of knowledge and decision-making through a causal graph. This approach differs fundamentally from the inherent knowledge of LLMs or RL agents, which often lack transparent and interpretable representations.
The causal graph enables us to evaluate potential errors or biases in the agent's learned knowledge and provides a clear visualization of the decision-making process. This systematic modeling of world knowledge is a novel contribution, as previous works typically rely on learned neural network weights or rule-based knowledge bases. By explicitly representing the causal relationships between world entities, our approach facilitates a more transparent and interpretable understanding of the agent.
a runtime/memory analysis (be it theoretical or empirical) is completely missing,
- We understand your concern. All of our comparisons use the same LLM API (GPT-4-turbo), and computations are performed externally rather than locally. As for the overhead of a single step, all our performance experiments are conducted on Mineflayer [2], ensuring that the definition and calculation of steps are consistent across all baselines, including the number of API calls. We follow evaluation metrics consistent with VOYAGER and other Minecraft agents.
insufficient reproducibility due to missing source code
- We provide our source code on an anonymous website to meet the requirements of double-blind review. https://anonymous.4open.science/r/ADAM_anonymous2-B0C3
This paper introduces ADAM (An emboDied causal Agent in Minecraft) - an agent architecture that autonomously explores, learns causal world knowledge, and executes complex tasks in Minecraft from multimodal inputs. The system consists of four main components: interaction module, causal model module, controller module, and perception module. The interaction module samples actions and records observations. The causal model module infers causal relationships and constructs causal subgraphs for each action. The key innovation is integrating causal discovery methods with embodied exploration, enabling the agent to learn accurate causal relations from scratch without relying on prior knowledge.
Strengths
- The incorporation of causal discovery methods in a modular framework is novel in LLM-based embodied exploration, and, unlike prior work, it does not rely on privileged information
- The paper demonstrates strong empirical results with well-designed experiments, showing significantly faster discovery of skills in Minecraft. Performance does not degrade much in modified environments where prior knowledge is invalid, demonstrating that the causal learning is indeed effective. The method also does not require metadata.
- The experiments are solid with multiple baselines and include comprehensive ablation studies as well as a detailed analysis of failure cases
Weaknesses
- One concern is whether ADAM scales to more complex worlds and larger causal graphs for intervention-based causal discovery (CD).
- Interestingly the paper proposes a multimodal agentic framework but all the baselines compared to are text-based frameworks. It would be good to have at least one multi-modal baseline, e.g. [1] as this is also cited by the authors.
[1] Wang, Z., Cai, S., Liu, A., Jin, Y., Hou, J., Zhang, B., ... & Liang, Y. (2023). Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models. arXiv preprint arXiv:2311.05997.
Questions
See previous
Responses to Reviewer b8Jp
Thank you for your thoughtful review and recognition of our contributions. We are glad that you found our incorporation of causal discovery methods in a modular framework to be novel and effective. We would like to address your concerns and questions.
Concerns on Weaknesses:
One concern is whether ADAM scales with more complex world and causal graph for intervention-based causal discovery (CD).
- This is possible. We have provided a detailed process for generalizing our approach to other domains in "Appendix F, GENERALIZATION (L 961)". We briefly outline the steps here (a toy code sketch follows the list):
- Identify key elements related to task objectives and represent them as causal graph nodes.
- If additional observational data is available (e.g., LiDAR), the perception module can potentially be enhanced beyond simply relying on visual input.
- Discretize the continuous action space into a finite set of causal graph nodes, with granularity tailored to the specific environment.
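As a toy sketch of the discretization step (an illustration only; the control variable, bin boundaries, and node names are hypothetical and not part of ADAM):

```python
# Mapping a continuous control, e.g. a steering angle in [-1, 1],
# onto a finite set of causal-graph action nodes.
import numpy as np

BINS = np.linspace(-1.0, 1.0, num=5)  # granularity is a per-environment choice
NODE_NAMES = ["hard_left", "left", "straight", "right", "hard_right"]

def to_action_node(angle: float) -> str:
    idx = int(np.clip(np.digitize(angle, BINS) - 1, 0, len(NODE_NAMES) - 1))
    return NODE_NAMES[idx]

print(to_action_node(-0.8))  # -> "hard_left"
print(to_action_node(0.2))   # -> "straight"
```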
It would be good to have at least one multi-modal baseline, e.g. Jarvis-1 [1] as this is also cited by the authors.
- Most current MLLM agent works explicitly utilize prior knowledge, and thus there is no baseline that can be fairly compared to ADAM. For instance, Jarvis-1 [1], as referenced, explicitly uses recipe knowledge (the core tech tree) as part of its agent system (see https://github.com/CraftJarvis/JARVIS-1/tree/main/jarvis/assets/recipes). This is exactly what ADAM does not rely on, as it learns entirely from scratch. Without this predefined knowledge, Jarvis-1 would fail to function completely, making a direct comparison with ADAM infeasible. This further underscores the superiority of ADAM.
We hope the above response addresses your concerns. If you find our revisions and responses helpful, we would greatly appreciate your consideration in raising the score to support our work. Please let us know if you have any further questions.
[1] Wang, Z., Cai, S., Liu, A., Jin, Y., Hou, J., Zhang, B., ... & Liang, Y. (2023). Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models. arXiv preprint arXiv:2311.05997.
The paper introduces an architecture for agents that play the game of Minecraft, based on combining Large Language Models with causal inference. Featuring different modules (e.g., planner, actor, perception), the method is based on inferring causal graphs related to the various crafting dependencies of Minecraft's technology tree, and on enabling an agent to use the knowledge about these dependencies for progressing in the game. The method is evaluated against similar LLM-based methods (albeit using a different action space).
Strengths
- The method seems to be the first approach that combines causal inference with LLM agents in code-based action spaces, which is a potentially very important direction for future research.
- The method performs quite well for inferring causal graphs on Minecraft, and it seems to provide a way for agents to take advantage of those causal graphs.
Weaknesses
- The presentation of the method is quite high-level and does not help the reader understand how the method actually works in practice. When the different "modules" are introduced, it is not clear a priori what they actually are. Are they just prompts and specifications to a GPT-4 model? If so, it could be beneficial to show one of the prompts earlier, to guide the understanding of the rest of the paper.
- The comparisons in the paper are unclear, due to the choice of a specific action space that is different from the one used in previous work. Indeed, while the paper mostly discusses the "observation space" difference compared to the setting usually employed in reinforcement learning papers, one crucial difference is the one in action space. For instance, Voyager works in a much higher-level action space compared to DreamerV3. This is not accurately depicted in the current version of the paper.
- It could be surprising that an off-the-shelf open model can accurately describe an observation to the level of providing enough information for an actor to take the optimal action, especially in an environment as visually rich as Minecraft. An ad-hoc evaluation of this specific capability would strengthen the paper.
- The method seems to be highly Minecraft-specific, and the paper does not extensively discuss how it could be generalized to other domains.
Questions
- Would the method generalize to other environments? If so, what are the assumptions and requirements for the application of the method to a new environment?
- How does the method compare to approaches trained with reinforcement learning? Is it possible to train an agent with reinforcement learning on the same action space that ADAM uses?
- What is the captioning performance of the perception module? What are its failure cases?
[1] Guss, W. H., Houghton, B., Topin, N., Wang, P., Codel, C., Veloso, M., & Salakhutdinov, R. (2019). Minerl: A large-scale dataset of minecraft demonstrations. arXiv preprint arXiv:1907.13440.
[2] Fan, L., Wang, G., Jiang, Y., Mandlekar, A., Yang, Y., Zhu, H., ... & Anandkumar, A. (2022). Minedojo: Building open-ended embodied agents with internet-scale knowledge. Advances in Neural Information Processing Systems, 35, 18343-18362.
[3] PrismarineJS. Prismarinejs/mineflayer, 2023a. URL https://github.com/PrismarineJS/mineflayer.
[4] Baker, B., Akkaya, I., Zhokov, P., Huizinga, J., Tang, J., Ecoffet, A., ... & Clune, J. (2022). Video pretraining (vpt): Learning to act by watching unlabeled online videos. Advances in Neural Information Processing Systems, 35, 24639-24654.
[5] Hafner, D., Pasukonis, J., Ba, J., & Lillicrap, T. (2023). Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104.
[6] Lifshitz, S., Paster, K., Chan, H., Ba, J., & McIlraith, S. (2024). Steve-1: A generative model for text-to-behavior in minecraft. Advances in Neural Information Processing Systems, 36.
[7] Cai, S., Zhang, B., Wang, Z., Ma, X., Liu, A., & Liang, Y. (2023). Groot: Learning to follow instructions by watching gameplay videos. arXiv preprint arXiv:2310.08235.
[8] Wang, Z., Cai, S., Liu, A., Jin, Y., Hou, J., Zhang, B., ... & Liang, Y. (2023). Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models. arXiv preprint arXiv:2311.05997.
[9] Wang, Z., Cai, S., Mu, Z., Lin, H., Zhang, C., Liu, X., ... & Liang, Y. (2024). OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents. arXiv preprint arXiv:2407.00114.
[10] Li, Z., Xie, Y., Shao, R., Chen, G., Jiang, D., & Nie, L. (2024). Optimus-1: Hybrid multimodal memory empowered agents excel in long-horizon tasks. arXiv preprint arXiv:2408.03615.
[11] Wang, Z., Cai, S., Chen, G., Liu, A., Ma, X., & Liang, Y. (2023). Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents. arXiv preprint arXiv:2302.01560.
[12] Qin, Y., Zhou, E., Liu, Q., Yin, Z., Sheng, L., Zhang, R., ... & Shao, J. (2024, June). Mp5: A multi-modal open-ended embodied system in minecraft via active perception. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 16307-16316). IEEE.
[13] Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., ... & Anandkumar, A. (2023). Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291.
[14] Hu, X., Zhang, R., Tang, K., Guo, J., Yi, Q., Chen, R., ... & Chen, Y. (2022). Causality-driven hierarchical structure discovery for reinforcement learning. Advances in Neural Information Processing Systems, 35, 20064-20076.
[15] BAAI, P. (2023). Plan4mc: Skill reinforcement learning and planning for open-world minecraft tasks. arXiv preprint arXiv:2303.16563.
Questions:
Would the method generalize to other environments? If so, what are the assumptions and requirements for the application of the method to a new environment?
- We have provided a detailed process for generalizing our approach to other domains in "Appendix F, GENERALIZATION (L 961)". See the above response.
How does the method compare to approaches trained with reinforcement learning?
- Thank you for the good question. RL agents and LLM-based agents focus on addressing different challenges. RL is well-suited for learning the execution of specific actions, such as gathering wood or mining minerals, but struggles with long-term dependencies. SOTA RL systems often employ hierarchical designs (e.g., CDHRL [14]), which require substantial human involvement. Effective RL models typically require millions of learning steps, and their knowledge is embedded within network weights, making it difficult to express in a structured form (e.g., a graph). This black-box nature limits controllability, transparency, and interpretability. In contrast, LLM-based agents focus on addressing long-term problems, with some also integrating RL models as execution units (Plan4MC [15], Jarvis-1 [8], etc.).
One of the major contributions of our work is to alleviate the reliance on prior knowledge in LLM agents. Current Minecraft agents use LLMs that have been well-trained on the game's knowledge to complete related tasks, which has introduced bias. To address this, ADAM explores Minecraft from scratch and stores knowledge in the form of causal graphs, which enhances robustness and interpretability.
Is it possible to train an agent with reinforcement learning on the same action space that ADAM uses?
- Theoretically, training an RL agent in our environment is feasible. However, using the commercial Minecraft environment to support large-scale data sampling would be costly, as each step takes approximately 2 minutes. With an action space of 41 actions, the number of possible decision paths grows exponentially (a sequence of just five actions already spans 41^5 ≈ 1.2 × 10^8 possibilities), so the search RL would need to find a successful decision path would necessitate significantly more steps, making it impractical.
What is the captioning performance of the perception module? What are its failure cases?
- See my response 4. MLLM serves only an auxiliary role and is not well-suited for tasks requiring fine-grained positional accuracy (e.g., building a house). While using metadata could improve this aspect, our goal is to develop agents without relying on metadata.
We hope the above response addresses your concerns. If you find our revisions and responses helpful, we would greatly appreciate your consideration in raising the score to support our work. Please let us know if you have any further questions.
Responses to Reviewer Gehh
We are very grateful for your in-depth and thoughtful comments. We appreciate your recognition of the novelty and performance of our method. We are here to provide detailed replies to your comments and hope we can resolve your major concerns.
Concerns on Weaknesses:
The presentation of the method is quite high-level and does not help the reader understand how the method actually works in practice. When the different "modules" are introduced, it is not clear a priori what they actually are.
- We respectfully disagree with your assessment. Our presentation is not "quite high-level", but rather detailed and thorough. We provided an in-depth introduction to each module's function, including specific examples to illustrate how they work. Furthermore, we used detailed diagrams to explain how each module works and how they interact (Interaction Module: Figure 3; Causal Model Module: Figure 4 and Figure 5; Controller Module: Figure 6). Notably, all the other three reviewers have given positive evaluations regarding the clarity of our presentation, and we believe it effectively conveys the intended details. If you have any further questions, please feel free to ask, and we will do our best to clarify.
Taking the Interaction Module as an example, Figure 3 provides a step-by-step illustration of the process. Starting from the initial action space {gatherWoodLog}, the figure demonstrates how the action space is sampled and how the sampled data is subsequently processed into a record. Similar detailed examples are provided for other modules to ensure a comprehensive understanding.
Are they just prompts and specifications to a GPT4 model? If so, it could be beneficial to show one of the prompts earlier, to guide the understanding of the rest of the paper.
- The modules of ADAM extend beyond prompt engineering; they are components designed to interact with the environment in an embodied manner, as well as to extract, validate, and organize causal knowledge. Fundamentally, they rely on logical execution and the mathematical representation of environmental knowledge, rather than merely using LLMs with designed prompts. Notably, modules such as the Interaction Module and the intervention-based causal discovery (CD) in the Causal Model Module do not utilize LLMs at all.
For example, the Interaction Module can determine how to sample the action space based on the task and the causal graph learned by other modules, and can parse the data obtained from the environment into formatted data for other modules. Detailed examples illustrating the operational logic of each module are provided (see above), with similar approaches applied in the other modules.
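For illustration, a minimal sketch of what such a formatted record could look like (the field names are hypothetical, not ADAM's actual schema):

```python
# Toy record type: the state change an action produced, in a form that
# downstream causal-discovery code can consume directly.
from dataclasses import dataclass

@dataclass
class InteractionRecord:
    action: str                       # e.g. "gatherWoodLog"
    inventory_before: dict[str, int]  # item counts before the action
    inventory_after: dict[str, int]   # item counts after the action

    def state_changes(self) -> dict[str, int]:
        """Item-count deltas -- the raw material for causal discovery."""
        items = set(self.inventory_before) | set(self.inventory_after)
        return {
            i: self.inventory_after.get(i, 0) - self.inventory_before.get(i, 0)
            for i in items
            if self.inventory_after.get(i, 0) != self.inventory_before.get(i, 0)
        }

record = InteractionRecord("gatherWoodLog", {}, {"wood_log": 1})
print(record.state_changes())  # -> {'wood_log': 1}
```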
The comparisons in the paper are unclear, due to the choice of a specific action space that is different from the one used in previous work. Indeed, while the paper mostly discusses the "observation space" difference compared to the setting usually employed in reinforcement learning papers, one crucial difference is the one in action space.
- Thank you for your suggestion. We provide a detailed description of the differences between various Minecraft agents in the appendix (Line 875, Appendix C, AGENT IN MINECRAFT).
Here, we elaborate on the action space of current Minecraft agents. The action space is primarily determined by the environment in which the agent is deployed. Currently, there are three main types of 3D Minecraft environments used in agent research: MineRL [1], MineDOJO [2], and Mineflayer [3].
- MineRL [1] is represented by work such as VPT [4], DreamerV3 [5], STEVE-1 [6], GROOT [7], Jarvis-1 [8], OmniJARVIS [9], and Optimus-1 [10]. MineRL utilizes low-level actions based on keyboard and mouse operations, with some methods introducing high-level actions, such as Optimus-1 [10]'s addition of craft and smelt actions.
- MineDOJO [2] is represented by methods such as MineDOJO [2] (low-level actions), DEPS [11] (discrete action code), and MP5 [12] (discrete action code).
- Mineflayer [3], on the other hand, does not provide a Minecraft environment itself, but rather must be combined with a commercial Minecraft instance, thereby offering the most complex environment, with features identical to those experienced by human players. Representative work using Mineflayer includes VOYAGER [13] (JavaScript code) and our ADAM (discrete action code).
It could be surprising that an off-the-shelf open model can accurately describe an observation to the level of providing enough information for an actor to take the optimal action, especially in an environment as visually rich as Minecraft.
- Open-source MLLMs are currently insufficiently accurate to provide detailed information for complex tasks, as discussed in my response 7 to Reviewer GNQs. Existing MLLMs primarily assist in observing peripheral environments, which has been validated through our ablation experiments. However, human feedback remains necessary for Minecraft tasks requiring high accuracy, as also noted by VOYAGER [13]. To our knowledge, relying solely on vision (without metadata) for complex tasks like building remains unfeasible, and we are committed to advancing in this direction, including modeling causal relationships in such tasks. ADAM's decision-making process relies on its memory, the causal graph it constructs, and the reasoning ability of LLMs, with MLLM playing only a supporting role.
The method seems to be highly Minecraft-specific, and the paper does not extensively discuss how it could be generalized to other domains.
- We have provided a detailed process for generalizing our approach to other domains in "Appendix F, GENERALIZATION (L 961)". We briefly outline the steps here:
- Identify key elements related to task objectives and represent them as causal graph nodes.
- If additional observational data is available (e.g., LiDAR), the perception module can potentially be enhanced beyond simply relying on visual input.
- Discretize the continuous action space into a finite set of causal graph nodes, with granularity tailored to the specific environment.
This paper introduces ADAM, an autonomous agent for open-world environments like Minecraft that builds a causal graph from scratch to improve interpretability and performance without relying heavily on pretrained knowledge. Through a combination of interaction, causal reasoning, planning, and multimodal perception, ADAM outperforms existing agents in task success and adaptability, even in modified game settings. ADAM’s approach establishes a new standard for causal reasoning in embodied agents.
Strengths
- The figures in the paper are well-done and enhance clarity, making the content easier to understand.
- The experiments conducted in Minecraft show a higher success rate than those achieved by Voyager.
Weaknesses
- The paper appears to be hastily prepared, as it contains numerous typos and minor errors, such as inconsistencies between “Fig.” and “Figure” references and improper usage of quotation marks in Table 4’s caption. I recommend that the authors carefully review and correct these issues.
- The major issue lies in the extensive use of pretrained language models that already incorporate substantial knowledge of Minecraft. Since language models may internally form a comprehensive causal graph of the game world, primarily in linguistic form, the proposed additional causal graph construction might be redundant. I suggest that the authors explore scenarios with completely altered world rules in Minecraft to test the validity of models like GPT in such modified environments, perhaps using a setting like “Mars.” Alternatively, they could consider using a language model entirely devoid of Minecraft knowledge, though this may be challenging to achieve.
- The agent’s modular design is nearly identical to Voyager, with the primary addition being the causal graph. Experimentally, however, it does not show significant advantages over Voyager, as it does not complete tasks that Voyager was unable to.
- The causal graph generated by the model ADAM is quite similar to the hybrid knowledge graph in memory described in [1]. The authors should clarify the differences.
- Additionally, the current causal graph is entirely object-centric. In open-world Minecraft, there are many open-ended tasks, such as building and farming, which are not strictly object-centric. This limitation restricts ADAM’s generalization capability in open-ended tasks.
- Several relevant works are not cited, including:
[1] Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
[2] Mars: Situated Inductive Reasoning in an Open-World Environment, NeurIPS 2024
[3] OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents, NeurIPS 2024
Questions
See the weakness.
According to the author's reply, they refuse to admit the unfair comparisons they made during the rebuttal stage. I even doubt whether the author carefully reviewed the responses from all reviewers. Therefore, I have decided to change the score to strong reject.
I strongly recommend that the author carefully review the reviewer's comments and provide a serious response.
Details of Ethics Concerns
None
[1] Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., ... & Anandkumar, A. (2023). Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291.
[2] Wang, Z., Cai, S., Liu, A., Jin, Y., Hou, J., Zhang, B., ... & Liang, Y. (2023). Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models. arXiv preprint arXiv:2311.05997.
[3] Wang, Z., Cai, S., Mu, Z., Lin, H., Zhang, C., Liu, X., ... & Liang, Y. (2024). OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents. arXiv preprint arXiv:2407.00114.
[4] Tang, X., Li, J., Liang, Y., Zhu, S. C., Zhang, M., & Zheng, Z. (2024). Mars: Situated Inductive Reasoning in an Open-World Environment. arXiv preprint arXiv:2410.08126.
[5] Hafner, D. (2021). Benchmarking the spectrum of agent capabilities. arXiv preprint arXiv:2109.06780.
[6] Vinyals, O., Ewalds, T., Bartunov, S., Georgiev, P., Vezhnevets, A. S., Yeo, M., ... & Tsing, R. (2017). Starcraft ii: A new challenge for reinforcement learning. arXiv preprint arXiv:1708.04782.
[7] Samvelyan, M., Rashid, T., De Witt, C. S., Farquhar, G., Nardelli, N., Rudner, T. G., ... & Whiteson, S. (2019). The starcraft multi-agent challenge. arXiv preprint arXiv:1902.04043.
Experimentally, however, it does not show significant advantages over Voyager, as it does not complete tasks that Voyager was unable to.
- Our method has demonstrated significant advantages over VOYAGER in both the original (2.2× speedup, Line 440) and modified (4.6× speedup, Line 68) environments. More importantly, ADAM successfully obtains diamonds in the Modified environment, a feat that VOYAGER is unable to accomplish (Figure 1, Line 68). You may have misinterpreted our ablation experiments (Line 449) and performance experiments (Line 440).
The causal graph generated by the model ADAM is quite similar to the hybrid knowledge graph in memory described in Optimus-1
- We wish to clarify that the game rules (tech tree) of Minecraft are publicly available through the Minecraft community and various wikis, and are not exclusive to academic research. You can verify this through image search engines. The same holds for all Minecraft-related works, provided they do not manually alter the game rules. Analogously, most StarCraft agent research also relies on the standard tech tree [6][7], allowing agents to compete with human players under the same conditions.
The key point is how the agent obtains this rule, which brings us back to the discussion in (2). Since the Minecraft tech tree has appeared too many times on the internet, previous LLM agents directly memorized this graph, while ADAM discovers it from scratch. We tested the baselines' performance in a modified Minecraft environment and found that their performance immediately drops significantly (Line 1055 Table 7). ADAM solves this problem by discovering causal relationships from scratch to adapt to any changing environment.
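To make the distinction concrete, here is a hedged sketch of intervention-based edge checking (an illustration of the idea only; the `env` interface is hypothetical, not ADAM's implementation):

```python
# An LLM-proposed edge "precondition -> effect_item" is kept only if
# intervening on the precondition actually changes the outcome, so
# memorized-but-wrong recipes are refuted in the modified environment.
def verify_edge(env, precondition: dict, action: str, effect_item: str) -> bool:
    # Intervention 1: run the action WITH the proposed precondition satisfied.
    env.reset(inventory=precondition)
    gained_with = env.execute(action).gained(effect_item)
    # Intervention 2: run the same action WITHOUT the precondition.
    env.reset(inventory={})
    gained_without = env.execute(action).gained(effect_item)
    return gained_with and not gained_without
```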
The current causal graph is entirely object-centric. In open-world Minecraft, there are many open-ended tasks, such as building and farming, which are not strictly object-centric.
- This limitation exists in current methods when metadata is not used. Descriptions generated by MLLMs struggle to precisely indicate the relative positions of items, posing challenges for tasks requiring fine-grained positional accuracy (e.g., building a house). While using metadata could potentially improve this, our goal is to develop agents without relying on metadata, recognizing that such information may not be accessible in real-world scenarios. We hope that future advancements in MLLMs will enhance their ability to describe relative positions, an effort that will likely require contributions from the broader community.
Several relevant works are not cited, including:
[1] Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
[2] Mars: Situated Inductive Reasoning in an Open-World Environment, NeurIPS 2024
[3] OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents, NeurIPS 2024
- Thank you for reminding us, and we have added references to these papers in the updated version.
We hope the above response addresses your concerns. If you find our revisions and responses helpful, we would greatly appreciate your consideration in raising the score to support our work. Please let us know if you have any further questions.
Responses to Reviewer GNQs
Thank you for your thoughtful review and constructive suggestions. We are delighted by your recognition of our paper's clear presentation and ADAM’s performance. We provide detailed replies to your comments and hope we can resolve your major concerns.
Concerns on Weaknesses:
It contains numerous typos and minor errors. I recommend that the authors carefully review and correct these issues.
- Thank you for your suggestion. We believe the phrase "numerous typos" may overstate the issue, as we have not identified many. Nonetheless, we will continue to review our paper carefully. Should you notice any additional typos, we welcome your input for further improvement.
The major issue lies in the extensive use of pretrained language models that already incorporate substantial knowledge of Minecraft. Since language models may internally form a comprehensive causal graph of the game world, primarily in linguistic form, the proposed additional causal graph construction might be redundant.
- We respectfully disagree and believe there may be a misunderstanding of our fundamental experiment. Your statement appears contrary to the findings presented in our paper. ADAM's robustness in the modified environment stems from its ability to construct causal graphs without relying on prior knowledge (Figure 1 (Line 68), Robustness (Line 428)). Specifically, ADAM uses the reasoning ability of the LLM (Line 268) and an intervention-based causal method (Line 280) to obtain the causal graph. This has been clearly demonstrated in our experiments and was highlighted by Reviewer b8Jp as strengths 1 and 2.
The issue you raised is actually a limitation of existing methods (such as VOYAGER [1], Jarvis-1 [2], OmniJARVIS [3]), and represents a fundamental difference between our approach and others. In the modified environment, the game rules (causal graph) of Minecraft are altered, making prior knowledge even harmful. Therefore, in such a modified environment, "LLMs incorporated substantial knowledge of Minecraft" is actually mismatched with the environment, and only constructing causal graphs from scratch allows for successful task completion. This is one of our most important experimental results (Fig. 1 (Line 68), Table 7 (Line 1055)), demonstrating ADAM's superior performance -- being the only approach capable of mining diamonds.
The causal graph constructed by ADAM is not derived from the prior knowledge embedded in LLMs; instead, it emerges from an iterative intervention and reasoning process enabled by the Interaction Module and the Causal Model Module. This distinction is crucial: prior knowledge from LLMs is inherently limited to static information, whereas ADAM's approach allows for adaptability in dynamic environments. The learned causal graph provides ADAM with enhanced robustness to environmental changes, as demonstrated in our experiments (Robustness, Line 428).
I suggest that the authors explore scenarios with completely altered world rules in Minecraft to test the validity of models like GPT in such modified environments, perhaps using a setting like “Mars.”
- Our robustness experiment (Line 428) has already altered the core dependencies of the game, and in the ablation experiment, we verified that the core of task performance is the reasoning ability of LLM, not prior knowledge.
Mars [4] uses Crafter [5], which simplifies the environment while retaining Minecraft's prior knowledge. In other words, Mars [4] (Crafter [5]) does not "(fully) alter world rules".
The agent’s modular design is nearly identical to Voyager, with the primary addition being the causal graph.
- We respectfully disagree. The three key components of VOYAGER [1] -- Automatic Curriculum, Iterative Prompting Mechanism, and Skill Library -- are fundamentally different from our approach. We are unclear about the basis for your conclusion, as all our innovative designs are tailored to the process of discovering (Lines 205, 268), verifying (Line 280), and utilizing (Line 316) causal graphs from scratch, which VOYAGER does not address.
While VOYAGER manages existing knowledge and skills via a skill library, ADAM employs causal graphs, representing a significant difference in knowledge discovery, storage, and utilization. Incorporating causal structures into embodied intelligence requires considerable effort, which constitutes our original contribution, as acknowledged by reviewers Gehh and b8Jp.
Responses to All
We thank all reviewers for their insightful comments and acknowledgment of our contributions. Reviewers have appreciated ADAM's novel combination of causal inference with LLM agents in embodied exploration (Reviewer Gehh, b8Jp, yx6R), and its ability to learn accurate causal relations without prior knowledge or meta-data (Reviewer b8Jp), while demonstrating better performance (Reviewer GNQs, Gehh, b8Jp). Reviewers also praised the paper's "well structured and clearly written" presentation (Reviewer GNQs, yx6R) and the "strong empirical results" with well-designed experiments which are "solid with multiple baselines and includes comprehensive ablation studies" (Reviewer b8Jp). We have updated our paper with the changes marked in red. We provide our source code on an anonymous website to meet the requirements of double-blind review: https://anonymous.4open.science/r/ADAM_anonymous2-B0C3
Common Concerns
1. Experiment Setting
ADAM is deployed in a commercial Minecraft environment that has features identical to those experienced by human players, making it a much more challenging environment compared to Minecraft-style 2D environments (e.g., the Crafter [1] used in Mars [2] and SPRING [3]). These 2D environments (a 64 × 64 discrete world [1]) have been greatly simplified compared to our infinite 3D world.
Methods using non-commercial Minecraft environments (e.g., MineRL [10] and MineDOJO [11] used in DreamerV3 [4], Plan4MC [5], DEPS [6], JARVIS-1 [7] and OmniJARVIS [8]) generally require a large number of learning/exploration steps due to their frame-level learning approach. As such, they mostly use success rates as their metrics rather than efficiency (i.e., steps), whereas our evaluation considers both success and efficiency. Our experimental design tries to include all potentially comparable work. Please refer to "Appendix C, AGENT IN MINECRAFT(Line 875)" for details.
| Method | Steps to Get Diamond | Success Rate | Action Space | Needs Metadata |
| --- | --- | --- | --- | --- |
| JARVIS-1 | up to 36,000 (starting from iron_axe) | 0.092 | low-level discrete + Equip + Craft + Smelt | Yes |
| OmniJARVIS | up to 12,000 | 0.08 ± 0.04 | - | - |
| VOYAGER | 75 ± 20 (up to 200) | 2/3 | code | Yes |
| ADAM | 34 ± 7 | 3/3 | high-level discrete, whose names and effects are not known | No |
2. Scalability with Other Environments
We have provided a detailed analysis in "Appendix F, GENERALIZATION (L 961)". We briefly outline the steps here:
- Identify key elements related to task objectives and represent them as causal graph nodes.
- If additional observational data is available (e.g., LiDAR), the perception module can potentially be enhanced beyond simply relying on visual input.
- Discretize the continuous action space into a finite set of causal graph nodes, with granularity tailored to the specific environment.
3. Where does the causal graph come from?
The causal graph constructed by ADAM is not from the prior knowledge embedded in LLMs, but rather an emergent product of the iterative sampling and reasoning process mainly facilitated by the Interaction Module and the Causal Model Module. This distinction is crucial, as the former is inherently limited to the static knowledge encoded in the LLM, whereas the latter enables adaptability in dynamic environments. The learned causal graph, in turn, endows ADAM with enhanced robustness in the face of environmental changes as demonstrated in our experiments (Robustness, Line 428).
[1] Hafner, D. (2021). Benchmarking the spectrum of agent capabilities. arXiv preprint arXiv:2109.06780.
[2] Tang, X., Li, J., Liang, Y., Zhu, S. C., Zhang, M., & Zheng, Z. (2024). Mars: Situated Inductive Reasoning in an Open-World Environment. arXiv preprint arXiv:2410.08126.
[3] Wu, Y., Min, S. Y., Prabhumoye, S., Bisk, Y., Salakhutdinov, R. R., Azaria, A., ... & Li, Y. (2024). Spring: Studying papers and reasoning to play games. Advances in Neural Information Processing Systems, 36.
[4] Hafner, D., Pasukonis, J., Ba, J., & Lillicrap, T. (2023). Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104.
[5] BAAI, P. (2023). Plan4mc: Skill reinforcement learning and planning for open-world minecraft tasks. arXiv preprint arXiv:2303.16563.
[6] Wang, Z., Cai, S., Chen, G., Liu, A., Ma, X. S., & Liang, Y. (2024). Describe, explain, plan and select: interactive planning with LLMs enables open-world multi-task agents. Advances in Neural Information Processing Systems, 36.
[7] Wang, Z., Cai, S., Liu, A., Jin, Y., Hou, J., Zhang, B., ... & Liang, Y. (2023). Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models. arXiv preprint arXiv:2311.05997.
[8] Wang, Z., Cai, S., Mu, Z., Lin, H., Zhang, C., Liu, X., ... & Liang, Y. (2024). OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents. arXiv preprint arXiv:2407.00114.
[9] Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., ... & Anandkumar, A. (2023). Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291.
[10] Guss, W. H., Houghton, B., Topin, N., Wang, P., Codel, C., Veloso, M., & Salakhutdinov, R. (2019). Minerl: A large-scale dataset of minecraft demonstrations. arXiv preprint arXiv:1907.13440.
[11] Fan, L., Wang, G., Jiang, Y., Mandlekar, A., Yang, Y., Zhu, H., ... & Anandkumar, A. (2022). Minedojo: Building open-ended embodied agents with internet-scale knowledge. Advances in Neural Information Processing Systems, 35, 18343-18362.
(a) The paper introduces ADAM, an embodied agent that learns causal relationships in the open-world environment of Minecraft. The key finding is that by combining causal discovery methods with LLM-powered embodied exploration, the agent can learn accurate causal graphs without relying on prior knowledge or metadata. The authors claim this leads to robust performance, even in modified environments where pre-trained knowledge about crafting recipes is invalid. The paper demonstrates this through experiments in both original and modified Minecraft environments, showcasing the agent's ability to efficiently learn a causal graph of item dependencies and use it to solve tasks like obtaining diamonds.
(b) Strengths:
- The paper introduces a novel approach by integrating causal discovery with embodied exploration using LLMs in Minecraft.
- The agent's ability to perform well in a modified environment with altered crafting recipes demonstrates the robustness of the causal learning approach and its independence from pre-trained knowledge.
- The experiments provide strong evidence for the agent's efficiency and effectiveness compared to baseline methods.
- The use of a causal graph to represent the agent's learned knowledge enhances interpretability and provides insights into the decision-making process, addressing a common limitation of black-box LLM-based agents.
(c) Weaknesses:
- While the paper proposes a generalizable framework, the experiments are limited to Minecraft. Providing a complementary application in a different environment would strengthen the generalizability claim.
- The paper compares ADAM to text-based baselines but lacks comparisons to existing multimodal agents, which would provide a more comprehensive evaluation of its performance. Even if no directly comparable baseline exists, creating one would have made the paper stronger.
(d) The decision is to accept the paper. Despite some limitations, the paper's novel combination of causal inference with LLM-powered embodied exploration, its strong empirical results demonstrating robust performance, and its contribution to the interpretability of LLM-based agents make it a valuable contribution to the field.
Additional Comments from Reviewer Discussion
- There has been some discussion around disagreements over whether comparisons with other agents are sufficient and correct. I think it finally comes down to whether the comparison with OmniJARVIS was correctly presented in the general response. While I agree that the table in the author response appears confusing, I think it does not fundamentally hurt the paper, as the authors do not intend to include it in the paper and have stated that the methods are incomparable.
- One reviewer highlighted the lack of empirical evidence to support the claims of "excellent interpretability" and "close alignment with human gameplay". The authors addressed the interpretability claim by stating that their agent's reliance on a causal graph for knowledge representation and decision-making ensures transparency, unlike methods that rely on learned neural network weights or rule-based knowledge bases.
- Missing Runtime/Memory Analysis: One reviewer pointed out the absence of a runtime/memory analysis. The authors responded by stating that all their comparisons utilize the same LLM API (GPT-4-turbo) and that computations are performed externally, not locally. They argued that the overhead of a single step is consistent across all baselines, as they conducted their performance experiments on Mineflayer, ensuring uniform evaluation metrics.
With the improvement during the rebuttal phase, the area chair ultimately decided to recommend acceptance.
Accept (Poster)