We would like to thank the reviewer for their feedback. We provide an answer to their questions below.

The work can be seen as an engineering trick: We do not view the 7 DoF as an inherent opportunity for E3; rather, our new action space is a simple strategic response to the overactuation problem commonly encountered in robot arms with more than 6 DoF. The E3 action space applies classical robotics control techniques to robot learning (see the "Redundant manipulators control" part of the "Related Work" section of the original paper), aiming to replace the Joint and EE action spaces currently used in the robot learning literature.

While the E3 action space offers a straightforward solution, its simplicity is its strength: it can be seamlessly applied to many existing and future robot learning works. Additionally, it is clear from our findings that this simple change in action space for robot learning dramatically improves performance on tasks where whole-arm control is imperative. As we confront increasingly complex challenges, such as navigating real-world scenes involving cupboards and other intricate, hard-to-navigate areas, it becomes evident that conventional action spaces fall short in providing the necessary control over the arm's configuration whilst maintaining data-efficient learning.

Previous contributions that also mostly focus on re-engineering the action space, e.g. by adopting primitives [1] or next-best pose [2], have had a significant impact on the field, leading to higher performance through only minor modifications of the learning algorithm.

Generalization to 7 DoF arms: In the case of a 6 DoF arm, such as the UR robot family, where the "elbow" has two solutions for a given end-effector (EE) pose (elbow up or elbow down), the E3J method would use the pose information along with a binary flag indicating the elbow orientation. This contrasts with the conventional approach in robot control for UR robots, where the elbow joint is typically constrained to either the up or the down position.
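To make the "pose plus binary flag" idea concrete, the following is a minimal sketch, not our implementation. It assumes that an external inverse-kinematics solver has already returned candidate joint configurations for the commanded EE pose, and that the sign of the elbow joint distinguishes the two branches. The names `ELBOW_JOINT` and `select_by_elbow_flag`, as well as the sign convention, are hypothetical and would need to be adapted to the specific robot.

```python
from typing import List, Sequence

# Hypothetical: index of the elbow joint in a 6-joint configuration vector.
ELBOW_JOINT = 2


def select_by_elbow_flag(
    candidates: Sequence[Sequence[float]], elbow_up: bool
) -> List[Sequence[float]]:
    """Keep the IK candidates whose elbow branch matches the binary flag.

    Assumption: a negative elbow joint angle corresponds to "elbow up".
    The actual sign convention depends on the robot model.
    """
    return [q for q in candidates if (q[ELBOW_JOINT] < 0.0) == elbow_up]


# Two illustrative candidates for the same EE pose (values are made up).
candidates = [
    [0.0, -1.0, -1.2, 0.0, 0.0, 0.0],  # elbow joint negative
    [0.0, -0.5, 1.2, 0.0, 0.0, 0.0],   # elbow joint positive
]
elbow_up_solutions = select_by_elbow_flag(candidates, elbow_up=True)
```

In this sketch the policy would output the EE pose and the flag, and the flag only selects among the solutions the solver provides.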

Additionally, 6 DoF arms face limitations due to their morphology. Unlike overactuated arms, they may struggle with tasks demanding flexibility. For instance, consider our real-world cabinet setup, but with the Franka Emika Panda arm replaced by a UR5. The UR robot's vertical elbow position would hinder solving the task, since the elbow needs to be rotated to the horizontal plane in order to reach inside the cupboard. A visual example in simulation is available at https://imgur.com/a/n4L5UZH : while the cup position would be reachable by the UR5, the presence of the obstacle and the stiffness of the UR5's elbow prevent the robot from reaching inside the cabinet.

Why does the method talk about the elbow? In most 7 DoF arms, the redundancy of the robot allows control over one more dimension, which can be identified with the position of the elbow. As we mention in the Introduction, E3A allows direct control of the elbow by controlling its angle, while E3J allows indirect control of the elbow by constraining one of the joints to fix the elbow position. Thus, both methods allow control over the elbow redundancy (see Figures 2 and 3 from the original submission for further illustration).
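The difference between the two variants can be sketched as two action containers. This is an illustrative sketch only: the class names, the quaternion pose parameterization, and the `CONSTRAINED_JOINT` constant are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


def redundant_dof(n_joints: int, task_dof: int = 6) -> int:
    """Number of joints beyond those needed to fix a 6D end-effector pose."""
    return n_joints - task_dof


@dataclass(frozen=True)
class EEPose:
    position: Tuple[float, float, float]
    quaternion: Tuple[float, float, float, float]  # assumed (x, y, z, w) order


@dataclass(frozen=True)
class E3AAction:
    """EE pose plus a direct command for the elbow angle (radians)."""
    pose: EEPose
    elbow_angle: float


# Hypothetical: which joint E3J constrains (0 = base joint in this sketch).
CONSTRAINED_JOINT = 0


@dataclass(frozen=True)
class E3JAction:
    """EE pose plus a target value for the single constrained joint."""
    pose: EEPose
    joint_value: float


def flatten(action: Union[E3AAction, E3JAction]) -> List[float]:
    """Concatenate the pose and the extra redundancy term into one vector."""
    extra = action.elbow_angle if isinstance(action, E3AAction) else action.joint_value
    return list(action.pose.position) + list(action.pose.quaternion) + [extra]


pose = EEPose(position=(0.4, 0.0, 0.3), quaternion=(0.0, 0.0, 0.0, 1.0))
e3a = E3AAction(pose=pose, elbow_angle=0.5)
e3j = E3JAction(pose=pose, joint_value=-0.2)
```

In both cases the action contains the EE pose plus exactly one additional scalar, matching the single redundant degree of freedom of a 7 DoF arm; the variants differ only in what that scalar parameterizes.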

Experiments to validate the choice of the joint: In Section 4.4, we motivated our choice of mainly considering E3J-base and E3J-wrist in the first iteration of the manuscript. To further validate this choice, we have now completed the ablation study, in which we constrain other joints of the robot for E3J (see Section A4 and Figure 13 in the Appendix). The results show that our intuitions generally proved correct, but there are tasks where constraining joints other than the base or the wrist improves sample efficiency. We also believe E3A may constitute a more general strategy than E3J, but in practice we found that the agent struggles more when directly controlling the elbow angle than when indirectly controlling the elbow through one constrained joint. Since finding the best action space for robot learning, in terms of full-body control and efficiency, is one of the main goals of this work, we have chosen E3J as the approach that best meets this goal according to the empirical evidence.

We hope our revision answers the reviewer's doubts, and we look forward to any further suggestions to improve our work.

[1] Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives, Dalal et al.

[2] Q-attention: Enabling Efficient Learning for Vision-based Robotic Manipulation, James et al.