MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
Using model generated data can lead to stable learning with high update ratios in off-policy RL
Abstract
Reviews and Discussion
This paper investigates the overestimation of unseen on-policy actions and the instability caused by a high update-to-data (UTD) ratio in off-policy reinforcement learning, via theoretical analysis and experiments. To address this issue, the authors introduce a new method named Model-Augmented Data for Temporal Difference Learning (MAD-TD), which combines model-generated synthetic data with real data to enhance and stabilize off-policy RL training. Experiments on the DeepMind Control benchmark demonstrate that MAD-TD outperforms other baselines and leads to stable learning even in high UTD settings.
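For concreteness, the core idea described in this summary can be illustrated with a minimal, self-contained sketch that mixes real replay data with a small fraction of one-step, on-policy model rollouts in each TD batch (illustrative only; the toy buffer, policy, world model, and the names `mixed_batch` and `alpha` are stand-ins, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 4, 2

class ToyBuffer:
    """Toy stand-in for a replay buffer of real environment transitions."""
    def __init__(self, n=1000):
        self.s = rng.normal(size=(n, STATE_DIM))
        self.a = rng.normal(size=(n, ACTION_DIM))
        self.r = rng.normal(size=(n, 1))
        self.s2 = rng.normal(size=(n, STATE_DIM))
    def sample(self, k):
        idx = rng.integers(len(self.s), size=k)
        return self.s[idx], self.a[idx], self.r[idx], self.s2[idx]

def toy_policy(s):
    # Stand-in for the current actor pi(s).
    return np.tanh(s[:, :ACTION_DIM])

def toy_model(s, a):
    # Stand-in for the learned world model: predicts (next_state, reward).
    a_padded = np.pad(a, ((0, 0), (0, STATE_DIM - ACTION_DIM)))
    return s + 0.1 * a_padded, np.zeros((len(s), 1))

def mixed_batch(buffer, batch_size=256, alpha=0.05):
    """Build a TD training batch: (1 - alpha) real data, alpha model data."""
    n_model = int(alpha * batch_size)
    s, a, r, s2 = buffer.sample(batch_size - n_model)

    # Branch one step from real states with *on-policy* actions, so the critic
    # is also trained on (s, pi(s)) pairs the replay buffer does not contain.
    s_m, _, _, _ = buffer.sample(n_model)
    a_m = toy_policy(s_m)
    s2_m, r_m = toy_model(s_m, a_m)

    return (np.concatenate([s, s_m]), np.concatenate([a, a_m]),
            np.concatenate([r, r_m]), np.concatenate([s2, s2_m]))

s, a, r, s2 = mixed_batch(ToyBuffer())
print(s.shape, a.shape, r.shape, s2.shape)   # (256, 4) (256, 2) (256, 1) (256, 4)
```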
Strengths
- The authors provide thorough theoretical analysis and experimental results, demonstrating the effectiveness of MAD-TD in stabilizing the training process and achieving strong performance.
- The paper is well-organized and the key idea is clearly delivered. Open-source code is also provided for reproducibility.
Weaknesses
- The submission uses an incorrect format (the ICLR 2024 template).
- Typos (also, please number each equation so it can be referred to easily):
  - In the first equation in Section 3.1 (the definition of …), there is an extra ")".
  - In the last equation of Section 3.1, the … is missing on top of the … colored in red.
- There is not much originality in the key idea of simply combining model-generated data with real data to augment training, which has already been used in previous work [1] and in other model-based reinforcement learning algorithms [2-3]. As the distribution shift problem has been widely researched in offline RL, it is not surprising that a similar problem occurs in off-policy RL. Although the authors provide theoretical analysis showing the instability brought by target-policy action selection, more profound results are expected, such as how model error affects performance and when model-generated data can be trusted in off-policy RL.
- The algorithm is mainly based on TD-MPC2, with several components changed. While the proposed MAD-TD is compared with the original TD-MPC2, the influence of these changes on performance has not been clearly explained or demonstrated.
[1] Lu, Cong, Philip Ball, Yee Whye Teh, and Jack Parker-Holder. "Synthetic Experience Replay." Advances in Neural Information Processing Systems 36 (2024).
[2] Sun, Yihao, Jiaji Zhang, Chengxing Jia, Haoxin Lin, Junyin Ye, and Yang Yu. "Model-Bellman Inconsistency for Model-Based Offline Reinforcement Learning." In International Conference on Machine Learning, pp. 33177-33194. PMLR, 2023.
[3] Rigter, Marc, Bruno Lacerda, and Nick Hawes. "RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning." Advances in Neural Information Processing Systems 35 (2022): 16082-16097.
Questions
How does the model data proportion affect the results? I suggest the authors provide some preliminary results or insights on the impact of the model data proportion and, if necessary, conduct ablation studies to show the trade-off on this key parameter. Besides, is it possible to develop a self-adaptive mechanism that controls the model data proportion during training to enhance the final performance? To my understanding, the model error could be large at the beginning of training. Therefore, using less model-generated data at first and gradually increasing the proportion as the world model converges might be helpful (see the sketch below for one possible schedule).
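For illustration, a minimal sketch of the kind of schedule this question suggests (hypothetical; `model_data_fraction`, `alpha_max`, and the error-based gating are assumptions, not something from the paper):

```python
def model_data_fraction(step, total_steps, alpha_max=0.05, warmup_frac=0.2,
                        model_error=None, error_threshold=None):
    """Fraction of each training batch drawn from the world model at `step`.

    Linearly anneals from 0 to `alpha_max` over the first `warmup_frac` of
    training, and is optionally scaled down when a running estimate of the
    model's validation error exceeds `error_threshold`.
    """
    warmup_steps = max(1, int(warmup_frac * total_steps))
    alpha = alpha_max * min(1.0, step / warmup_steps)
    if model_error is not None and error_threshold is not None:
        # Trust the model less while its error estimate is high.
        alpha *= min(1.0, error_threshold / max(model_error, 1e-8))
    return alpha

# Example: still ramping up at 10k steps; scaled down at 50k by a 2x-too-high error.
print(model_data_fraction(step=10_000, total_steps=100_000))            # 0.025
print(model_data_fraction(step=50_000, total_steps=100_000,
                          model_error=0.2, error_threshold=0.1))        # 0.025
```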
The authors address the instability problem in off-policy reinforcement learning with a high replay ratio. Building on previous results showing that neural networks trained on a specific set of experiences at high UTD exhibit various training instabilities, they show that current solutions mitigate some of the problems to a greater or lesser extent; e.g., feature normalization mitigates the overestimation problem but still does not allow the model to generalize to unseen actions. Therefore, they propose using a world model to augment a small part of the experiences from the replay buffer.
Strengths
- The authors propose a method that works effectively with high UTD without resets.
- The topic is relevant.
- The paper is well-written.
Weaknesses
- The results look quite good, but this raises a question about robustness. The authors sometimes refer to [1], but the same paper points out that different effects can be obtained in RL in different environments. I agree with the theses in the manuscript, but I think it is valuable to validate these strong results on other benchmarks so that researchers in the future know to what extent this is a general solution. Please see Question 1 for more details.
- Minor: Equations are not numbered.
- Lack of an analysis of the impact of the percentage of augmented samples on performance. Could you provide plots showing performance across a range of alpha values (e.g., 1%, 5%, 10%, 25%, 50%) on a subset of representative tasks? This would give a clearer picture of the method's sensitivity to this hyperparameter.
[1] Nauman, M., Bortkiewicz, M., Miłoś, P., Trzcinski, T., Ostaszewski, M., & Cygan, M. "Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning." In Forty-first International Conference on Machine Learning.
Questions
- Referring to Weakness 1: What would the results be on other environments or benchmarks, like Meta-World (especially environments like Stick-pull, coffee-push/pull, assembly, and others) or MyoSuite?
- Is there any advantage to using AR=2 rather than AR=1? Could you extend Figure 5 with AR=1 and 2 million time steps?
Details of Ethics Concerns
no
In this paper the authors consider the reasons for RL becoming unstable in high UTD settings, and they suggest a fix for this which they call MAD-TD. Basically, the data is augmented with data generated from a model to enable stabilisation. The authors then illustrate the success of their approach using experiments.
Strengths
I found this an interesting paper, and the authors' approach seems to generate useful results. The paper is well written, and the presentation allows the reader to understand the rationale behind most of the work.
The paper combines some mathematical and intuitive insight into the stability problem. It then uses this insight as motivation for their new approach MAD-TD which seems to show some success based on the experimental results.
I thought the authors gave quite a balanced presentation of the strengths, but also possible weaknesses of their work, which is commendable.
Weaknesses
The main limitation of the paper is precisely the one the authors point out themselves, i.e., that the assumption that a sufficiently high-fidelity model can be learned online holds. This is necessary because the "augmented data" is generated from this model. However, the authors have been quite up-front about this and, despite this shortcoming, their results seem to show success with the MAD-TD approach.
Another shortfall is that, of course, the main "proof" of the results is via experiments, although some mathematical insight is given. In other words, the authors do not prove that their MAD-TD approach prevents instability; they observe through their experiments that it seems to do well.
Questions
-
In the un-numbered equation on Page 3 (please number equations!) a certain matrix is partitioned into a positive definite term and a non-positive definite term. I was not quite sure what to make of the discussion below this. The conclusion, of course, follows if \gamma is sufficiently small, or if the first term strongly dominates the second. I would appreciate more discussion of this.
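For reference, in linear TD analyses the decomposition in question typically takes the following standard form (a sketch under that assumption; the paper's exact equation may differ):

```latex
% Standard linear-TD decomposition (assumed form, not necessarily the paper's
% exact equation). \Phi: feature matrix, D: diagonal matrix of the sampling
% distribution, P^{\pi}: transition matrix under the target policy.
\[
  A \;=\; \Phi^\top D \,(I - \gamma P^{\pi})\, \Phi
    \;=\; \underbrace{\Phi^\top D \Phi}_{\text{positive definite}}
    \;-\; \gamma\, \underbrace{\Phi^\top D P^{\pi} \Phi}_{\text{not necessarily positive definite}} .
\]
% The expected TD update is stable when A is positive definite, which holds if
% \gamma is sufficiently small or if the first term dominates the second
% (e.g., on-policy, where D is the stationary distribution of \pi).
```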
-
It wasn't clear to me what sort of guarantees we should expect from the authors' approach; it seemed to be more a case of "it seemed to work most of the time"... except when it didn't. I didn't quite understand the insight about why the approach didn't work in certain circumstances. Is that related to the assumption that a sufficiently high-fidelity model cannot always be learned? Some more insight would be useful.
This paper addresses the challenge of unstable training in off-policy reinforcement learning (RL) methods when the update-to-data ratio is high. The authors identify the root cause of this instability as the difficulty in learning accurate value functions from limited data. To mitigate this issue, they propose a novel approach called Model-Augmented Data for Temporal Difference learning (MAD-TD). MAD-TD leverages model-generated data to improve the accuracy of value functions on unobserved on-policy actions, thereby stabilizing training even at high update ratios. They empirically show that MAD-TD achieves competitive performance on tasks from the DeepMind control suite.
Strengths
- The paper tackles an interesting and critical problem setting in sample-efficient reinforcement learning (RL), which has significant implications for many real-world applications where data collection is costly or time-consuming.
- The authors provide both empirical evidence and theoretical analysis to demonstrate the importance of addressing incorrect Q-value learning in off-policy RL with limited samples, making a compelling case for their proposed solution.
- The proposed method, MAD-TD, demonstrates competitive performance compared to existing baselines on challenging DeepMind Control (DMC) tasks, showcasing its potential as a viable solution for improving sample efficiency in RL.
Weaknesses
- Limited Baseline Comparison: The paper only compares the proposed method, MAD-TD, against two baselines (BRO and TD-MPC) and their variants, which may not be sufficient to demonstrate its performance comprehensively. A more extensive comparison with other state-of-the-art methods would strengthen the paper's claims.
- Lack of Ablation Study: Although the authors outline critical design choices in Section 4.1, they do not conduct an ablation study to investigate the importance of these choices. This omission makes it difficult to understand the individual contributions of each design element to the overall performance of MAD-TD.
Questions
- Have you explored different values for the percentage of model-generated data in MAD-TD?
- How does the performance of MAD-TD vary with different levels of world model accuracy? Have you conducted any ablation studies on less accurate models to understand how sensitive MAD-TD is to the world model quality?
This paper addresses the issue of overestimation in value estimates that precludes training at a high update-to-data ratio. The paper proposes to use a world model and to update the value function using on-policy data generated by the world model, correcting this overestimation. The effectiveness of the approach is shown on the hard continuous-control Dog tasks from the DM Control suite.
The approach is intuitive, and the results are quite promising. The paper contributes significantly to understanding and addressing the challenges of high update-to-data ratios in RL. Accordingly, I recommend accepting the paper.
Additional Comments from Reviewer Discussion
The reviewers have clear consensus in favor of the paper. Any concerns raised were sufficiently resolved through extensive discussion and addition of results during rebuttal. However, it should be noted that the work relies on the assumption that a sufficiently accurate world model can be learned online, which may not be true in many cases.
Accept (Spotlight)