The static Gaussian field and the deformable Gaussian field are optimized separately using mutually exclusive masks, but why do the authors claim that the proposed hexplane-based Gaussian field is a unified representation? From my perspective, 4DGS appears to provide a more ‘unified’ representation.
Is the deformable Gaussian field optimized using only foreground dynamic masks? In other words, only dynamic regions of the deformable Gaussian field are supervised? Then how to ensure that Gaussians in static regions would not be affected by the deformable field when rendering from a novel view?
How are 3D Gaussians initialized before optimizing the Hexplane-based Gaussian field?
Please provide more details on how to minimize the objective defined in Eq. 16.
How are the static regions obtained in each frame?
It would be beneficial to provide a mathematical form of the rotDeform in Eq. 17.
How many datasets are used for evaluation? The authors list three: DyCheck, NVIDIA DynamicNeRF, and MPI Sintel (Line 86-87), but an additional DAVIS dataset is also mentioned (Line 413).
Including relevant citations for the scale-invariant loss and ARAP loss would be helpful.
I am concerned about the qualitative results in Fig. 4, especially the last two columns, which empirically do not align with the quantitative results in Tab. 2. There is an urgent need for additional qualitative comparisons presented in the form of videos.

When the authors or the publication are included in the sentence, the citation should not be in parenthesis using \citet{}.

Please check the format of notations. For example, it is recommended to format the high-dimensional spatial-temporal feature in bold as .
A non exhaustive list of typos:
- Line 69: vides -> videos
- Line 137-138: files -> fields