We thank the reviewer for the constructive and informative review.

[Q1] More Comparison.

Our method focuses on human body avatars, which is different from the aim of Articulated Point NeRF [1]. So we do not compare the general dynamic NeRF baselines in Articulated Point NeRF.

Both Neuman [2] and InstantAvatar [3] focus on monocular video reconstruction, and do not compare themself to the standard multi-view video benchmarks like H36M and ZJU-Mocap. So directly comparing with their methods is unfair.

Neuman [2] is designed for learning human NeRF and scene NeRF from a single-view video, and it only learns a frame-dependent error-correction network for observation space. So quantitative comparison is unfair. We give a qualitative comparison of ZJU-Mocap in the updated supplementary. Our results preserve more details than Neuman.

InstantAvatar [3] is designed for learning a neural field from a fixed-posed human-rotating video or a synthetic video without loose clothes, and it is unknown that it can handle non-rigid dynamics with large poses from multi-view datasets. We also do not find any qualitative results from InstantAvatar for extensive comparison. Furthermore, our rendering speed (100 FPS) is faster than InstantAvatar (15 FPS).

Ultimately, we have referred to the related literature and uploaded the newer version of the pdf.

[Q2] Using existing skinning templates or not.

It depends. We learn geometry, appearance, and motions (skinning and non-rigid) to provide a template-free solution to avatar reconstruction. The problem with using templates is that:

The templates are usually proprietary in real-world usage.
There exists a gap between the skinning field of clothed humans and naked humans. Our skinning field can fill the gap via optimization.
Templates are object-specific (e.g., mature humans), which hinders their wider application to other kinds of objects.

However, there are other factors to be considered. For example, if there are limited variety of motions in the training data, the quality of the skinning field may be poor due lack of inter-frame regularization. Besides, Leveraging templates is a good choice to initialize or regularize the skinning field and speed up convergence [2,4].

[Q3] Efficiency breakdown.

The efficiency of the method comes from two-fold: Both geometry and rendering. We provide a runtime breakdown and analyze the efficiency. The runtime breakdown:

Function	Time (ms)
Extract Mesh (w/ geo. NF query, w/ non-rigid NF query)	11.39273
Extract Mesh (w/o geo. NF query, w/ non-rigid NF query)	3.08209
Render Mesh (w/ texture NF query, w/ env light query)	7.00726
└── (texture NF query)	4.08976

"NF" means Neural Fields.
Extract Mesh: For NF query in mesh extraction, we query both the canonical SDF field and the non-rigid motion field.
In inference time, we only query the neural field once to get the canonical mesh and use the same mesh for the latter rendering. The motion field will be queried by the number of vertex of the mesh times.
In training time, we need to query the NF for every optimization step to update the SDF and non-rigid fields.
The insight is that, compared with NeRFs, our gradients only flow through the iso-surface, which is drastically less than NeRF, which where the gradients flow over the whole space.
Render Mesh:
For rendering an NxN resolution image, we only query O(NxN) times for the texture field thanks to the rasterization. While NeRF-based method needs to query O(NxNxM), where M is the number of points per pixel (ray). The usage of tiny-cuda-nn, a highly optimized package for neural fields, further improves the speeds.
It involves both texture querying and env-light map querying.

[1] https://lukas.uzolas.com/Articulated-Point-NeRF/ (https://lukas.uzolas.com/Articulated-Point-NeRF/)

[2] https://machinelearning.apple.com/research/neural-human-radiance-field (https://machinelearning.apple.com/research/neural-human-radiance-field)

[3] https://tijiang13.github.io/InstantAvatar/ (https://tijiang13.github.io/InstantAvatar/)

[4] https://github.com/taconite/arah-release(https://github.com/taconite/arah-release)