What specific data was used to train the Contrastive Object-Skeleton Alignment (COSA) adapter?
Camera-related details require further clarification:
- How are the camera parameters defined (e.g., as angles or in some other representation)?
- What are the form and dimensionality of the camera pose embeddings used in Skeletal Correlation Modeling (SCM)?
- How are camera views represented during multi-view texture refinement? Are the intrinsic and extrinsic parameters of a perspective camera model used in the differentiable-rendering-based texture optimization?
The current implementation appears to treat every joint as having full degrees of freedom. In reality, however, some human joints (e.g., elbows and knees) have a constrained range of motion. How do the authors plan to account for these anatomical constraints in future work?
Regarding Equation 2, please clarify:
- Which variables are involved in the gradient computation of ?
- What are the specific inputs required when using this gradient term as guidance?