PaperHub
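Average rating: 6.0 / 10
Poster · 3 reviewers
Lowest 4, highest 7, standard deviation 1.4
Individual ratings: 4, 7, 7
Confidence: 3.3
Soundness: 2.7 · Contribution: 3.0 · Presentation: 2.3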
NeurIPS 2024

Reconstruction of Manipulated Garment with Guided Deformation Prior

Submitted: 2024-04-23 · Updated: 2024-11-06

Keywords
Garment reconstruction, Deformation priors, Geometric deformations, Garment manipulation, 3D to UV mapping, Non-rigid reconstruction

Reviews and Discussion

Review
Rating: 4

The paper aims to recover garments that are manipulated instead of worn. The method first generates the UV mappings from point clouds, followed by ISP to recover the complete mapping. A diffusion model is used to extract the deformation priors and guide the recovery from UV mappings to 3D mesh. Experiments show that the proposed method delivers lower reconstruction errors and outperforms the baselines.

Strengths

  • This method is able to recover garments in more general and complex poses.
  • The proposed model achieves robust performance, as shown in the experiments.

Weaknesses

  1. The method seems to be a garment-specific, or even topology-dependent, design. For example, to recover shirts and pants, one needs to train a different model for each, which limits its generalisation ability.
  2. While the ground truth may include too much detail, i.e. too many wrinkles, and even looks a bit noisy, the recovered garments are over-smoothed. The proposed model fails to recover the high-frequency details of the garments.

Questions

  1. Can the recovered garment meshes be used for further animation?
  2. How are the edges connected in the mesh recovered from point clouds? Are the edges fixed or dynamically connected? Since the point clouds do not include any connectivity information, how are the edges defined?
  3. In the qualitative results, such as Figure 5, the recovered garments seem to be smoother than the ground truth. Is this because of the “auto smooth” option during rendering? Could you provide some visual results of the smoothed ground truth garments?
  4. What is the average number of points for different garments? Is the model able to deal with a large number of points?
  5. While the proposed method is able to handle garments in more complex poses, is it possible to compare with other baselines on garments worn by the human body, e.g. with quantitative and qualitative results on the CLOTH3D dataset?

Limitations

Please refer to the weaknesses and questions.

Author Response

Thank you for your valuable reviews.

To provide context, we would first like to briefly describe our pipeline. Given the point cloud, we first map each point to the UV space using the UV mapper. This yields a sparse UV map and a sparse panel mask. We then use Eq. (10) to fit the optimal latent code $\mathbf{z}^*$ for the ISP model from the sparse panel mask. Note that with $\mathbf{z}^*$, we can recover a complete panel mask and a rest-state garment mesh that defines the vertices and faces of the garment. Then, we leverage the diffusion model to recover the complete UV map, utilizing the sparse UV map and the complete panel mask as the guidance in the reverse diffusion process.
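As a rough illustration of the first step of this pipeline only, the sketch below scatters points that already carry UV coordinates into a sparse UV map and a sparse panel mask. It is not the authors' implementation: the function name `rasterize_sparse_uv`, the grid resolution, the `[0, 1)` UV range, the use of a single panel, and the per-pixel averaging are all assumptions made for illustration. The UV mapper, the ISP fitting of Eq. (10), and the diffusion model are not reproduced.

```python
def rasterize_sparse_uv(points, uvs, res=8):
    """Scatter 3D points into a res x res UV grid (hypothetical helper).

    points: list of (x, y, z) tuples.
    uvs:    list of (u, v) tuples, assumed to lie in [0, 1), one per point.
    Returns (uv_map, mask). uv_map[row][col] is the mean 3D position of the
    points that fall into that pixel, or None if the pixel is empty.
    mask[row][col] is True where at least one point landed.
    """
    sums = [[[0.0, 0.0, 0.0] for _ in range(res)] for _ in range(res)]
    counts = [[0] * res for _ in range(res)]
    for (x, y, z), (u, v) in zip(points, uvs):
        col = min(int(u * res), res - 1)   # u indexes columns
        row = min(int(v * res), res - 1)   # v indexes rows
        cell = sums[row][col]
        cell[0] += x
        cell[1] += y
        cell[2] += z
        counts[row][col] += 1
    mask = [[n > 0 for n in count_row] for count_row in counts]
    uv_map = [
        [tuple(s / counts[r][k] for s in sums[r][k]) if counts[r][k] else None
         for k in range(res)]
        for r in range(res)
    ]
    return uv_map, mask


# Three observed points; the first two map to the same pixel.
points = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
uvs = [(0.10, 0.10), (0.11, 0.12), (0.90, 0.50)]
uv_map, mask = rasterize_sparse_uv(points, uvs, res=4)
```

Most pixels stay empty in such a map, which is why a later completion step is needed before the map can be used to place every vertex.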

This being said, below is the actual response to your comments.

  1. Garment-specific design.

Our approach is not specific to a particular garment. However, due to the challenging nature of the task, where only a partial garment is observed, we consider category-level garment reconstruction, as in prior work such as GarmentNets [27] and GarmentTracking [2]. Since a folded shirt and a pair of folded trousers can exhibit similar shapes, as shown in Fig. 2 of the attached PDF file, we need to use different models for them.

  2. The recovered garments are smooth.

The qualitative results in our paper are rendered without smoothing. Both the ground truth meshes and our reconstructed meshes are visualized in their raw form. The fact that our reconstructions seem smoother than the ground truth meshes can be attributed to the tendency of neural networks to learn low-frequency functions [Rahaman2019], which yields smooth reconstructed UV maps. Furthermore, our guided denoising process finds the expected reconstruction $\hat{x}$ given the observations $y$ and the starting noise $x_T$, where $\hat{x} = E(x \mid y, x_T) = \sum_x x\, P\{x \mid y, x_T\}$. In areas where the data is missing, the observations provide only weak guidance and the reconstructions are naturally smooth. In future work, we will explore methods to enhance our diffusion model to capture finer details.
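The expectation argument can be made concrete with a toy example that is unrelated to the actual diffusion model. The sketch below assumes that, in an unobserved region, every plausible completion is the same smooth fold plus a wrinkle whose phase is unknown; the 1D signal, the amplitudes, and the frequencies are all made up for illustration. Averaging many such completions approximates the expected value and largely cancels the wrinkle.

```python
import math
import random

rng = random.Random(0)
ts = [i / 63 for i in range(64)]  # sample locations along a 1D strip


def plausible_completion():
    """One hypothetical completion: a smooth fold plus a wrinkle of random phase."""
    phase = rng.uniform(0.0, 2.0 * math.pi)
    return [math.sin(math.pi * t) + 0.2 * math.sin(40.0 * t + phase) for t in ts]


# Monte Carlo estimate of the expected completion.
samples = [plausible_completion() for _ in range(500)]
mean = [sum(s[i] for s in samples) / len(samples) for i in range(len(ts))]

# Compare how much wrinkle survives in one sample versus in the mean.
smooth = [math.sin(math.pi * t) for t in ts]
wrinkle_in_sample = max(abs(a - b) for a, b in zip(samples[0], smooth))
wrinkle_in_mean = max(abs(a - b) for a, b in zip(mean, smooth))
print(wrinkle_in_sample, wrinkle_in_mean)
```

Each individual sample keeps its full wrinkle amplitude, while the mean keeps only a small fraction of it, which mirrors why an expected reconstruction looks smoother than any single ground-truth mesh.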

  3. Can the recovered garment meshes be used for further animation?

Yes, our recovered meshes can be used for animation and simulation directly. In Fig. 3 of the attached PDF file, we show the simulated results for the recovered shirts using Blender, where we drop them onto a horizontal bar.

  4. How are the edges connected in the mesh recovered from point clouds?

We do not compute edges for the point cloud to generate the garment mesh. Instead, we use Eq. (10) to fit the optimal latent code $\mathbf{z}^*$ for the ISP model from the sparse panel mask. Using $\mathbf{z}^*$ alongside the ISP meshing process, we reconstruct a garment mesh in its rest state, as illustrated in the bottom-right of Fig. 2 of the main paper. This mesh defines the vertices and faces of the garment. To reconstruct the garment in the observed deformed state, we update the vertex positions by $\mathbf{V} = \mathcal{M}[u,v]$, where $\mathcal{M}$ is the recovered UV map and $(u,v)$ is the UV coordinate corresponding to $\mathbf{V}$.
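The vertex update can be sketched as a lookup into the recovered UV map at each rest-state vertex's UV coordinate, with the face list left untouched. The code below is only an illustration: the helper names `sample_uv_map` and `deform_vertices` are hypothetical, the map is assumed to be at least 2 x 2 with UV coordinates in `[0, 1]`, and bilinear interpolation is an assumption, since the response does not say how the map is sampled.

```python
def sample_uv_map(uv_map, u, v):
    """Bilinearly sample uv_map (rows indexed by v, columns by u) at (u, v)."""
    h, w = len(uv_map), len(uv_map[0])
    x, y = u * (w - 1), v * (h - 1)
    x0, y0 = min(int(x), w - 2), min(int(y), h - 2)
    fx, fy = x - x0, y - y0

    def lerp(a, b, t):
        return tuple((1 - t) * p + t * q for p, q in zip(a, b))

    top = lerp(uv_map[y0][x0], uv_map[y0][x0 + 1], fx)
    bottom = lerp(uv_map[y0 + 1][x0], uv_map[y0 + 1][x0 + 1], fx)
    return lerp(top, bottom, fy)


def deform_vertices(uv_map, vertex_uvs):
    """Move every rest-state vertex to the 3D position stored at its UV coordinate."""
    return [sample_uv_map(uv_map, u, v) for u, v in vertex_uvs]


# A tiny 2 x 2 UV map storing one 3D position per pixel.
uv_map = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
          [(0.0, 1.0, 0.0), (1.0, 1.0, 2.0)]]
vertex_uvs = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]  # UVs of the rest-state vertices
faces = [(0, 1, 2)]  # connectivity comes from the rest-state mesh and is reused as-is
deformed = deform_vertices(uv_map, vertex_uvs)
```

Because only vertex positions change, no edges ever have to be inferred from the unordered input points.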

  5. What is the average number of points for different garments? Is the model able to deal with a large number of points?

The average numbers of points provided by the VR-Folding dataset [2] are 30K for Shirt/Pants/Skirt and 27K for Top. However, instead of using all points, we randomly sample 4000 of them as the input. Therefore, we are able to handle a large number of points.
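A minimal sketch of this kind of fixed-size random subsampling is given below. The function name `subsample_points`, the seeding, and the behaviour when fewer than `n` points are available (all points are returned) are assumptions; the response only states that 4000 points are sampled at random.

```python
import random


def subsample_points(points, n=4000, seed=None):
    """Return n points drawn uniformly without replacement (or all, if fewer exist)."""
    rng = random.Random(seed)
    if len(points) <= n:
        return list(points)
    return rng.sample(points, n)


# Synthetic stand-in for a captured cloud of about 30K points.
rng = random.Random(0)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(30000)]
sampled = subsample_points(cloud, n=4000, seed=0)
print(len(cloud), len(sampled))
```

With a fixed input size, the cost of the downstream networks does not grow with the density of the captured cloud.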

  6. Garments worn by the human body.

Using the ISP model to recover on-body garments has been shown to work in [1, 40]. Consequently, we focus on the more challenging task of recovering garments that are not being worn. Our method differs from [1, 40] by utilizing a diffusion model as the deformation prior and leveraging UV mapping along with the proposed fitting method (Sec. 3.3 and 3.4) to recover complete garment meshes from partial point clouds. With appropriate training data, our method can also handle garments worn on the human body. However, due to time constraints, we are unable to present results for this specific scenario.

References

[Rahaman2019] N. Rahaman, A. Baratin, D. Arpit, F. Draxler, M. Lin, F. Hamprecht, Y. Bengio, and A. Courville. On the spectral bias of neural networks. In International Conference on Machine Learning, 2019.

Comment

Dear Reviewer J5PZ,

As the discussion period is approaching its end, we would like to ask if you have any questions or comments regarding our rebuttal.

Thank you again for your time and consideration.

Review
Rating: 7

The paper addresses the challenge of accurately reconstructing the 3D shape of garments that are manipulated rather than worn. The authors leverage the Implicit Sewing Patterns model and introduce a diffusion-based deformation prior to recover 3D garment shapes from incomplete 3D point clouds. The method maps these points to UV space, generates partial UV maps, and uses a reverse diffusion process to produce complete UV maps and 2D to 3D mappings. The approach demonstrates superior accuracy compared to previous methods, especially in handling large non-rigid deformations.

Strengths

The focus on reconstructing manipulated garments rather than worn ones addresses a significant gap in current research, as most existing methods assume garments are worn and thus have less complex deformations.

Combining ISP with a diffusion-based deformation prior is a strong methodological contribution, enabling the modeling of complex deformations that were previously challenging to capture.

Weaknesses

The accuracy of the reconstruction heavily depends on the quality of the input point clouds. Incomplete or noisy point clouds might still pose a challenge.

Questions

Can this method handle noisy or highly sparse point clouds effectively, and have you tested its robustness in such scenarios?

Limitations

While the method shows promise, its ability to generalize across a wide variety of garment types and materials without retraining is not fully explored.

Author Response

Thank you for your appreciation of our work. We address your questions and comments as follows:

  1. Handling noisy or highly sparse point clouds.

To evaluate performance under noisy conditions, we add per-point Gaussian noise to the input data, varying the standard deviation. As shown in Fig. 1 (b) of the attached PDF file, the results on the Folding Pants subset indicate that reconstruction error increases with the noise level; however, the errors remain relatively low across different noise levels. Additionally, the evaluation on real-world data in Sec. 4.4 demonstrates the robustness of our method, even when the input point cloud, generated using NeRF, is noisy and inaccurate.
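The perturbation used in such a robustness test can be sketched as follows. This is only an illustration of per-point Gaussian noise: the function names, the synthetic cloud, and in particular the noise levels in the loop are assumptions, since the response does not list the standard deviations or units that were used, and the reconstruction method itself is not run here.

```python
import math
import random


def add_gaussian_noise(points, sigma, seed=None):
    """Return a copy of points with independent N(0, sigma^2) noise on every coordinate."""
    rng = random.Random(seed)
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]


def rms_displacement(a, b):
    """Root-mean-square Euclidean distance between corresponding points."""
    sq = [sum((x - y) ** 2 for x, y in zip(p, q)) for p, q in zip(a, b)]
    return math.sqrt(sum(sq) / len(sq))


rng = random.Random(0)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(1000)]

# Hypothetical noise levels; the rebuttal does not state the values it used.
for sigma in (0.0, 0.005, 0.01, 0.02):
    noisy = add_gaussian_noise(cloud, sigma, seed=1)
    print(sigma, rms_displacement(cloud, noisy))
```

In an actual evaluation, each noisy cloud would be fed to the reconstruction pipeline and the resulting error plotted against `sigma`.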

Regarding sparsity, the captured points are generally dense in visible areas. Instead of using all available points, we randomly sample 4000 points from them as the input. We also evaluate the influence of point quantity on reconstruction quality by analyzing errors with varying input point numbers on the Folding Pants subset. The results, shown in Fig. 1 (a) of the attached PDF file, reveal that while a reduction in points leads to increased error, we maintain a relatively low error margin even with only 2000 points.

  2. Generalization across a wide variety of garment types and materials without retraining is not fully explored.

Due to the challenging nature of our task where only a portion of the garment is observed, we consider category-level garment reconstruction as in prior art GarmentNets [27] and GarmentTracking [2]. However, we acknowledge the reviewer's point that investigating the generalization across types and materials is an important direction for future work.

Comment

Thank you for your thorough and detailed responses to my questions and comments. I appreciate the additional experiments and analysis you provided to address my concerns.

Regarding the handling of noisy or highly sparse point clouds, I appreciate the effort to evaluate your method's performance under varying levels of Gaussian noise and with different quantities of input points. It is encouraging to see that your method maintains relatively low reconstruction errors even as noise levels increase and point quantities decrease. The robustness demonstrated on real-world data also strengthens the confidence in your approach.

On the topic of generalization across a wide variety of garment types and materials, I understand the challenges associated with reconstructing garments when only a portion is observed. While category-level reconstruction is a reasonable approach given these challenges, I agree that exploring the generalization capabilities across different garment types and materials is an important direction for future research. I appreciate your acknowledgment of this point and openness to further investigate it.

Overall, I commend the contributions of your work and the thoroughness of your rebuttal. My overall assessment and rating of the paper will remain the same.

Comment

Thank you for your commendation of our contributions!

Review
Rating: 7

This paper presents a method for reconstructing folded and crumpled garments from point cloud data. It uses the implicit sewing pattern (ISP) model to represent the 3D shape in 2D uv-maps. The proposed method converts a 3D point cloud to sparse uv-maps and corresponding masks for the front and back sides using an encoder structure followed by an MLP. The incomplete masks are filled and used to guide the completion of the uv-maps via a diffusion process. Finally, the deformed mesh can be recovered from the filled uv-map.

Strengths

This paper improves on the state of the art in reconstructing folded garments from point cloud data, in both visual quality and 3D accuracy. Notably, this is achieved without requiring prior knowledge of the garment geometry. Using a diffusion network to fill the sparse 2D data of the ISP model is a clever idea and matches the network characteristics very well.

Weaknesses

The comprehensibility of the paper could be improved by discussing the different parts of the pipeline in order and clearly pointing out the result of each stage and its purpose for the next stage. Some intermediate results for different scenes might be helpful to follow the pipeline.

Questions

How many points does the input point cloud contain? Did you test how many points are necessary and how accurate they have to be to produce a high-quality reconstruction?

Limitations

The limitations are just mentioned very briefly. Some quantitative evaluation on the number of intersections in the reconstructed mesh or more animated reconstructions would show how large these limitations are. An analysis might even benefit the method as e.g. the number of intersections seems to be low based on the qualitative results.

Author Response

We thank you for your acknowledgement of our contribution in manipulated garment reconstruction. Below are our responses to your comments and questions.

  1. Comprehensibility.

Thank you for pointing this out. To enhance comprehensibility, we will revise our paper so that, at the end of each stage, the reader is referred to the corresponding intermediate results in our main framework figure (Fig. 2 of the main paper).

  2. How many points does the input point cloud contain?

We use 4000 points randomly sampled from the captured point clouds as the input. To evaluate the influence of point quantity, we analyze the reconstruction errors by varying the number of points used as input on the subset of Folding Pants. The results are reported in Fig. 1 (a) of the attached PDF file. A reduction in points correlates with increased error. However, even with 2000 points, we maintain a relatively low error margin. We will include this experiment in our final version of the paper.

  3. How accurate do the points have to be to produce a high-quality reconstruction?

To evaluate the influence of input point noise, we add per-point Gaussian noise to the input with varying standard deviation. Fig. 1 (b) of the attached PDF file shows the results on the subset of Folding Pants. It illustrates that as the noise level rises, so does the reconstruction error; nonetheless, the errors remain relatively low across different noise levels. We will include this experiment in our final version of the paper. Additionally, the evaluation on real-world data in Sec. 4.4 of the main paper also demonstrates the robustness of our method, where the input point cloud, generated using NeRF, is noisy and inaccurate.

  4. Quantitative evaluation on the number of intersections or more animated reconstructions.

In Table 1 of the attached PDF file, we evaluate the intersections of our reconstructions and compare them with those of GarmentTracking [2] using the ground-truth initialization. We compute the average ratio of faces with intersection as the evaluation metric. Notably, our results exhibit fewer intersections compared to GarmentTracking on Pants, Top and Skirt. We will revise our paper to include this evaluation and more reconstruction results.
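One possible reading of a "ratio of faces with intersection" metric is sketched below. The exact definition used for Table 1 is not given in the response, so everything here is an assumption: a face counts as intersecting if one of its edges pierces another face that shares no vertex with it, the edge test is a Möller-Trumbore ray test restricted to a segment, coplanar overlaps are ignored, and the all-pairs loop is only practical for small meshes.

```python
from itertools import combinations


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def segment_hits_triangle(p, q, tri, eps=1e-12):
    """Moller-Trumbore intersection test restricted to the segment p -> q."""
    a, b, c = tri
    d, e1, e2 = sub(q, p), sub(b, a), sub(c, a)
    h = cross(d, e2)
    det = dot(e1, h)
    if abs(det) < eps:  # segment parallel to the triangle plane: ignored in this sketch
        return False
    s = sub(p, a)
    u = dot(s, h) / det
    qv = cross(s, e1)
    v = dot(d, qv) / det
    t = dot(e2, qv) / det
    return u >= 0.0 and v >= 0.0 and u + v <= 1.0 and 0.0 <= t <= 1.0


def edges(tri):
    return ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0]))


def triangles_intersect(t1, t2):
    return (any(segment_hits_triangle(p, q, t2) for p, q in edges(t1)) or
            any(segment_hits_triangle(p, q, t1) for p, q in edges(t2)))


def intersecting_face_ratio(vertices, faces):
    """Fraction of faces involved in at least one intersection (brute force, O(F^2))."""
    bad = set()
    for i, j in combinations(range(len(faces)), 2):
        if set(faces[i]) & set(faces[j]):  # adjacent faces touch by construction
            continue
        t1 = [vertices[k] for k in faces[i]]
        t2 = [vertices[k] for k in faces[j]]
        if triangles_intersect(t1, t2):
            bad.update((i, j))
    return len(bad) / len(faces)


# Face 1 pierces face 0; face 2 is far away from both.
vertices = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0),
            (0.5, 0.5, -1.0), (0.5, 0.5, 1.0), (3.0, 3.0, 0.0),
            (10.0, 10.0, 10.0), (11.0, 10.0, 10.0), (10.0, 11.0, 10.0)]
faces = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
ratio = intersecting_face_ratio(vertices, faces)
```

Averaging this ratio over all reconstructed meshes would give one number per method and garment category, which is the shape of comparison the response describes.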

Author Response

We would like to thank all reviewers for their valuable suggestions and constructive comments. We have carefully considered and addressed each of the suggestions and questions raised. We will incorporate these suggestions into our revised paper.

The attached PDF file includes the following additions:

  • figures of error curves under varying numbers of points and noise levels as suggested by Reviewers 2QwG and 4mMP;
  • a table of intersection evaluation as recommended by Reviewer 2QwG;
  • illustrative examples for Reviewer J5PZ.

Once again, we sincerely thank all reviewers for their expertise and the time they spent in reviewing our paper.

Final Decision

This paper received mixed ratings, with two positive and one negative review following the rebuttal. The authors provided detailed responses to all the reviewers' questions. The primary critique from the negative review focused on the method being limited to a specific garment and the insufficient recovery of high-frequency wrinkles. However, category-specific models are a common assumption in this field and should not be grounds for rejection. Moreover, the paper achieves state-of-the-art results compared to previous approaches and is considered technically sound.

The AC concurs with the positive reviews, believing that the strengths of the paper outweigh the critiques, and therefore recommends acceptance.