Thank you for your constructive suggestions and insightful comments! Following a reviewer's suggestion, we added (1) more ablation studies and (2) additional experiments for image generation with Stable Diffusion and MaskGiT

Ablation studies on the noising faction (K)

Thank you for the thoughtful suggestions regarding the ablations. In response, we have performed additional ablation studies by varying key hyperparameters. To provide a quick yet informative signal, we focused on the ss-match and cRMSD tasks. Here is a link to figures describing experimental results. We plan to extend these studies to other tasks in the final version.

We added an ablation study varying K (Figure 1,2 in the link) by fixing the computational budget for evaluating reward models. The results show a strong performance when K/T=10% or 20%. Generally, a large K/T reduces the benefit of refinement, while a very small K/T limits the opportunity for reward-guided decoding.
We also performed an ablation on L, the number of repetitions for importance sampling (Figure 3,4 in the link). As expected, performance improves with a larger L due to the increased computational budget.

The proposed method is only implemented on discrete models

That's a great point! We’ve focused on discrete diffusion models because they tend to have a greater impact in the protein design domain. That said, our method can integrated with continuous diffusion models as well. To verify this, we have implemented our method when we set Stable diffusion as pre-trained continuous diffusion models and compressibility (the negative file size in kilobytes (kb) of the image after JPEG compression) as reward models (Figure 5 in the link). Following our experiment section, we have tried two scenarios where we set K/T to be 10% and 20%. This figure also highlights the effectiveness of iterative refinement in continuous diffusion models, as we showed in our protein design scenarios (Figure 6 in our original draft). We will incorporate them in the final version.

Limited novelty over [1]

Thank you for pointing out this work—we will certainly include a citation in the revised version. From our understanding, the paper introduces an SMC-based approach similar to other related methods we have cited (e.g., Wu et al., 2024; Dou and Song, 2024). However, it appears to follow a more single-shot sampling strategy. While the restart sampler component may share a similar spirit, Our main contribution—an iterative refinement procedure tailored for reward optimization, supported by both theoretical and empirical evidence—differs substantially in both methodology and intent. We will make this distinction clearer in the final version.

Limited Impact Due to Model Choice (EvoDiff is less popular)

To the best of our knowledge, EvoDiff is widely recognized as a representative discrete diffusion model in the protein design domain, as noted in recent reviews (e.g., Winnifrith et al.). While other pre-trained diffusion models, such as DPLM and ESM-3, are also potential candidates, incorporating them into our framework would be relatively straightforward. We are be happy to include additional results if the reviewer has specific protein diffusion models in mind.

Winnifrith, Adam, Carlos Outeiral, and Brian L. Hie. "Generative artificial intelligence for de novo protein design." Current Opinion in Structural Biology 86 (2024): 102794.

To further address the reviewers’ concerns, we conducted additional experiments on image generation tasks using a different discrete diffusion model, implemented on top of the MaskGiT codebase. Here, the setup closely resembles that of EvoDiff. We set the duplication number to L=20, and in each iteration, we remask a 10% square region of the entire image. Experiments are conducted across 32 image categories. As shown in Figure 6 (linked above), compressibility consistently improves over iterations, demonstrating the practical effectiveness of RERD. One point of clarification: the compressibility here appears higher than that observed with Stable Diffusion earlier. This is primarily because MaskGiT operates in compressed sequence spaces, making it easier to optimize.

These results indicate that our method performs robustly across various pre-trained models. We will include more comprehensive quantitative results in the final version, and of course, we would be happy to provide further clarifications during the rebuttal process.