PaperHub

Rating: 4.0 / 10
Withdrawn · 4 reviewers · min 3 · max 5 · std. dev. 1.0
Reviewer ratings: 5, 3, 5, 3 (average 4.0)
Confidence
Correctness: 2.5
Contribution: 2.3
Presentation: 2.0
ICLR 2025

See Further When Clear: Adaptive Generative Modeling with Curriculum Consistency Model

Submitted: 2024-09-22 · Updated: 2024-11-15

Abstract

Keywords
adaptive curriculum learning, noise schedule, flow matching, consistency models

Reviews and Discussion

Review
Rating: 5

The paper introduces a straightforward and effective approach for optimizing the training time grid for discrete consistency models, resulting in competitive performance across various image generation benchmarks. By defining a notion of "learning complexity" using PSNR, the work provides insights into the training dynamics of consistency models.
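
(For reference, the "learning complexity" mentioned here is based on the standard PSNR between two images. For pixel values with maximum value $\mathrm{MAX}$ and $d$ pixels,

$$\mathrm{PSNR}(x, y) = 10 \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}}, \qquad \mathrm{MSE} = \frac{1}{d}\lVert x - y \rVert_2^2,$$

so a higher PSNR between the student estimate and the distillation target indicates a more similar pair and hence an easier step. This is the standard definition; the paper's exact normalization is not restated here.)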

Strengths

S1 - The method is simple yet effective, demonstrating significant performance improvements over the base model in CIFAR and ImageNet settings.

S2 - The experiments are thorough, covering both Diffusion and Flow models and spanning unconditional, class-conditional, and text-conditional generation tasks.

Weaknesses

W1 - The concept of "learning complexity" lacks theoretical justification. In Sec 3, the authors propose that high complexity leads to confusion, while low complexity reduces learning efficiency, but these observations appear mostly empirical. Including more theoretically motivated explanations, as in [1] on sampling time grid optimization, especially their discussion of the relation between the time grid and the KL upper bound, would add depth to the discussion.

W2 - The authors should cite [2] and discuss its model, as it is closely related and also focuses on improving training efficiency in consistency models. Additionally, experimental results with NFE = 2 should be included for a more comprehensive comparison, as this is standard practice in most consistency-model papers.

W3 - The notation $G_\theta$ is not clearly defined and is used inconsistently between Section 2.1 and Algorithm 1. In Section 2.1 the last argument of $G_\theta$ appears to be the target time step, while in Algorithm 1 it is replaced with the condition $c$.

[1] Sabour, Amirmojtaba, Sanja Fidler, and Karsten Kreis. "Align Your Steps: Optimizing Sampling Schedules in Diffusion Models." arXiv preprint arXiv:2404.14507 (2024).

[2] Geng, Zhengyang, Ashwini Pokle, William Luo, Justin Lin, and J. Zico Kolter. "Consistency Models Made Easy." arXiv preprint arXiv:2406.14548 (2024).

Questions

Given the potential for other similarity metrics, such as SSIM or LPIPS, to assess similarity in pixel space and thus learning complexity, have the authors explored these options?
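
(For concreteness, here is a minimal sketch of how PSNR, SSIM, and LPIPS could each be computed between a student estimate and a distillation target, using the off-the-shelf scikit-image and lpips packages; the image shapes, value ranges, and variable names are illustrative assumptions, not the paper's setup.)

```python
import numpy as np
import torch
import lpips  # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Illustrative inputs: student estimate and distillation target as HxWx3 floats in [0, 1].
x_est = np.random.rand(64, 64, 3).astype(np.float32)
x_target = np.random.rand(64, 64, 3).astype(np.float32)

psnr = peak_signal_noise_ratio(x_target, x_est, data_range=1.0)
ssim = structural_similarity(x_target, x_est, data_range=1.0, channel_axis=-1)

# LPIPS expects NCHW torch tensors scaled to [-1, 1].
lpips_fn = lpips.LPIPS(net="alex")
to_tensor = lambda x: torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0) * 2 - 1
lpips_dist = lpips_fn(to_tensor(x_est), to_tensor(x_target)).item()

print(f"PSNR={psnr:.2f} dB, SSIM={ssim:.3f}, LPIPS={lpips_dist:.3f}")
```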

Review
Rating: 3

The paper proposes the Curriculum Consistency Model (CCM), which stabilizes and balances the learning complexity across timesteps. It defines the distillation process as a curriculum and introduces Peak Signal-to-Noise Ratio (PSNR) as a metric to quantify the difficulty of each step in this curriculum. Extensive empirical studies are performed.

Strengths

The strategy is rather simple and can easily be applied to diffusion models and flow-based models.

Weaknesses

  • The paper's writing is disastrous; there are too many errors of expression and grammar. In particular, the paper's title emphasizes consistency models, yet flow-based models are highlighted in the main text.

  • The multi-step iterative generation method has already been explored in the literature; see [1]. The GAN trick is not motivated.

  • The paper should compare with more recent consistency models such as improved CD, multi-step CM, etc. The paper's arguments about the issues of CMs are partially problematic.

[1] SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation.

Questions

See above

Review
Rating: 5

The paper presents a novel training procedure for consistency models, introducing: 1) a dynamically adjusted training objective based on learning difficulty, measured by PSNR, and 2) a distillation target acquired through multi-step generation. The resulting model, named the Curriculum Consistency Model (CCM), demonstrates performance that is comparable to, if not better than, existing baseline models.

Strengths

  • The paper presents concrete motivation for each component of the proposed method.
  • The proposed method shows some promising empirical results.
  • The paper conducted various ablation studies to justify the importance of each proposed component.

Weaknesses

  • The dynamic adjustment, which is one of the major contributions of this paper, resembles the continuous-time training schedule of ECT [1].
  • Multi-step iterative generation will increase the time complexity of training.
  • Performance on CIFAR-10 is very promising; however, on large-scale datasets such as ImageNet 64x64 or COCO 2017, the gain is either marginal or nonexistent. If the proposed method does increase the time complexity of training, this result is not promising enough.

(Minor remarks)

  • Personally, I believe Section 2 is not very reader-friendly, especially for readers unfamiliar with flow models.
  • $G$ was not defined on line 151.
  • The $(0, 1)$ and $(0, T)$ time conventions of FM and diffusion models seem to be opposite of one another, making the paper hard to read.
  • The assignment of $x_\text{target}$ is missing in Algorithm 1.

[1] Geng et al. "Consistency Models Made Easy"

Questions

  • In the CIFAR-10 experiment, CCM is only applied to FM models. Was there a reason for not applying it to diffusion-based models?
  • Prior works have reported that training CMs is difficult in terms of both stability and time complexity. Can you report those aspects of CCM as well?
  • Have you tried fine-tuning a pretrained DM or FM with the proposed training procedure? According to ECT, this reduces the required training time and increases training stability.
  • Does the proposed value of $T_\text{SNR}$ also generally work well with other models/datasets? If not, "our method is not very sensitive to PSNR" might not be a valid claim.

Review
Rating: 3

The paper proposes a novel training method for CMs that adaptively selects consecutive timesteps to ensure that the difficulty of the learning targets is maintained throughout the training process and across different samples. The method is validated on CIFAR-10, ImageNet 64x64, and T2I generation, demonstrating strong performance when combined with adversarial training.
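
(To illustrate the idea, below is a hypothetical sketch of PSNR-guided timestep selection; the bisection search, the call signatures, and the target PSNR value are illustrative assumptions, not the authors' implementation.)

```python
import torch

def select_next_timestep(model, x_t, t, target_psnr=25.0, t_min=0.0, iters=20):
    """Hypothetical sketch: pick the next timestep u < t so that the PSNR between the
    one-step estimate at t and the (multi-step) target toward u stays near a fixed level,
    i.e., the learning difficulty is kept roughly constant across samples."""
    x_est = model(x_t, t)                    # one-step estimate from t (illustrative call)
    lo, hi = t_min, t
    u = 0.5 * (lo + hi)
    for _ in range(iters):
        u = 0.5 * (lo + hi)
        x_target = model(x_t, t, stop_at=u)  # multi-step target toward u (hypothetical signature)
        mse = torch.mean((x_est - x_target) ** 2).clamp_min(1e-12)
        psnr = 10.0 * torch.log10(1.0 / mse)  # assumes pixel values in [0, 1]
        if psnr > target_psnr:
            hi = u   # step too easy (outputs too similar): enlarge the gap, move u further from t
        else:
            lo = u   # step too hard: shrink the gap, move u closer to t
    return u
```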

Strengths

S1 | The paper addresses an important problem of improving the training of consistency models.

S2 | The proposed learning strategy is reasonable and well-motivated.

S3 | Compared with standard CD, CCM accelerates convergence and improves overall performance.

Weaknesses

W1 | Recent works [1,2] have also addressed the training complexity and instabilities in CMs and proposed various related training techniques to improve convergence. However, these works are neither discussed nor compared with.

I believe the presence of [1,2] largely limits the main contribution and that CCM needs to be carefully compared against the techniques from these works, omitting the effect of the GAN loss.

W2 | CCM has been applied only to FM models on CIFAR-10 and ImageNet 64x64. This makes comparisons with the original CD, CTM, iCT [1], and ECM [2] difficult. Could the authors evaluate CCM on the corresponding diffusion models with and without the GAN loss?

W3 | For T2I generation, the improvements seem negligible. FID, especially at 5K samples, is an unreliable metric, as noted in [3, 4]. Therefore, I believe the demonstrated FID gains may be unrepresentative or marginal. Could the authors conduct a human preference study or consider adding FID-30K and alternative metrics, e.g., CMMD [3], ImageReward [5], and PickScore [6]?

W4 | Inconsistent terminology and notation regarding the student, teacher, and target models/outputs. Figure 1 denotes a teacher model that maps $u$ to $1$, yet in Section 2 the teacher maps $t$ to $u$.

L210-211: $x_{est}$ is the teacher output and $x_{target}$ the student output. L252: $x_{target}$ is the teacher output. Algorithm 1: $x_{est}$ is a student output and $x_{target}$ is missing.

W5 | The related work section could be substantially extended. I recommend citing and discussing CM-related works [1, 2, 7], as well as other distillation methods [8, 9, 10, 11, 12, 13]. A comparison with these works would be highly beneficial.

W6 | The experiment in Figure 7 needs more details.


[1] Song et al. Improved Techniques for Training Consistency Models, 2023

[2] Geng et al. ECT: Consistency Models Made Easy, 2024

[3] Jayasumana et al. Rethinking FID: Towards a Better Evaluation Metric for Image Generation, 2024

[4] Podell et al. SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis, 2023

[5] Xu et al. ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation, 2023

[6] Kirstain et al. Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation, 2023

[7] Salimans et al., Multistep Distillation of Diffusion Models via Moment Matching, 2024

[8] Berthelot et al. TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation, 2023

[9] Luo et al. Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models, 2023

[10] Yin et al. One-step Diffusion with Distribution Matching Distillation, 2023

[11] Yin et al. Improved Distribution Matching Distillation for Fast Image Synthesis, 2024

[12] Zhou et al. Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation, 2024

[13] Kim et al. PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher, 2024

Questions

Q0 | Please address the concerns and questions in Weaknesses.

Q1 | PSNR is computed between the student and teacher outputs. If I understand correctly, and the teacher output corresponds to the $u \to 1$ mapping, then PSNR and the CM loss are quite similar. Why not simply use the CM loss as is?
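
(For reference, with the standard definition and pixel values in $[0, 1]$, PSNR is a log-transform of the mean squared error between the two outputs:

$$\mathrm{PSNR}(x_{est}, x_{target}) = 10 \log_{10}\frac{1}{\mathrm{MSE}}, \qquad \mathrm{MSE} = \frac{1}{d}\lVert x_{est} - x_{target}\rVert_2^2,$$

i.e., it is a monotonically decreasing function of the squared error that an $\ell_2$ consistency loss would minimize, which is the overlap this question points at.)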

Withdrawal Notice

I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.