PaperHub
7.2 / 10
Poster · 4 reviewers
Ratings: 4, 3, 3, 5 (lowest 3, highest 5, standard deviation 0.8)
ICML 2025

REG: Rectified Gradient Guidance for Conditional Diffusion Models

OpenReview · PDF
Submitted: 2025-01-15 · Updated: 2025-07-24
TL;DR

Despite its empirical success, the practical implementation of guidance diverges significantly from its theoretical motivation. Our work reconciles this discrepancy by building the correct guidance theory for conditional diffusion models.

Abstract

Keywords
Diffusion models · classifier-free guidance · conditional generation

Reviews and Discussion

Official Review
Rating: 4

This paper studies the foundations of classifier-free guidance (CFG). It finds that the common interpretation of scaling the marginal distribution $p(x_t \vert y)$, i.e., using guidance to change sampling trajectories, is not theoretically grounded, as it is impossible to construct a DDPM process that corresponds to the scaled distribution. The paper then proposes to scale the joint distribution $p(x_{0:T} \vert y)$ instead, which can be proved to produce a valid DDPM process. Within this new framework, CFG can be treated as a special approximation to the optimal solution.

The paper then proposes a new form of approximation to the theoretically-grounded formulation, i.e., REG, and demonstrates its effectiveness on 1D and 2D synthetic data as well as high-dimensional image synthesis tasks, i.e., ImageNet generation.

Questions for the Authors

People have applied CFG to flow-matching-based approaches by modifying the velocity functions [a] and the velocity has close connections to score functions. Thus I am wondering whether REG could be easily applied to a model based on flow matching.

  1. If the answer is affirmative, I am wondering whether the authors have the resources to apply REG to models based on flow matching, e.g., Stable Diffusion 3. This would further enhance the impact of the paper.
  2. If the answer is negative, what would be the difficulty?

[a] Ma et al., SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers. ECCV 2024.

Claims and Evidence

The paper is well-written and the claims are well-supported.

I have one question about Sec. 3's "Common Interpretation of Guidance": I do not think that the original CG or CFG only scales the marginal distribution $p(x_0 \vert y)$. However, the authors state

(L161 left) CG and CFG rewards are explicitly stated in (Ho & Salimans, 2022)

Can the authors point me to the specific part of the CFG paper where marginal distribution scaling is mentioned?

Methods and Evaluation Criteria

The evaluations are solid and thorough.

Theoretical Claims

I read through all the proofs and found they are detailed and easy to follow.

Experimental Design and Analysis

The experiments are extensive and convincing.

Supplementary Material

I read through all of the supplementary materials.

One question: I do not quite see the issue with the highlighted parts in Figs. 8-10, as I personally find that the images of "w/o REG" look reasonable. Can the authors elaborate more? Actually, I am wondering whether it would be better to display more than one sample and show that, in general, REG performs better.

Relation to Prior Work

The paper falls in the recent community effort to fundamentally understand the mechanism of CFG.

Missing Important References

N/A

Other Strengths and Weaknesses

The paper is solid in theory and thorough in experiments. I do not find a major weakness in the paper besides my questions spread in other sections.

Other Comments or Suggestions

N/A

Author Response

Thank you so much for acknowledging our contribution and the constructive feedback. Below we address each concern raised.


Q1. Where is marginal distribution scaling mentioned in the CFG paper?

A1. Thank you for the great question. As a quick recap, Section II of our paper explains the theoretical pitfall of guidance. We begin by examining the case of scaling only the terminal marginal distribution (Eq. (5)) and then generalize to scaling all marginal distributions (Eq. (8)). The latter is a stricter goal that subsumes the former. We show that neither is theoretically well-justified.
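To make the distinction concrete, the two marginal-scaling targets can be paraphrased in our reward notation as follows (this is only a schematic restatement; the precise forms are Eq. (5) and Eq. (8) in the paper):

$$
\tilde{p}(x_0 \mid y) \;\propto\; p(x_0 \mid y)\, R_0(x_0, y)
\qquad \text{vs.} \qquad
\tilde{p}(x_t \mid y) \;\propto\; p(x_t \mid y)\, R_t(x_t, y) \quad \forall t,
$$

i.e., the former scales only the terminal (clean-data) marginal while the latter scales every intermediate marginal simultaneously.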

Strictly speaking, Ho & Salimans (2022) state that classifier guidance (CG) leads to sampling from scaled versions of all marginal distributions—see the second equation under Algorithm 1 on page 4. This aligns with our Eq. (8)–(9), where their reward term $p_\theta(c | z_\lambda)^w$ corresponds to our notation $R_t(x_t, y) = p_\phi(y | x_t)^w$ in Eq. (9). They also express the CFG reward using $p^i(c | z_\lambda)$ in the second-to-last paragraph before Section 4. Thus, the CG and CFG reward forms are indeed explicitly stated there.
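For concreteness, with the CG reward $R_t(x_t, y) = p_\phi(y \mid x_t)^w$, the induced guided score is the standard classifier-guidance relation (restated here purely for reference, not as a new result):

$$
\nabla_{x_t} \log\!\big[\, p(x_t \mid y)\, p_\phi(y \mid x_t)^{w} \,\big]
= \nabla_{x_t} \log p(x_t \mid y) + w\, \nabla_{x_t} \log p_\phi(y \mid x_t),
$$

and CFG replaces the classifier gradient $\nabla_{x_t} \log p_\phi(y \mid x_t)$ with the difference between the conditional and unconditional scores, $\nabla_{x_t} \log p(x_t \mid y) - \nabla_{x_t} \log p(x_t)$.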

For presentation clarity, we choose to start with the simpler case of scaling only the terminal marginal (Eq. (5)), but we acknowledge that Eq. (8) is the original formulation in Ho & Salimans (2022). We will revise the text and footnote accordingly to make our claim more accurate.


Q2. The qualitative visualizations in Fig. 8-10.

A2. Thanks for the great question. To elaborate: in Figure 8, the knight’s gesture while holding the sword appears unnatural, and the horse’s legs are blurry in the baseline images. In Figure 9, there are black artifacts in the sky (not birds), and the lower part of the tree suffers from low contrast in the baseline images. In Figure 10, the dog’s paws are unnatural, and the sunglasses are deformed in the baseline images.

We agree with the reviewer that these differences might be subtle. For additional context, it might be helpful to compare with qualitative results reported in other works --- see, for example, Figures 6-7 of [1], Figure 4 of [2], Figure 1 of [3], and [4].

To better address the concern, we have followed the suggestions and added multiple samples with the same method for a given prompt. These can be found in File 1 and File 2 at this anonymous link.

Please note that due to OpenReview rebuttal format constraints, we are unable to include figures directly in this response. However, anonymous links are permitted under ICML guidelines, and we will include these visualizations in the revised manuscript.


Q3. Apply REG to flow-matching based diffusion models.

A3. Thank you for the insightful question. We agree that evaluating REG in the context of flow-matching models would further enhance the impact of our work.

Due to time and resource constraints—particularly the large model size (SD v3 has over 2 billion parameters) and its lengthy inference time—we were unable to run experiments on SD v3. However, we conducted extra experiments on SD v2.1, a velocity-parameterized diffusion model. This setup was also requested by other reviewers and, we believe, is relevant to the current question as well. Namely, while velocity-parameterized models are not identical to flow-matching approaches, they are closely related; in fact, the distinction between them can often be seen as a subtle difference in the noise schedule [5]. We performed a grid search over guidance weights for different methods with SD v2.1 and report the best FID (↓) and CLIP (↑) scores for each method in the table below.

We will perform thorough evaluations on REG for flow-matching based generative models and report them in the final version if the paper is accepted.

| | (CLIP, FID) w/ REG | (CLIP, FID) w/o REG |
|---|---|---|
| Linear CFG | (31.62, 27.83) | (31.40, 28.94) |
| Cosine CFG | (31.48, 23.32) | (31.72, 24.54) |
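As a side note for readers who wish to reproduce this setup, the sketch below illustrates the kind of guidance-weight grid search described above. It is a minimal sketch only: it runs plain CFG sampling through the public `diffusers` SD v2.1 checkpoint, and `prompts`, `compute_fid`, and `compute_clip_score` are hypothetical placeholders; our REG correction is not included here.

```python
import torch
from diffusers import StableDiffusionPipeline

# Minimal sketch of a guidance-weight grid search (placeholders noted above).
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

results = {}
for w in [3.0, 5.0, 7.5, 10.0]:  # candidate guidance weights
    images = [pipe(p, guidance_scale=w).images[0] for p in prompts]
    results[w] = (compute_fid(images), compute_clip_score(images, prompts))

best_w = min(results, key=lambda w: results[w][0])  # weight with the lowest FID
```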

References

[1] Tuomas Kynkäänniemi et al., "Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models," NeurIPS, 2024.

[2] Tero Karras et al., "Guiding a Diffusion Model with a Bad Version of Itself," NeurIPS, 2024.

[3] Tianwei Yin et al., "One-step Diffusion with Distribution Matching Distillation," CVPR, 2024.

[4] Xi Wang et al., "Analysis of Classifier-Free Guidance Weight Schedulers," TMLR, 2024.

[5] Ruiqi Gao et al., "Diffusion Meets Flow Matching: Two Sides of the Same Coin," https://diffusionflow.github.io/.

Reviewer Comment

I thank the authors for their time and effort in addressing my questions and concerns. I carefully read the other reviews and the rebuttal, and I would like to keep my positive rating for this work for its solid theoretical principles and superior performance.

Author Comment

We sincerely thank the reviewer for the positive feedback and for recognizing the value of our work. We will make further efforts to improve the clarity and quality of the paper according to the suggestions.

Official Review
Rating: 3

This paper addresses the discrepancy between the theoretical motivation and practical implementation of guidance techniques in conditional diffusion models. The authors propose a new method called Rectified Gradient Guidance (REG) to improve the performance of existing guidance methods.

The main findings of the paper include the identification of a significant gap between the theoretical derivations and practical implementations of current guidance techniques. The authors demonstrate that the commonly used marginal scaling approach is theoretically invalid and propose a joint distribution scaling objective as a valid alternative.

The key algorithmic contribution is the introduction of REG, which incorporates a novel correction term into existing guidance methods. This correction term is derived from a theoretical analysis of the optimal guidance solution and is designed to better approximate this optimal solution under practical constraints.

The main results show that REG provides a better approximation to the optimal solution than prior guidance techniques in 1D and 2D experiments. Extensive experiments on class-conditional ImageNet and text-to-image generation tasks demonstrate that incorporating REG consistently improves Fréchet Inception Distance (FID) and Inception/CLIP scores across various settings compared to its absence.

The conceptual contribution lies in establishing a unified theoretical framework for understanding guidance techniques in conditional diffusion models. The authors theoretically prove the invalidity of marginal scaling and demonstrate that established guidance implementations are approximations to the optimal solution with quantified error bounds.

The practical contribution is that REG is shown to be compatible with various guidance techniques and diffusion model architectures. The method can be easily integrated into existing diffusion pipelines and consistently enhances performance without requiring significant computational overhead.

Questions for the Authors

The paper mentions that REG introduces minor computational overhead but doesn't provide specific details. Could you quantify the additional computational requirements of REG compared to standard guidance methods, especially for larger models like SD-XL? This would help clarify practical trade-offs. If the overhead is substantial, it might affect the practical significance despite performance improvements.

The experiments demonstrate REG's effectiveness across several model architectures, but how does REG perform with score-based generative models that use different parameterizations or sampling schemes? Understanding its performance on architectures beyond those tested would clarify its generalizability.

Claims and Evidence

The claims made in the submission are supported by clear and convincing evidence. The authors provide theoretical analysis, mathematical derivations, and extensive experimental results to validate their claims.

  1. The discrepancy between theoretical motivation and practical implementation of guidance techniques is demonstrated through detailed analysis of the marginal scaling approach and its limitations (Section 3). The authors show mathematically why marginal scaling is invalid and how it conflicts with the constraints of diffusion models.

  2. The claim that established guidance implementations are approximations to the optimal solution is supported by Theorem 4.1, which establishes the optimal solution under joint scaling, and Theorems 4.2 and 4.3, which quantify the approximation error of current methods. The authors clearly show the gap between current practices and the optimal solution.

  3. The effectiveness of REG in 1D and 2D synthetic examples is demonstrated through visual comparisons (Figure 1 and Figure 2) and quantitative win ratios (Table 1). These results clearly show that REG provides a better approximation to the optimal solution than previous methods.

  4. The improvement in performance on class-conditional ImageNet and text-to-image generation tasks is supported by comprehensive quantitative results (Tables 2 and 3) across multiple model architectures and guidance techniques. The Pareto front analyses (Figures 3 and 4) further demonstrate the consistent improvement provided by REG.

  5. The compatibility of REG with various guidance techniques and diffusion model architectures is shown through experiments with different models (DiT, EDM2, SD-v1-4, SD-XL) and guidance methods (vanilla CFG, cosine CFG, linear CFG, interval CFG, AutoG). The implementation details confirm that REG can be easily integrated into existing pipelines.

Methods and Evaluation Criteria

The proposed Rectified Gradient Guidance (REG) method and the evaluation criteria utilized in this paper are both well-suited for addressing the problem of improving guidance techniques in conditional diffusion models.

REG directly targets the identified theoretical-practical gap in existing guidance methods by introducing a theoretically justified correction term derived from optimal guidance solutions under joint distribution scaling. This approach is logical as it enhances established guidance techniques rather than replacing them entirely, ensuring compatibility with various existing methods and model architectures.

The evaluation criteria, including benchmark datasets like ImageNet and COCO, along with performance metrics such as FID, IS, and CLIP score, provide a comprehensive assessment of REG's effectiveness. The combination of synthetic and real-world datasets, together with both quantitative metrics and qualitative visual comparisons, ensures a thorough evaluation of the method's impact across different scenarios and applications.

Theoretical Claims

I've carefully examined the theoretical claims and proofs presented in this paper, focusing on the key theoretical contributions that form the foundation of the proposed REG method.

The theoretical framework begins by identifying a critical issue in existing guidance techniques: the discrepancy between the theoretical motivation based on marginal distribution scaling and the practical implementation. The authors demonstrate that marginal scaling is theoretically invalid due to the constraints of the diffusion model's denoising process. This is established through a detailed analysis of the reverse denoising process and the implications of attempting to scale marginal distributions at different time steps.

The paper then introduces the concept of joint distribution scaling as a valid alternative. Theorem 4.1 is central to this argument, establishing the form of the optimal noise prediction network under joint scaling. The proof of this theorem is rigorous and follows logically from the definition of the scaled joint distribution objective. The authors demonstrate the existence and uniqueness of the transition kernels corresponding to the scaled joint distribution, and derive the form of the updated noise prediction network. This theorem provides the theoretical justification for the REG method.

Building on this foundation, Theorems 4.2 and 4.3 analyze the approximation error of existing guidance methods compared to the optimal solution. These proofs are technically sound and provide quantitative bounds on the approximation error. The analysis considers the practical constraints of guidance implementation and shows how the lack of future foresight affects the accuracy of guidance signals. The proofs involve careful application of mathematical analysis and probability theory, considering the properties of the diffusion process and the guidance rewards.

The derivation of the REG correction term is also well-supported theoretically. The authors start from the optimal guidance equation and make reasonable approximations to derive a practical correction term that can be implemented in existing diffusion frameworks. The chain rule application and Jacobian simplifications are justified, and the empirical validation in the experiments supports the effectiveness of these approximations.

Experimental Design and Analysis

The experimental designs and analyses in this paper are generally sound and valid, providing comprehensive support for the proposed REG method across different scenarios and applications.

For the 1D and 2D synthetic examples, the experimental design effectively demonstrates REG's improvement in a controlled setting where ground truth can be computed. The comparisons are fair and isolate the effect of the guidance method, with win ratios and visual comparisons providing clear evidence of REG's effectiveness.

In class-conditional ImageNet generation, the evaluation is comprehensive, testing across multiple resolutions and model architectures. The use of standard metrics like FID and IS is appropriate, and the Pareto front analysis effectively shows how REG improves the efficiency of guidance techniques. The consistent improvement across different models and guidance methods strengthens the validity of the claims.

For text-to-image generation on COCO-2017, the experimental design is appropriate, evaluating REG with different model architectures and relevant metrics. The qualitative examples complement the quantitative results, providing additional evidence of improved generation quality.

While the experimental designs are robust, there are minor areas where additional analysis could enhance the work. More detailed analysis of computational overhead, especially for larger models, would be beneficial. Additional ablation studies could better isolate the contributions of different components of REG. Including more diverse and challenging prompts in text-to-image generation would further demonstrate REG's robustness. Comparing REG against other recently proposed guidance enhancements would also be valuable.

Supplementary Material

I reviewed several key parts of the supplementary material that were crucial for fully understanding and evaluating the paper's contributions:

  1. Appendix A: Remarks on Score Function Formula - This section provided important details about the score function formula used in diffusion models, clarifying its validity across different time steps and formulations. This was essential for understanding the theoretical foundation of the guidance methods and the paper's critique of marginal scaling.

  2. Appendix B: Invalid Marginal Scaling - This part contained the proof showing how marginal scaling leads to implicit determination of previous time steps' rewards, supporting the paper's claim about the invalidity of marginal scaling approaches.

  3. Appendix C: Scaling the Joint Distribution - Proof of Theorem 4.1 - I carefully reviewed this section as it provided the rigorous mathematical proof for the paper's central theoretical contribution - the optimal noise prediction network under joint distribution scaling.

  4. Appendix D: Approximation Error - Proofs of Theorems 4.2 and 4.3 - These proofs were critical for understanding the quantitative analysis of the approximation error in existing guidance methods and how REG addresses these issues.

  5. Appendix E: Experimental Settings and Additional Numerical Results - This section contained detailed information about the experimental setups, including model architectures, training parameters, and additional results. It was particularly helpful for assessing the reproducibility and robustness of the experimental findings.

Relation to Prior Work

REG can be seen as an evolution of classifier-free guidance (Ho & Salimans, 2022) with a theoretically motivated correction term. Unlike previous enhancement techniques like AutoG (Karras et al., 2024a), which requires identifying a "bad" model version, REG provides a general correction applicable to various guidance methods without additional model training or complex setup.

Missing Important References

None

Other Strengths and Weaknesses

Strengths

  1. The paper demonstrates originality by identifying and addressing a fundamental theoretical-practical gap in guidance techniques for diffusion models. While classifier guidance and classifier-free guidance have become standard approaches, this work provides a novel theoretical framework that re-examines their foundations. The identification of the invalidity of marginal scaling and the proposal of joint distribution scaling represent creative advances that build upon but significantly extend previous work.

  2. The improvements demonstrated by REG across multiple benchmarks and model architectures indicate substantial practical significance. For conditional diffusion models, which are widely used in applications like image generation, text-to-image synthesis, and video generation, even small improvements in guidance techniques can lead to meaningful enhancements in output quality and diversity. The versatility of REG, as shown in the experiments, suggests it could become a standard enhancement in future diffusion model implementations.

  3. The paper is well-written and structured logically. The theoretical sections build upon each other in a coherent manner, making complex concepts accessible. The experimental results are presented clearly with appropriate visualizations and quantitative analyses. The appendices provide additional details that enhance reproducibility and understanding.

Weaknesses

  1. While the paper mentions that REG introduces minor computational overhead, a more detailed analysis of the additional computational requirements would be beneficial, especially for larger models and in production settings. This could help practitioners better understand the trade-offs when implementing REG.

  2. The paper demonstrates REG's effectiveness across several model architectures and guidance methods, but a more extensive analysis of its performance across a broader range of architectures and applications would strengthen the claims of generalizability. Additionally, testing on more diverse and challenging datasets beyond ImageNet and COCO could provide further insight into its robustness.

  3. The paper could benefit from a more comprehensive comparison against other recently proposed guidance enhancements. This would help establish REG's position relative to other state-of-the-art methods and highlight its unique advantages.

  4. While REG is presented as a versatile enhancement, the implementation details, particularly regarding the correction term, might require careful tuning and understanding of the underlying diffusion framework. Providing more implementation guidance or open-source code could lower the barrier to adoption for practitioners.

Other Comments or Suggestions

  1. In Section 3, when discussing the invalidity of marginal scaling, adding a brief intuitive explanation alongside the mathematical proofs might help readers better grasp the concept.

  2. In the experimental sections, providing a summary table that compares the performance of REG across different guidance methods and model architectures could help readers quickly see the consistent improvements.

  3. Including more specific implementation details about the REG correction term, particularly regarding computational considerations, would be helpful for practitioners looking to implement the method.

  4. While the paper mentions computational overhead, a more detailed discussion of potential limitations, such as increased memory requirements or compatibility with certain model architectures, would provide a more complete picture.

Author Response

Thank you so much for acknowledging our contribution and the constructive feedback.


Q1. Runtime and memory cost.

A1. Thanks for the great question. The tables below summarize runtime and peak memory usage of CFG and REG on a single NVIDIA A40 GPU. Runtime is reported using example batch sizes, while memory is measured with batch size 1 to isolate per-image cost. Since REG introduces one extra gradient computation on top of vanilla CFG, a moderate increase in runtime and memory usage is expected. Similar inference-time gradient calculations have also been explored in Universal Guidance (Arpit Bansal et al., 2024), albeit in a different context.

We emphasize that our main contribution lies in correcting the CFG theory, and REG serves as an empirical validation. Its practical deployment depends on the specific application and acceptable overhead. We will include these tables in the updated paper.

| Model | Resol. | Batch Size | CFG / REG Runtime (sec) | Increase (×) |
|---|---|---|---|---|
| EDM2-S | 64 | 8 | 25.96 / 42.99 | 1.66 |
| DiT-XL/2 | 256 | 8 | 59.79 / 94.23 | 1.58 |
| EDM2-S | 512 | 8 | 46.14 / 62.87 | 1.36 |
| EDM2-XXL | 512 | 8 | 49.21 / 92.60 | 1.88 |
| SD-V1.4 | 512 | 4 | 32.63 / 39.54 | 1.21 |
| SD-V2.1 | 768 | 4 | 36.55 / 59.76 | 1.64 |
| SD-XL | 1024 | 2 | 47.48 / 74.52 | 1.57 |

| Model | Resol. | CFG / REG GPU Peak Mem (GB) | Increase (×) |
|---|---|---|---|
| EDM2-S | 64 | 0.87 / 1.49 | 1.71 |
| DiT-XL/2 | 256 | 4.15 / 5.01 | 1.21 |
| EDM2-S | 512 | 1.19 / 1.81 | 1.52 |
| EDM2-XXL | 512 | 4.59 / 7.31 | 1.59 |
| SD-V1.4 | 512 | 2.73 / 4.39 | 1.61 |
| SD-V2.1 | 768 | 2.72 / 6.51 | 2.39 |
| SD-XL | 1024 | 6.91 / 19.49 | 2.82 |
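As an aside, runtime and peak-memory numbers of this kind can be collected with standard PyTorch utilities; the sketch below is illustrative only (`sample_fn` is a hypothetical wrapper around one full CFG or CFG+REG sampling run on a CUDA device) and not a description of our exact benchmarking script.

```python
import time
import torch

# Illustrative sketch: time one sampling run and record peak GPU memory.
torch.cuda.reset_peak_memory_stats()
torch.cuda.synchronize()
start = time.time()
images = sample_fn(batch_size=1)  # hypothetical wrapper around one sampling pass
torch.cuda.synchronize()
runtime_sec = time.time() - start
peak_mem_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"runtime: {runtime_sec:.2f} s, peak memory: {peak_mem_gb:.2f} GB")
```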

Q2. Experiments on diverse model architectures, applications, and datasets.

A2. Thanks for the valuable feedback. To address the architecture concern, we conducted extra experiments using SD-V2.1, a velocity-parametrized diffusion model; results can be found in A3 of Reviewer BefU. Below is a summary of the models used, covering a wide range of settings.

Regarding applications and datasets, we respectfully clarify that our choices are consistent with current standards in the literature. For example, DiT (William Peebles and Saining Xie, 2023) and EDM2 (Tero Karras et al., 2024) primarily conduct experiments on ImageNet. In addition to ImageNet, Interval Guidance (Kynkäänniemi et al., 2024) and Autoguidance (Tero Karras et al., 2024) perform qualitative experiments on text-to-image tasks using SD models. Due to limited time and resource constraints, we will explore additional datasets in future work.

| Model | DiT-XL/2 | EDM2-S | EDM2-XXL | SD-v1-4 | SD-XL | SD-V2.1 |
|---|---|---|---|---|---|---|
| # Params | 675 M | 280 M | 1.5 B | 860 M | 2.6 B | 865 M |
| Sampler | 250-step DDPM | 2nd Heun | 2nd Heun | PNDM | Euler Discrete | PNDM |
| Parametrization | epsilon | x0 | x0 | epsilon | epsilon | velocity |
| Architecture | Transformer | U-Net | U-Net | U-Net | U-Net | U-Net |

Q3. Comparison with recent guidance enhancements.

A3. Thanks for the constructive suggestion. We respectfully note that the proposed REG method has already been compared with SOTA guidance techniques, such as Interval Guidance (Kynkäänniemi et al., 2024) and Autoguidance (Tero Karras et al., 2024), both of which are strong and recent baselines. As shown in Table 1 and Figure 7, REG is still able to improve upon these methods. We would greatly appreciate suggestions from the reviewer on any specific missing guidance methods that we should compare to.


Q4. REG implementation details and open-source code.

A4. Thanks for the constructive remark. We will open source our code to ensure full reproducibility, and add implementation details in the updated paper.


Q5. Other comments: (i) Add intuitive explanation for marginal scaling in Section 3. (ii) Add a summary table of results and architectures in numerical result section. (iii) REG with different sampling schemes and different parametrizations.

A5. Thanks for the constructive remarks.

(i) We will update our manuscript accordingly.

(ii) We respectfully point out that Tables 2 and 3 already include all numerical results, and Table 4 (in the supplementary) summarizes the model architectures.

(iii) We respectfully note that our experiments already cover a wide range of settings. Please refer to the summary table in A2 for details.

Official Review
Rating: 3

This paper introduces Rectified Gradient Guidance (REG) to improve conditional generation in diffusion models. The paper also provides the theoretical foundation of the proposed guidance method. Experiments show that the proposed method enhances the quality of generated images.

Questions for the Authors

  1. In Table 3, why do most of the guidance methods exhibit worse performance than vanilla CFG, particularly in the context of the SD model?

  2. The same problem also appears in the DiT-XL model.

Claims and Evidence

Yes

Methods and Evaluation Criteria

Yes

Theoretical Claims

Yes, the theorems in section 4.

Experimental Design and Analysis

Yes

Supplementary Material

Yes, the authors provide the code.

Relation to Prior Work

The proposed Rectified Gradient Guidance is related to the guidance techniques in diffusion models, like classifier guidance and classifier-free guidance.

The paper replaces the scaled marginal distribution target with a valid scaled joint distribution objective, aligning with the theoretical motivations of guidance methods. This theoretical advancement builds on prior works that have explored the statistical theory and mathematical foundations of conditional diffusion models

Missing Important References

No

Other Strengths and Weaknesses

Strengths:

  1. The paper is well-written and easy to follow.

  2. The paper provides the theoretical insight about the limitations in current guidance method and proposes a solution to improve the performance of existing guidance methods.

  3. The authors conduct extensive experiments to justify the effectiveness of the proposed method.

Weaknesses:

  1. The paper lacks reporting on inference time requirements. The authors should clarify whether the proposed REG introduces computational overhead compared to baseline methods.

  2. The visualization results presented in the paper are insufficient to fully demonstrate the method's effectiveness. Additional qualitative examples across diverse scenarios would strengthen the paper.

  3. For the 1D and 2D experiments, the evaluation would be more convincing if conducted on standard public datasets rather than custom data. The reported accuracy for 2D generation also appears quite low.

Other Comments or Suggestions

None

Author Response

Thank you so much for acknowledging our contribution and the constructive feedback.


Q1. Runtime and memory cost.

A1. Thanks for the great question. The tables below summarize runtime and peak memory usage of CFG and REG on a single NVIDIA A40 GPU. Runtime is reported using example batch sizes, while memory is measured with batch size 1 to isolate per-image cost. Since REG introduces one extra gradient computation on top of vanilla CFG, a moderate increase in runtime and memory usage is expected. Similar inference-time gradient calculations have also been explored in Universal Guidance (Arpit Bansal et al., 2024), albeit in a different context.

We emphasize that our main contribution lies in correcting CFG theory, and REG serves as an empirical validation. Its practical deployment depends on the specific application and acceptable overhead. We will include these tables in the updated paper.

| Model | Resol. | Batch Size | CFG / REG Runtime (sec) | Increase (×) |
|---|---|---|---|---|
| EDM2-S | 64 | 8 | 25.96 / 42.99 | 1.66 |
| DiT-XL/2 | 256 | 8 | 59.79 / 94.23 | 1.58 |
| EDM2-S | 512 | 8 | 46.14 / 62.87 | 1.36 |
| EDM2-XXL | 512 | 8 | 49.21 / 92.60 | 1.88 |
| SD-V1.4 | 512 | 4 | 32.63 / 39.54 | 1.21 |
| SD-V2.1 | 768 | 4 | 36.55 / 59.76 | 1.64 |
| SD-XL | 1024 | 2 | 47.48 / 74.52 | 1.57 |

| Model | Resol. | CFG / REG GPU Peak Mem (GB) | Increase (×) |
|---|---|---|---|
| EDM2-S | 64 | 0.87 / 1.49 | 1.71 |
| DiT-XL/2 | 256 | 4.15 / 5.01 | 1.21 |
| EDM2-S | 512 | 1.19 / 1.81 | 1.52 |
| EDM2-XXL | 512 | 4.59 / 7.31 | 1.59 |
| SD-V1.4 | 512 | 2.73 / 4.39 | 1.61 |
| SD-V2.1 | 768 | 2.72 / 6.51 | 2.39 |
| SD-XL | 1024 | 6.91 / 19.49 | 2.82 |

Q2. Extra visualization results.

A2. Thanks for the constructive remark. We respectfully point out that Figures 1, 2, 5, 8, 9, and 10 already provide qualitative visualizations for synthetic 1D/2D cases and real benchmarks. To address this concern, extra visualizations (e.g., standard 2D toy datasets, text-to-image and class-conditioned generation) have been generated and added at this anonymous link.

Please note that due to OpenReview rebuttal format constraints, we are unable to include figures directly in this response. However, anonymous links are permitted under ICML guidelines, and we will include these visualizations in the revised manuscript.


Q3. Standard public 1D and 2D datasets and 2D generation accuracy.

A3. Thanks for the helpful feedback. Our 1D setup follows standards from prior works, such as Interval Guidance (Kynkäänniemi et al., 2024) and Autoguidance (Karras et al., 2024). For 2D, we use custom shapes with fine-grained structures, which are more challenging than standard 2D toy datasets like “two moons” or “Swiss rolls.” Extra results on standard 2D datasets have been generated and can be accessed via this anonymous link (anonymous links are permitted by ICML guidelines).

About generation accuracy, we clarify that Figure 2(b) matches the target in Figure 2(a) reasonably well. We acknowledge that the results may not be perfect, as we use a relatively small diffusion model in order to compute the golden gradient $\nabla \log E_t$ efficiently. We also clarify that the metric in Table 1 indicates how often REG achieves lower error compared to no REG; a value above 50% suggests consistent improvement.
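To make the metric concrete, the win ratio in Table 1 can be computed as in the short sketch below (illustrative only; `errors_with_reg` and `errors_without_reg` are hypothetical arrays of per-sample approximation errors).

```python
import numpy as np

# Win ratio: percentage of samples where the REG approximation error is lower.
err_reg = np.asarray(errors_with_reg)
err_base = np.asarray(errors_without_reg)
win_ratio = 100.0 * np.mean(err_reg < err_base)  # above 50% indicates consistent improvement
```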


Q4. Cosine and linear CFG perform worse than vanilla in SD models and DiT-XL/2.

A4. Thanks for the great question. We first want to clarify that our experiments are designed to verify whether REG can enhance a given guidance method. This has been validated by the results shown in Tables 2 and 3. Our aim is not to compare “cosine + REG” vs. “vanilla CFG,” since such a comparison conflates the base method’s performance with the effect of REG. Hence, we believe the essential question here is why cosine and linear CFG perform worse than vanilla CFG in SD models and DiT-XL/2 (they do perform better in EDM2).

This trend is consistent with results shown in related works, such as the analysis of CFG weight schedulers in Xi Wang et al., TMLR 2024: both our Figure 4 (right) and their Figure 7(c) show that vanilla CFG achieves the lowest FID, linear CFG shifts it slightly upward, and cosine CFG shifts it further upward in CLIP-FID space.

Two factors likely contribute: (i) all these methods are heuristic, and their effectiveness can vary across model architectures and datasets. (ii) linear and cosine CFG require tuning two hyperparameters, while vanilla CFG uses only one—making linear and cosine CFG more expensive and less robust to optimize.
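For illustration, one common way to parameterize such schedules is sketched below; the exact definitions used in our experiments follow the cited works, so this should be read as an assumption-laden example rather than our implementation. Here `s` denotes sampling progress from pure noise (0) to clean data (1), and the linear/cosine schedules interpolate between `w_min` and `w_max`, which is why they carry two hyperparameters.

```python
import math

def cfg_weight(s, w_max, w_min=0.0, schedule="vanilla"):
    # s in [0, 1]: sampling progress from pure noise (0) to clean data (1).
    # NOTE: an illustrative parameterization, not necessarily the one used in the paper.
    if schedule == "vanilla":
        return w_max  # constant weight: a single hyperparameter
    if schedule == "linear":
        return w_min + (w_max - w_min) * s  # linear ramp: two hyperparameters
    if schedule == "cosine":
        return w_min + (w_max - w_min) * 0.5 * (1.0 - math.cos(math.pi * s))
    raise ValueError(f"unknown schedule: {schedule}")
```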

We will include this clarification in the updated paper.

Official Review
Rating: 5

This submission focuses on demystifying the classifier-free guidance for diffusion models. CFG has proven to be essential for the success of diffusion models. However, recent literature noted that the guided score function does not correspond to the forward diffusion process. In this work, the authors identify the source of the discrepancy and then introduce guidance for the joint distribution. This results in guidance relying on the expected reward at $x_0$ at every timestep of sampling. Using such formulation directly would be computationally prohibitive, as it would require completing the denoising process to the terminal state, at every timestep. However, under mild assumptions, which can be met in practice, a convenient approximation is proposed, which the authors call “rectified gradient guidance”. REG comes at the cost of computing the diagonal of the Jacobian of the denoiser. The proposed method is supported by strong evidence in toy, controlled scenarios, as well as for state-of-the-art class- and text-conditional image generation diffusion models. In addition, the authors show the standard CFG can be seen as an approximation of REG, and characterize the approximation error.

update after rebuttal

My assessment remains positive after the rebuttal. The authors engaged in the discussion and provided additional details that were requested.

Questions for the Authors

  1. L243 right column “assuming that the Jacobian matrix is diagonally dominant” - is diagonal dominance enough? I think that the assumption here is a diagonal Jacobian.
  2. On the approximation of Eq. (22) with Eq. (21): wouldn’t the efficient vector-Jacobian product be applicable here? This would eliminate the assumption mentioned in the question above.
  3. What is the computational overhead of the proposed method compared to vanilla CFG? Despite the use of approximation, I suspect that the additional computational and memory cost is significant for any reasonably sized model.

Claims and Evidence

Both the claims regarding the theoretical results and the claims about the effectiveness of the proposed method are supported by appropriate evidence.

Methods and Evaluation Criteria

Yes.

Theoretical Claims

I have reviewed the proofs. I focused on understanding what technique the authors used for each of the proofs. I have not thoroughly checked all derivations. I have no reason to believe they are incorrect.

Experimental Design and Analysis

Yes. The synthetic experiments in section 5.1. show that the proposed relaxation has a smaller error than the alternative which is vanilla CFG. Image generation experiments in section 5.2. show that this translates to quantitative and qualitative evaluation for SOTA models. There are no issues.

Supplementary Material

I reviewed the Appendix.

Relation to Prior Work

This submission continues on the path of demystifying, and trying to understand CFG. I agree with the authors that their work complements previous findings in this field. I am inclined to believe that this is the explanation we have been looking for.

Missing Important References

Nothing that was available at the time of submission. The authors may find [1] interesting if they didn’t know it already:

[1] Pavasovic, Krunoslav Lehman, et al. "Understanding Classifier-Free Guidance: High-Dimensional Theory and Non-Linear Generalizations." arXiv preprint arXiv:2502.07849 (2025).

Other Strengths and Weaknesses

This is a strong submission.

Other Comments or Suggestions

  • It would be interesting to include samples with the golden guidance in Figure 2, given that it is already computed.
  • In Sec 5.2 none of the models uses v-prediction. Not that I expect a different behavior for such a model, but for an even stronger message it could be included.
Author Response

Thank you so much for acknowledging our contribution and the constructive feedback.


Q1. Reference [1]: Krunoslav Pavasovic et al., arXiv 2025.

A1. Thanks for bringing [1] to our attention. We are aware of this work and will include it in our references. As the reviewer kindly noted, this paper was published on arXiv on February 11th, after the ICML submission deadline. [1] shows that while CFG may reduce diversity in low-dimensional settings, it becomes effective in high-dimensional regimes due to a "blessing of dimensionality." The authors identify two phases: an early phase where CFG aids class selection, and a later phase where it has minimal impact. They propose non-linear CFG variants that deactivate in the second phase, improving quality and diversity without extra computational cost.


Q2. Include samples with the golden guidance in Figure 2.

A2. Thanks for the great question. We respectfully clarify that Figure 2(a) already displays the golden/target samples that we aim to generate, and Figure 2(d) presents the golden guidance $\nabla \log E_t$. To the best of our understanding, these are all the golden cases available. We are more than happy to include any missing visualizations to address this concern.


Q3. Extra results on diffusion models using velocity-parametrization.

A3. Thanks for the constructive feedback. We conducted extra experiments on SD-V2.1, a velocity-parametrized text-to-image diffusion model. Due to time and resource constraints, we performed a grid search over guidance weights and report the best FID (↓) and CLIP (↑) scores for each method in the table below. A more thorough evaluation (e.g., a full FID-CLIP curve) will be included in the final version if accepted.

| | (CLIP, FID) w/ REG | (CLIP, FID) w/o REG |
|---|---|---|
| Linear CFG | (31.62, 27.83) | (31.40, 28.94) |
| Cosine CFG | (31.48, 23.32) | (31.72, 24.54) |

Q4. Diagonal Jacobian matrix assumption.

A4. Thanks for the constructive remark. We agree that a diagonal Jacobian is more precise. We originally referred to a diagonally dominant Jacobian for presentation reasons. From Eq. (22) to Eq. (21), three approximations are introduced, as detailed in Lines 220–224 on the right side of the paper, with the Jacobian assumption being the third. The first two approximations are essential—(2) is self-explanatory, and (1) is explained in our response to Q5. Given these, the transition from Eq. (22) to Eq. (21) is already approximate, regardless of whether the Jacobian is diagonal or diagonally dominant.
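To spell out what this assumption buys, written only schematically in our notation: for a guidance direction $g$, the Jacobian product reduces to an elementwise operation,

$$
\Big(\frac{\partial \hat{x}_0}{\partial x_t}\Big)^{\!\top} g
\;\approx\;
\operatorname{diag}\!\Big(\frac{\partial \hat{x}_0}{\partial x_t}\Big) \odot g,
$$

which is exact for a diagonal Jacobian and only approximate for a diagonally dominant one.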

We will revise the text to refer to a diagonal Jacobian where appropriate.


Q5. Apply Vector-Jacobian-Product (VJP) in Eq. (22).

A5. Thanks for the great question. In short, the application of VJP requires that the function $\log R_0(\hat{x}_0, y)$ be differentiable with respect to $\hat{x}_0$. However, in current CFG-like frameworks, $R_0(\cdot, y)$ is not arbitrary—it is specifically defined as shown in the second line of Eq. (6). This definition renders $\log R_0(\cdot, y)$ non-differentiable with respect to its first argument (or very costly to evaluate). Consequently, VJP is not applicable in this case.

Note that to address this, we use the chain rule and approximate $\nabla_{\hat{x}_0} \log R_0(\hat{x}_0, y)$ with $\nabla_{x_t} \log R_t(x_t, y)$, which can be further simplified via Eq. (7) and (9).
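Written out schematically in our notation (a restatement of the two steps just described, not a new derivation), the chain rule followed by the substitution reads

$$
\nabla_{x_t} \log R_0\big(\hat{x}_0(x_t), y\big)
= \Big(\frac{\partial \hat{x}_0}{\partial x_t}\Big)^{\!\top} \nabla_{\hat{x}_0} \log R_0(\hat{x}_0, y)
\;\approx\; \Big(\frac{\partial \hat{x}_0}{\partial x_t}\Big)^{\!\top} \nabla_{x_t} \log R_t(x_t, y).
$$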


Q6. Runtime and memory cost.

A6. Thanks for the great question. The tables below summarize runtime and peak memory usage of CFG and REG on a single NVIDIA A40 GPU. Runtime is reported using example batch sizes, while memory is measured with batch size 1 to isolate per-image cost. As expected, REG introduces minor overhead due to the extra gradient computation. We will add these tables in the updated paper.

| Model | Resol. | Batch Size | CFG / REG Runtime (sec) | Increase (×) |
|---|---|---|---|---|
| EDM2-S | 64 | 8 | 25.96 / 42.99 | 1.66 |
| DiT-XL/2 | 256 | 8 | 59.79 / 94.23 | 1.58 |
| EDM2-S | 512 | 8 | 46.14 / 62.87 | 1.36 |
| EDM2-XXL | 512 | 8 | 49.21 / 92.60 | 1.88 |
| SD-V1.4 | 512 | 4 | 32.63 / 39.54 | 1.21 |
| SD-V2.1 | 768 | 4 | 36.55 / 59.76 | 1.64 |
| SD-XL | 1024 | 2 | 47.48 / 74.52 | 1.57 |

| Model | Resol. | CFG / REG GPU Peak Mem (GB) | Increase (×) |
|---|---|---|---|
| EDM2-S | 64 | 0.87 / 1.49 | 1.71 |
| DiT-XL/2 | 256 | 4.15 / 5.01 | 1.21 |
| EDM2-S | 512 | 1.19 / 1.81 | 1.52 |
| EDM2-XXL | 512 | 4.59 / 7.31 | 1.59 |
| SD-V1.4 | 512 | 2.73 / 4.39 | 1.61 |
| SD-V2.1 | 768 | 2.72 / 6.51 | 2.39 |
| SD-XL | 1024 | 6.91 / 19.49 | 2.82 |
Reviewer Comment

I thank the authors for a very well-organized reply and the additional information provided. Please find my additional questions/comments below:

  1. Q&A 2

Would it be possible to generate samples with the golden guidance from Figure 2(d)? Or is this exactly what we see in Figure 2(a)?

  2. Q&A 5

What I meant was using the Jacobian-vector product (not VJP) in Eq. (22), with the last approximation used in your reply. I even implemented it for the toy model provided in the supplementary material, and it seems to work fine.

  3. Additional comment

As mentioned by other reviewers, it would be great if you could release the code for all the experiments.

Author Comment

Thank you for acknowledging our responses and the thoughtful follow-up questions.


1. Response to Q&A 2

Thank you for the clarification — we now better understand the question. It would indeed look similar to Figure 2(a), so we omitted it, since Figure 2 already has 8 columns.


2. Response to Q&A 5

Thank you for the clarification, the insightful suggestions, and even trying to implement it in our supplementary toy code. We now understand that the reviewer is referring to the Jacobian-vector product (JVP) --- applying JVP to the final line of Eq. (22) after we approximate $\nabla_{\hat{x}_0} \log R_0(\hat{x}_0, y)$ with $\nabla_{x_t} \log R_t(x_t, y)$.

We agree with the reviewer that using JVP is a valid and promising approach here. It can also eliminate the need for the diagonal Jacobian assumption. Taking CFG as an example and using our notation, we know $-\sqrt{1-\bar{\alpha}_t}\,\nabla_{x_t} \log R_t(x_t, y) = \epsilon_t(x_t, y, t) - \epsilon_t(x_t, t)$. This suggests that the JVP for the last line of Eq. (22) can be written roughly as the following pseudocode:

import torch  # for torch.func.jvp (forward-mode Jacobian-vector product)

# net(xt, y, t) is the trained conditional noise prediction network
uncond_pred = net(xt, None, t)  # unconditional noise prediction (no labels)
cond_pred = net(xt, y, t)       # conditional noise prediction (with labels)
# directional derivative J_net(xt) @ (cond_pred - uncond_pred), no full Jacobian needed
_, jvp_term = torch.func.jvp(lambda x: net(x, y, t), (xt,), (cond_pred - uncond_pred,))
pred = cond_pred + w * jvp_term
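(The sketch assumes PyTorch's `torch.func.jvp`, which returns the network output together with the directional derivative $J v$ in a single forward-mode pass, so the full Jacobian never needs to be materialized.)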

We sincerely appreciate the reviewer’s insightful suggestion, which we had not considered in our original method. In our view, compared to our current implementation, the JVP-based formulation introduces one fewer approximation and therefore has the potential for theoretically even better performance.

A complete evaluation requires testing JVP on our full experimental setup (i.e., class-conditioned ImageNet and text-to-image tasks), which cannot be completed within the rebuttal period since we need to sweep the FID vs. IS (or FID vs. CLIP) curves. We will implement and evaluate the JVP-based variant and update the manuscript accordingly. Once the updated version is publicly available, we welcome any further feedback or suggestions from the reviewer.

Finally, we want to thank the reviewer again for this important remark.


3. Additional Comment

Thank you so much for the feedback. We plan to release all the code for full reproducibility if the paper is accepted. Additionally, we will incorporate all discussions from the rebuttal period into the updated manuscript. Most importantly, we will implement the JVP, examine it, and add its results.

Final Decision

The paper introduces Rectified Gradient Guidance (REG), a method to correct theoretical flaws in existing conditional diffusion model guidance. It shows that previous methods incorrectly scale marginal distributions and proposes scaling the joint distribution instead, establishing a valid theoretical foundation. REG approximates this corrected optimal solution to make it practical and improve conditional generation quality. Extensive experiments validate REG's superiority, showing consistent performance improvements in synthetic and real-world image generation tasks over existing techniques.

All reviewers feel positive about this paper, mainly due to the theoretical contribution and practical approximation of the proposed method. There are some concerns regarding some experiments details, which have been well addressed in the rebuttal. Please make sure to incorporate additional results and clarifications into the final version to make the paper stronger.