PFDiff: Training-free Acceleration of Diffusion Models through the Gradient Guidance of Past and Future
We propose a new training-free fast sampler for accelerated sampling of diffusion models, which is orthogonal to existing fast solvers.
Abstract
Reviews and Discussion
To accelerate sampling in diffusion models, this paper proposes a training-free denoising method, dubbed PFDiff. Concretely, PFDiff employs gradients from past time steps to update intermediate states, aiming to reduce unnecessary NFEs while correcting discretization errors. In this manner, PFDiff can improve classic samplers without any training cost. Importantly, experimental results demonstrate the effectiveness of the proposed PFDiff.
Strengths
- Reducing discretization errors in diffusion models in a training-free manner is attractive and practical.
- The motivation of using previous gradients to guide the current sampling direction is intuitively plausible, and the proposed method is technically sound.
- The presentation is excellent and the figures are all readable.
Weaknesses
- In my humble opinion, the theoretical analysis part is naive. Can you provide more explanation about why previous gradients are helpful for guiding the current sampling direction? Since different noise levels correspond to different gradients, is there any harm in denoising images with the proposed method?
- Many works investigate using previous gradients to improve sampling speed, so the contribution is limited.
Questions
- Is there any memory overhead for saving the gradients?
- Can you present more experiments on ImageNet 256, including both conditional and unconditional settings?
- How about the comparison with the DEIS sampler?
- Can the proposed method connect with distillation models?
- It would be better if more metrics were tested, such as recall and precision.
Limitations
Please see in Weaknesses and Questions. If all of my concerns are addressed, I will improve my score.
We sincerely appreciate the reviewer's valuable suggestions.
W1: Previous gradient guided sampling and PFDiff's harm analysis.
A: Let's start with a possible misunderstanding. In PFDiff, previous gradients do not guide the sampling direction; rather, it is the future gradients that have the more significant guiding effect on the current state. Our primary motivation for replacing current gradients with past ones is their high similarity (see Fig. 2(a)), which reduces computational cost by skipping the current gradient computation. Regarding future gradients, we found that the optimal gradient for guiding the current state lies at a future moment, not the current one (Appendix B.2, lines 508-521, and response to reviewer sKuL W2). Using future gradients to approximate this optimal gradient therefore offers better guidance for sampling the current state.
PFDiff does not harm image denoising. Since the denoising process involves solving discretized SDEs/ODEs, guiding current state sampling with the current gradient introduces unavoidable discretization errors, especially with fewer NFEs. PFDiff addresses this by using future gradients to approximate the optimal gradient, reducing discretization errors and accelerating sampling without harming quality.
W2: Many works investigate using previous gradients to improve sampling speed.
A: Using previous gradients alone (either entirely replacing or partially caching) to guide the current state's sampling is inefficient (see common response, Q3, Table C). The efficiency of PFDiff comes from its information-efficient sampling update process, which involves the current state together with past and future gradients. PFDiff completes two updates with just one gradient computation (1 NFE), which is equivalent to the update process of a second-order ODE solver with 2 NFEs; omitting either future or past gradients would significantly limit this efficiency. PFDiff thus differs markedly in how it improves sampling speed from the many existing methods that utilize previous gradients.
Q1: Is there any memory overhead for saving the gradients?
A: Saving gradients in PFDiff does not lead to memory overhead, because each update overwrites the previously saved gradients (line 8 of Algorithm 1). Only one gradient needs to be stored at any given time, which is equivalent to the memory required to store one image.
Q2: More experiments on ImageNet 256, conditional and unconditional.
A: Yes, we have added experiments on ImageNet 256 as follows:
Unconditional, ImageNet 256, FID↓
| Method\NFE | 4 | 6 | 8 | 10 | 15 | 20 |
|---|---|---|---|---|---|---|
| DDIM | 75.27 | 46.01 | 34.67 | 28.52 | 23.09 | 20.89 |
| DDIM+PFDiff | 64.57 | 37.10 | 26.01 | 21.24 | 18.44 | 17.45 |
Conditional (s=2.0), ImageNet 256, FID↓
| Method\NFE | 4 | 6 | 8 | 10 | 15 | 20 |
|---|---|---|---|---|---|---|
| DDIM | 51.79 | 23.48 | 16.33 | 12.93 | 9.89 | 9.05 |
| DDIM+PFDiff | 37.81 | 18.15 | 12.22 | 10.33 | 8.59 | 8.08 |
As shown above, whether in conditional or unconditional experiments on the ImageNet 256 dataset, PFDiff consistently enhances the performance of DDIM. This further validates the effectiveness and wide applicability of PFDiff.
Q3: How about the comparison with DEIS sampler?
A: We added a comparison with the DEIS sampler on the CelebA 64x64 dataset, utilizing the default AB3 version from the DEIS [1] codebase, and kept other experimental settings consistent with our PFDiff. The specific experimental results are as follows:
Unconditional, CelebA 64x64, FID↓
| Method\NFE | 4 | 6 | 8 | 10 | 12 | 15 | 20 |
|---|---|---|---|---|---|---|---|
| DDIM | 37.76 | 20.99 | 14.10 | 10.86 | 9.01 | 7.67 | 6.50 |
| AB3-DEIS [1] | 27.33 | 14.76 | 9.30 | 6.38 | 4.96 | 4.18 | 3.32 |
| DDIM+PFDiff | 13.29 | 7.53 | 5.06 | 4.71 | 4.60 | 4.70 | 4.68 |
As shown in the table, under fewer NFEs PFDiff outperforms DEIS. Particularly, at NFE=10, PFDiff shows a faster convergence rate than DEIS. This further validates the efficiency and superiority of PFDiff under low-NFE conditions. In the revised manuscript, we have incorporated these experimental results into Table 2 of the Appendix and Fig. 4(c).
Q4: Can the proposed method connect with distillation models?
A: It is non-trivial as PFDiff involves future gradients. However, theoretically, we can distill model information onto the temporal scales of PFDiff to achieve model distillation, e.g., using model distillation in place of the search for optimal future gradients guided by hyperparameters $k$ and $l$ in PFDiff's updates. This will be a focus of our future research.
Q5: More experimental metrics.
A: We added more experiments and compared them using recall, precision, and sFID metrics (ImageNet results added in Appendix D.7).
Unconditional, CIFAR10
| Method\NFE | FID↓ (10) | sFID↓ (10) | Recall↑ (10) | Precision↑ (10) | FID↓ (20) | sFID↓ (20) | Recall↑ (20) | Precision↑ (20) |
|---|---|---|---|---|---|---|---|---|
| DDIM | 13.66 | 8.10 | 50.87 | 62.29 | 7.04 | 5.43 | 56.06 | 64.95 |
| DDIM+PFDiff | 4.57 | 4.43 | 59.85 | 65.54 | 3.68 | 4.20 | 59.84 | 66.23 |
As shown in the table, PFDiff consistently improved performance across all metrics, which shows the effectiveness of the PFDiff.
This paper proposes PFDiff, a fast training-free sampler for diffusion models. PFDiff updates the current state with both the past score network evaluation and the future score network evaluation. It can achieve good sample quality with less than 10 NFE. The authors showcase the effectiveness of PFDiff on various pre-trained diffusion models.
Strengths
- With proper tuning, the proposed PFDiff can outperform existing ODE solvers in the low-NFE regime on various datasets.
- The authors provide comprehensive technical details about the proposed algorithm.
Weaknesses
- Flawed justification for future gradient: The authors' claim that using future gradient information is better than using current gradient information is based on the mean value theorem (lines 164-167 and Appendix B.2). However, this theorem only guarantees the existence of an optimal point within an interval, not its specific location. Therefore, the mean value theorem itself doesn't justify the preference for future gradients.
- Missing justification for the approximation: While the authors claim that their approximation is better, there is no theoretical justification for it. The proof in Appendix B.2 assumes that the optimal point is already known, which is not informative. The manuscript will benefit from a further approximation error analysis.
- Expensive and case-specific tuning: the proposed method essentially defines a set of candidate points and searches for the optimal point by tuning parameters $k$ and $l$. This tuning process can be computationally expensive and needs to be done for each specific case, limiting its practicality.
Questions
Line 163-164: the mean value theorem always holds for any continuous differentiable function. What does it even mean for the mean value theorem to hold "approximately"? Also, it's unclear how the observations mentioned in lines 161-163 indicate the mean value theorem.
Limitations
The algorithm performance depends heavily on parameters $k$ and $l$, as shown in Table 7. Optimal values for $k$ and $l$ vary based on the pre-trained model and the number of function evaluations. This necessitates extensive parameter tuning when applying the proposed method in practice.
Thanks for the valuable comments, they help improve our paper.
W1: Flawed justification for future gradient.
A: Thanks for pointing out that the simplified description (lines 164-167) may have led to some misunderstanding. We have now provided a more comprehensive explanation of the underlying logic (for the approximation error analysis, please see W2 below).
- Firstly, the mean value theorem ensures the existence of an optimal point within an interval.
- Secondly, as analyzed in lines 508-515 and 519-521, the sampling trajectory of DPMs is not a simple linear relation (if it were a straight line, a larger sampling step size would not decrease the sampling quality), thus deducing that the optimal point would not be at the interval's endpoints. Therefore, sampling using the gradient at the current time point is not optimal.
- More importantly, we introduce hyperparameters $k$ and $l$ to search for an approximation of the optimal point, as shown in Table 7. Even without precisely pinpointing the optimal point, adjusting these parameters significantly improves the performance of PFDiff over the baseline. Therefore, our conclusion does not rely solely on theoretical derivation; both the mean value theorem and the experimental results collectively support the view that using the gradient at a future time point results in smaller discretization errors than using the current gradient. We have made appropriate modifications to lines 164-167 and Appendix B.2 based on the above discussion.
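To make the mean-value-theorem step above concrete, here is the scalar form we have in mind (a hedged illustration in our own notation, not the paper's exact equations; the paper's trajectories are high-dimensional and vector-valued, which is why the near-planar trajectory geometry discussed in Q1 is needed):

```latex
% Mean value theorem on a scalar component x(t) of the sampling trajectory:
% some interior time s* makes a single first-order step exact.
\exists\, s^{*} \in (t_i,\, t_{i+1}) :\quad
x(t_{i+1}) - x(t_i) \;=\; (t_{i+1} - t_i)\,
\left.\frac{\mathrm{d}x}{\mathrm{d}t}\right|_{t = s^{*}}
```

A first-order update using the gradient at the interior point $s^{*}$ would thus be exact, while the endpoint gradient at $t_i$ generally is not; the search over $k$ and $l$ can be read as approximating this interior point.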
W2: Approximation error analysis.
A: Great suggestion! We have added the following error analysis to Appendix B.2:
Starting from Eq. (8):
We define , and further analyze the term that may cause errors, . Applying Taylor's expansion at , we derive:
Furthermore, we analyze (i.e., the gradient corresponding to the current time point), and (i.e., the gradient corresponding to the future time point). We compare the absolute values of the coefficients of the higher-order derivative terms corresponding to and , namely and , where , and .
- When is even, we can infer . Furthermore, due to and , we can infer that holds.
- When is odd, we can infer , where . Let , and , we have and . Next, using mathematical induction, we prove , where and .
- When ,, hold.
- When (, ), suppose , then hold. When : . Overall, holds, thus holds.
In summary, we find , where . Firstly, this demonstrates that using the future time point gradient compared to the current time point gradient, the absolute values of the coefficients for higher-order derivative terms in the Taylor expansion are smaller. Secondly, as is well known, discretizing Eq. (8) neglects higher-order derivative terms, thereby introducing discretization errors. Finally, these suggest that neglecting higher-order terms has less impact when sampling with future gradients, further demonstrating that PFDiff's use of future gradient approximations in place of optimal gradients results in smaller sampling errors.
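As a toy numerical illustration of this claim (our own sketch on a smooth scalar function, not the paper's neural-network setting or proof), a one-step update using the gradient at an interior "future" point incurs a smaller discretization error than using the gradient at the current endpoint:

```python
def x_exact(t):
    # a smooth, nonlinear stand-in trajectory (not a DPM trajectory)
    return (1.0 + t) ** 3

def dx(t):
    # its exact gradient
    return 3.0 * (1.0 + t) ** 2

t0, h = 0.0, 0.5
truth = x_exact(t0 + h)

# first-order step with the gradient at the current time point
err_current = abs(x_exact(t0) + h * dx(t0) - truth)

# first-order step with the gradient at an interior ("future") time point
err_future = abs(x_exact(t0) + h * dx(t0 + 0.5 * h) - truth)

# the future-gradient step neglects less of the higher-order terms
assert err_future < err_current
```

This mirrors why the midpoint-style use of an interior gradient has a smaller truncation error than the plain forward-Euler choice, without addressing the remainder-term error discussed below.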
W3 and L1: Tuning hyperparameters k and l is both expensive and highly case-specific.
A: As stated in the common response, Q1, searching for $k$ and $l$ does not necessitate extensive parameter tuning. Moreover, even without conducting a search and simply fixing $k$ and $l$, PFDiff still significantly enhances the baseline's sampling performance.
Q1: Two questions regarding the Mean Value Theorem.
A: Firstly, regarding lines 161-163, we cite from [1], which concludes that the sampling trajectories of DPMs' ODE solvers "almost" lie in a two-dimensional plane embedded in a high-dimensional space. This ensures the applicability of the mean value theorem in the context of ODE solutions for DPMs.
Secondly, we revised lines 163-164 and replaced "hold 'approximately' " with more explanation to avoid any misunderstandings. We intend to convey that: It is well known that the mean value theorem for real-valued functions does not hold in the case of vector-valued functions. However, given the previously mentioned conclusions, the unique geometric property where the sampling trajectories of DPMs' ODE solvers "almost" lie in a two-dimensional subspace ensures the applicability of the mean value theorem in the ODE solving process of DPMs.
[1] Zhenyu Zhou et al., Fast ode-based sampling for diffusion models in around 5 steps, CVPR 2024.
Thanks for the detailed rebuttal. It addressed most of my concerns. I'm leaning toward borderline acceptance now. I appreciate the effort in approximation analysis. However, I think there is an issue in the analysis.
- The new approximation error analysis is based on the absolute values of the coefficients of higher-order derivatives. It does not necessarily imply that the approximation error is smaller. In fact, using the current gradient could be optimal. For example, consider the case where and . The approximation error based on the current point is exact.
We sincerely appreciate your consideration of raising the rating. Regarding the new issue in the analysis: first, we want to emphasize that utilizing current gradients introduces two discretization errors, neglecting remainder terms and neglecting higher-order derivative terms. The future gradients aim to reduce the impact of neglecting higher-order derivative terms, rather than the remainder terms.
We provide a more detailed response here:
We carefully checked the example you provided, and found it is just a "coincidence" that occurs in low-dimensional, low-order functions. In our proof process, there is a step involving approximation:
Whether using current or future gradients, the impact of the remainder term is neglected. Further, in the discretization of integrals for a first-order ODE solver, higher-order derivative terms (containing ) are directly neglected, thus our focus is on analyzing the impact of higher-order derivative terms. In our previous response, we demonstrated using future gradients can reduce the discretization errors caused by neglecting higher-order terms (since the coefficients' absolute values are smaller).
Regarding the further analysis of the example . Firstly, based on the future time point (i.e., PFDiff-2_2), is also exact. Secondly, when , the higher-order derivative terms in the example are all zero. However, in the sampling process of DPMs, this corresponds to the derivatives of high-dimensional neural networks , which generally are not zero. Therefore, the future gradients are dedicated to complex neural network functions in practical applications rather than low-dimensional, low-order functions. Lastly, the "correct" solution is merely a "coincidence" in the example, as the current gradients neglect rather than effectively address the errors from remainder terms and higher-order derivative terms. The interaction of two errors leads to the "correct" result.
We have updated the proof process in the paper based on the above points. We would be very pleased to discuss further if you have any questions. Thank you again for your response!
Thanks for the response. I've raised my score to 5.
Regarding your latest response: the approximation based on future time might be more accurate for neural networks in practice, but I think the proof is not rigorous. One can always construct a higher-dimensional and higher-order function like the one constructed above (we can add as many higher-order terms as we want) such that the current gradient is optimal. For example, . The approximation based on the current time step is always exact, but PFDiff with finite order is not always exact, depending on the choice of .
Thank you again for your response and for raising the rating!
Our proof shows that future gradients can reduce the discretization errors caused by neglecting higher-order terms, but the errors from remainder terms still exist. We would like to emphasize that the current gradients neglect, rather than effectively address, the errors from remainder terms and higher-order derivative terms. In a few cases, these two discretization errors cancel each other out, leading to an exact result. In practical DPM sampling, such situations generally do not occur due to the complexity of neural network functions.
Thank you again for the counterexamples you provided. We will further elaborate such special cases in the proof process of our manuscript. We are very pleased to discuss if you have any further questions!
The paper proposes PFDiff, a training-free approach for accelerating diffusion models. Motivated by the high similarity of the diffusion network outputs at adjacent timesteps on the sampling trajectory, PFDiff utilizes past and future information for sampling with time-skipping, and decreases the number of function evaluations (NFEs) significantly. Experiments on various settings show significant acceleration, especially in the low NFE regime.
Strengths
- The method is training-free and can be plugged into existing solvers.
- The motivation and overall method seem reasonable.
- The improvement is significant, especially in the low-NFE regime. State-of-the-art diffusion solvers like UniPC and DPM-Solver-v3 are compared.
- The finding that a first-order solver (DDIM), along with PFDiff, can outperform high-order solvers is intriguing.
Weaknesses
- The highly concise writing and complex notations might be a bit confusing. Additional illustrations for certain local algorithm procedures can be helpful for understanding the overall idea.
- There are fundamental mistakes in the writing. Eqns. (8) and (9) are presented as Euler discretizations of the original PF-ODE. However, both DDIM and the series of DPM-Solvers rely on exponential integrators to transform the PF-ODE into other forms so that the linear term is cancelled. Though this does not mean the method is wrong, such simplified writing can be misleading. The authors are obligated to correct this, or I will be forced to reject this paper.
- It will be more convincing to include experiments on EDM, the SOTA diffusion model on CIFAR-10 and ImageNet 64x64.
Questions
- Are there any insights why PFDiff is more effective on first-order solvers (DDIM+PFDiff even outperforms high-order solvers)?
Limitations
Yes.
We sincerely appreciate the reviewer's valuable review of the manuscript and the recognition of the work presented in the paper. Below are our responses to all questions. We kindly hope you could consider increasing the score if you are satisfied.
W1: The highly concise writing and complex notations might be a bit confusing.
A: Thanks to the reviewer for pointing out this issue. As now clarified in the common response, Q2, and the one-page PDF attachment, we have given more explanation of the notations and, crucially, added flowcharts for the core iterative processes of the PFDiff algorithm. We hope this will help readers better understand our algorithm.
W2: There are fundamental mistakes in the writing.
A: We appreciate the reviewer for pointing out the potential misunderstanding. As now revised (originally in text in lines 117-119 ), “...The function represents the way in which different -order ODE solvers handle the function , and its specific form depends on the design of the solver. For example, in the DPM-Solver [21], an exponential integrator is used to transform into in order to eliminate linear terms. In the case of a first-order Euler-Maruyama solver [38], it serves as an identity mapping of …”. We hope these modifications more accurately reflect our method and address your concerns about the potential misinterpretation.
W3: Experiments on EDM, the SOTA diffusion model on CIFAR-10 and ImageNet 64x64.
A: Based on the EDM pre-trained model, we conducted experiments on CIFAR10 and ImageNet 64x64 datasets, using PFDiff with and without DDIM, as shown in the tables below:
EDM, CIFAR10, FID
| Method\NFE | 4 | 6 | 8 | 10 | 12 | 15 | 20 |
|---|---|---|---|---|---|---|---|
| DDIM | 73.00 | 38.36 | 24.17 | 16.55 | 12.40 | 8.89 | 6.10 |
| DDIM+PFDiff | 58.02 | 12.60 | 4.57 | 3.15 | 2.69 | 2.39 | 2.22 |
EDM, ImageNet 64x64, FID
| Method\NFE | 4 | 6 | 8 | 10 | 12 | 15 | 20 |
|---|---|---|---|---|---|---|---|
| DDIM | 88.33 | 55.12 | 41.67 | 34.26 | 29.68 | 25.39 | 21.54 |
| DDIM+PFDiff | 47.38 | 19.82 | 13.09 | 11.06 | 11.35 | 11.29 | 11.43 |
As shown in the tables above, PFDiff continues to significantly improve the baseline performance of the EDM pre-trained model, further validating the effectiveness of our proposed PFDiff algorithm. We have incorporated the above experimental results into Table 4 of the revised version.
Q1: Are there any insights why PFDiff is more effective on first-order solvers (DDIM+PFDiff even outperforms high-order solvers)?
A: For first-order solvers, the effectiveness mainly comes from the efficient utilization of information: a single iteration consists of two update processes. We have thoroughly analyzed the algorithmic update process of PFDiff. Initially, PFDiff uses past gradients in place of current gradients to update to a future state; it then calculates future gradients based on this future state; finally, it employs these future gradients in place of the current gradients, completing an iterative update cycle. In this process, PFDiff+DDIM (a first-order ODE solver) requires only one gradient computation (1 NFE) to complete two updates, which is equivalent to the update process of a second-order ODE solver with 2 NFEs. Therefore, under equivalent NFE conditions, PFDiff+DDIM can even surpass high-order ODE solvers. Furthermore, we have simulated the update process of PFDiff+DDIM in Fig. 2(b), finding that PFDiff significantly corrects the trajectory of the first-order ODE solver, which substantially increases its sampling speed, even exceeding that of high-order ODE solvers.
However, as PFDiff introduces a small approximation bias when replacing gradients, a high-order ODE solver that calculates multiple gradients per iteration accumulates this bias. Therefore, the error accumulation when combining PFDiff with high-order ODE solvers can lead to instability under fewer NFE conditions, thereby impacting overall performance. This further illustrates that PFDiff is more suitable in conjunction with first-order ODE solvers, particularly under fewer NFE conditions.
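The two-updates-per-NFE cycle described above can be sketched on a toy scalar ODE. This is a hedged simplification for illustration only: the hand-written gradient field `f` stands in for the noise-prediction network, and the helper `euler_step`, the function name `pfdiff_sample`, and the midpoint choice of the future time are our own assumptions, not the paper's exact algorithm.

```python
def f(x, t):
    # toy gradient field standing in for the noise-prediction network;
    # the exact solution of dx/dt = -x * t is x(t) = x0 * exp(-t**2 / 2)
    return -x * t

def euler_step(x, t, t_next, grad):
    # one first-order (DDIM-like) update using a supplied gradient
    return x + (t_next - t) * grad

def pfdiff_sample(x0, ts):
    """Schematic PFDiff-style loop: each iteration spends one gradient
    evaluation (1 NFE) but performs two state updates."""
    x = x0
    buffer = f(x, ts[0])  # one evaluation to seed the buffer
    nfe = 1
    for t, t_next in zip(ts[:-1], ts[1:]):
        s = t + 0.5 * (t_next - t)              # an intermediate "future" time
        x_future = euler_step(x, t, s, buffer)  # past gradient -> future state
        grad_future = f(x_future, s)            # the single NFE this iteration
        nfe += 1
        x = euler_step(x, t, t_next, grad_future)  # future gradient updates current state
        buffer = grad_future                       # overwrite: only one gradient stored
    return x, nfe
```

On this toy problem, four steps of the sketch land close to the exact endpoint, and the buffer overwrite keeps memory at a single stored gradient, mirroring the constant-memory point made elsewhere in the rebuttal.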
Thank you for the detailed responses. My concerns are well addressed, and I keep the score of leaning towards acceptance in the current reviewing stage.
We are so glad to hear that your concerns were well addressed! Thanks again for recognizing our work!
The paper proposes a new training-free acceleration method for the inference of diffusion probabilistic models. The key components of the presented time-skipping strategy are the use of past and future gradients to eliminate redundant neural function evaluations (NFE). The proposed method is shown effective compared to other training-free acceleration methods, leading to solid performance improvements especially for ODE solvers with less than 10 NFEs.
Strengths
- The method is training-free and can complement existing fast ODE solvers
- The paper is well structured and puts the presented method in the proper context with respect to existing methods
- The experimental results cover conditional and unconditional settings, showing performance improvements across the board.
Weaknesses
- The performance gap compared to training-based methods is still apparent, especially considering the latest distillation techniques resulting in one-step models.
- The mathematical notations are a bit hard to follow up. I would advise the authors to add a schematic clarifying for a given setting of hyperparameters, which timepoints are being evaluated and which are being skipped.
- The optimal setting of hyperparameters k, l is model/dataset dependent, and it is not clear a priori how to set these. Therefore, this requires empirical experimentation, which makes it time-consuming to get optimal performance when using the method out of the box.
Questions
- How do your samples fare in terms of diversity compared to the original model with enough NFEs? In other words, does the faster sampling somehow come at the cost of reduced diversity?
Limitations
The authors were upfront about the limitations of their method.
We sincerely appreciate the reviewer's recognition of our work and valuable comments. Below are our responses to all questions. We kindly hope you could consider increasing the score if you are satisfied.
W1: The performance gap compared to training-based methods is still apparent.
A: Both training-based and training-free methods have their own application scenarios. While training-based acceleration algorithms can achieve one-step sampling, they often come with high training costs, especially when applied to large pre-trained models like Stable Diffusion. These substantial training costs significantly limit their broad applicability. In contrast, our proposed PFDiff achieves high-quality sampling under 10 NFEs without any training requirement, making it a more attractive and practical solution.
W2: The mathematical notations are a bit hard to follow up.
A: We deeply appreciate the valuable comments from the reviewer. To more clearly demonstrate the specific execution process of the PFDiff algorithm in a single iteration, we have added a new schematic in the appendix (see one-page PDF attachment, Fig. 1). In the schematic, we explain the settings of the hyperparameters and the strategy for skipping timepoints. Additionally, for some of the more complex mathematical symbols, we have provided further explanations, for example:
- We have clarified the notation: this represents the update process for the current state from one time point to the next using the ODE solver, leveraging the gradients stored in the buffer.
- Regarding $Q \xleftarrow{\text{buffer}} \left( \left\{ \epsilon_\theta(x_{\hat{t}_{n}}, \hat{t}_{n}) \right\}_{n=0}^{p-1}, t_{i+1}, t_{i+2} \right)$: this denotes storing the gradients calculated by a $p$-order ODE solver between $t_{i+1}$ and $t_{i+2}$ into the buffer as $Q$. The set of gradients $\left\{ \epsilon_\theta(x_{\hat{t}_{n}}, \hat{t}_{n}) \right\}_{n=0}^{p-1}$ encompasses the values calculated between the time points $t_{i+1}$ and $t_{i+2}$. Specifically, for a first-order ODE solver, this process simplifies to storing the gradient at $t_{i+1}$, $\epsilon_\theta(x_{t_{i+1}}, t_{i+1})$, into the buffer as $Q$.
We hope these modifications will help the reviewer and readers better understand our method.
W3: Regarding the settings of hyperparameters $k$ and $l$.
A: As now clarified in the common response, Q1, our experimental results have shown some exciting outcomes; searching for $k$ and $l$ is not very time-consuming. Moreover, even without conducting a search and simply fixing $k$ and $l$, PFDiff still significantly enhances the baseline's sampling performance.
Q1: Does the faster sampling somehow come at the cost of reduced diversity?
A: Faster sampling does not reduce diversity. The FID is a comprehensive indicator of the diversity and quality of the algorithm, while Recall is a better measure of diversity [1]. We have supplemented our experiments on the CIFAR10 dataset with the Recall metric to analyze the impact of our method on diversity. As shown in the table below:
CIFAR10, Recall and FID
| Method\NFE | Recall↑ (10) | Recall↑ (20) | Recall↑ (1000) | FID↓ (10) | FID↓ (20) | FID↓ (1000) |
|---|---|---|---|---|---|---|
| DDIM | 50.87 | 56.06 | 58.56 | 13.66 | 7.04 | 3.87 |
| DDIM+PFDiff | 59.85 | 59.84 | \ | 4.57 | 3.68 | \ |
As can be seen from the table above, compared to the original model with sufficient NFE, the diversity of PFDiff does not decrease, such as 59.85 (PFDiff) Recall with 10 NFE vs. 58.56 (DDIM) Recall with 1000 NFE. This demonstrates that the faster sampling of PFDiff does not come at the expense of reduced diversity.
[1] Kynkäänniemi T., et al. Improved precision and recall metric for assessing generative models, NeurIPS 2019.
Thanks. The authors have addressed my questions, and I'm happy to retain my positive score.
We are glad to know that your questions have been addressed! We greatly appreciate your valuable suggestions, which help to improve our paper. Thanks again for your recognition and for maintaining a positive score!
This work proposes a training-free time step-skipping method that can be used with existing ODE solvers for reduced NFE. The method was motivated by two observations: 1) a significant similarity in the model's outputs at time step size during the denoising process and 2) a high resemblance between the denoising process and SGD. The proposed method employed gradient replacement from past time steps and rapidly updated intermediate states inspired by Nesterov momentum. The proposed method yielded promising results.
Strengths
- Experimental results look promising with multiple diffusion models on diverse datasets.
- Accelerating diffusion models for sampling is an important issue and this work tried to address it.
Weaknesses
- There have been a lot of prior works on accelerating diffusion models for sampling. While this manuscript cited many, it still missed important prior works, some of which look quite similar to the proposed method. Thus, the novelty of the proposed method is unclear in the current form of this manuscript. For example, using Nesterov acceleration for fast diffusion models is not really new (e.g., [R5]). Eq. (15) of this work can be seen as a special case of prior works such as [R1], DeepCache [28] (using past), [R3] (using three moments or future), or [R2] (using all). Some recent work like [R4] even used partial caching instead of caching the whole results. A more theoretically grounded work on using Nesterov momentum for sampling can be found in [R5].
  [R1] M Xia et al., Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner, CVPR 2024.
  [R2] A Pokle et al., Deep Equilibrium Approaches to Diffusion Models, NeurIPS 2022.
  [R3] H Guo et al., Gaussian Mixture Solvers for Diffusion Models, NeurIPS 2023.
  [R4] F Wimbauer et al., Cache Me if You Can: Accelerating Diffusion Models through Block Caching, CVPR 2023.
  [R5] R Li et al., Hessian-Free High-Resolution Nesterov Acceleration For Sampling, ICML 2022.
- A number of acceleration works for diffusion models also investigated the feasibility of the parallel computation. Will the proposed method be parallelized for computation?
- It is unclear if the proposed method was compared with other methods in terms of computation. Will 1 NFE of the proposed method take the same computation time as 1 NFE of other methods, given that the proposed method contains multiple evaluations of the neural network as in Eq. (15)?
- The notation and explanation are quite confusing, so it is not easy to understand the whole idea as well as the algorithm itself.
Questions
See the weaknesses.
Limitations
N/A
We appreciate the reviewer's efforts and insightful comments on our work.
W1: The novelty question of PFDiff, and its distinctions from some prior important works, such as [R1] to [R5].
A: We have carefully checked all five prior works mentioned by the reviewer and found that they differ significantly from our work, which further supports the novelty of PFDiff. Below, we first highlight the uniqueness and efficiency of PFDiff and then give a detailed analysis of the differences.
On the uniqueness and efficiency of PFDiff:
- First, our novel future-gradient method is based on the mean value theorem, which differs from Nesterov acceleration [R5].
- Second, NONE of the prior works explore and evaluate the importance of combining both past and future gradients for sampling acceleration. By involving both past and future gradients, PFDiff completes two updates with just one gradient computation (1 NFE), which is equivalent to achieving the update process of a second-order ODE solver at 2 NFE. With our tailored update process that smoothly combines the current state with past and future gradients, we achieve new state-of-the-art performance for training-free acceleration. Ablation studies (common response, Q3, Table C), now added, also demonstrate the effectiveness of each component.
Differences from [R1] to [R5]:
[R1] is a training-based accelerated sampling algorithm, which is fundamentally different from our TRAINING-FREE PFDiff. Additionally, [R1] constructs a new time-step sequence through training, which might implicitly utilize past information, but its motivation and implementation are significantly different from PFDiff's.
[R2] does not "use all" (see [R2], Section 3.1); its update process depends only on moments BEFORE the current state (no future gradients). Moreover, the specific update process in [R2] focuses on parallel sampling while ours focuses on reducing discretization error, which is also distinctly different.
[R3] is based on an SDE solver while PFDiff is an ODE-based method. More importantly, [R3] optimizes sampling by estimating the first three moments of the reverse process (training-based), while PFDiff uses a totally different, TRAINING-FREE strategy: acceleration via past and future gradients.
[R4] and [28] partially employ cached past gradients, which differ from our past gradients and incur more time cost than direct replacement. Besides, they do not leverage future information. Our comparative experiments, detailed in the common response, Q3, Table C, show that incorporating future gradients as in PFDiff significantly improves sample quality and inference time compared to these methods.
[R5] theoretically demonstrated that Nesterov momentum can speed up the sampling of Langevin dynamics. Though our future gradients share some similarities with Nesterov's "foresight" update, our method differs in four significant ways. First, [R5] does not explicitly motivate its method by "future gradients"; they are only implicit in its update process. Second, PFDiff's update procedure differs significantly from Nesterov's approach, as we employ a gradient replacement strategy instead of momentum. Third, based on the mean value theorem, we show that future gradients are better suited to guiding the sampling of the current state, which is unrelated to the momentum perspective in [R5]. Last, our experimental results (common response, Q3, Table C) show that sampling guided solely by future gradients differs significantly in quality from PFDiff's approach, which uses both past and future gradients for guidance.
Based on the above analysis, we have cited all these papers and added comparisons with [R4] and [R5]. We conclude that PFDiff differs significantly in methodology from [R1] to [R5], and the results further demonstrate the superiority of our method.
W2: Will PFDiff be parallelized for computation?
A: Potentially yes, although the future gradient makes parallelization harder for PFDiff than for other acceleration methods. Theoretically, using techniques like Picard iteration [R6], we can break the serial dependency of sampling and achieve parallelization; we will evaluate this in future work.
[R6] Shih A et al., Parallel sampling of diffusion models, NeurIPS 2023.
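As a rough illustration of the Picard idea (a sketch, not our implementation), the code below solves a toy scalar ODE: each sweep evaluates the drift at every trajectory point independently (these evaluations are the parallelizable part) and then rebuilds the trajectory, converging to the sequential Euler solution. The `drift` function is a stand-in for the learned network.

```python
import numpy as np

def drift(x, t):
    # Toy stand-in for the learned network: dx/dt = -x (t is unused here).
    return -x

def sequential_euler(x0, ts):
    # Standard serial sampler: each step depends on the previous one.
    x = x0
    for i in range(len(ts) - 1):
        x = x + (ts[i + 1] - ts[i]) * drift(x, ts[i])
    return x

def picard_parallel(x0, ts, sweeps=50):
    # Keep the whole trajectory; each sweep updates all points using the
    # drifts from the previous sweep, so the per-sweep network evaluations
    # are independent and could run in parallel on a GPU.
    n = len(ts)
    traj = np.full(n, x0, dtype=float)
    for _ in range(sweeps):
        drifts = drift(traj, ts)               # all evaluations at once
        steps = np.diff(ts) * drifts[:-1]      # Euler increments
        traj = x0 + np.concatenate(([0.0], np.cumsum(steps)))
    return traj[-1]

ts = np.linspace(0.0, 1.0, 21)
seq = sequential_euler(1.0, ts)
par = picard_parallel(1.0, ts)
```

The fixed point of the sweep is exactly the sequential Euler trajectory, and point `k` of the trajectory becomes exact after `k` sweeps, so enough sweeps reproduce the serial result.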
W3: Will 1 NFE of PFDiff take the same computation time as 1 NFE of other methods?
A: Yes, 1 NFE of PFDiff is precisely equivalent to 1 NFE of the other methods, because the buffered gradient in Eq. (15) is retrieved directly from the buffer and does not require computation; only 1 NFE is actually performed in Eq. (15). We have added results on inference time, e.g., 15.79 s per 1k samples (PFDiff) vs. 15.90 s (DDIM) at 20 NFE (see common response, Q3, Table C), from which we can see that PFDiff does not introduce extra inference time.
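To make the NFE bookkeeping concrete, here is a hypothetical first-order sketch (a toy `eps_theta` stands in for the real network, and the update rule is a simplification for illustration, not the exact Eq. (15)): each outer step applies two updates — a foresight jump using the buffered past gradient and a correction using the future gradient — while paying only one network evaluation.

```python
import numpy as np

NFE = 0  # global counter of network evaluations

def eps_theta(x, t):
    # Toy stand-in for the noise-prediction network; each call costs 1 NFE.
    global NFE
    NFE += 1
    return -x * (1.0 + t)  # arbitrary smooth function, for illustration only

def pfdiff_like_loop(x, ts):
    # Hypothetical sketch of the past/future-gradient idea: the buffered PAST
    # gradient replaces a fresh evaluation to reach a "foresight" state; the
    # single network call made there (the FUTURE gradient) then updates the
    # current state. Two updates are applied per outer step, one NFE is paid.
    buf = eps_theta(x, ts[0])                    # warm-up gradient (1 NFE)
    for i in range(len(ts) - 1):
        h = ts[i + 1] - ts[i]
        x_foresight = x + h * buf                # jump with past gradient (0 NFE)
        fut = eps_theta(x_foresight, ts[i + 1])  # the only evaluation (1 NFE)
        x = x + h * fut                          # update state with future gradient
        buf = fut                                # store as "past" for next step
    return x

ts = np.linspace(1.0, 0.0, 11)  # 10 sampling steps
x_final = pfdiff_like_loop(1.0, ts)
```

Counting the calls shows the claimed cost: 10 sampling steps consume 10 evaluations plus one warm-up, i.e., the foresight jump itself is free.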
W4: Regarding the question of confusion in notation and explanation.
A: Thanks for pointing out this. As now clarified in the common response, Q2, we have added more explanations of the notation and included a flowchart diagram of the PFDiff algorithm to demonstrate the update of one iteration in the PDF attachment.
[R1] M Xia et al., Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner, CVPR 2024.
[R2] A Pokle et al., Deep Equilibrium Approaches to Diffusion Models, NeurIPS 2022.
[R3] H Guo et al., Gaussian Mixture Solvers for Diffusion Models, NeurIPS 2023.
[R4] F Wimbauer et al., Cache Me if You Can: Accelerating Diffusion Models through Block Caching, CVPR 2024.
[R5] R Li et al., Hessian-Free High-Resolution Nesterov Acceleration For Sampling, ICML 2022.
I acknowledge that I have read the rebuttal. The rebuttal addressed almost all of my concerns. I will finalize my score after the discussion with other reviewers.
Thank you very much for your response! We are glad that the rebuttal has addressed almost all of your concerns. Let us know if further discussion is needed.
Thank you to all reviewers for their efforts and valuable comments on this paper. Here we address common concerns raised by the reviewers.
Response to common questions
Q1: The issue that the hyperparameters need to be adjusted for different datasets/models and NFE budgets.
A: First, even with fixed default hyperparameter settings and no searching, PFDiff significantly enhances the sampling quality of the baseline across various datasets/models at 8-20 NFE, as shown in Table 7. Second, compared to training-based acceleration algorithms, the cost of the hyperparameter search is negligible. This is because:
- Our search is training-free and can be achieved simply by image quality evaluation.
- The evaluation is further optimized based on an exciting discovery: searching with 1/10 of the data gives results consistent with searching on the whole dataset, which largely reduces the computational cost. For example, performance evaluation on CIFAR10 usually requires 50k samples, whereas the hyperparameter search needs only 5k samples and yields a score consistent with that from 50k samples.
As evidence of this observation, we added experiments on the CIFAR10 dataset. Keeping the experimental settings identical, we search over hyperparameter combinations using 5k (randomly sampled, multiple runs) and 50k samples, respectively. As shown in Tables A and B (in the Attachment), for the same NFE, the optimal hyperparameter combinations by FID score are consistent between 5k and 50k samples; e.g., at NFE=4, the best FID for both 5k and 50k samples is achieved by the same setting. For the six combinations used in the paper, only 30k samples in total are required to find the optimal combination for each NFE, which is even less than the cost of EVALUATION (normally 50k samples). We observed the same behavior across other datasets/models, and we have added further experiments with reduced sampling sizes for more datasets/models after Appendix Table 7.
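The subset search described above can be sketched generically as a grid search with a reduced sample budget; the hyperparameter grids and the `score_fn` below are placeholders (in practice, `score_fn` would generate `n_samples` images with the given hyperparameters and compute FID):

```python
import itertools

def subset_search(grid_a, grid_b, score_fn, n_samples=5000):
    # Generic sketch: score every hyperparameter combination on a small
    # sample budget (e.g., 5k images instead of 50k) and keep the one with
    # the lowest score. `score_fn(a, b, n)` is a placeholder for "generate
    # n samples with hyperparameters (a, b) and compute FID".
    best_pair, best_score = None, float("inf")
    for a, b in itertools.product(grid_a, grid_b):
        s = score_fn(a, b, n_samples)
        if s < best_score:
            best_pair, best_score = (a, b), s
    return best_pair, best_score

# Toy scorer with a known optimum, just to exercise the search:
toy_fid = lambda a, b, n: (a - 2) ** 2 + (b - 1) ** 2
best, _ = subset_search([1, 2, 3], [0, 1, 2], toy_fid)
```

Since the 5k-sample ranking matches the 50k-sample ranking in our experiments, the cheap scorer is enough to select the combination that is optimal under full evaluation.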
Q2: The issue of unclear explanations regarding the algorithmic process and the symbols used.
A: Following the reviewers' valuable suggestions, we further clarified the PFDiff method by introducing a flowchart as an illustration of the algorithm (see Figure 1 in the Attachment). In addition to the schematic, we have provided further explanations for the symbols in the revised version of the paper. For instance:
- We have clarified the update notation: it denotes the update of the current state from one time step to the next using the ODE solver, leveraging the gradients stored in the buffer.
- Regarding $Q \xleftarrow{\text{buffer}} \left( \left\{ \epsilon_\theta(x_{\hat{t}_n}, \hat{t}_n) \right\}_{n=0}^{p-1}, t_{i+1}, t_{i+2} \right)$: this denotes storing the gradients calculated by a $p$-th order ODE solver over the interval between $t_{i+1}$ and $t_{i+2}$ into the buffer $Q$. The set of gradients $\left\{ \epsilon_\theta(x_{\hat{t}_n}, \hat{t}_n) \right\}_{n=0}^{p-1}$ comprises the values calculated between the time points $t_{i+1}$ and $t_{i+2}$. Specifically, for a first-order ODE solver, this simplifies to storing the single gradient at $t_{i+1}$, $\epsilon_\theta(x_{t_{i+1}}, t_{i+1})$, into the buffer $Q$.
Q3: The issue concerning the effectiveness and inference time of PFDiff.
A: The efficiency of PFDiff is attributed to its information-efficient update process, which utilizes the current intermediate state along with past and future gradients; omitting either the past or the future gradients significantly limits PFDiff's effectiveness. Based on the experimental setup in Table 8, we added more ablation studies of PFDiff to evaluate the effectiveness of past and future gradients, and introduced an additional comparison method [1], as it uses a portion of the past-gradient cache to accelerate sampling. We also report inference time at 10 and 20 NFE (on an NVIDIA RTX 3090 GPU) to demonstrate the efficiency of the different methods. The results are shown in the table below (also in the revised manuscript, Table 8):
Table C. CIFAR10
| Method\NFE | FID↓ (NFE=4) | FID↓ (NFE=8) | FID↓ (NFE=10) | FID↓ (NFE=20) | Time per 1k samples (s)↓ (NFE=10) | Time per 1k samples (s)↓ (NFE=20) |
|---|---|---|---|---|---|---|
| DDIM | 65.70 | 18.45 | 13.66 | 7.04 | 9.81 | 15.90 |
| +Cache [1] | 49.02 | 15.23 | 11.31 | 6.25 | 13.55 | 24.07 |
| +Past | 52.81 | 17.87 | 13.64 | 7.02 | 9.88 | 15.81 |
| +Future | 66.06 | 11.93 | 8.06 | 4.07 | 9.77 | 15.67 |
| +PFDiff | 22.38 | 5.64 | 4.57 | 3.68 | 9.74 | 15.79 |
As can be seen from the table, using only past gradients (including cache) or only future gradients does not effectively accelerate sampling. Therefore, the tailored update of the PFDiff that combines both the past and future gradients works as the key factor in its effectiveness. Additionally, we compared DDIM and DDIM+PFDiff in terms of inference time per 1k samples and found that both have consistent inference times at the same NFE, demonstrating that PFDiff does not increase inference time.
[1] Xinyin Ma et al., Deepcache: Accelerating diffusion models for free, CVPR 2024.
This work addresses the common discretization errors of fast ODE solvers in accelerating the sampling of Diffusion Probabilistic Models. These errors typically grow as the number of function evaluations (NFE) decreases, and the proposed method, PFDiff, is a training-free, orthogonal timestep-skipping strategy that effectively reduces the NFE relative to existing fast ODE solvers. The strategy is rooted in two key observations: a strong closeness of the model's outputs at nearby time steps during the denoising process of existing ODE solvers, and a resemblance between the denoising process and SGD. PFDiff uses gradient replacement from past time steps and foresight updates to rapidly update intermediate states, reducing the unnecessary NFEs that drive discretization errors in first-order ODE solvers. A solid experimental evaluation validates the claimed speed improvement, demonstrating applicability across a number of pre-trained Diffusion Probabilistic Models and surpassing both methods for conditional DPMs and previous state-of-the-art training-free methods.
Five reviews were submitted: 2 Weak Accepts, 1 Borderline Accept, and 2 Borderline Rejects. The authors were very meticulous in their rebuttal, with additional experiments, comparisons, and a point-by-point response. In particular, Reviewer 7w4h claimed no novelty and provided references; the authors addressed each and every reference, showing that the reviewer was either confused or not fully familiar with the references provided. They also answered all the questions to the reviewer's satisfaction, and the reviewer had promised to revisit the grade but did not. The other Borderline Reject, by Reviewer FcEN, raised a number of points, most of which turned out to stem from confusion. Specifically, FcEN asked for further explanation of why using past gradients is helpful in guiding the new, faster sampling approach and whether it could be harmful, which the authors politely clarified was not the case. FcEN also requested an evaluation and comparison on ImageNet 256, with an openness to revisiting the grade. All of that was provided by the authors, with no follow-up despite several reminders sent by the AC.
The AC is reluctant to rank this paper as a reject in light of the two "weak" Borderline Rejects. The AC quickly read the paper and the additional edits proposed by the authors, and estimates that the authors did their part; while the paper is perhaps not award-winning, it is acceptable as a Poster.