We greatly appreciate the reviewer’s constructive feedback and insightful suggestions. We clarify and address each point in the response below.

W1: Lack of ablation studies on the time scheduling of the state flow model

Thank you for the valuable suggestion. We conducted an ablation study to assess the effect of time scheduling in state flow training, comparing three settings: partial (the denoising windows of successive synthons partially overlap), no overlap (strictly autoregressive), and till end (all synthons are denoised until the end of the trajectory).

The table below compares the average local-optimized Vina docking scores on the ALDH1 target at different numbers of explored molecules during training:

# of mol explored	10,000	20,000	30,000
no overlap
partial
till end

CGFlow’s overlapping noise scheduling, where positions are refined as synthons are added, clearly outperforms conventional autoregressive approaches (no overlap).
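The three scheduling settings can be illustrated with a minimal sketch. The parametrization here is hypothetical and is not taken from the CGFlow implementation: we assume synthon i is added at time i / N on a unit time axis, and that `overlap` controls how far a synthon's denoising window extends into the next synthon's window.

```python
def synthon_windows(num_synthons, mode, overlap=0.5):
    """Return (start, end) denoising windows on [0, 1], one per synthon.

    Hypothetical parametrization for illustration only:
    synthon i is added at time i / num_synthons.
    """
    step = 1.0 / num_synthons
    windows = []
    for i in range(num_synthons):
        start = i * step
        if mode == "no_overlap":
            # Strictly autoregressive: finish before the next synthon is added.
            end = start + step
        elif mode == "partial":
            # Keep refining for a fraction of the next synthon's window.
            end = min(1.0, start + step * (1.0 + overlap))
        elif mode == "till_end":
            # Every synthon is refined until the end of the trajectory.
            end = 1.0
        else:
            raise ValueError(f"unknown mode: {mode}")
        windows.append((start, end))
    return windows


for mode in ("no_overlap", "partial", "till_end"):
    print(mode, synthon_windows(4, mode))
```

Under these assumptions, the settings differ only in where each window ends, which makes the overlap between consecutive synthons easy to compare.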

W2 & W3: Lack of comprehensive evaluation (e.g., CrossDocked2020) and SBDD baselines.

Following your suggestions, we evaluated CGFlow on CrossDocked2020 against established SBDD baselines. Using the same conditional objective and proxy setup as TacoGFN and RxnFlow, we generated 100 molecules per pocket in a zero-shot manner, without any additional optimization on the test targets. We varied the reward exponentiation parameter β (Low: U(1,64), Medium: U(32,64), High: U(48,64)) to balance exploitation and exploration during sampling.

	Validity(↑)	Vina(↓)	QED(↑)	AiZyn. Succ Rate(↑)	Div(↑)	Time(↓)
Reference	-	-7.71	0.48	36.1	-	-
FLAG	99.7	-7.07	0.49	21.9	0.82	1047
DecompDiff	66.0	-8.35	0.37	0.9	0.84	6189
MolCRAFT	96.7	-8.05	0.50	16.5	0.84	141
MolCRAFT-large	70.8	-9.25	0.45	3.9	0.82	>141
TacoGFN	100.0	-8.24	0.67	1.3	0.67	4
RxnFlow	100.0	-8.85	0.67	34.8	0.81	4
CGFlow (low β)	100.0	-9.00	0.72	55.0	0.79	24
CGFlow (med β)	100.0	-9.16	0.73	56.6	0.76	24
CGFlow (high β)	100.0	-9.38	0.74	62.2	0.66	24

CGFlow improves the average Vina score from -8.85 (RxnFlow) to -9.38 (CGFlow, high β), outperforming all baselines. It also yields the highest QED scores (0.72–0.74) and the highest AiZynthFinder success rate (62.2%) among all methods, underscoring the practical benefits of synthesis-aware generation. CGFlow shows a consistent synthesis success rate across both the CrossDocked2020 (55.0%–62.2%) and LIT-PCBA (53.1%) benchmarks.
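The role of β in the settings above can be sketched as follows. This is an illustrative sketch, assuming the common GFlowNet convention of sampling in proportion to R^β; the function names and the example reward values are hypothetical, and only the three uniform ranges come from our setup.

```python
import random

# Uniform ranges for beta, as listed in the response above.
BETA_RANGES = {"low": (1.0, 64.0), "medium": (32.0, 64.0), "high": (48.0, 64.0)}


def sample_beta(setting, rng):
    """Draw beta uniformly from the range of the chosen setting."""
    lo, hi = BETA_RANGES[setting]
    return rng.uniform(lo, hi)


def tempered_reward(reward, beta):
    """Reward exponentiation R^beta (assumed convention, reward in (0, 1])."""
    return reward ** beta


rng = random.Random(0)
beta = sample_beta("high", rng)

# A larger beta widens the gap between two hypothetical rewards,
# which shifts sampling toward exploitation.
ratio_low = tempered_reward(0.9, 1.0) / tempered_reward(0.8, 1.0)
ratio_high = tempered_reward(0.9, beta) / tempered_reward(0.8, beta)
print(ratio_low < ratio_high)
```

This matches the trend in the table: higher β settings give better Vina scores at the cost of lower diversity.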

W4. Concerns about reward hacking by generating larger molecules.

To address this concern, we conducted additional experiments on the first five targets, restricting the heavy atom count (HAC) to at most 40. CGFlow still outperforms RxnFlow in Vina score (-10.94 vs. -10.46) with a comparable average HAC (29.63 vs. 29.37). Moreover, CGFlow achieves the highest ligand efficiency (0.375), computed as the negative Vina score divided by HAC, confirming that our binding affinity gains stem from the 3D co-design strategy rather than from molecule size.

	Vina (↓)	Ligand Efficiency (↑)	Avg Heavy atom count
SynFlownet	-8.644	0.335	26.44
RGFN	-9.085	0.329	28.02
RxnFlow	-10.457	0.362	29.37
CGFlow (rebuttal)	-10.940	0.375	29.63
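For concreteness, the ligand efficiency metric can be sketched as below. The function names and the two example molecules are hypothetical placeholders, not our data, and averaging per-molecule values is an assumption about the aggregation.

```python
from statistics import mean


def ligand_efficiency(vina_score, heavy_atom_count):
    """Ligand efficiency as the negative Vina score per heavy atom."""
    if heavy_atom_count <= 0:
        raise ValueError("heavy_atom_count must be positive")
    return -vina_score / heavy_atom_count


def mean_ligand_efficiency(vina_scores, heavy_atom_counts):
    """Average of per-molecule ligand efficiencies (assumed aggregation)."""
    return mean(
        ligand_efficiency(v, h) for v, h in zip(vina_scores, heavy_atom_counts)
    )


# Hypothetical molecules: (Vina score in kcal/mol, heavy atom count).
vina = [-10.0, -12.0]
hac = [25, 40]
print(mean_ligand_efficiency(vina, hac))
```

Note that the mean of per-molecule ratios generally differs from the ratio of the mean Vina score to the mean HAC, so the two should not be compared directly.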

W2 / W4: Measurement of training efficiency / The Vina improvement over RxnFlow is marginal

We kindly refer the reviewer to our response to Reviewer ScwC (Weakness 1) for experimental results on training efficiency, where we show that CGFlow discovers 4.7× more diverse modes than RxnFlow. We note that further optimization of docking scores is limited by the saturation of the pocket's binding interactions. Beyond that point, discovering more diverse binding modes becomes more important for maximizing the success rate in practical applications.

W4. Degradation in success rate (synthesizability) and number of synthesis steps

The small drop in AiZynthFinder synthesizability arises from our transition from reaction-based generation (RxnFlow) to a synthon-based (brick-and-linker) approach. Reaction-based generation often halts prematurely when a state molecule lacks reactive functional groups, whereas the synthon-based method can readily construct molecules with longer synthetic trajectories. We emphasize that our approach uses the building blocks and synthesis reactions from Enamine REAL and xREAL, which are known for a wet-lab synthetic success rate of 80%.

W2/Q1. Lack of evaluation in geometrical properties and questions regarding generated poses.

We evaluated geometric properties and Vina scores of the generated poses for the top 100 molecules from the local-optimized Vina optimization runs, across 3 seeds.

Metric	Validity	Med. Energy	Med. Strain Energy	Vina Score	Vina Minimize	Vina Dock	Redock RMSD<1Å	Redock RMSD<2Å
Value							%	%
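The redocking RMSD columns can be read as the share of generated poses that fall within a distance threshold of the redocked pose. Below is a minimal sketch of that calculation, assuming matched atom ordering and no symmetry correction; the function names and coordinates are hypothetical and do not reproduce our evaluation pipeline.

```python
import math


def pose_rmsd(coords_a, coords_b):
    """Heavy-atom RMSD between two poses with identical atom ordering."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def percent_within(rmsds, threshold):
    """Percentage of poses whose RMSD is strictly below the threshold (in Å)."""
    return 100.0 * sum(r < threshold for r in rmsds) / len(rmsds)


# Hypothetical example: a pose shifted by 1 Å along x from its redocked pose.
generated = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
redocked = [(x + 1.0, y, z) for x, y, z in generated]
print(pose_rmsd(generated, redocked))
```

A symmetry-aware RMSD would be needed for molecules with equivalent atoms; this sketch omits that step.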