We thank reviewer MSYw for their thorough and constructive feedback on our work. We are delighted to see that the reviewer finds our framework to be well-engineered with thoughtful implementation, achieving high empirical performance and state-of-the-art results, while making a valuable and timely contribution with substantial practical impact through our open-sourced resources.

We address their insightful questions in the following section and are happy to follow up during the discussion period for any further inquiries.

Q: Attack efficiency comparison - average number of tokens used

We acknowledge that attack efficiency is important for both attackers and defenders. We chose token count as our efficiency metric because it provides a standardized measure across different models and directly correlates with inference time and API costs when applicable. While we report the target model's token usage in Table 3 (page 5), we did not compare it with ActorAttack (we used their default configuration: 3 actors, 5 queries, attacker LLM temperature = 1, and target LLM temperature = 0). Below, we present a comprehensive efficiency analysis:

Table 1: Token efficiency comparison between X-Teaming and ActorAttack

Target Model	Attacker Average Token Usage		Target Average Token Usage		Target Model Context Window
	X-Teaming (Attacker model: Qwen 2.5-32B)	ActorAttack (Attacker model: GPT-4o)	X-Teaming	ActorAttack
GPT-4o	1,470	1,164	2,649	3,083	128K
Gemini 2.0-Flash	1,884	1,265	5,330	6,483	1M
Claude-3.5-Sonnet	3,328	1,805	2,070	2,238	200K
LLama-3-8B	1,746	1,234	2,765	3,683	8K
LLama-3-70B	1,311	1,188	3,057	3,478	8K
Deepseek-V3	1,237	1,270	4,357	5,082	128K

Our analysis shows that:

For attacker model tokens: With much higher attack success rate, X-Teaming uses slightly more attacker tokens than ActorAttack (except for target model: Deepseek-V3) due to its dynamic plan modification and TextGrad optimization when facing resistance. However, X-Teaming shows an additional advantage by utilizing Qwen-2.5-32B (open-source, no API cost), whereas ActorAttack relies on GPT-4o (closed-source with API cost).
For target model tokens, X-Teaming consistently uses fewer tokens across all models compared to ActorAttack.

Head-to-head comparison - X-Teaming vs ActorAttack under the same budget setup: we conducted an additional experiment with strictly equal budgets: 10 plans/actors, identical token budgets, the same attacker model (Qwen-2.5-32B), and the same target model (GPT-4o). Results show X-Teaming achieves 94.6% ASR compared to ActorAttack's 75.7% ASR - an 18.9% performance advantage under identical resource constraints. We will highlight this equal-budget comparison in the Attack Efficiency paragraph (Result Section 3.2, Attack Efficiency paragraph, page 6, line 212).

To summarize, X-Teaming achieves higher attack success rates than prior methods while maintaining reasonable compute consumption, making it a practically useful framework.

Q: Rationale for hyperparameter choices

Thank you for this important question about our hyperparameter selection. The hyperparameters (7 turns, 10 attack plans, 4 TextGrad optimization attempts as maximum limits) were determined through extensive ablation studies and grid search on the HarmBench validation set using Llama-3-8B-Instruct trained on SafeMTData, as mentioned on page 5, line 165.

We kept these hyperparameters consistent across all target models for fair comparison. We did not perform model-specific hyperparameter optimization for several reasons:

Grid search across all models (especially proprietary APIs like GPT-4o, Claude, Gemini, etc.) would incur prohibitive computational costs
Using consistent hyperparameters allows for more direct comparison of model vulnerabilities
Our efficiency analysis (Table 6; Appendix B.2) shows that successful attacks typically use fewer resources than these maximum limits

We agree that this is a limitation and will add to our limitations section that future work could explore model-specific optimization using more efficient search approaches to further improve attack performance while managing computational costs.