PaperHub
7.3 / 10
Poster · 4 reviewers (min 4, max 5, std 0.5)
Ratings: 4, 5, 4, 5
Confidence: 3.8
Novelty: 2.5 · Quality: 3.0 · Clarity: 3.3 · Significance: 2.3
NeurIPS 2025

Search and Refine During Think: Facilitating Knowledge Refinement for Improved Retrieval-Augmented Reasoning

Submitted: 2025-05-10 · Updated: 2025-10-29

Abstract

Keywords
Large Language Model, Reinforcement Learning, Retrieval-Augmented Reasoning

Reviews and Discussion

Review (Rating: 4)

This paper proposes a method to enhance retrieval-augmented generation (RAG) by allowing a large language model (LLM) to first think before formulating a search query, and then generate an answer based on the retrieved documents. Building upon prior work such as Search-R1, the authors introduce a novel refine step: after retrieval, the LLM processes the raw documents into a refined form, aiming to prevent misleading information from affecting the final answer. The entire pipeline is trained using reinforcement learning (RL), with a retrieval reward that encourages fetching documents containing the correct answer. Experimental results demonstrate that both the refine step and the retrieval reward contribute to improved performance.

Strengths and Weaknesses

Strengths

  • The proposed method is simple yet effective.
  • The paper is well-written and easy to follow, with clear analyses of the method's components.
  • The experimental results support the effectiveness of the proposed additions (refinement and reward design).

Weaknesses

  • Incremental Contribution. The main idea appears to be an incremental improvement over Search-R1 [1], which already introduced the paradigm of reasoning-before-retrieval trained via RL. The notion that irrelevant context can harm model performance is not new [2], and several previous works have proposed techniques to mitigate this by extracting only relevant information [3]. The contribution of this paper lies primarily in combining document refinement with the Search-R1-style framework and demonstrating its effectiveness, but this alone may be insufficient to warrant acceptance unless further novelty is established. To better establish the novelty, the authors should articulate the unique challenges or benefits of applying document refinement within a Search-R1-style reasoning-then-retrieval framework, especially in contrast to naive RAG or prior refinement approaches used in more traditional pipelines. Clarifying what makes refinement in this setting particularly necessary or non-trivial (e.g., interaction with the RL objective, refinement timing, or complexity from prior reasoning steps) would significantly strengthen the contribution.
  • Unclear Justification for RL. In the Introduction, the authors state that “SFT often fails to generalize retrieval behaviors beyond the training distribution.” However, there appears to be no empirical evidence in the paper supporting this claim, nor do the cited references directly address RAG settings. The necessity of RL for both query generation and refinement should be better motivated. Could supervised fine-tuning (SFT) or instruction tuning suffice for learning the refinement step?
  • Misleading Title. The part of the title “Autonomous Retrieval-Augmented Reasoning of LLMs” is somewhat misleading. The core ideas behind autonomous retrieval-augmented reasoning were already developed in Search-R1, while this paper primarily introduces a refinement module and retrieval reward. The current title may give the impression that this paper introduces the full framework from scratch.

Questions

  1. Regarding the retrieval reward: how does the method handle cases where (a) the original documents do not contain the correct answer, or (b) the answer is present in the original documents but lost during refinement? Are these two scenarios treated equivalently during training?
  2. In Section 1, the paper claims that SFT struggles to generalize retrieval behaviors to out-of-distribution (OOD) scenarios. Can the authors provide empirical evidence or prior work that supports this claim, especially in the context of retrieval-augmented generation?
  3. Is RL truly necessary for learning the refinement step? Have the authors compared against a supervised or prompted alternative (e.g., using gold summaries or instruction-tuned additional refiners) to validate that RL offers a meaningful advantage?

Limitations

Yes

Final Justification

After reading the paper, the rebuttal, and follow-up responses from the authors, I have updated my evaluation as follows:

Resolved Issues:

  • Clarity of Contribution: The authors clarified how their refinement module adds to prior work like Search-R1. The response highlighted that RL-based refinement not only summarizes but also supports sub-planning in multi-hop QA, a capability not addressed by external refiners.

  • Justification for RL: The authors provided new experiments comparing RL-trained refinement with supervised external summarizers (e.g., BART). These results show that RL offers significant advantages in multi-hop settings, supporting their design choice.

  • Title and Framing: The authors acknowledged that the original title was potentially misleading and proposed a clearer alternative. They also committed to improving Figure 1 and better surfacing key comparisons (e.g., Table 3).

Remaining Issues:

  • Incomplete Baselines: The comparison with more recent instruction-tuned models (e.g., Qwen2.5, LLaMA3) is still ongoing. While the authors are running these experiments, the lack of results during the rebuttal period leaves the evaluation somewhat incomplete.

  • Framing and Presentation: While the authors plan to revise presentation issues in future versions, some of the paper's key motivations (e.g., why RL matters for refinement) were not clearly conveyed in the initial submission.

Summary:

Despite some remaining limitations, I found the authors' response constructive and detailed, and their additional experiments and clarifications meaningfully strengthened the submission. Given the promising empirical results and the novelty of combining RL with refinement in a reasoning-during-retrieval framework, I now view this as a technically sound and useful contribution.

Formatting Issues

N/A

Author Response

We sincerely appreciate your thorough review and your positive remarks regarding the conciseness, effectiveness, and delivery of our work. We deeply appreciate your constructive critique regarding our limitations, as it truly helped us strengthen this research.

W1: The main idea appears to be an incremental improvement over Search-R1 ...

Thank you for your incisive feedback regarding our paper's contribution. You've accurately identified the two core foundations of our work: (1) search-during-reasoning methods and (2) document refinement in RAG. We believe our proposed AutoRefine introduces distinct technical contributions, differentiating it from current approaches in three key aspects: (1) training technique, (2) data requirements, and (3) model behaviors, as summarized in Table 1:

Table 1: Technical Comparison Between Our Work and Previous Methods.

| Method | Training Alg | Reward | Data Requirements | Reasoning Ability | Refine Behavior |
|---|---|---|---|---|---|
| FilCO [1] | SFT | - | Only (Q,A) pairs | - | summarization |
| IRCoT [2] | - | - | - | ✓ | - |
| InstructRAG [3] | SFT/ICL | - | (Q,A)+rationale | ✓ | summarization |
| Search-o1 [4] | - | - | - | ✓ | summarization |
| Self-RAG [5] | SFT | - | (Q,A)+reflect tokens | - | distinguishing |
| Search-R1 [6] | RL | outcome | (Q,A) pairs | ✓ | - |
| Search-R1-rethink [7] | RL | outcome + retrieval reward | (Q,A) pairs | ✓ | - |
| AutoRefine | RL | outcome + retrieval reward | (Q,A) pairs | ✓ | summarization + introspection |

As you suggested, we elaborate on the unique challenges and benefits of applying RL-driven refinement within a search-during-reasoning framework.

[Unique Challenges] To illustrate the challenges in developing refinement abilities via RL, we conducted new experiments providing empirical evidence, as shown in Table 2.

  • Simple paradigm modification is helpful but sub-optimal. Simply changing the reasoning paradigm to "search-and-refine-during-think" does improve overall performance, but the refinement ability can be further enhanced with a proper retrieval-related reward.
  • Retrieval reward on retrieved documents shows limited improvement. Checking whether the answer appears in the retrieved documents is also not the ultimate solution, as noted by previous research [7]. We hypothesize this is because rewarding documents may shortcut the model into drafting more search queries for broader knowledge coverage, rather than more accurate queries and more efficient refinement.
  • Linear reward may over-emphasize intermediate behaviors. An intuitive approach is to combine $R_{ref}$ with $R_{ans}$ linearly as $R_{overall} = R_{ans} + R_{ref}$. However, we believe that the extra abilities (format-following, refinement) should be foundational yet incidental, as the primary aim is to achieve a correct answer. As a result, the non-linear retrieval reward surpasses the linear reward on both the document-level and refinement-level variants (sketched below).
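
For concreteness, a minimal sketch of the two combination schemes is shown below. The exact non-linear form and the 0.1 partial-credit value are illustrative simplifications (the 0.1 partial reward is mentioned in the reviews), not our verbatim reward implementation.

```python
# Illustrative sketch only: the exact non-linear form and the 0.1 partial credit
# are assumptions, not the verbatim AutoRefine reward implementation.

def linear_reward(r_ans: float, r_ref: float) -> float:
    """Linear combination: refinement is rewarded regardless of answer correctness."""
    return r_ans + r_ref


def nonlinear_reward(r_ans: float, r_ref: float, partial: float = 0.1) -> float:
    """Non-linear combination: a correct final answer dominates, and the
    retrieval reward only grants small partial credit when the answer is wrong."""
    if r_ans > 0:            # final answer already correct
        return r_ans
    return partial * r_ref   # partial credit for a correct refinement
```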

Table 2: Comparison Between Different Rewards Used in AutoRefine.

| Method | NQ | TriviaQA | PopQA | HotpotQA | 2wiki | Musique | Bamboogle | Avg. |
|---|---|---|---|---|---|---|---|---|
| AutoRefine (reward at refine, non-linear) | 0.467 | 0.620 | 0.450 | 0.405 | 0.393 | 0.157 | 0.344 | 0.405 |
| AutoRefine (reward at refine, linear) | 0.415 | 0.593 | 0.435 | 0.376 | 0.365 | 0.143 | 0.296 | 0.375 |
| AutoRefine (reward at documents, non-linear) | 0.418 | 0.592 | 0.441 | 0.381 | 0.386 | 0.153 | 0.320 | 0.384 |
| AutoRefine (reward at documents, linear) | 0.417 | 0.590 | 0.414 | 0.387 | 0.360 | 0.152 | 0.304 | 0.375 |
| AutoRefine (only answer reward) | 0.423 | 0.583 | 0.424 | 0.368 | 0.351 | 0.139 | 0.344 | 0.371 |
| Search-R1 [6] | 0.421 | 0.583 | 0.413 | 0.297 | 0.274 | 0.066 | 0.128 | 0.312 |

[Unique Benefits]

  • No annotation requirements. AutoRefine leverages the intrinsic knowledge refinement ability of LLMs without preliminary search trajectory annotation.
  • Complexity in search-during-think reasoning. We conducted additional experiments comparing AutoRefine against Search-R1 equipped with an external refiner model, as shown in Table 3 (which also addresses your Q3). While the external refiner achieves good performance on single-hop QA, it causes performance drops on multi-hop QA. Beyond summarization, AutoRefine learns to recognize missing information and plan subsequent search steps through RL (Table 5 in our paper). This ability contributes to its superior multi-hop performance.

W2: Could supervised fine-tuning (SFT) or instruction tuning suffice for learning the refinement step?

We appreciate the reviewer's astute observation regarding the unclear justification in our introduction. We agree that our initial statement, claiming "SFT often fails to generalize retrieval behaviors beyond the training distribution", is unsubstantiated and overly opinionated without empirical evidence.

Upon your suggestion, we propose the following revisions to clarify the necessity of RL for training retrieval-augmented LLMs:

  • Revised Justification for RL. We will modify the statement to highlight the practical challenges of SFT. Specifically, below is our revised statement:

"While SFT can be effective for training large models for search, it sometimes necessitates the construction of high-quality search paths, which incurs additional effort and resource overheads. To address this, recent studies draw inspiration from RL-based post-training and explore RL for retrieval-augmented reasoning, achieving excellent results by only evaluating final answer correctness without the need for pre-collected reasoning paths."

  • Addition of Contextual Citations. We will explicitly acknowledge the effectiveness of SFT by adding the following citations [4,8,9,10].

Thank you once again for identifying this crucial flaw in our introduction. We will thoroughly review the remainder of our paper to identify and rectify any similar issues, ensuring our claims are rigorously supported and well-justified. We believe these revisions will significantly solidify our work and enhance its contribution to the research community.

W3: The part of the title “Autonomous Retrieval-Augmented Reasoning of LLMs” is somewhat misleading.

We sincerely thank the reviewer for their insightful feedback regarding our original title. We agree that it (1) did not fully capture our core innovations, the refinement module and retrieval reward, and (2) could indeed imply a broader scope than intended.

Acting on this valuable suggestion, we've decided to update our paper's title to:

Search and Refine During Think: Facilitating On-the-Fly Knowledge Distillation for Improved Retrieval-Augmented Reasoning

We believe this new title more accurately reflects the essence of our work by highlighting its key focus (retrieval-augmented reasoning), key novelty (on-the-fly knowledge distillation methodology), and key results (improved performance).

Q1: Are these two scenarios treated equivalently during training?

Yes. In our method, the retrieval reward specifically reflects the correctness of the refinement process itself. We believe that truly effective search-augmented reasoning requires not only retrieving the correct information but also using that information effectively.

As shown in Figure 5(a) in our paper, the search success rate starts relatively high compared to the refine and answer success rates, and ends up at nearly 80%. From this phenomenon, we hypothesize that search behaviors are relatively easy to learn. Thus, we primarily shape the refine behaviors during RL training.

Q2: Can the authors provide empirical evidence or prior work that supports this claim, especially in the context of retrieval-augmented generation?

Please see our response to W2.

Q3: Is RL truly necessary for learning the refinement step?

Thank you for raising the excellent point regarding the necessity of RL for the refinement step and the comparison against more direct, supervised approaches. Following your advice, we conducted additional experiments under an intuitive setting: using a fine-tuned external refiner [11] to summarize retrieved documents in Search-R1. The results are presented in the table below:

Table 3: Comparison Between SFT-based Refiner and RL-based Refinement.

| Method | NQ | TriviaQA | PopQA | HotpotQA | 2wiki | Musique | Bamboogle | Avg. |
|---|---|---|---|---|---|---|---|---|
| AutoRefine | 0.467 | 0.620 | 0.450 | 0.405 | 0.393 | 0.157 | 0.344 | 0.405 |
| Search-R1 | 0.421 | 0.583 | 0.413 | 0.297 | 0.274 | 0.066 | 0.128 | 0.312 |
| Search-R1 + external refiner | 0.395 | 0.619 | 0.450 | 0.337 | 0.239 | 0.065 | 0.115 | 0.317 |
| ReSearch | 0.427 | 0.597 | 0.430 | 0.305 | 0.272 | 0.074 | 0.128 | 0.319 |
| ReSearch + external refiner | 0.419 | 0.614 | 0.445 | 0.334 | 0.248 | 0.069 | 0.114 | 0.321 |

From these results, we observe that the external refiner improves performance on several general QA benchmarks, but leads to a performance decline on the multi-hop ones.

[Role of RL] These observations lead us to the following explanation regarding the distinct roles of external refiners and our RL-trained refinement module:

  • Single-Hop Scenarios: In single-hop QA, the refinement behavior learned via RL appears to function similarly to an external refiner.
  • Multi-Hop Scenarios (RL's Advantage): Under multi-hop settings, the LLM learns to introspect on missing information (Table 5 in the paper). This phenomenon explains why RL-driven refinement outperforms simply applying external refiners in multi-hop QA.

[1] Learning to filter context for retrieval-augmented generation

[2] Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

[3] InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales

[4] Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

[5] Search-o1: Agentic search-enhanced large reasoning models

[6] Search-r1: Training llms to reason and leverage search engines with reinforcement learning

[7] An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents

[8] InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales

[9] Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering

[10] Multi-Hop Paragraph Retrieval for Open-Domain Question Answering

[11] BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

Comment

Thank you for the detailed rebuttal. I have some follow-up thoughts and clarifications based on your response.

  1. W1:

    After reading the response, I now better appreciate the contribution of this work. The paper presents a clear motivation for refining documents during reasoning, and the RL-based framework is well-aligned with this goal. I now view the significance of the work more positively.

  2. W2/Q3:

    I would like to ask for clarification on the external summarizer used in Table 3. You mention using BART by reference, but I’m curious why a more recent instruction-tuned model (e.g., Qwen2.5, Llama3) was not used instead. Given that newer models often perform better with proper instructions, such a comparison could provide a stronger baseline.

  3. Q3 (continued):

    I understand now that the key benefit of RL refinement is its ability to support sub-planning in multi-hop settings without external modules, which is more than summarization. While I agree this is a valuable capability, it would be helpful if the main paper presented Table 3 more prominently to help practitioners decide between RL-based and external refinement approaches based on their needs.

  4. Figure 1 (main paper):

    The motivation for refinement via RL, especially its advantage in multi-hop QA, is compelling. However, I feel that this motivation is not clearly conveyed in Figure 1. Strengthening the figure to better illustrate the refinement challenge could improve clarity.

  5. Revised Title:

    I was a bit confused by the phrase "on-the-fly knowledge distillation" in the revised title. Typically, knowledge distillation refers to transferring capabilities from a teacher model to a student. Since your method does not involve a teacher-student setup, could you clarify what exactly is being distilled, and in what sense it is "on-the-fly"?

Overall, this rebuttal clarified several points and raised interesting directions for thinking about how to improve RL-based RAG system training. I'm now considering adjusting my score to a weak accept.

Comment

Dear Reviewer KVnw:

We sincerely thank you for your thoughtful reviews and valuable suggestions, which have been instrumental in improving our work. We are also glad that our previous response successfully addressed some of your concerns.

Regarding your remaining questions, we are taking the following actions:

W2/Q3: why a more recent instruction-tuned model (e.g., Qwen2.5, Llama3) was not used instead?

  • We did not include experiments with state-of-the-art LLMs, such as Qwen2.5 or Llama3, due to some technical difficulties that we could not resolve during the limited rebuttal period. Following your suggestion, we are currently running experiments using a Qwen2.5-3B Instruct model, which is prompted under two different settings: (1) provide a summarization of retrieved documents or (2) provide a summarization of retrieved documents and next-step sub-planning. We will update the results as soon as they are available.

Q3 (continued): It would be helpful if the main paper presented Table 3 more prominently to help practitioners decide between RL-based and external refinement approaches based on their needs.

Figure 1 (main paper): Strengthening the figure to better illustrate the refinement challenge could improve clarity.

  • Your feedback has helped us realize that the original presentation (including Figure 1, the introduction, and the missing comparison with an external refiner) did not fully capture the unique benefits of our method. We will revise the writing and visualizations in future versions of the manuscript to address this.

Revised Title: Could you clarify what exactly is being distilled, and in what sense it is "on-the-fly"?

  • We use the term "on-the-fly" to convey that our refinement process is executed by the LLM itself during the reasoning phase, rather than by an external component. We agree that the terms "on-the-fly" and "distillation" may cause unnecessary confusion. Therefore, removing them will make our title more direct and better reflect our key insight. We propose the following revised title:
    • Search and Refine During Think: Facilitating Knowledge Refinement for Improved Retrieval-Augmented Reasoning

Thank you again for all the efforts you've made to improve our manuscript.

Best regards,

The Authors

Comment

Thank you for the response. I appreciate the authors' efforts to address the remaining concerns, including the ongoing suggested Qwen2.5-3B Instruct experiments, plan to highlight Table 3 more prominently, strengthen Figure 1, and clarify the title. I hope these revisions will be reflected in future versions of the paper.

I have accordingly raised my score to 4.

Comment

Dear Reviewer KVnw,

Thanks for your feedback. We have finished the experiments using Qwen2.5-3B-Instruct as the external refiner, prompting it in two settings (an illustrative prompt sketch follows the list):

  1. Ask the model to summarize the documents
  2. Ask the model to summarize the documents, and make plans for the next search step
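
For reference, a plausible prompt for setting (1) might look like the sketch below; the exact wording we used is not reproduced here, so treat it purely as an illustration.

```python
# Purely illustrative prompt template for setting (1); the exact wording used
# in our experiments is not reproduced here.
SUMMARIZE_PROMPT = (
    "You are given a question and several retrieved documents.\n"
    "Question: {question}\n"
    "Documents:\n{documents}\n\n"
    "Summarize only the information in the documents that is relevant to "
    "answering the question."
)

# Setting (2) additionally asks for next-step planning, e.g. by appending:
PLAN_SUFFIX = "Then, if information is still missing, propose the next search query."
```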

The results are shown in the table below.

Table: Performance Comparison with external refiners

| Method | NQ | TriviaQA | PopQA | HotpotQA | 2wiki | Musique | Bamboogle | Avg. |
|---|---|---|---|---|---|---|---|---|
| AutoRefine | 0.467 | 0.620 | 0.450 | 0.405 | 0.393 | 0.157 | 0.344 | 0.405 |
| FaviComp | 0.302 | 0.502 | 0.410 | 0.240 | 0.220 | 0.054 | 0.232 | 0.280 |
| Search-R1 + external refiner (BART) | 0.395 | 0.619 | 0.450 | 0.337 | 0.239 | 0.065 | 0.115 | 0.317 |
| Search-R1 + external refiner (Qwen2.5-3B-Instruct, summary) | 0.399 | 0.600 | 0.445 | 0.331 | 0.264 | 0.073 | 0.180 | 0.328 |
| Search-R1 + external refiner (Qwen2.5-3B-Instruct, summary & plan) | 0.378 | 0.562 | 0.431 | 0.299 | 0.231 | 0.059 | 0.149 | 0.301 |

The above results indicate that (1) the more up-to-date Qwen2.5 model is indeed better than BART at summarization (0.011 overall improvement), which meets your expectations, and (2) AutoRefine's RL-driven refinement behaviors are more effective than external refiners.

We hope the updated results have addressed your further concerns. Thank you again for your positive opinions on our work.

Best regards,

The Authors

Comment

Dear Reviewer KVnw,

Thank you again for your constructive suggestions. We have carefully addressed your concerns by:

  • systematically comparing previous methods and ours
  • elaborating the unique challenges and benefits of this work
  • modifying the misleading statement & title
  • demonstrating the effectiveness of RL-driven refinement with new experiments

We look forward to engaging in further discussion with you and would greatly appreciate your positive feedback on our rebuttal.

Best regards,

The Authors

Review (Rating: 5)

The paper studies the problem of training an LLM to issue search queries to enable RAG. Based on the observation that prior works have mostly focused on eliciting search skills in the LM via SFT, the proposed method uses RL to train a reader model. While the proposed paper is not the first to use RL for training a reader, the authors make two key contributions:

  1. the proposed method, AutoRefine, flexibly summarizes (“refines”) retrieved documents in addition to retrieving, akin to https://arxiv.org/abs/2311.08377
  2. the proposed method uses process rewards for retrieval

On a variety of question answering benchmarks, the proposed method is shown to be very promising.

Strengths and Weaknesses

Strengths:

  • The experiments in this paper are very well-chosen and give a complete picture of the proposed method.
  • The empirical support for the efficacy of this method is strong for multi-hop QA.

Weaknesses:

  • It seems that the two methodological contributions of this paper ("Refine" and "Process Rewards for Search") each help on different benchmarks. This suggests that potentially these two methods are actually more targeted. I was hoping to see more discussion of this.
  • The retriever reward uses a binary reward based on term matching of the answer with the ground truth. This seems to effectively be training the model to prefer exact match queries, which may not be globally optimal. I'd like to see more discussion of the possible risks of this approach.
  • As mentioned by the authors in the limitation section, the proposed method is only demonstrated with 3B models, which are a bit smaller than standard for readers in a RAG system these days (8B has become the norm in academic work). It's likely that 7B or 14B models will benefit less from carefully-designed inductive biases such as the ones proposed in this paper. In any case, this does not invalidate the value of the paper.

Questions

  • The retriever reward uses a binary reward based on term matching of the answer with the ground truth. This seems to effectively be training the model to prefer exact match queries, which may not be globally optimal. Is this a legitimate concern?

Limitations

Yes.

Final Justification

Final score: 5 (accept).

After reviewing the author's responses and other reviewers' comments, I maintain my score of a 5. I agree with reviewer jRtt that this paper is largely incremental. It is conceptually just combining two existing paradigms: filtering context for RAG [1] and improving agentic search via RL [2]. I agree with reviewer jRtt that the resemblance to prior work is understated, and I believe it is concerning that [1] was not cited in this paper.

[1] Wang, Zhiruo, et al. "Learning to filter context for retrieval-augmented generation." arXiv preprint arXiv:2311.08377 (2023).

[2] Jin, Bowen, et al. "Search-r1: Training llms to reason and leverage search engines with reinforcement learning." arXiv preprint arXiv:2503.09516 (2025).

However, unlike reviewer jRtt, I don't believe that an incremental contribution is inappropriate for NeurIPS. I think this paper demonstrates that RL training is useful for evidence refinement/context filtering, and I think the methods used to achieve that (including process rewards) are quite instructive.

TL;DR: incremental contribution, well-executed, likely to be useful to the community

Formatting Issues

N/A

Author Response

We're grateful for your insightful review and appreciate your recognition of our work's comprehensive evaluation and outstanding efficacy. We have addressed your major concerns in detail below.

Weakness 1: More Explanation over Methodological Contributions. It seems that the two methodological contributions of this paper ("Refine" and "Process Rewards for Search") each help on different benchmarks. This suggests that potentially these two methods are actually more targeted. I was hoping to see more discussion of this.

Response: We appreciate your insightful suggestion regarding the distinct impacts of our methodological contributions.

We agree that our two main contributions, the "Knowledge Refinement Step" and "Process Rewards for Search" (retrieval reward), exhibit different performance characteristics across benchmarks. To better answer the question, we extend our ablation study with a new control group that combines Search-R1 with an external fine-tuned summarization model (facebook/bart-large-cnn [1]), as shown in Table 1. In the new setting, the external model summarizes the retrieved documents, and the summary is appended to the end of the retrieved content.

Table 1: Extended Ablation Study.

| Method | NQ | TriviaQA | PopQA | HotpotQA | 2wiki | Musique | Bamboogle | Avg. |
|---|---|---|---|---|---|---|---|---|
| AutoRefine | 0.467 | 0.620 | 0.450 | 0.405 | 0.393 | 0.157 | 0.344 | 0.405 |
| - retrieval reward | 0.423 | 0.583 | 0.424 | 0.368 | 0.351 | 0.139 | 0.344 | 0.376 |
| - retrieval reward & refinement | 0.422 | 0.585 | 0.419 | 0.294 | 0.257 | 0.062 | 0.144 | 0.312 |
| Search-R1 + external summarization | 0.395 | 0.619 | 0.450 | 0.337 | 0.239 | 0.065 | 0.115 | 0.317 |

We observe the following:

  • External Document Summarization Boosts Single-hop Performance. When equipped with an external summarization module, Search-R1's single-hop QA performance (TriviaQA, PopQA) is approximately the same as AutoRefine's. This indicates that mere summarization is sufficient in such simple scenarios. In contrast, its multi-hop QA performance slightly drops.
  • Refinement Step Boosts Multi-hop Performance. Incorporating refinement steps in the reasoning paths significantly improves the model's performance on multi-hop benchmarks, while keeping the single-hop performance unchanged. What promotes such solid multi-hop performance? Taking a closer look at the case studies (Table 5 in our paper), we find that the model develops an introspection behavior - it spontaneously reflects on missing information in the retrieved documents and plans subsequent search steps accordingly.
  • Retrieval Reward Yields Comprehensive Improvements. However, merely adding a refinement step does not fully develop refinement behavior. When trained with the retrieval reward, we observe a comprehensive improvement across both single- and multi-hop benchmarks. Notably, the model exhibits similar performance on single-hop benchmarks as vanilla Search-R1 with an external summarization module, which suggests that the performance gain stems from the enhanced summarization capability of the LLM.

We hope our fine-grained explanation with additional experiments can address your concerns.

Weakness 2: Possible Risk of EM Retrieval Reward. The retriever reward uses a binary reward based on term matching of the answer with the ground truth. This seems to effectively be training the model to prefer exact match queries, which may not be globally optimal. I'd like to see more discussion of the possible risks of this approach.

Response:

Thanks for pointing out the potential risk in the strict cover EM retrieval reward. After careful consideration, we hypothesize that the strict reward may harm the model's performance when the answer is complex.

We add new experiments, comparing our current retrieval reward against two more flexible designs:

  1. Token-level recall: the fraction of tokens in the ground-truth answer that appear in the refinement;
  2. Word-level recall: the fraction of words in the ground-truth answer that appear in the refinement;

under different benchmark settings:

  • Full dataset: Test on all questions in the benchmark.
  • Complex answers: Test on questions with an answer >5 words.

The model is evaluated with EM, and the results are summarized in Table 2.
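
For clarity, a minimal sketch of the three reward variants is given below; the lower-casing, whitespace word splitting, and the generic `tokenize` callable are illustrative simplifications rather than our exact implementation.

```python
# Hedged sketch of the three retrieval-reward variants compared in Table 2.
# Normalization and tokenization details are assumptions, not the exact code.

def cover_em(refinement: str, answer: str) -> float:
    """Cover exact match: 1 if the full ground-truth answer string appears."""
    return 1.0 if answer.lower() in refinement.lower() else 0.0


def word_recall(refinement: str, answer: str) -> float:
    """Fraction of ground-truth answer words that appear in the refinement."""
    ref_words = set(refinement.lower().split())
    ans_words = answer.lower().split()
    return sum(w in ref_words for w in ans_words) / max(len(ans_words), 1)


def token_recall(refinement: str, answer: str, tokenize) -> float:
    """Same as word_recall, but over subword tokens from a supplied tokenizer."""
    ref_tokens = set(tokenize(refinement))
    ans_tokens = tokenize(answer)
    return sum(t in ref_tokens for t in ans_tokens) / max(len(ans_tokens), 1)
```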

Table 2: Comparison between original and finer-grained retrieval reward design.

| Method | TriviaQA | PopQA | HotpotQA | 2wiki | Musique | Avg. |
|---|---|---|---|---|---|---|
| Full Dataset | | | | | | |
| AutoRefine - CEM Retrieval Reward | 0.620 | 0.450 | 0.405 | 0.393 | 0.157 | 0.405 |
| AutoRefine - Token-level Recall Retrieval Reward | 0.604 | 0.433 | 0.376 | 0.364 | 0.136 | 0.383 |
| AutoRefine - Word-level Recall Retrieval Reward | 0.609 | 0.437 | 0.395 | 0.395 | 0.142 | 0.396 |
| Complex Answers (>5 words) | | | | | | |
| AutoRefine - CEM Retrieval Reward | 0.128 | 0.261 | 0.105 | 0.368 | 0.023 | 0.177 |
| AutoRefine - Token-level Recall Retrieval Reward | 0.132 | 0.292 | 0.094 | 0.379 | 0.047 | 0.189 |
| AutoRefine - Word-level Recall Retrieval Reward | 0.131 | 0.375 | 0.113 | 0.409 | 0.054 | 0.216 |

The results confirm that although the cover EM reward works well in most QA cases, it can bring risks such as performance degradation when the target answer is complex.

Weakness 3: Performance at Different Model Scales. As mentioned by the authors in the limitation section, the proposed method is only demonstrated with 3B models, which are a bit smaller than standard for readers in a RAG system these days (8B has become the norm in academic work). It's likely that 7B or 14B models will benefit less from carefully-designed inductive biases such as the ones proposed in this paper. In any case, this does not invalidate the value of the paper.

Response: We appreciate you raising the point about model scale and its potential impact on our method's effectiveness. We agree that larger models (e.g., 7B or 14B) might inherently benefit less from explicitly designed inductive biases.

We conduct additional experiments to demonstrate AutoRefine's effectiveness at the 7B level, and the results are shown in Table 3. The baseline result (Search-R1) is reproduced using their officially released checkpoint.

Table 3: Performance Comparison at Different Model Scales

| Method | NQ | TriviaQA | PopQA | HotpotQA | 2wiki | Musique | Bamboogle | Avg. |
|---|---|---|---|---|---|---|---|---|
| Qwen2.5-7B-Base | | | | | | | | |
| Search-R1 | 0.469 | 0.627 | 0.449 | 0.410 | 0.272 | 0.173 | 0.456 | 0.408 |
| AutoRefine | 0.484 | 0.659 | 0.487 | 0.451 | 0.405 | 0.187 | 0.512 | 0.455 |
| Qwen2.5-3B-Base | | | | | | | | |
| Search-R1 | 0.421 | 0.583 | 0.413 | 0.297 | 0.274 | 0.066 | 0.128 | 0.312 |
| AutoRefine | 0.467 | 0.620 | 0.450 | 0.405 | 0.393 | 0.157 | 0.344 | 0.405 |

From the Table, we observe that AutoRefine still yields positive results even at the 7B parameter level, demonstrating an overall performance improvement of approximately 5%.

This gain is indeed slightly less than the significant 9% improvement observed with 3B models, which aligns with your hypothesis that the benefits from explicit refinement steps might relatively diminish as model scale increases.

Question 1: Potential Risk in Strict Retrieval Reward. The retriever reward uses a binary reward based on term matching of the answer with the ground truth. This seems to effectively be training the model to prefer exact match queries, which may not be globally optimal. Is this a legitimate concern?

  • Please see our response to your Weakness 2.

[1] Lewis, Mike, et al. "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension." Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020.

Comment

Dear Reviewer UA8w,

Thank you again for your constructive comments and positive opinions. We have carefully addressed your concerns by:

  • providing further explanation of our methodological contributions with new empirical evidence
  • exploring new reward designs to assess the practicality of the binary retrieval reward
  • successfully expanding our method to a larger model scale

We look forward to further discussion with you and would greatly appreciate your positive feedback on our rebuttal.

Best regards,

The Authors

Comment

The study of manual summarization during refinement that you provided is useful. Thank you for doing that! This does suggest that using a frozen summarizer module to do context refinement is sufficient for simple factoid retrieval tasks but insufficient for complex tasks -- this is an interesting and intuitive finding. However, my primary concern here is whether BART-large is a fair comparison with the proposed refinement module. Given the advances in scale and data since BART, simply prompting Qwen2.5-3B-Instruct to do retrieval would be a more instructive comparison.

Regarding your follow-up experiments about "Possible Risk of EM Retrieval Reward", I think it's interesting that the strict-cover exact-match reward is clearly much more effective than token-level or word-level recall metrics, but I actually think this does not address my original concern. These are all forms of lexical matching, and none of these would prevent the model from, in theory, learning to produce BM25-style retrieval behavior. That being said, upon second examination, since the reward used here is lexical matching with the ground truth answer from NQ/HotpotQA, the use of exact-match rewards from the retriever seems like a reasonable enough thing to do. I no longer feel this is a cause for concern.

Finally, thank you for your experiments with scaling up the reader! This does align with my original intuitions that, as the base model becomes bigger, the number of errors that clever post-training can prevent will necessarily go down. Nonetheless, 5% aggregate improvement over Search-R1-7B is still a good improvement.

In summary, I maintain my positive evaluation of 5. I ultimately agree with reviewer jRtt that the proposed work is fairly incremental, but I think that's absolutely ok -- this work would still be useful to the NeurIPS community.

Comment

Dear Reviewer UA8w,

Thank you for your careful re-evaluation of our work and such detailed feedback. We are grateful for your positive assessment and for the time you took to reconsider your initial concerns, particularly regarding the retrieval reward design.

Regarding your remaining questions about the external refiner choice:

However, my primary concern here is whether BART-large is a fair comparison with the proposed refinement module. Given the advances in scale and data since BART, simply prompting Qwen2.5-3B-Instruct to do retrieval would be a more instructive comparison.

We initially did not include experiments using Qwen2.5 as the external refiner due to some technical difficulties; we have since resolved them and now have new experimental results, where we prompt Qwen2.5-3B-Instruct in two settings:

  1. Ask the model to summarize the documents
  2. Ask the model to summarize the documents, and make plans for the next search step

And the results are shown in the table below.

Table: Performance Comparison with external refiners

| Method | NQ | TriviaQA | PopQA | HotpotQA | 2wiki | Musique | Bamboogle | Avg. |
|---|---|---|---|---|---|---|---|---|
| AutoRefine | 0.467 | 0.620 | 0.450 | 0.405 | 0.393 | 0.157 | 0.344 | 0.405 |
| FaviComp | 0.302 | 0.502 | 0.410 | 0.240 | 0.220 | 0.054 | 0.232 | 0.280 |
| Search-R1 + external refiner (BART) | 0.395 | 0.619 | 0.450 | 0.337 | 0.239 | 0.065 | 0.115 | 0.317 |
| Search-R1 + external refiner (Qwen2.5-3B-Instruct, summary) | 0.399 | 0.600 | 0.445 | 0.331 | 0.264 | 0.073 | 0.180 | 0.328 |
| Search-R1 + external refiner (Qwen2.5-3B-Instruct, summary & plan) | 0.378 | 0.562 | 0.431 | 0.299 | 0.231 | 0.059 | 0.149 | 0.301 |

The above results indicate that (1) the newer Qwen2.5 model is indeed better than BART at summarization (0.011 aggregated improvement), which meets your expectations; (2) AutoRefine's RL-driven refinement behaviors are more effective than external refiners.

We hope the above results have addressed your further concerns. Thank you again for your recognition of our paper.

Best regards,

The Authors

Review (Rating: 4)

The paper presents a method for enhancing reinforcement learning (RL) training in retrieval-augmented question answering through evidence compression. It introduces soft rewards designed to promote faithful answer generation based on retrieved context. The approach is evaluated on diverse Wikipedia-based short-form single- and multi-hop open-ended QA tasks, demonstrating performance gains compared to existing baselines.

Strengths and Weaknesses

Strengths

  1. The proposed method demonstrates improvements over multiple baselines, such as Search-R1 and ReSearch, by a relative margin over 20% in hard multi-hop tasks.

Weaknesses

  1. The paper appears to be an incremental development over Search-R1[1] and Search-o1[2]. It is established in recent works, such as Search-o1[2] (Figure 2, "Get concise information and continue coherent reasoning.") and FILCO[3], that evidence compression plays a critical role in avoiding the noise generated by imperfect retrieval.

  2. Though the focus of the paper is on evidence compression, no strong baseline is chosen related to it, such as FILCO[3] and FaviComp[4]. Moreover, the Related Work section completely overlooks all evidence compression literature. In addition, important details of the baselines are missing, such as which model size and algorithm (for Search-R1, PPO or GRPO?) were chosen for comparison.

  3. Model scale and scalability: The paper does not discuss model scale’s impact on generalization or scalability. The choice of GRPO over PPO for Search-R1 is unclear, as PPO is a stronger baseline for this task in Search-R1.

  4. Limited applicability: The soft-reward mechanism is restricted to short-form generations, with no exploration of its utility for long-form tasks (e.g., Self-RAG). This limits the method’s real-world relevance. It is also unclear what the latency of the method is compared to the baselines, which matters for user experience.

  5. Statistical rigor: Error bars or standard deviations are omitted, such as in Table 1 and Figure 4, despite the checklist mentioning them to validate result significance. This weakens the reliability of the reported improvements.

References:

[1] Jin, Bowen, et al. "Search-r1: Training llms to reason and leverage search engines with reinforcement learning." arXiv preprint arXiv:2503.09516 (2025).

[2] Li, Xiaoxi, et al. "Search-o1: Agentic search-enhanced large reasoning models." arXiv preprint arXiv:2501.05366 (2025).

[3] Wang, Zhiruo, et al. "Learning to filter context for retrieval-augmented generation." arXiv preprint arXiv:2311.08377 (2023).

[4] Jung, Dongwon, et al. "Familiarity-aware evidence compression for retrieval augmented generation." arXiv preprint arXiv:2409.12468 (2024).

[5] Asai, Akari, et al. "Self-rag: Learning to retrieve, generate, and critique through self-reflection." The Twelfth International Conference on Learning Representations. 2023.

Questions

  1. Hyperparameter justification: The paper lacks explicit rationale for key hyperparameters (e.g., choice of E5 as retriever, 3 documents fetched during training). If informed by prior work, this should be clearly stated to demonstrate scientific grounding.

  2. Reward function ablation: The reward function’s design (e.g., 0.1 as partial reward, no partial reward for correct final answers) is not analyzed. This omission limits understanding of the method’s effectiveness and trade-offs.

  3. Training configuration ambiguity: The choice of maximum search calls (5) and whether training settings align with Search-R1 are unclear. Section B.1 and Table 3 do not provide sufficient details to verify reproducibility or consistency with prior work.

  4. Data split specification: The dev split of Hotpot-QA (distractor vs. fullwiki) is not explicitly stated, creating ambiguity about the evaluation’s scope and fairness.

Limitations

Yes

Final Justification

The authors have clarified the concerns:

  • Comparison with evidence compression literature.
  • Statistical significance for provided results.
  • Justification of model scales, short form QA and using GRPO.

The paper requires a major revision with a focus on using RL for evidence compression, as the search/think components are built on top of Search-R1 and existing literature.

Formatting Issues

N/A

Author Response

We thank the reviewer for recognizing the significant performance gain achieved in this work. We provide point-to-point responses to your primary concerns below.

W1: The paper appears to be an incremental development over Search-R1[1] and Search-o1[2]. It is established in recent works, such as Search-o1[2] (Figure 2, "Get concise information and continue coherent reasoning.") and FILCO[3], that evidence compression plays a critical role in avoiding the noise generated by imperfect retrieval.

We appreciate the reviewer's comments on the contribution of our work, but we respectfully disagree with the view that our paper is merely an incremental development.

Our work addresses distinct challenges and demonstrates unique advantages:

  • [Unique Challenges We Tackle] We conduct new experiments (see Table 2 addressing Weakness 1 of reviewer KVnw) to demonstrate the challenges in developing refinement abilities via RL in Search-R1. These challenges include:
    • Sub-optimality of simple paradigm modifications.
    • Limited improvement from direct retrieval rewards on retrieved documents, echoing findings in previous research [1].
    • The model benefits from a non-linear retrieval reward.
  • [Key Behaviors and Unique Benefits] We conduct additional experiments (please see Table 3 addressing Weakness 1 of reviewer KVnw) to demonstrate the advantage of our method beyond simply applying summarization modules in Search-R1.
    • No Extra Annotation Requirements, unlike previous SFT-based knowledge distillation.
    • Recognition of missing information and adaptive planning of subsequent search steps through RL (Table 5 in the paper).

We hope our detailed analysis of challenges and unique benefits clearly illustrates the novel contributions of our work.

W2.1: Though the focus of the paper is on evidence compression, no strong baseline is chosen related to it, such as FILCO[3] and FaviComp[4]. Moreover, the Related Work section completely overlooks all evidence compression literature.

Response: We appreciate the reviewer's suggestion for comparison with previous evidence compression methods.

Following the reviewer's suggestion, we have conducted an additional experiment comparing AutoRefine with evidence compression methods: (1) FaviComp [2] and (2) Search-R1 with an external summarization module [3]. The results are shown in Table 1.

Table 1: Comparison with compression approaches.

| Method | NQ | TriviaQA | PopQA | HotpotQA | 2wiki | Musique | Bamboogle | Avg. |
|---|---|---|---|---|---|---|---|---|
| AutoRefine | 0.467 | 0.620 | 0.450 | 0.405 | 0.393 | 0.157 | 0.344 | 0.405 |
| FaviComp | 0.302 | 0.502 | 0.410 | 0.240 | 0.220 | 0.054 | 0.232 | 0.280 |
| Search-R1 + external refiner | 0.395 | 0.619 | 0.450 | 0.337 | 0.239 | 0.065 | 0.115 | 0.317 |

The experimental results show:

  • External summarization helps on single-hop QA benchmarks (TriviaQA and PopQA);
  • AutoRefine surpasses the external summarizer in the multi-hop QA setting, which highlights the unique advantages summarized in our response to Weakness 1.

W2.2: Moreover, important details of the baselines are missing, such as which model size and algorithm (For Search-R1, PPO or GRPO?) was chosen for comparison.

Response: We thank the reviewer for pointing out the unclear implementation details of the baseline methods.

  • We use GRPO as the default algorithm for RL-based baselines.
  • We use Qwen2.5-3B-Base as the default model for RL-based methods, as depicted in Sec 3.1.

We hope the above clarification can address your concerns.

W3: Model scale and scalability: The paper does not discuss model scale’s impact on generalization or scalability. The choice of GRPO over PPO for Search-R1 is unclear, as PPO is a stronger baseline for this task in Search-R1.

Response: We thank the reviewer for pointing out the importance of the model's scale and RL algorithm choice.

[Impact of Model Scale] We conducted an additional experiment to examine the impact of model size; please see Table 3 in our response to Reviewer UA8w. AutoRefine surpasses previous methods by ~9% on the 3B-level model and ~5% on the 7B-level model.

[Choice of RL Algorithm] We respectfully disagree with the claim that “PPO is a stronger baseline for this task”, because:

  • As shown in Table 3 of Search-R1 [4], the GRPO algorithm has comparable performance to PPO on Qwen-3B models.
  • According to the discussion in Sec 5.1 of Search-R1 [4], "both methods (GRPO and PPO) achieve similar final train reward and performance".

W4: Limited applicability: The soft-reward mechanism is restricted to short-form generations, with no exploration of its utility for long-form tasks (e.g., Self-RAG). This limits the method’s real-world relevance. It is also unclear what the latency of the method is compared to the baselines, which matters for user experience.

Response: We thank the reviewer for this comment.

[Long-Form QA Effectiveness] We appreciate the importance of long-form generation tasks, but this work focuses on multi-hop QA ability, which is also highly practical. We believe that RAG models do not have to excel at every QA type to make a meaningful contribution. For example, Self-RAG [5] focuses on improving long-form QA without considering multi-hop questions, yet it remains an excellent and highly influential work.

[Service Latency] The additional computation of AutoRefine is merely the generation of the <refine> content, which incurs only a small, acceptable latency overhead. To support this claim, we report a speed test across methods in Table 3.

Table 3: Speed Test of Different Methods.

| Method | 1-by-1 Inference (s/sample) | Batched Inference, bs=512 (ms/sample) |
|---|---|---|
| AutoRefine | 4.743 ± 1.065 | 12.921 ± 1.011 |
| Search-R1 | 3.479 ± 0.542 | 11.815 ± 0.980 |
| ReSearch | 3.155 ± 0.567 | 11.773 ± 1.271 |

W5: Statistical rigor: Error bars or standard deviations are omitted, such as in Table 1 and Figure 4, despite the checklist mentioning them to validate result significance. This weakens the reliability of the reported improvements.

Response: Thanks for pointing out the importance of statistical significance. We add a new statistical analysis by running the experiments with different random seeds; the results are shown in Table 4. The p-value column reports a t-test between each baseline and AutoRefine.
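
For reference, a minimal sketch of such a t-test is given below; we assume an unpaired two-sample test over per-seed average scores here, and the exact pooling may differ in our actual analysis.

```python
# Hedged sketch: how the p-values in Table 4 could be computed, assuming an
# unpaired two-sample t-test over per-seed average scores.
from scipy.stats import ttest_ind

def significance_vs_autorefine(autorefine_scores, baseline_scores):
    """Return the p-value of a two-sample t-test between per-seed average scores
    of AutoRefine and a baseline (lists of floats, one entry per random seed)."""
    return ttest_ind(autorefine_scores, baseline_scores).pvalue
```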

Table 4: Statistical Analysis.

| Method | NQ | TriviaQA | PopQA | HotpotQA | 2wiki | Musique | Bamboogle | Avg. | P-Value |
|---|---|---|---|---|---|---|---|---|---|
| AutoRefine | 0.452 ± 0.017 | 0.627 ± 0.007 | 0.468 ± 0.017 | 0.423 ± 0.016 | 0.404 ± 0.010 | 0.145 ± 0.011 | 0.335 ± 0.023 | 0.408 ± 0.014 | - |
| ReSearch | 0.418 ± 0.012 | 0.614 ± 0.014 | 0.451 ± 0.018 | 0.317 ± 0.015 | 0.269 ± 0.017 | 0.056 ± 0.015 | 0.132 ± 0.008 | 0.322 ± 0.014 | 5.49E-06 |
| Search-R1 | 0.410 ± 0.009 | 0.605 ± 0.019 | 0.429 ± 0.014 | 0.315 ± 0.016 | 0.254 ± 0.023 | 0.062 ± 0.005 | 0.127 ± 0.020 | 0.315 ± 0.015 | 2.85E-06 |

The results suggest the improvement achieved by AutoRefine is significant over previous search-during-think methods.

Q2: Reward function ablation: The reward function’s design (e.g., 0.1 as partial reward, no partial reward for correct final answers) is not analyzed. This omission limits understanding of the method’s effectiveness and trade-offs.

Response: Thanks for your comment on the lack of explanation in our reward design. Different reward designs will indeed influence AutoRefine's performance. We conduct additional experiments to reveal this influence. Please refer to our Table 2 in response to Reviewer KVnw.

The results indicate:

  • Retrieval reward on retrieved documents shows limited improvement. We find that directly checking whether the documents contain the answer gives only a limited performance gain. Previous researchers have reached similar conclusions [1].
  • Linear reward may over-emphasize intermediate behaviors. Combining $R_{ref}$ with $R_{ans}$ linearly as $R_{overall} = R_{ans} + R_{ref}$ is also sub-optimal. In this work, we use a non-linear retrieval reward that prioritizes final answer correctness over merely correct knowledge refinement.

Q1: Hyperparameter justification: The paper lacks explicit rationale for key hyperparameters (e.g., choice of E5 as retriever, 3 documents fetched during training). If informed by prior work, this should be clearly stated to demonstrate scientific grounding.

Q3: Training configuration ambiguity: The choice of maximum search calls (5) and whether training settings align with Search-R1 are unclear. Section B.1 and Table 3 do not provide sufficient details to verify reproducibility or consistency with prior work.

Thank you for this comment. We keep the retrieval setup the same as our baseline Search-R1, and we will acknowledge this in a future revision.

The choice of maximum search calls is empirically set, because:

  • The average number of search calls made by our method remains below 2.5, so a larger cap is unnecessary.
  • There’s no clear evidence that this parameter will significantly influence the result.

All methods compared in Table 1 have the same experiment settings, as mentioned in Section 3.1 of our paper.

We have included most of the important parameters in Section 3.1, Appendix B, and Table 3. We suggest referring to our attached code for more details.

Q4: Data split specification: The dev split of Hotpot-QA (distractor vs. fullwiki) is not explicitly stated, creating ambiguity about the evaluation’s scope and fairness.

We use the dev split of HotpotQA, which does not contain distractor/fullwiki variants (Table 4 in paper).

[1] An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents

[2] Familiarity-Aware Evidence Compression for Retrieval-Augmented Generation

[3] BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

[4] Search-r1: Training llms to reason and leverage search engines with reinforcement learning

[5] Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Comment

I thank the authors for their rebuttal and new experiments. Though the proposed method is promising, the manuscript requires a "major revision" to better frame the necessity of RL for refinement, with a major focus on evidence compression. Overall, I will adjust my score to 4, voting positive. Thank you.

Comment

Dear Reviewer 9S2E,

Thank you for your constructive comments and for adjusting your score. We appreciate your positive evaluation of our work.

We are glad that our rebuttal and additional experiments have addressed your primary concerns. We fully agree with your assessment that our manuscript needs a major revision to better frame the necessity of using RL for refinement, particularly with a focus on evidence compression. The introduction and visualizations will be overhauled in our next revision to clearly demonstrate the key challenges and benefits of our RL-driven knowledge refinement approach.

Best regards,

The Authors

Comment

Dear Reviewer 9S2E,

Thank you for your recognition of our work. We're pleased that our rebuttal and new experiments addressed your concerns, and we appreciate your decision to adjust your score.

We've noticed that the score in the system still reflects a 2. Since these platforms can sometimes have a delay and the discussion deadline is approaching, we would be very grateful if you could quickly confirm that your updated score of 4 was successfully saved.

Best regards,

The Authors

Comment

Dear Reviewer 9S2E,

Thank you again for your constructive suggestions. We have carefully addressed your concerns by:

  • elaborating on the distinct challenges and unique advantages of this research compared with Search-R1
  • including new baseline methods and clarifying baseline/hyperparam settings
  • demonstrating the acceptable latency overhead and scalability of our method
  • adding statistical analysis
  • disambiguating data split and training configurations

We look forward to further discussion with you and would greatly appreciate your positive feedback on our rebuttal.

Best regards,

The Authors

Comment

Dear Reviewer 9S2E,

Regarding your feedback on the evidence compression baseline, we are pleased to share the new experimental results we obtained during the discussion period.

W2.1: Though the focus of the paper is on evidence compression, no strong baseline is chosen related to it, such as FILCO[3] and FaviComp[4]. Moreover, the Related Work section completely overlooks all evidence compression literature.

In our rebuttal, we compared FaviComp and Search-R1 + BART as baselines. However, we realized that the BART model might be considered small and somewhat outdated, so we supplemented our experiments using the Qwen2.5 model as an external refiner for Search-R1. Specifically, we prompted Qwen2.5-3B-Instruct to perform evidence compression in two settings:

  1. Ask the model to summarize the documents
  2. Ask the model to summarize the documents, and make plans for the next search step

The results are presented in the table below.

Table: Performance Comparison with different external refiners

| Method | NQ | TriviaQA | PopQA | HotpotQA | 2wiki | Musique | Bamboogle | Avg. |
|---|---|---|---|---|---|---|---|---|
| AutoRefine | 0.467 | 0.620 | 0.450 | 0.405 | 0.393 | 0.157 | 0.344 | 0.405 |
| FaviComp | 0.302 | 0.502 | 0.410 | 0.240 | 0.220 | 0.054 | 0.232 | 0.280 |
| Search-R1 + external refiner (BART) | 0.395 | 0.619 | 0.450 | 0.337 | 0.239 | 0.065 | 0.115 | 0.317 |
| Search-R1 + external refiner (Qwen2.5-3B-Instruct, summary) | 0.399 | 0.600 | 0.445 | 0.331 | 0.264 | 0.073 | 0.180 | 0.328 |
| Search-R1 + external refiner (Qwen2.5-3B-Instruct, summary & plan) | 0.378 | 0.562 | 0.431 | 0.299 | 0.231 | 0.059 | 0.149 | 0.301 |

From these new results, we find that:

  • As expected, the newer and larger Qwen2.5-3B-Instruct serves as a better summarization model than BART (a 0.011 overall improvement).
  • AutoRefine demonstrates superior performance compared with the evidence compression baselines.

We hope these results better address your concerns.

In addition, we want to kindly remind you that the discussion period is coming to a close. We look forward to hearing from you.

Best regards,

The Authors

Comment

Dear Reviewer 9S2E,

Thank you for your questions regarding our model's scalability, service latency, statistical rigor, and missing baseline methods. To address your concerns, we have conducted additional comparisons at the 7B level, a latency test, and a statistical analysis, and added new baseline methods. We hope our response has resolved your main concerns, and we would appreciate it if you would consider raising your score. If you still have concerns, we would be very happy to continue the discussion with you.

Best regards,

The Authors

Review (Rating: 5)

The paper introduces AutoRefine, an RL post-training framework that enhances RAG by adding an explicit “search-and-refine-during-think” reasoning loop. In each reasoning cycle the model (i) plans (<think>), (ii) issues a search query (<search>), (iii) receives documents, (iv) refines those documents to distill the most relevant facts (<refine>), and (v) eventually produces an answer (<answer>). In this reasoning process, the authors introduce two rewards: 1) an outcome reward (whether the final answer is correct) and 2) a retrieval reward, which measures the quality of the refined documents within <refine></refine> blocks. The model is trained end-to-end using GRPO. To demonstrate the effectiveness of the proposed method, the authors conducted experiments across a set of single-hop and multi-hop datasets. The strong empirical results show that AutoRefine is effective in identifying and addressing knowledge gaps via multi-turn, high-quality search queries.
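
For readers less familiar with this loop, a schematic sketch is given below; the function names (model.generate, retrieve) and the stopping logic are illustrative placeholders rather than the authors' implementation.

```python
# Schematic sketch of the search-and-refine-during-think rollout described above.
# `model.generate` (assumed to stop *before* emitting the stop string) and
# `retrieve` are illustrative placeholders, not the authors' implementation.

def rollout(model, question, retrieve, max_searches=5):
    traj = f"Question: {question}\n"
    for _ in range(max_searches):
        # (i)-(ii): think, then emit either a <search> query or the final <answer>
        step = model.generate(traj, stop=["</search>", "</answer>"])
        traj += step
        if "<answer>" in step:
            traj += "</answer>"
            break
        query = step.split("<search>")[-1].strip()
        # (iii): fetch documents for the generated query
        traj += f"</search><documents>{retrieve(query)}</documents>"
        # (iv): distill the retrieved documents into a refined evidence block
        refined = model.generate(traj + "<refine>", stop=["</refine>"])
        traj += f"<refine>{refined}</refine>"
    # (v): the <answer> span is scored by the outcome reward; the <refine> spans
    # are scored by the retrieval reward during GRPO training
    return traj
```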

Strengths and Weaknesses

Strengths

  • The proposed "search-and-refine-during-think" approach is technically sound and well-presented.
  • Strong empirical results over a set of single-hop and multi-hop datasets. The method outperforms not only the baselines but also strong results reported in the literature.
  • The experiments are comprehensive, including thorough ablations showing the effectiveness of the framework design.

Weaknesses

The major concern I have regarding this paper is its similarity to the Search-R1 paper. The core architecture, prompt template, RL framework, datasets, and evaluation metrics of AutoRefine mirror those of Search-R1. The new elements introduced in AutoRefine are (i) inserting a <refine> block in which the model summarizes the retrieved passages and (ii) adding a "retrieval-specific" reward that checks whether the gold answer appears inside that block. Search-R1 could adopt these two tweaks with minimal changes. Although I admit that making those two changes brings significant gains on multi-hop reasoning datasets, I am not fully convinced that it is worth a NeurIPS publication. From my perspective, exploring more diverse experimental settings could make the paper much stronger. For example, if the ground truth is given as a long and complex answer, how should the retrieval reward be computed, and how does that affect the final performance?

Questions

n/a

Limitations

yes

Final Justification

The added experiments resolved some of my major concerns. I think adding these results could significantly help the audience understand the effectiveness of the framework design. Therefore, I am willing to increase my score to 5.

Formatting Issues

No

Author Response

We sincerely appreciate the reviewer's acknowledgement of the soundness, strong performance, and comprehensive analysis of our work. Below are point-by-point responses addressing your major concerns.

Weakness 1: Similarity to the Search-R1 paper. Search-R1 could adopt these two tweaks with minimal changes. I’m not fully convinced that it is worth a NeurIPS publication.

Response: Thank you for your constructive feedback. We understand your concern regarding the similarity to Search-R1 and the novelty of our contribution.

While AutoRefine builds upon the foundational search-during-reasoning paradigm established by Search-R1, we respectfully disagree that our method constitutes "two tweaks with minimal changes." Here we would like to clarify the key challenges we tackled and the key advances we achieved in our paper:

[Key Challenges] To demonstrate the key challenges in developing knowledge refinement steps in search-during-think methods, we conduct additional experiments comparing different supervision reward signals, as shown in Table 1. We empirically show that:

  • Simply modifying the paradigm (e.g., to "search-and-refine-during-think") is sub-optimal. As observed in Table 1, merely adding a refinement step does improve overall performance; nevertheless, the model's refinement ability can be further enhanced with a proper retrieval-related reward.
  • A direct retrieval reward on retrieved documents yields limited improvement. A simple way to add a retrieval-related signal is to check for the existence of the final answer in the retrieved documents. However, as shown in Table 1, this is also not the ultimate solution; previous work has reached a similar conclusion that a direct retrieval reward cannot always improve search-during-think models [1].
  • Linear rewards may over-emphasize intermediate behaviors. Our findings indicate that a linear combination of answer and refinement rewards ($R_{\text{overall}} = R_{\text{ans}} + R_{\text{ref}}$) is inferior to our proposed non-linear reward design, which prioritizes final-answer correctness while still fostering robust refinement capabilities (illustrated in the sketch after Table 1). This balance in the reward function is a core innovation of AutoRefine and directly contributes to its superior performance across QA benchmarks.

Table 1: Comparison Between Different Rewards Used in AutoRefine.

(NQ, TriviaQA, and PopQA are general QA benchmarks; HotpotQA, 2wiki, Musique, and Bamboogle are multi-hop QA benchmarks.)

| Method | NQ | TriviaQA | PopQA | HotpotQA | 2wiki | Musique | Bamboogle | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AutoRefine - Reward at Refine - nonlinear | 0.467 | 0.620 | 0.450 | 0.405 | 0.393 | 0.157 | 0.344 | 0.405 |
| AutoRefine - Reward at Refine - linear | 0.415 | 0.593 | 0.435 | 0.376 | 0.365 | 0.143 | 0.296 | 0.375 |
| AutoRefine - Reward at Documents - nonlinear | 0.418 | 0.592 | 0.441 | 0.381 | 0.386 | 0.153 | 0.320 | 0.384 |
| AutoRefine - Reward at Documents - linear | 0.417 | 0.590 | 0.414 | 0.387 | 0.360 | 0.152 | 0.304 | 0.375 |
| AutoRefine - only answer reward | 0.423 | 0.583 | 0.424 | 0.368 | 0.351 | 0.139 | 0.344 | 0.371 |
| Search-R1 | 0.421 | 0.583 | 0.413 | 0.297 | 0.274 | 0.066 | 0.128 | 0.312 |
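To make the variants in Table 1 concrete, the sketch below shows how the retrieval signal can be computed over the refined blocks versus the raw retrieved documents, and how the linear and non-linear combinations differ. The <information> tag for raw documents, the gating form, and the coefficient alpha are illustrative assumptions rather than our exact implementation.

```python
# Illustrative comparison of the reward variants in Table 1; the gated form is
# a simplified stand-in for the non-linear design described in the paper.
import re

def cover_exact_match(answer: str, text: str) -> float:
    """1.0 if the (normalized) gold answer appears as a substring of `text`."""
    norm = lambda s: re.sub(r"\s+", " ", s.lower()).strip()
    return float(norm(answer) in norm(text))

def refinement_reward(rollout: str, gold: str, at: str = "refine") -> float:
    """Retrieval reward over <refine> blocks or (assumed) <information> blocks of raw docs."""
    tag = "refine" if at == "refine" else "information"
    spans = re.findall(rf"<{tag}>(.*?)</{tag}>", rollout, flags=re.DOTALL)
    return max((cover_exact_match(gold, s) for s in spans), default=0.0)

def linear_reward(r_ans: float, r_ref: float) -> float:
    # R_overall = R_ans + R_ref: the intermediate signal weighs as much as the answer.
    return r_ans + r_ref

def gated_reward(r_ans: float, r_ref: float, alpha: float = 0.5) -> float:
    # Non-linear (illustrative): the refinement signal only tops up rollouts whose
    # final answer is wrong, so answer correctness always dominates the ranking.
    return r_ans if r_ans > 0 else alpha * r_ref
```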

[Key Advances] Building on our solutions to these challenges, AutoRefine offers key advances that enhance performance and capabilities, such as:

  • Summarization ability with no data annotation. AutoRefine activates the intrinsic knowledge refinement ability embedded within LLMs, resulting in significant performance gains across single- and multi-hop benchmarks.
  • Introspection of missing knowledge. In multi-hop search-during-think settings, mere summarization may not be sufficient: the model also needs to recognize missing information and adaptively plan subsequent search steps. We observe that AutoRefine develops this capability within its refinement steps through RL (Table 5 in our paper), which contributes significantly to its superior performance on multi-hop benchmarks.

In light of your comments, we acknowledge that the current presentation does not fully convey the challenges of developing this spontaneous refinement ability or the unique advances of our method. We hope our explanation has addressed your concerns regarding the novelty and contribution of this work.

Weakness 2: Exploring more diverse experimental settings could make the paper much stronger. For example, if the ground truth is given as a long and complex answer, how do we compute the retrieval reward, and how does that affect the final performance?

Response: Thank you for this suggestion. We fully agree that exploring more diverse experimental settings, especially concerning the nature of the ground-truth answers and their impact on the retrieval reward, is crucial for strengthening our paper. In addition to the experiment added in Table 1, we have conducted further experiments exploring how a more fine-grained retrieval reward affects the model's performance on long and complex answers.

[Adapting Retrieval Rewards for Complex Answers] In addition to the cover exact match (CEM) reward used in our paper, we explored two additional reward designs (a minimal sketch of all three scoring functions follows the lists below):

  1. Token-level recall: the fraction of tokens in the ground-truth answer that appear in the refined documents;
  2. Word-level recall: the fraction of words in the ground-truth answer that appear in the refined documents;

under different benchmark settings:

  • Full dataset: test on all questions in each benchmark.
  • Complex answers: test only on questions whose answer is longer than 5 words.
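For concreteness, a minimal sketch of the three scoring functions is given below; the tokenizer choice and text normalization are simplifying assumptions and not necessarily identical to our implementation.

```python
# Minimal sketch of the three retrieval-reward variants compared in Table 2.
import re
from transformers import AutoTokenizer

_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")  # assumed tokenizer

def _normalize(s: str) -> str:
    return re.sub(r"\s+", " ", s.lower()).strip()

def cem_reward(gold: str, refined: str) -> float:
    """Cover exact match: 1.0 iff the whole gold answer appears in the refined text."""
    return float(_normalize(gold) in _normalize(refined))

def token_recall_reward(gold: str, refined: str) -> float:
    """Fraction of gold-answer tokens (subword units) found in the refined text."""
    gold_ids = _tok.encode(_normalize(gold), add_special_tokens=False)
    refined_ids = set(_tok.encode(_normalize(refined), add_special_tokens=False))
    if not gold_ids:
        return 0.0
    return sum(t in refined_ids for t in gold_ids) / len(gold_ids)

def word_recall_reward(gold: str, refined: str) -> float:
    """Fraction of gold-answer words found in the refined text."""
    gold_words = _normalize(gold).split()
    refined_words = set(_normalize(refined).split())
    if not gold_words:
        return 0.0
    return sum(w in refined_words for w in gold_words) / len(gold_words)
```

The recall-style variants give partial credit when only part of a long answer is preserved, which is why they are better suited to complex answers than the all-or-nothing CEM check.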

The results are shown in Table 2.

Table 2: Comparison between original and finer-grained retrieval reward design.

(TriviaQA and PopQA are general QA benchmarks; HotpotQA, 2wiki, and Musique are multi-hop QA benchmarks.)

| Method | TriviaQA | PopQA | HotpotQA | 2wiki | Musique | Avg. |
| --- | --- | --- | --- | --- | --- | --- |
| Full Dataset | | | | | | |
| AutoRefine - CEM Retrieval Reward | 0.620 | 0.450 | 0.405 | 0.393 | 0.157 | 0.405 |
| AutoRefine - Token-level Recall Retrieval Reward | 0.604 | 0.433 | 0.376 | 0.364 | 0.136 | 0.383 |
| AutoRefine - Word-level Recall Retrieval Reward | 0.609 | 0.437 | 0.395 | 0.395 | 0.142 | 0.396 |
| Complex Answers (>5 words) | | | | | | |
| AutoRefine - CEM Retrieval Reward | 0.128 | 0.261 | 0.105 | 0.368 | 0.023 | 0.177 |
| AutoRefine - Token-level Recall Retrieval Reward | 0.132 | 0.292 | 0.094 | 0.379 | 0.047 | 0.189 |
| AutoRefine - Word-level Recall Retrieval Reward | 0.131 | 0.375 | 0.113 | 0.409 | 0.054 | 0.216 |

The results presented in Table 2 confirm that a more fine-grained retrieval reward substantially improves the model's performance when answers take complex forms, while keeping overall performance approximately unchanged.

We hope these additional results help strengthen our experimental design.

[1] Jin, Bowen, et al. "An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents." arXiv preprint arXiv:2505.15117 (2025).

Comment

Thanks for the added experiments, which resolved some of my major concerns. I think adding these results could significantly help the audience understand the effectiveness of the framework design. Therefore, I am willing to increase my score to 5. One more suggestion: it would be very interesting to see how the proposed framework does on long-form answers, where the retrieval signal will be hard to evaluate. It could be a heuristic-based or a model-based score, and I am quite curious to see the extent to which it helps the full system.

Comment

Dear Reviewer jRtt,

Thank you again for your constructive opinions. We have carefully addressed your concerns by:

  • elaborating on the key challenges we tackled and the key benefits we achieved in this research
  • exploring more diverse experimental settings on long and complex QA

We look forward to further discussion with you and would greatly appreciate your positive feedback on our rebuttal.

Best regards,

The Authors

Final Decision

The paper attacks an important problem: how to reduce the cost of LLM annotation without losing much accuracy. The method is simple but novel, and the experiments are strong. In my view it is solid and useful, so I support acceptance.