PaperHub
Score: 7.3/10
Poster · 4 reviewers
Min 4 · Max 5 · Std 0.5
Reviewer scores: 4, 5, 5, 4
Confidence: 3.0
Novelty: 3.0
Quality: 3.0
Clarity: 3.3
Significance: 3.0
NeurIPS 2025

Active Target Discovery under Uninformative Priors: The Power of Permanent and Transient Memory

OpenReview · PDF
Submitted: 2025-05-11 · Updated: 2025-10-29

Abstract

Keywords
Diffusion Model · Active Target Discovery

Reviews and Discussion

Review
Rating: 4

This paper introduces EM-PTDM (Expectation Maximized Permanent Temporary Diffusion Memory), a novel framework for Active Target Discovery (ATD) designed specifically to operate effectively in environments with uninformative or unavailable priors. The core problem addressed is that existing methods, such as DiffATD, perform poorly when domain-specific data for learning a strong prior is scarce. Inspired by dual-memory systems in the human brain, the proposed method combines a large, pre-trained diffusion model as a "permanent memory" for generalizable knowledge with a lightweight, adaptive module based on Doob's h-transform that serves as a "transient memory" for rapid, task-specific adaptation. The framework uses an Expectation-Maximization (EM) style algorithm to guarantee monotonic improvement of the prior with new observations. The sampling strategy intelligently balances exploration and exploitation, and the entire approach is demonstrated to significantly outperform baseline methods across challenging domains, including species discovery and cross-view remote sensing.
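To make the decomposition concrete, the posterior-score split described above — a frozen pretrained score ("permanent memory") plus a lightweight learned correction in the spirit of Doob's h-transform ("transient memory") — can be sketched as follows. All names and functional forms here (`pretrained_score`, `h_correction`, the toy dynamics) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def pretrained_score(x, t):
    # Stand-in for the frozen diffusion model's score network
    # ("permanent memory"); never updated during discovery.
    return -x / (1.0 + t)

def h_correction(x, t, phi):
    # Stand-in for the lightweight h-transform module ("transient
    # memory"); only `phi` would be trained as observations arrive.
    return phi * np.tanh(x)

def posterior_score(x, t, phi):
    # The guided score is the sum of the two terms: the permanent
    # memory stays fixed while the transient term supplies the
    # task-specific shift.
    return pretrained_score(x, t) + h_correction(x, t, phi)

x = np.array([0.5, -1.0])
s = posterior_score(x, t=0.1, phi=0.2)
```

The point of the split is that adaptation touches only the small parameter set `phi`, which is what makes per-observation updates cheap.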

Strengths and Weaknesses

Strengths:

  • The conceptual framework is highly original. The inspiration from neuroscience's dual-memory systems provides a compelling narrative and a principled design. The technical implementation is equally novel, combining a pretrained diffusion model with a lightweight Doob's h-transform module to separately model permanent and transient memory. This decomposition of the posterior score is an elegant and powerful idea.

  • The paper provides theoretical guarantees for the monotonic improvement of the prior with each new observation (Proposition 1 and Theorems 1 and 2), which in turn leads to more accurate sampling scores (Theorem 4). This theoretical rigor adds significant weight to the contribution.

  • The experiments are thorough and well-designed:

i) Challenging Tasks: The evaluation on both species distribution modeling (uninformative prior from a related domain) and overhead object discovery (cross-view prior from a disjoint domain) convincingly demonstrates the method's robustness and adaptability.

ii) Strong Baselines: The comparison against relevant baselines, including the prior state-of-the-art (DiffATD), clearly shows the superiority of EM-PTDM.

iii) Insightful Ablations: The ablation studies effectively validate the core components of the design, such as the importance of the transient memory h-model (Figure 5) and the scheduler for updating it (Figure 2).

Weaknesses:

  • The paper proposes to tackle a critical and practical limitation of current active discovery methods: the reliance on strong, domain-specific priors. By enabling effective ATD in data-scarce scenarios, this work proposes to significantly broaden the applicability of these techniques to a wide range of real-world problems, such as discovering emerging diseases or rare environmental hazards. To the best of my understanding, this is only partially true. While it avoids needing a prior for the specific task (e.g., you don't need data on the rare "CS" species to find it), the entire framework critically depends on a very powerful, highly informative general-purpose prior in the form of the pretrained diffusion model ("permanent memory"). The experiments confirm this: the permanent memory was trained on MNIST, a distribution of known species, or ImageNet. The success of the method fundamentally hinges on this permanent memory providing a strong starting point. The paper does not explore the true "uninformative prior" scenario: what happens if the permanent memory is also weak, biased, or trained on a completely irrelevant domain? The claim of operating "without any prior domain samples" could be read as more general than the method supports.

     i)  Furthermore, the paper does not discuss the constraints this imposes. In many real-world, data-scarce domains where this method is most needed, a powerful, relevant pre-trained diffusion model might not exist. For example, in discovering novel protein structures or unique geological formations, what serves as the "ImageNet equivalent"? The paper's solution works by shifting the data dependency from the specific task to a general domain, but it doesn't eliminate the dependency. This is a practical limitation that is not addressed or described as a limitation of the work.
    
     ii) Is the rigid separation between the memory systems always optimal, especially if the task provides many observations that strongly contradict the initial prior? The paper's design is efficient, but it assumes the permanent memory is "correct" and only needs to be guided. It doesn't consider scenarios where the permanent memory itself might need some correction during the discovery process.
    
  • The paper rightly emphasizes that the transient memory module is lightweight, enabling rapid adaptation. However, the overall framework still relies on forward evaluations of a large, pretrained diffusion model. A more detailed discussion of the total computational cost per sampling step (in terms of time or resources) would be beneficial for understanding its practical deployment feasibility.

  • The exploration-exploitation phase takes a rather simple approach, which could potentially be improved and more finely tuned for this specific application.

Questions

I thank the authors for their work and clear paper. I am open to changing my ratings depending on the discussion of the following points:

  1. What happens when a weak pre-trained diffusion model is used for the permanent memory? I believe this could be addressed by adding some flexibility in altering this part of memory, in a conservative way.
  2. I believe the statement that this is performing under an Uninformative Prior is misleading and not truly precise since the work highly leverages the existence of a pretrained diffusion model. I would appreciate this being addressed. Since this is essentially the main contribution, I think it’s relevant to justify this better. Indeed, this is not task-specific prior domain data, as described in the contributions, but it would be relevant to study how this truly affects the performance.
  3. Equation (10) has text over it.

Limitations

The authors did not address the limitations of this work, which I believe exist. They are described both under Weaknesses and Questions.

Final Justification

After the nice interactions with the authors, I am happy with the discussion, and keep the score as 4.

Formatting Concerns

It seems ok to me.

Author Response

Thank you for your insightful feedback. We are excited to see that you found our conceptual framework highly original and based on strong theoretical guarantees. We appreciate your recognition of our inspiration from neuroscience's dual-memory systems. Thank you for your attention to our comprehensive experiments on challenging tasks, alongside robust baselines and ablations. Next, we clarify all your concerns.

Q1: What happens when a weak pre-trained diffusion model is used for the permanent memory? I believe this could be addressed by adding some flexibility in altering this part of memory, in a conservative way.

A1:

Discussion on Weak prior (Permanent Memory):

Thank you for your insightful questions.

  • When an extremely weak prior is used as the permanent memory, according to Equation (6), adapting to a new domain essentially reduces to learning the task from scratch through the transient memory alone. In this case, as indicated by Equation (7), the $h$-model no longer serves as a corrective mechanism; instead, under a partially observable environment, the entire responsibility for modeling the posterior shift falls on this lightweight module. However, this is beyond the capacity of such a lightweight module by design.

  • Furthermore, following your questions, we conducted a few additional experiments in the remote sensing scenario, where we deliberately replaced the permanent memory with a diffusion model that could only output noise without any meaningful semantic structure. On top of this setup, we implemented EM-PTDM and observed the following phenomenon: even though the $h$-model was updated under partial observations, its limited capacity made it insufficient to compensate for the lack of a meaningful prior. As a result, all of the unexplored regions of the posterior estimate remained dominated by the weak prior—essentially resembling noise—leading to poor global environment estimation. This supports the claim that an extremely weak or non-semantic prior forces the transient memory to handle an unrealistically large modeling responsibility, which goes beyond its design capability. We appreciate your insightful question and will include qualitative visualizations alongside the quantitative results presented in the following table to provide a thorough analysis of these aspects in the revised draft. Thank you for your thoughtful feedback.

Analysis of Weak Permanent Memory: Performance Comparison on DOTA

| Method | $\mathcal{B}$ = 250 | $\mathcal{B}$ = 300 |
| --- | --- | --- |
| EM-PTDM (Random Noise as Permanent Memory) | 0.3117 | 0.3465 |
| EM-PTDM (ImageNet as Permanent Memory) | 0.5620 | 0.7013 |

Q2: I believe the statement that this is performing under an Uninformative Prior is misleading and not truly precise since the work highly leverages the existence of a pretrained diffusion model. I would appreciate this being addressed. It would be relevant to study how this truly affects the performance.

A2: When we referred to “uninformative prior,” our intention was to highlight the lack of a domain-specific prior rather than the absence of any meaningful semantic information. As we stated in the motivation, our design was inspired by how the human brain handles novel tasks: long-term memory always exists and contributes, but when facing an unfamiliar scenario, it often performs suboptimally and requires the assistance of a short-term memory module to adapt quickly. Similarly, in our framework, the permanent memory (pretrained diffusion model) provides a general prior, while the transient memory ($h$-transform module) rapidly adjusts to task-specific information. However, we fully agree that "uninformative" is a general term whose precise meaning varies with context. We will therefore be very careful about the terminology usage.

We have a detailed discussion with supporting experimental results on the effects of using a weak permanent memory in A1. We appreciate your insightful question.

Q3: Equation (10) has text over it.

A3: Thank you for pointing this out. We will certainly fix this in our updated draft.

Q4: The authors did not address the limitations of this work, which I believe exist. They are described both under Weaknesses and Questions.

A4: Thank you for this question. We have observed that when the permanent memory component is extremely weak and lacks semantically meaningful information for certain active target discovery tasks, the performance of EM-PTDM may be impacted. We will include a thorough discussion of these scenarios in the Limitation section of the revised manuscript to provide valuable context and help inform future research directions.

Q5: A more detailed discussion of the total computational cost per sampling step (in terms of time or resources) would be beneficial for understanding its practical deployment feasibility.

A5: This is a great question. We completely agree! Following this suggestion, we have conducted a detailed evaluation of sampling time and computational requirements of EM-PTDM across various search space sizes. We present the results in the following table.

| Search Space | Computation Cost | Sampling Time per Observation Step (s) |
| --- | --- | --- |
| 28 × 28 | 0.78 GB | 0.83 |
| 128 × 128 | 1.51 GB | 1.87 |

Our results show that EM-PTDM remains efficient even as the search space scales, with sampling time per observation step ranging from 0.83 to 1.87 seconds, which is well within practical limits for most downstream applications. This further reinforces EM-PTDM’s scalability and real-world applicability. We will include these additional details in our updated draft.

Q6: A rather simple approach to the exploration-exploitation phase, which could potentially be improved and more fine-tuned for this specific application.

A6: Thank you for this question. Our sole objective here is to provide a general framework for Exploration-Exploitation phase. Nevertheless, we agree that it can be fine-tuned further according to the requirements of downstream specific application.

Comment

I would like to thank the authors for their thoughtful rebuttal. They have engaged with the review constructively and have addressed most of the major concerns raised. The new experiments, particularly the analysis using a random noise model as the permanent memory and the detailed evaluation of computational costs, were insightful and directly answered the core questions about the framework's dependencies and practical feasibility.

I see significant value and originality in this work. My recommendation is contingent on the authors incorporating the promised changes into the final version of the manuscript. Specifically, two points are critical:

A Dedicated Limitations Section: As discussed, the paper must include a dedicated "Limitations" section. This section should transparently discuss the framework's reliance on a semantically meaningful pre-trained diffusion model (the "permanent memory"). The results from the new "weak prior" experiment should be included here to clearly articulate the boundary conditions under which the method operates effectively.

Clarification of "Uninformative Prior": The terminology surrounding the "uninformative prior" needs to be carefully revised throughout the manuscript. The authors have correctly acknowledged that their use of the term refers to the absence of domain-specific prior data, not a true lack of any informative prior. This distinction is crucial and must be made explicit to accurately represent the paper's contribution and avoid misleading interpretations.

Provided these points are addressed as promised in the rebuttal, the paper will be a strong and valuable contribution to the field.

Comment

Dear Reviewer,

We sincerely appreciate your insightful feedback and valuable suggestions. We will absolutely make sure to incorporate your suggestions in our revision. Thank you for helping us strengthen our work.

Review
Rating: 5

The authors study the problem of active target discovery—identifying regions of interest within a discrete domain $\mathcal{X}$ from a limited number of labelled observations $(x_i, y_i)_{i=1}^N$, where $y$ is a binary label and $N \ll \lvert \mathcal{X} \rvert$. This setting is closely related to the EM algorithm and Bayesian experimental design, but here the underlying model is a generative model $p(x)$, specifically a diffusion model. The goal is to align the generative model to produce samples predominantly from the target region, by strategically selecting which data points to observe. To achieve this, the authors decompose the generative model into two components: a permanent memory represented by the pretrained model, and a transient memory represented by a Doob's $h$-transformed model. During alignment, only the parameters $\phi$ of the $h$-transform are updated, enhancing sample efficiency. The objective for updating $\phi$ is designed to guarantee improvement in the model evidence at each step. Similarly, the acquisition function—parameterised by $\eta$ and used to guide data selection—is also guaranteed to improve with every update. Empirically, the proposed exploration algorithm demonstrates superior sample efficiency compared to existing diffusion-based approaches, such as DiffATD.
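The observe-then-adapt alternation summarized above can be sketched as a toy EM-style loop: an E-like step scores unobserved candidates under the current model and queries the best one, and an M-like step updates only the transient parameter on the new label. Everything here (the one-parameter sigmoid model, the Bernoulli-gradient update) is a hypothetical stand-in for the paper's actual objective, chosen only to show the control flow.

```python
import numpy as np

def model_score(x, phi):
    # Stand-in for the adapted model's target probability; only `phi`
    # (the "transient" parameters) changes between observation steps.
    return 1.0 / (1.0 + np.exp(-phi * x))

def em_style_discovery(candidates, label_fn, budget, lr=0.5):
    phi = 0.0
    observed = []
    seen = set()
    for _ in range(budget):
        # E-like step: score unobserved candidates under the current
        # model and query the most promising one.
        scores = [model_score(x, phi) if i not in seen else -np.inf
                  for i, x in enumerate(candidates)]
        i = int(np.argmax(scores))
        y = label_fn(candidates[i])
        seen.add(i)
        observed.append((i, y))
        # M-like step: nudge phi toward the new label (a Bernoulli
        # log-likelihood gradient, standing in for the h-transform
        # objective that the paper actually optimizes).
        p = model_score(candidates[i], phi)
        phi += lr * (y - p) * candidates[i]
    return phi, observed

candidates = np.array([-1.0, 2.0, 0.5])
phi, observed = em_style_discovery(candidates, lambda x: float(x > 0), budget=3)
```

With positive-valued targets, the transient parameter drifts positive across the three queries, which mirrors the step-wise improvement property the review highlights.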

Strengths and Weaknesses

Strengths

  • The writing is very clear. The authors do an excellent job of presenting complex algorithms in a digestible and accessible manner.
  • The connection between neuroscience and statistical modelling is novel and insightful. I hadn’t previously considered a statistical decomposition of prior and update terms as analogous to brain functions.
  • Theoretical contributions are well-motivated. Each major component—inference, model decomposition, and acquisition scoring—is supported by a corresponding theorem ensuring step-wise improvement. While overall convergence is not yet guaranteed, these results already represent substantial contributions.

Weaknesses

  • The acquisition component is relatively unclear. I found it difficult to follow the details of the reward model and its necessity in the overall framework.
  • While the EM algorithm is known to be unstable, the authors propose heuristics—such as selectively updating the model rather than after every observation—that help mitigate this issue.
  • Variational Search Distribution [1] has studied a very similar task and methodology, which should be compared and discussed in my opinion.

Reference [1] ICLR 2025, https://arxiv.org/abs/2409.06142

Questions

  1. How stable is the training scheme? I could not find any error bars reported in the results. Given that the EM algorithm is known to be potentially unstable, some quantitative measure of variance would help assess robustness.
  2. What is the role of the reward model? From my understanding, the Gaussian mixture fitted to the posterior represents the (approximated) likelihood of $x$ conditioned on $y$, which could be seen as an approximate reward model for the ATD task. Could you clarify why an explicit reward model is introduced?
  3. What is the difference between this work and Variational Search Distribution [1]? I found this work is quite similar.

Limitations

I could not find a dedicated limitations section in the paper, which is typically included in the conclusion. While the authors may have discussed some limitations elsewhere in the text, it would be helpful to clearly summarise them in the conclusion for the reader.

One notable limitation is the lack of a global convergence guarantee, which Variational Search Distribution [1] offers. Since the method relies on the EM algorithm, it inherits well-known issues related to stability, slow convergence, and sensitivity to initialization. These concerns have been highlighted in prior work (e.g., [2]) and should be acknowledged.

Reference [2] "Singularity, Misspecification, and the Convergence Rate of EM." The Annals of Statistics. https://arxiv.org/abs/1810.00828

Final Justification

The authors adequately addressed my concerns, so I maintain my acceptance and have increased my confidence from 3 to 4.

Formatting Concerns

No major concerns are found.

Author Response

Thank you for your thoughtful and encouraging feedback. We are glad to hear that the clarity of our writing and presentation of complex algorithms resonated well. We appreciate your recognition of the novel and insightful connection between neuroscience and statistical modeling, as well as your positive assessment of our well-motivated theoretical contributions and the rigorous support provided by our theorems. Next, we clarify all your concerns.

Q1: While the EM algorithm is known to be unstable, the authors propose heuristics—such as selectively updating the model rather than after every observation—that help mitigate this issue.

A1: Thank you for highlighting this important point. To address the instability, we introduced a selective model update strategy and provided both qualitative and quantitative evidence demonstrating its effectiveness. We appreciate your attention to this aspect and hope our results clearly illustrate how these strategies mitigate instability in practice.

Q2: How stable is the training scheme? I could not find any error bars reported in the results. Given that the EM algorithm is known to be potentially unstable, some quantitative measure of variance would help assess robustness.

A2:

  • Thank you for raising this important question regarding the stability of our training scheme. We provide a detailed analysis of the training dynamics in Appendix Section M, where we track the L2 distance between the ground-truth and predicted posterior after each update of the $h$-model (transient memory). Our observations reveal that this distance decreases exponentially as the number of update steps increases, indicating rapid convergence of the $h$-model. We see a similar pattern when comparing the predicted and ground-truth targets, further reinforcing the model’s effective learning.

  • To directly address concerns about stability, we followed your helpful suggestion and conducted additional experiments using 5 different random seeds. For each run, we recorded the training loss of the $h$-model—as defined in Equation 7—after every update step. The aggregated results, summarized in the table below, demonstrate that the variance in training loss across these independent trials is minimal. We will add a variance plot of training loss across different observation steps in the updated version of our paper. This consistency strongly suggests that our training procedure is stable and robust, exhibiting reliable convergence behavior regardless of initialization. We appreciate your insightful feedback, which helped us strengthen this aspect of our evaluation and provide a clearer understanding of the robustness of our approach.

Variance of Training Scheme:

| Observation Step = 150 | Observation Step = 300 |
| --- | --- |
| 0.0197 ± 0.0003 | 0.0425 ± 0.0011 |

Q3: I found it difficult to follow the details of the reward model and its necessity in the overall framework. What is the role of the reward model? Could you clarify why an explicit reward model is introduced?

A3: Thank you for your insightful question about the role and necessity of the reward model in our framework. The explicit reward model is crucial to guiding active target discovery within our approach. Our sampling strategy is based on a composite score that balances exploration and exploitation, as defined in Equation 11. The exploitation component (as defined in Equation 10), in particular, represents a reward-weighted expected log-likelihood, where the reward for each unobserved region is estimated using an incrementally trained reward model. This reward model is therefore essential for accurately evaluating and prioritizing candidate regions, enabling our method to efficiently target areas with higher potential for discovery. We provide a concise discussion of these mechanics in Lines 238–258 of the main text, with further implementation details available in the Appendix Section V. For a visual overview, Figure 20 in the Appendix illustrates how the reward model integrates into the overall EM-PTDM framework and enhances its effectiveness. We appreciate this opportunity to clarify the importance of the reward model, and we hope that these references help shed light on its central role in our proposed approach.
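A composite score of the kind described in this answer — an exploration bonus blended with a reward-weighted expected log-likelihood — might look roughly like the following minimal sketch. The blend weight `lam`, the visit-count exploration bonus, and all input values are illustrative assumptions, not the paper's Equations 10 and 11.

```python
import numpy as np

def acquisition(log_lik, reward, visit_counts, lam=0.5):
    # Exploitation: reward-weighted expected log-likelihood per region,
    # with rewards coming from an (assumed) incrementally trained
    # reward model.
    exploit = reward * log_lik
    # Exploration: a simple visit-count bonus that favours rarely
    # observed regions (a stand-in for the paper's exploration term).
    explore = 1.0 / np.sqrt(1.0 + visit_counts)
    return lam * explore + (1.0 - lam) * exploit

log_lik = np.array([-0.2, -1.5, -0.7])   # expected log-likelihood per region
reward = np.array([0.9, 0.1, 0.5])       # reward-model estimates
visits = np.array([5, 0, 2])             # how often each region was measured
scores = acquisition(log_lik, reward, visits)
next_region = int(np.argmax(scores))     # region to query next
```

In this toy instance the never-visited region wins despite its low reward estimate, illustrating how the exploration term keeps the search from collapsing onto early high-reward regions.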

Q4: What is the difference between this work and Variational Search Distribution [1]? I found this work is quite similar.

A4: Thank you for bringing up the Variational Search Distribution (VSD) paper—it's a valuable and relevant comparison. While VSD and our work share some surface similarities in addressing efficient search strategies, there are several fundamental distinctions that set ATD apart:

  • Data Assumptions: VSD operates under the assumption that supervised label data from the target domain is available, including examples from the target class. In contrast, ATD is specifically designed for scenarios where no supervised labels are accessible prior to deployment, making it suitable for real-world settings with severe data scarcity.
  • Domain Priors: The VSD framework leverages explicit domain priors, whereas ATD is tailored for situations where such prior information may be sparse, incomplete, or entirely unavailable.
  • Data Types and Problem Focus: VSD primarily focuses on discrete and semi-discrete design spaces. ATD, on the other hand, is devised to handle continuous, spatially structured, and partially observed environments—such as those found in medical imaging or ecological field discovery—which pose unique challenges not addressed by VSD.
  • Search Setup: VSD conducts search over large-scale, fully accessible databases or pools of candidates. In contrast, ATD targets the discovery of target-rich regions within a single, partially observed instance—requiring strategic querying and reasoning about unobserved areas within a sample.

We will include the suggested reference in the related work section.

Q5: I could not find a dedicated limitations section in the paper, which is typically included in the conclusion. While the authors may have discussed some limitations elsewhere in the text, it would be helpful to clearly summarise them in the conclusion for the reader.

A5: Thank you for this suggestion. In response to your feedback, we will clearly summarize the limitation and ensure they are explicitly included in the conclusion section of the revised paper.

Q6: Since the method relies on the EM algorithm, it inherits well-known issues related to stability, slow convergence, and sensitivity to initialization. These concerns have been highlighted in prior work (e.g., [2]) and should be acknowledged.

A6: Thank you for pointing this out. We will certainly acknowledge the prior work as you suggested in the updated version of our paper.

Comment

Dear Reviewer,

As the discussion period is nearing a close, we kindly wish to confirm if our response has fully addressed your questions. We would be very happy to provide further clarification if needed.

Comment

Dear Authors,

Thank you for your thoughtful rebuttal. I am pleased to see the clarifications and improvements, and I confidently support the acceptance of this paper. Well done!

Review
Rating: 5

This paper presents a method for diffusion-based active target discovery (ATD), where no prior samples from the target domain are available to learn the diffusion dynamics a priori. Instead, the proposed method uses a combination of a pre-trained diffusion model and a lightweight update mechanism based on Doob's h-transform to modify the prior. Experiments on two problem settings are presented, showing improvements over the main baseline of DiffATD.

Strengths and Weaknesses

Strengths:

  • The problem setting is motivated well, and the writing/figures are clear.
  • The results on the two problem settings show a clear improvement over a reasonable choice of baselines.
  • The method is rigorously described.

Weaknesses:

  • The paper currently only briefly describes the underlying DiffATD method. It would be helpful if the authors add a brief section describing that method to better ground the reader.
  • It's unclear to me whether the two problem settings shown are representative (not a big weakness; but my confidence would be higher with more evaluation settings shown)

Questions

  1. I'm still struggling a bit to understand the terminology of updating the "prior". Does this not violate Bayesian principles of encoding beliefs before seeing the data? Also, given that we're updating the prior based on very few data points, do you think it is possible that this approach risks overfitting to the presented observations, something that starting with an uninformed prior may not succumb to?
  2. What is the standard deviation one should expect with the experiments shown for each method? Are the current experiments reporting single runs or repeats?

Limitations

No. To provide a better understanding of capability, it would have been useful if the authors showed scenarios in which their method does not, in fact, perform very well.

Final Justification

I believe the paper addresses an important problem setting and presents a novel method that shows statistically significant improvements over baselines. The authors have adequately provided an explanation to my question about the prior. They also address a few exposition-related concerns promising the addition of certain writing in the final version (DiffATD explanation and Limitations).

Formatting Concerns

None.

Author Response

Thank you for your positive and encouraging remarks. We appreciate your recognition of the well-motivated problem setting, the clarity of our writing and figures, the rigorous description of our method, and the clear performance improvements demonstrated across diverse problem settings compared to strong baselines. Below, we clarify all your concerns.

Q1: The paper currently only briefly describes the underlying DiffATD method. It would be helpful if the authors add a brief section describing that method to better ground the reader.

A1: Thank you very much for your valuable suggestion. We agree that including a brief section outlining the underlying DiffATD method will greatly help readers gain a clearer and more intuitive understanding of the concepts. Following your feedback, we will add a concise and accessible overview of DiffATD in the revised manuscript to better ground the reader and enhance the overall clarity of the paper.

Q2: It's unclear to me whether the two problem settings shown are representative (not a big weakness; but my confidence would be higher with more evaluation settings shown)

A2: Thank you for this question. The problem settings we chose are grounded in real-world applications where active target discovery is critical, such as remote sensing and species discovery. While these scenarios highlight the strengths of EM-PTDM, we believe EM-PTDM establishes a foundational framework for active target discovery and, given its generality, holds significant promise for broader applicability across diverse domains beyond those discussed in this work, including active drug discovery and novel disease detection. We are enthusiastic about exploring these other domains in our future work.

Q3: I'm still struggling a bit to understand the terminology of updating the "prior". Does this not violate Bayesian principles of encoding beliefs before seeing the data? Do you think it is possible that this approach risks overfitting to the presented observations?

A3: Thank you for raising this important question. We distinguish between two types of memory in our approach: the permanent memory, which acts as a true Bayesian prior, and the transient memory, which serves as an adaptive correction mechanism. The permanent memory remains fixed during the active target discovery process and is pre-trained on a dataset from a different domain, thereby encoding general, domain-agnostic knowledge before seeing any new observations from the domain of interest. This ensures adherence to Bayesian principles.

In contrast, the transient memory is updated periodically with newly observed data to capture task-specific context and align the model dynamically with the current environment. Importantly, this transient memory is re-initialized at the start of each new active target discovery task, preventing carryover of task-specific biases and mitigating the risk of overfitting to limited observations. By separating stable prior knowledge from adaptive task-specific updates, our approach balances principled Bayesian inference with the flexibility required for effective active discovery in dynamic, data-scarce settings. We hope this clarifies how our update strategy respects Bayesian principles while enhancing adaptability and robustness.

Q4: What is the standard deviation one should expect with the experiments shown for each method? Are the current experiments reporting single runs or repeats?

A4: This is a great question. We present the detailed standard deviation values across different settings in Appendix Section X, offering transparency regarding the variability of each method’s performance. We would like to emphasize that we report all the results in our paper as averages over 5 independent experimental trials to ensure the reliability of our findings. We hope this additional information helps clarify the robustness of our experiments.

Q5: To provide a better understanding of capability, it would have been useful if the authors showed scenarios in which their method does not, in fact, perform very well.

A5: Thank you for this insightful suggestion. In our experiments, we observed that when the permanent memory component is extremely weak and doesn't contain any semantically meaningful information for certain active target discovery tasks, the performance of EM-PTDM can be affected. We will include a detailed discussion of these scenarios in the Limitations section of the revised manuscript, which we believe will offer valuable context and guide future research directions. Thank you again for encouraging this important reflection.

Comment


Thank you for your clear responses! I believe all my comments/questions have been addressed well. I'm raising my score from 4 to 5, in the expectation that the authors will add the changes (DiffATD explanation and Limitations) they've stated in their responses to the final version of the paper. Please also consider adding the std. to the main table in the paper, instead of Appendix X.

Comment

Dear Reviewer,

Thank you for your feedback and valuable suggestions. We will certainly incorporate your suggestions in our revision. Thank you for helping us strengthen our work.

Review
4

This paper proposes a principled framework EM-PTDM to tackle Active Target Discovery (ATD) in scenarios where no informative prior exists—a situation often faced in domains like rare species detection or novel object discovery. The method consists of two components: (1) Permanent Memory: a pretrained diffusion model capturing general knowledge; (2) Transient Memory: a lightweight, adaptive h-transform module that quickly adjusts to task-specific signals. The method uses an EM-style iterative refinement and principled sampling strategies balancing exploration and exploitation. Experimental results on species distribution and remote sensing show notable improvements over baselines, even under severely uninformative priors.

Strengths and Weaknesses

Pros:

  • The paper clearly identifies a relevant gap: active search without any informative prior, which is highly practical in many scientific fields.
  • Drawing parallels to permanent and transient memory systems in the brain is creative and helps frame the proposed method intuitively.
  • The EM update of the prior and the integration of the h-transform for conditional sampling are well-motivated and mathematically rigorous.
  • The method shows clear empirical gains over baselines, especially in low-data regimes.
  • The paper provides code and uses public datasets, which facilitates reproducibility.

Cons:

  • The paper is quite dense and could be hard to follow for readers without a strong ML background. Adding simpler diagrams or pseudocode could help.
  • The authors mention instability if the h-model is updated too often early on. It’d be helpful to see how sensitive the results are to different update frequencies.
  • While the method beats strong baselines, many of those still rely on prior knowledge. It would be good to test simpler baselines too—for example, training on random samples and then re-ranking spots based on how similar they are to places where targets were found, without retraining the diffusion model.

Questions

See Cons.

Limitations

N/A

Justification for Final Rating

The authors addressed my concerns. However, as I am not deeply familiar with this field, I would prefer to maintain my borderline accept score and defend my assessment.

Formatting Issues

N/A

Author Response

Thank you for your thoughtful and encouraging feedback. We appreciate your recognition of our clear problem framing, the creative analogy to brain memory systems, the rigorous methodological development, the strong empirical results in low-data settings, and our commitment to reproducibility through code and public datasets. Next, we address all your concerns.

Q1: The paper is quite dense and could be hard to follow for readers without a strong ML background. Adding simpler diagrams or pseudocode could help.

A1: Thank you for your helpful suggestion. To address accessibility, we have included several explanatory materials in the Appendix, including simpler diagrams (see Figure 20 in Appendix Section W) and detailed pseudocode (see Algorithms 1 and 2 in Appendix Section I). Following your feedback, we will move some of this material into the main paper to help readers follow the content more easily.

Q2: The authors mention instability if the h-model is updated too often early on. It’d be helpful to see how sensitive the results are to different update frequencies.

A2: Thank you for this insightful observation. We agree that updating the h-model too frequently during the early discovery phase can lead to instability. To investigate this, we conducted a controlled experiment comparing posterior sample generation when updating the h-model after every observation versus after several observation steps. Our results show that frequent updates produce noisier posterior samples, and we provide a detailed discussion of this phenomenon in Section 3 (see lines 184–208, and Figure 2 for a visualization). Building on these findings, we further analyze the sensitivity of the results to different update frequencies in Appendix Section R. Additionally, we propose an adaptive update strategy, described in Section H.1 of the Appendix, which dynamically adjusts the update schedule. Empirically, this adaptive approach outperforms fixed-interval update schedules, resulting in improved discovery performance (see Table 8). These results highlight the important role that update frequency plays in the stability and effectiveness of the h-model, which in turn impacts discovery performance.
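The contrast between fixed-interval and adaptive update schedules can be sketched as follows. The specific rule (update rarely during a noisy warm-up phase, then more frequently once observations accumulate) and all thresholds here are hypothetical illustrations, not the criterion described in Appendix Section H.1.

```python
def fixed_schedule(step, interval=10):
    """Update the h-model every `interval` observation steps."""
    return step % interval == 0 and step > 0

def adaptive_schedule(step, n_observed, warmup=20,
                      interval_early=10, interval_late=2):
    """Illustrative adaptive rule: sparse updates while few observations
    are available (to avoid noisy early fits), denser updates afterwards."""
    interval = interval_early if n_observed < warmup else interval_late
    return step % interval == 0 and step > 0

# Count how many h-model updates occur over 40 steps in each regime.
updates_early = sum(adaptive_schedule(s, n_observed=5) for s in range(1, 41))
updates_late = sum(adaptive_schedule(s, n_observed=50) for s in range(1, 41))
print(updates_early, updates_late)  # → 4 20
```

The takeaway matches the discussion above: the schedule throttles updates exactly when frequent refitting would amplify noise.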

Q3: It would be good to test simpler baselines too—for example, training on random samples and then re-ranking spots based on how similar they are to places where targets were found, without retraining the diffusion model.

A3: Thank you for your suggestion. Following your feedback, we compared the performance of EM-PTDM with the suggested approach and report our findings in the table below. The experimental outcomes reinforce the impact of EM-PTDM and further justify the importance of both the Permanent and Transient Memory.

ATD Performance Comparison Using DOTA

| Method | $\mathcal{B}$ = 250 | $\mathcal{B}$ = 300 | $\mathcal{B}$ = 350 |
| --- | --- | --- | --- |
| Frozen Diffusion with Random Prior | 0.2630 | 0.2979 | 0.3608 |
| EM-PTDM | 0.5620 | 0.7013 | 0.8256 |
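For concreteness, the simple baseline the reviewer describes (query random locations, then re-rank remaining locations by similarity to confirmed target locations, with no retraining of the diffusion model) could look roughly like this. The feature representation, cosine-similarity scoring, and all function names are our own hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def rerank_by_similarity(features, queried_idx, labels):
    """Score each unqueried location by cosine similarity to the mean
    feature of locations where targets were found (no model updates)."""
    hits = [i for i in queried_idx if labels[i] == 1]
    if not hits:
        # No targets found yet: fall back to a random ordering.
        return rng.permutation(len(features))
    proto = features[hits].mean(axis=0)
    sims = features @ proto / (
        np.linalg.norm(features, axis=1) * np.linalg.norm(proto) + 1e-8)
    sims[list(queried_idx)] = -np.inf   # never re-query observed locations
    return np.argsort(-sims)            # most similar first

# Synthetic example: 100 candidate locations with 8-dim features.
features = rng.standard_normal((100, 8))
labels = (features[:, 0] > 1.0).astype(int)          # hypothetical targets
queried = list(rng.choice(100, size=10, replace=False))
order = rerank_by_similarity(features, queried, labels)
print(order[:5])
```

Because this baseline only re-ranks by feature similarity, it cannot exploit the generative structure that the permanent memory encodes, which is consistent with the gap shown in the table above.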
Comment

Dear Reviewer,

Thank you for acknowledging our response. We truly hope it has resolved all of your concerns.

Final Decision

This paper presents a novel and principled framework, EM-PTDM, for Active Target Discovery without domain-specific priors, combining a pretrained diffusion model as permanent memory with a lightweight transient memory module. The dual-memory design, inspired by neuroscience, is both original and well-motivated, and the method is supported by strong theoretical guarantees and compelling empirical results across challenging tasks, significantly outperforming prior baselines such as DiffATD. While concerns were raised about clarity, and reliance on a strong pretrained model, the authors addressed these constructively through additional experiments, terminology clarification, and discussion of limitations.

The novelty and rigor of this contribution outweigh its weaknesses. Before publication, the authors are encouraged to further strengthen accessibility and to clearly highlight the practical limitations regarding dependence on pretrained diffusion models.