Adaptive Frontier Exploration on Graphs with Applications to Network-Based Disease Testing
We develop a policy for adaptive exploration on a graph under frontier constraint that is optimal for trees based on Gittins index, and show how we can apply it to network-based testing in public health settings
摘要
评审与讨论
The authors propose the adaptive frontier graph exploration problem, whereby one wants to explode the nodes of a graph in which each node has a label, each label a reward. The goal is to maximize total expected (discounted) reward. The formulation essentially combines a Markov decision process with dynamic graph exploration.
They develop a Gittins-index approach, proving that their algorithm is optimal in the case of trees, and showing experimentally that it performs well in more complicated, real-world graphs based on disease networks. They also prove several new results concerning the computational tractability of the Gittins-index in their setting.
优缺点分析
Strengths
- Good results, both on synthetic data and real-world data
- Well motivated problem setting. The authors did a good job (modulo one complaint below) of tying the formalism back to a real world setting.
- New results on the computational tractability of the Gittins-index.
- Well written
Weaknesses
- The mathematical details are a bit hard to follow for a non-expert in the area. The Gittins formulation in particular is not intuitive, nor do the authors spend that much time providing intuition. But I understand that space constraints make this challenging.
- I find the generality of the result a bit hard to parse (see below). It seems that they solve a problem that may be more general than adaptive graph exploration. (This is a minor weakness; could even be a strength if addressed.)
- I'd like to see the experimental results stretched a bit more - when does their algorithm start being worse than its competitors? Surely there are some scenarios in which it is not the best.
问题
-
Does the agent know the structure of the graph a priori? There are flavors of the graph exploration problem where the graph structure has to be discovered by exploring.
-
I find the formulation of a bit confusing. You seem to be using to denote both the node itself and the node’s label (otherwise how can we have ?). But the reward function treats as the node itself, not the label. (Eg in the public health example at the bottom of page 2 you treat as an individual). I’m not sure you should necessarily change anything, but if you agree with this, it might be worth commenting on.
-
Is it realistic to assume that n is known in advance? In the public health example, it seems you might start exploring and only decide to stop based on what you find, or external factors such as budget etc. But it doesn’t seem likely that you’d commit to only testing n = 10,000 and, regardless of the outcomes, stop after that.
-
You say: “Our work addresses this gap by presenting the first efficient implementation of Gittins-based policies in discrete branching bandits with history-dependent rewards, enabling their use in structured settings like network-based disease testing.” (Pg 4). How general are the structured settings that you’re alluding to? If very general, one wonders why you’re focused on adaptive graph exploration the first place, and didn’t write the paper about the more general setting of discrete branching bandits. A sentence or two about this would be helpful for readers like me, who aren’t extremely familiar with this area.
-
In Figure 4, it’s interesting that greedy outperforms DQN in ~half the experiments. This seems counterintuitive? Also, if you increase the number of extra edges, does the Gittins performance ever fall below greedy and/or DQN, or do they simply converge?
Minor comments:
- Researchers typo in line 200.
局限性
Yes
最终评判理由
Maintaining my original score.
格式问题
None
Thank you for your time and effort for reviewing our work, and for your thoughtful feedback. We appreciate the opportunity to address your concerns in this rebuttal and we will add a version of the discussion below in our revision.
Gittins index computation example
We agree that the mathematical details of Gittins index computation can be difficult to follow, especially without prior familiarity. To provide better intuition, we will add the following worked example (with illustration) to the appendix of our revision. (Note: We're sorry that the "align" and "cases" are not rendering well on OpenReview...)
Consider a graph G on 7 nodes with edges \mathbf{E} = \bigg\\{ \\{X_0, X_1\\}, \\{X_0, X_2\\}, \\{X_0, X_3\\}, \\{X_2, X_4\\}, \\{X_2, X_5\\}, \\{X_3, X_6\\} \bigg\\}. Let be the root node in this graph.
Consider the following simple joint binary distribution which assigns higher probabilities to realizations where adjacent nodes have the same status:
where is the indicator whether the statuses of adjacent nodes and agree. One can verify that we have and for any non-root node with parent .
Now, suppose the discount factor and we have a reward equal to the binary label for each node. Then, the largest reward and .
$ \phi_{X_4, 0}(m) &= \max \left\\{ m, \mathcal{P}(X_4 = 0 \mid X_2 = 0) \cdot \left[ 0 + \beta \cdot \Phi_{\emptyset, 0}(m) \right] + \mathcal{P}(X_4 = 1 \mid X_2 = 0) \cdot \left[ 1 + \beta \cdot \Phi_{\emptyset, 0}(m) \right] \right\\}\\ &= \max \left\\{ m, \left( 1 - \frac{1}{1+e} \right) \cdot \beta \cdot m + \frac{1}{1+e} \cdot \left[ 1 + \beta \cdot m \right] \right\\}\\ &= \max \left\\{ m, \beta m + \frac{1}{1+e} \right\\} $Our Gittins computation begins from the leaves. For instance, let us consider the leaf node . When parent node , one can verify that
That is, is the following piecewise linear function:
$ \phi_{X_4, 1}(m) &= \max \left\\{ m, \mathcal{P}(X_4 = 0 \mid X_2 = 1) \cdot \left[ 0 + \beta \cdot \Phi_{\emptyset, 0}(m) \right] + \mathcal{P}(X_4 = 1 \mid X_2 = 1) \cdot \left[ 1 + \beta \cdot \Phi_{\emptyset, 0}(m) \right] \right\\}\\ &= \max \left\\{ m, \left( 1 - \frac{e}{1+e} \right) \cdot \beta \cdot m + \frac{e}{1+e} \cdot \left[ 1 + \beta \cdot m \right] \right\\}\\ &= \max \left\\{ m, \beta m + \frac{e}{1+e} \right\\} $Meanwhile, when parent node , one can verify that
That is, is the following piecewise linear function:
Due to symmetry in , all the leaf nodes have exactly the same functions for .
Now, let us consider the computation of the function , which is required for the computation of . From Equation (2), we know that this involves the product of function derivatives . From above, one can check that this evaluates to the following piecewise constant function:
whose integration from to yields the following piecewise linear function :
Using Proposition 5, we can derive that
$ \Phi_{\{X_4, X_5\}, 0}(m) &= h(m) + 10 - h(10)\\ &= h(m) + \frac{10}{1+e}(1 - \beta^2)\\ &= \begin{cases} \beta^2 m + \frac{10}{1+e}(1 - \beta^2) & \text{if $k \leq \frac{10}{1+e}$}\\ m & \text{if $k \geq \frac{10}{1+e}$} \end{cases} $One can continue this computation up the rooted tree. Using our Gittins computation code, this produces
Stretching the experimental results
Thank you for this suggestion. During the rebuttal period, we conducted additional experiments where policies are evaluated on noisy approximations of the true underlying joint distribution . These experiments shed light on when and how performance degrades due to model mismatch. Due to character limits, we refer you to the rebuttals to other reviewers for full results, and we will include this expanded analysis in our revision.
Does the agent know the graph structure?
You are correct that our optimality guarantees assume full knowledge of the interaction graph . In real-world disease testing, is often revealed incrementally: individuals are tested as they visit clinics, are interviewed by public health staff, and may refer peers via voucher programs. A practical strategy is to let this process run for a fixed period, yielding a partial observation of , after which AFEG can then be applied to this discovered subgraph to guide the allocation of limited testing resources. This staged approach fits naturally into our framework. Extending our method to fully online graph discovery is a compelling direction for future work.
Notation Overload
We appreciate the reviewer's suggestion regarding potential ambiguity in our notation. We follow standard conventions in the graphical models literature, where variables and nodes are often used interchangeably, and the graph structure implicitly defines the factorization of the joint distribution. In this setting, it is common to use a symbol like to refer to both the node and its associated random variable, so that expressions like denote the event that variable takes value . That said, we understand that readers from other communities may find this notation unfamiliar, and so we will include a brief remark early in the paper clarifying this convention to aid accessibility.
Knowing in advance
Yes, we consider the setting where the structure of the interaction graph is first obtained before testing decisions are made. Once is known, the total number of nodes is fixed, and we can plan whom to engage subject to a testing budget. Exploring settings with uncertain or dynamic is an interesting future direction.
Importantly, our method does not require knowing the exact number of tests (i.e., the testing budget) in advance. Note that AFEG policies continually select an untested node given observed outcomes of tested nodes, and as a result could be seen as "anytime policies". That is, these AEFG policies can be executed sequentially and stopped at any point as resources are exhausted. In our experiments, we plot the cumulative performance of each policy as testing progresses, i.e., the total reward attained after testing 10%, 20%, …, up to 100% of the population. To assess performance under a specific budget, one can take a vertical slice of the plot at the desired percentage. For ease of comparison, we highlight the 50% mark with a dotted line in the figures in our submission, but any vertical slice yields a valid comparison. Each policy thus defines a full budget-performance tradeoff curve, and our method is not tuned for any particular testing threshold.
Generality of our result, and its relation with AFEG and branching bandits
Thank you for this observation. AFEG is a structured formulation that reflects real-world network testing constraints, particularly the frontier condition. While AFEG instances are not necessarily trees, one of our key contributions (Section 3.1) is to show that Gittins index policies are optimal for AFEG when the input graph is a forest.
As noted on Lines 134–140, although Gittins index policies are known to be optimal for branching bandits [KO03], no efficient implementation had been proposed previously. We believe our work is the first to provide a polynomial-time, polynomial-space implementation of Gittins indices in discrete branching bandits with history-dependent rewards, enabling their use in structured problems such as network-based disease testing.
By "structured settings", we mean other domains where the problem can be reduced to a branching bandit formulation, e.g., tree-based search under uncertainty or structured sequential diagnosis. We focused on AFEG due to our public health motivating application, but agree it would be worthwhile to explore more general applications in future work.
Figure 4 and Greedy versus DQN
Thank you for highlighting this. We observe that greedy outperforms DQN in some experiments, which can seem counterintuitive. One likely explanation is that DQN requires significant data to train and can overfit or underperform when training data is sparse or environments are not sufficiently varied. In contrast, greedy policies exploit strong local heuristics, which can be quite effective in certain structured networks.
Dear reviewer, I hope our response has clarified your doubts and concerns. Please let us know if there is anything else you would wish to discuss regarding our work.
Dear reviewer, thank you for your time and effort in reviewing our paper. As the author‐discussion period is drawing to a close, please let us know if you have any remaining questions so that we can address them before it ends.
Motivated by network-based disease testing, the paper formalizes Adaptive Frontier Exploration on Graphs (AFEG) as a sequential decision problem on a probabilistic graph. When the graph is a forest, the proposed Gittins index-based policy for AFEG has a provable convergence guarantee. Experiments on both synthetic and real-world graphs show the superiority of the proposed method.
优缺点分析
Strengths
- The problem of network-based disease testing is important. The proposed AFEG model correctly captures realistic locality constraints (e.g., contact-tracing).
- The piecewise-linear analysis of the recursive value functions appears original to me and leads to the first practical algorithm for discrete branching-bandit indices.
Weaknesses
- Assumed access to the joint distribution . I think the is usually learned from data in practice. How sensitive are results to model misspecification? Empirical ablations with learned vs. true would strengthen the work.
- Scalability to very large graphs. is quadratic on . Can sparsity or incremental computation be exploited to reach web-scale graphs?
问题
- The definitions of two potential functions and in Eq. 1 and 2 are confusing me, although there are some attempts of explanation on L176-178. What is the difference between these potentials?
- I am curious why the author only uses a single-step DQN as a deep RL comparison, but omits more recent graph RL or planning methods (e.g., policy-gradient, Monte-Carlo tree search). Why were they not considered?
局限性
- Optimality only holds for forests and relies on exact
- Scalability to a very large graph is limited.
格式问题
no
Thank you for your time and effort for reviewing our work, and for your thoughtful feedback. We appreciate the opportunity to address your concerns in this rebuttal and we will add a version of the discussion below in our revision.
Oracle access to underlying probability distribution
We agree that the assumption of oracle access to may not always hold in practice. In real-world settings, such as modeling the spread of sexually transmitted infections (STIs), it is reasonable to approximate the distribution using a pairwise Markov Random Field (MRF) over the contact network . As you kindly noted, we outline in Appendix B how the parameters of such a model can be estimated from historical data via maximum pseudo-likelihood estimation (MPLE). In addition, Appendix E.3 contains an analysis of how errors in estimating affect decision quality.
In the original submission, we assumed that (estimated via MPLE) was the ground truth. In response to your comment, we have conducted additional experiments during the rebuttal period to assess robustness when the policy only has access to a noisy version of the true distribution .
We perturbed the learned parameters with noise of magnitude to simulate distributional mismatch, while keeping the evaluation grounded in the original . For reference, in the HIV setting, we have and . Thus, constitutes a significant perturbation.
Since plots and external URLs are disallowed, we summarize our results in tabular form, reporting the expected undiscounted reward over a random subset of approximately 300 nodes in the HIV network (following the format in Appendix D.3). We focus on undiscounted reward as it better reflects the practical goals of maximizing the number of infected individuals detected under a fixed testing budget. For clarity, we bold the best-performing methods for each budget level.
Expected undiscounted reward for on HIV network
| Percentage of nodes tested | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|
| Random | 6.1 | 11.4 | 16.0 | 19.4 | 22.3 | 24.6 | 26.6 | 28.2 | 29.8 | 31.6 |
| Greedy | 20.1 | 25.8 | 26.6 | 26.8 | 27.0 | 27.5 | 30.1 | 31.0 | 31.2 | 31.6 |
| DQN | 15.7 | 22.7 | 25.4 | 26.6 | 27.0 | 27.3 | 27.9 | 28.2 | 29.9 | 31.6 |
| Gittins | 20.1 | 28.4 | 29.4 | 30.5 | 30.9 | 31.2 | 31.3 | 31.6 | 31.6 | 31.6 |
Expected undiscounted reward for on HIV network
| Percentage of nodes tested | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|
| Random | 6.1 | 11.5 | 16.0 | 19.5 | 22.3 | 24.8 | 26.6 | 28.3 | 29.8 | 31.6 |
| Greedy | 19.2 | 25.7 | 26.4 | 26.8 | 27.0 | 27.6 | 30.1 | 31.0 | 31.2 | 31.6 |
| DQN | 8.1 | 14.3 | 22.4 | 24.3 | 25.6 | 26.4 | 27.9 | 29.0 | 30.0 | 31.6 |
| Gittins | 19.1 | 25.8 | 28.6 | 30.3 | 30.8 | 31.1 | 31.3 | 31.6 | 31.6 | 31.6 |
Expected undiscounted reward for on HIV network
| Percentage of nodes tested | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|
| Random | 5.8 | 11.4 | 13.7 | 19.8 | 22.5 | 24.7 | 26.4 | 28.1 | 29.7 | 31.6 |
| Greedy | 16.6 | 22.3 | 23.8 | 24.8 | 26.7 | 27.6 | 30.1 | 30.8 | 31.2 | 31.6 |
| DQN | 3.4 | 9.9 | 11.6 | 14.7 | 17.1 | 21.3 | 23.8 | 27.1 | 28.9 | 31.6 |
| Gittins | 16.5 | 22.2 | 23.8 | 24.8 | 26.1 | 28.6 | 29.7 | 30.6 | 31.5 | 31.6 |
From this preliminary set of experiments, we see that our proposed method (Gittins) remains competitive while the performance of all policies degrade as becomes less representative of due to larger values. In our revision, we plan to conduct a broader suite of such experiments across all the disease networks and report them in the appendix.
Scalability and runtime
We appreciate the concern regarding scalability. While the worst-case runtime of our Gittins index computation is , this is tractable in our target applications. In our datasets, (binary test outcome), and graph sizes are typically in public health studies (see [1,2,3]). As shown in Figure 5 and Appendix D.4, our method runs efficiently in practice and is the fastest among all considered baselines on larger graphs. Additionally, as noted on Lines 134–140 of our paper, we believe this is the first work to present an efficient method for computing Gittins indices in discrete branching bandits with history-dependent rewards.
[1] Peter Bearman, James Moody, Katherine Stovel. Chains of affection: The structure of adolescent romantic and sexual networks. American Journal of Sociology, 2004.
[2] Joel Wertheim, Sergei Kosakovsky Pond, Lisa Forgione, Sanjay Mehta, Ben Murrell, Sharmila Shah, Davey Smith, Konrad Scheffler, Lucia Torian. Social and Genetic Networks of HIV-1 Transmission in New York City. PLoS Pathogens, 2017.
[3] A M Young, A B Jonas, U L Mullins, D S Halgin, J R Havens. Network structure and the risk for HIV transmission among rural drug users. AIDS and Behavior, 2013.
Clarification of and
Thank you for raising this point. We agree that the distinction between and could be clearer. To clarify:
- represents the expected value of the subtree rooted at node , given that its parent has label .
- captures the expected value of the subtree excluding itself. That is, the contributions from the children of , conditioned on having label .
We will make this distinction more explicit in our revision and improve the explanation in Lines 176–178.
Choice of deep RL baseline
We appreciate the question regarding our choice of a single-step DQN as the deep RL baseline. While more recent graph-based RL and planning methods (e.g., policy gradient, Monte Carlo Tree Search) are indeed promising, they are generally data- and compute-intensive.
In our problem application domain of disease testing, the available data is often limited to a few hundred/thousand individuals with one label each. This makes it challenging to effectively train more complex models. Additionally, such methods often require significant computational resources (e.g., GPU clusters), which are typically unavailable to public health agencies.
Given these constraints, we focused on lightweight baselines like greedy and DQN that are both implementable and deployable in practice. As shown in Figure 5 and Appendix D.4, our method achieves strong performance while remaining efficient and practical.
Dear reviewer, I hope our response has clarified your doubts and concerns. Please let us know if there is anything else you would wish to discuss regarding our work.
Dear reviewer, thank you for your time and effort in reviewing our paper. As the author‐discussion period is drawing to a close, please let us know if you have any remaining questions so that we can address them before it ends.
This work considers the problem of sequential decision-making over a graph where each node is associated with a label from a finite set of labels and a label-dependent reward whose distribution is given by a Markov model. The goal is to find the optimal policy that collects the maximum expected discounted reward. The work investigates an action constraint defined here as the adaptive frontier exploration on graphs (AFEG) constraint where the next node to visit in the graph has to be chosen from the set of non-visited neighbours of the already visited nodes. The authors model network based disease testing using this framework motivating such a model in scenarios where tested individuals might be interviewed to trace contact with other individuals so that these individuals can subsequently be tested. The problem of network based disease testing is an important problem in public health and studies have shown better algorithms lead to better outcomes. In light of this, the current work reformulates AFEG as a branching bandit problem and proves optimality of Gittins index when the disease network is a forest. In case of general graphs, the authors use the BFS tree of the graph to run the Gittins index based policy, thus providing a heuristic for the general case. Moreover, the authors show that their algorithm runs in time polynomial in the size of the graph and the number of elements in the label space. Finally, the work demonstrates empirically the performance of Gittins index based policy for synthetic as well as real-world disease network datasets. The experiments show that Gittins index based policy is generally faster than other baselines while giving superior or competitive performance.
优缺点分析
Strengths:
-
The formulation of network based disease testing as an AFEG problem is interesting. Network based disease testing is an important problem in public health and has shown merits in multiple studies. Therefore, this work is an important contribution in the literature of network based disease testing.
-
The computational efficiency results for Gittins index based policy which is generally intractable due to the curse of dimensionality is another novel aspect of the work.
-
The experiments performed are extensive with several real-world disease datasets considered on top of synthetic experiments. In most of the experiments on real data, the Gittins index based policy is either slightly superior or competitive. The experiment set up is also natural and well motivated.
-
Lastly, the writing is very clear and cogent with good organization and discussions making it easy for the reviewer to understand the work. Overall it was a pleasure to read.
Weaknesses:
-
Firstly, Theorem 1 is not very novel. It is a straightforward adaptation of the work [1]. Moreover, the guarantee works for the forest case which though is a good starting point, is theoretically a limiting assumption and may not hold in real world networks. However, to the authors’ defence the Gittins index based policy on the BFS tree of general graphs seems to give good empirical performance. Here however the question is why BFS tree and not other trees? This work might benefit if the authors can give provable guarantees on the global graph when Gittins based policy is run on an appropriately chosen tree of the graph.
-
The work considers a case where the label space is finite. The main computational result (Theorem 6) also banks on this assumption. This may be limiting in scenarios where richer data is available, for example in disease networks, not just discrete labels but also real valued intensity may be available. While general label spaces may be hard to compute, it would be interesting to see some generalization, for example, can it be done for continuous label spaces with an -discretization where the computational efficiency scale as whereas the optimal policy over the discretization may differ by from optimal?
-
The current work relies on access to the graph structure and the distribution . While the assumption on graph structure is still fine, the assumption of access to distribution exactly is a limiting one. Particularly, one of the baselines considered is DQN which relies only on sample access to . Instead of assuming exact oracle for , it would be interesting to see the theory developed for sample access to the distribution. Indeed, a more reinforcement learning style with finite sample convergence guarantees will be useful to the work. This is because currently the work uses existing data to first learn an approximate and then use it for policy. However, this learning step makes certain strong structural assumptions which may be bypassed if an RL style algorithm can be given.
-
Figure 14 seems to show that the DQN policy is much better than the Gittins policy with the later having constant suboptimal gap for all the disease networks considered. This is in contradiction to the experimental results in the remaining paper. Moreover, the motivation for the smaller dataset created by subsampling with threshold is not clear. Indeed, Fig. 14 result on the full dataset seems contradictory to the result on this subsample. Moreover, the authors’ algorithm does not seem to provide much performance benefit over greedy/DQN in the low regime on synthetic dataset. In light of this, it is not clear why the authors do not repeat experiments with low on the real world dataset.
[1] Godfrey Keller, Alison Oldale. Branching bandits: a sequential search process with correlated pay-offs, Journal of Economic Theory, Volume 113, Issue 2, 2003, Pages 302-315.
问题
See weaknesses above. Minor suggestion: please consider changing the notation that indicates both label and the node currently. It would help the reader if this notation was disambiguated.
局限性
Yes
最终评判理由
One of the claimed primary contribution of this work is the theoretical results, which does not appear novel and has been observed in the existing literature. Moreover, one of the theorems (Thm 6) seems to mis-specify the required conditions for the theorem to hold. However, to the authors' credit, the experimental results on real-world datasets seem promising. Therefore, I am inclined towards a borderline reject since although the theoretical parts are not novel, the experimental parts are promising and merits further research.
格式问题
None
Thank you for your time and thoughtful feedback on our submission. We sincerely appreciate your comments and the opportunity to address your concerns. We will incorporate a version of this discussion into our revised manuscript.
Theorem 3 and novelty
We acknowledge that Theorem 3 is a direct consequence of Theorem 1 from [KO03]. We clarified this in the main text (Line 184). Our primary contribution in Section 3.1 is to establish that the connection that Gittins indices for branching bandits provide an optimal solution to AFEG instances when the input graph is a forest. Additionally, while [KO03] proves optimality, it does not provide an efficient computation method. Our contributions go further in two ways:
- Proving new structural properties of the Gittins computation (Lemma 4 and Proposition 5)
- Developing a polynomial-time algorithm with an accompanying implementation to compute Gittins indices (Theorem 6)
BFS tree projection
We selected the BFS tree projection to minimize the depth of the induced tree, preserving proximity between nodes and root. This ensures that options at the frontier are not artificially constrained by long detours due to us removing edges to invoke Gittins.
For example, consider the graph made of a small cycle and a large cycle , where . Suppose is the root. The induced BFS tree will keep the path while another induced tree that removes edges and , resulting in being reachable from the root only via a very long path . It would be interesting to investigate different tree projections for other approaches when there is no frontier exploration constraint.
Extending to continuous label space
We believe discrete labels are already meaningful in many applications, particularly in health domains. For example, positive/negative test results, or coarser categorical risk levels (e.g., low/medium/high), are common in practice.
That said, we agree this is an important and interesting extension. However, even under -discretization, the analysis is nontrivial. This is because the function can have non-continuous derivatives, and aggregates these through products and integrals. This makes bounding policy errors with respect to discretization challenging.
Oracle access to underlying probability distribution
We agree that the assumption of oracle access to may not always hold in practice. In real-world settings, such as modeling the spread of sexually transmitted infections (STIs), it is reasonable to approximate the distribution using a pairwise Markov Random Field (MRF) over the contact network . As you kindly noted, we outline in Appendix B how the parameters of such a model can be estimated from historical data via maximum pseudo-likelihood estimation (MPLE). In addition, Appendix E.3 contains an analysis of how errors in estimating affect decision quality.
In the original submission, we assumed that (estimated via MPLE) was the ground truth. In response to your comment, we have conducted additional experiments during the rebuttal period to assess robustness when the policy only has access to a noisy version of the true distribution .
We perturbed the learned parameters with noise of magnitude to simulate distributional mismatch, while keeping the evaluation grounded in the original . For reference, in the HIV setting, we have and . Thus, constitutes a significant perturbation.
Since plots and external URLs are disallowed, we summarize our results in tabular form, reporting the expected undiscounted reward over a random subset of approximately 300 nodes in the HIV network (following the format in Appendix D.3). We focus on undiscounted reward as it better reflects the practical goals of maximizing the number of infected individuals detected under a fixed testing budget. For clarity, we bold the best-performing methods for each budget level.
Expected undiscounted reward for on HIV network
| Percentage of nodes tested | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|
| Random | 6.1 | 11.4 | 16.0 | 19.4 | 22.3 | 24.6 | 26.6 | 28.2 | 29.8 | 31.6 |
| Greedy | 20.1 | 25.8 | 26.6 | 26.8 | 27.0 | 27.5 | 30.1 | 31.0 | 31.2 | 31.6 |
| DQN | 15.7 | 22.7 | 25.4 | 26.6 | 27.0 | 27.3 | 27.9 | 28.2 | 29.9 | 31.6 |
| Gittins | 20.1 | 28.4 | 29.4 | 30.5 | 30.9 | 31.2 | 31.3 | 31.6 | 31.6 | 31.6 |
Expected undiscounted reward for on HIV network
| Percentage of nodes tested | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|
| Random | 6.1 | 11.5 | 16.0 | 19.5 | 22.3 | 24.8 | 26.6 | 28.3 | 29.8 | 31.6 |
| Greedy | 19.2 | 25.7 | 26.4 | 26.8 | 27.0 | 27.6 | 30.1 | 31.0 | 31.2 | 31.6 |
| DQN | 8.1 | 14.3 | 22.4 | 24.3 | 25.6 | 26.4 | 27.9 | 29.0 | 30.0 | 31.6 |
| Gittins | 19.1 | 25.8 | 28.6 | 30.3 | 30.8 | 31.1 | 31.3 | 31.6 | 31.6 | 31.6 |
Expected undiscounted reward for on HIV network
| Percentage of nodes tested | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|
| Random | 5.8 | 11.4 | 13.7 | 19.8 | 22.5 | 24.7 | 26.4 | 28.1 | 29.7 | 31.6 |
| Greedy | 16.6 | 22.3 | 23.8 | 24.8 | 26.7 | 27.6 | 30.1 | 30.8 | 31.2 | 31.6 |
| DQN | 3.4 | 9.9 | 11.6 | 14.7 | 17.1 | 21.3 | 23.8 | 27.1 | 28.9 | 31.6 |
| Gittins | 16.5 | 22.2 | 23.8 | 24.8 | 26.1 | 28.6 | 29.7 | 30.6 | 31.5 | 31.6 |
From this preliminary set of experiments, we see that our proposed method (Gittins) remains competitive while the performance of all policies degrade as becomes less representative of due to larger values. In our revision, we plan to conduct a broader suite of such experiments across all the disease networks and report them in the appendix.
Clarification regarding figure 14
Thank you for pointing this out. The figure in question does not show DQN outperforming Gittins. Due to a rendering error, the legend was mislabeled. In this instance, Gittins corresponds to the yellow curve, and DQN results were omitted because each training run exceeded a full day. We will correct this in the final version.
Low on real-world dataset
As noted in Lines 286–289, our goal for real-world scenarios is to identify as many positive cases as possible under a fixed testing budget. The discount factor is simply a tool to emphasize early discoveries; any is valid. However, smaller values heavily penalize late discoveries, which can be counterproductive when detection at any point is valuable.
To illustrate this, we re-ran HIV experiments under and and report the undiscounted performance in both cases. For easier comparison, we bold the largest reward attained at each level of budget available.
Expected undiscounted reward for on HIV network
| Percentage of nodes tested | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|
| Random | 6.1 | 11.0 | 15.2 | 18.5 | 21.3 | 23.5 | 25.6 | 28.0 | 30.0 | 31.6 |
| Greedy | 17.7 | 22.4 | 23.0 | 23.3 | 23.4 | 29.2 | 30.1 | 31.1 | 31.2 | 31.6 |
| DQN | 16.9 | 22.2 | 22.9 | 23.2 | 23.7 | 24.2 | 24.8 | 30.6 | 31.2 | 31.6 |
| Gittins | 17.8 | 22.4 | 23.0 | 29.2 | 29.8 | 30.6 | 30.9 | 31.0 | 31.0 | 31.6 |
Expected undiscounted reward for on HIV network
| Percentage of nodes tested | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|
| Random | 6.0 | 11.6 | 15.7 | 19.1 | 21.7 | 23.9 | 25.9 | 28.1 | 30.0 | 31.6 |
| Greedy | 17.7 | 22.4 | 23.0 | 23.3 | 23.4 | 29.2 | 30.1 | 31.1 | 31.2 | 31.6 |
| DQN | 13.9 | 18.3 | 21.3 | 26.2 | 27.7 | 28.5 | 29.8 | 30.2 | 30.8 | 31.6 |
| Gittins | 19.0 | 26.9 | 29.5 | 30.6 | 30.9 | 31.2 | 31.3 | 31.6 | 31.6 | 31.6 |
Gittins maintains strong performance across the board and achieves higher coverage than baselines even when is small. However, note that a small effectively "zeroes out the reward" for an identification many steps in the future, resulting in degradation of policies that optimize long-term performance (such as DQN and Gittins).
Notation Overload
We appreciate the reviewer's suggestion regarding potential ambiguity in our notation. We follow standard conventions in the graphical models literature, where variables and nodes are often used interchangeably, and the graph structure implicitly defines the factorization of the joint distribution. In this setting, it is common to use a symbol like to refer to both the node and its associated random variable, so that expressions like denote the event that variable takes value . That said, we understand that readers from other communities may find this notation unfamiliar, and so we will include a brief remark early in the paper clarifying this convention to aid accessibility.
Dear reviewer, I hope our response has clarified your doubts and concerns. Please let us know if there is anything else you would wish to discuss regarding our work.
Thank you again for your careful reading and thoughtful comments. We sincerely appreciate your continued engagement and the opportunity to further clarify key aspects of our work.
Structural insights
We agree that convexity and monotonicity are important properties already noted in [KO03]. That said, while it may be intuitively clear to an expert that the value function is piecewise linear, piecewise linearity is neither formally stated nor proved in [KO03], nor is it constructively exploited. In contrast, our work explicitly proves piecewise linearity (Lemma 4), characterizes the breakpoints, and leverages this structure to design an efficient recursive algorithm.
This calling out of piecewise linearity is central to our contribution. Convexity, while a useful property, does not in itself lead to tractable computation. Among the many structural properties that could be inferred, our key insight is the recognition that piecewise linearity is the one that enables compact and tractable representations of the value functions in our dynamic program. While we agree that the tree structure supports dynamic programming (DP), DP alone does not yield efficiency unless the value functions themselves can be represented efficiently. Without piecewise linearity, a naive DP approach would require storing and manipulating unbounded or implicit functional forms. It is the combination of tree structure and piecewise linearity that makes tractable computation possible in both time and space.
Point 3
Thank you for recognizing the novelty of the reduction. We view the connection between AFEG and branching bandits as part of a broader effort to bridge real-world problems with principled theoretical models, a style of contribution that has driven impactful work in areas such as Stackelberg security games and restless bandits. We're glad that this style of research is being appreciated and we see our current work as a promising foundation for further research.
Theorem 6
We would like to once again clarify that Theorem 6 is intentionally designed to abstract away the computational cost of computing quantities like . The focus of the theorem is on the recursive cost of computing Gittins indices, under the assumption that these probabilities are available via oracle access. We will revise the theorem statement and supporting discussion to make this clearer.
This oracle assumption is meant to encapsulate the work performed by standard inference procedures (such as belief propagation or the junction tree algorithm) which run in polynomial time on low-treewidth graphs, e.g., in trees. Since our setting assumes is a pairwise Markov Random Field defined over a tree-structured graph, inference is tractable. We are not assuming access to the full joint distribution explicitly, but rather to a compact graphical model from which these quantities can be efficiently derived.
Proposition 5 and theoretical contributions
We appreciate the suggestion to include a reference to Proposition 5 in the pseudocode and will revise the paper accordingly. We also hope that this clarification, together with our discussion thus far, helps convey that our theoretical contributions (previously viewed as marginal) are in fact significantly beyond just invoking [KO03] and are closely tied to the algorithmic and practical advances presented in the paper.
I would like to thank the authors for their clarifications. I believe the contribution of this paper to be primarily experimental, specifically, formulating network-based disease testing as an AFEG (branching bandit) problem and applying Gittins-index based algorithm to this problem for real world datasets. The theoretical aspects of this work are not that novel, therefore, the contribution 1 of the paper bears less weightage to me. I feel that focusing and strengthening contribution 2 and 3 will be more beneficial to the current work. The detailed concerns are as follows.
Theorem 3 and novelty: Lemma 4 is already observed as a remark by KO03 in their proof of theorem 1 (see the paragraph between eqn 2 and 3). Proposition 5 does not seem related to the current paper, and indeed the authors mention it as some additional properties they prove. Therefore, the theoretical contribution of this work is marginal. The only important theoretical contribution I feel this paper makes is Theorem 6. However, once the authors note Lemma 4 (observation in KO03), this follows directly via a dynamic programming argument. Moreover, in the proof of Theorem 6, the authors assume “any conditional probability value can be obtained in constant time via oracle access to P” (line 802-803) whereas in the theorem statement, the assumption is oracle access to joint distribution P. Can the authors lay out how to construct a conditional probability model in constant time out of a joint distribution? Otherwise, the theorem statement is misleading.
Continuous Label Space: Note that is a piecewise linear function with finitely many pieces, so even if its derivative is non-continuous, it is piecewise constant. With the other theoretical results in the paper seeming limited to me, I would like to encourage the authors to investigate this direction as an interesting theory question.
Experimental Results: The experimental results are promising even under perturbed . However, note that this does not fully address my concern, which was regarding an (Gittins-index based) algorithm that works with sample access to the distribution. The current set of experiments presented by the authors in the rebuttal only backs up their theory in Appendix E.3 (Lemma 8), which was never my concern. To reiterate, this work might benefit from an algorithm that can work with samples from the distribution to construct (noisy) Gittins index.
Thank you very much for your thoughtful follow-up and for engaging further with our work. While we value your feedback as it helps sharpen both our understanding and presentation, we would like to respectfully push back on the characterization that our contributions are primarily experimental, and that the theoretical components are of limited novelty.
Our work aims to bridge a practically motivated and messy real-world problem (network-based disease testing) with a principled theoretical model via a novel reduction to branching bandits. This style of contribution, casting complex domains into well-founded mathematical frameworks, has been recognized as foundational across multiple areas of AI and operations research (e.g., casting security operations as Stackelberg games [1,2,3,4] or health resource management as restless bandits [5,6,7]). We see our work in the same spirit, and believe that our reduction, together with the structural insights and implementation advances it enables, is both non-trivial and impactful.
Theoretical Foundations and Novel Reduction
We view our theoretical contributions as comprising the following:
- Efficient computation: While [KO03] proved the optimality of Gittins indices for branching bandits, no efficient implementation has previously been proposed. As noted in Lines 134--137 of our paper:
While Gittins index policies are known to be optimal for branching bandits [KO03], no efficient method has been proposed to compute them in general. Indeed, Gittins indices are underused in practice due to perceived computational intractability in all but simple settings [Sco10, MKLL12, Edw19].
Our work fills this gap by developing a polynomial-time and polynomial-space implementation (Theorem 6), supported by a publicly available codebase and demonstrated on real-world graphs. The absence of an efficient computational method for this Gittins index over the last 20 years reflects the nontrivial nature of this contribution.
- New structural insights: We respectfully disagree with the assertion that Lemma 4 is already fully observed in [KO03]. While that work notes convexity and monotonicity between Equations 2 and 3, it does not state or prove that the value functions are piecewise linear, a property we prove explicitly and rely on for efficient recursion.
Similarly, we would like to clarify a misunderstanding regarding Proposition 5. The reviewer noted that it "does not seem related to the current paper", but it is in fact an essential component of our implementation. The first bullet of Proposition 5 is directly exploited in our recursive Gittins computation (see _build_Phi in gittins_policy.py), and enables a key simplification in computing composed value functions. When we state that these results may be of independent interest, we do not mean they are unrelated to our work. Rather, we mean that the structural properties we prove apply more broadly than just our setting, while still being crucial for our algorithmic design and analysis. We hope this clears up the confusion.
- Structural reduction: We introduce AFEG, a formalization of the network-based testing problem, and prove that tree-structured AFEG instances can be solved optimally via the Gittins index policy for branching bandits. To our knowledge, this connection is new and enables the application of classical bandit theory in a fresh, practically relevant domain. In the same spirit as how Stackelberg's model of competing firms from the 1930s later became foundational to the real-world deployment of Stackelberg Security Games (SSGs) in domains like airport and wildlife protection, we believe that drawing such principled connections between classical models and modern applications is a meaningful contribution in its own right.
Clarification on Theorem 6
Thank you for pointing this out. Theorem 6 aimed to abstract away the cost of computing conditional probabilities of the form , and focuses on bounding the number of recursive computations required for Gittins index evaluation. In our tree setting, the conditioning set contains at most one variable, so the required conditional probabilities can be derived from the joint distribution via marginalization with minimal overhead. We will clarify this in our revision to better align the theorem statement with the assumptions in the proof.
Continuous label space
We agree that extending the framework to continuous label spaces is an interesting direction for future work. Our current focus addresses discrete label settings, and already generalizes beyond the binary case of our real-world motivation of disease testing. Further extensions to continuous domains would be valuable, but are beyond the natural scope of this work and not necessary to demonstrate the strength of our current contributions.
Sample-based access vs. model-based estimation
Thank you for clarifying this point. Developing Gittins-like policies that operate directly from sample-based access is indeed an interesting direction. Our current approach uses a learned model to compute exact indices and achieves strong empirical performance. Exploring sample-based variants is complementary to our framework, but beyond the intended scope of this paper.
Final remarks
We appreciate your interest in directions such as continuous label spaces and sample-based estimation. While we agree that these are intellectually interesting, we respectfully view these as natural next steps, rather than missing components of the current paper. The core contribution of this work --- its problem formulation, structural reduction, efficient algorithm design, and strong empirical performance --- is, in our view, already substantial and meets the bar for publication on its own merit. Requiring these additional components would significantly broaden the scope beyond what is typical for a single conference paper.
References
[1] Avrim Blum, Nika Haghtalab, Ariel D. Procaccia. "Learning optimal commitment to overcome insecurity". NeurIPS, 2014.
[2] Blog post by Ariel D. Procaccia: https://agtb.wordpress.com/2012/06/17/is-game-theory-useful/
[3] Maria-Florina Balcan, Avrim Blum, Nika Haghtalab, Ariel D. Procaccia. "Commitment Without Regrets: Online Learning in Stackelberg Security Games". EC, 2015.
[4] A more recent work: Maria-Florina Balcan, Kiriaki Fragkia, and Keegan Harris. "Learning in Structured Stackelberg Games". arXiv, 2025.
[5] Aditya Mate, Jackson A. Killian, Haifeng Xu, Andrew Perrault, Milind Tambe. "Collapsing bandits and their application to public health intervention". NeurIPS, 2020.
[6] Dexun Li, Pradeep Varakantham. "Efficient resource allocation with fairness constraints in restless multi-armed bandits". UAI, 2022.
[7] Archit Sood, Shweta Jain, Sujit Gujar. "Fairness of Exposure in Online Restless Multi-armed Bandits". AAMAS, 2024.
Thanks for your reply and the further explanation.
Regarding new structural insights, and the claim that “While that work notes convexity and monotonicity between Equations 2 and 3, it does not state or prove that the value functions are piecewise linear, a property we prove explicitly and rely on for efficient recursion.” - KO03 already notes: “It is fairly easy to show that as a function of is non-decreasing and convex (convexity following from the fact that we are dealing with the supremum of expressions which are linear in )” From this, it is clear that is piecewise linear in since it is a well-known fact that supremum of piecewise linear functions is convex. Therefore, I am not convinced about the novelty of the structural result. Further, given this piecewise linearity, efficient computation follows from dynamic programming arguments since the paper is dealing with trees where one can start filling the DP table from the leaves (which cannot be done in general graphs because of absence of a leaf node). This further leads me to believe that the computational benefit comes from the tree structure, and not from the piecewise linearity observation which is present in the definition of Gittins index itself and noted in KO03.
Regarding Prop 5, if it is being used in implementation, I suggest it should also be made part of the pseudocode?
Regarding point 3, I agree that the connection is new and merits further research.
Regarding theorem 6: “In our tree setting, the conditioning set contains at most one variable, so the required conditional probabilities can be derived from the joint distribution via marginalization with minimal overhead.” I do not immediately see how the bold part can be computed “with minimal overhead”. For example, if is a joint distribution of variables , and the support size of each variable is , then to obtain the marginal from the joint distribution , one needs computations (so as to compute (observe that the summation runs over things). The tree setting does not help since you still have to obtain the marginal of the leaf variable which will require computations to marginalize out all but the leaf variable, which is exponential time!
Consider the problem of sequential decision making on a graph, where the arms that can be pulled are only be neighbours of previously pulled arms. This is the frontier of exploration. Specifically, this paper addresses the "Adaptive Frontier Exploration on Graphs" (AFEG) problem, a sequential decision-making task relevant to high-impact applications like network-based disease testing.
The core contribution is a novel, efficient, polynomial-time algorithm for computing the Gittins index policy, which is provably optimal for AFEG instances on tree-structured graphs. The authors connect this deep theoretical result to a pressing real-world problem, demonstrating on real-world HIV interaction networks that their method can improve testing efficiency compared to strong baselines, including a DQN (deep Q net) approach. The work is technically deep, rigorously presented, and its claims are backed by strong empirical evidence.
优缺点分析
Strengths
- One of the papers I was excited to read, and I've been reading MAB papers for a long time! Very interesting and practical problem formulation.
- The paper's primary contribution is arguably proving the first poliynomial time algorithm for the provably optimal policy in discrete branching bandits. This is a foundational contribution to the MAB and RL literature, with implications beyond the disease testing/contact tracing problem.
- I was very impressed with the empirical results which indicate that nearly all HIV-positive individuals were identified by testing only half the population in the ICPSR dataset, and for example works better than DQN (deep Q networks). Given decreasing medical and research budgets, this could be an important method to release to real-world use cases.
Weakness
The model assumes an oracle access to joint probability distribution P, which can provide answers about the conditional probability distributions over Bayesian Nets.
Note: But in Appendix B in the paper, they convert the weakness into a strength by actually building this oracle out from a model using maximum pseudo-likelihood estimation (MPLE). This makes the oracle assumption plausible. Yet the authors acknowledge that such a parameter fitting may not recover the exact real world dynamics (Line 631).
问题
- Are real world graphs truly forests (for which the optimality results hold)? In the real world, are people not fully connected to each other?
- Where does this come from (line 701 in appendix) and
- How valid is the assumption on non-decreasing piecewise linear function?
局限性
Please include a more detailed limitations section in your work. Since this paper has potential to be applied to medical domains, it would be prudent to state the limitations in greater detail to potential users.
最终评判理由
maintain score.
格式问题
N/A
Thank you for your time and thoughtful feedback on our submission. We sincerely appreciate your comments and the opportunity to address your concerns. Below, we respond to the main points raised in the review and will incorporate a version of this discussion into our revised manuscript.
Oracle access to underlying probability distribution
We agree that the assumption of oracle access to may not always hold in practice. In real-world settings, such as modeling the spread of sexually transmitted infections (STIs), it is reasonable to approximate the distribution using a pairwise Markov Random Field (MRF) over the contact network . As you kindly noted, we outline in Appendix B how the parameters of such a model can be estimated from historical data via maximum pseudo-likelihood estimation (MPLE). In addition, Appendix E.3 contains an analysis of how errors in estimating affect decision quality.
In the original submission, we assumed that (estimated via MPLE) was the ground truth. In response to your comment, we have conducted additional experiments during the rebuttal period to assess robustness when the policy only has access to a noisy version of the true distribution .
We perturbed the learned parameters with noise of magnitude to simulate distributional mismatch, while keeping the evaluation grounded in the original . For reference, in the HIV setting, we have and . Thus, constitutes a significant perturbation.
Since plots and external URLs are disallowed, we summarize our results in tabular form, reporting the expected undiscounted reward over a random subset of approximately 300 nodes in the HIV network (following the format in Appendix D.3). We focus on undiscounted reward as it better reflects the practical goals of maximizing the number of infected individuals detected under a fixed testing budget. For clarity, we bold the best-performing methods for each budget level.
Expected undiscounted reward for on HIV network
| Percentage of nodes tested | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|
| Random | 6.1 | 11.4 | 16.0 | 19.4 | 22.3 | 24.6 | 26.6 | 28.2 | 29.8 | 31.6 |
| Greedy | 20.1 | 25.8 | 26.6 | 26.8 | 27.0 | 27.5 | 30.1 | 31.0 | 31.2 | 31.6 |
| DQN | 15.7 | 22.7 | 25.4 | 26.6 | 27.0 | 27.3 | 27.9 | 28.2 | 29.9 | 31.6 |
| Gittins | 20.1 | 28.4 | 29.4 | 30.5 | 30.9 | 31.2 | 31.3 | 31.6 | 31.6 | 31.6 |
Expected undiscounted reward for on HIV network
| Percentage of nodes tested | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|
| Random | 6.1 | 11.5 | 16.0 | 19.5 | 22.3 | 24.8 | 26.6 | 28.3 | 29.8 | 31.6 |
| Greedy | 19.2 | 25.7 | 26.4 | 26.8 | 27.0 | 27.6 | 30.1 | 31.0 | 31.2 | 31.6 |
| DQN | 8.1 | 14.3 | 22.4 | 24.3 | 25.6 | 26.4 | 27.9 | 29.0 | 30.0 | 31.6 |
| Gittins | 19.1 | 25.8 | 28.6 | 30.3 | 30.8 | 31.1 | 31.3 | 31.6 | 31.6 | 31.6 |
Expected undiscounted reward for on HIV network
| Percentage of nodes tested | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|
| Random | 5.8 | 11.4 | 13.7 | 19.8 | 22.5 | 24.7 | 26.4 | 28.1 | 29.7 | 31.6 |
| Greedy | 16.6 | 22.3 | 23.8 | 24.8 | 26.7 | 27.6 | 30.1 | 30.8 | 31.2 | 31.6 |
| DQN | 3.4 | 9.9 | 11.6 | 14.7 | 17.1 | 21.3 | 23.8 | 27.1 | 28.9 | 31.6 |
| Gittins | 16.5 | 22.2 | 23.8 | 24.8 | 26.1 | 28.6 | 29.7 | 30.6 | 31.5 | 31.6 |
From this preliminary set of experiments, we see that our proposed method (Gittins) remains competitive while the performance of all policies degrade as becomes less representative of due to larger values. In our revision, we plan to conduct a broader suite of such experiments across all the disease networks and report them in the appendix.
Structure of real-world graphs
We agree that real-world graphs are not truly forests (see Line 210). However, empirical studies have shown that STI transmission graphs are often sparse and exhibit tree-like structure (e.g., [1,2,3]). While certain high-degree nodes (e.g., sex workers) may exist, the network as a whole is far from fully connected. We refer to Appendix D.2 for an empirical evaluation of how performance degrades as the graph departs from tree structure.
[1] Peter Bearman, James Moody, Katherine Stovel. Chains of affection: The structure of adolescent romantic and sexual networks. American Journal of Sociology, 2004.
[2] Joel Wertheim, Sergei Kosakovsky Pond, Lisa Forgione, Sanjay Mehta, Ben Murrell, Sharmila Shah, Davey Smith, Konrad Scheffler, Lucia Torian. Social and Genetic Networks of HIV-1 Transmission in New York City. PLoS Pathogens, 2017.
[3] A M Young, A B Jonas, U L Mullins, D S Halgin, J R Havens. Network structure and the risk for HIV transmission among rural drug users. AIDS and Behavior, 2013.
and
and are the parameter vectors of the unary and pairwise potentials in the pairwise-MRF model described in Appendix B.1. These parameters correspond to feature maps and from Equations (4) and (5), which consist of monomials (up to second order) of the covariate features. The construction respects symmetry and the maximum entropy principle, ensuring proper dimension alignment between features and parameters.
Non-decreasing piecewise linear function
The non-decreasing piecewise linearity is not an assumption but a provable property of the Gittins index computation, as shown in Lemma 4 (line 192) and Proposition 5 (line 201). The detailed proof appears in Appendix E.1.
Limitations
As mentioned in Section 5, we have provided additional discussion on broader impact and limitations in Appendix C.
Dear reviewer, I hope our response has clarified your doubts and concerns. Please let us know if there is anything else you would wish to discuss regarding our work.
Dear reviewer, thank you for your time and effort in reviewing our paper. As the author‐discussion period is drawing to a close, please let us know if you have any remaining questions so that we can address them before it ends.
Summary: This paper introduces the Adaptive Frontier Exploration on Graphs (AFEG) problem, motivated by network-based disease testing, and develops an efficient Gittins index-based policy for it. The authors prove the policy is optimal for tree-structured graphs and demonstrate its strong empirical performance against baselines on both synthetic and real-world public health data.
Strengths: The paper's main strengths, as noted by all reviewers, are its practical problem formulation, its extensive empirical results on a real-world application, and its delivery of the first practical, polynomial-time algorithm for computing Gittins indices in this important class of branching bandit problems.
Weaknesses: The main weakness, highlighted during a detailed discussion with one reviewer, is that the core mathematical property (piecewise linearity) enabling the algorithm is a straightforward derivation from prior work, raising valid questions about the depth of the theoretical novelty.
Reasons for Decision: The decision to accept this paper is based on the judgment that its significant algorithmic contribution outweighs the concerns about pure theoretical novelty. The authors successfully designed the first efficient and practical algorithm for a problem that has been theoretically understood but computationally unsolved for two decades. This non-trivial leap from theory to a usable method, supported by an application and empirical evidence, constitutes a valid contribution to the field.
Discussion Process: While a high-level debate on the definition of novelty with one reviewer remained unresolved, the discussion supports the view that the paper's main contribution is a non-trivial algorithmic advance. The authors also successfully addressed other concerns, such as the assumption of oracle access to the data distribution. The authors will be expected to revise Theorem 6 in the final version to more precisely state their assumptions about the underlying probabilistic model, as discussed during the review process.