Thank you for the time and effort you spent reviewing our work and for your thoughtful feedback. We appreciate the opportunity to address your concerns in this rebuttal, and we will add a version of the discussion below to our revision.

Gittins index computation example

We agree that the mathematical details of Gittins index computation can be difficult to follow, especially for readers without prior familiarity. To build intuition, we will add the following worked example (with an illustration) to the appendix of our revision. (Note: we apologize that the `align` and `cases` environments do not render well on OpenReview.)

Consider a graph G on 7 nodes with edges . Let be the root node in this graph.

Consider the following simple joint binary distribution which assigns higher probabilities to realizations where adjacent nodes have the same status:

where is the indicator of whether the statuses of adjacent nodes and agree. One can verify that we have and for any non-root node with parent .
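The conditional probabilities used below can be checked by brute-force enumeration. This sketch assumes the joint distribution has the Ising-like form $\mathcal{P}(x) \propto \exp\big(\sum_{(u,v) \in E} \mathbb{1}[x_u = x_v]\big)$, which is consistent with the conditionals $\frac{1}{1+e}$ and $\frac{e}{1+e}$ used in the derivation. The edge list `EDGES` (a complete binary tree rooted at node 1, with node 2 the parent of node 4) is a hypothetical stand-in for the edges of G, and `joint_distribution` and `prob` are illustrative helpers.

```python
import itertools
import math

# Hypothetical 7-node tree (assumption): complete binary tree rooted at node 1,
# with node 2 the parent of node 4. Substitute the actual edge list of G.
NODES = list(range(1, 8))
EDGES = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7)]


def joint_distribution():
    """Enumerate all 2^7 labelings under the assumed form
    P(x) proportional to exp(sum over edges of 1[x_u == x_v])."""
    weights = {}
    for labels in itertools.product((0, 1), repeat=len(NODES)):
        x = dict(zip(NODES, labels))
        agreements = sum(1 for u, v in EDGES if x[u] == x[v])
        weights[labels] = math.exp(agreements)
    z = sum(weights.values())  # normalizing constant
    return {labels: w / z for labels, w in weights.items()}


def prob(dist, event):
    """Total probability of labelings (viewed as node -> label dicts) satisfying event."""
    return sum(p for labels, p in dist.items() if event(dict(zip(NODES, labels))))


dist = joint_distribution()
p_root = prob(dist, lambda x: x[1] == 1)
p_disagree = prob(dist, lambda x: x[4] == 1 and x[2] == 0) / prob(dist, lambda x: x[2] == 0)
p_agree = prob(dist, lambda x: x[4] == 1 and x[2] == 1) / prob(dist, lambda x: x[2] == 1)
print(p_root, p_disagree, p_agree)  # compare with 1/2, 1/(1+e), e/(1+e)
```

On a tree, this form factorizes edge by edge, so the same two conditionals hold for every parent-child pair regardless of the tree's shape.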

Now, suppose the discount factor and we have a reward equal to the binary label for each node. Then, the largest reward and .

Our Gittins computation begins from the leaves. For instance, let us consider the leaf node $X_4$. When its parent node $X_2$ has status $0$, one can verify that

\begin{align}
\phi_{X_4, 0}(m) &= \max \left\{ m,\; \mathcal{P}(X_4 = 0 \mid X_2 = 0) \cdot \left[ 0 + \beta \cdot \Phi_{\emptyset, 0}(m) \right] + \mathcal{P}(X_4 = 1 \mid X_2 = 0) \cdot \left[ 1 + \beta \cdot \Phi_{\emptyset, 0}(m) \right] \right\} \\
&= \max \left\{ m,\; \left( 1 - \frac{1}{1+e} \right) \cdot \beta \cdot m + \frac{1}{1+e} \cdot \left[ 1 + \beta \cdot m \right] \right\} \\
&= \max \left\{ m,\; \beta m + \frac{1}{1+e} \right\}
\end{align}

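That is, $\phi_{X_4, 0}$ is the following piecewise linear function:

\begin{align}
\phi_{X_4, 0}(m) = \begin{cases} \beta m + \frac{1}{1+e} & \text{if } m < \frac{1}{(1-\beta)(1+e)}, \\ m & \text{otherwise.} \end{cases}
\end{align}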

Meanwhile, when the parent node $X_2$ has status $1$, one can verify that

\begin{align}
\phi_{X_4, 1}(m) &= \max \left\{ m,\; \mathcal{P}(X_4 = 0 \mid X_2 = 1) \cdot \left[ 0 + \beta \cdot \Phi_{\emptyset, 0}(m) \right] + \mathcal{P}(X_4 = 1 \mid X_2 = 1) \cdot \left[ 1 + \beta \cdot \Phi_{\emptyset, 0}(m) \right] \right\} \\
&= \max \left\{ m,\; \left( 1 - \frac{e}{1+e} \right) \cdot \beta \cdot m + \frac{e}{1+e} \cdot \left[ 1 + \beta \cdot m \right] \right\} \\
&= \max \left\{ m,\; \beta m + \frac{e}{1+e} \right\}
\end{align}

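That is, $\phi_{X_4, 1}$ is the following piecewise linear function:

\begin{align}
\phi_{X_4, 1}(m) = \begin{cases} \beta m + \frac{e}{1+e} & \text{if } m < \frac{e}{(1-\beta)(1+e)}, \\ m & \text{otherwise.} \end{cases}
\end{align}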

Due to symmetry in , all the leaf nodes have exactly the same functions for .
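The leaf computation above can be reproduced in a few lines. This sketch evaluates the first line of the derivation directly (retire now for $m$, or test the leaf, collect its binary label, and then retire with $\Phi_{\emptyset, 0}(m) = m$) and reports the breakpoint $m = p/(1-\beta)$ of the resulting piecewise linear function, where $p$ is the leaf's conditional probability of status 1. The discount factor `BETA = 0.9` and the helper names `leaf_phi` and `kink` are hypothetical choices for illustration.

```python
import math


def leaf_phi(m, beta, p1):
    """phi(m) for a single untested leaf: either retire with m now, or test the
    leaf (reward = its binary label) and then retire with m, discounted by beta."""
    test_now = (1 - p1) * (0 + beta * m) + p1 * (1 + beta * m)
    return max(m, test_now)


def kink(beta, p1):
    """Breakpoint where m == beta * m + p1, i.e. m = p1 / (1 - beta)."""
    return p1 / (1 - beta)


BETA = 0.9                          # hypothetical discount factor for illustration
P_GIVEN_0 = 1 / (1 + math.e)        # P(X_4 = 1 | X_2 = 0)
P_GIVEN_1 = math.e / (1 + math.e)   # P(X_4 = 1 | X_2 = 1)

for p1 in (P_GIVEN_0, P_GIVEN_1):
    b = kink(BETA, p1)
    # slope is beta below the breakpoint (test the leaf), 1 above it (retire)
    print(round(b, 4), leaf_phi(0.0, BETA, p1), leaf_phi(b + 1.0, BETA, p1))
```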

Now, let us consider the computation of the function , which is required for the computation of . From Equation (2), we know that this involves the product of function derivatives . From above, one can check that this evaluates to the following piecewise constant function:

whose integration from to yields the following piecewise linear function :

Using Proposition 5, we can derive that

One can continue this computation up the rooted tree. Using our Gittins computation code, this produces
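The derivative-product-and-integrate step can be sketched numerically. Here we assume Equation (2) has the shape of Whittle's retirement-option product formula, $\Phi(m) = M - \int_m^M \prod_j \phi_j'(s)\, ds$, where the product runs over sibling leaf functions and $M$ is an upper bound on total discounted reward ($1/(1-\beta)$, since each reward is at most 1). Both this form and the helper names `leaf_slope` and `whittle_value` are assumptions for illustration, not our released code. For two sibling leaves whose parent has status 0, the result should match the closed form obtained by testing both leaves and then retiring, $c(1+\beta) + \beta^2 m$ with $c = \frac{1}{1+e}$, below the breakpoint $c/(1-\beta)$, and $m$ above it.

```python
import math


def leaf_slope(s, beta, p1):
    """Derivative of max{s, beta*s + p1}: beta below the kink p1/(1-beta), 1 above."""
    return beta if s < p1 / (1 - beta) else 1.0


def whittle_value(m, upper, slopes, n_steps=20000):
    """Phi(m) = upper - integral_m^upper prod_j slope_j(s) ds via the midpoint
    rule (assumed Whittle-style form of the product formula)."""
    h = (upper - m) / n_steps
    integral = 0.0
    for k in range(n_steps):
        s = m + (k + 0.5) * h
        prod = 1.0
        for slope in slopes:
            prod *= slope(s)  # piecewise constant product of derivatives
        integral += prod * h
    return upper - integral


BETA = 0.9               # hypothetical discount factor
C = 1 / (1 + math.e)     # P(leaf = 1 | parent = 0)
UPPER = 1 / (1 - BETA)   # rewards are at most 1
slopes = [lambda s: leaf_slope(s, BETA, C)] * 2  # two sibling leaves

for m in (0.0, 1.0, 5.0):
    closed_form = C * (1 + BETA) + BETA**2 * m if m < C / (1 - BETA) else m
    print(m, round(whittle_value(m, UPPER, slopes), 4), round(closed_form, 4))
```

Since the product of derivatives is piecewise constant, the midpoint rule is exact except in the one cell containing the breakpoint; an exact implementation would instead integrate the piecewise constant function segment by segment, which yields the piecewise linear function directly.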

Stretching the experimental results

Thank you for this suggestion. During the rebuttal period, we conducted additional experiments where policies are evaluated on noisy approximations of the true underlying joint distribution . These experiments shed light on when and how performance degrades due to model mismatch. Due to character limits, we refer you to the rebuttals to other reviewers for full results, and we will include this expanded analysis in our revision.

Does the agent know the graph structure?

You are correct that our optimality guarantees assume full knowledge of the interaction graph $G$. In real-world disease testing, $G$ is often revealed incrementally: individuals are tested as they visit clinics, are interviewed by public health staff, and may refer peers via voucher programs. A practical strategy is to let this process run for a fixed period, yielding a partial observation of $G$, after which AFEG can be applied to the discovered subgraph to guide the allocation of limited testing resources. This staged approach fits naturally into our framework. Extending our method to fully online graph discovery is a compelling direction for future work.

Notation Overload

We appreciate the reviewer's suggestion regarding potential ambiguity in our notation. We follow standard conventions in the graphical models literature, where variables and nodes are often used interchangeably and the graph structure implicitly defines the factorization of the joint distribution. In this setting, it is common to use a symbol like $X_4$ to refer to both a node and its associated random variable, so that an expression like $X_4 = 0$ denotes the event that variable $X_4$ takes value $0$. That said, we understand that readers from other communities may find this notation unfamiliar, so we will include a brief remark early in the paper clarifying this convention.

Knowing in advance

Yes, we consider the setting where the structure of the interaction graph $G$ is obtained before any testing decisions are made. Once $G$ is known, the total number of nodes is fixed, and we can plan whom to engage subject to a testing budget. Exploring settings with an uncertain or dynamic $G$ is an interesting direction for future work.

Importantly, our method does not require knowing the exact number of tests (i.e., the testing budget) in advance. AFEG policies repeatedly select an untested node based on the observed outcomes of previously tested nodes, so they can be viewed as "anytime" policies: they can be executed sequentially and stopped at any point once resources are exhausted. In our experiments, we plot the cumulative performance of each policy as testing progresses, i.e., the total reward attained after testing 10%, 20%, …, up to 100% of the population. To assess performance under a specific budget, one can take a vertical slice of the plot at the desired percentage. For ease of comparison, the figures in our submission highlight the 50% mark with a dotted line, but any vertical slice yields a valid comparison. Each policy thus defines a full budget-performance tradeoff curve, and our method is not tuned for any particular testing threshold.
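Because an anytime policy's testing order does not depend on the budget, the outcome under any budget is a prefix of a single run. This sketch shows how one run yields the whole budget-performance curve; reading off one entry corresponds to a vertical slice of our plots. The helper `budget_curve` and the reward sequence are hypothetical, and for simplicity the sketch uses undiscounted cumulative reward.

```python
import math


def budget_curve(rewards, n_nodes, fractions):
    """Cumulative reward after testing each fraction of the population.

    rewards: per-test rewards in the order an anytime policy tested the nodes,
    so truncating the sequence at k tests is the same as stopping the policy there."""
    curve = {}
    for f in fractions:
        k = math.floor(f * n_nodes + 1e-9)  # tests affordable at this budget (guards float error)
        curve[f] = sum(rewards[:k])
    return curve


# Hypothetical run on 10 nodes: 1 = positive test found, 0 = negative.
rewards = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(budget_curve(rewards, 10, [0.2, 0.5, 1.0]))  # {0.2: 2, 0.5: 3, 1.0: 4}
```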

Generality of our result, and its relation with AFEG and branching bandits

Thank you for this observation. AFEG is a structured formulation that reflects real-world network testing constraints, particularly the frontier condition. While AFEG instances are not necessarily trees, one of our key contributions (Section 3.1) is to show that Gittins index policies are optimal for AFEG when the input graph is a forest.
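To make the structure of an index policy on a forest concrete, here is a minimal sketch. We assume the frontier of a rooted forest consists of untested roots and untested nodes whose parent has already been tested; `index` is a hypothetical callable standing in for the Gittins index. In the usage below we plug in a myopic stand-in (the conditional probability of a positive status given the parent's observed status), which is not the Gittins index, on a hypothetical 5-node tree.

```python
import math


def run_index_policy(parent, labels, index, budget):
    """Test up to `budget` nodes of a rooted forest, always choosing the
    frontier node with the largest index.

    parent: dict node -> parent node (None for roots)
    labels: dict node -> true binary status, revealed only when tested
    index:  callable(node, observed) -> float (hypothetical stand-in for a Gittins index)
    """
    observed = {}  # node -> revealed status
    order = []
    while len(order) < budget:
        # assumed frontier: untested roots, or untested nodes whose parent was tested
        frontier = [v for v in parent
                    if v not in observed and (parent[v] is None or parent[v] in observed)]
        if not frontier:
            break
        best = max(frontier, key=lambda v: index(v, observed))  # ties: first in dict order
        observed[best] = labels[best]  # "testing" reveals the node's status
        order.append(best)
    return order


# Hypothetical 5-node tree and myopic index P(child = 1 | parent status).
PARENT = {1: None, 2: 1, 3: 1, 4: 2, 5: 3}
LABELS = {1: 1, 2: 0, 3: 1, 4: 1, 5: 1}


def myopic_index(v, observed):
    if PARENT[v] is None:
        return 0.5
    return math.e / (1 + math.e) if observed[PARENT[v]] == 1 else 1 / (1 + math.e)


print(run_index_policy(PARENT, LABELS, myopic_index, budget=5))  # [1, 2, 3, 5, 4]
```

Because the loop can stop after any number of tests, the same sketch also illustrates the anytime property: a smaller budget returns a prefix of the same order.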

As noted on Lines 134–140, although Gittins index policies are known to be optimal for branching bandits [KO03], no efficient implementation had been proposed previously. We believe our work is the first to provide a polynomial-time, polynomial-space implementation of Gittins indices in discrete branching bandits with history-dependent rewards, enabling their use in structured problems such as network-based disease testing.

By "structured settings", we mean other domains where the problem can be reduced to a branching bandit formulation, e.g., tree-based search under uncertainty or structured sequential diagnosis. We focused on AFEG because of our motivating public health application, but we agree that it would be worthwhile to explore more general applications in future work.

Figure 4 and Greedy versus DQN

Thank you for highlighting this. We observe that greedy outperforms DQN in some experiments, which can seem counterintuitive. One likely explanation is that DQN requires significant data to train and can overfit or underperform when training data is sparse or environments are not sufficiently varied. In contrast, greedy policies exploit strong local heuristics, which can be quite effective in certain structured networks.