Probabilistic Factorial Experimental Design for Combinatorial Interventions
Summary
Reviews and Discussion
This paper studies the combinatorial intervention problem. The authors propose a probabilistic factorial experimental design, where each unit independently receives a random combination of treatments according to specified dosages. They derive a closed-form solution for the near-optimal design in the passive setting and a numerically optimizable solution for the near-optimal design in the active setting. Simulation results are provided to validate their findings.
Questions for Authors
According to Figures 1 and 2, the half dosage appears to be exactly optimal, rather than merely near-optimal. Do the authors conjecture that this is indeed optimal for general k? Similarly, in the general constrained case, is the uniform dosage conjectured to be optimal?
Claims and Evidence
The claims appear generally well supported. The simulation results clearly align with the theory.
Methods and Evaluation Criteria
The simulations are carefully designed to validate the theoretical results. For example, the stated near-optimality of setting the dosage to 1/2 for each treatment in the passive setting is clearly demonstrated in Figures 1 and 2. Additionally, the simulations for the active setting illustrate the effectiveness of adaptively choosing the dosages in accordance with the proposed theory.
Theoretical Claims
The proofs appear sound to me.
Experimental Designs or Analyses
The experimental designs are generally sound. However, the paper could be strengthened by including a sensitivity analysis that varies k.
Supplementary Material
I reviewed most of them, with a particular focus on Section B.
Relation to Existing Literature
The proposed probabilistic factorial design includes both full and fractional factorial designs in the literature as special cases. It serves as a flexible realization of a factorial design.
Essential References Not Discussed
I don't see any obvious missing references.
Other Strengths and Weaknesses
Strengths: The paper is generally well written and easy to follow. The extensions are enlightening.
Weaknesses: Simulation results on real-world datasets are missing.
Other Comments or Suggestions
The x-axis in Figure 2 seems to be the dosage value rather than .
We thank the reviewer for finding our extensions enlightening. We find the reviewer's suggestions insightful and accordingly lay out additional experiments and their results.
Simulation results on real-world dataset are missing.
We thank the reviewer for sharing this concern. Our paper is concerned with how an experimenter could optimally construct a dataset through choice of dosage, and therefore, experiments with real-world data necessitate close collaboration with individuals engaged in experimentation. However, we could have simulated the performance of various dosages using a real-world dataset, if this dataset included samples with all combinations. For example, for each combination generated by a given dosage, we could draw a corresponding point from the dataset to construct a new dataset consistent with the dosage. Unfortunately, we are not aware of such complete datasets. We plan to collaborate with biological experimenters in the future to create datasets based on our results.
While existing real-world datasets are not suitable for this kind of experiment, we added a semi-synthetic simulation as follows. We use a real-world, full-degree Boolean function with , a reliability function originally presented in Quality and Reliability Engineering International [1]. We create datasets from this function, where each sample consists of a combination (drawn according to the dosage) and the corresponding function value with noise added. The results of these experiments are given in tabular format below. We note that the dimension of this function is relatively low, but we were unable to find a completely defined Boolean function of higher dimension.
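The data-generation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the Gaussian noise model, and in particular `toy_reliability` (a hypothetical "at least two of three components work" function standing in for the real reliability function) are all assumptions.

```python
import random


def sample_combination(dosage, rng):
    """Include treatment i independently with probability dosage[i]."""
    return tuple(1 if rng.random() < p else 0 for p in dosage)


def make_dataset(f, dosage, n_samples, noise_std=0.1, seed=0):
    """Draw combinations from the dosage and record noisy values of f."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        x = sample_combination(dosage, rng)
        y = f(x) + rng.gauss(0.0, noise_std)  # assumed additive Gaussian noise
        data.append((x, y))
    return data


def toy_reliability(x):
    # Hypothetical stand-in: the system works if at least two components work.
    return 1.0 if sum(x) >= 2 else 0.0


# Half dosage: every treatment is included with probability 1/2.
half = make_dataset(toy_reliability, [0.5, 0.5, 0.5], n_samples=8)
```

A dosage with entries in {0, 1} makes the draw deterministic, which is a convenient way to check the sampler.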
However, the paper could be strengthened by including a sensitivity analysis that varies k.
We thank the reviewer for this suggestion. We have accordingly conducted the following experiment: we use a real-world Boolean function with and of full degree (described above). We replicate the first experiment in our paper, where we investigate the effect of . We investigate values of ranging from through . We display the average loss () over dosages at each distance (where we perform trials with each dosage), to be displayed in graphical format in our paper. Here we use samples.
Even when the model is misspecified, we see that the half dosage appears optimal and observe the loss increase as we move further away from the half dosage.
The x-axis in Figure 2 seems to be the dosage value rather than
We thank the reviewer for catching this, which we will fix accordingly in the paper.
According to Figures 1 and 2, the half dosage appears to be exactly optimal, rather than merely near-optimal. Do the authors conjecture that this is indeed optimal for general k? Similarly, in the general constrained case, is the uniform dosage conjectured to be optimal?
Based on our experiments and the general heuristic that in linear regression one would like features which are "spread out," we conjecture that the half dosage is optimal and that the uniform dosage in the constrained case is also optimal for general k. It is difficult to prove exact optimality, as we must compare the quantity across different dosages. While we were able to show that the inner quantity concentrates as the number of samples grows, it is not clear to us how to compute the mean for a fixed number of samples.
Reference: Montgomery, Douglas C. Design and analysis of experiments. John Wiley & Sons, 2017.
This paper introduces probabilistic factorial experimental design for combinatorial interventions, where each treatment is assigned a dosage between 0 and 1, and units randomly receive treatments based on these probabilities. This framework generalizes both full and fractional factorial designs by allowing random assignment of treatment combinations rather than deterministic selection. The authors model outcomes using Boolean functions with Fourier expansions to capture bounded-order interactions. They prove that uniform half-dosage allocation (a dosage of 1/2 for each treatment) is near-optimal in single-round experiments, with optimality up to a factor of . For multi-round experiments (i.e., the active learning setting), they develop an acquisition strategy that adapts dosages based on previous observations. The work also addresses practical constraints such as limited treatment supply. Experimental results on simulated datasets demonstrate that the proposed strategies outperform random dosage selection.
Questions for Authors
Please see Other Strengths and Weaknesses.
Claims and Evidence
The paper's claims are generally well-supported by theoretical analysis and empirical evidence.
Methods and Evaluation Criteria
The methods and evaluation criteria in this paper are appropriate for the problem of optimal experimental design for combinatorial interventions. My only concern is that all the evaluations are conducted on synthetic data. It would be great if the authors could conduct some experiments on semi-synthetic or real-world datasets.
Theoretical Claims
I checked the proof of Theorem 4.2, which establishes the near-optimality of half-dosage allocation. The proof appears sound, using concentration inequalities and eigenvalue properties to bound the estimation error.
Experimental Designs or Analyses
I checked the experimental designs in Section 6, covering both the passive and active setting simulations. The authors appropriately test the theoretical claims by comparing estimation errors across different dosage strategies. I have some minor concerns:
- The simulations use synthetic data generated from the same model class assumed in the theory, but this might not reflect the robustness of the proposed framework to model misspecification.
- In the active setting, the authors could include more existing active learning methods as baselines beyond random and half-dosage strategies for a more comprehensive evaluation.
Supplementary Material
I checked the supplementary material, particularly Section B.
Relation to Existing Literature
This paper extends classical factorial design literature by introducing a probabilistic framework that addresses scalability issues in traditional full and fractional factorial designs. The active learning component relates to Bayesian experimental design and sequential experimental design, though with acquisition functions specific to the probabilistic factorial framework. The work also complements recent advances in causal inference for combinatorial interventions and provides theoretical foundations for experimental practices used in biological perturbation experiments.
Essential References Not Discussed
All the essential related works are discussed.
Other Strengths and Weaknesses
Strengths: The theoretical analysis in this paper is rigorous and well-organized, making it easy to understand the theoretical results and their practical implications.
Weaknesses:
- Please see my comments in the previous sections.
- The computational complexity of the active learning approach is not thoroughly discussed.
- The empirical results suggest that the optimal acquisition strategy only slightly outperforms the half strategy in the active setting. This raises the question of whether the half strategy might be preferable in practice, since it requires no computation or learning procedure. A more thorough discussion of this trade-off between computational complexity and performance gain would strengthen the paper.
Other Comments or Suggestions
Please see Other Strengths And Weaknesses.
We thank the reviewer for appreciating our theoretical analysis, as well as for their many valuable suggestions. Below, we address the reviewer's concerns and lay out modifications we will make according to the reviewer's suggestions.
My only concern is that all the evaluations are conducted on synthetic data. It would be great if the authors could conduct some experiments on semi-synthetic or real-world datasets.
Due to the character limit, we refer the reviewer to our response to Reviewer jf24's first point under "Simulation results on real-world dataset are missing".
The simulations use synthetic data generated from the same model class assumed in the theory, but this might not reflect the robustness of the proposed framework to model misspecification.
We thank the reviewer for pointing this out. While Boolean functions are universal approximators, our low-degree assumption can cause misspecification, as recognized by the reviewer. In many applications the low-degree assumption holds, especially in biology, but degree misspecification may still exist. To address the reviewer's concern, we have conducted an additional experiment where the model is misspecified. Please see the response to Reviewer jf24, under the comment about "sensitivity analysis." Here, we use a real-world full-degree Boolean function (), and fit assuming lesser values of .
In the active setting, the authors could include more existing active learning methods as baselines beyond random and half-dosage strategies for a more comprehensive evaluation.
We appreciate the reviewer's suggestion. For the comparison with passive baselines, we have added an additional baseline based on partial factorial design. Results are shown in the table below. For the comparison with active strategies, since multiple combinatorial interventions are drawn in each round (administered by a selection of dosage), we are not aware of existing methods that can be easily adapted to this setting. However, we would be happy to include additional baselines if the reviewer has specific suggestions.
Here we compare a Resolution fractional design versus our optimal strategy and half dosages. Each round has samples, with and .
| | Round 1 | Round 2 | Round 3 | Round 4 | Round 5 |
|---|---|---|---|---|---|
| Optimal dosage | | | | | |
| Half dosage | | | | | |
| Fractional factorial design | | | | | |
Bolded entries show the lowest loss in each round; our optimal dosage strategy outperforms the other strategies after the first round.
The computational complexity of the active learning approach is not thoroughly discussed.
The number of iterations for the optimizer to converge is roughly , and the complexity of each iteration is (where the first term comes from the matrix multiplication of and the second term comes from computing the eigenvalues of ). Recall the definition of to be the number of interactions under consideration, i.e. for small . Therefore, the overall complexity is for small . In practice, we may recommend using a proxy, which only involves the inverse of the minimum eigenvalue: . We found that numerically optimizing this proxy was significantly faster and that the solver was consistently accurate. While the complexity computed above should be the same for this approach, in practice it takes far fewer iterations to converge.
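To make the proxy concrete, here is a hedged sketch of evaluating an inverse-minimum-eigenvalue objective on a sampled design. The exact matrix used in the rebuttal is elided in this thread, so the normalized Gram matrix of parity (Fourier) features is an assumption, as are the ±1 coding and the names `fourier_features` and `inverse_min_eig`.

```python
from itertools import combinations, product
from math import comb

import numpy as np


def fourier_features(X, degree):
    """Parity features for every treatment subset of size <= degree."""
    Z = 1 - 2 * np.asarray(X)  # assumed coding: 0 -> +1, 1 -> -1
    n, m = Z.shape
    cols = [np.ones(n)]  # empty subset (intercept)
    for size in range(1, degree + 1):
        for subset in combinations(range(m), size):
            cols.append(Z[:, list(subset)].prod(axis=1))
    return np.column_stack(cols)


def inverse_min_eig(Phi):
    """Proxy objective: 1 / lambda_min of the normalized Gram matrix."""
    gram = Phi.T @ Phi / Phi.shape[0]
    return 1.0 / np.linalg.eigvalsh(gram)[0]  # eigvalsh sorts ascending


# Sanity check on a full factorial over 3 treatments with degree-2 features.
X_full = np.array(list(product((0, 1), repeat=3)))
Phi = fourier_features(X_full, degree=2)
num_features = sum(comb(3, i) for i in range(3))  # 1 + 3 + 3 = 7
proxy = inverse_min_eig(Phi)
```

On the full factorial the parity columns are orthogonal, so the normalized Gram matrix is the identity and the proxy attains its smallest possible value under this normalization; lopsided samples push the minimum eigenvalue down and the proxy up.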
The empirical results suggest ... A more thorough discussion of this trade-off between computational complexity and performance gain would strengthen the paper.
We thank the reviewer for this suggestion and will include a discussion of this trade-off in our paper. When there are few samples per round (compared to features), we find that the optimal acquisition strategy more clearly outperforms the half strategy. This is because with a smaller number of samples the distribution of combinations will be more lopsided and further from the uniform distribution, so we need to "correct" for it. Therefore, in scenarios where each round has few samples, we think it is worth computing the optimal acquisition dosage. When the number of samples is large relative to the number of features, the half strategy and optimal strategy perform very similarly. While the computational cost of finding the optimal strategy can scale quickly, in practice it takes only a matter of seconds to compute.
Reference: Montgomery, Douglas C. Design and analysis of experiments. John Wiley & Sons, 2017.
This paper is concerned with the problem of experimental design in the high-dimensional factorial setting, where users may be administered combinations of treatments and the aim is to administer a subset of treatments such that all combinations are recovered. The authors frame this problem in terms of the Fourier transform of Boolean functions and assume that the treatment status can be relaxed to probabilities of treatment. After this transformation the authors use tools from optimal experimental design for the selection mechanism. Extensions are provided to subsets and heteroskedastic settings. Empirical results show strong performance.
Questions for Authors
I am curious how this approach (specifically the active learning setting) interacts with adjustment using user covariates. Does this change the design considerations?
Claims and Evidence
All theoretical claims made are well supported by theory provided in the paper.
Methods and Evaluation Criteria
Yes, the method is quite sensible (and interesting), evaluation criterion is appropriate.
Theoretical Claims
Yes, I reviewed all proofs and they are sound to my reading.
Experimental Designs or Analyses
Yes. The experiments are sound, though I would have liked to see a more complete comparison to partial factorial experiments.
Supplementary Material
Yes, I reviewed all supplementary material.
Relation to Existing Literature
This paper addresses an interesting and highly relevant problem of factorial experimental design. While the problem itself dates back to Fisher, the authors provide a nice contribution to the literature.
Essential References Not Discussed
The authors should have a broader literature review of the partial factorial design literature.
Other Strengths and Weaknesses
Overall, I think this paper is a creative approach to the problem of design of factorial experiments. My main complaint, as I mention above, is that the experimental evaluation here is severely limited.
Other Comments or Suggestions
N/A
Ethical Review Concerns
N/A
We thank the reviewer for appreciating our method and for the thoughtful suggestions. We address the reviewer's concerns and questions below.
The authors should have a broader literature review of the partial factorial design literature.
We thank the reviewer for this suggestion. We will add the following paragraph to Section 2 to expand our discussion of the partial factorial design literature. In addition, we are happy to include any specific references the reviewer believes would further strengthen our coverage of related work.
"A fractional design is one where samples are used, each with a different combination [1]. These combinations are carefully selected to minimize aliasing. Aliasing occurs when, for the combinations selected, the interactions are linearly dependent [2][3]. In a full factorial design, there is linear independence so there is no confounding when the model is fit. In a fractional design, some aliasing will always occur in a full-degree model; however, methods proposed in literature select combinations such that the aliasing of important effects (i.e. degree-1 terms) does not occur [2]. With a low-degree assumption, aliasing can be avoided entirely. Fractional designs can be classified by their resolution (denoted by ), which determines which interactions can be potentially confounded. For example, a Resolution V fractional design eliminates any confounding between lower than degree-3 interactions, appropriate for degree-2 functions [4]. Of particular interest in literature are minimum aberration designs, which minimize the number of degree- terms aliased with degree- terms [5][6]."
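As a small illustration of the aliasing described in the paragraph above, the sketch below builds the textbook half fraction of a three-factor design with defining relation I = ABC (a Resolution III design) in ±1 coding. It is not taken from the paper; the variable names are illustrative.

```python
from itertools import product

# Full factorial over three factors A, B, C in +/-1 coding (8 runs).
full = list(product((-1, 1), repeat=3))

# Half fraction with defining relation I = ABC: keep runs where A*B*C = +1.
half_fraction = [run for run in full if run[0] * run[1] * run[2] == 1]

# In the half fraction, the main effect A and the interaction BC share a column.
col_A = [a for a, b, c in half_fraction]
col_BC = [b * c for a, b, c in half_fraction]

# In the full factorial, the same two columns differ and are orthogonal.
full_A = [a for a, b, c in full]
full_BC = [b * c for a, b, c in full]
full_dot = sum(a * bc for a, bc in zip(full_A, full_BC))

print("aliased in half fraction:", col_A == col_BC)
print("orthogonal in full factorial:", full_dot == 0)
```

Because the A and BC columns coincide on the four selected runs, no model fit on those runs can separate the two effects, which is the linear dependence the paragraph refers to.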
I would have like to seen a more complete comparison to partial factorial experiments.
We thank the reviewer for this suggestion. We have conducted an additional experiment, where we compare the half dosage versus a partial factorial design in the passive setting. Here, we generate a degree- Boolean function with . We use a Resolution design with samples for each approach. Results are shown below, averaged over trials and with std.
| Fractional design | Half dosage |
|---|---|
With fewer samples, the careful selection of combinations makes a difference, so the fractional design can outperform the half dosage. In many cases, however, especially in biological applications, careful selection of combinations is not possible. This is why the much more flexible dosage design is preferable: it enables the administration of an exponential number of combinations by choosing a linear number of dosages.
However, in the active setting, the optimal dosage can outperform a fractional design. Please see the experiment in response to Reviewer CWrH, under "... the authors could include more existing active learning methods as baselines".
I am curious how this approach (specifically the active learning setting) interacts with adjustment using user covariates.
We thank the reviewer for this question. We could assume the following setup in the passive setting: there are users with known covariates , each of which receives the combinations determined by the dosage (so that we have a total of samples). Assuming the covariates have a linear relationship with the outcome, i.e. , then the optimal dosage in the passive setting is where , with and is as defined in the paper. We conjecture that is still the half dosage. To extend to the active setting, the same objective is used as in the paper except is replaced with . We are happy to consider alternative models of user covariates if the reviewer has any specific suggestions.
References:
[1] Box, George E. P., William G. Hunter, and J. Stuart Hunter. Statistics for Experimenters. Vol. 664. New York: John Wiley & Sons, 1978.
[2] Gunst, Richard F., and Robert L. Mason. "Fractional factorial design." Wiley Interdisciplinary Reviews: Computational Statistics 1.2 (2009): 234-244.
[3] Mukerjee, Rahul, and C. F. Jeff Wu. A Modern Theory of Factorial Design. Springer Science & Business Media, 2007.
[4] Montgomery, Douglas C. Design and Analysis of Experiments. John Wiley & Sons, 2017.
[5] Fries, Arthur, and William G. Hunter. "Minimum aberration designs." Technometrics 22.4 (1980): 601-608.
[6] Cheng, Ching-Shui. Theory of Factorial Design. Boca Raton, FL, USA: Chapman and Hall/CRC, 2016.
The paper introduces a probabilistic factorial experimental design to address the optimal experimental design problem for combinatorial interventions. The contribution of the paper:
- The paper introduces a probabilistic factorial experimental design for a given choice of dosage vector.
- The paper provides a closed-form solution for the near-optimal design for passive and active settings.
- The authors explore extending the design framework to incorporate constraints and noisy scenarios.
Questions for Authors
- The proposed design strategies appear to depend on the choice of dosage (may require prior knowledge from experimenters), which is a subset of the full factorial design. As a result, the outcomes of the proposed approach may be suboptimal. Could the authors elaborate more on this and discuss how to address it?
Claims and Evidence
Yes, the paper provides the theoretical proofs and empirical evidence to support the claims.
Methods and Evaluation Criteria
The authors validated the proposed approach using a simulated dataset for both passive and active settings.
Theoretical Claims
Yes
Experimental Designs or Analyses
Yes
Supplementary Material
Yes
Relation to Existing Literature
Building on previous work, this paper utilizes Boolean functions and Fourier transforms to establish the theoretical foundation of its approach.
Essential References Not Discussed
No
Other Strengths and Weaknesses
Strengths:
- The paper addresses a significant gap in scalability challenges in factorial design with combinatorial interventions.
- The theoretical framework is robust, with clear assumptions and derivations.
Weaknesses:
- The use of Boolean functions and Fourier transforms is not new, as similar approaches have been explored in prior work, such as Agarwal, A., Agarwal, A., and Vijaykumar, S. Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions.
Other Comments or Suggestions
No
We thank the reviewer for appreciating our theoretical framework and the scalability challenges it addresses. Below, we address the concerns and questions brought up by the reviewer.
The use of Boolean functions and Fourier transforms is not new, as similar approaches have been explored in prior work, such as Agarwal, A., Agarwal, A., and Vijaykumar, S. Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions.
We thank the reviewer for this comment. As noted in section 2 (last paragraph), we used Boolean functions to model combinatorial interventions, as it can easily model the scenario where the outcome is mainly driven by low-order effects in the absence of higher-order interactions -- a common assumption in fractional factorial design. In addition, it captures generalized surface models (see section 3.1). However, the main contribution of the paper lies not in the use of Boolean functions, but in the proposal of the novel probabilistic experimental framework and the accompanying theoretical analysis of the dosage choice (described in detail in section 1), which, to our knowledge, has not been explored in prior work.
The proposed design strategies appear to depend on the choice of dosage (may require prior knowledge from experimenters), which is a subset of the full factorial design. As a result, the outcomes of the proposed approach may be suboptimal. Could the authors elaborate more on this and discuss how to address it?
A full factorial design can be formulated within our framework. In particular, there would be rounds, each with sample. In order to fix the sample, the dosage would be chosen to be deterministic, i.e. . In addition, when the number of samples is large enough for a full factorial design to be implemented, the half dosage is closely related to this design as the half dosage induces a uniform distribution over combinations. Therefore in such cases, the two approaches perform similarly. Per our theoretical results, we suggest the experimenter uses the half dosage, which requires no prior knowledge.
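The two claims above (a deterministic dosage fixes the sample, and the half dosage induces a uniform distribution over combinations) follow from the independent-assignment model. A short worked derivation is given below; the symbols $m$ (number of treatments), $d$ (dosage vector), and $x$ (combination) are notation assumed here, not taken from the paper, and the convention $0^0 = 1$ is used.

```latex
\[
\Pr(x \mid d) \;=\; \prod_{i=1}^{m} d_i^{\,x_i}\,(1 - d_i)^{\,1 - x_i},
\qquad x \in \{0,1\}^m .
\]
\[
d_i = \tfrac{1}{2} \ \text{for all } i
\;\Longrightarrow\;
\Pr(x \mid d) = 2^{-m} \ \text{for every } x,
\qquad\qquad
d \in \{0,1\}^m
\;\Longrightarrow\;
\Pr(x \mid d) = \mathbf{1}[\,x = d\,].
\]
```

Under the half dosage, each of the $2^m$ combinations is therefore equally likely, which is why a large sample drawn this way resembles a replicated full factorial design.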
Short summary: This paper proposes a clever probabilistic factorial experimental design to address the optimal experimental design problem for combinatorial interventions. The reviewers unanimously vote to accept this paper (either as "weak accept" or "accept"). The reviews acknowledge that the paper addresses a key technical gap and provides a creative solution. They also find the theoretical contributions solid and robust, and the paper well written and easy to understand. One key limitation identified in the reviews is the limited experimental section. Future improvements to the paper and follow-up work could center on more exhaustive experiments, real-world datasets, and a deeper discussion of computational complexity.