Permutation-Free High-Order Interaction Tests
Abstract
Reviews and Discussion
This paper introduces permutation-free kernel-based tests for detecting high-order interactions in multivariate data. The proposed methods, xdHSIC, xLI, and xSI, leverage V-statistics and cross-centring to achieve a standard normal null distribution, eliminating computationally intensive permutations. Empirical evaluations demonstrate the efficiency and scalability of the proposed methods while maintaining comparable statistical power. Applications to causal discovery and feature selection highlight the methods' scalability and utility in real-world scenarios involving complex dependencies.
Questions for Authors
N/A
Claims and Evidence
- The methods assume i.i.d. samples, which is acknowledged as a limitation but not validated against dependent data (e.g., time series).
- The paper claims that the theoretical computational complexity of its permutation-free tests (e.g., xdHSIC, xLI, xSI) scales as O(dn^2), where d is the number of variables and n is the sample size. However, the experiments primarily focus on smaller values of d rather than demonstrating scalability to extremely large d or real-world datasets with thousands of variables.
Methods and Evaluation Criteria
- Same concern as above regarding the claimed O(dn^2) complexity: the experiments focus on smaller values of d rather than demonstrating scalability to extremely large d or to real-world datasets with thousands of variables.
- From Figure 6, the permutation-free methods appear to require a larger sample size. The paper does not discuss which factors govern the accuracy of the tests, nor whether the permutation-free methods can match the accuracy of the permutation-based methods under reasonable assumptions or conditions. Additionally, since the computational complexity of the permutation-free methods depends on the sample size n, any extra samples needed to meet accuracy requirements will also increase the computational cost.
Theoretical Claims
The paper introduces novel definitions (e.g., Definition 3.1 for xdHSIC) and hypotheses (e.g., Hypothesis 3.2 for joint independence) to enable permutation-free high-order interaction tests. While the theoretical framework is logically consistent and supported by mathematical derivations (e.g., V-statistics, cross-centring), the justification for these definitions and hypotheses could be strengthened with additional evidence, for example rigorous validation of the assumptions and ablation studies.
Experimental Design and Analysis
Same concern as above: the claimed O(dn^2) complexity is demonstrated only for small d, not for extremely large d or real-world datasets with thousands of variables.
Supplementary Material
Sorry, I did not review the supplementary material.
Relation to Prior Literature
The paper positions these definitions as extensions of Shekhar et al. (2023).
Essential References Not Discussed
It would be valuable for the authors to elaborate on the connection between permutation-free tests and random Fourier feature-based methods in the context of high-order interaction detection, given their shared focus on computational efficiency and kernel approximation.
Other Strengths and Weaknesses
N/A
Other Comments or Suggestions
N/A
We thank the reviewer for their review. Below we discuss the concerns raised by the reviewer.
Re: iid assumption: We have added a discussion of how our method might be applicable to time series in the response to Reviewer vR6R. In summary, our method is still applicable if multiple iid realisations of a time series (stationary or non-stationary) are available. If only one time series is observed, then one might have to carry out some transformations (e.g. stock returns in the real-world financial example below, shown in Figure S3 in link: https://imgur.com/a/uju1pHq) or assume certain mixing conditions.
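For concreteness, a minimal sketch of the kind of transformation we have in mind (the function name and the use of log returns are illustrative, not taken from the paper; the financial example below uses simple daily returns):

```python
import numpy as np

def log_returns(prices):
    # Convert one observed price series into a sequence of daily log returns,
    # which are commonly treated as (approximately) iid observations in finance.
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))
```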
Re: real and large dataset: We have added a large real financial dataset (504 variables and 1004 samples) to demonstrate the scalability and broad applicability of our proposed method. In summary, we show that in highly-regulated sectors such as Energy and Utilities, there are consistently high 3-, 4- and 5-way interactions due to redundancy. Moreover, we observe that companies in Information Technology and Health Care are exposed to external factors and therefore have low intra-sector interactions. Importantly, these experiments took ~8 hours without any parallelisation on a 2015 iMac. We estimate the time taken for the permutation-based methods to be at least 2 weeks. See further details in the response to Reviewer s8B8.
Re: only small d considered, example of thousands of variables: In a system consisting of d (possibly thousands of) variables, high-order behaviours are captured by interactions up to order d. In general, detecting order-k interactions among a group of d variables is inherently combinatorial and quickly becomes infeasible as d and/or k increase. However, recent studies across various application areas have focused on identifying interactions beyond pairwise (k > 2), and have shown that even low-order interactions (e.g., k = 3) significantly impact network structure and dynamics, see Battiston et al. (2020) and Santoro et al. (2023). These findings underscore the practical benefits of high-order interaction tests, even for moderate k, in revealing new structural and functional properties that would be overlooked by pairwise analyses alone, as our financial example also highlights.
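A quick illustration of this combinatorial growth (d = 504 is chosen only to match the financial example below):

```python
from math import comb

# Number of candidate order-k subsets of d variables grows combinatorially,
# so exhaustive high-order search quickly becomes infeasible.
print(comb(504, 3))  # 21,210,504 triples
print(comb(504, 5))  # ~2.7e11 five-way subsets
```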
Re: the accuracy and computational complexity: In our experiments, we observe that, when the sample size is low, the permutation-based methods achieve very similar performance to, and in some cases slightly more power than, the permutation-free methods (as noted by the reviewers for the causal discovery example in Fig. 6). We believe this is because in the low-sample regime permutation-free methods are not able to capture the full structure of the data due to the data splitting: specifically, the test statistics are computed as the inner product between empirical embeddings estimated from two separate sample halves. Therefore, as a rule of thumb, if both the available data and the order of interest are relatively small, i.e., both n and d are small, then one could consider using the permutation-based methods. In such a case, the trade-off between computation time and statistical power would not be unfavourable. More concretely, in this small-n, small-d regime the permutation-based method can be approximated well enough with just 100 permutations, making it both computationally feasible and statistically powerful. However, outside of this regime we would strongly recommend the permutation-free methods introduced in this paper.
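For reference, the permutation baseline being discussed has the following generic shape; this is a sketch, with `stat` a hypothetical placeholder for any dependence statistic rather than our actual test:

```python
import numpy as np

def perm_pvalue(stat, X, Y, n_perm=100, rng=None):
    # Generic permutation baseline: total cost is (n_perm + 1) evaluations of
    # `stat`, which is the multiplicative factor permutation-free tests remove.
    rng = np.random.default_rng(rng)
    observed = stat(X, Y)
    null = [stat(X, Y[rng.permutation(len(Y))]) for _ in range(n_perm)]
    return (1 + sum(s >= observed for s in null)) / (1 + n_perm)
```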
We note that we do not make additional assumptions compared with Liu et al. (2023b). Both methods rely on iid data.
Re: ablation studies: Thank you for the suggestion. Constructing the test statistics using V-statistics not only enhances computational efficiency but is also essential before applying cross-centring to achieve normality under the null. Gretton et al. (2007) and Liu et al. (2023b) employ V-statistics to enable traditional centring for these purposes. For the importance of cross-centring, please see the ablation study of the cross-centring technique in the response to Reviewer s8B8.
Re: random Fourier features: We thank the reviewer for bringing up Random Fourier Features (RFF). RFF is a technique that constructs a low-dimensional feature map such that its inner products are close to the true kernel matrix. As a result, it reduces the computational complexity of computing the kernel matrix from O(n^2) to O(nD), where D is the number of RFFs. This could be advantageous, as it is linear in n, but the hyperparameter D would have to be tuned to trade off efficiency against accuracy. For each individual subtest, RFF could therefore reduce the O(n^2) cost to O(nD). This means that in future work, we can also apply RFF on top of the permutation-free strategy (which has already eliminated the factor of P permutations) in order to achieve better performance.
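As a concrete reference point, a minimal sketch of the standard RFF construction of Rahimi & Recht for the RBF kernel (the function name and parameters are ours, for illustration only):

```python
import numpy as np

def rff_features(X, D, lengthscale=1.0, rng=None):
    # Random Fourier features for the RBF kernel
    # k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)).
    # The inner product Z @ Z.T approximates the n x n Gram matrix, replacing
    # an O(n^2) computation with an O(nD) feature map; D trades accuracy for speed.
    rng = np.random.default_rng(rng)
    n, d = X.shape
    W = rng.normal(scale=1.0 / lengthscale, size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```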
We will include this discussion in the revised manuscript and pursue it in our future research, with thanks.
The paper studies testing for independence across many variables. The Streitberg interaction is a way to test for any factorization of a joint distribution. While it had been kernelized before, it is now kernelized and centered with sample splitting, in a way that avoids permutations.
Questions for Authors
N/A
Claims and Evidence
I find the claim to be clearly presented and well supported.
Methods and Evaluation Criteria
Yes.
Theoretical Claims
It would be nice to have a formal statement of the limiting distribution under the null. If I understand correctly, the authors describe it verbally and suggest it is immediate from their constructions and previous results, but it would be good to formalize it.
I do not see any technical issues in the results that are currently presented, which seem like straightforward summaries of computational complexity.
Experimental Design and Analysis
The experimental designs seem reasonable, and inherited from earlier works in the literature.
Supplementary Material
N/A
Relation to Prior Literature
The key techniques are from Shekhar et al. (2023) and Liu et al. (2023b).
Essential References Not Discussed
N/A
Other Strengths and Weaknesses
While the result is not groundbreaking, it is clearly presented. The paper also makes a technical literature accessible in a nice way.
The main weakness is that the paper lacks a formal statement of the limiting distribution under the null. I would like the authors to address this point.
Other Comments or Suggestions
I think a natural connection could be to testing kernel mean embeddings of counterfactual distributions as formalized in "Kernel methods for causal functions: dose, heterogeneous and incremental response curves".
“kerne” on page 4.
We thank the reviewer for their positive review and for recognising our effort to make the technical literature accessible. We have corrected the typo in our revision and address your concerns below.
Re: formal statements of normality: The normality results follow directly from our construction and previous results, hence we had left them out for space considerations and to focus on the new results on the computational efficiency of the permutation-free tests used for high-order interactions. But we agree it is better to formalise these statements briefly.
Proposition 1 (Null normality of xdHSIC) Let h be the core function of the V-statistic for xdHSIC. We assume that E[h^2] < ∞, i.e. the second moment of the core function is finite. Then, under the null hypothesis in Hypothesis 3.2 (joint independence), the studentised statistic xdHSIC converges in distribution to the standard normal N(0, 1).
Proposition 2 (Null normality of xLI) Let h be the core function of the V-statistic for xLI. We assume that E[h^2] < ∞, i.e. the second moment of the core function is finite. Then, under the null hypothesis in Hypothesis 3.6 (Lancaster factorisation), the studentised statistic xLI converges in distribution to N(0, 1).
Proposition 3 (Null normality of xSI) Let h be the core function of the V-statistic for xSI. We assume that E[h^2] < ∞, i.e. the second moment of the core function is finite. Then, under the null hypothesis in Hypothesis 3.9 (complete factorisation), the studentised statistic xSI converges in distribution to N(0, 1).
Sketch of the proofs: After sample splitting, the V-statistics are no longer degenerate, and hence the studentised statistics follow a standard normal distribution under the null hypotheses. The existence of second moments is sufficient here and is a standard assumption in kernel-based methods.
Typing long and complex equations in OpenReview is cumbersome, so we have omitted the more technical details here, but we will include full proofs in the revised manuscript.
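For intuition, a schematic of the construction (the notation below is assumed for illustration, following the cross U-statistic of Shekhar et al. (2023), and is not verbatim from the paper):

```latex
% Split the sample into two halves \{Z_1,\dots,Z_m\} and \{Z'_1,\dots,Z'_m\} and set
u_i = \frac{1}{m}\sum_{j=1}^{m} h(Z_i, Z'_j), \qquad
\bar{u} = \frac{1}{m}\sum_{i=1}^{m} u_i, \qquad
S^2 = \frac{1}{m}\sum_{i=1}^{m}\big(u_i - \bar{u}\big)^2 .
% Under the null and \mathbb{E}[h^2] < \infty, the studentised statistic satisfies
\frac{\sqrt{m}\,\bar{u}}{S} \;\xrightarrow{d}\; \mathcal{N}(0,1).
```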
Re: causal paper linked: We thank the reviewer for suggesting this interesting connection to testing the kernel mean embeddings of counterfactual distributions. The counterfactual distribution and its kernel mean embeddings formalised in Singh et al. indeed present an opportunity to explore the high-order interactions associated with counterfactual outcomes in a multivariate regression setting. For instance, it would be interesting to compare the interactions between the covariates and the outcomes before and after the intervention. Additionally, if there are multiple outcome variables of interest, one can assess the interactions between the outcomes before and after interventions. We will discuss this interesting connection in the revised manuscript.
This paper proposes a set of permutation-free high-order interaction tests, addressing the computational inefficiency of existing permutation-based kernel hypothesis tests. It uses cross-centering techniques to derive test statistics that follow a standard normal distribution under the null hypothesis. The authors provide theoretical justification, empirical validation on synthetic data, and comparisons with existing methods. The results demonstrate significant computational advantages over traditional approaches, making high-order interaction testing feasible for larger datasets.
Questions for Authors
In Figure 3, why do xdHSIC and xLI fail to detect the partial factorization at different interaction strengths? Please explain and give a more intuitive analysis.
Claims and Evidence
I did not find any problematic claims.
Methods and Evaluation Criteria
The evaluation criteria are appropriate, with comparisons to state-of-the-art permutation-based methods (e.g., dHSIC, LI, SI) and applications to real-world tasks (e.g., causal discovery, feature selection).
Theoretical Claims
The proofs are sound, and the theoretical results are consistent with the empirical observations.
Experimental Design and Analysis
The experiments are comprehensive and support the claims made in the paper.
In Section 5.4, why does the proposed method not work well when n is less than 500? The authors should elaborate on this.
Supplementary Material
I reviewed the proofs and the experiments in the Supplementary Material.
Relation to Prior Literature
The proposed kernel-based test can be applied in causal discovery and feature selection, bridging the gap between theoretical developments and practical applications.
Essential References Not Discussed
None.
Other Strengths and Weaknesses
Strengths:
- The paper is well-written, with clear explanations of the theoretical and empirical contributions.
- The authors conduct comprehensive synthetic experiments demonstrating the effectiveness of the approach, including comparisons with existing kernel-based tests.
- The proposed methods are demonstrated in real-world applications, such as causal discovery and feature selection.
Weaknesses:
- The proposed methods assume i.i.d. samples, which may not hold in some real-world scenarios (e.g., time series or network data). The authors also acknowledge this limitation.
- While the experiments are comprehensive, additional real-world datasets and larger simulated datasets could further validate the practical utility of the proposed methods.
- The proposed permutation-free method is claimed to be an improvement over permutation-based tests, but there is limited discussion of potential drawbacks or cases where permutation-based methods might still be preferable. It would be better to discuss when to opt for permutation-free versus permutation-based methods.
Other Comments or Suggestions
Please see Other Strengths And Weaknesses.
We thank the reviewer for their positive review. Below we refer to Figures S1, S2 & S3 in the link: https://imgur.com/a/uju1pHq.
Re: iid assumption: As the reviewer correctly pointed out, our methods depend on the data being iid, and indeed in many application areas the nature of the data is often, for example, temporal. If many realisations of each time series are present (and the realisations are iid), one can still use permutation-free tests acting on the repeated realisations. Concretely, this means that one computes the kernel entries k(x_i, x_j), where x_i and x_j are entire time-series vectors. Indeed, such an approach would even enable the detection of interactions between non-stationary time series when there are multiple iid realisations. However, in situations where only one realisation of the time series is present, one would have to rely on transformations to make the data points iid (as with the stock returns in the example below), or one must check the mixing conditions in Chwialkowski et al. (2016) to develop the permutation-free tests for time series data.
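A minimal sketch of what we mean by computing kernels over whole realisations (the RBF kernel and function name are chosen for illustration):

```python
import numpy as np

def gram_over_realisations(X, lengthscale=1.0):
    # X has shape (n_realisations, T): each row is one full time-series
    # realisation, treated as a single iid observation. The permutation-free
    # tests then act on these n_realisations samples.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * lengthscale ** 2))
```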
Re: real and large dataset: We have added a large real financial dataset (504 variables and 1004 samples) to demonstrate the scalability and applicability of our proposed method. In summary, we show that in highly-regulated sectors such as Energy and Utilities, there are consistently high 3-, 4- and 5-way interactions due to redundancy. Moreover, we observe that companies in Information Technology, Consumer Discretionary and Health Care are exposed to external factors and therefore have low intra-sector interactions. Importantly, these experiments took ~8 hours without any parallelisation on a 2015 iMac. We estimate the time taken for the permutation-based methods to be at least 2 weeks. See details in the response to Reviewer s8B8.
Re: discuss when to use permutation-based methods and why for causal discovery example: In our experiments, we observe that, when the sample size is low, the permutation-based methods achieve very similar performance to, and in some cases slightly more power than, the permutation-free methods (as noted by the reviewers for the causal discovery example in Fig. 6). We believe this is because in the low-sample regime permutation-free methods are not able to capture the full structure of the data due to the data splitting: specifically, the test statistics are computed as the inner product between empirical embeddings estimated from two separate sample halves. Therefore, as a rule of thumb, if both the available data and the order of interest are relatively small, i.e., both n and d are small, then one could consider using the permutation-based methods. In such a case, the trade-off between computation time and statistical power would not be unfavourable. More concretely, in this small-n, small-d regime the permutation-based method can be approximated well enough with just 100 permutations, making it both computationally feasible and statistically powerful. However, outside of this regime we would strongly recommend the permutation-free methods introduced in this paper.
Re: Figure 3: The reason that both xdHSIC and xLI fail follows from their vanishing conditions, i.e., xdHSIC becomes zero under joint independence and xLI becomes zero when the factorisation contains at least one singleton. The partial factorisation in Figure 3b is constructed precisely so that the ground truth does not fall into either of these two categories: (i) it is not jointly independent and (ii) it contains no singletons. Therefore, both measures are unable to vanish under this partial factorisation, hence their type-II errors increase as the interaction strength increases. In contrast, the Streitberg test stays fixed near alpha = 0.05 with a controlled type-I error. We apologise for this lack of clarity and will make this clearer in the revised manuscript.
This paper introduces permutation-free high-order interaction tests for joint independence and partial factorization of d variables. Traditional kernel-based hypothesis tests, such as HSIC and its extensions (e.g., dHSIC), rely on computationally expensive permutation-based null approximations. The authors propose a new family of tests that eliminate permutation-based computations by leveraging V-statistics and a novel cross-centring technique to yield test statistics with a standard normal limiting distribution under the null.
Questions for Authors
- Consider datasets generated by nonlinear SCMs with different nonlinear forms, such as polynomial, sinc, and log.
- Consider other kernels besides the Gaussian kernel.
- Show the performance when the sample size is large.
Claims and Evidence
Yes.
However, to better demonstrate the effectiveness of the proposed method, it would be beneficial if the authors could: 1. Evaluate datasets generated by nonlinear SCMs with diverse nonlinear forms, such as polynomial, sinc, and log functions. 2. Explore alternative kernel choices beyond the Gaussian kernel. 3. Present performance results for larger sample sizes.
Methods and Evaluation Criteria
Yes. The datasets used include (1) synthetic datasets (multivariate Gaussian, XOR gates) and (2) feature selection datasets (high-order interactions in multivariate data). The evaluation metrics include type-I error control (Figure 9), statistical power of permutation-free vs. permutation-based tests (Figure 2, Figure 4), and computational efficiency comparisons (Table 1).
The evaluation misses (1) SCMs with different nonlinear forms, (2) real-world datasets (e.g., finance, genomics) used for validation, and (3) ablation studies on cross-centering's impact on variance estimation.
Theoretical Claims
I did not check the proofs.
Experimental Design and Analysis
Strengths: Computational efficiency evaluations provide practical insights into scaling. Comparisons with existing methods (HSIC, dHSIC, SHAP) demonstrate the framework’s advantages.
Weaknesses: Lack of sensitivity analysis on kernel choice. SCMs with different nonlinear forms are not considered.
Supplementary Material
I did not review the supplementary materials.
Relation to Prior Literature
This paper extends kernel-based independence test and builds on high-order interaction testing.
Essential References Not Discussed
It seems most references have been discussed.
Other Strengths and Weaknesses
Strengths:
- This paper proposes a novel permutation-free high-order dependency test.
- Significant computational speedup (100x faster than permutation-based methods).
- Strong theoretical grounding and empirical validation.
Weaknesses:
- In the experiments, the authors did not consider nonlinear SCMs with different nonlinear forms.
Other Comments or Suggestions
No further comments
We thank the reviewer for their review. Below we refer to Figures S1, S2 & S3 in the link: https://imgur.com/a/uju1pHq.
Re: SCMs with diverse nonlinear forms and alternative kernels: For the example in Fig. 5, we have added three new analyses where the V-structures are encoded by sinc, log and polynomial nonlinear forms. Furthermore, we have also included three kernels for each new nonlinear example: the Rational-Quadratic kernel, the Radial Basis Function kernel and the Laplace kernel. Regardless of the nonlinear form or kernel, our proposed method achieves accuracy similar to that of the permutation-based counterpart. See Figure S1 (a)(b)(c).
Similarly, for the DAG causal discovery application (Fig. 6), we have added a new nonlinear example that combines all three nonlinear forms simultaneously (sinc, log and polynomial) and compares three kernels. We find that our method is robust to these choices and achieves similar accuracy to the permutation-based counterpart. See Figure S1 (d).
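To make the setup concrete, a sketch of the kind of data-generating process involved (this generator is illustrative only and is not the exact SCM used in the experiments):

```python
import numpy as np

def v_structure(n, form="sinc", rng=None):
    # Hypothetical generator for a V-structure X -> Z <- Y with a selectable
    # nonlinearity; the exact SCMs behind Fig. 5 / Fig. S1 may differ.
    rng = np.random.default_rng(rng)
    X, Y = rng.normal(size=n), rng.normal(size=n)
    f = {"sinc": np.sinc,
         "log": lambda t: np.log(np.abs(t) + 1.0),
         "poly": lambda t: t ** 2}[form]
    Z = f(X * Y) + 0.1 * rng.normal(size=n)
    return X, Y, Z
```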
Re: real dataset and large sample size: We have added a large real financial dataset to demonstrate the scalability and broad applicability of our proposed method. Specifically, we compute high-order interactions between stocks in the S&P 500 from 2020 to 2024 using their i.i.d. daily returns (a standard assumption in finance; Ali & Giaccotto, 1982). This dataset comprises 504 variables (stocks) and 1004 samples (returns), spanning 11 sectors. We compute 2-, 3-, 4-, and 5-way interactions for 500 sets of stocks within the same sector and 5000 sets of stocks randomly drawn from different sectors. The percentages of detected high-order interactions are reported in the accompanying bar plot (Figure S3). As expected, all 2-, 3-, 4-, and 5-way interactions are significantly more common within sectors than across sectors. For nearly all sectors, 2-way interactions dominate, which aligns with the GICS industrial taxonomy on which these sectors are based. Interestingly, we observe that Utilities and Energy exhibit exceptionally high 3-, 4-, and 5-way interactions. This likely stems from the highly regulated nature of these sectors, resulting in stocks with similar returns and, consequently, high (within-sector) redundancy. Conversely, Information Technology, Consumer Discretionary and Health Care display lower high-order interactions. This suggests that companies within these sectors may be more closely connected to firms in other sectors, as they are likely to be influenced by diverse external factors, indicating (across-sector) synergistic relationships rather than redundancy. These high-order interactions may help investors build a more diverse portfolio. These findings suggest that our method is capable of providing valuable insights into complex, real-world datasets. See Figure S3 (d) using the link.
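Schematically, the within- versus across-sector comparison proceeds as in the sketch below; the `test` argument is a hypothetical placeholder into which an order-k permutation-free interaction test returning a p-value would be plugged:

```python
import numpy as np

def interaction_rate(returns, candidate_cols, order, n_sets, test, rng=None):
    # returns: (n_days, n_stocks) array of daily returns; candidate_cols:
    # column indices to sample from (one sector, or the whole index).
    # Returns the fraction of sampled stock subsets with a detected interaction.
    rng = np.random.default_rng(rng)
    hits = 0
    for _ in range(n_sets):
        cols = rng.choice(candidate_cols, size=order, replace=False)
        hits += test(returns[:, cols]) < 0.05
    return hits / n_sets
```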
Importantly, these experiments took ~8 hours without any parallelisation on a 2015 iMac with 4 GHz Quad-Core Intel Core i7 processor and 32 GB 1867 MHz DDR3 memory. We estimate the time taken for the permutation-based methods would have been at least 2 weeks. This highlights the scalability of our proposed method.
Ali and Giaccotto (1982). Journal of the American Statistical Association, 77(377), 19-28.
Re: ablation study of cross-centring: Thank you for this suggestion. We constructed an order-4 jointly independent MVG and show that only cross-centring makes the test statistic follow a standard normal distribution; using traditional centring, or no centring at all, results in undesired null distributions. See Figure S2.
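As a toy illustration of the phenomenon (entirely schematic and independent of our actual test statistics): a studentised cross term built from two independent halves is approximately N(0, 1), while a degenerate product statistic is not.

```python
import numpy as np

rng = np.random.default_rng(0)
m, reps = 500, 5000
cross, degen = [], []
for _ in range(reps):
    f, g = rng.normal(size=m), rng.normal(size=m)  # stand-ins for the two halves
    u = f * g
    cross.append(np.sqrt(m) * u.mean() / u.std())  # approximately N(0, 1)
    degen.append(m * f.mean() * g.mean())          # product-normal, not Gaussian
# Compare the two empirical null distributions, e.g. with normal QQ plots.
```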
This paper introduces permutation-free high-order interaction tests for joint independence and partial factorization of d variables. The null hypothesis states that the joint distribution factorizes into a product of marginals, and the alternative hypothesis states that it does not. Unlike state-of-the-art approaches such as HSIC, which require expensive permutations to sample from the null distribution, the new approach does not, and significantly saves computation.
The core proposal is built on the previous works of Shekhar et al. (2023) and Liu et al. (2023b). The paper is well received by the reviewers, who note the novelty of the proposed permutation-free test and its well-grounded theory (s8B8, vR6R, eBnj). The claim of computational savings is clearly presented and supported by the empirical results. Initial concerns raised in the official reviews that were addressed include
- lack of sensitivity analysis on kernel choice [s8B8],
- lack of experiments on real datasets [vR6R],
- limited discussion on when permutation-based approaches can be better [vR6R, wudW],
- lack of experimental results on non-i.i.d. data [wudW].
The AC sees the last concern as a minor one, considering other contributed results. The authors submitted a strong rebuttal that addresses all the above concerns. In the next revision, please include the new results on the financial dataset, structural causal models, and other results from the rebuttal.
AC recommendation: accept.