PaperHub
6.4 / 10
Spotlight · 4 reviewers
Ratings: 5, 6, 4, 1 (min 1, max 6, std 1.9)
Confidence: 3.8
Novelty: 3.3 · Quality: 2.8 · Clarity: 2.0 · Significance: 2.5
NeurIPS 2025

SHAP values via sparse Fourier representation

Submitted: 2025-05-12 · Updated: 2025-10-29
TL;DR

Computing SHAP values for black boxes and trees using a sparse Fourier representation of models, orders of magnitude faster than other methods

Keywords

SHAP, Interpretability, XAI, Fourier

评审与讨论

Official Review (Rating: 5)

The paper introduces FourierSHAP, a novel method to approximate SHAP values for black-box models. First, a sparse Fourier approximation of the black-box is estimated. Then, thanks to a novel fast closed-form expression of SHAP values for Fourier representations of functions, the exact SHAP values (of the Fourier approximation) can be efficiently computed. The proposed method is specific to pseudoboolean functions.
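
For reference, the exponential summation that this closed form replaces can be made concrete with a small brute-force sketch (illustrative code, not from the paper): exact interventional SHAP values for a sparse pseudoboolean function on {0,1}^n, obtained by enumerating all coalitions.

```python
import itertools
import math
import numpy as np

def interventional_shap_bruteforce(f, x, background):
    """Exact interventional SHAP values by enumerating all 2^(n-1) coalitions per feature."""
    n = len(x)

    def value(S):
        # v(S): average prediction with features in S fixed to x and the rest drawn from the background
        z = background.copy()
        z[:, list(S)] = x[list(S)]
        return f(z).mean()

    phi = np.zeros(n)
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for k in range(n):
            for S in itertools.combinations(rest, k):
                w = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
                phi[i] += w * (value(S + (i,)) - value(S))
    return phi

# A toy sparse, low-degree pseudoboolean function on {0,1}^4
f = lambda X: 2.0 * X[:, 0] - 1.5 * X[:, 1] * X[:, 2] + 0.5 * X[:, 3]
rng = np.random.default_rng(0)
background = rng.integers(0, 2, size=(64, 4)).astype(float)
x = np.array([1.0, 1.0, 0.0, 1.0])
print(interventional_shap_bruteforce(f, x, background))  # cost grows as 2^n; the closed form avoids this
```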

To validate FourierSHAP empirically, the authors consider four test cases with only binary or categorical features (transformed through one-hot encoding). For each test case, they first train a preliminary black-box model (MLP or tree ensemble) and compute a baseline with KernelSHAP. They show that FourierSHAP achieves good accuracy while offering several orders of magnitude improvement with respect to KernelSHAP.

Strengths and Weaknesses

I really enjoy the novelty of the paper, which shows that as long as a sparse Fourier representation is possible for a black-box model, it is possible to heavily reduce the computational burden of SHAP values. As clearly motivated in the introduction, estimation of SHAP values is of particular practical interest, but usual running times prevent its broad application when the number of features is large. The approach presented here is a new and promising point of view.

The paper is well-written and easy to follow, with novel (to the best of my knowledge) theoretical results for Fourier representations. Sparse Fourier approximations of typical black-box models are validated in Figure 1, which is a very reassuring experiment. The running time improvements illustrated in all test cases are also impressive.

However, I have a major concern about the limitation of the method on pseudoboolean functions. Of course categorical features can be transformed through one-hot encoding, and in Appendix F.1 the authors discuss quantization strategies to handle continuous features. To me, it is unfortunately still unclear:

  • how the SHAP value of the initial feature is recovered from the SHAP values of the new features after one-hot encoding (this may be trivial, but in this case this should be explicitly mentioned)
  • how running time improvement would change with respect to KernelSHAP used directly on the initial features (e.g., on SGEMM)
  • how continuous feature quantization would be handled in practice, and the impact of the increase in dimension

I really think that the paper would benefit from at least one experiment with continuous features to be more convincing.

Questions

  • As mentioned above, my main issue lies with the generalization to continuous features. Could the authors comment on the expected additional computational burden in such a scenario?
  • Instead of reporting only the relative improvement with respect to KernelSHAP, would it be possible to also report the running time of KernelSHAP in, e.g., the Appendix? From a practical perspective, a 10^3 improvement is only interesting as long as KernelSHAP runs for more than a few seconds.
  • Other sparse representations, given as low-order additive decompositions of multivariate functions, are also popular, and they would also be computationally efficient since they break (to some extent) the exponential complexity. How would the authors position their work with respect to such competitors?

Limitations

The authors discuss in Appendix F.1 the limitations related to continuous features, and in Appendix F.2 the extension of conditional SHAP values.

I think Appendix F.1 should be expanded to give more practical details on how continuous features can be handled, and also contain at least one such illustration.

Justification for Final Rating

The new sparse Fourier representation introduced in this paper is an interesting new proposal for Shapley value estimation, and the computational speed-up it leads to is impressive in practice. The authors answered my concerns in the rebuttal, and I am convinced by their comments.

Formatting Issues

No formatting issue.

Author Response

We sincerely thank the reviewer for the positive and encouraging feedback. We are especially grateful for the recognition of the novelty in using sparse Fourier representations to compute SHAP values efficiently, as well as for the kind words on the clarity of the paper, theoretical contributions, and experimental validation. We also discuss points mentioned by the reviewer below.

SHAP Values of One-hot Encoded Features: We used the SHAP values of the individual one-hot components separately for each category and compared them across methods. We will clarify this in the paper to avoid confusion and make our evaluation choices more transparent.

Generalization to Continuous Features: As for continuous features, we agree this is an interesting direction for future work. In Appendix F.1, we discuss possible extensions for this purpose, such as quantization. The challenge lies in the trade-off between discretization granularity and computational cost. Since the dimensionality increases with finer quantization, this has a non-trivial impact on runtime and approximation error.

However, while interesting, we focus the scope of our work on discrete features, for which we can also provide theoretical guarantees and empirical validation. We think that even for the discrete setting, our work proposes a SOTA method for computing accurate SHAP values efficiently and therefore makes for a strong algorithmic contribution.

Practical Significance of Speedups over KernelSHAP: Below we report the absolute runtimes of KernelSHAP (in seconds) on the SGEMM dataset across varying sizes of the query and background sets.

The first table corresponds to the same settings used in the main paper, where background sizes were kept relatively small. This decision was made intentionally: during experimentation, we observed that larger background sizes caused LinregSHAP and KernelSHAP to become prohibitively slow, making it impractical to include full-scale results across datasets and seeds. Despite this limitation, our method already achieved orders-of-magnitude speedups.

KernelSHAP Runtime (Main Paper Setup)

| Query size ↓ / Background size → | 10 | 20 | 30 | 40 |
| --- | --- | --- | --- | --- |
| 10 | 2.0 s | 3.9 s | 5.6 s | 7.8 s |
| 20 | 2.9 s | 5.8 s | 8.7 s | 11.7 s |
| 30 | 3.9 s | 7.5 s | 11 s | 14.4 s |
| 40 | 4.9 s | 9.6 s | 14 s | 17.9 s |
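
For context, runtimes of this kind can be reproduced with the standard shap library; the sketch below (illustrative model and data, not the paper's SGEMM pipeline) times KernelSHAP over the same grid of query and background sizes.

```python
import time
import numpy as np
import shap
from sklearn.neural_network import MLPRegressor

# Illustrative stand-ins for the setup above: random binary inputs and a small MLP.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 20)).astype(float)
y = X[:, 0] - 2.0 * X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=2000)
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=300).fit(X, y)

for n_q in (10, 20, 30, 40):          # query size (table rows)
    for n_bg in (10, 20, 30, 40):     # background size (table columns)
        explainer = shap.KernelExplainer(model.predict, X[:n_bg])
        t0 = time.perf_counter()
        explainer.shap_values(X[-n_q:])
        print(f"query={n_q}  background={n_bg}  {time.perf_counter() - t0:.1f} s")
```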

To further emphasize the practical gains of FourierSHAP, we conducted additional experiments for this rebuttal with significantly larger query and background sets, closer to what might be used in real-world deployments. Despite the background dataset still being modest in size (100–400), KernelSHAP runtimes quickly increase to minutes, while FourierSHAP continues to run in milliseconds.

KernelSHAP Runtime (Larger Setup)

| Query size ↓ / Background size → | 100 | 200 | 300 | 400 |
| --- | --- | --- | --- | --- |
| 100 | 35 s | 71 s | 104 s | 133 s |
| 200 | 43 s | 84 s | 129 s | 169 s |
| 300 | 52 s | 107 s | 161 s | 212 s |
| 400 | 69 s | 132 s | 197 s | 265 s |

FourierSHAP Runtime (Same Setup, in milliseconds)

| Query size ↓ / Background size → | 100 | 200 | 300 | 400 |
| --- | --- | --- | --- | --- |
| 100 | 14 ms | 28 ms | 43 ms | 57 ms |
| 200 | 29 ms | 58 ms | 86 ms | 110 ms |
| 300 | 40 ms | 79 ms | 119 ms | 159 ms |
| 400 | 53 ms | 106 ms | 160 ms | 213 ms |

These results reinforce that our reported relative speedups (often 100×–10,000×) translate into meaningful real-world runtime improvements. We will include both tables in the final version for completeness.

Positioning of FourierSHAP vs. Multivariate Function Decomposition: Although we agree that low-order additive decompositions can yield computational benefits, it is unclear which specific work is being referred to. If the reviewer could point us to a particular method in mind, we could clarify how our approach compares.

We hope our response addresses the concerns raised by the reviewer, and we are happy to engage in further discussions.

Comment

Thank you for your detailed answer.

SHAP Values of One-hot Encoded Features: yes, I understood from my reading that you were computing the SHAP values of each independent one-hot component. But my main point was how we can come back to the SHAP value of the initial categorical variable. To me, this is a crucial step which is missing.

Generalization to Continuous Features: maybe my comment was not clear enough. Of course I am not asking for additional experiments with continuous features at this point, but I would like to have a clear and detailed step-by-step procedure on how they could be computed (once again, not a SHAP value for each quantized value of the feature, but the entire feature itself, as said in the previous paragraph).

Practical Significance of Speedups over KernelSHAP: thank you for the running times, they really help me see the practical implications of your proposed algorithm.

Positioning of FourierSHAP vs. Multivariate Function Decomposition: I had no specific paper in mind, sorry, this was only a thought that crossed my mind. You can forget about this one!

Comment

This is a great and thoughtful point that needs to be addressed in practice. To recover SHAP values of the original categorical variable, a commonly accepted practice, also recommended in the SHAP library [Lundberg, GitHub Issue #397], is to sum the SHAP values of the one-hot features.

For continuous features, one can quantize the input features and apply the same approach by treating each quantile bin as a one-hot categorical feature. A proper treatment of continuous features would require a re-derivation of the SHAP formula over $\mathbb{Z}_k$, which is outside the current scope but aligns with our intended future extensions (see also Section F.1, where we discuss this in more detail).
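
As a concrete illustration of both suggestions (hypothetical column names and bin count, not code from the paper): quantile-bin a continuous column, one-hot encode the bins as binary model inputs, and afterwards sum the per-bin SHAP values to recover a single attribution for the original feature.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x_cont = rng.normal(size=1000)                           # a continuous feature
bins = pd.qcut(x_cont, q=4, labels=False)                # 4 quantile bins -> values in {0, 1, 2, 3}
onehot = pd.get_dummies(bins, prefix="feat")             # shape (1000, 4): binary inputs for the model

# ... train the model on the one-hot columns and compute SHAP values for them ...
shap_onehot = rng.normal(size=(1000, 4))                 # stand-in for the real per-column SHAP values

# Recommended practice (SHAP library, GitHub issue #397): sum the one-hot components
# belonging to the same original feature to get one attribution per sample.
shap_feature = shap_onehot.sum(axis=1)
```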

We again thank the reviewer for the clarifying questions, and hope this addresses all the points raised in the initial review.

Comment

Thank you for the detailed answer, I strongly believe that these points, even if they may appear minor, should be mentioned somewhere (at least in the supplemental). I am convinced by the answer, and will update my score accordingly.

Official Review (Rating: 6)

This paper introduces a novel approach to tackling the computational challenges associated with calculating Shapley Values (SV) for binary input vectors. The authors begin by reviewing the fundamentals of the Discrete Fourier Transform (DFT) applied to binary inputs, and they introduce key concepts such as k-sparsity and degree.

The authors provide a thoughtful discussion on the likelihood that many real-world prediction functions satisfy the sparsity requirement. They support this claim with educational examples, including modular functions and functions with only pairwise interactions, which effectively illustrate the practical relevance of their assumptions.
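
This sparsity claim is easy to verify numerically. The toy check below (standard parity-basis Fourier coefficients on {0,1}^n, not code from the paper) shows that a function with only pairwise interactions has very few nonzero coefficients out of the full 2^n.

```python
import itertools
import numpy as np

n = 6
X = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)  # all 2^n binary inputs

# A function with only pairwise interactions (degree 2)
f = 3.0 * X[:, 0] + X[:, 1] * X[:, 2] - 2.0 * X[:, 3] * X[:, 5]

# Fourier coefficients: f_hat(S) = 2^{-n} * sum_x f(x) * (-1)^{sum_{i in S} x_i}
coeffs = {}
for r in range(n + 1):
    for S in itertools.combinations(range(n), r):
        chi = (-1.0) ** X[:, list(S)].sum(axis=1)
        coeffs[S] = (f * chi).mean()

nonzero = [S for S, c in coeffs.items() if abs(c) > 1e-10]
print(f"{len(nonzero)} of {2 ** n} coefficients are nonzero")  # a k-sparse, degree-2 spectrum
```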

Building on this foundation, the authors present their main result: surprisingly, the notoriously exponential summation inherent in SV computation can be expressed in a closed form when the input vector is represented in the binary Fourier space. This finding is a significant theoretical advancement.

Empirically, the authors demonstrate that their proposed method outperforms alternative approaches (FastSHAP, KernelSHAP, DeepLIFT, and LinregSHAP) in terms of SHAP value accuracy, further validating the practical utility of their contribution.

Strengths and Weaknesses

Strengths: The authors make a substantial contribution to the field of Explainable AI (XAI) and potentially to game theory as well. To the best of my knowledge, a closed-form solution for SV computation has not been previously established in the game theory literature, which underscores the high originality and novelty of this work.

The authors also provide a meticulous analysis of sparsity in the Fourier space, grounded in a deep understanding of spectral biases in neural networks. As a machine learning researcher in industry, I concur with their assertion that, in most real-world applications beyond benchmark problems in flagship domains (text, speech, vision, and molecular data), lower-order interactions often dominate the meaningful patterns worth extracting from data.

The manuscript is generally well-written and accessible. Additionally, the authors demonstrate a strong focus on the computational aspects of their method, leveraging JAX for efficient vectorization and parallelization, which enhances the practical applicability of their approach.

Weaknesses: While the authors excel in explaining the implications of Propositions 1 and 2 with intuitive and straightforward examples, the main results in Section 4 are presented without sufficiently accessible explanations, which detracts from the paper’s readability. As a reviewer, I am providing a positive assessment under the assumption that the proofs are correct. However, if the authors have intentionally omitted detailed explanations to avoid scrutiny, I find this approach concerning and not conducive to transparent scientific discourse.

Questions

I may add some later.

Limitations

The authors need to explain the main limitation upfront: The theory is only for binary vectors.

Formatting Issues

Nothing in particular.

Author Response

We sincerely thank the reviewer for their assessment. We are pleased that the key strengths of the paper, namely, the theoretical novelty of the closed-form SHAP expression, the principled use of Fourier sparsity, and the empirical demonstration of substantial computational gains, were clearly recognized. We also appreciate the reviewer’s perspective on our contribution to Explainable AI and its potential impact on game theory, as well as the validation of our assumptions on spectral sparsity in real-world models.

Regarding the clarity of Section 4: we fully agree that the presentation is dense. This section introduces the main result, i.e., the closed-form SHAP expression in the Fourier basis, which is mechanically derived and is notationally heavy. The derivation relies on a sequence of algebraic manipulations and combinatorial identities that are correct but offer little in terms of intuition. For this reason, we chose to move the full derivation to Appendix B and keep the main text concise.

However, we understand that this may have made the exposition harder to follow, and we will revise this section in the final version to improve accessibility. Specifically, we will include a high-level walkthrough of the key steps of the proof and better guide the reader to the relevant appendix material.

We also want to reassure the reviewer that we have validated the correctness of the formula both theoretically and empirically. In Figure 5 located in the Appendix, we include experiments comparing our computed SHAP values against ground-truth values (computed via enumeration), which serve as an internal consistency check on the correctness of the formula.

Finally, we agree with the reviewer that the restriction to discrete inputs should be stated more clearly and prominently. Although we already mention this in Section 2.2 and dedicate Appendix F.1 to this topic, we will revise the introduction and abstract to explicitly highlight that the current theory is scoped to discrete features.

We are grateful for the reviewer’s valuable feedback, which we believe helps us improve the main text even further.

Official Review (Rating: 4)

This paper introduces FOURIERSHAP, a novel two-stage algorithm for efficiently computing SHAP values for models with discrete or binary inputs. The core contribution is to first approximate a model using a sparse Fourier representation and then apply a newly derived closed-form formula that approximates SHAP values via a simple, parallelizable summation, bypassing the traditional method's exponential complexity. The authors demonstrate that this approach provides significant speedups over existing baselines while maintaining strong accuracy.

Strengths and Weaknesses

Strengths

  • S1. The approach using Boolean Fourier analysis is theoretically novel and exciting.

  • S2. The experimental validation is strong, with impressive and practical speedups.

Weaknesses

  • W1. The method is fundamentally designed for discrete inputs. However, the authors do suggest quantization as a workaround.

  • W2. The method's effectiveness relies on the assumption of spectral sparsity. While the authors justify this for some models, it may not hold universally.

  • W3. The justification for spectral bias in Section 3.1, which relies on the NTK, is not fully convincing. The authors' arguments implicitly invoke the lazy regime, whereas there is now considerable work on more realistic "rich regimes". The motivation would be stronger with either more robust theoretical citations or direct empirical evidence of the claimed spectral sparsity.

Questions

N/A

Limitations

Yes

Justification for Final Rating

In light of the new concerns raised, I have slightly decreased my score and confidence

Formatting Issues

N/A

Author Response

We are grateful for the positive and thoughtful assessment and happy to see that the reviewer finds our work both theoretically exciting and supported by strong experimental validation with significant speedups. We also acknowledge the weaknesses mentioned by the reviewer and discuss them below.

W1 - Discrete Input Assumption: We appreciate the reviewer drawing attention to the limitation of our method to discrete inputs, which is a key point we discuss in Appendix F.1 of the paper. There, we discuss possible extensions for this purpose, such as quantization. The challenge lies in the trade-off between discretization granularity and computational cost. Since the dimensionality increases with finer quantization, this has a non-trivial impact on runtime and approximation error. A true and general extension to continuous features (potentially involving new surrogate representations) remains an open and important direction for future work. We believe this requires a careful theoretical and empirical treatment.

We focus the scope of our work on discrete features, for which we have provided theoretical guarantees and empirical validation. We think that even for the discrete setting, our work proposes a SOTA method for computing accurate SHAP values efficiently and therefore makes for a strong algorithmic contribution.

W2 - Spectral Sparsity Assumption: As the reviewer has thoughtfully pointed out, the assumption of spectral sparsity is central to our method, and while it is theoretically and empirically supported for many real-world models (such as those presented in the paper), it is not always guaranteed. We have tried to be clear about this limitation and explained the scope of applicability in Section 3. Namely, we provided guarantees for the case of trees (Section 3.1), low-order functions (Proposition 1), MLPs (Section 3.2), and generic functions (Appendix A.2).

We also attempt to offer a broader intuitive argument in this rebuttal. Functions that only contain low-order interactions have a sparse Fourier spectrum by Proposition 2. We think low-order interactions are a useful inductive bias for models that can generalize well, especially when trained on limited data.

W3 - Limitations of NTK-based Justification for Spectral Bias: We thank the reviewer for raising this point. NTK theory was used primarily as a theoretical motivation in the specific context of MLPs. In Sections 3.1 and 4, we also cited empirical works that highlight spectral bias in fully connected neural networks. Additionally, the results shown in Figure 1 reflect this phenomenon in our own empirical studies.

Although NTK theory provides useful intuition in the lazy regime, we acknowledge its limitations and welcome deeper integration with insights from richer training regimes. We would be happy to consider incorporating additional references if there are particular works that the reviewer believes would help strengthen the motivation.

We appreciate the encouraging feedback from the reviewer and are excited to build on this work in future directions.

Comment

Thank you for the response and clarification. I wish to maintain my cautiously optimistic score of 5. My main concern remains the clarity of exposition, as it can drastically lower the impact of this work. Nevertheless, I look forward to reading future versions of this paper.

Official Review (Rating: 1)

The paper "SHAP Values via Sparse Fourier Representation" introduces FOURIERSHAP, a two-stage algorithm for efficiently computing SHAP values by leveraging the observation that many real-world models—especially neural networks and tree ensembles—exhibit spectral sparsity and can be approximated by sparse low-degree Fourier representations. In the first stage, the method approximates the black-box model using a sparse Fourier transform. In the second stage, it derives a closed-form, exact expression for the SHAP values of each Fourier basis function, avoiding the exponential summation over coalitions and enabling highly parallelizable, amortized computations. This approach results in orders-of-magnitude speedups over existing SHAP methods (including KERNELSHAP and FASTSHAP).

Strengths and Weaknesses

The paper touches upon a computationally challenging problem of computing exact Shapley values, whose complexity grows exponentially with the number of features. The fact that the method can compute the exact Shapley value is the main strength of the paper.

Although the paper seems a bit difficult to follow, FOURIERSHAP seems to approximate the trained model under explanation with a global, structured surrogate in the Fourier domain, enabling exact SHAP value computation without exponential cost. So, in principle, the authors put forward a global surrogate model that can approximate the trained model and is able to compute Shapley values in closed form. Thus, the paper, IMHO, should be framed as providing a global surrogate model with Shapley value computation capability, not as an approximation of the Shapley values themselves for different models. As a result, the method is not comparable with amortized SHAP methods (like FastSHAP, which is studied at length in the paper).

Based on this framing, the paper should discuss and study how well this global surrogate model can approximate functions, e.g., is it a universal approximator? This approximation power would provide some perspective on how good the approximated Shapley values are when the original model is replaced by the proposed surrogate model.

Along the same lines, the authors should discuss why this global surrogate model is preferred over, for instance, TreeSHAP, where we can compute the exact Shapley value. The current version of FourierSHAP seems to be focused on tabular data, tree ensembles are the SOTA on tabular data, and for them we can already compute exact Shapley values very efficiently. TreeSHAP is just an example; there are a few other models that are capable of computing exact Shapley values.

That being said, the authors ironically approximate the decision trees to apply FourierSHAP; why not use TreeSHAP?

Last but not least, I did not understand why the authors use the terms SHAP and FourierSHAP, while they try to approximate the Shapley value itself. So, I think the right terminology here is to use "Shapley value" and term the method FourierShapley.

Questions

Why did you refer to the method as FourierSHAP and not FourierShapley?

Limitations

See above.

Justification for Final Rating

Positioning of the paper as a method and with respect to the literature.

Formatting Issues

Nothing specific

Author Response

We thank the reviewer for their time and thoughtful engagement with our work. We appreciate the recognition of our method's ability to compute exact SHAP values with orders-of-magnitude speedup through a sparse Fourier representation. Below are our responses to the concerns raised by the reviewer.

Positioning of FourierSHAP vs. FastSHAP: We thank the reviewer for pointing out this distinction. While FastSHAP amortizes the computation of SHAP values by learning a model that maps input features directly to SHAP values, it does so by implicitly learning both the surrogate and the SHAP estimator end-to-end. Therefore, errors in the final SHAP values can arise jointly from the model approximation and also SHAP estimation.

In contrast, FourierSHAP decouples these steps: we first approximate the original predictive model using a global sparse Fourier surrogate, and then compute the exact SHAP values of this surrogate. This separation allows us to provide theoretical guarantees and also fine-grained control of the surrogate approximation quality through the sparsity parameter.

Although different in structure, both methods aim to accelerate SHAP computation by orders of magnitude. Hence, we considered FastSHAP a relevant baseline for the empirical comparison. Moreover, we consider our method amortized: the expensive training of the surrogate is done once, and this upfront cost is amortized over the large number of subsequent SHAP value predictions, which, as with FastSHAP, are computationally extremely cheap.

Expressiveness of Surrogate Model: This is indeed an important point. We discuss this in Section 3.2 for the case of ensembles of trees and in Proposition 1 for the more general case of functions that can be decomposed as a sum of low-order functions. For these classes of functions, the Fourier representation is exact; therefore, we incur no approximation error in the SHAP values.

In addition to the points above, we also have a discussion in Section 3.1 on why our method can approximate MLP functions as well, and furthermore provided empirical experiments ourselves in Section 5, validating the approximation capabilities. Finally, in Appendix A.2 we have discussed the type of guarantees available for the surrogate function when approximating a generic black box. These are uniform bounds that bound the L2 norm of the error based on the spectrum of the function.

Positioning of FourierSHAP vs. TreeSHAP: We appreciate the reviewer raising this important point. Indeed, TreeSHAP allows exact SHAP computation for decision trees and is highly efficient. However, its applicability is limited to tree-based models.

FourierSHAP, in contrast, is model-agnostic and furthermore captures a broader class of functions. As mentioned above as well as in Section 3.2 and Proposition 1, trees and, more generally, low-order functions admit exact sparse Fourier representations. Therefore, FourierSHAP provides exact SHAP computation capability for the case of these models. Indeed, as shown in Figure 3, for decision trees, our method reproduces TreeSHAP outputs exactly but with order-of-magnitude speedups. Therefore, not only is our method exact for trees, it is also applicable to a broader class of models, and it is more computationally efficient. We believe this makes our method a better alternative for computing SHAP values.
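
A minimal sketch of the kind of agreement check described here, using the shap library's interventional TreeExplainer as the reference (illustrative model and data; the FourierSHAP side is the authors' claim and is not reproduced here):

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 8)).astype(float)
y = X[:, 0] - 2.0 * X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=500)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Interventional TreeSHAP: background data supplied explicitly
explainer = shap.TreeExplainer(model, data=X[:40], feature_perturbation="interventional")
shap_tree = explainer.shap_values(X[:10])
# Per the authors (Figure 3), FourierSHAP reproduces these values exactly, only faster.
```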

Unnecessary Approximation of Trees: We would like to clarify that we did not approximate decision trees. We computed its Fourier transform exactly. The theory is detailed in Section 3.2. Thus, the surrogate is an exact representation, and the SHAP values derived from it are identical to those produced by TreeSHAP (see our response above to previous two concerns).

Naming of the Method: We appreciate the reviewer’s point and agree that terminology should be used carefully to avoid confusion. While Shapley values are rooted in cooperative game theory, the term SHAP values has become the standard when referring to their adaptation for feature attribution in machine learning models, as popularized by Lundberg and Lee [1]. We believe FourierSHAP is consistent with names in existing literature (e.g., KernelSHAP, TreeSHAP). We have addressed how feature attribution is related to cooperative games in Section 2.1.

In summary, our method enables efficient SHAP value computation by approximating the original model with a global Fourier surrogate. This approximation is exact for common classes such as ensembles of trees and other black-box low-degree functions. The novelty lies in the closed-form SHAP computation in the Fourier basis, which, under sparsity assumptions, provides exact values and substantial computational gains. While the method can involve approximation of the underlying predictor, our focus is squarely on the accuracy and efficiency of the computation of the SHAP values themselves.

We hope that our detailed responses have addressed your concerns and clarified the contributions and scope of our work. If there are any remaining questions or further points to discuss, we would appreciate additional feedback; otherwise, we kindly ask you to consider increasing your score.

[1] Lundberg, Scott M., and Su-In Lee. "A unified approach to interpreting model predictions." Advances in neural information processing systems 30 (2017).

Comment

I would like to thank the authors for their responses. Unfortunately, I found that my main concerns were acknowledged rather than adequately addressed. Given that my evaluation diverged significantly from those of the other reviewers, I carefully re-read the paper, and after this second reading, I am even more confident in maintaining my original score.

The central limitation of the proposed method lies in its applicability only to models trained on binary (or, with caveats, discrete) input features. This is not only a substantial practical constraint but also a case that has been thoroughly explored in prior work. For instance, the authors cite Van den Broeck et al. (2021), which presents an exact Shapley value computation for binary inputs, yet they do not provide any meaningful comparison or discussion of its relevance. More critically, this work has been generalized to arbitrary discrete features in Van den Broeck et al. (2022) and Arenas et al. (2023), a direction not acknowledged or discussed by the authors. Without a clear articulation of how the proposed method differs from or improves upon these existing approaches, the contribution appears limited in novelty.

Furthermore, I find the attempted comparison with FastSHAP to be misleading. FastSHAP is an amortized method for learning Shapley value approximations using a learned explainer model that generalizes across instances. In contrast, the method proposed in this paper performs exact instance-wise computation for binary inputs. The assumptions, scope, and objectives of the two methods are fundamentally different, and grouping them together under the umbrella of "faster attribution methods" is not justified.

The discussion around tree-based models is more promising, as this is an area with several practical applications (possibility to compare against TreeSHAP and variants). However, I did not find sufficient motivation or justification for this focus. There is no clear contrast with highly efficient existing methods such as TreeSHAP or Linear TreeSHAP (Bifet et al., 2022), which already support exact Shapley value computation in linear time. The paper does not clarify whether it offers any theoretical or empirical advantage over these alternatives.

The experiments presented are also relatively weak. I understand that restricting the input to binary features limits dataset choice, but this constraint does not justify the absence of direct comparisons with established baselines, especially those mentioned above and also in the context of tree-based models. For instance, the runtime comparison in Figure 3 lacks rigor, and it is unclear whether it meaningfully improves upon optimized implementations of existing methods such as Bifet et al. (2022) (already only slightly better than fastTreeSHAP).

In summary, the paper suffers from issues in positioning, limited scope, lack of comparison with relevant literature, and weak empirical justification. I therefore maintain my original score.

References

[1] Van den Broeck, G., Hacohen, G., & Sarfati, G. (2022). On the Tractability of SHAP Explanations. Journal of Artificial Intelligence Research (JAIR).

[2] Arenas, M., Barceló, P., Borge-Holthoefer, J., & Reutter, J. (2023). On the Complexity of SHAP-Score-Based Explanations: Tractability via Knowledge Compilation and Non-Approximability Results. Journal of Machine Learning Research (JMLR).

[3] Huang, Y., Chen, Y., & Lee, J. (2024). Updates on the Complexity of SHAP Scores. Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI).

[4] Marzouk, A., Belle, V., & Van den Broeck, G. (2025). On the Computational Tractability of the (Many) Shapley Values. Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS).

[5] Marzouk, A., Belle, V., & Van den Broeck, G. (2024). On the Tractability of SHAP Explanations under Markovian Distributions. Proceedings of the 41st International Conference on Machine Learning (ICML).

[6] Bifet, A., Read, J., & Xu, C. (2022). Linear Tree SHAP. Advances in Neural Information Processing Systems, 35, 25818–25828.

Comment

Our clarifications to the new points raised by the reviewer follow.

On the positioning vs. LinearTreeShap [6] and incomplete experiments: LinearTreeShap is an algorithm used to compute SHAP values for trees in the conditional setting, whereas we compute SHAP values in the interventional setting. Therefore, it has not been included as a baseline. For tree-based models, our best baseline is an optimized GPU implementation of TreeSHAP in the interventional setting, which, to our knowledge, represents the state of the art for the interventional scenario.

As already mentioned in lines 101 to 108 and more thoroughly in the dedicated related works sections A.3 and future work F.2, the conditional setting is a fundamentally different setting [7]. The interventional setting computes SHAP values by independently perturbing features, effectively simulating causal interventions and isolating each feature’s direct impact on the prediction [8][9]. In contrast, the conditional setting accounts for dependencies by sampling missing features from their conditional distribution, which can entangle attributions due to correlations. The interventional setting tends to be more robust and model-agnostic, especially when conditional distributions are hard to estimate [7]. For these reasons, it is often preferred for feature attribution when the focus is on direct effects or in the presence of complex or unknown dependencies. The conditional setting requires assumptions about the data distribution that we do not make or address in this work.
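
In standard notation (our symbols, not necessarily the paper's), the two value functions being contrasted are:

```latex
v_{\mathrm{int}}(S) \;=\; \mathbb{E}_{z \sim \mathcal{D}}\!\left[ f(x_S,\, z_{\bar S}) \right],
\qquad
v_{\mathrm{cond}}(S) \;=\; \mathbb{E}\!\left[ f(X) \,\middle|\, X_S = x_S \right],
```

where $\mathcal{D}$ is the background distribution; the interventional form never requires estimating the conditional distribution of the missing features.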

On the positioning vs. the theoretical works [1]-[5]: Regarding recent theoretical advances that also compute SHAP values for models defined on the Boolean cube, there are fundamental differences.

(i) Firstly, all of the works except [4] address the conditional SHAP setting, not the interventional setting we study (see above). This point is already mentioned in the related work Section A.3 for [1].

(ii) None of the works above addresses the case where the model is given by a sparse Fourier representation, nor provides a closed-form solution for SHAP values in that case, which is our main contribution.

(iii) These papers are all fundamentally very theoretical in nature and use a symbolic, logic-based approach rooted in tractable circuits and complexity theory. Their results are rooted in complexity classes and are of the type "there exists a polynomial-time algorithm that solves the problem". Our approach is more practical, with a concrete algorithm. Our guarantees are purely spectral and more in the spirit of machine learning papers; this can be seen especially when invoking the NTK for the case of MLPs.

[4]: This paper has a few results in the interventional setting. There is a relevant result showing that computing SHAP values is #P-hard for MLPs. This is in a theoretical setting where an MLP is considered as a structure with arbitrary weights. We invoke a more practical setting using NTK theory, which stipulates that models trained with gradient descent end up with weights that cannot be an arbitrary set of values. See Section 3.1 of our paper, which explains the clear distinction between these settings.

[2], [3]: In addition to all the points mentioned above, the papers [2] and [3] have the additional constraint that the predictor is a classifier that is either binary [2] or takes finitely many values in $\{1, \dots, K\}$ [3].

On the positioning vs. FastSHAP: As explained in the previous rebuttal as well, FourierSHAP is amortized because the computationally expensive step of fitting a global sparse Fourier surrogate is performed only once; after this, computing SHAP values for any number of new queries is extremely fast and efficient. This means the high up-front cost is distributed across many predictions, making individual SHAP evaluations nearly instantaneous, similar to other amortized SHAP approaches such as FastSHAP.

References

[7] Mukund Sundararajan and Amir Najmi. The many shapley values for model explanation. In International conference on machine learning, pages 9269–9278. PMLR, 2020.

[8] Dominik Janzing, Lenon Minorics, and Patrick Blöbaum. Feature relevance quantification in explainable AI: A causal problem. In Silvia Chiappa and Roberto Calandra, editors, Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 2907–2916. PMLR, 26–28 Aug 2020.

[9] Judea Pearl. Causality. Cambridge university press, 2009.

Comment

I would like to thank the authors for their response. However, I feel that some of my points were not well-received. For example, I pointed out relevant works on computing exact Shapley values that are not discussed in the paper. While the authors have explained the differences well in their response, such a discussion should be included in the paper itself to better position the contribution within the existing literature.

I appreciate the authors' engagement in the discussion, but I will maintain my original score.

Comment

If accepted, we will update the manuscript's related work section with [3, 4, 5].

Final Decision

This paper presents a computationally efficient, exact calculation of Shapley values for models with categorical features via a sparse Fourier representation.

It has split the reviewers:

  • 4ytr argues for rejection on the grounds that the paper adds little to, e.g., Van den Broeck et al. (2022) and Arenas et al. (2023), in addition to not sufficiently recognising those papers' contributions.
  • pDrB favours acceptance, finding FourierSHAP novel, efficient and fast, while expressing some concerns about exposition.
  • UB9P strongly favours acceptance - but does note that the Section 4 exposition is dense, and - thus - the reviewer cannot vouch for the results. No reviewer has found an error in the paper.
  • ihkk favours acceptance as well, also citing the paper's novelty, efficiency and speed. This reviewer's impression has improved with the author engagement - hoping that some of the points arising in correspondence will be incorporated into the paper.

The authors, replying to 4ytr, emphasise:

  1. their analysis applies to interventional/baseline Shapley values, in contrast to the cited papers.
  2. none of the cited papers work through the Fourier representation, which allows a closed form representation.
  3. their approach is explicitly constructive, presenting a closed form solution, while the cited work tends to present existence results.

My review of van den Broeck et al. and Arenas et al. largely favours the authors' claims:

  1. vdB explicitly works with the 'original' Lundberg & Lee (observational) Shapley value. Arenas et al. are not explicit about their choice, but do not seem to adopt any sort of do() formalism.
  2. neither vdB et al. nor Arenas et al. mention Fourier representations. I do not know the literature well enough to know whether there is an isomorphism between the formalisms considered. I did not notice closed form solutions in vdB or Arenas.
  3. vdB and Arenas do both consider algorithms.

Thus, while I - with 4ytr - would like to see a more careful comparison to the existing literature, I also share the view of the positive reviewers, that the paper does seem to make a contribution to the literature.

12/09/25: updated Meta-Review to 'accept' from 'accept (oral)', following query from PCs: I had understood 'oral' to be a step up from 'poster', rather than a step up from 'highlight'. Above all, I would follow the judgement of the SAC, who has written one of the main papers that this paper builds upon.