PaperHub
Rating: 7.3/10 (Spotlight)
3 reviewers (min 6, max 8, std 0.9)
Individual ratings: 8, 6, 8
Confidence: 3.0
Correctness: 3.7 | Contribution: 3.0 | Presentation: 3.3
ICLR 2025

Provably Accurate Shapley Value Estimation via Leverage Score Sampling

OpenReview | PDF
Submitted: 2024-09-24 | Updated: 2025-04-28
TL;DR

We propose a theoretically motivated method for estimating Shapley values that outperforms Kernel SHAP.

Abstract

Originally introduced in game theory, Shapley values have emerged as a central tool in explainable machine learning, where they are used to attribute model predictions to specific input features. However, computing Shapley values exactly is expensive: for a model with $n$ features, $O(2^n)$ model evaluations are necessary. To address this issue, approximation algorithms are widely used. One of the most popular is the Kernel SHAP algorithm, which is model agnostic and remarkably effective in practice. However, to the best of our knowledge, Kernel SHAP has no strong non-asymptotic complexity guarantees. We address this issue by introducing *Leverage SHAP*, a light-weight modification of Kernel SHAP that provides provably accurate Shapley value estimates with just $O(n\log n)$ model evaluations. Our approach takes advantage of a connection between Shapley value estimation and agnostic active learning by employing *leverage score sampling*, a powerful regression tool. Beyond theoretical guarantees, we find that Leverage SHAP achieves an approximately 50% reduction in error compared to the highly optimized implementation of Kernel SHAP in the widely used SHAP library [Lundberg & Lee, 2017].
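To make the $O(2^n)$ cost mentioned in the abstract concrete, here is a small illustrative sketch (ours, not from the paper) that computes exact Shapley values by brute-force enumeration of subsets; the toy value function `v` is an assumption for illustration only:

```python
from itertools import combinations
from math import factorial

def exact_shapley(v, n):
    """Exact Shapley values via the summation formula:
    phi_i = sum over S not containing i of
            |S|! (n-|S|-1)! / n! * (v(S + {i}) - v(S)).
    Requires O(2^n) evaluations of v, so it is only feasible for small n."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                phi[i] += weight * (v(set(S) | {i}) - v(set(S)))
    return phi

# Toy (nonlinear) value function on 3 players: v(S) = (sum of indices in S)^2.
v = lambda S: float(sum(S)) ** 2 if S else 0.0
phi = exact_shapley(v, 3)
# Efficiency axiom: the values sum to v(full set) - v(empty set).
assert abs(sum(phi) - v({0, 1, 2})) < 1e-9
```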
Keywords

Explainable AI, Active Regression, Shapley Values, Leverage Scores

Reviews and Discussion

Review (Rating: 8)

This paper modifies the Kernel SHAP algorithm, which approximates Shapley values and quantifies, in a model-agnostic way, the role of each feature in a model prediction. Kernel SHAP did not enjoy theoretical convergence guarantees, and the authors propose a modification of this algorithm that offers such a guarantee along with similar numerical performance.

Strengths

This paper is very well written, introduces the context of their work beautifully, and provides both a theoretical and practical contribution to the field.

Weaknesses

A weakness is that the work might feel niche; as a non-specialist in interpretable AI, I cannot judge the importance of Shapley values. If this information is important, then the authors' contribution is quite significant, because their theoretical contribution removes a level of heuristic reasoning.

Questions

N.A.

Comment

Dear Reviewer m993,

Thank you for your time and feedback!

We would like to add some context to the importance of approximating Shapley values. In the explainable AI literature, Shapley values are the de facto method for providing explanations for machine learning predictions on tabular data. In particular, the original SHAP paper by Lundberg and Lee titled “A Unified Approach to Interpreting Model Predictions” has more than 25,000 citations. Further, the corresponding SHAP library (of which Kernel SHAP is probably the most widely utilized algorithm) has been used by more than 20,000 repositories, as reported by GitHub. As such, we believe Leverage SHAP has the potential for significant impact because of its superior empirical performance and its strong, non-asymptotic theoretical guarantees.

Review (Rating: 6)

The paper develops a computationally efficient method for approximating Shapley values, which are useful in interpretable machine learning. The proposed method provides a modification of the well-known Kernel SHAP method, with additional non-asymptotic theoretical convergence guarantees and better empirical performance. The authors use a modified sampling-without-replacement approach to optimize their method and report experiments with ablation studies for the optimizations.

Strengths

  • Estimating Shapley scores accurately and efficiently is an important problem in explainable machine learning. The paper provides a theoretically principled approach for this problem.
  • The approach seems to outperform Kernel SHAP and optimized Kernel SHAP baselines in the experiments.

Weaknesses

  • The main theoretical result (Theorem 1.1) is somewhat unsatisfactory, as it does not directly compare the true and estimated Shapley values. The authors address this via Corollary 4.1, but it has a non-intuitive $\gamma$ term which can be large and makes the approximation guarantees weaker. Are there conditions under which $\gamma$ is guaranteed to be small? This would help better understand the limitations of the current theoretical results.
  • The experiments could include more baselines like (Jethani et al., 2021), (Mitchell et al., 2022b), and (Yang & Salman, 2019) for a more comprehensive comparison with the state-of-the-art.
  • The technical novelty in proving the new theoretical results also appears to be limited. The main result seems similar to the active learning guarantee (Theorem 1.1) of Shimizu et al., ICLR 2024, and it is not clear what additional technical insights are needed to develop the current result. A discussion of the novel technical insights needed would be helpful.

Typos:

  • Line 155; Line 192 "finte"; Line 255 "contrained"
  • References are incorrectly bracketed

Questions

  • Which algorithm in the literature does the Optimized Kernel SHAP in the experiments correspond to?
  • Do the theoretical guarantees for Leverage SHAP continue to hold when optimizations like paired sampling and sampling without replacement are not applied? Is there a difference in the guarantees with and without the optimizations?
  • In Table 2, Leverage SHAP w/o Bernoulli sampling seems to outperform in some cases. Is there a way to understand this? An analysis or discussion around this would be very useful to better understand the method's behavior.
Comment

We note that $\gamma$ is often quite small in experiments. Since computing $\gamma$ requires writing down the entire (exponentially large) regression problem, we can only compute $\gamma$ for datasets with small $n$. For each of these datasets, we train a model 100 times (from different random initializations), which induces a different vector $b$ each time. Then we compute $\gamma = \|A\phi - b\|_2^2 / \|A\phi\|_2^2$. For all datasets, we found that the 75th percentile of $\gamma$ was less than 2. Please see below for the full summary statistics.

| Dataset    | $n$ | 1st Quartile | 2nd Quartile | 3rd Quartile |
|------------|-----|--------------|--------------|--------------|
| IRIS       | 4   | 0.000234     | 0.266        | 1.03         |
| California | 8   | 0.158        | 0.298        | 0.449        |
| Diabetes   | 10  | 0.174        | 0.321        | 0.513        |
| Adult      | 12  | 0.180        | 0.395        | 0.703        |
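For readers who want to reproduce a quantity like this on a toy problem, the following sketch (our construction, not the authors' code) builds the standard Shapley-kernel weighted regression over all proper nonempty subsets and computes $\gamma = \|A\phi - b\|_2^2 / \|A\phi\|_2^2$ by brute force; the exact matrix $A$ and weighting used in the paper may differ in details:

```python
import numpy as np
from itertools import combinations
from math import comb

def gamma_ratio(v, n, phi):
    """gamma = ||A phi - b||_2^2 / ||A phi||_2^2 for the (assumed)
    Shapley-kernel weighted regression. Rows are sqrt(w(S)) * indicator(S)
    over proper nonempty subsets S, with b_S = sqrt(w(S)) * (v(S) - v({}))
    and the standard Shapley kernel weight w(S) = (n-1)/(C(n,|S|)|S|(n-|S|))."""
    rows, rhs = [], []
    for k in range(1, n):
        w = (n - 1) / (comb(n, k) * k * (n - k))
        for S in combinations(range(n), k):
            z = np.zeros(n)
            z[list(S)] = 1.0
            rows.append(np.sqrt(w) * z)
            rhs.append(np.sqrt(w) * (v(set(S)) - v(set())))
    A, b = np.array(rows), np.array(rhs)
    resid = A @ phi - b
    fit = A @ phi
    return float(resid @ resid / (fit @ fit))

# Sanity check: for a linear value function the regression fits exactly,
# so gamma should be 0 (the case discussed in the authors' replies).
alpha = np.array([1.0, 2.0, 3.0, 4.0])
v = lambda S: sum(alpha[i] for i in S)
assert gamma_ratio(v, 4, alpha) < 1e-12
```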
Comment

Dear Reviewer 7UDD,

Thank you for your time and feedback! We address your concerns here and your questions in a comment below.

Corollary 4.1 [...] has a non-intuitive $\gamma$ term which can be large and makes the approximation guarantees weaker.

We agree that the dependence on $\gamma$ is not ideal, although dependence on this quantity seems inherent to all regression-based algorithms, including Kernel SHAP. This is illustrated in Figures 5 and 6. As discussed in our response to Reviewer LX1d, it would be interesting to consider lower bounds for Shapley value estimation in future work: it may be that the dependence on $\gamma$ is unavoidable for any method that, like our method, only uses a near-linear number of value function evaluations.

Are there conditions under which $\gamma$ is guaranteed to be small?

This is an interesting question. $\gamma$ is known to equal $0$ for linear value functions, which covers, e.g., the case when Shapley values are used for feature attribution in linear machine learning models (when the baseline is the all-zeros vector). We are not sure if there are other, more general conditions under which $\gamma$ can be bounded, but it is a nice question for future work.

We do note that $\gamma$ is often quite small in experiments. Since computing $\gamma$ requires writing down the entire (exponentially large) regression problem, we can only compute $\gamma$ for datasets with small $n$. For each of these datasets, we train a model 100 times (from different random initializations), which induces a different vector $b$ each time. Then we compute $\gamma = \|A\phi - b\|_2^2 / \|A\phi\|_2^2$. For all datasets, we found that the 75th percentile of $\gamma$ was less than 2. Please see a comment below for all the summary statistics.

The experiments could include more baselines like…

Thank you for these suggestions. We address each paper below.

(Jethani et al., 2021)

The Fast SHAP algorithm (Jethani et al., 2021) is complementary to our setting because the algorithm is trained (using many value function evaluations) to predict Shapley values for multiple explicands and baselines. In contrast, the algorithms we consider, like Kernel SHAP and our Leverage SHAP method, estimate the Shapley values for a single value function $v$ corresponding to a single explicand and baseline pair, without any pretraining. Which algorithm to use depends heavily on the setting and on whether the expensive training cost of Fast SHAP can be amortized over enough Shapley value computations. We will add further discussion of this point to the next version of the paper.

(Mitchell et al., 2022b)

We have added a comparison to the permutation algorithm (Strumbelj & Kononenko, 2010 and Mitchell et al., 2022), which we present in a comment below. We find that Permutation SHAP sometimes outperforms Optimized Kernel SHAP but only very rarely does it outperform Leverage SHAP.

(Yang & Salman, 2019)

The only paper we could find with these two authors + year is the paper “A Fine-Grained Spectral Perspective on Neural Networks”, which we do not believe is relevant to Shapley value estimation. Perhaps you meant a different reference? Please let us know.

The technical novelty in proving the new theoretical results also appears to be limited...

We do not consider the proof of our Theorem 1.1 to be a main technical contribution of the paper. It is well known that leverage score sampling provides a sample complexity bound of $O(n \log n + n/\epsilon)$ for general active linear regression problems, of which the Shapley value estimation problem can be viewed as a special case. Such results date back to work by Sarlos (FOCS, 2006) and others; a good reference is the book *Sketching as a Tool for Numerical Linear Algebra* (Woodruff, 2014). While our result requires some extra work (to handle paired sampling and sampling without replacement), the proof is a straightforward generalization of this prior work. Theorem 1.1 from Shimizu et al. could likely be used as well, since our paired-sampling distribution satisfies the $\ell_\infty$-independence condition in that paper. However, that result naively only applies to $k$-homogeneous sampling distributions, which our distribution is not (since we do not sample a fixed number of rows from $A$). Modifying our approach to match the Shimizu result is likely possible, although ultimately it was simpler to prove the guarantee we needed directly.

Instead, we consider the technical contribution of the paper to be the realization that 1) leverage score sampling can be directly applied to the Shapley value estimation problem, and 2) this can be done in a computationally efficient way given the structure of the matrix $A$. The second point is important: applying prior work on leverage scores directly would suggest the need to compute these scores, of which there are $2^n$ for the Shapley value estimation problem.

Typos

Fixed, thank you!

Comment

Due to space constraints, we answer your questions here.

Which algorithm in the literature does the Optimized Kernel SHAP in the experiments correspond to?

The Optimized Kernel SHAP algorithm is the Kernel SHAP implementation in the SHAP repository. The algorithm originally comes from the SHAP paper by Lundberg and Lee in 2017. It was modified to include paired sampling (a.k.a. antithetical sampling) after work by Covert and Lee in 2021. We will clarify this in the paper.

Do the theoretical guarantees for Leverage SHAP continue to hold when optimizations like paired sampling and sampling without replacement are not applied? Is there a difference in the guarantees with and without the optimizations?

Yes, the exact same theoretical guarantees for Leverage SHAP still hold even without paired sampling and sampling without replacement. However, the optimizations have a significant impact on empirical efficiency. It would be great if this difference were reflected in the theory, although we suspect it likely comes down to constant factors in the analysis, which are difficult to make sharp (since, e.g., we rely on matrix concentration bounds without sharp constants).
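As a rough illustration of what paired sampling means here (our sketch, with details assumed rather than taken from the paper), each drawn subset is emitted together with its complement, so the two samples are perfectly anticorrelated:

```python
import numpy as np

def paired_subset_masks(n, m, rng):
    """Sketch of paired (antithetic) subset sampling: draw a random subset
    mask and also emit its complement. Pairing like this is a classic
    variance-reduction trick used in Kernel SHAP-style estimators; the
    paper's actual sampling distribution is more refined than this."""
    masks = []
    for _ in range(m // 2):
        mask = rng.random(n) < 0.5
        masks.append(mask)
        masks.append(~mask)  # the paired, complementary subset
    return np.array(masks)

masks = paired_subset_masks(n=6, m=10, rng=np.random.default_rng(0))
# Every consecutive pair partitions the n players between the two subsets.
assert np.all(masks[0::2] ^ masks[1::2])
```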

In Table 2, Leverage SHAP w/o Bernoulli sampling seems to outperform in some cases. Is there a way to understand this? An analysis or discussion around this would be very useful to better understand the method's behavior.

Thank you for bringing this to our attention! There was a bug in our code that was causing Leverage SHAP to take slightly fewer than $m$ samples in expectation, hurting its performance in comparison to Leverage SHAP w/o Bernoulli sampling. We will resolve this issue and rerun the experiments. Doing so should slightly improve the results for Leverage SHAP.

Comment

We have added a comparison to the permutation algorithm (Strumbelj & Kononenko, 2010 and Mitchell et al., 2022), which we present below. We find that Permutation SHAP sometimes outperforms Optimized Kernel SHAP but only very rarely does it outperform Leverage SHAP.

Optimized Kernel SHAP

|              | IRIS     | California | Diabetes | Adult    | Correlated | Independent | NHANES   | Communities |
|--------------|----------|------------|----------|----------|------------|-------------|----------|-------------|
| Mean         | 3.28e-14 | 0.00248    | 2.33     | 1.81e-05 | 0.000739   | 0.000649    | 0.00551  | 21.8        |
| 1st Quartile | 2.12e-14 | 0.000279   | 0.549    | 2.16e-06 | 0.00027    | 0.000187    | 0.000707 | 5.85        |
| 2nd Quartile | 3.55e-14 | 0.00138    | 1.26     | 5.43e-06 | 0.000546   | 0.000385    | 0.0024   | 13.0        |
| 3rd Quartile | 4.22e-14 | 0.0036     | 3.03     | 1.63e-05 | 0.00101    | 0.000964    | 0.00665  | 25.1        |

Permutation SHAP

|              | IRIS     | California | Diabetes | Adult    | Correlated | Independent | NHANES   | Communities |
|--------------|----------|------------|----------|----------|------------|-------------|----------|-------------|
| Mean         | 0.000526 | 0.000968   | 2.53     | 1.77e-05 | 0.000698   | 0.000806    | 0.00499  | 28.4        |
| 1st Quartile | 4.06e-14 | 0.000138   | 0.293    | 9.31e-07 | 8.79e-05   | 9.32e-05    | 0.000359 | 5.41        |
| 2nd Quartile | 5.57e-10 | 0.000402   | 0.968    | 3.51e-06 | 0.000428   | 0.000402    | 0.00144  | 14.7        |
| 3rd Quartile | 7.04e-06 | 0.00101    | 3.14     | 1.04e-05 | 0.000917   | 0.001       | 0.00438  | 36.3        |

Leverage SHAP

|              | IRIS     | California | Diabetes | Adult    | Correlated | Independent | NHANES   | Communities |
|--------------|----------|------------|----------|----------|------------|-------------|----------|-------------|
| Mean         | 3.28e-14 | 0.000186   | 0.63     | 5.21e-06 | 0.000458   | 0.000359    | 0.00385  | 14.7        |
| 1st Quartile | 2.12e-14 | 1.91e-05   | 0.0631   | 6.3e-07  | 0.000139   | 9.51e-05    | 0.000333 | 3.6         |
| 2nd Quartile | 3.55e-14 | 8.31e-05   | 0.328    | 2.33e-06 | 0.000376   | 0.000235    | 0.00149  | 8.9         |
| 3rd Quartile | 4.22e-14 | 0.000231   | 0.769    | 7.09e-06 | 0.000617   | 0.000556    | 0.00401  | 15.3        |
Comment

I thank the authors for clarifying my questions and appreciate the updates and responses to my review. I have updated my score based on the rebuttal.

Review (Rating: 8)

This paper provides a new algorithm for approximating Shapley values with provable guarantees. Shapley values have widespread application in ML as a way of formalizing individual feature contributions to a model's final prediction, although this paper considers the fully general setting in terms of a generic value function $v : 2^{[n]} \to \mathbb{R}$ (where $n$ is the number of features or "players").

The algorithm builds on a known way of formulating the Shapley value as the solution of a certain $2^n$-dimensional linear regression problem. The widely used "Kernel SHAP" algorithm approximates the solution by essentially subsampling the rows of the design matrix in a certain way. But a more principled approach is to subsample according to leverage scores, a well-studied concept in statistics. The catch is that, naively, leverage scores take time polynomial in the size of the design matrix (so $2^{O(n)}$) to compute. The key idea the authors exploit to get around this is that for the specific Shapley design matrix, the leverage scores can actually be written down in a simple closed form.
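To sketch the leverage score toolbox the review refers to, here is generic i.i.d. leverage score sampling for least squares (our illustration; the paper uses closed-form scores for the Shapley matrix plus paired, without-replacement refinements, none of which appear here):

```python
import numpy as np

def leverage_scores(A):
    """The leverage score of row i of A is the squared norm of the i-th
    row of U, where A = U S V^T is a thin SVD. Scores sum to rank(A)."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.sum(U ** 2, axis=1)

def leverage_sampled_lstsq(A, b, m, rng):
    """Sample m rows i.i.d. with probability proportional to leverage
    scores, reweight each row by 1/sqrt(m p_i), and solve the smaller
    least squares problem on the sampled rows only."""
    p = leverage_scores(A)
    p = p / p.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=p)
    scale = 1.0 / np.sqrt(m * p[idx])
    x, *_ = np.linalg.lstsq(A[idx] * scale[:, None], b[idx] * scale, rcond=None)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(500, 5))
x_true = rng.normal(size=5)
b = A @ x_true  # b lies in the column span, so the subsampled solve is exact
x_hat = leverage_sampled_lstsq(A, b, m=50, rng=rng)
assert np.allclose(x_hat, x_true, atol=1e-8)
```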

This allows them to efficiently solve the underlying regression problem and carry over known guarantees from the leverage score toolbox. Specifically, they show that the estimated Shapley values are close to the true Shapley values in a certain sense (both in terms of the optimum achieved and the values themselves). They show further refinements using paired sampling without replacement.

Finally, they show various experiments demonstrating that the resulting algorithm indeed outperforms the previous best Kernel SHAP algorithm in practice.

Strengths

Shapley values are a basic and important topic in interpretable AI and beyond, finding wide application in practice. The problem of efficiently estimating them well is a very well-motivated one. This paper makes a very nice and useful contribution to this problem. The key theoretical insight of analyzing the form of the leverage scores is simple but very clever and elegant, and allows them to make use of a very well-studied toolbox in statistics (although there is still technical work to be done). It immediately feels like the "right" way to solve the problem. The resulting algorithm is theoretically sound, clean, simple, as well as effective in practice. The paper is also very clearly written, with a clear description of all the relevant context as well as clear exposition in general. I did not verify all the proofs in complete detail but they seemed correct to me.

Weaknesses

I do not see any major weaknesses. I do think it would be helpful for the authors to discuss the limitations of the Leverage SHAP algorithm a bit more (e.g., does it strictly dominate all prior algorithms?), and to provide some context on what still remains open in this space (see below for related questions).

Questions

  1. It would be nice to have a fuller picture of the Shapley value problem from a broader approximation algorithms standpoint. The linear regression approach is ultimately just one approach to the problem. Do we have a sense for the best possible approximation one can hope for? Are there any computational inapproximability results?
  2. Also, it's natural to ask whether additional structure makes the problem easier (e.g. for the specific setting of feature attribution). I realize this is indeed true for decision trees and such. But I am curious if the authors have an idea of a "dream result" under less restrictive but still reasonable structural assumptions, especially for feature attribution. For feature attribution for a general black box model, is the Leverage SHAP approach likely to be the best?
  3. One thing that was not totally clear to me from the experiments is whether Leverage SHAP strictly dominates Kernel SHAP at every point in the running time vs accuracy graph. That is, for any fixed running time budget, is it always better to run Leverage SHAP? Theoretically, one concern could be that Leverage SHAP necessarily requires sampling $m = \Omega(n \log n)$ rows (IIUC based on Thm 1.1), whereas I believe Kernel SHAP allows you to pick $m$ arbitrarily (albeit without a guarantee). Thus perhaps for small running time budgets, maybe Kernel SHAP can sometimes be more effective than Leverage SHAP. Or perhaps Kernel SHAP is just better optimized as a practical implementation. I think Figure 3 (sample size $m$ vs error) nearly answers this question, but my question is whether there is any subtlety in how $m$ translates to actual running time. And in general if there is any catch to the "strictly dominates Kernel SHAP" question.

Minor nit: in two places (lines 132 and 266) there is a typo: "principal" -> "principle".

Comment

Dear Reviewer LX1d,

Thank you for your time and feedback! We address your questions below.

It would be nice to have a fuller picture of the Shapley value problem from a broader approximation algorithms standpoint …

Computationally, the dominant cost in estimating Shapley values for applications like feature explanation is almost always computing the value function $v(S)$. As such, our work, and much of the prior work, focuses on reducing the number of sets for which the function needs to be evaluated in order to estimate $\phi_i$ for all $i$. The most relevant papers that propose methods for doing so are discussed in Section 1.1. For example, Strumbelj & Kononenko 2010 and Mitchell et al. 2024 reuse samples by selecting a random chain $S_1 \subseteq S_2 \subseteq \cdots \subseteq S_n$ from a random permutation and using $v(S_i)$ to evaluate both $v(S_{i+1}) - v(S_i)$ and $v(S_i) - v(S_{i-1})$ when computing a standard Monte Carlo estimate of the summation form of the Shapley values (our Equation 1). In general, the regression approach taken by Kernel SHAP tends to be more efficient than alternative approaches like this, which is why we focus on improving that method.
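The permutation estimator described above can be sketched as follows (our illustration; `v` is any user-supplied set function), where one pass over a random permutation reuses each prefix evaluation $v(S_i)$ for two consecutive marginal contributions:

```python
import numpy as np

def permutation_shap(v, n, n_perms, rng):
    """Monte Carlo Shapley estimate in the Strumbelj & Kononenko style:
    average each player's marginal contribution when appended to the
    prefix of a uniformly random permutation. Each permutation costs
    only n evaluations of v because prefix values are reused."""
    phi = np.zeros(n)
    for _ in range(n_perms):
        prefix, prev = set(), v(set())
        for i in rng.permutation(n):
            prefix.add(i)
            cur = v(prefix)
            phi[i] += cur - prev  # marginal contribution of player i
            prev = cur
    return phi / n_perms

# For a linear v every marginal is constant, so the estimate is exact
# no matter which permutations are drawn.
alpha = np.array([1.0, 2.0, 3.0])
v = lambda S: sum(alpha[i] for i in S)
est = permutation_shap(v, 3, n_perms=20, rng=np.random.default_rng(0))
assert np.allclose(est, alpha)
```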

Are there any computational inapproximability results?

This is an interesting question. As mentioned, since evaluating the value function $v$ generally dominates the complexity of computing Shapley values, we believe a query complexity model would be a natural way to address this question, i.e., how many queries to $v$ are required to estimate the Shapley values to some given level of accuracy? In this model, $\Omega(n)$ is a natural lower bound on the number of function evaluations needed to exactly compute the Shapley values, and we conjecture that this lower bound could be improved to $\Omega(n/\epsilon)$ for obtaining a $(1+\epsilon)$-approximation in the sense provided in our paper.

Here’s an informal argument for the $\Omega(n)$ lower bound: consider the special linear setting where the set function is given by $v(S) = \sum_{i \in S} \alpha_i$ for some set of weights $\alpha_1, \ldots, \alpha_n$. In this setting, it can be seen that the Shapley values are exactly equal to $\alpha_1, \ldots, \alpha_n$, so we must learn these weights exactly to learn the Shapley values. If we query $v(S)$ for fewer than $n$ choices of $S$, we obtain a linear system with more unknowns than equations, and so cannot determine the values of $\alpha_1, \ldots, \alpha_n$.
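This informal argument can be checked numerically (a sketch we added, not from the paper): with $k < n$ subset queries of a linear $v$, the query matrix has a nontrivial null space, so two different weight vectors, and hence two different sets of Shapley values, are consistent with the same answers:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 4                                        # k < n queries of v(S) = sum_{i in S} alpha_i
M = rng.integers(0, 2, size=(k, n)).astype(float)  # indicator rows of the queried sets
alpha = rng.normal(size=n)
answers = M @ alpha                                # what the k queries reveal

# Any direction in the null space of M changes alpha without changing answers.
_, _, Vt = np.linalg.svd(M)                        # full SVD: Vt is n x n
null_dir = Vt[-1]                                  # a right singular vector in null(M)
alpha_alt = alpha + null_dir

assert np.allclose(M @ alpha_alt, answers)         # indistinguishable from the queries...
assert not np.allclose(alpha_alt, alpha)           # ...yet different Shapley values
```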

As for obtaining a stronger lower bound, a natural starting point would be the $\Omega(n/\epsilon)$ lower bound proven for general active linear regression problems in the COLT 2019 paper "Active Regression via Linear-Sample Sparsification" by Chen and Price. This at least rules out a better algorithm based on linear regression; we will have to think more about whether it can be extended to a general sample complexity lower bound for Shapley value estimation. It's a great question for future work!

But I am curious if the authors have an idea of a "dream result" under less restrictive but still reasonable structural assumptions…

Our $O(n \log n + n/\epsilon)$ guarantee is already quite close to what we conjecture is optimal, so we think of it as almost a "dream result". We think it would be interesting to consider whether other problem "structure" can lead to even better sample complexity, although for now we have focused on the general problem. Shapley values are used for a wide variety of machine learning models, and Shapley value estimation algorithms like Kernel SHAP are often treated as a "black box", so as a first step it's important to obtain results that work without additional assumptions.

One additional "dream" would be to remove the $\gamma$ factor in the $\ell_2$-approximation guarantee. We believe this factor is inherent for regression-based approximation algorithms (as experimentally confirmed in Figures 5 and 6). However, we could imagine a non-regression-based algorithm that does not have a dependence on the quality of the optimal regression solution, as measured by $\gamma$.

One thing that was not totally clear to me from the experiments is whether Leverage SHAP strictly dominates Kernel SHAP at every point in the running time vs accuracy graph…

All of the regression-based methods (including Optimized Kernel SHAP and Leverage SHAP) need at least $n$ samples so that the approximate regression problem has full rank, hence all our experiments are run with $m \geq n$. In these experiments, across all eight datasets, we find that Leverage SHAP always dominates Optimized Kernel SHAP (except for cases when both methods manage to obtain the exact Shapley values up to machine precision, as in Column 1 of Table 1). You are correct that the guarantee on Leverage SHAP requires $m = \Omega(n \log n)$, but the method can still be run for smaller values of $m$, and we find experimentally that its performance remains superior for all $m \geq n$.

Due to space constraints, we continue in a comment.

Comment

I do think it would be helpful for the authors to discuss the limitations of the Leverage SHAP algorithm a bit more (e.g., does it strictly dominate all prior algorithms?)...

We find experimentally that Leverage SHAP dominates even Optimized Kernel SHAP on every dataset and setting of $m$. However, a limitation of the Leverage SHAP algorithm, and of regression-based algorithms including Optimized Kernel SHAP more generally, is that the quality of the approximation depends on the optimal regression solution. This is reflected by the $\gamma$ term in the $\ell_2$-norm approximation guarantee, and all regression-based algorithms have an empirical dependence on this factor, as shown in Figures 5 and 6.

Minor nit: in two places (lines 132 and 266) there is a typo: "principal" -> "principle".

Fixed, thank you!

Comment

Thank you for the responses. What I meant by hardness of approximation is formal results establishing, e.g., NP-hardness of approximating the solution up to some factor. The number of calls to $v$ seems to be only naively upper bounded by $2^n$ and lower bounded by $\Omega(n)$. This is a vast gap. What, for example, is the complexity of obtaining a constant factor approximation? Is it feasible in polynomial time (assuming $v$ can be evaluated in polynomial time for all $S$)? Or is it computationally hard? Hope the question makes more sense now.

The other responses are satisfactory and I will keep my score.

Comment

Thank you for clarifying. To ask about computational hardness of approximation, we need to be a bit more concrete about what exactly we mean by a "constant factor approximation". In your setting, Theorem 1.1 would already imply a polynomial time algorithm for finding approximate Shapley values $\tilde{\phi}$ that satisfy $\|Z\tilde{\phi} - y\| < C \|Z\phi - y\|$ for constant $C$. I.e., when the Shapley value problem is viewed as an optimization problem, we provide a constant factor approximation in polynomial time.

On the other hand, suppose we wanted to, e.g., ensure that $\frac{1}{C}\phi_i \leq \tilde{\phi}_i \leq C\phi_i$ for all $i$. We claim that obtaining this goal for any constant $C$ is NP-hard in many settings. In particular, to ask about computational hardness, we also need to be more concrete about what the input to the algorithm is: i.e., is $v$ given as a polynomial size circuit, a polynomial size formula, or as a black-box, unit-cost oracle? In all of these settings the problem is NP-hard to approximate. In particular, consider the case when $v$ either evaluates to $0$ for all sets, or evaluates to $0$ for all sets except for a single set $S$. In the first case, the Shapley values will all equal $0$, whereas in the second case, the Shapley values for indices in $S$ are non-zero. As such, to obtain a multiplicative approximation, we need to determine if there is some set $S$ for which $v$ does not evaluate to $0$. When $v$ is a black box, finding such a set clearly requires $\Omega(2^n)$ time (we can only enumerate all possible inputs). However, the problem is still hard when $v$ is given in other forms. For example, if $v$ is given as a circuit, then determining if there is an $S$ for which $v(S) \neq 0$ is equivalent to the NP-complete circuit SAT problem.

We will add comments on this in the next version of the paper, as others might have the same question. This argument also makes it clear why asking for a multiplicative approximation is not reasonable, and why instead we might care about an approximation in objective value. This is the case for many computational problems: for example, for NP-hard optimization problems like set cover, we do not typically ask to approximate the optimal solution itself, but instead to find another solution with nearly the same objective value as the optimal solution (whether or not it is similar to that optimal solution).

Comment

Thank you for the further clarifications. It seems like there are nontrivial subtleties in properly formulating the problem in a way that gets at what we want (IMO, that $v$ is a black-box oracle and that we care most about query complexity). I definitely agree, of course, that in many approximation algorithm settings one wants an approximation to the objective function and not to the solution itself. It is just that Shapley values are an unusual case where arguably the "primary" definition is direct (namely Eq 1), and only as a secondary matter do we start to view it as the solution of a certain optimization problem (Eq 2). Somehow the objective function seems a bit more post-hoc / less canonical here.

In any case, the takeaway for me is that perhaps a hardness of approximation view is not so straightforward. I do think some of this discussion would be suitable somewhere in the paper or an appendix.

AC Meta-Review

This work considers the important problem of Shapley value estimation. The authors show that for the specific design matrix involved, there is a closed-form solution that can be used for leverage score sampling, which reduces the computational cost by orders of magnitude. All reviewers agree that the paper is well written and easy to follow, and that the proposed algorithm is interesting and significant.

Additional Comments on Reviewer Discussion

The authors addressed clarity questions on their technical contribution during the rebuttal. The overall rating is high, and thus there was no AC-reviewer discussion.

Final Decision

Accept (Spotlight)