PaperHub
Average rating: 6.0 / 10
Decision: Rejected (3 reviewers)
Ratings: 5, 8, 5 (min 5, max 8, std 1.4)
Confidence: 3.7 · Correctness: 2.7 · Contribution: 2.7 · Presentation: 3.0
ICLR 2025

Certified Robustness to Data Poisoning in Gradient-Based Training

OpenReview · PDF
Submitted: 2024-09-28 · Updated: 2025-02-05
TL;DR

This work introduces provable guarantees on the behavior of models trained with potentially manipulated data, based on convex relaxations of the reachable set of parameters.

Abstract

Keywords
Data Poisoning, Certified Robustness, Neural Networks

Reviews and Discussion

Review (Rating: 5)

This paper proposes to certify the robustness of a model trained on a possibly poisoned dataset through the robustness of its parameters. Specifically, the model performance is bounded by the worst-case performance over all parameters that can result from training on any poisoned dataset satisfying a capability constraint on the attacker. The authors therefore propose to certify parameter-space bounds and provide a practical algorithm when the model is a fully-connected neural network. They demonstrate the certified accuracy obtained by the proposed method on various datasets.

Strengths

  1. The paper is well-organized and easy to follow.
  2. This paper provides an alternative perspective to certify the model performance through the robustness of model parameters.
  3. The topic of AI security and robustness against malicious model attacks has attracted a significant amount of attention in recent years.

Weaknesses

Major Concerns:

  1. While it is intuitive that the variation of model parameters can serve as a tool to evaluate robustness, the guarantee it provides is extremely loose, at least as presented in this work. In particular, the authors consider the worst-case parameter interval against all possible poisoned datasets as in Eq. (6), and relax it further to Eqs. (8) and (9). It is questionable how useful such a valid but loose bound will be. This concern is to some extent confirmed by the experimental results: (i) in Fig. 1, either a small perturbation level or a small number of poisoned data points is considered to avoid meaningless results, and in Fig. 1(c) flipping only 4 labels yields an almost vacuous certified bound; (ii) in Appendix H, the authors acknowledge that their bound is extremely loose compared to real-world attacks and the corresponding model performance.

  2. The practical evaluation of Eq. (6). As acknowledged by the authors in Section 3, obtaining a parameter interval via Algorithm 1 is intractable in general. The authors provide an approach to solve Eq. (10) for fully-connected neural networks only. It seems too ambitious to derive the reachable parameter interval against all potential attacks. Even if we can somehow approximate it, as mentioned in the first point, this interval may turn out to be vacuous.

  3. Attack goals formulated in Section 2.2. While the authors formulate the attacker's goal as maximizing the training loss, as in Eqs. (3), (4) and (5), it makes more sense to consider controlling the generalization error or the test loss, because people are generally more interested in test performance. To this end, the experiments should also report the certified and true performance on test datasets. Moreover, Eq. (5) is closer to an adversarial attack than to backdoor poisoning. A backdoor poisoning attack has the dual goals of preserving performance on clean data and ensuring that backdoored data are predicted as the target label.

Minor Concerns:

  1. It is unclear to me what the closed-form solution to Eq. (10) is for neural networks in terms of $\epsilon, \nu, n, m$. An algorithm explaining these details would help readers better understand it.

  2. Some results are neither proved nor properly referenced, such as Theorems 3.3 and B.1, and Propositions 1 and 2.

Questions

Please see the Weaknesses section above.

Comment

We thank the reviewer for their consideration of our submission and their useful feedback. Below we address the weaknesses they raise.

Proofs

We thank the reviewer for pointing out the missing proofs and the mislabelled proposition. We have changed the labels on Propositions 1 and 2 to be the same, since they refer to the same result. Theorems 3.3 and B.1 present descent direction bounds for the bounded and unbounded adversaries, respectively. We have included proofs of all theorems and propositions in the updated manuscript.

Lack of scalability

While the reviewer is correct in their assessment that our proposed methodology over-approximates the worst-case parameter interval, it is not the case that the corresponding guarantees are vacuous. In particular, we point the reviewer to our later experimental results, such as UCI regression, OCT-MNIST, and PilotNet, where non-vacuous guarantees are provided for poisoning attacks of up to m=10000, m=600, and m=600, respectively.

As acknowledged extensively in the paper, our proposed algorithm has particular problem classes for which the bounds are relatively weaker, such as certain classification and training settings. We agree that in these settings our guarantees may not be practically useful, for example the weak guarantees for label-flipping in the halfmoons or MNIST datasets. However, these results are included both to highlight the particular advantages and limitations of our method, and as a starting point for future work. Additionally, the weakness of our method on a particular setting does not preclude the strong guarantees that we observe for other attack settings.

Outside the practical application of our robustness certificates, our approach offers a novel theoretical testing paradigm to understand the sensitivities of standard learning algorithms and data pipelines to poisoning attacks. As an example, though there exists a practical gap between our certificates and real poisoning attacks (Appendix H), models with weaker guarantees appear to be more susceptible to attack when compared with models with stronger guarantees (Figures 7 and 8). Therefore, even weak bounds can provide comparative insights into the robustness of training pipelines.

Practicality

The reviewer is correct that obtaining an exact parameter interval is not tractable. However, Algorithm 1 provides a means to compute an over-approximate parameter interval. This is looser than the true parameter interval but may still be useful in many settings, as addressed above. The over-approximate reachable parameter set is indeed a valid bound against all potential attacks (see Theorem 3.2).

Regarding model architecture, while in this paper we focus on feedforward networks, our algorithm can be extended to general architectures in the future.

Attack goals

We thank the reviewer for pointing out a lack of clarity in Section 2.2. We note that “untargeted” poisoning attacks specifically target the training loss with the goal of denial of service / preventing model convergence. This follows the taxonomy of poisoning attacks in [1]. Other poisoning attack goals, namely targeted poisoning and backdoor attacks, are evaluated over a separate test dataset. All results in the experimental sections are reported with respect to test datasets.

Regarding backdoor attacks, the loss as formulated represents the adversary's goal assuming any of the test data may be backdoored, and does not specify the clean data. As the "defenders" against the attack, we do not know a priori which of the test data is targeted by a backdoor attack. Thus, the formulated loss represents the performance on the test data assuming each point may be targeted. In practice, an attacker may only target a certain subset of the test data, or have more complicated attack goals. If this subset of data is known, it is trivial to use our parameter-space bounds to compute certificates on the clean and backdoored losses separately.

[1] Tian et al. 2022. “A Comprehensive Survey on Poisoning Attacks and Countermeasures in Machine Learning”

Closed-form gradient bounds

Solving the optimization problem in Eq. (10) is in general not tractable for neural network models. Instead, we provide an approach for soundly bounding the optimal value of this optimization problem via bound propagation. Bounding Eq. (10) requires the following steps:

  1. Compute bounds on the output of the neural network using our CROWN-based linear bound propagation algorithm.
  2. Given the bounds on the output of the neural network, compute bounds on the gradients of the network using interval bound propagation.

In principle, step 1 proceeds as in [2], but with our modified expressions for the equivalent weights and biases. Step 2 follows the interval bound propagation procedure as outlined in Appendix F.

[2] Zhang et al. 2018. "Efficient neural network robustness certification with general activation functions."
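To give a flavour of these two steps, the following toy sketch (illustrative only, not our actual implementation: it uses plain interval arithmetic in place of the CROWN-style bounds of step 1, and a hypothetical one-hidden-layer ReLU network with made-up perturbation radii) bounds the last-layer weight gradient of the squared-error loss 0.5*(out - y)^2 under interval-bounded inputs and parameters.

```python
import numpy as np

def interval_matmul(W_l, W_u, x_l, x_u):
    """Sound interval bound on W @ x for W in [W_l, W_u] and x in [x_l, x_u]."""
    W_c, W_r = (W_l + W_u) / 2, (W_u - W_l) / 2
    x_c, x_r = (x_l + x_u) / 2, (x_u - x_l) / 2
    centre = W_c @ x_c
    radius = np.abs(W_c) @ x_r + W_r @ np.abs(x_c) + W_r @ x_r
    return centre - radius, centre + radius

def interval_outer(a_l, a_u, b_l, b_u):
    """Elementwise interval bound on the outer product a b^T (corner enumeration)."""
    corners = np.stack([np.outer(a, b) for a in (a_l, a_u) for b in (b_l, b_u)])
    return corners.min(axis=0), corners.max(axis=0)

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))   # toy 3-4-1 ReLU network
x, y = rng.normal(size=3), np.array([1.0])                  # one (clean) training point
eps_x, eps_w = 0.1, 0.05                                    # hypothetical input / parameter radii

# Step 1 (here with plain IBP instead of CROWN): forward interval propagation.
z1_l, z1_u = interval_matmul(W1 - eps_w, W1 + eps_w, x - eps_x, x + eps_x)
h_l, h_u = np.maximum(z1_l, 0.0), np.maximum(z1_u, 0.0)     # ReLU is monotone
out_l, out_u = interval_matmul(W2 - eps_w, W2 + eps_w, h_l, h_u)

# Step 2: backward interval propagation. For the loss 0.5*(out - y)^2,
# dL/dW2 = (out - y) h^T, so bound the outer product of the two intervals.
grad_l, grad_u = interval_outer(out_l - y, out_u - y, h_l, h_u)
print("dL/dW2 lower:\n", grad_l, "\ndL/dW2 upper:\n", grad_u)
```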

Comment

I thank the authors for their detailed response.

I agree with the point that this paper provides a tool to understand the sensitivities of standard learning algorithms and insights into their robustness. The experimental results on OCTMNIST and PilotNet also demonstrate that the certification is valid.

Nevertheless, the tightness of the practical evaluation of such a bound remains concerning. It looks like the reachable parameter interval will explode after a few iterations as it grows larger and larger, and it is surprising to me that the results on PilotNet are non-vacuous. Also, is the calculation of $\Delta\Theta$ in Algorithm 1 (line 6), i.e. solving Eq. (10), realistic for models other than feed-forward neural networks? This paper's solution to this optimization step seems based on Prop. 1, which is specific to NNs.

Overall, I increase my score to 5. I am willing to further increase the score if the authors can address the remaining concerns.

Comment

We greatly appreciate the reviewer's time and continued consideration of our rebuttal. We hope the below clears up their concerns and are happy to provide any further clarification if needed.

The reviewer is correct in their assessment that under standard training circumstances the ground-truth reachable parameter set (and therefore our reachable parameter interval) will become vacuous as the number of training iterations grows. Yet, in cases where only fine-tuning is needed or where only a small proportion of the training data is untrusted, our bounds ought to remain tight and practically useful. We hope that future works will improve the approximations we make and/or will modify the training procedure/model itself to make it less susceptible to poisoning adversaries.

On the question of application to other architectures and models: Prior and concurrent works on bound propagation for neural networks have demonstrated the ability to bound robustness properties in complex architectures including convolutional neural networks [1] and transformer architectures [2]. Outside of neural networks, similar bounds to ours have been investigated for gradient-boosted decision trees [3]. Application and analysis using our framework can be carried out in this setting, though one must carefully employ our Definition 1 and Definition 2 in order to compute bounds on gradients.

[1] https://arxiv.org/abs/1811.12395
[2] https://dl.acm.org/doi/10.1145/3453483.3454056
[3] https://arxiv.org/abs/1906.03526

Review (Rating: 8)

This submission considers the problem of providing provable robustness guarantees against training-time adversarial attacks. Specifically, the considered perturbation model admits changes to a constrained number of $n$ feature vectors and $m$ labels, with optional constraints on their norm. The adversary's objective is to perform a successful targeted or untargeted poisoning attack, or a backdoor attack.
Unlike prior work, the authors propose to generalize existing convex-relaxation-based methods for neural network verification to (1) determine a reachable set of model parameters at training time and (2) determine a reachable set of model outputs at inference time given this parameter set.
The reachable set of model parameters is iteratively expanded with each gradient step. Given the perturbation model and the bounds from the previous iteration, lower and upper bounds for the gradients are determined via convex relaxation while considering (1) worst-case choices of inputs and parameters for $\max(m,n)$ inputs and (2) worst-case choices of parameters for the remaining inputs.
The strength of the resultant bounds for different perturbation model parameters is evaluated by training a model from scratch on a simple regression dataset and dimensionality-reduced MNIST, as well as by fine-tuning on two other vision datasets.
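As a rough illustration of this iterative scheme (my own sketch, not code from the paper; `bound_batch_gradient` is a hypothetical helper that would return sound elementwise gradient bounds over the current parameter interval and all admissible poisonings of a batch):

```python
import numpy as np

def abstract_sgd_reachability(theta_init, batches, lr, bound_batch_gradient):
    """Illustrative interval-SGD loop: maintain [theta_l, theta_u] containing
    every parameter vector reachable under the admissible poisonings."""
    theta_l, theta_u = theta_init.copy(), theta_init.copy()  # numpy parameter vector
    for batch in batches:
        # Sound bounds on the batch gradient over all parameters in the current
        # interval and all admissible perturbations of the batch (assumed helper).
        g_l, g_u = bound_batch_gradient(theta_l, theta_u, batch)
        # Interval SGD step: subtracting an interval swaps its endpoints.
        theta_l, theta_u = theta_l - lr * g_u, theta_u - lr * g_l
    return theta_l, theta_u
```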

Strengths

  • This submission proposes a novel solution to an extensively studied problem (susceptibility to training-time attack), which has received renewed attention due to the current excitement around language models.
  • The problem is well-motivated via references to prior work on poisoning attacks.
  • The method is clearly distinguished from prior aggregation-based approaches.
  • The idea of applying iterative reachability analysis to SGD is very natural. It is somewhat surprising that no one has tried this ansatz before, i.e., the work closes a very obvious gap in existing literature.
  • Rather than proposing a completely new relaxation approach, the work expands the well-established CROWN method (soundness appears to be good).
  • The method is applicable to a very general perturbation model.
  • The in-depth discussion of the method and its limitations in Section 3.4 is commendable.
  • The work opens various obvious avenues for future work that could be of interest for the wider robust ML community (refining single- and multi-neuron bounds, generalizing beyond feed-forward networks, enforcing consistency of perturbations between gradient steps etc.)
  • The manuscript is well-structured, beginning with a high-level approach (Parameter-space certification) and a specific instantiation of this approach (Abstract gradient training), before discussing the technical details of its individual components.

Weaknesses

  • I agree with the authors' argument that the approach taken by this method is orthogonal to prior work. Nevertheless, the experiments would be significantly more informative if one were to use prior work as baselines. Instead of demonstrating "the method for verifying robustness does in fact verify robustness for sufficiently small perturbations", one could demonstrate "the verification method fills a useful niche in the certified-accuracy/runtime space for certain dataset / model sizes".
  • The authors claim that their procedure was applicable to any first-order optimizer like Adam (see ll.198-199). This claim appears to be incorrect, since it would require a reachability analysis of the optimizer's internal state (moments etc.). Instead, the method appears to be limited to vanilla SGD.
  • The submission does not include an implementation. Since the proposed method is rather involved, and implementing verifiers requires significant care, this significantly hinders reproducibility. I would strongly encourage the authors to provide code.
  • Even if one were to try and reimplement the method, the description of hyperparameters and other aspects of the experimental setup is hardly sufficient to reproduce any of the results (see Appendix J). I would strongly encourage the authors to upload experiment configurations with their code.
  • The discussion of related work is mostly relegated to Appendix A. This is an integral part of the paper and should (maybe with slightly more concise writing) be moved to the main text.

Minor weaknesses:

  • While the authors provide asymptotic bounds on runtime complexity, they do not specify the wall-clock time needed for verifying the different models in the experiment section. This would make it easier to gauge the practical feasibility of the proposed method.
  • Based on the asymptotic runtime complexity, the use of small-scale models, and fine-tuning instead of training from scratch, the method appears to be hardly scalable to real-world settings -- except for some special cases. I do, however, not consider this a major weakness, as this work represents a first step in a novel research direction.
  • The definition of targeted attacks (Eq. 4) assumes the existence of a single, data-independent set of safe outputs $S$. However, what constitutes a safe output may often be data-dependent (e.g., whether steering left/right is safe depends on whether the ground-truth direction is left/right in the PilotNet dataset from Fig. 5). It should be easy to generalize the proposed method to this use case, since one just has to evaluate the model output bounds differently.
  • The definition of backdoor attacks (Eq. 5) does not enforce that the model should have high accuracy on unperturbed samples. As such, the guarantees are likely to be very loose upper bounds on unnoticeable backdoor attacks. It would probably be better to solve a constrained optimization problem over the set of reachable model parameters.
  • Prior work on parameter-level verification for Bayesian NNs is mentioned, but it is not explained why these methods are not applicable to the considered problem. For instance, Wicker et al. (2020) [1] appear to determine a "safe" set of weights such that only some desired set of outputs is reachable, which is essentially inverse to Theorem 3.1.

[1] Wicker et al. Probabilistic Safety for Bayesian Neural Networks. UAI 2020.


Overall, I believe that this submission is a meaningful, well-motivated contribution to the field of robust machine learning. It proposes a very natural approach to robustness certification under training-time attacks that closes an obvious gap in existing literature.
The submission is primarily held back by poor reproducibility and somewhat uninformative experiments, which do not provide any insights about when parameter-space certification is preferable to existing aggregation-based approaches.
I thus recommend borderline/weak acceptance. Should the authors address either of these weaknesses (e.g. by sharing a link to anonymized code or comparing to some aggregation / randomized smoothing method on MNIST), I will further increase my score (even if the experimental results are negative).

EDIT: Following the rebuttal, which addresses both of my primary concerns, I have decided to increase my score to 8 (accept).

Questions

  • Shouldn't we use $\mathrm{SEMin}_{m+n}$ in Theorem 3.3, since the $m$ feature vectors and $n$ labels that are perturbed can belong to disjoint sets of samples (see ll.117-118)?
  • Aside from increased computational cost, are there any reasons why you decided to extend CROWN instead of more recent methods like $(\alpha,\beta)$-CROWN, MN-BAB, etc.?
Comment

We thank the reviewer for sharing their detailed feedback on our submission, which has helped us clarify and strengthen several aspects of the submission. Below, we address their comments in detail.

Weaknesses

  • We agree with the reviewer that a direct comparison with existing certified training algorithms would help demonstrate both the advantages and limitations of Abstract Gradient Training. Therefore, we have added Appendix J.1, which provides a direct comparison between AGT and DPA, a popular aggregation-based certified training algorithm. We temporarily include the figure from (Levine & Feizi, 2020), and are working on reproducing the results of DPA ourselves for the final version of the paper. We note that we have not provided a comparison to randomised smoothing approaches, as they provide only probabilistic guarantees of robustness.
  • Regarding reproducibility, we provide the full source code for our experimental results at the following anonymized link https://anonymous.4open.science/r/agt-DAAB. The source code will additionally be published with the final version of this paper.
  • Our certification framework is in fact applicable to any first-order optimization algorithm, such as SGD variants (e.g. momentum-based methods) and Adam. The reviewer is correct that Algorithm 1 only applies to vanilla SGD, but it can be extended to other optimization algorithms. This does indeed require reachability analysis of the optimizer's internal state (e.g. storing valid bounds on the moments at each iteration), but this fits within our framework, requiring small changes to Algorithm 1. We have provided an example of bounding SGD with momentum in the linked source code (under abstract_gradient_training/optimizers.py).
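To illustrate the kind of change required, here is a minimal sketch written for this response (not the code in the linked repository) of an interval version of the SGD-with-momentum update; it assumes sound per-step gradient bounds g_l <= g <= g_u are already available.

```python
def interval_momentum_step(theta_l, theta_u, m_l, m_u, g_l, g_u, lr, beta=0.9):
    # Momentum buffer m <- beta * m + g: since beta >= 0, the interval
    # endpoints of the buffer update endpoint-wise.
    m_l, m_u = beta * m_l + g_l, beta * m_u + g_u
    # Parameter step theta <- theta - lr * m: subtracting an interval swaps endpoints.
    theta_l, theta_u = theta_l - lr * m_u, theta_u - lr * m_l
    return theta_l, theta_u, m_l, m_u
```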

Minor Weaknesses

  • We have updated Appendix J with a comparison of the run-time of AGT vs standard training using Pytorch.optim. We highlight the following results here: For the UCI regression dataset (Figure 2), each certified training run takes approximately 50 seconds using our implementation of Abstract Gradient Training. For comparison, training the same model in vanilla pytorch takes approximately 20 seconds. On the other hand, fine-tuning PilotNet (Figure 5) in AGT takes ~110 seconds, compared to 80 seconds in Pytorch. Thus in practice AGT comes with only a modest computational penalty.
  • As written, the definition of safe output S in Eq. (4) does not take into account data-dependent safety. As noted by the reviewer, this is not a fully general setting and the safe set for a particular prediction may depend on the ground truth label, for example. It is simple to generalise our certificates to this setting, and notationally we will adapt our formulation by adding S(x, y) to the definitions in Section 2.2.
  • The definition of the backdoor attack goal does indeed not take into account the performance on unperturbed samples. However, as the "defenders" against the poisoning attack, we do not know a priori which of the test data is targeted by a backdoor attack. Thus, the formulated loss represents the performance on the test data assuming each point may be targeted, which is a worst-case assumption that may be loose in practice.
  • The prior work of Wicker et al. 2020 computes a per-input safe-weight set in order to certify probabilistic safety for BNNs. Despite addressing different overall problems, the key computational differences between that approach and ours are that (1) we must compute bounds on the gradient, thus requiring propagation both forwards and backwards through the network architecture, and (2) Wicker et al. 2020 build a "safe-weight set" given an input-output specification and a trained machine learning model, whereas we compute the reachable parameter set of the training algorithm itself. The key similarity is that if one replaces the sampling approach in Wicker et al. 2020 (used to build a safe weight set) with the output of AGT, then the approach of Wicker et al. can be used to bound test-time adversaries in our poisoning setting. But this still requires AGT, our primary contribution.

Questions

  1. We thank the reviewer for pointing out this issue. The reviewer is correct that $\max(n, m)$ should only be used in a “paired” poisoning setting, where the same data points must be chosen for feature and label poisoning. In the case of the non-paired setting, $m + n$ should be used in place of $\max(n, m)$ in Theorem 3.3. We have corrected this in the revised copy of the paper, along with re-running the affected plot (Figure 1.d).
  2. In principle, many neural network verification algorithms can potentially be extended to fit within our framework. Indeed, $(\alpha,\beta)$-CROWN in particular requires only a small tweak to the CROWN-style bounds we present in our paper. However, extending other verification algorithms for use in Abstract Gradient Training is outside the scope of this work. CROWN was chosen for this work as a trade-off between complexity and tightness.
Comment

Thank you. This addresses both of my primary concerns (reproducibility and comparison to aggregation-based methods w.r.t. runtime and certified accuracy). I have increased my score accordingly.

Review (Rating: 5)

This paper attempts to provide provable guarantees on the behaviour of models trained with potentially manipulated data without modifying the model or learning algorithm. The method uses convex relaxations to over-approximate the set of all possible parameter updates for a given poisoning attack, and provides bounds on worst-case behaviour.

Strengths

The paper proposes a framework, with a (claimed to be novel) bound propagation strategy, for computing bounds on the influence of a poisoning adversary on any model trained with gradient-based methods. From this, a series of proofs is presented to bound the effect of poisoning attacks. Finally, empirical evaluations are provided to illustrate the approach.

Weaknesses

While I might have missed the rationale for computing the set of all reachable trained models given how loose such (worst-case) bounds can be, the approach seems to claim novelty while overlooking important previous contributions, in particular, the use of influence functions in robust statistics, and their revival in modern machine learning.

Cook, R. D. Detection of influential observation in linear regression. Technometrics, 19:15–18, 1977.

Cook, R. D. Assessment of local influence. Journal of the Royal Statistical Society. Series B (Methodological), pp. 133–169, 1986.

Cook, R. D. and Weisberg, S. Characterizations of an empirical influence function for detecting influential cases in regression. Technometrics, 22:495–508, 1980.

Cook, R. D. and Weisberg, S. Residuals and influence in regression. New York: Chapman and Hall, 1982.

Koh, P.W. and Liang, P., 2017, July. Understanding black-box predictions via influence functions. In International conference on machine learning (pp. 1885-1894). PMLR.

Questions

My main question would be on how this approach compares (in terms of computing the exact influence of a poisoned dataset) to the finer analysis in the references provided above.

Comment

We thank the reviewer for bringing to our attention the previous literature regarding influence functions. Indeed, influence functions are relevant and have previously been applied to the problem of training-set adversarial attacks. We will update our related work section to take these into consideration.

We now turn to pointing out the differences between their approach and our approach. In particular, we note that influence functions provide only an approximate measure of the influence of a training point, whereas our method aims to provide certificates (i.e. formal, non-probabilistic, guarantees) of the maximum influence a training point can have. Paraphrasing [1], influence functions give an efficient approximation to the change in the loss given a small perturbation to a training point. [1] further performs a series of approximations to compute this sensitivity for neural network models. Towards data-poisoning, they then use their approximate influence function to compute training-set attacks by perturbing existing training data (i.e. taking the role of the adversary).
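For reference, the first-order influence approximation of [1] estimates the effect on the learned parameters $\hat{\theta}$ of upweighting a training point $z$ by a small $\epsilon$ (restating the standard formula, with $H_{\hat{\theta}}$ the empirical Hessian of the training loss over the $n$ training points $z_i$):

```latex
\mathcal{I}_{\mathrm{up,params}}(z)
  = \left.\frac{d\hat{\theta}_{\epsilon,z}}{d\epsilon}\right|_{\epsilon=0}
  = -H_{\hat{\theta}}^{-1}\,\nabla_{\theta} L(z,\hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n}\nabla^{2}_{\theta} L(z_i,\hat{\theta})
```

That is, it is a local, Hessian-based estimate of sensitivity rather than a sound bound on it.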

In contrast to this, the reachable parameter set computed by our algorithm takes a distinctly different approach. Firstly, we compute a sound reachable set; that is, for any training data within the allowable perturbation, the parameters of the model are provably guaranteed to lie within the reachable set. Additionally, our approach leverages convex relaxation and bound propagation, while influence functions (in the context of [1]) rely on quadratic approximations to the empirical loss function. Finally, our approach aims to provide certified guarantees on trained models, taking the role of the model owner / "defender".

We hope that this addresses the reviewer's questions and concerns, and we are happy to answer any further questions or discuss any further concerns the reviewer might have.

[1] Koh, P.W. and Liang, P., 2017, July. Understanding black-box predictions via influence functions. In International conference on machine learning (pp. 1885-1894). PMLR.

Comment

Thanks to the authors for their answer. I keep my concern about this approach, which is similar to the concern on tightness from reviewer yEud. Specifically, the tightness of the practical evaluation of such a bound remains concerning (in fact impractical), as the reachable parameter interval will explode.

For the fundamental comparison between a formal (non-probabilistic) approach as in this paper and the statistical approach based on approximations as in influence-function-based methods, it might be worth reminding the authors that approximations can sometimes be more precise than exact (or formal) computations! This case is a perfect illustration of that, since the reachable set can be very broad and practically useless as information when trying to narrow down the poisoning capabilities of the attacker. I will unfortunately keep my score and recommend that the authors thoroughly compare their approach to the one based on influence functions.

Comment

We appreciate the reviewer's time and continued engagement with our paper. We believe the most recent response suggests a misunderstanding of the overall purpose of our contribution, which we should have clarified.

Our work intends to develop a formal proof, i.e., a sound certification that an attacker cannot achieve a poisoning goal. For instance, from the second sentence of our abstract: “Provably bounding model behavior under such attacks remains an open problem.”

We note that while influence functions represent a valuable form of analysis for poisoning attacks, they cannot be used to certify this robustness and therefore do not constitute an alternative solution to our problem of interest.

We appreciate the reviewer's concern about the tightness of our verification approach; however, as we have discussed with Reviewer yEud, our experimental results on OCTMNIST and PilotNet suggest that, though over-approximate (to ensure soundness), our bounds are still both valid and useful.

Comment

We thank all of the reviewers for the detailed reviews. We believe that the edits in response to these reviews have improved the clarity of our work. We have provided a revised version of the manuscript with the following changes (highlighted in blue):

  • Added proofs for Theorem B.1, Theorem 3.3 and Proposition 1.
  • Added wall-clock time comparison to Appendix J.
  • Added direct comparison with aggregation-based algorithms.
  • Collected all proofs into a dedicated appendix (I) for clarity.
  • Changed $\max(n, m)$ to $n + m$ in Theorem 3.3, and re-ran the affected plot (Figure 1, panel d).
  • Added clarification to Section 2.2 regarding test vs training time attack goals and data-dependent safe-sets.
  • Corrected the heading of Figure 4.
AC Meta-Review

This paper explores the problem of providing provable robustness guarantees to training-time adversarial attacks. The proposed framework certifies robustness against poisoning and backdoor attacks by bounding the set of all reachable parameters, with worst-case guarantees on model performance and attack success rate.

Reviewers were interested in the approach and found the ideas fairly novel. However, there were major concerns that the bounds given by the proposed method were extremely loose and hence may not be meaningful in practice. This point was not resolved through the rebuttals. Also, the experiments were not well grounded in prior research since the authors did not use baseline methods from the literature. While this was partially added during rebuttals, the experiments have yet to be presented with sufficient rigor.

The work is promising, and the authors have worked to improve it during this review process, but the changes required are not quite complete, and are too major to be accepted as is. I am recommending rejection, but want to encourage the authors to resubmit to a comparable conference after taking the reviewer’s feedback into consideration.

Additional Comments from the Reviewer Discussion

Reviewers asked about the practical scalability of the approach, and during the rebuttal the authors both shared code, and discussed a few small scale experiments on computational cost. These are both beneficial, but the experiments as presented are informal and should be fleshed out in a future submission. This type of evidence would be more convincing if run on larger scale datasets.

Reviewers were concerned that the bounds given by the proposed method seem to be extremely loose and hence may not be meaningful in practice. This was not satisfactorily solved by the authors during the discussions.

Reviewers found that the experiments and discussions in the paper were not well grounded in prior research. For example, there was a lack of comparable baselines from the literature. The proposed method takes an orthogonal approach to robustness, and it was discussed that no existing methods are directly comparable. However, this is not a good excuse to leave out all baselines. The authors should instead strive to discuss what the alternative methods are, and be clear about the differences with their approach, while at the same time providing empirical comparisons in the most fair way possible. This would enable the reader to make informed judgements on the relative value and quality of each type of robustness guarantee.

I note that the community often criticizes new approaches for not performing on par with incumbents, while ignoring that incumbents have had years of tuning and engineering to optimize these aspects. The actual performance of the proposed method compared to incumbents like randomized smoothing is not a major factor to me in my final decision. However, the lack of any comparisons is a major concern.

Final Decision

Reject