PaperHub
Overall: 5.0 / 10, Rejected (4 reviewers; ratings 6, 5, 3, 6; min 3, max 6, std 1.2)
Confidence: 3.0 · Correctness: 2.3 · Contribution: 2.0 · Presentation: 2.0
ICLR 2025

Splitting & Integrating: Out-of-Distribution Detection via Adversarial Gradient Attribution

Submitted: 2024-09-27 · Updated: 2025-02-05

Abstract

Keywords
Out-of-Distribution Detection · Adversarial Gradient Attribution · Safety

Reviews & Discussion

Review (Rating: 6)

The non-zero gradient behaviors of OOD samples do not exhibit significant distinguishability, which negatively impacts the accuracy of OOD detection. To address this issue, the paper proposes a novel OOD detection method called S & I, based on layer Splitting and gradient Integration via Adversarial Gradient Attribution. The proposed method incorporates adversarial attacks into attribution to explore the distributional characteristics of ID and OOD samples. Experiments demonstrate that the S & I algorithm achieves state-of-the-art results.

Strengths

The proposed method has been evaluated on multiple datasets.
The paper provides theoretical proofs to support the soundness of the method.

Weaknesses

  1. The experiments conducted in the paper do not effectively support the proposed arguments.
    a) The paper mentions that "we argue that the non-zero gradient behaviors of OOD samples do not exhibit significant distinguishability, especially when ID samples are perturbed by random noise in high-dimensional spaces, which negatively impacts the accuracy of OOD detection." However, the experiments in the paper are just some general evaluations and do not highlight the points argued above. There is no experiment to discuss detecting the non-zero gradient behaviors of OOD samples.
    b) There is a lack of ablation studies to verify the importance of each part.

  2. The distributional differences caused by adversarial attacks are completely different from the differences between various datasets. Why can introducing adversarial attacks enhance OOD detection across different datasets?

  3. Can the proposed OOD detection be applied to distinguish between adversarial examples and clean examples?

  4. Is OOD detection still important in current deep learning? For example, in classification tasks with adversarial examples, one approach to avoid outputting the incorrect labels is to use detectors to refuse adversarial examples and only allow clean examples to be classified. However, these methods are gradually being abandoned because a robust system should be able to output correct labels for any input, not just simply refuse to output labels for adversarial examples. Therefore, under the settings of this paper, I am curious about which specific scenarios the OOD method is applicable or necessary to at present.

  5. Given the adversarial attacks and layer-splitting technology involved, will the whole process incur a large computational cost?

Since I am not completely familiar with this task, I have set the confidence as 2.

Questions

See Weaknesses.

Comment
| Method | FPS |
| --- | --- |
| Ours | 4.9019 |
| GAIA | 6.3572 |
| MSP | 16.9090 |
| ODIN | 6.6613 |
| Energy | 16.3934 |
| GradNorm | 8.7750 |
| RankFeat | 4.8416 |
| ReAct | 15.8378 |
Comment

W1(a): Thanks to the reviewer for the suggestion. We would like to clarify that this view is one of the motivations of our paper. In lines 194-197, we explained that attribution-based OOD detection methods such as GAIA are built on the baseline selection x'=0. GAIA detects OOD samples by counting the distribution of non-zero attribution gradients, so the strong association between non-zero attribution gradients and OOD samples has already been established. However, using the black image x'=0 as the baseline makes it difficult to retain the original semantic information of the sample during attribution, and it is easily disturbed by noise, which causes deviations when counting non-zero attribution gradients and reduces the accuracy of OOD detection. Adversarial samples can retain the semantic information of samples while introducing minimal perturbations, so we designed an adversarial-attribution-based OOD detection method via layer splitting and attribution integration, corresponding to the pseudocode in lines 409-414. It is worth mentioning that since our method is a whole, we did not split the algorithm for verification; extensive experiments, especially the performance on the ImageNet dataset, have demonstrated the effectiveness of our algorithm, i.e., it can more accurately count the distribution of non-zero attribution gradients and thus improve OOD detection.
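
To make the pipeline in this response concrete, below is a minimal sketch of the idea, not the authors' exact S & I implementation: it assumes a PGD-style gradient ascent to build the adversarial baseline and an IG-style path integral from x_adv back to the input, and it omits layer splitting; all function names are hypothetical.

```python
# Hedged sketch only: not the authors' exact S & I algorithm.
# Assumptions: PGD-style gradient ascent for the adversarial baseline,
# an IG-style path integral from x_adv to x, and no layer splitting.
import torch
import torch.nn.functional as F

def make_adversarial_baseline(model, x, steps=10, alpha=1e-2):
    """Small gradient-ascent perturbation meant to preserve the sample's
    semantics while moving it away from the model's prediction."""
    target = model(x).argmax(dim=1)  # labels stay within the ID label set
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = (x_adv + alpha * grad.sign()).detach()
    return x_adv

def nonzero_gradient_score(model, x, x_adv, T=20, eps=1e-6):
    """Attribute along the path from the adversarial baseline to the input
    and return the fraction of non-zero attribution entries, read (as in
    GAIA) as evidence that x is OOD."""
    total = torch.zeros_like(x)
    for t in range(1, T + 1):
        x_t = (x_adv + (t / T) * (x - x_adv)).detach().requires_grad_(True)
        top_logit = model(x_t).max(dim=1).values.sum()
        grad, = torch.autograd.grad(top_logit, x_t)
        total += grad
    attribution = (x - x_adv) * total / T
    return (attribution.abs() > eps).float().mean()
```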

W1(b): Thanks to the reviewer for the suggestion. We would like to clarify that the first two contributions summarized in the introduction are iterative: the first contribution (introducing adversarial examples into OOD detection) paves the way for the second (layer-splitting technology and integration using adversarial attribution). They form a whole, as can be seen from our pseudocode, so the ablation cannot be done separately.

W2: The purpose of the adversarial attack is to probe the stability around the input sample points, especially under noise. As we mentioned in the response to Weakness 1(a), GAIA uses x'=0 as the baseline. Across different datasets, this choice is often complex and ad hoc, and x'=0 is not suitable for all datasets. Adversarial attacks can use existing samples to generate, in a small number of iterations, adversarial samples that are semantically similar to the input samples; this is better suited to different datasets and is not affected by differences in dataset distribution.

W3: We would like to clarify that this task is beyond the scope of OOD detection and belongs to a different setting. In fact, the definitions of adversarial samples and OOD samples also differ, as can be seen in lines 143-157 and 210-234. Current methods for distinguishing adversarial examples from clean examples are common in adversarial defense, which is not relevant to the task of this article. OOD detection focuses on identifying out-of-distribution data, i.e., detecting whether a test sample comes from outside the distribution of the training data, so as to prevent the model from making unreliable predictions on unfamiliar inputs. Adversarial sample detection targets adversarial samples generated by adversarial attacks that are disguised as within-distribution. We believe the question raised by the reviewer may be feasible in theory, but it needs to be further verified in combination with the nature of the adversarial attack.

W4: Thanks to the reviewer for the insight into our paper, we agree that an ideal robust system should be able to handle all inputs, including adversarial samples, and not just reject adversarial samples. However, OOD detection is still indispensable in safety-critical applications such as autonomous driving, medical care, finance and other fields. As an additional safety mechanism, OOD detection ensures that inputs that differ significantly from the training distribution do not lead to potentially dangerous decisions. Although rejection-based adversarial defense methods are being revisited, OOD detection can be used as an auxiliary mechanism to identify unknown distributions and help achieve robust processing. Therefore, we believe that OOD detection can still play a significant role in scenarios where system reliability, security, and adaptability to novel data are crucial.

W5: Thanks for the reviewer's suggestion. We provide a time-consumption comparison between our method and other baselines above, using FPS as the evaluation metric, i.e., the number of images processed per second. As the table shows, on the ImageNet dataset our method is slightly slower than GAIA but achieves a significant performance improvement. Compared with RankFeat, our method is not only faster but also performs more accurate OOD detection. Therefore, we believe the running cost is an acceptable trade-off.
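
For reference, the FPS throughput above can be measured with a simple wall-clock loop like the one below. This is an illustrative sketch only; the authors' actual benchmarking setup is not described in this thread, and `score_ood` stands in for any detector.

```python
import time
import torch  # assumes the detector consumes torch tensors

def measure_fps(score_ood, loader, device="cuda"):
    """Images processed per second by an OOD scoring function.
    No torch.no_grad() here, since gradient-based detectors need autograd."""
    n_images = 0
    start = time.perf_counter()
    for images, _ in loader:
        score_ood(images.to(device))
        n_images += images.size(0)
    return n_images / (time.perf_counter() - start)
```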

Comment

Thanks for the responses. Some of my concerns (W4, W5) have been addressed, but the following concerns (W1, W2, W3) remain:

For response 1:
Thanks for the further explanation; I have noted the related descriptions in the original manuscript. However, my concern is that the existing experiments do not effectively support the claims in the paper, and some key experimental results are missing. Hence, in my previous review, I asked to see more experiments specifically considering non-zero gradient behaviors, as well as ablation studies. Also, I believe the current architecture allows for ablation studies, such as using a) a baseline model without any changes, b) a model without layer-splitting technology, and c) the final proposed model. If there is any error in my understanding, please point it out.

For response 2&3:
To discuss this better, I further summarize my two questions. First, the model is trained with adversarial examples but is not applied to tasks with adversarial attacks; instead, it is used for OOD detection in traditional, non-attack scenarios.
A natural question arises: why use adversarial examples for training? Is it feasible to train with noisy examples (not adversarial perturbations, but common corruptions)? And what are the differences between the two? Second, why has the proposed model, trained with adversarial examples, not been applied to adversarial detection tasks, e.g., treating adversarial examples as a special case of OOD?
Your responses partially addressed my questions. However, the original manuscript introduces the concept of adversarial examples in many places, yet the final goal is completely unrelated to adversarial tasks. I believe this somewhat reduces the readability of the paper.

In summary, W2&3 are not the main reason for my borderline rejection of the paper, but I strongly recommend that the authors clarify these two concepts and their corresponding analyses more clearly in the paper. More importantly, if the authors can further provide the experiments mentioned in W1 and more discussion, I am willing to increase my score.

Comment

Thanks to the reviewer for the reply. For W1, what we want to clarify is that since non-zero gradients represent a high-confidence probability of an OOD sample (as expressed by GAIA), the accuracy of computing non-zero gradients obviously has a significant effect on the accuracy of OOD detection. This is the starting point for designing our method, and the performance improvement in extensive experiments (especially on large-scale ImageNet) has shown that our method computes the attribution gradient more accurately (i.e., obtains a more accurate distribution of non-zero and zero gradients) and therefore achieves better OOD detection performance.

For the ablation experiment, we thank the reviewer for the inspiration. Although we argued in our previous reply that our method is integrated (the adversarial attack is the prerequisite for adversarial attribution, and layer splitting is needed during attribution to ensure that the attribution gradient computed at each layer is accurate, because changes in the input of the previous layer affect the attribution gradient of the subsequent layer), it is true that separate ablation studies of the adversarial attack module and the layer splitting module can reveal the role of each module. We conducted the three ablation experiments a), b), and c) proposed by the reviewer, and the results are as follows:

Table 1. OOD Detection Method: Ours (without changes to the baseline)

| OOD Dataset | FPR95 (%) | AUROC (%) |
| --- | --- | --- |
| iNaturalist | 34.75 | 92.48 |
| Textures | 55.05 | 88.73 |
| Sun | 24.85 | 95.38 |
| Places | 40.60 | 91.47 |
| Average | 38.81 | 92.02 |

Table 2. OOD Detection Method: Ours (without layer splitting)

| OOD Dataset | FPR95 (%) | AUROC (%) |
| --- | --- | --- |
| iNaturalist | 28.70 | 93.82 |
| Textures | 61.79 | 85.72 |
| Sun | 32.75 | 92.90 |
| Places | 55.80 | 84.12 |
| Average | 44.76 | 89.14 |

Table 3. OOD Detection Method: Ours

| Dataset | FPR95 (%) | AUROC (%) |
| --- | --- | --- |
| iNaturalist | 28.15 | 93.76 |
| Textures | 39.17 | 92.90 |
| Sun | 33.23 | 92.68 |
| Places | 46.98 | 88.33 |
| Average | 36.88 | 91.92 |

We are also willing to provide some heuristic explanations for the ablation results. Comparing Table 3 with Table 2 (without layer splitting), both FPR95 and AUROC suffer a large performance degradation, which is consistent with our motivation: if layer splitting is not performed, the input of the previous layer interferes with the attribution gradient computation of the subsequent layer. Comparing Table 3 with Table 1 (without changes to the baseline), the difference in AUROC between the two is very small (only 0.1), but on FPR95 our method is about 2% lower than Table 1, which strongly suggests that introducing adversarial samples as the attribution baseline can significantly improve the accuracy of the attribution gradient computation. Therefore, by integrating these two modules, we obtain better OOD detection performance.

Comment

Thanks for the further responses. In the experiments provided, there are two points of concern:

  1. Comparing Tables 1 and 2, it is evident that after introducing adversarial examples, the performance of the model declines on both FPR95 and AUROC. This seems to indicate that the first contribution has a negative impact.
  2. Although the proposed method slightly outperforms the baseline on average, when the four datasets are observed separately, on FPR95 the proposed method performs better on only two datasets, and on AUROC it also outperforms on just two. Therefore, the current experimental results do not adequately support the claims.
Comment

Thanks for the reviewer's reply. For the first point, we would like to clarify that Table 1 and Table 2 are not directly comparable because they control different variables: Table 1 corresponds to non-adversarial + layer splitting, while Table 2 corresponds to adversarial + non-layer splitting. Therefore, to understand the role of each module, we suggest comparing Table 1 and Table 2 with Table 3 respectively. The results of Tables 2 and 3 strongly demonstrate the necessity of the layer splitting technology; the results of Tables 1 and 3 are discussed under the second point.

For the second point, first of all, average performance is a key indicator of the overall performance of a method. We achieve the best results on average performance (especially FPR95; in anomaly detection, a low false-positive rate is particularly important for practical applications because it directly affects the system's false alarm rate). Although the performance on individual datasets may not match the baseline, this can be attributed to the characteristics of those datasets rather than to defects of the method itself. Secondly, we would like to emphasize that on FPR95 our improvement on the Textures dataset is close to 16%, while on Sun, the dataset with the largest performance drop, the drop is within 10%. The complex textures and high variability of the Textures dataset better reflect the advantages of the adversarial module in complex scenarios. Finally, we emphasized the integrity and synergy of adversarial attack and layer splitting in our previous response. We provided ablation experiments at the reviewer's request; since the components of our method are tightly coupled, forcibly splitting them for analysis may break their synergy, so the results may not reflect the method's true potential. Even so, removing the adversarial module still worsens the average FPR95 by 1.93%. We believe this result is sufficient to illustrate the importance of the adversarial module.

Comment

Thanks to the authors for the clarification; most of the confusion has been resolved.
However, my original request was for the authors to provide experimental results for non-adversarial + non-layer splitting as the baseline. If possible, please provide this. Since the discussion period is close to its end, the authors can just provide some simple results.

Comment

We thank the reviewer for the reply. Your comments and feedback are of great significance to our work! We have supplemented the experimental results of non-adversarial + non-layer splitting as requested, as shown in Table 4:

Table 4. OOD Detection Method: Ours (Non-adversarial + Non-layer splitting)

| Dataset | FPR95 (%) | AUROC (%) |
| --- | --- | --- |
| iNaturalist | 47.05 | 88.92 |
| Textures | 60.11 | 80.76 |
| Sun | 46.35 | 86.62 |
| Places | 67.00 | 79.57 |
| Average | 55.13 | 83.97 |

It can be seen that compared with Table 1 (non-adversarial + layer splitting), Table 4 (non-adversarial + non-layer splitting) is worse by 16.32% on FPR95 and 8.05% on AUROC, and performs worse on every dataset, which shows the effectiveness of the layer splitting module. Compared with Table 2 (adversarial + non-layer splitting), Table 4 is worse by 10.37% on FPR95 and 5.17% on AUROC, and performs worse on all datasets except the FPR95 of Textures, which strongly shows the effectiveness of the adversarial module.

We sincerely hope that our above response has adequately addressed the reviewer’s concerns. We would be truly grateful if the reviewer could consider the possibility of a score adjustment.

Comment

Thanks for the response. The above results have effectively addressed my concern (W1b). Additionally, I think these are important experiments that should be added to the manuscript.

Overall, considering that the authors addressed most of my concerns (W1b, W4, W5) well and explained others (W1a, W2&3), I decided to increase my rating from 5 to 6.

Comment

We sincerely appreciate the reviewer’s consideration in raising our score. Your feedback and suggestions have been incredibly helpful for improving our work. If you have any further suggestions or questions you would like to discuss, please feel free to let us know!

Comment

For W2&3, we want to clarify that the role of the adversarial attack itself is to use the trained adversarial samples as the attribution baseline, so as to obtain a more accurate attribution gradient distribution (we cited literature in lines 202-206 to illustrate the feasibility). GAIA has stated that non-zero attribution gradients represent a high-confidence probability of OOD samples, and this observation serves OOD detection. Therefore, the purpose of using adversarial attacks is still to obtain an accurate attribution gradient distribution for OOD detection; it is not related to the tasks of conventional adversarial attacks. As an analogy: one may use a large language model to perform image tasks without needing to train a large language model oneself. As for not using noisy examples: the reason we use adversarial samples is that they can retain the semantic information of the original samples with minimal perturbations, which follows from the defining properties of adversarial samples (this is also one of the reasons we introduced the definition of adversarial samples in the submission, as it serves our subsequent mathematical proof). In addition, although adversarial samples use manipulated labels, those labels are still selected from the label set of ID samples; by the definition of OOD samples, we can therefore consider them ID samples. This is very important and is the premise of our proof in lines 304-316. Noisy examples do not have this property, because we cannot guarantee that their labels remain in the label set of ID samples.

Review (Rating: 5)

This paper proposes the S&I method to enhance the performance of OOD detection.

Strengths

1. The structure is clear.

Weaknesses

1. The experimental results are suboptimal, with an average improvement of no more than 0.5% (AUROC metric) across different datasets. If the improvement is not significant, it suggests that the problem addressed in this paper may not be highly important. Please provide an experiment that can significantly enhance performance.

2. In Figure 2, the meanings of the x and y axes are not explained; the two histograms on GAIA lack subtitles to distinguish them. Please clarify these details.

3. The first two contributions summarized in the last paragraph of Chapter 1 require an ablation study to demonstrate their effectiveness separately.

Questions

See weaknesses.

Comment

W1: We thank the reviewer for the valuable comment. We would like to clarify that the small improvement of our method on CIFAR100 does not mean limited effect; rather, it means our method achieves the same or even slightly better performance than GAIA on small datasets. Besides, we would like to emphasize that our approach demonstrates significant improvements on the larger-scale ImageNet dataset. This distinction highlights the strength of our method in addressing the challenges of OOD detection in large-scale environments, which is a critical focus of our work. We also want to clarify that current OOD detection metrics such as FPR95 and AUROC have already reached promising levels across many benchmark datasets. However, our approach prioritizes robustness and reliability in large-scale scenarios like ImageNet, where these challenges become more pronounced.

W2: We thank the reviewer for the valuable comment and hope the following clarifications can resolve the reviewer's doubts. The x and y axes in Figure 2 represent the attribution gradient values and the frequency values of the frequency histogram, respectively. The upper GAIA histogram shows the attribution gradient distribution before applying GAIA for OOD detection; it is difficult to tell which attribution gradient values represent OOD samples. After applying GAIA, as shown in the lower histogram, the OOD samples represented by non-zero attribution gradients can be distinguished. For our method, by performing multiple adversarial attacks to analyze the feature distribution shifts from ID adversarial samples to OOD input samples, we can progressively identify high-confidence non-zero gradients, thereby obtaining the true explanation pattern representations denoted by the shaded regions.
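
As an illustration of the kind of frequency histogram described here (not the authors' plotting code), the attribution-gradient distributions could be visualized as follows, where `attr_id` and `attr_ood` are hypothetical arrays of attribution values for ID and OOD samples:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_attribution_histogram(attr_id, attr_ood, bins=100):
    """Figure 2-style frequency histogram: x-axis = attribution gradient
    value, y-axis = frequency."""
    plt.hist(np.ravel(attr_id), bins=bins, alpha=0.5, label="ID")
    plt.hist(np.ravel(attr_ood), bins=bins, alpha=0.5, label="OOD")
    plt.xlabel("attribution gradient value")
    plt.ylabel("frequency")
    plt.legend()
    plt.show()
```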

W3: Thank you for the valuable suggestion. We would like to clarify that the first two contributions summarized in the introduction are iterative: the first paves the way for the second. They form a whole, as can be seen from our pseudocode, so the ablation cannot be done separately.

Comment

The authors' response has not addressed my concerns. I still believe that the algorithm proposed in this paper provides only a very marginal improvement in the AUROC metric, with an increase of less than 0.5% even on ImageNet. This suggests that the problem addressed by the proposed method is either not very significant or overlaps with problems already solved by existing methods. If that is not the case, the authors could provide compatibility experiments to demonstrate the orthogonality of their method with existing approaches.

Comment

Dear Reviewer rQF8, thank you very much for your detailed review and valuable feedback on our work. We truly appreciate your insights and have given further thought to your suggestions, especially regarding the issue of limited performance improvement. We would be delighted to discuss your feedback and our responses further. If there are any aspects that require additional clarification or if you have any further suggestions, we would be more than happy to provide a timely response.

Comment

If you can provide an experiment that demonstrates the significance of your method, I would be very happy to raise my score. Although performance improvements on existing benchmarks have become difficult, the authors could find a scenario that benefits the proposed method (explaining the rationale of the scenario) to highlight the significance of the proposed approach.

Comment

Dear Reviewer rQF8,

Thank you for your constructive feedback and the opportunity to elaborate further on the significance of our work.

In the context of out-of-distribution (OOD) detection, even seemingly marginal performance improvements in percentage terms can have substantial practical implications, particularly in large-scale datasets. For instance, in our experiments on a dataset containing 10,000 samples, our method detected 90 additional OOD samples compared to the state-of-the-art GAIA method. Scaling this up to 1,000,000 samples, our approach could potentially detect nearly 10,000 additional OOD samples.

The significance of detecting these additional OOD samples becomes clear when considering real-world applications:

  • Financial fraud detection: Missing even a few anomalous transactions could result in significant financial losses.
  • Network security: A small number of undetected abnormal traffic instances could lead to major breaches.
  • Industrial manufacturing: Undetected defective products could trigger quality crises.
  • Autonomous driving: Missing the detection of a few pedestrians or vehicles could result in severe accidents.

By enhancing OOD detection capabilities on large-scale datasets, our method improves system reliability and mitigates potential risks in specific domains.

Furthermore, we have uploaded the additional OOD samples detected by our method to an anonymous code repository https://anonymous.4open.science/r/S-I-F6F7/additional_OOD_samples/ for detailed analysis. Interestingly, these extra detections predominantly belong to the "plant" category. This suggests that our model exhibits superior detection performance for certain OOD sample categories.

Such insights are critical for applications where specific categories of OOD samples hold particular significance:

  • In autonomous driving, accurate detection of pedestrians or unexpected obstacles is crucial for preventing accidents.
  • In medical diagnostics, identifying specific lesions or abnormalities is vital for timely treatment.

These findings highlight that improvements in OOD detection methods are not only about enhancing overall performance but also about addressing high-stakes, real-world scenarios where accurate detection can significantly reduce risks and improve system robustness.

We hope this additional explanation provides further clarity on the importance and applicability of our method. Thank you again for your valuable feedback, and we remain open to any further discussion or suggestions.

Best regards,

Authors of Submission 9432

Comment

Thank you for your in-depth review and feedback on our work. Regarding the limited improvement in AUROC you mentioned, we would like to offer a horizontal view based on the consistency of the performance improvement:

Although our performance improvement is limited in absolute terms, our experimental results show that our method achieves a consistent improvement over every baseline method. This comprehensive and consistent improvement demonstrates the robustness of our method. Especially on complex large-scale datasets such as ImageNet, maintaining such a consistent improvement across the various benchmark methods is very difficult, and our results reflect the method's capability to be universal and robust across the board.

Comment

Dear reviewer rQF8,

Thank you again for your thoughtful feedback on our work and for giving us the opportunity to address your comments. In our previous rebuttal, we gave examples to illustrate the importance of OOD detection in real-world tasks such as finance, network security, and medical diagnosis. In addition, to specifically illustrate the effectiveness of our method, we conducted experiments on the iNaturalist dataset and saved the detected OOD samples in the provided anonymous link.

On iNaturalist, a dataset containing 10,000 animal and plant samples, we can detect about 100 more OOD samples than the SOTA baseline GAIA, and the performance gap would grow on larger datasets. Interestingly, we found that these OOD samples mainly belong to the "plant" category. This phenomenon shows that our method performs well in tasks such as agricultural anomaly monitoring: since it is necessary to accurately distinguish between crop and non-crop plants (such as weeds or diseased plants), the detected OOD samples may be unknown diseases, abnormal weeds, or unregistered plants, providing support for precision agriculture. Extending to environmental monitoring and disaster warning tasks, detecting abnormal vegetation phenomena can also provide early warning signals, such as the impact of environmental pollution and climate change on plant communities.

In addition to the comparative experiments, we also conducted ablation experiments on the first two contributions of our method, namely the adversarial module and the layer splitting module, to verify the effectiveness of our method.

Table 1. OOD Detection Method: Ours (Non-adversarial + layer splitting)

| OOD Dataset | FPR95 (%) | AUROC (%) |
| --- | --- | --- |
| iNaturalist | 34.75 | 92.48 |
| Textures | 55.05 | 88.73 |
| Sun | 24.85 | 95.38 |
| Places | 40.60 | 91.47 |
| Average | 38.81 | 92.02 |

Table 2. OOD Detection Method: Ours (Adversarial + Non-layer splitting)

| OOD Dataset | FPR95 (%) | AUROC (%) |
| --- | --- | --- |
| iNaturalist | 28.70 | 93.82 |
| Textures | 61.79 | 85.72 |
| Sun | 32.75 | 92.90 |
| Places | 55.80 | 84.12 |
| Average | 44.76 | 89.14 |

Table 3. OOD Detection Method: Ours (Non-adversarial + Non-layer splitting)

| Dataset | FPR95 (%) | AUROC (%) |
| --- | --- | --- |
| iNaturalist | 47.05 | 88.92 |
| Textures | 60.11 | 80.76 |
| Sun | 46.35 | 86.62 |
| Places | 67.00 | 79.57 |
| Average | 55.13 | 83.97 |

It can be seen that compared with Table 1 (non-adversarial + layer splitting), Table 3 (non-adversarial + non-layer splitting) is worse by 16.32% on FPR95 and 8.05% on AUROC, and performs worse on every dataset, which shows the effectiveness of the layer splitting module. Compared with Table 2 (adversarial + non-layer splitting), Table 3 is worse by 10.37% on FPR95 and 5.17% on AUROC, and performs worse on all datasets except the FPR95 of Textures, which strongly shows the effectiveness of the adversarial module.

We sincerely hope that our above response has adequately addressed the reviewer’s concerns. We would be truly grateful if the reviewer could consider the possibility of a score adjustment.

Comment

Thank you very much for the authors' response! The concerns regarding the ablation experiments have been addressed, and I have increased my score from 3 to 5. I am pleased that the authors have identified a significant application scenario, i.e., the out-of-distribution detection of plants. If the authors can provide experimental data for this scenario and offer some reasonable explanations, I would be inclined to accept the paper.

Comment

In this section we list the category names of plants in the iNaturalist dataset, most of which are Latin binomial names:

Coprosma lucida, Cucurbita foetidissima, Mitella diphylla, Selaginella bigelovii, Toxicodendron vernix, Rumex obtusifolius, Ceratophyllum demersum, Streptopus amplexifolius, Portulaca oleracea, Cynodon dactylon, Agave lechuguilla, Pennantia corymbosa, Sapindus saponaria, Prunus serotina, Chondracanthus exasperatus, Sambucus racemosa, Polypodium vulgare, Rhus integrifolia, Woodwardia areolata, Epifagus virginiana, Rubus idaeus, Croton setiger, Mammillaria dioica, Opuntia littoralis, Cercis canadensis, Psidium guajava, Asclepias exaltata, Linaria purpurea, Ferocactus wislizeni, Briza minor, Arbutus menziesii, Corylus americana, Pleopeltis polypodioides, Myoporum laetum, Persea americana, Avena fatua, Blechnum discolor, Physocarpus capitatus, Ungnadia speciosa, Cercocarpus betuloides, Arisaema dracontium, Juniperus californica, Euphorbia prostrata, Leptopteris hymenophylloides, Arum italicum, Raphanus sativus, Myrsine australis, Lupinus stiversii, Pinus echinata, Geum macrophyllum, Ripogonum scandens, Echinocereus triglochidiatus, Cupressus macrocarpa, Ulmus crassifolia, Phormium tenax, Aptenia cordifolia, Osmunda claytoniana, Datura wrightii, Solanum rostratum, Viola adunca, Toxicodendron diversilobum, Viola sororia, Uropappus lindleyi, Veronica chamaedrys, Adenocaulon bicolor, Clintonia uniflora, Cirsium scariosum, Arum maculatum, Taraxacum officinale officinale, Orthilia secunda, Eryngium yuccifolium, Diodia virginiana, Cuscuta gronovii, Sisyrinchium montanum, Lotus corniculatus, Lamium purpureum, Ranunculus repens, Hirschfeldia incana, Phlox divaricata laphamii, Lilium martagon, Clarkia purpurea, Hibiscus moscheutos, Polanisia dodecandra, Fallugia paradoxa, Oenothera rosea, Proboscidea louisianica, Packera glabella, Impatiens parviflora, Glaucium flavum, Cirsium andersonii, Heliopsis helianthoides, Hesperis matronalis, Callirhoe pedata, Crocosmia × crocosmiiflora, Calochortus albus, Nuttallanthus canadensis, Argemone albiflora, Eriogonum fasciculatum, Pyrrhopappus pauciflorus, Zantedeschia aethiopica, Melilotus officinalis, Peritoma arborea, Sisyrinchium bellum, Lobelia siphilitica, Sorghastrum nutans, Typha domingensis, Rubus laciniatus, Dichelostemma congestum, Chimaphila maculata, Echinocactus texensis.

We hope the information would allow future research to reproduce our results.

Comment

Your experimental setup is reasonable, i.e., "we selected the ImageNet dataset as the in-distribution (ID) dataset. To perform the plant OOD detection task, we chose the iNaturalist dataset as the OOD dataset. Specifically, we manually selected 110 plant categories from the iNaturalist dataset and randomly sampled 10,000 images for these categories.".

Please provide a comparison of different methods under this experimental setup in terms of the FPR95 and AUROC metrics. A significant improvement in these metrics would be persuasive to me.

Comment

Dear Reviewer rQF8,

Thanks for raising the score; your feedback is of great significance to our work! We promise to include the ablation study in the revised manuscript. Regarding the out-of-distribution (OOD) detection task for plants, we are happy to share the relevant experimental data and provide further explanations.

Firstly, we selected the ImageNet dataset as the in-distribution (ID) dataset. To perform the plant OOD detection task, we chose the iNaturalist dataset as the OOD dataset. Specifically, we manually selected 110 plant categories from the iNaturalist dataset and randomly sampled 10,000 images for these categories. The related experimental codes and results can be found in our anonymous GitHub repository (https://anonymous.4open.science/r/S-I-F6F7/additional_OOD_samples/). The entire work is fully open-source.

The detected OOD samples are stored in the "additional_OOD_samples" folder. Interestingly, the OOD samples detected from the iNaturalist dataset are all plant species that are not present in the ImageNet dataset, demonstrating the effectiveness of our method in plant OOD detection tasks. This result can be further explained by the dataset characteristics: ImageNet is a general-purpose image classification dataset, while iNaturalist focuses on biodiversity and covers a wide range of fine-grained species. For certain visually similar plant categories, such as the Violet category in ImageNet and the Viola sororia category in iNaturalist, conventional OOD detection methods like MSP, ODIN, and GradNorm struggle to distinguish them and often classify OOD samples as ID categories. However, our method can accurately distinguish fine-grained plant species, as evidenced in the additional_OOD_samples folder.

We provide the following example for the reviewers’ reference:

Detection of angiosperms: The following link shows an OOD sample detected from the iNaturalist dataset, belonging to the Lotus corniculatus category (https://anonymous.4open.science/r/S-I-F6F7/additional_OOD_samples/b51107eaf0608d345a265f623c776706.jpg). The corresponding ID sample from the ImageNet dataset, belonging to the Rapeseed category, is available at this link: (https://anonymous.4open.science/r/S-I-F6F7/ImageNet_rapeseed/1.png).

By comparing these two images, it becomes evident that the visual differences between them are minimal, making it challenging for humans to distinguish visually similar plant species. However, our method effectively identifies fine-grained plant species, which has significant real-world implications, such as discovering new plant species or protecting endangered plant populations.

Comment

Dear Reviewer rQF8,

We hope this message finds you well. We are writing to kindly follow up on our recent response to your review comments regarding the experimental details. As the rebuttal deadline for ICLR is today, we wanted to ensure that you had the opportunity to review the additional information we provided.

We deeply value your constructive feedback, and we believe the details we shared address your concerns and could enhance your assessment of our work. Please let us know if there are any further clarifications or additional information you would need from us before the deadline.

Thank you for your time and effort in reviewing our submission.

Best regards,

Authors of Submission 9432

Comment

Dear reviewer rQF8,

Thank you for your reply. Under the experimental setup we described, we conducted a comprehensive evaluation of various methods in terms of the FPR95 and AUROC metrics. The results are summarized in the table below:

| Methods | iNaturalist (FPR95 ↓) | iNaturalist (AUROC ↑) |
| --- | --- | --- |
| MSP | 63.93 | 87.57 |
| ODIN | 62.69 | 89.36 |
| Energy | 64.91 | 88.48 |
| GradNorm | 50.03 | 90.33 |
| RankFeat | 46.54 | 81.49 |
| ReAct | 44.52 | 91.81 |
| GAIA | 29.49 | 93.51 |
| Ours | 28.59 | 93.67 |

As shown in the table, our method achieves the best performance on both metrics, with the lowest FPR95 (28.59) and the highest AUROC (93.67). Compared to the strongest baseline, GAIA, our method reduces FPR95 by 0.90 and improves AUROC by 0.16. These results demonstrate the effectiveness of our approach in the plant OOD detection task, showcasing its potential for practical applications. Over traditional methods such as MSP, ODIN, and GradNorm, our method demonstrates even more significant improvements on both FPR95 and AUROC. Additionally, we have provided an anonymous link containing one OOD image from the iNaturalist dataset and one ID image from the ImageNet dataset. The experimental results show that our method can effectively identify the finer-grained OOD category from a set of visually similar plant images and correctly classify it as an OOD sample.

We hope these results address your concerns and further validate the contributions of our work. If this resolves your concerns, we would kindly request that you could consider raising the score.

Review (Rating: 3)

The paper proposes the Splitting and Integrating method for out-of-distribution detection. It is based on the method GAIA (Chen et al., 2023) and compares the proposed method with GAIA and other algorithms. Splitting is a simple division of network layers into parts (as indicated in line 118), and integration is based on gradient splitting. The integrated gradient is based on the IG method. The experimental results are comparable to GAIA but do not surpass GAIA's performance significantly enough to constitute a new state of the art.

Strengths

The paper proposes a modification to GAIA (Chen et al., 2023). GAIA uses the image $x = 0$ as a baseline for Integrated Gradients attribution, whereas the proposed method uses an adversarial image $x_{adv}$. The experimental results slightly improve on GAIA in a few cases.

Weaknesses

The paper's experimental results are almost the same as GAIA's except for a slight improvement on one dataset (SVHN). The writing in most places is full of authors' assumptions, or at least the sentences give such an impression (see lines 115, 202, 311); these statements should rather be backed by evidence or citations. There are questions about the use and definition of variables in many equations (see the Questions section). Proof 2 appears to be based on the assumption that the $x_{adv}$ of $x_{out}$ will exhibit overconfidence.

Questions

Define the term "overly confident" (and what it means to not be overly confident).

Clarify the OOD score $\tau$ in Algo 1. It appears for the first time there and is not used anywhere else in the paper.

In the confidence score equation (Eq. 1), it is unclear what $\Omega(x)$ is; $\xi$ is the threshold. The confidence score is not mentioned anywhere else after Eq. 1, which disconnects the later discussion. It should be referred to elsewhere to keep the concept flowing.

Define the bounds of the threshold $\xi$.

The authors mention that this is the first time adversarial attribution is used. However, the section at line 172 suggests that they are using the Integrated Gradients (IG) method by Sundararajan et al., 2017. Clarify explicitly whether they are applying the IG method from the literature when they change the baseline from the image $x = 0$ to $x_{adv}$.

Define the indices of $x_{(r,s)}$. Are they pixel-wise features, or do they indicate a feature of image x with the same dimension? This could also be related to Fig. 1 for more clarity.

The variable $T$ in line 188 indicates the number of intervals; in line 216, it is an iteration count. Rectify or clarify.

Figure 2 text should be made legible. Figure 2 caption should explain what is in the image rather than stating the method in general. Not explaining the figure properly makes this irrelevant.

The Table 1 text is too small; it should be enlarged, or the table should be re-arranged.

Comment

W1: We appreciate the reviewer’s observation regarding the performance. While it is true that our method achieves comparable results to GAIA on CIFAR-100, we would like to emphasize that our approach demonstrates significant improvements on the larger-scale ImageNet dataset. This distinction highlights the strength of our method in addressing the challenges of OOD detection in large-scale environments, which is a critical focus of our work. We also want to clarify that current OOD detection metrics such as FPR95 and AUROC have already achieved promising performance across many benchmark datasets. However, our approach prioritizes robustness and reliability in large-scale scenarios like ImageNet, where these challenges become more pronounced. In addition, we would like to clarify that the arguments pointed out by the reviewers, such as lines 202 and 311, are not our assumptions, but in fact they are based on evidence. For example, for line 202, we first explain at the beginning of the paragraph that gradient attribution-based OOD detection methods like GAIA are based on baseline selection with x'=0 (this is pointed out in GAIA), and the current attribution methods that use adversarial samples as baseline selection (such as Pan et al., 2021; Zhu et al., 2024b;a, which we cited) have been shown to achieve SOTA attribution performance. And the baseline selection of x'=0 in GAIA does not take into account the scenario of adversarial perturbations. We are the first to apply the concept of adversarial attribution to OOD detection. This is one of our contributions and we have added the correct reference in line 205.

For line 311, this is our logical reasoning based on the fact that OOD samples are given overconfident predictions (see the description of Figure 2 in GAIA), not our assumption. We cited GAIA in line 269 and explained the argument that "OOD samples typically exhibit overconfident predictions". Since the label of the adversarial sample belongs to the label set of the input sample, if the input sample is an ID sample, then by the definition of OOD samples the adversarial sample is obviously still an ID sample. Therefore, we argue in Proof 2 that $f(x_{adv};\theta)$ will not show overconfidence. As for the reviewer's statement that "Proof 2 appears to be based on the assumption that the $x_{adv}$ of $x_{out}$ will exhibit overconfidence": obviously, if the input sample is an OOD sample, then by this logic $f(x_{adv};\theta)$ will show overconfidence. In short, this is our logical reasoning based on the viewpoint that GAIA has proved, not a temporary assumption we made.


Q1: As we answered in the Weaknesses, GAIA has already explained in its Figure 2 the argument that OOD samples are given overconfident predictions. In addition, GAIA cited the works [1] and [2] in its related work to support the argument that "networks tend to display overconfident softmax scores when predicting OOD inputs". We would like to clarify that we need this proven argument as the basis for our proof, and we cited it in line 269. The concept of "overly confident" does not require an additional definition, and whether or not it is formally defined does not affect the reasoning of our paper.

[1] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 427–436, 2015.

[2] Matthias Hein, Maksym Andriushchenko, and Julian Bitterwolf. Why relu networks yield high-confidence predictions far away from the training data and how to mitigate the problem. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 41–50, 2019.

Q2: The OOD score τ in the pseudocode is the score the algorithm returns to determine whether the input sample is an OOD sample; it is the final output we need. The higher the score, the higher the probability that the input sample is OOD. Therefore, it only needs to be returned at the end of the pseudocode, just as the confidence score representing the predicted category is returned at the end of a binary classification task.

Q3: We would like to clarify that Eq. 1 is only an explanation of the ID/OOD sample classification setting in our problem definition. Here we follow the problem definition in GAIA and cast ID/OOD classification as a binary classification task: Ω(x) represents the scoring method, such as GAIA or the RankFeat baseline, and ξ is the threshold of the binary classification task, which can be adjusted by the developer. The discussion here simply gives readers unfamiliar with the OOD detection task a mathematical framing; whether Ω(x) and ξ are mentioned later does not affect the introduction of our method.
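
For readers following the thread, the decision rule described here typically takes the following form (assumed to match Eq. 1 up to sign conventions, with $\Omega$ and $\xi$ as explained above):

```latex
% Assumed form of the Eq. 1 decision rule; the paper's sign convention
% for the score may differ.
G_{\xi}(x) =
\begin{cases}
  \text{ID},  & \Omega(x) \ge \xi, \\
  \text{OOD}, & \Omega(x) < \xi.
\end{cases}
```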

Comment

Q4: Please see the answer to Q3

Q5: Thanks for the reviewer's suggestion. We are not entirely sure what was meant, but it seems the reviewer thought that we used the attribution method of IG and wanted us to clarify our contribution of adversarial attribution and whether we apply IG with the baseline changed from $x=0$ to $x=x_{adv}$. First, we are indeed the first to introduce the concept of adversarial attribution into OOD detection. Second, IG is not an adversarial attribution method, and its baseline is $x=0$, not $x=x_{adv}$. The purpose of introducing IG in line 172 is to provide readers with the mathematical background of IG, because GAIA performs attribution based on the IG baseline choice $x=0$, while our attribution method is completely different from IG (see Eqs. 9-10) and its baseline is $x=x_{adv}$. Therefore, our contribution is original: it does not reuse the attribution theory of IG and differs from the IG-based GAIA method.

Q6: We want to clarify that $x_{(r,s)}$ is a pixel-level feature; we defined it in detail in lines 161-164, corresponding to an input sample x with width S and height R.

Q7: We would like to clarify that the $T$ in lines 188 and 216 has the same meaning, namely an "interval" or "iteration". Since the baseline of IG is $x=0$, IG divides the path from $x=0$ to $x=x_{input}$ into $T$ intervals and performs $T$ iterations. Since our adversarial attribution requires $T$ gradient-ascent steps to generate the adversarial sample used as the baseline, each iteration corresponds to one interval of IG. Because $T$ in lines 188 and 216 denotes the same parameter, we use the same symbol to avoid confusion. We promise to add a description of $T$ in the revised version.
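
For reference, the standard Integrated Gradients estimator (Sundararajan et al., 2017) from which this $T$ originates is the $T$-step Riemann approximation below, with baseline $x'$ ($x'=0$ in GAIA; the authors use $x'=x_{adv}$ instead):

```latex
% T-step Riemann approximation of Integrated Gradients along the
% straight-line path from the baseline x' to the input x.
\mathrm{IG}_i(x) \approx (x_i - x'_i)\cdot\frac{1}{T}
\sum_{t=1}^{T}\frac{\partial f\!\left(x' + \tfrac{t}{T}\,(x - x')\right)}{\partial x_i}
```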

Q8: Thanks for the reviewer's suggestion. We would like to clarify that the purpose of the caption of Figure 2 is to help readers understand the difference in the OOD sample distribution between GAIA and our method, namely: "By performing multiple adversarial attacks to analyze the feature distribution shifts from ID adversarial samples to OOD input samples, we can progressively identify high-confidence non-zero gradients, thereby obtaining the true explanation pattern representations denoted by the shaded regions." We will modify Figure 2 to make it easier to read.

Q9: We thank the reviewer for the suggestion, and we will update the table in the revised version to make it clearer to read.

Comment

Dear Reviewer bTVY,

We hope this message finds you well. We are writing to kindly follow up regarding our rebuttal to your comments on our submission.

We have made our best effort to address the concerns you raised, and we believe our responses provide clarity and additional insights that may help in your evaluation. As the rebuttal phase concludes today, we wanted to check if there is any additional information or clarification we could provide to further assist your review.

Thank you very much for your time and effort in reviewing our work. We sincerely appreciate your feedback and consideration.

Best regards,

Authors of Submission 9432

Review (Rating: 6)

The paper addresses the problem of out-of-distribution (OOD) detection, crucial for enhancing the robustness of deep learning models. The authors propose a novel method called S & I (Splitting & Integrating) that improves on existing gradient-based OOD detection approaches like GAIA by introducing adversarial examples and integrating gradient attribution across split intermediate layers of the neural network. The S & I algorithm smooths gradient fluctuations and identifies true explanation patterns, outperforming state-of-the-art (SOTA) methods on CIFAR100 and ImageNet benchmarks, as shown through extensive experiments.

Strengths

  • The introduction of layer splitting combined with adversarial gradient attribution integration is innovative.
  • The method demonstrates superior performance on CIFAR100 and ImageNet benchmarks, achieving lower FPR95 and higher AUROC compared to baselines.
  • The paper provides a mathematical basis and proofs for the concepts introduced.
  • The authors evaluate their approach against multiple baseline methods, showcasing clear advantages.

Weaknesses

  • On CIFAR100, the performance gains over GAIA are not substantial, suggesting limited effectiveness in smaller label space datasets.
  • The algorithm's complexity, involving iterative adversarial updates and integration across multiple layers, may hinder scalability or applicability to extremely large models.

Questions

  • Have you considered the potential security risks introduced by using adversarial examples as baselines, and how might this affect deployment?
Comment

W1: We thank the reviewer for the valuable comments. We would like to clarify that the small improvement of our method on CIFAR100 does not mean limited effect; rather, it means our method achieves the same or even slightly better performance than GAIA on small datasets. Besides, we would like to emphasize that our approach demonstrates significant improvements on the larger-scale ImageNet dataset. This distinction highlights the strength of our method in addressing the challenges of OOD detection in large-scale environments, which is a critical focus of our work. We also want to clarify that current OOD detection metrics such as FPR95 and AUROC have already reached promising levels across many benchmark datasets. However, our approach prioritizes robustness and reliability in large-scale scenarios like ImageNet, where these challenges become more pronounced.

W2: We thank the reviewer for the valuable comments. We would like to clarify that any model can be split into layers, so this does not affect scalability. In addition, when GAIA calculates gradients, it also computes the corresponding gradient for each layer; our layer-wise operation is not much more complicated than GAIA's. Our method is applicable in all scenarios where GAIA is applicable, and the gradients of those layers can likewise be split.

Q1: We thank the reviewer for the valuable comments. We would like to clarify that the introduction of adversarial attacks in this paper does not affect AI security. The purpose of introducing adversarial samples is to create a data point that deviates considerably from the current input point, improving the accuracy of OOD detection and increasing the model's adaptability under adversarial perturbations. In addition, according to the definition of OOD samples, adversarial samples are still ID samples, so no additional OOD samples are generated to pollute the dataset.

Comment

Dear Reviewer 8iu4,

Thank you for your thoughtful review and the valuable feedback on our submission. We greatly appreciate the time and effort you’ve dedicated to evaluating our work.

We have provided detailed responses to address your comments and would be grateful if you could kindly consider them before the rebuttal phase ends today. Please let us know if there is anything further we can clarify! Thank you again for your support and consideration.

Best regards,

Authors of Submission 9432

AC Meta-Review

This paper studies out-of-distribution detection. It improves the existing gradient-based OOD detection method GAIA (NeurIPS 2023) by introducing adversarial examples and proposing layer splitting and gradient integration.

The reviewers share significant concerns about the effectiveness of the proposed method, as its experimental improvement is quite marginal compared to GAIA. On CIFAR100, the performance is almost the same as that of GAIA. On ImageNet, the performance improvements are also quite marginal. Although experimental results are not the only criterion for judgment, the presented experiments do not effectively support the proposed arguments and raise serious concerns about the effectiveness of the proposed method. The authors' responses help to address some other concerns, however, after rebuttal, there is still significant concern about its effectiveness.

Considering all reviews, the paper does not yet meet the acceptance threshold before these significant issues are adequately addressed.

Additional Comments from Reviewer Discussion

The authors' responses help to address some other concerns, however, after rebuttal, there is still significant concern about its effectiveness.

Final Decision

Reject