Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification
Abstract
Reviews and Discussion
The paper introduces a logical regularization method called L-Reg, aimed at enhancing the generalization ability of image classification through a logical reasoning framework. L-Reg reduces the complexity of the model by ensuring that the generated atomic formulas align with the logical relationships between images and labels, promoting a balanced distribution in the feature space, and reducing the number of extreme weight values in the classifier. Both theoretical analysis and experimental results validate the effectiveness of L-Reg in various generalization scenarios, with especially strong performance in multi-domain generalization and generalized category discovery tasks.
Strengths
- L-Reg effectively reduces the complexity of the model by balancing the feature distribution and reducing the number of extreme weight values in the classifier.
- In Sections 3.1 and 3.2, the paper provides logic-based theoretical analysis and detailed derivation of the construction process of L-Reg. Furthermore, through experiments in Sections 4 and 5, the effectiveness of L-Reg in different generalization settings, especially in multi-domain generalization and generalized category discovery tasks, is validated.
- Designed as a plug-in loss function, L-Reg is compatible with most existing frameworks, making it highly flexible and practical in real-world applications.
- The paper also explores the relationship between logical reasoning and visual classification tasks, delving into the derivation of logic-based regularization terms to promote generalization, providing new perspectives and methods for research in related fields.
Weaknesses
- L-Reg may reduce the scope of semantic support, leading to a slight performance decrease on known datasets. It is hoped that the authors can analyze in more detail the reasons for this performance drop and provide possible improvement methods.
- The appropriate function g for generating atomic formulas is a key factor. While the authors propose L-Reg as a regularization method to ensure that F(g(Xs), Ys) consists of atomic formulas, they do not elaborate on how to choose or design this function g. Further explanation of the specific methods or criteria for selecting g is desired.
- In multi-domain generalization tasks, how L-Reg integrates with existing methods and how to adjust alpha to balance the two losses are details that the authors are encouraged to further elucidate.
Questions
I hope the authors can further compare L-Reg with other regularization methods in terms of generalization ability and interpretability to highlight the advantages of L-Reg.
Limitations
Authors have adequately addressed the limitations.
We really appreciate your insightful comments, and we address your weaknesses and questions point-by-point.
W1. Thank you for your insightful comment. As discussed in Paper Lines 344-358, L-Reg relies on the precondition that each dimension of the semantic features represents independent semantics. When this condition is not met, applying L-Reg can lead to performance degradation, because the atomic formulas constructed for different classes become sub-optimal if their minimal semantic supports correlate with each other. In cases where an improper g is used, especially in settings with both known and unknown classes, the model may struggle to effectively filter out irrelevant features for unknown classes; the resulting features can inadvertently overlap with the minimal semantic supports of known classes, degrading performance on them.
To address this issue, we hypothesize that enforcing independence between the dimensions of the semantic features could lead to further improvements. To test this hypothesis, we conducted experiments using orthogonality regularization (Ortho-Reg) to enforce feature independence in the mDG and GCD tasks. As shown in PDF Tab.2,3&4, the results indicate that while Ortho-Reg alone may not be very effective, combining L-Reg with Ortho-Reg leads to significant improvements.
Based on these findings, we propose that the observed performance drop could be further mitigated with a well-designed model architecture or additional regularization techniques that enhance the independence between feature dimensions. We find this to be a very attractive and promising area of research, and we are eager to explore it further in future work.
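To make the idea of enforcing independence concrete, below is a minimal PyTorch-style sketch of an orthogonality penalty of the kind we refer to as Ortho-Reg (an illustrative sketch under our assumptions about the feature shape, not the exact implementation used in the PDF experiments):

```python
import torch

def ortho_reg(feats: torch.Tensor) -> torch.Tensor:
    """Penalize correlation between feature dimensions.

    feats: (batch, d) semantic features; each column is one semantic dimension.
    Returns the mean squared off-diagonal entry of the normalized Gram matrix,
    which is zero when the dimensions are mutually decorrelated.
    """
    z = feats - feats.mean(dim=0, keepdim=True)       # center each dimension
    z = z / (z.norm(dim=0, keepdim=True) + 1e-8)      # unit-normalize each column
    gram = z.t() @ z                                   # (d, d) correlation-like matrix
    off_diag = gram - torch.diag(torch.diag(gram))     # keep only cross-dimension terms
    return (off_diag ** 2).mean()
```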
W2. We appreciate your comment. To ensure a fair comparison and align with previous work, we use the same encoder g as employed by earlier studies to validate our L-Reg approach. Insights into the selection or design of g can be found in our response to W1. As noted, L-Reg relies on the condition that the dimensions of the semantic features g(X) are mutually independent. Models that achieve an orthogonal semantic space are thus well-suited for applying L-Reg, as they naturally align with this requirement.
W3. Thank you for highlighting this point; it has been very insightful. Currently, we use a simple strategy of selecting the regularization weight from the range [0.01, 0.001, 0.0001], keeping it relatively small compared to the scale of the other losses (approximately 1:10). We appreciate your comments and acknowledge that there may be deeper theoretical or empirical insights related to this issue. We plan to explore this further in future research.
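For concreteness, the plug-in combination and the weight sweep described above could look roughly as follows (a hypothetical sketch: `model`, `base_criterion`, and `l_reg` are placeholder names for the host framework's network, its original loss, and the L-Reg term, not our released code):

```python
# Minimal sketch of combining a host framework's loss with L-Reg via a weight alpha.
def training_step(model, base_criterion, l_reg, images, labels, alpha):
    feats = model.encoder(images)       # semantic features g(X)
    logits = model.classifier(feats)    # class predictions
    loss_task = base_criterion(logits, labels)
    loss_reg = l_reg(feats, logits)     # plug-in regularization term
    # Keep the regularizer small relative to the task loss (roughly 1:10 or less).
    return loss_task + alpha * loss_reg

# Plain sweep over the weights mentioned above; pick the best run by validation.
for alpha in (0.01, 0.001, 0.0001):
    pass  # train and validate with this alpha
```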
Q1. Thank you for your recommendation. For a thorough evaluation, we compare our L-Reg to the aforementioned Ortho-Reg and a sparsity-based regularization approach. To validate this fairly, we re-implement Ortho-Reg and the Bernoulli sampling of the latent features from the sparse linear concept discovery models [3] on the same PIM backbone that we used. The results, presented in PDF Tab.2&3, indicate that L-Reg consistently yields the most significant improvements when these regularization terms are applied alone. Additionally, as mentioned, combining L-Reg with Ortho-Reg further enhances performance since a more proper g is obtained.
The authors' rebuttal dispelled some of my concerns, and I choose to maintain the score.
We greatly appreciate your kind feedback and insightful comments. Those comments have significantly helped us improve our paper.
This paper addresses multi-domain generalization (mDG), generalized category discovery (GCD), and the more challenging combined mDG+GCD task. The authors introduce a logical reasoning-based regularization term called L-Reg, which bridges logical analysis with image classification to enhance model interpretability and reduce complexity. The main idea of L-Reg is to identify a minimal set of semantics that can deduce the relationship between the image and its label, i.e., the semantic support. Theoretical analysis and experiments demonstrate that L-Reg improves generalization across mDG, GCD, and mDG+GCD.
Strengths
- This paper establishes a connection between logical reasoning and practical visual generalization problems, bringing novel insights for improving DG and GCD.
- Applying the proposed L-Reg on top of existing SOTA methods further improves the SOTA performance.
- The experiments are comprehensive. The visualization results provide a good empirical understanding of the role of L-Reg.
Weaknesses
- The presentation of this article is somewhat confusing for readers unfamiliar with logical reasoning, especially from Section 3.1 to Section 3.2. It would be better to add intuitive explanations of key concepts. For example, why should the formulas be atomic for a logic to be a 'good general' one, and what do the atomic formulas refer to in a real-world case?
- The mDG improvements of L-Reg are reported only over GMDG. The performance of directly applying L-Reg to a simple baseline, i.e., ERM+L-Reg, is not shown.
Questions
- Line 145 claims that "Semantics that occur frequently across samples often lack decisiveness for classification." However, samples from the same class often share the semantics predictive of that class.
- The definition of "minimal semantics" is vague. The meaning of the notation used in Eq. (3) is also unclear.
- Could you provide a concrete implementation of L-Reg? I wonder which layer(s) is(are) chosen for computing Eq. (3) in practice? Are the results sensitive to the selection of layers?
Limitations
Yes.
We really appreciate your insightful comments, and we address your weaknesses and questions point-by-point. Some weaknesses and questions are addressed together because they are closely related.
W1&Q2. Thank you so much for your comments, and sorry for any confusion caused by our writing. We will follow your kind suggestions and add some intuitive explanations for key concepts in our paper. Please also kindly refer to Reply to All Reviewers for the theoretical analysis of atomic formulas and their relationship to the interpretability of L-Reg. To address your questions more concretely, we use Paper Fig. 5 as a typical example to provide more analysis on atomic formulas and the interpretability of L-Reg.
For the known classes, the efficacy of L-Reg can be intuitively understood as extracting the minimal semantic supports for a given class label. As the examples here show, the presence of a guitar's fingerboard, even in unseen domains, helps classify a sample as belonging to the guitar category; informally, the presence of a fingerboard implies the guitar class. For all known classes, samples with these minimal semantic supports are recognized accordingly.
In contrast, if a sample lacks these minimal supports for any known class, it is very likely categorized as an unknown class. This behavior stems from Paper Eq.10, which constrains different classes to rely on distinct minimal semantic supports. L-Reg further enhances the model's ability to identify minimal supports for unknown classes by filtering out covariant features associated with other classes, and thus generalizes to unseen domains. Therefore, very interpretable features for unknown classes from unseen domains can be extracted using L-Reg. Paper Fig.5 (right side) demonstrates that the model with L-Reg can even extract facial features for the unknown person class and generalize this to the unseen domain; informally, the presence of facial features implies the person class.
We will include a more detailed discussion on this topic in the final version of our paper.
W2. Thank you very much for this comment. Please refer to the Reply to All Reviewers for the experimental details on the ERM baseline. As shown in PDF Tab.4, under the same experimental settings and hyperparameters, incorporating L-Reg with ERM significantly enhances the overall mDG performance, improving it from 49.9% to 52.9%.
Q1. We apologize for any confusion caused and appreciate your constructive feedback. L-Reg is designed to eliminate semantics shared across all classes rather than to suppress frequent features within a single class. To clarify this, please refer to PDF Fig.1. In this figure, we have fixed the images to the same coordinates and added figures illustrating the feature distributions of known and unknown classes. Note that here we use the values of the first component of the PCA results on the original features.
Before applying L-Reg, these first-component values predominantly fall within the range [-0.4, -0.2] across all classes, and the distributions of known and unknown classes are nearly identical. This indicates that a few specific semantics overly influence many features. L-Reg mitigates this issue by focusing the model on disentangled minimal semantic supports for classifying each class, thereby reducing feature complexity and enhancing generalization.
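For reference, this kind of first-component analysis can be reproduced with standard tools, roughly as sketched below (scikit-learn/matplotlib; `feats_known` and `feats_unknown` are hypothetical arrays of features extracted from a trained model):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_first_component(feats_known: np.ndarray, feats_unknown: np.ndarray, title: str):
    """Project features onto the first PCA component and compare the distributions."""
    pca = PCA(n_components=1).fit(np.concatenate([feats_known, feats_unknown], axis=0))
    pc_known = pca.transform(feats_known)[:, 0]
    pc_unknown = pca.transform(feats_unknown)[:, 0]
    plt.hist(pc_known, bins=50, alpha=0.5, label="known classes")
    plt.hist(pc_unknown, bins=50, alpha=0.5, label="unknown classes")
    plt.xlabel("first PCA component")
    plt.ylabel("count")
    plt.title(title)
    plt.legend()
    plt.show()
```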
Q3. Many thanks for this comment. The code for L-Reg is available in the supplementary materials we have provided, and we will release all code and hyperparameters to facilitate the reproduction of our experiments.
Regarding sensitivity, as discussed in Paper Lines 344-358 and illustrated in Paper Tab.4, applying L-Reg to the semantic features from the deep layers improves performance for unknown classes without negatively impacting known classes. We hypothesize this is because L-Reg is derived under the precondition that the dimensions of the semantic features are independent of each other. This condition holds for most deep-layer features but may not apply to shallow layers, and further regularizing the independence may lead to further improvements. To test this hypothesis, we conducted experiments with orthogonality regularization (Ortho-Reg) to enforce feature independence in the mDG and GCD tasks. As shown in PDF Tab.2,3&4, while using Ortho-Reg alone may not be very effective, combining L-Reg with Ortho-Reg leads to further improvements. These findings support our hypothesis and suggest that L-Reg, particularly when applied to deep layers or in conjunction with Ortho-Reg, is beneficial.
Thank you for your detailed response and additional experimental results! Most of my concerns are addressed, and I'm willing to raise my rating to 6.
Thank you for your kind feedback. We will ensure that the relevant discussions are thoroughly incorporated into the final manuscript as suggested.
This work introduces a sample-based regularization technique, L-Reg, which goes beyond techniques like parameter-based L2 regularization by being more interpretable and demonstrating better generalization ability. The work formalizes the notion of semantic support to force the model to learn minimal sufficient statistics, showing quantitatively and qualitatively that this leads to better generalization across multiple settings: multi-domain generalization, generalized category discovery, and a new setting that combines the two, which the authors introduce.
Strengths
- A theoretically grounded paper with comprehensive experiments and results.
- The paper is well written in general.
- I especially liked how the algebraic logic formalism was neatly tied into this space. The idea of using semantic supports, although simple, is motivated and formalized well. I also liked how the negation of the semantic support set was used to formulate the optimization problem.
- The derivation of conditions required to hold under various settings is well done and makes the derivation of the regularization easy to follow. The proposed mDG + GCD setting is interesting.
Weaknesses
- Although the method introduced is technically sound, with the baselines being quite comprehensive, the improvement in results seems minor. This suggests that accuracy may not be the right metric to compare against here. Considering the objectives proposed, shouldn’t other metrics beyond accuracy be considered?
- The paper states that “the semantics generated by the encoder and classifier can be combined to form atomic formulas”. I expected to see some of the actual learnt atomic formulae in the results - which I did not.
- From the formulation of the L_reg loss, it appears that any concept-based model that is sparse may be similar to the proposed formulation. How is the proposed method different from such methods -- formally and empirically?
Questions
- Since the paper mentions “constructing atomic formulae”, would it be possible to actually extract and see how these look from the model? This might significantly strengthen the paper.
- How does this method compare, formally and empirically, with sparse concept-based models? Wouldn’t their loss be very similar to the proposed L_reg loss?
Limitations
Yes, limitations have been addressed.
We appreciate your insightful comments, and we address your weaknesses and questions point-by-point. Some weaknesses and questions are addressed together because they are closely related.
W1. We really appreciate this insightful comment. In our study, we adopted the commonly used accuracy metric to align with the previous work we compared against. We agree that other entropy-based metrics could potentially provide a more effective evaluation. Additionally, it may be worthwhile to propose novel logic-based metrics for a more comprehensive assessment. We find this to be a very attractive and promising area of research, and we are eager to explore it further in future work.
W2&Q1. We appreciate these comments. For a more detailed explanation of L-Reg's interpretability, please refer to Reply to All Reviews 1. Notably, the CAM visualizations in our paper illustrate the model's learned atomic formulas. By using L-Reg, the model derives interpretable atomic formulas, which can also be understood as the most important features for predicting a given class. To address your questions more concretely, we use Paper Fig.5 as a typical example to analyze L-Reg’s interpretability further.
For the known classes, the efficacy of L-Reg can be intuitively understood as extracting the minimal semantic supports for a given class label. As the examples here show, the presence of a guitar's fingerboard, even in unseen domains, helps classify a sample as belonging to the guitar category; informally, the presence of a fingerboard implies the guitar class. For all known classes, samples with these minimal semantic supports are recognized accordingly.
In contrast, if a sample lacks these minimal supports for any known class, it is very likely categorized as an unknown class. This behavior stems from Paper Eq.10, which constrains different classes to rely on distinct minimal semantic supports. L-Reg further enhances the model's ability to identify minimal supports for unknown classes by filtering out covariant features associated with other classes, and thus generalizes to unseen domains. Therefore, very interpretable features for unknown classes from unseen domains can be extracted using L-Reg. Paper Fig.5 (right side) demonstrates that the model with L-Reg can even extract facial features for the unknown person class and generalize this to the unseen domain; informally, the presence of facial features implies the person class.
We will include a more detailed discussion on this topic in the final version of our paper.
W3&Q2. We really appreciate this inspiring comment. Following part 1 of the Reply to All, while a common sparse concept model may be able to obtain a compact set of features by filtering irrelevant ones through sparsity, it may not ensure that different classes rely on distinct minimal semantic supports (cf. Paper Eq.10), which is crucial for disentangling the features used for predicting different classes. This limitation can potentially lead to degraded generalization performance for common sparse concept models.
To investigate this fairly, we re-implemented the Bernoulli sampling of the latent features from the Sparse Linear Concept Discovery Models [3] on the same PIM backbone that we use, to achieve sparsity. The results in PDF Tab.2&3 indicate that while L-Reg consistently achieves overall improvement, the sparse concept-based approach does not consistently improve generalization, validating the aforementioned difference.
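For reference, our simplified reading of this sparsity baseline gates the latent features with (relaxed) Bernoulli samples, roughly as sketched below (illustrative names and a minimal sketch, not the original code of [3]):

```python
import torch

def bernoulli_gate(feats: torch.Tensor, gate_logits: torch.Tensor,
                   temperature: float = 0.1, training: bool = True) -> torch.Tensor:
    """Sparsify latent features with a (relaxed) Bernoulli gate.

    feats, gate_logits: (batch, d). During training, sample a differentiable
    relaxed-Bernoulli mask; at test time, apply a hard threshold on the gate
    probabilities. Masked-out dimensions are dropped from the prediction.
    """
    probs = torch.sigmoid(gate_logits)
    if training:
        mask = torch.distributions.RelaxedBernoulli(temperature, probs=probs).rsample()
    else:
        mask = (probs > 0.5).float()
    return feats * mask
```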
I thank the authors for the detailed rebuttal, and the efforts.
- The additional results, especially the comparison with sparse concept models, are useful. I'd suggest that this should be included in the main result tables. Since one of the significant claims of this work is the reduction in complexity of parameters, comparisons with sparse models would be necessary to show the usefulness of this approach.
- Thank you for the qualitative example and explanation on the atomic formulas. It would be great to include a few qualitative results (positive and perhaps even cases where the method failed) in the appendix of the paper. This would greatly help understand the paper better.
The paper is meritorious, and I stay with my rating of WA.
We sincerely appreciate your acknowledgment of our efforts and your constructive suggestions. We will update the comparison results with sparse concept models in the main result tables and add more qualitative results in the appendix. Thank you once again for your valuable feedback, which has significantly contributed to the improvement of our paper.
The paper proposes a novel logical regularization term, L-Reg, for visual classification. L-Reg encourages models to focus on salient semantics, and interpretability thereby emerges. The theoretical analysis provides clear connections between logical reasoning and L-Reg. Extensive experiments demonstrate that L-Reg also benefits the generalization of models to unseen domains and categories.
Strengths
- Studies on loss regularization have positive influences on various fields.
- The paper is well-presented, and L-Reg is clearly presented with rigorous theoretical analysis.
- The qualitative benefits of L-Reg are validated through experiments, and the generalization brought by L-Reg has been demonstrated sufficiently under three settings.
Weaknesses
- As stated in the introduction section, interpretability is a longstanding focus among studies in regularization terms. The authors claim that L2 regularization might lead to ambiguous interpretability, which also serves as an important motivation and contribution for L-Reg. However, the interpretability of L-Reg has not been adequately discussed except for the introduction section. Further analysis from either qualitative or quantitative perspectives can greatly strengthen the presentation.
- The analysis of Figure 5 should be detailed, especially the examples of unknown classes in Row 3.
Questions
See weakness.
Limitations
The authors have discussed both the limitations and the potential societal impact. The limitations are left for future research.
These insightful comments are highly appreciated. We believe these two questions are very related; therefore, please allow us to address them together.
As discussed in Reply to All Reviews 1, L-Reg's interpretability is rooted in learning good general atomic formulas. Specifically, L-Reg encourages the model to identify the minimal semantic supports - the most important features - necessary for class recognition. Such an approach resembles the human cognition process. Paper Fig.1,5 and the visualizations included in the Paper appendix show that L-Reg enables the model to learn distinctive features such as facial features for recognizing the person class, the long-neck feature for the giraffe class, and so on.
To address your questions more concretely, we use Paper Fig.5 as a typical example to analyze L-Reg’s interpretability further.
For the known classes, the efficacy of L-Reg can be intuitively understood as extracting the minimal semantic supports for a given class label. As the examples here show, the presence of a guitar's fingerboard, even in unseen domains, helps classify a sample as belonging to the guitar category; informally, the presence of a fingerboard implies the guitar class. For all known classes, samples with these minimal semantic supports are recognized accordingly.
In contrast, if a sample lacks these minimal supports for any known class, it is very likely categorized as an unknown class. This behavior stems from Paper Eq.10, which constrains different classes to rely on distinct minimal semantic supports. L-Reg further enhances the model's ability to identify minimal supports for unknown classes by filtering out covariant features associated with other classes, and thus generalizes to unseen domains. Therefore, very interpretable features for unknown classes from unseen domains can be extracted using L-Reg. Paper Fig.5 (right side) demonstrates that the model with L-Reg can even extract facial features for the unknown person class and generalize this to the unseen domain; informally, the presence of facial features implies the person class.
However, as shown in Row 3, significant domain shifts, such as those between the sketch domain and the other domains, pose challenges. Specifically, the difference between the stick-figure style of person sketches and the person figures from other domains can hinder the model's ability to cluster the sketches together with figures from other domains when the class label is unknown. Thus, under this circumstance, the model may fail to extract meaningful features from those sketches. We acknowledge this limitation and will explore solutions in future work.
Once again, we appreciate your thoughtful feedback. We will incorporate this analysis into the final version of our paper.
The rebuttal has well addressed my previous concerns about L_Reg and Fig.5, so I change my rating to WA.
Thank you so much for your kind response. We greatly appreciate your constructive comments.
This paper mainly focuses on two problems: 1) How does logical reasoning relate to visual tasks such as image classification? 2) How can we derive a logical reasoning-based regularization term to benefit generalization? Based on the analysis of these two problems, the paper proposes a method called Logical Reasoning Regularization (L-Reg). Theoretical analysis and experimental results demonstrate that L-Reg enhances generalization across several scenarios.
Strengths
- The main contributions of this article are: 1) Building the relationship between logical reasoning and visual tasks such as image classification; 2) Rethinking the classification task from the logical reasoning perspective and proposing Logical Reasoning Regularization. Overall, the contributions are meaningful, and the paper is interesting.
- The paper is easy to read.
Weaknesses
- As can be seen from Table 1 and Table 2, the proposed regularization term provides only a weak improvement over existing methods.
- The analysis of this paper is incomplete and lacks theoretical analysis of the sufficient (or sufficient and necessary) conditions for meeting atomic formulas.
- Overclaim. This paper claims that the proposed L-Reg can reduce complexity. However, L-Reg is a regularization term that is directly added to the learning objective. It does not reduce the computational complexity of training; rather, the extra regularization term increases the computational overhead of the whole training process.
Questions
- line 123, f_s should be f.
- In Figure 1, the visualization results in the first row show that the learned features are mostly concentrated on the background, which suggests that the obtained model may be overfitting or underfitting. I would like to know the specific parameter settings, experimental code, random seeds, etc., of the experiment.
- In Figure 3, what does the abscissa represent? What is the baseline being compared? To make the results more convincing, an additional baseline and dataset need to be added.
- In Figure 4, the coordinate scales of the left and right subfigures are inconsistent, so a direct comparison is unfair. Secondly, what the horizontal and vertical coordinates represent needs to be explained. Finally, the left and right subfigures are almost identical, which does not prove that +L-Reg truly eliminates certain extracted semantics characterized by dominant frequencies across all samples.
- In Lines 173-186, why is the constraint in Eq. (6) removed in Eq. (8)? Please give a detailed derivation.
Limitations
The authors have adequately addressed the limitations.
We appreciate your insightful comments and we address your weaknesses and questions point-by-point.
W1. We humbly believe L-Reg delivers consistent and evident gains. Please refer to the Reply to All for the highlighted improvements. Moreover, as you suggested, we further validate L-Reg's efficacy by applying it to ERM for mDG and to CircuitFormer for an additional congestion prediction task. PDF Tab.4 shows that L-Reg improves the average performance of the ERM baseline from 49.9% to 52.9%, and L-Reg improves CircuitFormer from 0.6374 to 0.6553 in the Pearson metric, alongside increases in the other metrics.
W2. The sufficient and necessary conditions for achieving atomic formulas are the ultimate goal and remain an open problem for the community. While we are working towards this, this paper proposes a first practical approach to approximate the most general atomic formulas. Please refer to the Reply to All, which provides more analysis of how the constraints in the paper are derived to achieve atomic formulas. Furthermore, given the current limitation of L-Reg discussed in the paper, we offer a future direction of obtaining a proper g to meet L-Reg's precondition. Additional experiments with ERM on mDG and GCD (cf. PDF Tab.2,3&4) show that constraining g to be proper leads to further improvement. We will discuss this grand topic in our revision and explore more aspects in the future.
W3. Sorry for any confusion. L-Reg is not designed to reduce the computational complexity; it aims to reduce the complexity of the model parameters and the data features. As discussed in Paper Sec.3, L-Reg reduces the classifier's complexity by increasing the sparsity of its weights, and the features' complexity by removing over-dominant semantics shared across all classes (this is related to your Q3&Q4, where details of how L-Reg achieves this can be found). We will polish this part in the final version for better clarity.
Q1. Thanks. We will fix all these typos.
Q2. Paper Fig.1 shows CAM visualizations of the models trained under GMDG with the RegNetY-16GF backbone for mDG+GCD on PACS with the unseen domain art painting, using DomainBed's protocols and code. Both models share the same training parameters and seed 0. The only difference is that the latter uses L-Reg, with its weight as an extra hyperparameter. Details such as the specific parameter settings you requested are in Paper Appendix E.1, and the code for our method is included in the supplementary materials. In short, we believe the comparison is fair because the same hyperparameters and seed are used for both models, except for L-Reg. Note that the same models are used for Paper Fig.3&4 and all CAM visualizations in the Paper appendix.
Q3. Using the aforementioned models, Paper Fig.3(a) presents the heatmap of the classifiers' weights, where the x-axis is the index of the weights in the linear layers. Paper Fig.3(b) shows the distributions of the classifiers' weight values, with the x-axis representing the value of the normalized weights and the y-axis showing the count of weight values within bin intervals. Following your suggestion, the fixed figure with a denoted abscissa and more descriptive captions is shown in PDF Fig.2. PDF Fig.2 demonstrates that L-Reg reduces the classifier's complexity by alleviating extreme weight values. The heatmap further indicates that L-Reg increases the sparsity of the classifiers, leading to better generalization. Please refer to the reply to W1 for more experimental results.
Q4. We apologize for any confusion caused by Paper Fig.4. We have re-drawn Paper Fig.4, now shown as PDF Fig.1, with the distributions illustrated on the same coordinates and with features of known and unknown classes added. Using the aforementioned models, PDF Fig.1 shows feature distributions based on the values of the first principal component after PCA. Before using L-Reg, these values mostly concentrate within [-0.4, -0.2] across all classes, indicating that some specific semantics over-dominate the features. L-Reg alleviates this issue by forcing the model to obtain each class's minimal semantic supports, removing semantics shared across all classes to reduce feature complexity. The top row of PDF Fig.1 shows that the distance between the feature distributions of known and unknown classes is enlarged with L-Reg, making them more separable for classification.
Q5. We provide more details on why the constraint in Paper Eq.6 can be safely omitted in the rest of the paper after the definition of the logical framework. Consider the original logic; we study the restricted logic defined over the pseudo-components associated with it, where the restriction is taken over a subset of all possible worlds/domains. For any formula and any such world, satisfaction is inherited from the original logic, and the semantical consequence relation of the restricted logic is defined accordingly. [1] points out that the following condition is almost always satisfied: (Cond) for every formula, satisfaction in the restricted logic agrees with satisfaction in the original one. Hence, while Cond holds, the semantical consequence relation induced by the restricted logic coincides with the original syntactical consequence relation. Because of this coincidence, the constraint in Eq.6 is implied by the remaining objective and can be safely omitted in Eq.8.
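For clarity on the notation, the two consequence relations compared above are the standard ones (a generic reminder rather than the paper's exact definitions): the semantical consequence $\Gamma \models \varphi$ holds when every possible world/domain satisfying $\Gamma$ also satisfies $\varphi$, while the syntactical consequence $\Gamma \vdash \varphi$ holds when $\varphi$ is derivable from $\Gamma$ in the proof system. When Cond holds, the two coincide on the restricted logic, which is why keeping the semantic constraint from Eq.6 in Eq.8 would be redundant.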
Thank you for your detailed response. My problems have been addressed well. I raise my rating to 6: Weak Accept.
We greatly appreciate your kind response and insightful comments, which have significantly helped us improve our paper.
Reply to All
We sincerely appreciate the reviewers' insightful comments, which have helped us refine and improve our paper. We have identified common concerns across the reviewers and address them collectively here. Detailed responses to individual reviewers are provided separately. Please note: References to contents from the paper are denoted with the prefix 'Paper'; 'PDF' indicates that the content is included in the uploaded PDF file.
1. More about atomic formulas and interpretability of L-Reg
Thank you for your comments regarding atomic formulas, which have guided us in highlighting the significance of L-Reg more effectively. An atomic formula takes one of the two basic forms defined in the paper. Our aim is to find the good (most) general atomic formula for each class, from which the interpretability of L-Reg is derived.
Consider two formulas: one is more general than the other if there exists a substitution that instantiates the more general formula into the other [1,4]. A good general atomic formula for a class should rely only on the semantic support of that class (cf. Paper Def.3.1), which infers the constraint in Paper Eq.9 for predicting the class. Note that the formula is constructed per class, i.e., it predicates whether a sample belongs to that class. Considering multiple classes, this yields the constraint in Paper Eq.10, which requires that different minimal semantic supports be used for predicting different classes.
The interpretability of L-Reg is based on this constraint, compelling the model to use distinct minimal semantic supports for each class. These minimal semantic supports can be interpreted as the most critical features for efficient prediction. For example, as shown in Paper Fig.1, the model with L-Reg has learned the facial features of the person class (see more examples in Paper Supp Fig.7-12), forming an (informal) atomic formula in which the presence of facial features implies the person class.
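For readers less familiar with the logic terminology, the generality relation used above can be sketched with the standard substitution-based definition (a generic formulation for illustration only; the precise definitions follow the paper and [1,4]): a formula $\varphi$ is at least as general as $\psi$ iff there exists a substitution $\theta$ such that $\varphi\theta = \psi$, i.e.,
$$\varphi \succeq \psi \iff \exists\,\theta:\ \varphi\theta = \psi .$$
Under this reading, the informal per-class rules above can be written as $\mathrm{fingerboard}(x) \rightarrow \mathrm{guitar}(x)$ and $\mathrm{facial\_features}(x) \rightarrow \mathrm{person}(x)$, where the predicate on the left stands for the minimal semantic support that L-Reg encourages the model to isolate for each class.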
2. Improvements with L-Reg
Improvement highlights of L-Reg. We understand the reviewers' points. Nonetheless, we humbly believe L-Reg leads to consistent and evident gains. Paper Tab.1-2 show the consistent overall improvements brought by L-Reg across different datasets in mDG and GCD, suggesting the feasibility of L-Reg. For GCD, a 6.7% improvement on the unknown classes of CIFAR100, and an average improvement of 2.8% across unknown classes and all datasets, further demonstrate L-Reg's efficacy for generalization. In mDG, the TerraInc dataset includes camera-trap images that are challenging even for humans; L-Reg achieves a significant 2.2% increase on it and an average of 0.7% across five datasets.
Apply L-Reg to the ERM baseline for mDG. To further validate L-Reg's efficacy, we use ERM as the baseline on the TerraInc dataset for mDG. For a fair comparison, all experiments share the same hyperparameter settings and use the RegNetY-16GF backbone. Original ERM results are included alongside our reproduced results. The results in PDF Tab.4 reveal that ERM with L-Reg significantly improves mDG performance (from 49.9% to 52.9%).
Apply L-Reg to congestion prediction for circuit design. We also test L-Reg on congestion prediction for circuit design on the CircuitNet [2] dataset using the CircuitFormer [5] backbone. All parameters, except for L-Reg, remain consistent with CircuitFormer, and we follow its metrics. Results in PDF Tab.1 show improvements with L-Reg across all metrics and a significant increase in the Pearson metric (0.6374 to 0.6553).
Compare L-Reg with more regularization terms. We compare L-Reg with other regularization terms: Ortho-Reg, an orthogonality regularization that constrains the independence of each dimension of the semantic features; and Sparsity, implemented as Bernoulli sampling of the latent features from the sparse linear concept discovery models [3] on our PIM backbone. PDF Tab.2&3 demonstrate that L-Reg outperforms both Ortho-Reg and Sparsity.
Limitation of L-Reg and possible solutions. As discussed in the Paper Limitation section and the analysis around Paper Lines 344-358, L-Reg is based on the precondition that each dimension of the semantic features represents an independent semantic. Thus, an improper g that does not meet this precondition may lead to sub-optimal results. To validate this hypothesis, we test L-Reg while reinforcing independence with Ortho-Reg. Results for mDG in PDF Tab.4 and for GCD in PDF Tab.2&3 show that combining L-Reg with Ortho-Reg leads to further improvements, whereas Ortho-Reg alone may not guarantee improvements. This suggests a direction for future work.
In summary, we believe the consistent improvements across all these experiments under different settings, with various baselines and backbones, demonstrate the excellent efficacy of L-Reg. These additional analyses and experiments will be included in the final version.
References:
[1] H. Andréka, I. Németi, and I. Sain. Universal algebraic logic. Studies in Logic, Springer, 2017.
[2] Z. Chai, Y. Zhao, W. Liu, Y. Lin, R. Wang, and R. Huang. Circuitnet: An open-source dataset for machine learning in vlsi cad applications with improved domain-specific evaluation metric and learning strategies. IEEE TCAD, 42(12):5034–5047, 2023.
[3] K. P. Panousis, D. Ienco, and D. Marcos. Sparse linear concept discovery models. In ICCV, pages 2767–2771, 2023.
[4] I. Tsapara and G. Turán. Learning atomic formulas with prescribed properties. In Proceedings of the eleventh annual conference on Computational learning theory, pages 166–174, 1998.
[5] J. Zou, X. Wang, J. Guo, W. Liu, Q. Zhang, and C. Huang. Circuit as set of points. NeurIPS, 36, 2024.
We would like to express our sincere gratitude to all of you for your valuable feedback and participation in the discussion regarding our paper. We have received responses from all the reviewers, and we greatly appreciate your involvement. Your insights and suggestions have been instrumental in enhancing the quality and clarity of our work.
Summary: This work introduces a sample-based regularization term, L-Reg. It goes beyond techniques like parameter-based L2 regularization by being more interpretable and demonstrating better generalization ability. It encourages models to focus on salient semantics, and interpretability thereby emerges. The theoretical analysis provides clear connections between logical reasoning and L-Reg.
Strength: The contributions --- building the relationship between logical reasoning (regularization) and visual tasks and the studies on loss regularization --- are meaningful; the paper is interesting and well-presented with rigorous theoretical analysis and comprehensive experiments and results; the idea of using semantic supports is motivated and formalized well; the derivation of required conditions is well done and makes the derivation of the regularization easy to follow; the proposed method is compatible with most existing frameworks, making it highly flexible and practical in real-world applications.
Weakness: The empirical improvement is weak (other metrics might be presented); the interpretability of L-Reg is not adequately discussed beyond the introduction section; the analysis is incomplete and lacks theoretical analysis of the sufficient (and necessary) conditions for meeting atomic formulas; some arguments seem to be overly claimed; further analysis from either qualitative or quantitative perspectives and further discussion of sparse concept-based models would greatly strengthen the presentation; the writing could be improved for readers unfamiliar with logical reasoning; L-Reg may reduce the scope of semantic support, leading to a slight performance decrease on known datasets; some technical details and explanations are unclear or missing.
After rebuttal: The authors provided a rebuttal, including new experimental results. All the reviewers read the rebuttal and provided feedback. The reviewers have acknowledged that most of their concerns are well-addressed. After rebuttal, the paper received increased ratings 6-6-6-6-6, with an average rating of 6.
Recommendation: The AC checked the review, rebuttal, and the reviewers' further feedback. The AC agreed with the reviewers about the multiple strengths of the paper and appreciated the authors' efforts to effectively address the weaknesses. The AC thus recommended acceptance and suggested the authors incorporate all the reviewers' comments and their rebuttal into the final version (especially Reviewer t6Vg's and Reviewer 2Pp3's comments).