Enhancing Neural Subset Selection: Integrating Background Information into Set Representations
Abstract
Reviews and Discussion
This paper proposes a neural subset selection method based on deep sets. This model is inspired by a theoretical perspective to include information from supersets to achieve better performance. Experiments on common benchmarks show SOTA performance compared to several recent baselines.
Strengths
- The idea of including information from the superset is simple and effective, as shown by the experimental results.
- Theoretical discussions are provided.
Weaknesses
- Equation 4 describes the neural network construction. However, I am unclear about the objective function used to optimize the neural network. Also, after optimization, how do you use this neural network to select a subset?
- In Equation 4, how do you divide a superset into several subsets? There are an exponential number of combinations.
- What is the number of learnable parameters for each baseline method and the proposed method?
Questions
None
Details of Ethics Concerns
None
We greatly appreciate the time and effort you have invested. In response to your concerns, we have provided detailed clarifications.
Comments 1: The optimization objective and how to select an optimal subset during inference?
ANSWER: Thank you for your insightful question, as the construction of an optimization objective and inference process are indeed key aspects of neural subset selection [1, 2, 3]. We aimed to balance detailed explanation with readability for a broad audience, which is why we initially provided a high-level overview in the Introduction Section. To specifically address your concerns, we are now including more detailed descriptions of the optimization and inference processes.
Our formulation of the optimization objective is based on the framework established in [1]. Specifically, the optimization objective addresses Equation 1 in our paper by adopting an implicit learning strategy grounded in probabilistic reasoning, which can be summarized as follows.
The important step in addressing this problem involves constructing an appropriate set mass function $p(S \mid V)$ that is monotonically increasing with respect to the utility function $F(S, V)$. To achieve this, we can employ the Energy-Based Model (EBM): $p(S \mid V) \propto \exp\big(F(S, V)\big)$, normalized over all subsets of $V$.
In practice, we approximate the EBM by solving a variational approximation problem, i.e., fitting a mean-field distribution $q(S \mid V; \psi)$ to $p(S \mid V)$.
During the training phase, we need an EquiNet, denoted as $\mathrm{EquiNet}_{\theta}$. This network takes the ground set $V$ as input and outputs probabilities $\psi$ indicating the likelihood of each element being part of the optimal subset $S^*$. In the inference stage, EquiNet is employed to predict the optimal subset $S^*$ for a given ground set $V$, using a TopN rounding approach. For detailed information on the implementation and derivation of the aforementioned objective, please refer to [1].
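For concreteness, here is a minimal sketch of the TopN rounding step (illustrative only: the function and variable names are hypothetical, and we assume the target subset size N is supplied by the evaluation protocol; this is not an excerpt of our implementation):

```python
import numpy as np

def topn_rounding(psi, n):
    """Pick the n elements with the largest predicted selection probabilities psi
    and return the corresponding 0/1 subset mask (illustrative sketch)."""
    psi = np.asarray(psi, dtype=float)
    idx = np.argsort(-psi)[:n]            # indices of the n highest probabilities
    mask = np.zeros(psi.shape[0], dtype=int)
    mask[idx] = 1
    return mask

psi = np.array([0.10, 0.80, 0.30, 0.90])  # per-element probabilities from EquiNet
print(topn_rounding(psi, n=2))            # -> [0 1 0 1]
```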
[1] Ou Z, Xu T, Su Q, et al. "Learning Set Functions Under the Optimal Subset Oracle via Equivariant Variational Inference." NeurIPS, 2022.
[2] Tschiatschek S, Sahin A, Krause A. "Differentiable submodular maximization." IJCAI, 2018.
[3] Zhang D W, Burghouts G J, Snoek C G M. "Set prediction without imposing structure as conditional density estimation." ICLR, 2021.
Comments 2: In equation 4, how do you divide a superset into several subsets?
ANSWER: Theorem 3.5 and Eq. 4 are general frameworks to establish the relationship between $(S, V)$ and $Y$. This approach does not necessitate dividing a superset into multiple subsets; instead, it requires processing $V$ only once for a specific pair $(S, V)$. In the context of neural subset selection with the optimal subset (OS) oracle, addressing the variational approximation employs Monte-Carlo (MC) sampling. For a given $V$, we only generate $m$ subsets during training, consistently setting this number to 5 across various tasks. Consequently, this eliminates the need for an exponential number of combinations. It is important to highlight that one of the foremost advantages of neural subset selection in the context of the OS oracle is its ability to significantly reduce the computational burden associated with processing an exponential number of subsets.
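As a rough illustration of this Monte-Carlo step (hypothetical names; the actual sampling routine in our codebase may differ), the $m$ subset masks for one ground set could be drawn as follows:

```python
import numpy as np

def sample_subset_masks(psi, m, seed=0):
    """Draw m subset masks, with each element selected independently with its
    current probability psi_i (illustrative Monte-Carlo sketch)."""
    rng = np.random.default_rng(seed)
    psi = np.asarray(psi, dtype=float)
    return (rng.random((m, psi.shape[0])) < psi).astype(int)

psi = np.array([0.2, 0.7, 0.5, 0.9])    # per-element probabilities for one ground set V
masks = sample_subset_masks(psi, m=5)   # m = 5 Monte-Carlo subset samples
print(masks.shape)                      # (5, 4)
```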
Comments 3: What is the number of learnable parameters for baselines and the proposed method?
ANSWER: Regarding the number of parameters, we have already conducted ablation studies in Table 3 and provided a discussion in Section 4.4 of our paper. To further demonstrate that the improvements achieved by our method are not merely due to additional parameters, we present an additional table here using the CelebA dataset. This table compares EquiVSet (v1) and EquiVSet (v2), variants of EquiVSet where we have incorporated a Conv(32, 3, 2) layer and a Conv(64, 4, 2) layer into the EquiVSet backbone, respectively. Detailed descriptions of these backbones are available in Appendix E.2. Notably, despite having the largest number of parameters, EquiVSet (v2) is outperformed by INSET, indicating that our method's efficacy is not solely parameter-dependent.
| | DeepSet | Set-Transformer | EquiVSet | EquiVSet-v1 | EquiVSet-v2 | INSET |
|---|---|---|---|---|---|---|
| Parameters | 651181 | 1288686 | 1782680 | 2045080 | 3421592 | 2162181 |
| MJC (mean Jaccard coefficient) | 0.440 ± 0.006 | 0.527 ± 0.008 | 0.549 ± 0.005 | 0.554 ± 0.007 | 0.560 ± 0.005 | 0.580 ± 0.012 |
Thank you in advance for dedicating your time and attention to our response. We are confident that the clarifications and additional information provided here comprehensively address your concerns. With this in mind, we respectfully and earnestly request that you re-evaluate our work, considering the explanations we have offered.
Thanks for your response! I have one more question regarding Comments 2. By stating "we only generate m subsets during training", do you mean that during each training iteration, you randomly select m subsets?
Huge thanks for your super quick reply and for raising the score!
Thank you for your prompt response! You are correct that during each training iteration, we randomly select $m$ subsets for each ground set $V$. Increasing the value of $m$ leads to an improvement in performance. To ensure a fair comparison, we adhere to EquiVSet's protocol by setting the sample number $m$ to 5 across all tasks and datasets. Even when varying the value of $m$, the results consistently demonstrate that INSET significantly outperforms EquiVSet. In the following table, we report the performance of EquiVSet by selecting the best results achieved after tuning the value of $m$ within the range of 1 to 10.
| | EquiVSet (best) | INSET (m=1) | INSET (m=2) | INSET (m=5) | INSET (m=7) | INSET (m=8) | INSET (m=10) |
|---|---|---|---|---|---|---|---|
| Toys | 0.704 ± 0.004 | 0.752 ± 0.006 | 0.753 ± 0.005 | 0.769 ± 0.005 | 0.768 ± 0.003 | 0.767 ± 0.003 | 0.771 ± 0.004 |
| Gear | 0.745 ± 0.013 | 0.788 ± 0.015 | 0.775 ± 0.020 | 0.808 ± 0.012 | 0.813 ± 0.010 | 0.821 ± 0.015 | 0.846 ± 0.011 |
Thank you for your time. If you have any additional questions, we would be delighted to discuss them further.
I am satisfied with the clarification and increased my score to 6.
Dear Reviewer miuk,
We would like to extend our heartfelt gratitude for your active engagement and valuable suggestions. Thanks to your insightful feedback, we have made some revisions to our manuscript.
Firstly, we have incorporated the optimization objective and inference process into Appendix D.2, allowing for a more comprehensive understanding of our proposed approach. Additionally, we have included the extra experiments in Appendices F.2 and F.4, providing further supporting evidence for our findings. These revisions have been highlighted in purple for readers' convenience.
We greatly appreciate your continued support and acknowledgement of our response. Moreover, we are truly grateful for the time and consideration you have invested in reviewing our manuscript.
Sincerely,
The Authors
The paper tackles neural subset selection. In particular, they tackle the issue that current methods do not consider the properties of the superset while constructing subsets. Their theoretical findings demonstrate that when the target value is conditioned on both the input set and subset, it is essential to incorporate an invariant sufficient statistic of the superset into the subset of interest for effective learning.
Strengths
- The paper is clearly written.
- The related work covers enough ground for a new researcher to understand a high-level idea of this field.
- The experiments include multiple baselines.
Weaknesses
- Lack of ablation studies.
- The proposed method is not evaluated on a wide distribution of datasets.
- Will similar findings hold if the dataset contains imbalance? If so, to what degree of imbalance do the guarantees still hold?
Questions
- Baselines do not consider the information from the superset, but could these baselines be improved by adding the invariant sufficient statistic of the superset?
Comments 3: Will similar findings hold if the dataset contains imbalance? To what degree of imbalance do the guarantees still hold?
ANSWER: INSET is designed to significantly enhance the models' capacity to effectively learn $F(S, V)$ or $P(Y \mid S, V)$. According to Theorem 3.5, this enhancement holds true consistently when the tasks involve modeling the relationship between $(S, V)$ and $Y$. To provide empirical evidence, we conduct additional experiments that demonstrate INSET's consistent superiority over the baselines, even in scenarios with imbalanced ground set sizes. Specifically, we train the model on the two-moons dataset (for detailed information, please refer to Appendix F.1) using a fixed ground set size of 100, and evaluate its performance on various data sizes ranging from 200 to 1000.
| Ground set size | 200 | 400 | 600 | 800 | 1000 |
|---|---|---|---|---|---|
| EquiVSet | 0.538 ± 0.002 | 0.513 ± 0.003 | 0.482 ± 0.002 | 0.473 ± 0.005 | 0.471 ± 0.003 |
| INSET | 0.547 ± 0.002 | 0.518 ± 0.005 | 0.502 ± 0.003 | 0.486 ± 0.002 | 0.485 ± 0.002 |
The results clearly show that INSET consistently enhances the performance of EquiVSet, regardless of any imbalances.
Comments 4: Can baselines be improved by adding the invariant sufficient statistic of the superset?
ANSWER: Certainly, Theorem 3.5 offers a comprehensive framework for modeling the relationship between $(S, V)$ and $Y$, which is also applicable to the baselines. However, integrating this invariant sufficient statistic directly into DeepSet and Set-Transformer presents challenges, as they do not explicitly learn a neural subset function over pairs $(S, V)$. Our method, INSET, employs DeepSet as its backbone. To demonstrate that Set-Transformer can also derive benefits from INSET, we use Set-Transformer as the backbone in the experiments below.
| | Random | Set-Transformer | Set-Transformer + INSET |
|---|---|---|---|
| Toys | 0.083 | 0.625 ± 0.020 | 0.769 ± 0.010 |
| Gear | 0.077 | 0.647 ± 0.006 | 0.825 ± 0.021 |
| Carseats | 0.066 | 0.220 ± 0.010 | 0.230 ± 0.031 |
| Bath | 0.076 | 0.716 ± 0.005 | 0.862 ± 0.005 |
| Health | 0.076 | 0.690 ± 0.010 | 0.852 ± 0.009 |
| Diaper | 0.084 | 0.789 ± 0.005 | 0.896 ± 0.005 |
| Bedding | 0.079 | 0.760 ± 0.020 | 0.885 ± 0.013 |
| Feeding | 0.093 | 0.753 ± 0.006 | 0.902 ± 0.004 |
By employing Set-Transformer as our backbone, we enable it to explicitly learn the relationship between $(S, V)$ and $Y$. The empirical results clearly demonstrate a significant improvement in performance as a result.
Thank you for your time and thoughtful consideration. If you have any concerns or questions, please don't hesitate to reach out to us.
We greatly appreciate the time and effort you have invested. In response to your concerns, we have provided detailed clarifications and additional experimental results. For your convenience, these results are presented in tabular format. We will incorporate these results, along with details and plots into the appendix of the revised version.
Comments 1: Lack of ablation studies
ANSWER: Thank you for highlighting the absence of ablation studies in our work. Indeed, our method, INSET, does not introduce any new hyperparameters to the EquiVSet [1] framework. We use the exact same hyperparameters as EquiVSet in all of our experiments, ensuring a fair comparison. Meanwhile, INSET can significantly outperform baseline models across various datasets and tasks, demonstrating its substantial efficacy.
To further verify the robustness of INSET, we have now conducted ablation studies focusing on the Monte-Carlo (MC) sample number $m$ for each input pair $(S^*, V)$. In the context of neural subset selection tasks, our primary aim is to train the model to predict the optimal subset $S^*$ from a given ground set $V$. During training, we sample $m$ subsets from $V$ to optimize our model parameters, thereby maximizing the conditional probability of the optimal subset among all sampled pairs for a given $V$. In our main experiments, we adhere to EquiVSet's protocol by setting the sample number $m$ to 5 across all the tasks. Even when varying the value of $m$, the results consistently demonstrate that INSET significantly outperforms EquiVSet. Please note that the performance of EquiVSet is reported by selecting the best results achieved after tuning the value of $m$.
| | EquiVSet (best) | INSET (m=1) | INSET (m=2) | INSET (m=5) | INSET (m=7) | INSET (m=8) | INSET (m=10) |
|---|---|---|---|---|---|---|---|
| Toys | 0.704 ± 0.004 | 0.752 ± 0.006 | 0.753 ± 0.005 | 0.769 ± 0.005 | 0.768 ± 0.003 | 0.767 ± 0.003 | 0.771 ± 0.004 |
| Gear | 0.745 ± 0.013 | 0.788 ± 0.015 | 0.775 ± 0.020 | 0.808 ± 0.012 | 0.813 ± 0.010 | 0.821 ± 0.015 | 0.846 ± 0.011 |
| Bath | 0.820 ± 0.005 | 0.821 ± 0.010 | 0.851 ± 0.008 | 0.862 ± 0.005 | 0.874 ± 0.006 | 0.861 ± 0.005 | 0.874 ± 0.003 |
| Health | 0.720 ± 0.010 | 0.749 ± 0.015 | 0.763 ± 0.012 | 0.812 ± 0.005 | 0.824 ± 0.008 | 0.808 ± 0.005 | 0.811 ± 0.005 |
Comments 2: The proposed method is not evaluated on a wide distribution of datasets.
ANSWER: It is important to clarify that our experiments encompass three tasks: product recommendation, set anomaly detection, and compound selection, which involve the processing of tabular data, images, and 3D Cartesian coordinates. Specifically, we conduct extensive experiments on these tasks using six datasets. Notably, for the product recommendation task, the datasets consist of 12 categories, effectively representing 12 sub-datasets.
Moreover, we have also conducted synthetic experiments in Appendix F.1 to assess INSET's capability to learn complex set functions. Additionally, to provide further evidence of INSET's effectiveness, we have performed set anomaly detection tasks using the CIFAR-10 dataset. We are also incorporating additional filters for compound selection tasks for a wider distribution of datasets. For more information, please refer to Appendices F.3 and F.5 in the revised submission.
| | Random | PGM | DeepSet | Set-Transformer | EquiVSet | INSET |
|---|---|---|---|---|---|---|
| CIFAR-10 | 0.193 | 0.450 ± 0.020 | 0.316 ± 0.008 | 0.654 ± 0.023 | 0.603 ± 0.012 | 0.742 ± 0.020 |
| PDBBind | 0.073 | 0.350 ± 0.009 | 0.323 ± 0.004 | 0.355 ± 0.010 | 0.357 ± 0.005 | 0.371 ± 0.010 |
| BindingDB | 0.027 | 0.176 ± 0.006 | 0.165 ± 0.005 | 0.183 ± 0.004 | 0.188 ± 0.006 | 0.198 ± 0.005 |
The latest results provide further evidence of INSET's superior performance compared to the baselines. Furthermore, it is worth mentioning that our experimental setup includes a significantly larger number of experiments compared to DeepSet (Sec. 4.3) [2] and PGM (Sec. 5.3) [3].
[1] Ou Z, Xu T, Su Q, et al. "Learning Set Functions Under the Optimal Subset Oracle via Equivariant Variational Inference." NeurIPS, 2022.
[2] Zaheer M, Kottur S, Ravanbakhsh S, et al. "Deep Sets." NeurIPS, 2017.
[3] Tschiatschek S, Sahin A, Krause A. "Differentiable submodular maximization." IJCAI, 2018.
Dear Reviewer vcJR:
Thank you again for your valuable time and efforts in reviewing our manuscript. Since our previous response was a bit long, we provide a summary below:
- Ablation studies: We have clarified that our method INSET does not introduce new hyperparameters. Additionally, we have included an ablation study focusing on the number of Monte-Carlo (MC) samples. Detailed responses to these points are available in our feedback to Comment 1.
- Datasets: We have elaborated on the usage of our datasets, encompassing 3 tasks across 6 datasets in three different modalities. Besides, we have also presented more experiments in our feedback to Comment 2.
- Questions: We have also provided new experiments to answer your thoughtful questions on data imbalance and on improving the baselines.
Our method not only demonstrates an impressive empirical performance, with up to a 23% improvement over the best baselines, but it is also underpinned by rigorous theoretical analysis and a strong foundational concept. We are also grateful for your recognition of our work’s Soundness, Presentation, and Contribution as being satisfactory.
Considering these aspects, we respectfully and kindly invite you to re-evaluate the rating of our submission. We eagerly anticipate any further feedback from you.
Dear Reviewer vcJR,
Thanks for your time and consideration. We have revised the Appendix of our manuscript to address your concerns. Regarding your comments on ablation studies, we have conducted additional experiments, detailed in Appendix F.4 and in the first answer of our initial response. Concerning the distribution of datasets, we invite you to review Appendices F.3 and F.5, along with the second answer in our initial response. For your insightful questions, please refer to the second part of our initial response. A summary of our previous response can be found in the comment above. We would appreciate knowing if you have any additional feedback or suggestions.
Sincerely,
The Authors
Dear Reviewer vcJR,
Thank you once again for your time! We understand that you have a busy schedule, and we kindly remind you that the revision deadline is approaching. If you have any suggestions or feedback on how we can improve our manuscript, we would greatly appreciate your input. We eagerly await your response.
Sincerely,
The Authors
Dear Reviewer vcJR,
We sincerely appreciate the time and effort you have dedicated to reviewing our work. Would you mind checking our responses (the shortened summary and the detailed replies above)? If you have any further questions or concerns, we would be grateful if you could let us know. Moreover, if you find our response satisfactory, we kindly ask you to consider the possibility of improving your rating. Thank you very much for your valuable contribution.
Best,
The Authors
Dear Reviewer vcJR,
As the deadline for updating our manuscript is rapidly approaching, we would greatly appreciate your timely feedback on the revisions and clarifications we have provided. We are eager to incorporate any further suggestions you may have. If you find our responses and modifications satisfactory, we kindly request that you consider revising your rating to reflect these changes.
Thank you for your attention to our work, and we look forward to your response.
Best regards,
The Authors.
The authors propose an optimal subset selection method based on neural networks, which is designed to learn a permutation-invariant representation of both the subset of interest $S$ and the ground superset $V$. The authors highlight that prior works for neural subset selection (e.g., DeepSet) do not account for the superset $V$, and both theoretically and empirically demonstrate that jointly modeling the interactions between $S$ and $V$ leads to improved performance.
Strengths
- The writing is generally easy to follow, and the paper includes a sufficiently comprehensive discussion of relevant prior works. Experimental results are presented well.
- The proposed method achieves strong empirical performance in terms of mean Jaccard coefficient (often with a fairly large gap) when compared against several optimal subset selection baselines (e.g., DeepSet, EquiVSet).
Weaknesses
- The presentation of some of the mathematical details needs improvement. In particular, it seems that some of the notations are overloaded (i.e., the same notation is used with different interpretations) or not clearly defined. For example, the notation $S$ appears as a subset of the ground set $V$ in the Introduction, but in Section 3.1 (Background), the same notation appears as an element of $V$ that takes a matrix form. The relationship between elements and subsets of $V$ is not clearly defined either. On another note, it is not entirely clear to me what the function value $Y$ is really referring to, which also appears without an explicit discussion of its meaning in the Introduction as part of the variational distribution $q(Y \mid S, V)$. Is $Y$ supposed to be the utility function value (which was also introduced with the notation $U$ in the Introduction)? The confusion arising from notational ambiguity makes the paper less readable.
Questions
- Can the authors clearly define what $Y$ is? The footnote mentions that $Y$ is the "probability of element $i$ being selected", but this description is ambiguous.
- It looks like learning the neural network approximation in Eq. (4) is done via variational inference as in Ou et al. (2022). As I am not familiar with the cited work, it is unclear to me how $q(Y \mid S, V)$ serves as an approximation for the subset likelihood $P(S \mid V)$ when the former is a distribution over $Y$ and the latter is a distribution over subsets $S$. Can the authors provide clarifications on this?
- How is the neural network construction in Eq. (4) explicitly related to $P(S \mid V)$ (or $q(Y \mid S, V)$)?
We greatly appreciate the time and effort you have invested! In response to your concerns, we have provided clarifications here. We will also incorporate these clarifications into our revised version to enhance clarity.
Comments 1: Relationship between $S$ and $V$
ANSWER: We regard $V$ as a set composed of $n$ elements, denoted as $s_1, \ldots, s_n$, i.e., $V = \{s_1, \ldots, s_n\}$. In order to facilitate the proposition of Property 1, we describe $V$ as a collection of several disjoint subsets, specifically $V = S_1 \cup \cdots \cup S_k$, where $S_i \cap S_j = \emptyset$ for $i \neq j$. Here, $n_i$ represents the size of subset $S_i$, that is, $n_1 + \cdots + n_k = n$.
Comments 2: The definition of $Y$
ANSWER: The generality of our Theorem 3.1 allows it to be applied to both $U = F(S, V)$ and $P(Y \mid S, V)$ for different tasks. Specifically, when considering the task of neural subset selection with optimal subset (OS) oracles, which involves learning $P(Y \mid S, V)$, we define $Y$ as a set of independent Bernoulli distributions, parameterized by $\psi$, where $\psi_i$ represents the odds or probability of selecting element $s_i$ in an output subset $S$.
Comments 3: Why $q(Y \mid S, V)$ can serve as a variational approximation to $P(S \mid V)$
ANSWER: As discussed in the previous answer, $Y$ is a set of independent Bernoulli variables parameterized by $\psi$. In practice, a subset $S$ is represented as a binary vector (mask), denoted as $\mathbf{y}$, where the $i$-th element is equal to $1$ if $s_i \in S$ and $0$ otherwise. Therefore, it is natural to use $q(Y \mid S, V)$ to represent the variational distribution of $P(S \mid V)$.
Comments 4: How is the neural network construction in Eq. (4) explicitly related to $P(S \mid V)$ or $q(Y \mid S, V)$?
ANSWER: Once neural networks are trained, their outputs become fixed for a given input, such as $(S, V)$. Thus, Eq. (4) represents the explicit structure used to construct models for learning the deterministic mapping from $(S, V)$ to $Y$ (to differentiate it from the utility function $U = F(S, V)$). Using this function, we can construct the conditional distribution $P(Y \mid S, V)$ according to Theorem 3.5. Specifically, we employ the Mean-Field Variational Inference (MFVI) method introduced by [1] (Section 3.2) to approximate the distribution $P(S \mid V)$, as done in [1].
To prevent overwhelming readers with an abundance of notations and equations, we have deliberately omitted the detailed construction of $q(Y \mid S, V)$ and the derivation of the variational approximation in our paper. This decision was motivated by two factors. Firstly, our theorem and Eq. 4 offer a general framework for modeling the relationship between $(S, V)$ and $Y$, rather than focusing only on neural subset selection tasks. Secondly, in order to ensure the clarity of our motivation, we have provided a high-level description of these concepts in the Introduction section. For readers interested in the details of these concepts, we strongly recommend referring to [1] (Section 3) for a more comprehensive understanding. For the implementation details of $q(Y \mid S, V)$ and the variational objective, we suggest consulting our accompanying code located at (./model/modules.py).
Thanks for your time and suggestions again. We would appreciate knowing if you have any additional feedback or suggestions.
[1] Ou Z, Xu T, Su Q, et al. "Learning Set Functions Under the Optimal Subset Oracle via Equivariant Variational Inference." NeurIPS, 2022.
Dear Reviewer PoVi,
We sincerely appreciate your reviews and valuable suggestions. Taking into account your feedback, we have made refinements to the footnote in the Introduction section and enhanced the descriptions of $S$ and $V$ in Section 3.1. These revisions, highlighted in purple, will significantly enhance the clarity of our paper. Thank you once again for your time and contribution.
Best regards,
The Authors.
Thanks for a quick response and for letting me know of the revisions. Here are additional clarification questions and comments:
- Regarding Comment 1: Wouldn't it be more natural to write $V = S_1 \cup \cdots \cup S_k$ with $S_i \cap S_j = \emptyset$ when $i \neq j$, which I believe is indeed the form used in Section 3.2?
- Regarding Comments 2 and 3: I think the description that $Y$ is a distribution is misleading. Clearly, $Y$ itself is not a distribution since it need not be the case that $\sum_i \psi_i = 1$, and by the definitions given here, $q(Y \mid S, V)$ and $P(S \mid V)$ are distributions over different objects. Rather, shouldn't it be the case that the probabilities $\psi_i$ of each element being selected in the optimal subset are the outputs from the variational distribution? In this case, it seems more appropriate to describe it as $\psi$? Please let me know if I am misunderstanding something here. Meanwhile, since $Y$ is used throughout the main text, I think it should be very clearly defined as part of the main text before it is used, instead of appearing in a footnote (if appropriate, discussed along with concrete examples that readers can immediately understand).
Thank you for your valuable suggestions! We are now revising our manuscript based on your suggestions. To address your questions, we would like to provide the following clarifications:
Firstly, we want to clarify that $Y$ represents a set of independent Bernoulli distributions rather than a categorical distribution with $|V|$ classes. Hence, it is not required for the elements of $\psi$ to satisfy the constraint $\sum_i \psi_i = 1$. Additionally, we define the expression of $q(S \mid V)$ as follows: $q(S \mid V) = \prod_{i \in S} \psi_i \prod_{i \notin S} (1 - \psi_i)$.
Next, we would like to explain why $q$ can approximate $P(S \mid V)$. Consider the mask $\mathbf{y} = (y_1, \ldots, y_n)$ with $y_i \sim \mathrm{Bernoulli}(\psi_i)$. In this case, $\mathbf{y}$ can be viewed as a stochastic version of $S$, since $S$ can also be generated as $S = \{ s_i : y_i = 1 \}$ while still satisfying the constraint $S \subseteq V$.
To facilitate comprehension, let us consider an illustrative scenario. Suppose we have a ground set $V = \{s_1, s_2, s_3\}$, and the optimal subset is $S^* = \{s_1, s_2\}$, which can be represented as $\mathbf{y}^* = (1, 1, 0)$. Specifically, we define $\psi_1 = \psi_2 = 1$, indicating that $\{s_1, s_2\}$ is the correct subset, while for any $s_i \notin S^*$, we have $\psi_i = 0$.
Now, let's examine the case when $\psi = (1, 1, 0)$. In this situation, we can calculate that $q(S^* \mid V) = 1 \times 1 \times (1 - 0) = 1$. This implies that $q$ accurately represents the probability of observing $S^*$ given $V$, and it correctly assigns a high probability to the optimal subset.
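For completeness, a minimal sketch of this mean-field computation is given below (hypothetical names, not an excerpt from our code):

```python
import numpy as np

def mean_field_subset_prob(psi, mask):
    """Probability of a subset (given as a 0/1 mask) under the mean-field
    distribution q(S|V) = prod_{i in S} psi_i * prod_{i not in S} (1 - psi_i)."""
    psi = np.asarray(psi, dtype=float)
    mask = np.asarray(mask, dtype=float)
    return float(np.prod(np.where(mask == 1, psi, 1.0 - psi)))

# Ground set of three elements; the optimal subset contains the first two.
psi = np.array([1.0, 1.0, 0.0])      # per-element selection probabilities
y_star = np.array([1, 1, 0])         # mask of the optimal subset S*
print(mean_field_subset_prob(psi, y_star))  # 1.0: all mass on S*
```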
We hope these clarifications help provide a better understanding of our framework. Once again, we appreciate your constructive suggestions and look forward to further discussion.
Dear Reviewer PoVi,
Thank you for your constructive suggestions and insightful comments! As the discussion deadline is approaching in 10 hours, we would like to inquire if you have any further suggestions for improving our manuscript. We would greatly value your input and appreciate your guidance.
Thank you for your time and consideration. We eagerly await your reply.
Sincerely, The Authors
Dear Reviewer PoVi,
We sincerely appreciate your valuable suggestions for improving our paper. We have refined the descriptions of $Y$ and $\psi$ in Section 3.2 based on your suggestions. Moreover, we have defined $Y$ in the main text instead of the footnote, where we have also included a reference to Appendix D.2. In this appendix, readers can find further elaboration on the relationship between $q(Y \mid S, V)$ and $P(S \mid V)$, along with the example mentioned in our previous response.
We sincerely thank you once again for your time and valuable contribution. Should you have any additional suggestions or questions, please do not hesitate to let us know.
Best regards, The Authors
We express our sincere gratitude to the Area Chairs and Reviewers for their dedicated time and valuable feedback. Below is a concise summary of the review and our responses for ease of reference.
Reviewer Acknowledgments:
Our method, INSET, has been recognized for its strong empirical performance and rigorous theoretical support. Key highlights include:
- Clear Motivation with Theoretical Support: Our model is inspired by a theoretical result [miuk], and we have theoretically demonstrated the significance of including information from supersets to achieve better performance [PoVi, vcJR].
- Strong Empirical Performance: INSET achieves state-of-the-art performance [miuk], supported by empirical evidence [PoVi].
- Quality of Writing: The manuscript is praised for its clarity and simplicity, making it accessible for newcomers to the field [vcJR, miuk]. The presentation effectively conveys high-level concepts [PoVi].
Addressing Weaknesses:
The reviewers raised concerns regarding ablation studies [vcJR, miuk], dataset distribution [vcJR], and mathematical notations in the optimal subset oracle [PoVi, miuk]. Our responses include:
- Ablation Studies: We clarified in this dialog that INSET introduces no new hyperparameters. Additional experiments demonstrate that even with variations in existing hyperparameters, INSET consistently outperforms baselines by a large margin. This response was acknowledged positively by Reviewer miuk.
- Mathematical Notations: Following suggestions from Reviewers PoVi and miuk, we corrected minor typos and enriched the appendix with additional mathematical background. These revisions received positive feedback from Reviewer miuk.
- Dataset Distribution: We clarified in the dialog that our approach encompasses three tasks across six datasets in different modalities in the main body of our work. The appendix also includes five additional datasets. We have utilized a much wider variety of datasets than those adopted by our baseline comparisons.
Revision Overview:
- To enhance clarity, we have made refinements to the description of $Y$ in the Introduction, as well as the definitions of $S$ and $V$ in Sections 3.1 and 3.2.
- The mathematical description of the optimization objective and inference process has been detailed in Appendix D.2.
- Appendix F.2 has been updated with additional empirical studies on computation costs. Further, ablation studies have been included in Appendix F.4, and results from a broader range of datasets are now presented in Appendix F.5.
We are thankful for the constructive feedback received, and we believe that the concerns raised by the reviewers have been thoroughly addressed in our responses and revisions.
This paper proposes a method for "neural subset selection" based on deep sets. The paper received three reviews with borderline scores. By far the most comprehensive review came from PoVi, who found the work easy to follow, appreciated the discussion of the previous literature, and noted the comprehensiveness of the experiments. The authors took time to provide extensive responses to the reviewers' complaints, but the reviewers did not take sufficient time to acknowledge these responses. In this case, and given the satisfaction with the updated results expressed by the few reviewers who did reply, I tend to give the authors the benefit of the doubt and recommend acceptance.
Why not a higher score
Too many weaknesses.
Why not a lower score
Reviewers all see strengths.
Accept (poster)