PaperHub
5.7 / 10
Poster · 3 reviewers
Min: 5 · Max: 6 · Std: 0.5
Ratings: 6, 5, 6
Confidence: 3.3
ICLR 2024

Enhancing Neural Subset Selection: Integrating Background Information into Set Representations

OpenReview · PDF
Submitted: 2023-09-24 · Updated: 2024-03-06

Abstract

Keywords
Neural Set Function · Hierarchical Structure · Invariance · Subset Selection

Reviews & Discussion

Review
Rating: 6

This paper proposes a neural subset selection method based on deep sets. This model is inspired by a theoretical perspective to include information from supersets to achieve better performance. Experiments on common benchmarks show SOTA performance compared to several recent baselines.

Strengths

  1. The idea of including information from the superset is simple and effective, as shown by the experimental results.
  2. Theoretical discussions are provided.

Weaknesses

  1. Equation 4 describes the neural network construction. However, I am unclear about the objective function to optimize the neural network. Also, after optimization, how do you use this neural network to select a subset?

  2. In equation 4, how do you divide a superset into several subsets? There are an exponential number of combinations.

  3. What is the number of learnable parameters for each baseline method and the proposed method?

Questions

None

Ethics Review Details

None

Comment

We greatly appreciate the time and effort you have invested. In response to your concerns, we have provided detailed clarifications.

Comments 1: What is the optimization objective, and how is the optimal subset selected during inference?

ANSWER: Thank you for your insightful question, as the construction of an optimization objective and inference process are indeed key aspects of neural subset selection [1, 2, 3]. We aimed to balance detailed explanation with readability for a broad audience, which is why we initially provided a high-level overview in the Introduction Section. To specifically address your concerns, we are now including more detailed descriptions of the optimization and inference processes.

Our formulation of the optimization objective is based on the framework established in [1]. Specifically, the optimization objective is to address Equation 1 in our paper by adopting an implicit learning strategy grounded in probabilistic reasoning. This approach can be succinctly formulated as follows:

$$\operatorname{argmax}_\theta\ \mathbb{E}_{\mathbb{P}(V, S)} \left[\log p_\theta (S^{*} \mid V)\right] \quad \mathrm{s.t.}\quad p_\theta (S \mid V) \propto F_\theta (S; V),\ \forall S \in 2^V.$$

The key step in addressing this problem involves constructing an appropriate set mass function $p_\theta(S \mid V)$ that is monotonically increasing with respect to the utility function $F_\theta(S; V)$. To achieve this, we can employ an Energy-Based Model (EBM):

$$p_\theta (S \mid V) = \frac{\exp\left( F_\theta (S; V)\right)}{Z}, \qquad Z := \sum_{S'\subseteq V} \exp\left( F_\theta (S'; V)\right).$$
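
To make the cost of the normalizer $Z$ concrete, the following toy Python snippet (our illustration; the `utility` function is an invented stand-in for $F_\theta$, not the learned model) computes $Z$ by brute-force enumeration over a three-element ground set. The number of terms grows as $2^{|V|}$, which is precisely what the variational approximation below avoids:

```python
import itertools
import math

# Stand-in for the learned utility F_theta(S; V); purely illustrative.
def utility(subset, V):
    return len(subset) - 0.5 * abs(len(V) - 2 * len(subset))

V = ["x1", "x2", "x3"]
# Enumerate all 2^|V| = 8 subsets to compute the exact partition function Z.
scores = {S: utility(S, V)
          for r in range(len(V) + 1)
          for S in itertools.combinations(V, r)}
Z = sum(math.exp(s) for s in scores.values())
p = {S: math.exp(s) / Z for S, s in scores.items()}  # p_theta(S | V)
print(max(p, key=p.get))  # most probable subset under the EBM
```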

In practice, we approximate the EBM by solving for a variational approximation:

$$\phi^* = \operatorname{argmin}_{\phi} D\left(q_\phi(Y \mid S,V) \,\|\, p_\theta (S \mid V)\right).$$

During the training phase, we need an EquiNet, denoted as $Y = \operatorname{EquiNet}(V;\phi): 2^V \rightarrow [0,1]^{|V|}$. This network takes the ground set $V$ as input and outputs probabilities indicating the likelihood of each element $x \in V$ being part of the optimal subset $S^*$. In the inference stage, EquiNet is employed to predict the optimal subset for a given ground set $V$, using a TopN rounding approach. For detailed information on the implementation and derivation of the aforementioned objective, please refer to [1].
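
For intuition, here is a minimal sketch of the TopN rounding step in Python (our illustration; the tensor values and function name are ours, and the actual EquiNet architecture is described in [1] and in our code):

```python
import torch

def topn_rounding(Y: torch.Tensor, k: int) -> torch.Tensor:
    """Y: per-element probabilities of shape (|V|,); returns a binary subset mask."""
    mask = torch.zeros_like(Y)
    mask[torch.topk(Y, k).indices] = 1.0
    return mask

# Hypothetical output of a trained network for a 4-element ground set:
Y = torch.tensor([0.91, 0.15, 0.72, 0.08])
print(topn_rounding(Y, k=2))  # tensor([1., 0., 1., 0.])
```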

[1] Ou Z, Xu T, Su Q, et al. "Learning Set Functions Under the Optimal Subset Oracle via Equivariant Variational Inference." NeurIPS, 2022.

[2] Tschiatschek S, Sahin A, Krause A. "Differentiable submodular maximization." IJCAI, 2018.

[3] Zhang D W, Burghouts G J, Snoek C G M. "Set prediction without imposing structure as conditional density estimation." ICLR, 2021.

Comment

Comments 2: In equation 4, how do you divide a superset into several subsets?

ANSWER: Theorem 3.5 and Eq. 4 are general frameworks for establishing the relationship between $Y$ and $(S,V)$. This approach does not necessitate dividing a superset into multiple subsets; instead, it requires processing only once for a specific pair $(S,V)$. In the context of neural subset selection with the optimal subset (OS) oracle, the variational approximation is addressed via Monte-Carlo (MC) sampling. For a given $V_i$, we only generate $m$ subsets during training, consistently setting this number to 5 across various tasks. Consequently, this eliminates the need for an exponential number of combinations. It is important to highlight that one of the foremost advantages of neural subset selection in the context of the OS oracle is its ability to significantly reduce the computational burden associated with processing an exponential number of pairs $(S,V)$.
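
For concreteness, the MC sampling step can be sketched as follows (an illustrative snippet with names of our choosing, not the actual training code): for each ground set we draw $m$ binary masks from the current per-element Bernoulli probabilities instead of enumerating all $2^{|V|}$ subsets.

```python
import torch

def sample_subsets(Y: torch.Tensor, m: int = 5) -> torch.Tensor:
    """Y: per-element selection probabilities, shape (|V|,).
    Returns m sampled subsets as binary masks, shape (m, |V|)."""
    return torch.bernoulli(Y.expand(m, -1))

Y = torch.tensor([0.9, 0.2, 0.6, 0.5])
masks = sample_subsets(Y, m=5)  # 5 sampled subsets, not 2^4 = 16
```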

Comments 3: What is the number of learnable parameters for baselines and the proposed method?

ANSWER: In regard to the parameters, we have already conducted ablation studies in Table 3 and provided a discussion in Section 4.4 of our paper. To further demonstrate that the improvements achieved by our method are not merely due to additional parameters, we present an additional table here using the CelebA dataset. This table compares EquiVSet (v1) and EquiVSet (v2), variants of EquiVSet where we have incorporated a Conv(32, 3, 2) layer and a Conv(64, 4, 2) layer into the EquiVSet backbone, respectively. Detailed descriptions of these backbones are available in Appendix E.2. Notably, despite having the largest number of parameters, EquiVSet (v2) is outperformed by INSET, indicating that our method's efficacy is not solely parameter-dependent.

| | DeepSet | Set-Transformer | EquiVSet | EquiVSet-v1 | EquiVSet-v2 | INSET |
|---|---|---|---|---|---|---|
| Parameters | 651181 | 1288686 | 1782680 | 2045080 | 3421592 | 2162181 |
| MJC | 0.440±0.006 | 0.527±0.008 | 0.549±0.005 | 0.554±0.007 | 0.560±0.005 | 0.580±0.012 |

Thank you in advance for dedicating your time and attention to our response. We are confident that the clarifications and additional information provided here comprehensively address your concerns. With this in mind, we respectfully and earnestly request that you re-evaluate our work, considering the explanations we have offered.

Comment

Thanks for your response! I have one more question regarding Comments 2. By stating "we only generate m subsets during training", do you mean that during each training iteration, you randomly select m subsets?

Comment

Huge thanks for your super quick reply and for raising the score!

Comment

Thank you for your prompt response! You are correct that during each training iteration, we randomly select $m$ subsets for each ground set $V$. Increasing the value of $m$ leads to an improvement in performance. To ensure a fair comparison, we adhere to EquiVSet's protocol by setting the sample number $m$ to 5 across all tasks and datasets. Even when varying the value of $m$, the results consistently demonstrate that INSET significantly outperforms EquiVSet. In the following table, we report the performance of EquiVSet by selecting the best results achieved after tuning the value of $m$ within the range of 1 to 10; the remaining columns report INSET's performance for each stated value of $m$.

| | EquiVSet (best $m$) | INSET ($m$=1) | INSET ($m$=2) | INSET ($m$=5) | INSET ($m$=7) | INSET ($m$=8) | INSET ($m$=10) |
|---|---|---|---|---|---|---|---|
| Toys | 0.704±0.004 | 0.752±0.006 | 0.753±0.005 | 0.769±0.005 | 0.768±0.003 | 0.767±0.003 | 0.771±0.004 |
| Gear | 0.745±0.013 | 0.788±0.015 | 0.775±0.020 | 0.808±0.012 | 0.813±0.010 | 0.821±0.015 | 0.846±0.011 |

Thank you for your time. If you have any additional questions, we would be delighted to discuss them further.

Comment

I am satisfied with the clarification and increased my score to 6.

Comment

Dear Reviewer miuk,

We would like to extend our heartfelt gratitude for your active engagement and valuable suggestions. Thanks to your insightful feedback, we have made some revisions to our manuscript.

Firstly, we have incorporated the optimization objective and inference process into Appendix D.2, allowing for a more comprehensive understanding of our proposed approach. Additionally, we have included the extra experiments in Appendices F.2 and F.4, providing further supporting evidence for our findings. These revisions have been highlighted in purple for readers' convenience.

We greatly appreciate your continued support and acknowledgement of our response. Moreover, we are truly grateful for the time and consideration you have invested in reviewing our manuscript.

Sincerely,

The Authors

Review
Rating: 5

The paper tackles neural subset selection. In particular, they tackle the issue that current methods do not consider the properties of the superset while constructing subsets. Their theoretical findings demonstrate that when the target value is conditioned on both the input set and subset, it is essential to incorporate an invariant sufficient statistic of the superset into the subset of interest for effective learning.

Strengths

  • The paper is clearly written.
  • The related work covers enough ground for a new researcher to understand a high-level idea of this field.
  • The experiments include multiple baselines.

Weaknesses

  • Lack of ablation studies.
  • The proposed method is not evaluated on a wide distribution of datasets.
  • Will similar findings hold if the dataset contains imbalance? If so, up to what degree of imbalance do the guarantees still hold?

Questions

  • Baselines do not consider information from the superset, but can these baselines be improved by adding the invariant sufficient statistic of the superset?
Comment

Comments 3: Will similar findings hold if the dataset contains imbalance? Up to what degree of imbalance do the guarantees still hold?

ANSWER: INSET is designed to significantly enhance the models' capacity to effectively learn $P(Y|S,V)$ or $F(S,V)$. According to Theorem 3.5, this enhancement holds consistently when the tasks involve modeling the relationship between $(S,V)$ and $Y$. To provide empirical evidence, we conducted additional experiments demonstrating INSET's consistent superiority over the baselines, even in scenarios with imbalanced ground set sizes. Specifically, we train the model on the two-moons dataset (for detailed information, please refer to Appendix F.1) using a fixed ground set size of 100, and evaluate its performance on ground sets of sizes ranging from 200 to 1000.

| Ground set size | 200 | 400 | 600 | 800 | 1000 |
|---|---|---|---|---|---|
| EquiVSet | 0.538±0.002 | 0.513±0.003 | 0.482±0.002 | 0.473±0.005 | 0.471±0.003 |
| INSET | 0.547±0.002 | 0.518±0.005 | 0.502±0.003 | 0.486±0.002 | 0.485±0.002 |

The results clearly show that INSET consistently enhances the performance of EquiVSet, regardless of any imbalances.

Comments 4: Can baselines be improved by adding the invariant sufficient statistic of the superset?

ANSWER: Certainly, Theorem 3.5 offers a comprehensive framework for modeling the relationship between $Y$ and $(S,V)$, which is also applicable to the baselines. However, integrating this invariant sufficient statistic directly into DeepSet and Set-Transformer presents challenges, as they do not explicitly learn a neural subset function $F(S,V)$. Our method, INSET, has employed DeepSet as its backbone. To demonstrate that Set-Transformer can also derive benefits from INSET, we utilize Set-Transformer as our backbone to showcase this.

| | Random | Set-Transformer | Set-Transformer + INSET |
|---|---|---|---|
| Toys | 0.083 | 0.625±0.020 | 0.769±0.010 |
| Gear | 0.077 | 0.647±0.006 | 0.825±0.021 |
| Carseats | 0.066 | 0.220±0.010 | 0.230±0.031 |
| Bath | 0.076 | 0.716±0.005 | 0.862±0.005 |
| Health | 0.076 | 0.690±0.010 | 0.852±0.009 |
| Diaper | 0.084 | 0.789±0.005 | 0.896±0.005 |
| Bedding | 0.079 | 0.760±0.020 | 0.885±0.013 |
| Feeding | 0.093 | 0.753±0.006 | 0.902±0.004 |

By employing Set-Transformer as our backbone, we enable it to explicitly learn the relationship between $Y$ and $(S,V)$. The empirical results clearly demonstrate a significant improvement in performance as a result.

Thank you for your time and thoughtful consideration. If you have any concerns or questions, please don't hesitate to reach out to us.

Comment

We greatly appreciate the time and effort you have invested. In response to your concerns, we have provided detailed clarifications and additional experimental results. For your convenience, these results are presented in tabular format. We will incorporate these results, along with details and plots into the appendix of the revised version.

Comments 1: Lack of ablation studies

ANSWER: Thank you for highlighting the absence of ablation studies in our work. Indeed, our method, INSET, does not introduce any new hyperparameters to the EquiVSet [1] framework. We use the exact same hyperparameters as EquiVSet in all of our experiments, ensuring a fair comparison. Meanwhile, INSET can significantly outperform baseline models across various datasets and tasks, demonstrating its substantial efficacy.

To further verify the robustness of INSET, we have now conducted ablation studies focusing on the Monte-Carlo (MC) sample numbers for each input pair $\{(V_i, S_i^*)\}$. In the context of neural subset selection tasks, our primary aim is to train the model $\theta$ to predict the optimal subset $S^*$ from a given ground set $V$. During training, we sample $m$ subsets from $V$ to optimize the model parameters $\theta$, thereby maximizing the conditional probability $p_\theta (S^* \mid V)$ among all pairs $(S,V)$ for a given $V$. In our main experiments, we adhere to EquiVSet's protocol by setting the sample number $m$ to 5 across all tasks. Even when varying the value of $m$, the results consistently demonstrate that INSET significantly outperforms EquiVSet. Please note that the performance of EquiVSet is reported by selecting the best results achieved after tuning the value of $m$; the remaining columns report INSET's performance for each stated value of $m$.

| | EquiVSet (best $m$) | INSET ($m$=1) | INSET ($m$=2) | INSET ($m$=5) | INSET ($m$=7) | INSET ($m$=8) | INSET ($m$=10) |
|---|---|---|---|---|---|---|---|
| Toys | 0.704±0.004 | 0.752±0.006 | 0.753±0.005 | 0.769±0.005 | 0.768±0.003 | 0.767±0.003 | 0.771±0.004 |
| Gear | 0.745±0.013 | 0.788±0.015 | 0.775±0.020 | 0.808±0.012 | 0.813±0.010 | 0.821±0.015 | 0.846±0.011 |
| Bath | 0.820±0.005 | 0.821±0.010 | 0.851±0.008 | 0.862±0.005 | 0.874±0.006 | 0.861±0.005 | 0.874±0.003 |
| Health | 0.720±0.010 | 0.749±0.015 | 0.763±0.012 | 0.812±0.005 | 0.824±0.008 | 0.808±0.005 | 0.811±0.005 |

Comments 2: The proposed method is not evaluated on a wide distribution of datasets.

ANSWER: It is important to clarify that our experiments encompass three tasks: product recommendation, set anomaly detection, and compound selection, which involve the processing of tabular data, images, and 3D Cartesian coordinates. Specifically, we conduct extensive experiments on these tasks using six datasets. Notably, for the product recommendation task, the datasets consist of 12 categories, effectively representing 12 sub-datasets.

Moreover, we have also conducted synthetic experiments in Appendix F.1 to assess INSET's capability to learn complex set functions. Additionally, to provide further evidence of INSET's effectiveness, we have performed set anomaly detection tasks using the CIFAR-10 dataset. We are also incorporating additional filters for compound selection tasks for a wider distribution of datasets. For more information, please refer to Appendix F.3 and F.5 in the revised submission.

| | Random | PGM | DeepSet | Set-Transformer | EquiVSet | INSET |
|---|---|---|---|---|---|---|
| CIFAR-10 | 0.193 | 0.450±0.020 | 0.316±0.008 | 0.654±0.023 | 0.603±0.012 | 0.742±0.020 |
| PDBBind | 0.073 | 0.350±0.009 | 0.323±0.004 | 0.355±0.010 | 0.357±0.005 | 0.371±0.010 |
| BindingDB | 0.027 | 0.176±0.006 | 0.165±0.005 | 0.183±0.004 | 0.188±0.006 | 0.198±0.005 |

The latest results provide further evidence of INSET's superior performance compared to the baselines. Furthermore, it is worth mentioning that our experimental setup includes a significantly larger number of experiments compared to DeepSet (Sec. 4.3) [2] and PGM (Sec. 5.3) [3].

[1] Ou Z, Xu T, Su Q, et al. "Learning Set Functions Under the Optimal Subset Oracle via Equivariant Variational Inference." NeurIPS, 2022.

[2] Zaheer M, Kottur S, Ravanbakhsh S, et al. "Deep Sets." NeurIPS, 2017.

[3] Tschiatschek S, Sahin A, Krause A. "Differentiable submodular maximization." IJCAI, 2018.

Comment

Dear Reviewer vcJR:

Thank you again for your valuable time and efforts in reviewing our manuscript. Since our previous response was a bit long, we provide a summary below:

  • Ablation studies: We have clarified that our method INSET does not introduce new hyper-parameters. Additionally, we have included an ablation study focusing on the number of Monte-Carlo (MC) samples. Detailed responses to these points are available in our feedback to Comment 1.

  • Datasets: We have elaborated on the usage of our datasets, encompassing 3 tasks across 6 datasets in three different modalities. Besides, we have also presented more experiments in our feedback to Comment 2.

  • Questions: We also provide new experiments to answer your thoughtful questions on the imbalance and baseline.

Our method not only demonstrates an impressive empirical performance, with up to a 23% improvement over the best baselines, but it is also underpinned by rigorous theoretical analysis and a strong foundational concept. We are also grateful for your recognition of our work’s Soundness, Presentation, and Contribution as being satisfactory.

Considering these aspects, we respectfully and kindly invite you to re-evaluate the rating of our submission. We eagerly anticipate any further feedback from you.

Comment

Dear Reviewer vcJR,

Thanks for your time and consideration. We have revised the Appendix of our manuscript to address your concerns. Regarding your comments on ablation studies, we have conducted additional experiments, detailed in Appendix F.4 and in the first answer of our initial response. Concerning the distribution of datasets, we invite you to review Appendix F.3 and F.5, along with the second answer in our initial response. For your insightful questions, please refer to the second part of our initial response. A summary of our previous response can be found in the comment above. We would appreciate knowing if you have any additional feedback or suggestions.

Sincerely,

The Authors

Comment

Dear Reviewer vcJR,

Thank you once again for your time! We understand that you have a busy schedule, and we kindly remind you that the revision deadline is approaching. If you have any suggestions or feedback on how we can improve our manuscript, we would greatly appreciate your input. We eagerly await your response.

Sincerely,

The Authors

Comment

Dear Reviewer vcJR,

We sincerely appreciate the time and effort you have dedicated to reviewing our work. Would you mind checking our response (a shortened summary, and the details)? If you have any further questions or concerns, we would be grateful if you could let us know. Moreover, if you find our response satisfactory, we kindly ask you to consider the possibility of improving your rating. Thank you very much for your valuable contribution.

Best,

The Authors

Comment

Dear Reviewer vcJR,

As the deadline for updating our manuscript is rapidly approaching, we would greatly appreciate your timely feedback on the revisions and clarifications we have provided. We are eager to incorporate any further suggestions you may have. If you find our responses and modifications satisfactory, we kindly request that you consider revising your rating to reflect these changes.

Thank you for your attention to our work, and we look forward to your response.

Best regards,

The Authors.

Review
Rating: 6

The authors propose an optimal subset selection method based on neural networks, which is designed to learn a permutation-invariant representation of both the subset of interest $S$ and the ground superset $V$. The authors highlight that prior works for neural subset selection (e.g., DeepSet) do not account for the superset $V$, and both theoretically and empirically demonstrate that jointly modeling the interactions between $S$ and $V$ leads to improved performance.

Strengths

  • The writing is generally easy to follow, and the paper includes a sufficiently comprehensive discussion of relevant prior works. Experimental results are presented well.
  • The proposed method achieves strong empirical performance in terms of mean Jaccard coefficient (often with a fairly large gap) when compared against several optimal subset selection baselines (e.g., DeepSet, EquiVSet).

Weaknesses

  • The presentation of some of the mathematical details needs improvement. In particular, it seems that some of the notations are overloaded (i.e., the same notation is used with different interpretations) or not clearly defined. For example, the notation $S$ appears as a subset of the ground set $V$ in the Introduction, but in Section 3.1 (Background), the notation $S$ appears as an element of $V$ that takes a matrix form. The relationship between elements $x_i \in \mathcal{X}$ and $S_i$ is not clearly defined either. On another note, it is not entirely clear to me what the function value $Y \in \mathcal{Y}$ is really referring to, which also appears without an explicit discussion of its meaning in the Introduction as part of the variational distribution $q(Y|S,V)$. Is $Y \in \mathcal{Y}$ supposed to be the utility function value (which was also introduced with the notation $U = F_{\theta}(S,V)$ in the Introduction)? The confusion arising from notational ambiguity makes the paper less readable.

Questions

  • Can the authors clearly define what $Y$ is? The footnote mentions that $Y_i$ is the "probability of element $i$ being selected", but this description is ambiguous.
  • It looks like learning the neural network approximation in Eq. (4) is done via variational inference as in Ou et al. (2022). As I am not familiar with the cited work, it is unclear to me how $q(Y|S,V)$ serves as an approximation for the subset likelihood $p(S|V)$ when the former is a distribution over $Y$ and the latter is a distribution over $S$. Can the authors provide clarifications on this?
  • How is the neural network construction in Eq. (4) explicitly related to $p_{\theta}(S,V)$ (or $F_{\theta}(S,V)$)?
Comment

We greatly appreciate the time and effort you have invested! In response to your concerns, we have provided clarifications here. We will also incorporate these clarifications into our revised version to enhance clarity.

Comments 1: Relationship between $x_i$, $S_i$, and $V$.

ANSWER: We regard $V$ as a set composed of $n$ elements, denoted as $x_i$, i.e., $V = \{x_1, x_2, \dots, x_n\}$. In order to facilitate the proposition of Property 1, we describe $V$ as a collection of several disjoint subsets, specifically $V = \{S_1, \dots, S_m\}$, where $S_i \in \mathbb{R}^{n_i \times d}$. Here, $n_i$ represents the size of subset $S_i \subset V$, that is, $S_i = \{x_{1_i}, x_{2_i}, \dots, x_{n_i}\}$.

Comments 2: The definition of $Y$.

ANSWER: The generality of our Theorem 3.1 allows it to be applied to both $U = F(S,V)$ and $P(Y|S,V)$ for different tasks. Specifically, when considering the task of neural subset selection with Optimal Subset (OS) oracles, which involves learning $P(Y|S,V)$, we define $Y$ as $|V|$ independent Bernoulli distributions, parameterized by $Y \in [0,1]^{|V|}$, representing the probabilities of selecting each element $x_i \in V$ in an output subset $S$.

Comments 3: Why can $q(Y|S,V)$ serve as a variational approximation to $P(S|V)$?

ANSWER: As discussed in the previous answer, $Y \in [0,1]^{|V|}$. In practice, $S$ is represented as a binary vector (mask), $S \in \{0,1\}^{|V|}$, where the $i$-th element is equal to 1 if $i \in S$ and 0 otherwise. Therefore, it is natural to use $q(Y|S,V)$ to represent the variational distribution of $P(S|V)$.

Comments 4: How is the neural network construction in Eq. (4) explicitly related to $p_\theta(S,V)$ or $F_\theta(S,V)$?

ANSWER: Once neural networks are trained, their outputs become fixed for a given input, such as $(S,V)$. Thus, Eq. (4) represents the explicit structure used to construct models for learning the deterministic function $\theta(S,V)$ (to differentiate it from the utility function $U = F(S,V)$). Using this function, we can construct the conditional distribution $q(Y|S,V)$ according to Theorem 3.5. Specifically, we employ the Mean-Field Variational Inference (MFVI) method introduced by [1] (Section 3.2) to approximate the distribution $q(Y|S,V)$, referred to as $\psi$ in [1].

To prevent overwhelming readers with an abundance of notations and equations, we have deliberately omitted the detailed construction of $q(Y|S,V)$ and the derivation of the variational approximation in our paper. This decision was motivated by two factors. Firstly, our theorem and Eq. 4 offer a general framework for modeling the relationship between $Y$ and $(S,V)$, rather than focusing solely on neural subset selection tasks. Secondly, in order to ensure the clarity of our motivation, we have provided a high-level description of these concepts in the Introduction section. For readers interested in the details of these concepts, we strongly recommend referring to [1] (Section 3) for a more comprehensive understanding. For the implementation details of $q(Y|S,V)$ and $\theta(S,V)$, we suggest consulting our accompanying code located at ./model/modules.py.

Thanks for your time and suggestions again. We would appreciate knowing if you have any additional feedback or suggestions.

[1] Ou Z, Xu T, Su Q, et al. "Learning Set Functions Under the Optimal Subset Oracle via Equivariant Variational Inference." NeurIPS, 2022.

Comment

Dear Reviewer PoVi,

We sincerely appreciate your reviews and valuable suggestions. Taking into account your feedback, we have made refinements to the footnote in the Introduction Section and enhanced the description of VV and SS in Section 3.1. These revisions, highlighted in purple, will significantly enhance the clarity of our paper. Thank you once again for your time and contribution.

Best regards,

The Authors.

Comment

Thanks for a quick response and for letting me know of the revisions. Here are additional clarification questions and comments:

  • Regarding Comment 1: Wouldn't it be more natural to write $V = S_1 \cup \cdots \cup S_m$ with $S_i \cap S_j = \emptyset$ when $i \neq j$, which I believe is indeed the form used in Section 3.2?
  • Regarding Comments 2 and 3: I think the description that $Y$ is a distribution is misleading. Clearly, $Y$ itself is not a distribution since it need not be the case that $\sum_{i} Y_i = 1$, and by the definitions given here, $P(S|V)$ and $q(Y|S,V)$ are distributions over different objects. Rather, shouldn't it be the case that the probabilities of each element being selected in the optimal subset are the outputs of the variational distribution $q$? In this case, it seems more appropriate to describe it as $q(S|V)$? Please let me know if I am misunderstanding something here. Meanwhile, since $Y$ is used throughout the main text, I think it should be very clearly defined as part of the main text before it is used, instead of appearing in a footnote (if appropriate, discussed along with concrete examples that readers can immediately understand).
Comment

Thank you for your valuable suggestions! We are now revising our manuscript based on your suggestions. To address your questions, we would like to provide the following clarifications:

Firstly, we want to clarify that $Y \in [0, 1]^{|V|}$ parameterizes a set of $|V|$ independent Bernoulli distributions rather than a categorical distribution with $|V|$ classes. Hence, it is not required that the elements of $Y$ satisfy the constraint $\sum_i Y_i = 1$. Additionally, we define the expression of $q(Y|S,V)$ as follows:

$$q(Y \mid S,V) = \prod_{i \in S} Y_i \prod_{i \notin S} (1-Y_i), \qquad Y \in [0,1]^{|V|}.$$

Next, we would like to explain why $q(Y|S,V)$ can approximate $P(S|V)$. Consider $Y \in [0, 1]^{|V|}$ and $S \in \{0, 1\}^{|V|}$. In this case, $Y$ can be viewed as a stochastic version of $S$, since $Y$ can also take values in $\{0, 1\}^{|V|}$ while still satisfying the constraint $Y \in [0, 1]^{|V|}$.

To facilitate comprehension, let us consider an illustrative scenario. Suppose we have a ground set $V = \{x_1, x_2, x_3\}$, and the optimal subset $S^*$ is $\{x_1, x_2\}$, which can be represented as $[1, 1, 0]$. Specifically, we define $P(S^*|V) = 1$, indicating that $S^*$ is the correct subset, while for any $S \neq S^*$, we have $P(S|V) = 0$.

Now, let's examine the case when $Y = [1, 1, 0]$. In this situation, we can calculate that $q(Y|S^*,V) = 1$. This implies that $q(Y|S^*,V)$ accurately represents the probability of observing $S^*$ given $V$, and it correctly assigns a high probability to the optimal subset.
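
This example can be verified in a few lines of Python (our illustration, directly evaluating the product formula defined above):

```python
def q(Y, S_mask):
    """Bernoulli product q(Y|S,V) = prod_{i in S} Y_i * prod_{i not in S} (1 - Y_i)."""
    prob = 1.0
    for y, s in zip(Y, S_mask):
        prob *= y if s == 1 else 1.0 - y
    return prob

S_star = [1, 1, 0]                  # optimal subset {x1, x2} as a mask
print(q([1.0, 1.0, 0.0], S_star))   # 1.0: all mass on S*
print(q([0.9, 0.8, 0.1], S_star))   # 0.9 * 0.8 * 0.9 = 0.648
```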

We hope these clarifications help provide a better understanding of our framework. Once again, we appreciate your constructive suggestions and look forward to further discussion.

Comment

Dear Reviewer PoVi,

Thank you for your constructive suggestions and insightful comments! As the discussion deadline is approaching in 10 hours, we would like to inquire if you have any further suggestions for improving our manuscript. We would greatly value your input and appreciate your guidance.

Thank you for your time and consideration. We eagerly await your reply.

Sincerely, The Authors

Comment

Dear Reviewer PoVi,

We sincerely appreciate your valuable suggestions for improving our paper. We have refined the descriptions of $V$ and $S$ in Section 3.2 based on your suggestions. Moreover, we have defined $Y$ in the main text instead of the footnote, where we have also included a reference to Appendix D.2. In this appendix, readers can find further elaboration on the relationship between $q(Y|S,V)$ and $P(S|V)$, along with the example mentioned in our previous response.

We sincerely thank you once again for your time and valuable contribution. Should you have any additional suggestions or questions, please do not hesitate to let us know.

Best regards, The Authors

Comment

We express our sincere gratitude to the Area Chairs and Reviewers for their dedicated time and valuable feedback. Below is a concise summary of the review and our responses for ease of reference.

Reviewer Acknowledgments:

Our method, INSET, has been recognized for its strong empirical performance and rigorous theoretical support. Key highlights include:

  • Clear Motivation with Theoretical Support: Our model is inspired by a theoretical result [miuk], and we have theoretically demonstrated the significance of including information from supersets to achieve better performance [PoVi, vcJR].
  • Strong Empirical Performance: INSET achieves state-of-the-art performance [miuk], supported by empirical evidence [PoVi].
  • Quality of Writing: The manuscript is praised for its clarity and simplicity, making it accessible for newcomers to the field [vcJR, miuk]. The presentation effectively conveys high-level concepts [PoVi].

Addressing Weaknesses:

The reviewers raised concerns regarding ablation studies [vcJR, miuk], dataset distribution [vcJR], and mathematical notations in the optimal subset oracle [PoVi, miuk]. Our responses include:

  • Ablation Studies: We clarified in this dialog that INSET introduces no new hyperparameters. Additional experiments demonstrate that even with variations in existing hyperparameters, INSET consistently outperforms baselines by a large margin. This response was acknowledged positively by Reviewer miuk.
  • Mathematical Notations: Following suggestions from Reviewers PoVi and miuk, we corrected minor typos and enriched the appendix with additional mathematical background. These revisions received positive feedback from Reviewer miuk.
  • Dataset Distribution: We clarified in the dialog that our approach encompasses three tasks across six datasets in different modalities in the main body of our work. The appendix also includes five additional datasets. We have utilized a much wider variety of datasets than those adopted by our baseline comparisons.

Revision Overview:

  • To enhance clarity, we have made refinements to the description of YY in the Introduction, as well as the definitions of VV and SS in Sections 3.1 and 3.2.
  • The mathematical description of the optimization objective and inference process has been detailed in Appendix D.2.
  • Appendix F.2 has been updated with additional empirical studies on computation costs. Further, ablation studies have been included in Appendix F.4, and results from a broader range of datasets are now presented in Appendix F.5.

We are thankful for the constructive feedback received, and we believe that the concerns raised by the reviewers have been thoroughly addressed in our responses and revisions.

AC Meta-Review

This paper proposes a method for "neural subset selection" based on deep sets. The paper received three reviews with borderline scores. By far the most comprehensive review came from PoVi, who found the work to be easy to follow, appreciated the discussion of the previous literature and noted the comprehensiveness of the experiments. The authors took time to provide extensive responses to the reviewer complaints but the reviewers did not take sufficient time to acknowledge these responses. In this case, and given the satisfaction expressed by the few reviewers who did reply with the updated results, I tend to give the authors the benefit of the doubt and recommend acceptance.

Why not a higher score

Too many weaknesses.

Why not a lower score

Reviewers all see strengths.

Final Decision

Accept (poster)