PaperHub
Overall rating: 6.3 / 10 (Poster; 4 reviewers)
Individual ratings: 7, 6, 6, 6 (min 6, max 7, std. dev. 0.4)
Confidence: 2.8 | Correctness: 3.3 | Contribution: 2.5 | Presentation: 3.0
NeurIPS 2024

Transductive Learning is Compact

OpenReview | PDF
Submitted: 2024-05-14 | Updated: 2025-01-15
TL;DR

We demonstrate that the transductive sample complexity of learning a hypothesis class is exactly equal to the sample complexity of learning its most difficult "finite projection," for a wide range of losses.

Abstract

We demonstrate a compactness result holding broadly across supervised learning with a general class of loss functions: Any hypothesis class $\mathcal{H}$ is learnable with transductive sample complexity $m$ precisely when all of its finite projections are learnable with sample complexity $m$. We prove that this exact form of compactness holds for realizable and agnostic learning with respect to all proper metric loss functions (e.g., any norm on $\mathbb{R}^d$) and any continuous loss on a compact space (e.g., cross-entropy, squared loss). For realizable learning with improper metric losses, we show that exact compactness of sample complexity can fail, and provide matching upper and lower bounds of a factor of 2 on the extent to which such sample complexities can differ. We conjecture that larger gaps are possible for the agnostic case. Furthermore, invoking the equivalence between sample complexities in the PAC and transductive models (up to lower order factors, in the realizable case) permits us to directly port our results to the PAC model, revealing an almost-exact form of compactness holding broadly in PAC learning.
Keywords
Sample Complexity, Compactness, One-Inclusion Graphs, Metric Space, Transductive Learning, PAC Learning

Reviews and Discussion

Review
Rating: 7

The paper looks at the question of whether the transductive sample complexity is compact, in the sense that the transductive sample complexity globally can be deduced from understanding the transductive sample complexity of finite instances of transductive learning. More formally, transductive learning is the setting where the learner is revealed a labelled dataset $S_i$ with one point $x_i$ missing and is then queried to predict the label of $x_i$, where all the labels $y_1,\ldots,y_n$ are determined by some $h\in\mathcal{H}$, i.e. $y_{j}=h(x_j)$. The goal is now to create a learner $A$ which takes as input $S_i$ and $x_i$ and, for some loss $L: \mathcal{Y}\times \mathcal{Y}\rightarrow \mathbb{R}_{\geq 0}$, minimizes $L(A,S,h)=\sum_{i=1}^{n} L(A(S_i,x_i),h(x_i))$ for any $h\in \mathcal{H}$ and $S\subset \cup_{i=1}^{\infty}\mathcal{X}^{i}$. This can also be seen as a game where an adversary picks $h\in\mathcal{H}$ and $S\subset \cup_{i=1}^{\infty} \mathcal{X}^{i}$, and one then looks at the expected value of the loss of a learner which is given $S_i$ and prompted for the label of $x_i$ for a uniformly random $i\in\{1,\ldots,|S|\}$. The sample complexity of transductive learning is now, for $\varepsilon$, defined as

$$m(\varepsilon)=\min\left\{m\in\mathbf{N}: \inf_{A}\, \sup_{h\in\mathcal{H},\, S\in\mathcal{X}^{m}} L(A,S,h)\leq \varepsilon\right\}$$

The paper shows that the transductive sample complexity is compact in the following sense: for suitable loss functions (metric losses, continuous losses on compact spaces, the zero-one loss), a hypothesis class $\mathcal{H}$ has transductive sample complexity $m(\cdot)$ exactly when, for every finite $X\subset \mathcal{X}$, every finite $\mathcal{H}'\subset \mathcal{H}|_{X}$ has transductive sample complexity $m(\cdot)$. They also show this for the agnostic setting of transductive learning and for distribution families, and furthermore show how PAC learning can be studied from the point of view of finite projections of $\mathcal{H}$. The authors also give an example where the transductive sample complexity is not compact, but nearly so.
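
In symbols (my own rough rendering of the claim, using the notation above), the compactness statement reads:

$$\inf_{A}\,\sup_{h\in\mathcal{H},\,S\in\mathcal{X}^{m}} L(A,S,h)\leq \varepsilon \quad\Longleftrightarrow\quad \inf_{A}\,\sup_{h\in\mathcal{H}',\,S\in X^{m}} L(A,S,h)\leq \varepsilon \ \ \text{for all finite } X\subset\mathcal{X} \text{ and finite } \mathcal{H}'\subset\mathcal{H}|_{X}.$$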

Strengths

Originality:

The paper states that it is the first to establish an exact compactness result for transductive sample complexity in the realizable and agnostic settings.

Quality and Clarity:

The paper is well written and things are nicely defined and explained.

Significance:

The idea of connecting the transductive sample complexity of a class to that of its finite projections seems novel and interesting.

Weaknesses

.

Questions

Line 48: Why is $\mathcal{H}_{S}$ necessarily finite? What if $\mathcal{Y}$ is infinite?

Line 174-175: In the definition of a completable partial assignment: is it that there exists an assignment of the unassigned variables such that for all $r\in R'$ this assignment makes $r$ less than $\varepsilon$, or is it that for all $r\in R'$ there exists an assignment of the unassigned variables such that $r$ is less than $\varepsilon$?

Line 195: Why isn't it $|R'|<\infty$ instead of $|R'|\leq \infty$, so excluding $=$?

Line 207: Do you have to be in $\mathcal{P}$ to be an upper bound?

Line 212-213: Why isn't it: "That is, there exists $\phi_{j} \in C$ which agrees with $\phi_{C}$ in its action on $l_{1}, \ldots, l_{i}$. As $\phi_{j}\in P$, it must be that $\phi_{j}$ is completable with respect to $R'$, i.e. there exist assignments to $l_{i+1},\ldots,l_{m}$ such that "$\leq \varepsilon$"; but since $l_{i+1},\ldots,l_{m}$ are free variables of $\phi_{C}$, and $\phi_{C}$ upper bounds $\phi_{j}$ and so in particular agrees with $\phi_{j}$ on $l_{1},\ldots,l_{i}$, the variables $l_{i+1},\ldots,l_{m}$ can also be assigned such that $\phi_{C}$ is completable with respect to $R'$, with the same assignment of the variables $l_{i+1},\ldots,l_{m}$ as for $\phi_{j}$." Sorry if this is saying the same thing.

Line 216: Total in what sense?

Theorem 3.6: Can you please provide the sets of variables $L$ and $R$ in the proof of Theorem 3.6 (formally)?

Line 244: Why is the $\inf$ always attained?

Line 246: Why does $r$ reflect bounded sets? Because it is a function of a norm on $\mathcal{Y}$, so it can't be too far ($O(\varepsilon)$) from the fixed argument, i.e. it lies in a bounded ball around it?

Line 331: I struggle with the disjointness condition in an $R$-matching. Could you try to explain why each node in $R$ only has one incident $l$ when it has degree larger than $1$?

Line 352-354: Out of curiosity, why is it that steps 2 and 3 are not combined?

(Not a very concrete question, asked out of curiosity, so if time is limited it need not be answered.) The finding is also very broad and theoretical (a strength). I am curious whether the idea of studying the transductive sample complexity "locally" instead of "globally" has been used before in the analysis of transductive sample complexity; the authors mention the $\gamma$-OIG in the related work section. Is the analysis of the $\gamma$-OIG, for instance, done in this way?

Limitations

Yes.

Author Response

Thank you for your time and attention in reviewing the paper. We will address your questions in order.

Line 48: $H|_S$ can indeed be infinite when $Y$ is infinite. For this reason, we refer only to finite subsets of $H|_S$ as finite projections of $H$. That is, a finite projection of $H$ refers to both restricting to a finite collection of points in the domain and restricting to finitely many behaviors on those points (lines 45-48). We will add further clarification to the beginning of Section 1.1.

Line 174-175: It is defined as the first option you mentioned: there exists an assignment of unassigned variables satisfying all functions in $R'$. I.e., the assignment is permitted to depend upon the choice of $R'$, but it may not depend upon particular functions in $R'$. We will clarify this further in the next draft.
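
In symbols (informal shorthand that does not appear verbatim in the paper): a partial assignment $\phi$ is completable with respect to $R'$ when

$$\exists\, \tau \ \text{(an assignment of the variables left unassigned by } \phi\text{)} \ \text{ such that } \ \forall\, r\in R':\ r(\phi\cup\tau)\leq \varepsilon,$$

rather than the weaker statement in which the quantifiers $\exists\,\tau$ and $\forall\, r$ are exchanged.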

Line 195: That is a typo, the inequality should be strict – thank you!

Line 207: Yes, an element must be in the poset $P$ in order to be an upper bound, as it must be in $P$ in order to be comparable to any elements in $P$.

Line 212-213: Yes, you are exactly correct! Unfortunately we must be slightly terse at times owing to the page limit. We will further clarify this step in the next version of the paper.

Line 216: Total in the sense of assigning all variables, i.e., not leaving any variables unassigned. We will clarify this in the paper.

Theorem 3.6: Formally, $L$ is a set of variables which are valued in $Y$ and which are indexed/parameterized in the following manner: $L = \cup_{S' \subseteq S,\, |S'| = n-1} H|_{S'}$. So each variable $\ell \in L$ is represented by a tuple of the form $(y_1, \ldots, y_{i-1}, ?, y_{i+1}, \ldots, y_n)$.

$R$ is a collection of functions, indexed/parameterized by $R = H|_S$. The function $r \in R$ which is represented by $(y_1,\ldots,y_n)$ depends upon the $n$ variables $\ell \in L$ whose representing vectors agree with that of $r$ except for a "?" in one of its entries. (Formally, the representing vectors of $r$ and $\ell$ have the same values in the non-"?" entries of $\ell$.) Lastly, we define the output of $r$ on these variables $\ell_1, \ldots, \ell_n$ as $\frac{1}{n} \sum d(y_i, \ell_i)$.
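
As a purely illustrative sketch (not code from the paper), the following toy instantiation of this variable/function system, for $n = 2$, the 0-1 loss, and a three-hypothesis finite projection, may help make the construction concrete:

```python
from itertools import product

# Toy instantiation: n = 2, Y = {0, 1}, the 0-1 loss, H|_S = {(0,0), (0,1), (1,1)}.
H_S = [(0, 0), (0, 1), (1, 1)]           # finite projection of H onto S = (x_1, x_2)
n = 2
d = lambda a, b: 0.0 if a == b else 1.0  # 0-1 loss, viewed as a metric on Y

# Variables L: one per pattern realized by H|_S with a single hidden coordinate "?".
# An assignment gives each such variable a label in Y (the learner's prediction for
# the hidden point given the visible labels).
L = list({tuple('?' if j == i else h[j] for j in range(n))
          for h in H_S for i in range(n)})

# Functions R: one per hypothesis h in H|_S.  r_h depends on the n variables whose
# visible entries agree with h, and outputs the average loss when the adversary
# plays h and hides a uniformly random coordinate.
def r(h, assignment):
    return sum(d(h[i], assignment[tuple('?' if j == i else h[j] for j in range(n))])
               for i in range(n)) / n

# Brute force: smallest worst-case value achievable by any total assignment.
# For this toy class the optimum is 0.5.
best = min(max(r(h, dict(zip(L, labels))) for h in H_S)
           for labels in product([0, 1], repeat=len(L)))
print(best)
```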

Line 244: By assuming condition (2.), we have that all finite projections of $H$ can be learned to error $\leq \epsilon$ using $n = m(\epsilon)$ many samples. This is equivalent to condition (2.) of Theorem 3.3 for the system of variables and functions we describe, allowing us to conclude condition (1.). In that sense, we do not need to worry about infimums.

Line 246: Exactly: the pre-image of bounded sets must be bounded because the function uses the norm on $Y$. (E.g., the pre-image of $[0,\epsilon]$ contains only points within distance $\epsilon$ of $r=(y_1,\ldots,y_n)$ in each coordinate.)

Line 331: Each node in $R$ is permitted to have multiple incident nodes in $L$. An $R$-matching, however, consists of only one choice of incident edge for each node in $R$. The requirement is that these chosen edges be disjoint, i.e., that they each connect to distinct nodes in $L$. The essence of Theorem 3.13 is that the existence of such $R$-matchings can be detected locally. That is, $R$-matchings exist precisely when $R'$-matchings exist for all finite $R' \subseteq R$.
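
For a finite bipartite graph, the existence of an $R$-matching in this sense can be checked with a standard augmenting-path routine; the following small self-contained sketch (ours, not code from the paper) is included only to make the disjointness requirement concrete:

```python
# An R-matching: each r in R chooses one incident l in L, with all chosen l's
# distinct.  Theorem 3.13's content is that in the infinite case, existence of an
# R-matching reduces to existence of R'-matchings for every finite R' subset of R.
def has_R_matching(adj):
    """adj maps each r in R to the list of nodes l in L incident to it."""
    match_of_l = {}  # l -> the r currently claiming it

    def try_assign(r, seen):
        for l in adj[r]:
            if l in seen:
                continue
            seen.add(l)
            # Take l if it is free, or if its current owner can be re-routed.
            if l not in match_of_l or try_assign(match_of_l[l], seen):
                match_of_l[l] = r
                return True
        return False

    return all(try_assign(r, set()) for r in adj)

print(has_R_matching({"r1": ["l1", "l2"], "r2": ["l1"]}))  # True
print(has_R_matching({"r1": ["l1"], "r2": ["l1"]}))        # False (l1 is shared)
```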

Line 352-354: At the level of information, it would indeed be equivalent to combine steps 2 and 3 (or even to remove step 2 entirely). We mention step 2 primarily to emphasize that the learner has access to all unlabeled datapoints in $S$, call them $S_X$. This allows it to, for instance, construct the OIG of $H$ on $S_X$ after step 2 and orient it optimally, and then use the information of step 3 to get a prediction for the unlabeled test point. This is often how transductive learning is described in the literature, but in the absence of step 2 the learner could equivalently just wait until after step 4 to see all unlabeled data, construct the OIG, orient it, and perform inference all in one go. It is primarily a matter of preference/exposition.

Local transductive sample complexity: Excellent question. One-inclusion graphs and the transductive model are in some sense intrinsically local, as they only study the performance of learners on finite regions of the domain. (As opposed to the PAC model, which considers performance with respect to distributions with possibly infinite support.) Our results can be viewed as stating that the transductive model can be made even more local by restricting focus to finite regions of the domain and only finitely many behaviors on the domain. (I.e., to finite subgraphs of one-inclusion graphs.) To our knowledge, this is as local a perspective as one could hope to achieve.

The $\gamma$-OIG is defined and analyzed using largely the same perspective as for classical OIGs. There, the key difference is that the loss function is not the 0-1 loss but rather the absolute loss on the interval $Y = [0, 1]$. The key idea behind the $\gamma$-OIG is to turn this continuous loss into a discrete one by thresholding: losses below $\gamma$ are rounded to 0 and losses above $\gamma$ are rounded to 1. ($\gamma$ is thought of as a scale parameter ranging from 1 to 0.) This permits the authors to simplify the problem and to use ideas and techniques from discrete math (e.g., graph theory).
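
For concreteness, a minimal sketch of the thresholding step (our own illustration; the handling of losses exactly equal to $\gamma$ is an assumption on our part, not taken from the paper):

```python
# Discretize the absolute loss on Y = [0, 1] at scale gamma: below gamma -> 0,
# otherwise -> 1.  As gamma decreases from 1 to 0, this becomes a finer proxy
# for the original continuous loss.
def gamma_thresholded_loss(y_pred, y_true, gamma):
    return 0 if abs(y_pred - y_true) < gamma else 1

print(gamma_thresholded_loss(0.42, 0.50, gamma=0.1))  # 0: error within scale gamma
print(gamma_thresholded_loss(0.10, 0.50, gamma=0.1))  # 1: error exceeds gamma
```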

Thank you again for your detailed questions and comments! They will all be taken into account to improve the paper.

Review
Rating: 6

The paper studies transductive learning with general real-valued loss functions. The paper shows that: (1) for proper metric loss functions and continuous loss functions defined on compact spaces, the sample complexity of (realizable and agnostic) transductive learning of a class H is exactly the same as the sample complexity of learning "finite projections" of H; (2) for improper metric loss functions, there is a tight gap of 2; (3) the above extends to the PAC learning model.

Strengths

I think that these are nice generic results that contribute to the literature on transductive learning. Effectively, the results seem to suggest that for classes of loss functions considered in this paper, it suffices to focus attention on constructing transductive learners for finite subsets of the hypothesis class H.

The result where there is a gap of 2 (Theorems 3.8 and 3.9) is particularly interesting.

Weaknesses

Below, I include some comments that may help improve the paper when addressed.

Given that the main results of the paper deal with a class H and "finite projections" of H, I think that it is important to formally define and explicitly write down what "H|_S" is. Furthermore, given that the label space Y may be infinite, H|_S may be infinite as well, in the sense that we have infinitely many projections of H onto S. I think that it is important to clarify this point, and I find it a bit misleading to refer to H|_S as a "finite projection".

The definition of transductive learning (Def 2.1, 2.2, 2.3) when unpacked boils down to defining transductive learning with respect to any n, S \subseteq X^n, and any H|_S. So, with this in mind, when looking at Theorem 3.6 and 3.8, it seems like they can be reduced to the following:
For any n, and any S \subseteq X^n,

  1. H|_S is learnable with error rate epsilon.
  2. For all (finite) H' \subset H|_S, H' is learnable with error rate epsilon.

What I am trying to highlight is that the results seem to be establishing equivalence between transductive learning the (potentially) infinite H|_S and transductively learning the finite subsets of H|_S. I think the authors should carefully clarify what "H is learnable in the realizable case with transductive sample function m" means. This should be included with the current Def 2.1-2.3.

Additionally, it seems that talking about error rate in transductive learning instead of sample complexity makes more sense, because as the paper defines the model (Def'n 2.1 and line 25), the adversary chooses n. So, one can just focus on smallest achievable error rate as a function of n.


Based on the authors' rebuttal, I raise my score to 6.

Questions

See comments above.

Limitations

Yes.

Author Response

We thank the reviewer for their time and attention in reviewing the paper.

We note that $H|_S$ is the collection of all functions in $H$ restricted to the datapoints $S$. Indeed $H|_S$ can easily be infinite, in which case it is not a finite projection of $H$. We instead refer to the finite subsets of $H|_S$ as the finite projections of $H$, as we describe at the beginning of Section 1.1 (lines 45-48). We will clarify this point further in the next version of the paper; thank you.

The reviewer’s restatements of Theorems 3.6 and 3.8 are indeed correct. We will clarify the meaning of “$H$ is learnable in the realizable case with transductive sample function $m$” alongside Definitions 2.1-2.3 to avoid confusion. (In particular, it means that $H$ has a realizable transductive learner with sample complexity at most $m$.)

We agree that discussing error rates and sample complexities is equivalent, across both the transductive and PAC models. We chose to phrase our results in terms of sample complexities rather than error rates, in part arbitrarily and in part as it is roughly a convention in the community. We indeed could have equivalently phrased all results in terms of error rates throughout the paper.

Thank you again for your comments on improving the exposition of the paper – they are very helpful and will be taken into account as we update the paper!

Comment

Thank you for the response, I have updated my score to 6.

Review
Rating: 6

This work studies the transductive learning model. In this model, given a domain $X$, a label space $Y$, and $H\subset Y^X$, the adversary chooses data $(x_i,y_i)$ (in the realizable setting the adversary can choose the labels after the data are revealed to the learner). The adversary then uniformly at random hides one data point. The goal of the learner is to use the remaining samples to predict the hidden label. The authors first show a compactness result: if all finite subsets $X'$ are learnable in the projected class $H|_{X'}$ with sample complexity $m$, then the same is true for the whole class. They provide a similar result for the agnostic case. Finally, the authors present a result connecting transductive learning with PAC learning.

Strengths

This work provides some fundamental results for transductive learning. It is well written.

Weaknesses

  1. The authors did not convince me about the importance of this model. The authors should explain why this model is important and what it explains that other models (i.e., the classical PAC learning) do not.
  2. I do not believe that this submission is relevant to the NeurIPS community; maybe the authors should consider submitting to COLT.
  3. Furthermore, I believe that more results are needed to make this a complete submission.

Questions

See 1 in weaknesses.

Limitations

Yes.

Author Response

We thank the reviewer for their time and attention in reviewing the paper.

The authors did not convince me about the importance of this model. The authors should explain why this model is important and what it explains that other models (i.e., the classical PAC learning) do not.

We argue that the transductive model is itself influential and insightful. Moreover, it is intimately related to the PAC model such that even those readers with an exclusive attachment to the PAC model would derive insight from our work. We tried to communicate both of these points in the paper, though acknowledge that there is room for improvement, as the review notes. We elaborate on both of these points below in order to address the reviewer’s concern.

  1. The transductive approach to learning has a rich and deep history, dating to the seminal works of Vapnik and Chervonenkis ‘74 and Vapnik ‘82. Our particular model was first used by Haussler et al. ‘94 and has since been employed to derive numerous insights for PAC learning as well as other models of learning. We note that it offers several advantages relative to the PAC model, including its emphasis on labeling a single datapoint at test time, rather than emitting an entire hypothesis (i.e., the inductive vs. transductive approaches to learning). This perspective has seen renewed interest in recent years with the rise of such techniques as few-shot learning, which focuses on a model’s performance at a single test point (e.g., “Realistic Evaluation of Transductive Few-Shot Learning”, NeurIPS 2021). In fact, by focusing solely on prediction, the transductive model emphasizes improper learning by default, a notable advantage given the necessity of improper learning in settings such as multiclass classification (Daniely and Shalev-Shwartz 2014). Furthermore, the transductive model is closely related to the one-inclusion graph, a combinatorial technique that has given rise to fundamental insights across learning theory, including the first dimension to characterize multiclass learnability (the DS dimension; Brukhim et al. 2022), the first minimax optimal robust learner (via the global one-inclusion graph; Montasser et al. 2022), and the first dimension to characterize learnability for realizable regression (the \gamma-OIG dimension; Attias et al., 2023) — to name only a few examples. Lastly, the transductive model is arguably conceptually simpler than the PAC model, as it merely requires learners to attain error less than epsilon on each sample, and avoids any involvement of marginal distributions, integrability issues, confidence parameter delta, etc.

  2. The transductive model bears intimate connections to the PAC model. We note that any results concerning transductive sample complexities automatically transfer to the PAC model in a black-box manner with only minor losses (lines 60-65, Lemma 3.19). Consequently, our work demonstrates an almost-exact form of compactness for sample complexities holding broadly in the PAC model (Corollary 3.20). We believe that this makes our work of central interest even to those who have an exclusive attachment to the PAC model of learning.

I do not believe that this submission is relevant to the NeurIPS community; maybe the authors should consider submitting to COLT.

We respectfully disagree with the reviewer on this point. We note that NeurIPS has published several works analyzing the transductive model and the one-inclusion graph in recent years, many of which our paper cites and builds directly upon. These include “A trichotomy for transductive online learning” (NeurIPS 2023), “Optimal learners for realizable regression: PAC learning and online learning” (NeurIPS 2023), “Adversarially Robust Learning: A Generic Minimax Optimal Learner and Characterization” (NeurIPS 2023), and “Multiclass Learnability Beyond the PAC Framework: Universal Rates and Partial Concept Classes” (NeurIPS 2022), among others.

Perhaps more importantly, we believe that our work demonstrates a far-reaching structural property of sample complexities which would be of interest to many members in the NeurIPS community, both theoretically inclined and beyond. We believe that collectively, our results form an original and meaningful contribution to learning theory, as requested in NeurIPS’ call for papers.

Comment

I thank the authors for their response. I am raising my score to 6.

Review
Rating: 6

This document explores the concept of compactness in the context of transductive learning, a model closely related to the PAC model in supervised learning. The authors demonstrate that for a broad class of loss functions, a hypothesis class can be learned with a specific transductive sample complexity if and only if all its finite projections (subsets of the hypothesis class restricted to finite data sets) can be learned with the same sample complexity. This result holds for realizable and agnostic learning settings, with specific bounds provided for realizable learning with improper metric losses. The authors highlight the significance of this “exact” compactness result, as it avoids dilution by asymptotics or constants. They further connect their findings to the PAC model, revealing an almost exact form of compactness for realizable PAC learning. The paper also discusses the implications of proper versus improper learners, demonstrating a structural difference in terms of compactness. The paper's core lies in generalizing the classic marriage theorems for bipartite graphs, which provides a foundation for the compactness results.

Strengths

This paper presents a compelling and rigorous analysis of compactness in the context of transductive learning. The authors contribute significantly by demonstrating an “exact” compactness result, which avoids the limitations of asymptotic or constant-based approaches. This result is particularly noteworthy for its broad applicability to a comprehensive class of loss functions and its relevance to both realizable and agnostic learning settings. Here’s a breakdown of the paper’s strengths across different dimensions:

Originality: The paper’s originality is a key strength, stemming from its unique approach to proving compactness in transductive learning. The authors' generalization of the classic marriage theorems for bipartite graphs serves as a foundation for their key results, introducing a novel framework that establishes a precise connection between the learnability of a hypothesis class and the learnability of its finite projections, a result that has not been previously demonstrated.

Quality: The paper is of high quality, exhibiting rigorous mathematical proofs and clear exposition. The authors' careful definition of their assumptions and provision of complete proofs for all their theoretical results demonstrate a thoroughness that ensures the validity and reliability of their findings.

Clarity: The paper is well-written and easy to follow. The authors effectively introduce the concepts of transductive learning and compactness, providing clear definitions and explanations. The structure of the paper is logical, guiding the reader through the key results and their implications.

Significance: The paper’s importance lies in its contribution to our understanding of the fundamental principles of transductive learning. The “exact” compactness result provides a powerful tool for analyzing the learnability of hypothesis classes in this setting. This result can potentially impact future research in transductive learning, particularly in semi-supervised and active learning areas.

Overall, this paper presents a valuable and original contribution to the field of transductive learning. Its rigorous analysis, clear exposition, and significant implications make it a strong candidate for publication.

Weaknesses

This paper presents a strong theoretical contribution, but it would benefit from a more nuanced discussion of its limitations and potential applications.

Weaknesses:

Limited Scope of Applications: While the paper establishes a powerful compactness result for transductive learning, it doesn’t delve into the practical implications of this finding. The authors could strengthen their work by exploring how this result translates to real-world scenarios. For instance, they could discuss specific transductive learning algorithms where this compactness property is particularly relevant or analyze the impact of different loss functions on the sample complexity.

Lack of Empirical Validation: The paper focuses solely on theoretical analysis. While this is valuable, it would be significantly enhanced by including empirical studies to demonstrate the practical relevance of the compactness results. Even a small-scale simulation could provide valuable insights into the behavior of transductive learners under different conditions.

Comparison to Existing Work: The paper could benefit from a more thorough comparison to existing work on compactness in learning theory. While the authors mention the PAC model, they could provide a more detailed discussion of how their results relate to existing compactness results in that framework. This would help clarify the novelty and significance of their contribution.

Discussion of Assumptions: The paper clearly states its assumptions, but it could benefit from a more in-depth discussion of their limitations. For example, the authors could explore the impact of relaxing the assumption of realizable learning or discuss the potential implications of using improper learners.

Actionable Insights:

Expand on Applications: The authors should dedicate a section to discussing potential applications of their compactness results in real-world transductive learning problems. This could involve analyzing specific algorithms, exploring the impact of different loss functions, or discussing the implications for different data distributions.

Include Empirical Studies: Even a small-scale simulation could provide valuable insights into the practical relevance of the compactness results. This would strengthen the paper’s impact and demonstrate the applicability of the theoretical findings.

Strengthen Comparison to Existing Work: The authors should provide a more detailed comparison to existing work on compactness in learning theory, particularly in the context of the PAC model. This would help clarify the novelty and significance of their contribution.

Discuss Assumption Limitations: The authors should dedicate a section to discussing the limitations of their assumptions. This could involve exploring the impact of relaxing the assumption of realizable learning or discussing the potential implications of using improper learners.

By addressing these points, the authors can significantly enhance the impact and relevance of their work.

Questions

This paper presents a compelling theoretical analysis of compactness in transductive learning. However, as a reviewer, I have some questions and suggestions for the authors to consider:

  1. Generalizability of Compactness Results:

Question: The paper focuses on a broad class of loss functions. Could the authors provide more concrete examples of loss functions that fall within this class and those that do not? This would help readers understand the practical implications of the results.

Suggestion: It would be beneficial to briefly discuss the results' limitations, particularly in terms of the specific loss functions that are not covered.

  2. Implications of Proper vs. Improper Learners:

Question: The paper mentions a structural difference in compactness between proper and improper learners. Could the authors provide a more detailed explanation of this difference? How does it impact the practical application of the results?

A dedicated section or subsection discussing the implications of proper vs. improper learners for compactness would be very valuable.

  3. Practical Applications:

Question: While the paper focuses on theoretical results, discussing potential practical applications of the compactness results would be helpful. How can these results be used to design more efficient transductive learning algorithms?

Suggestion: A brief discussion of potential applications, even if speculative, would enhance the paper’s relevance and impact.

  4. Future Directions:

Question: The paper mentions a conjecture about more significant gaps between sample complexities in the agnostic case. Could the authors elaborate on this conjecture and discuss potential approaches to proving it?

Suggestion: A section on future directions, outlining potential extensions and open problems, would add value to the paper.

Limitations

While the paper does not include a dedicated “Limitations” section, the authors have effectively integrated discussions of limitations throughout the paper, particularly in the introduction and conclusion.

The authors have adequately addressed the limitations of their work. They clearly state the assumptions required for their theoretical results and acknowledge that these results may not hold in more general settings. Notably, they acknowledge the potential for larger gaps between sample complexities in the agnostic case, demonstrating their awareness of the field's challenges. They also discuss the limitations of their approach in terms of its applicability to different learning settings.

However, the paper does not discuss any potential negative societal impacts of their work. This is understandable, given that the paper focuses on theoretical results in transductive learning, which is a relatively abstract field. However, it would be beneficial for the authors to briefly consider the potential applications of their work and any potential negative societal impacts that might arise.

Final Decision

The authors show a "compactness" result for transductive learning for a range of settings and loss functions. Roughly speaking, they show the sample complexity of transductive learning is m iff it is m for all of its "finite projections" (which means to characterize sample complexity it is enough to look at finite subsets of the class). "Compactness" is a rather mathematical concept and is a bit far from usual NeurIPS papers. However, the reviewers found the techniques and results interesting and relevant to the theory community. As such, I also recommend acceptance.

Reviewer Rcdh's comments were not used in the decision making by the AC.