PaperHub
Score: 5.5/10
Poster · 4 reviewers (ratings: 3, 3, 3, 3; min 3, max 3, std 0.0)
ICML 2025

Online Linear Classification with Massart Noise

OpenReview · PDF
Submitted: 2025-01-24 · Updated: 2025-07-24

Abstract

We study the task of online learning in the presence of Massart noise. Specifically, instead of assuming that the online adversary chooses an arbitrary sequence of labels, we assume that the context $\boldsymbol{x}$ is selected adversarially but the label $y$ presented to the learner disagrees with the ground-truth label of $\boldsymbol{x}$ with unknown probability {\em at most} $\eta$. We focus on the fundamental class of $\gamma$-margin linear classifiers and present the first computationally efficient algorithm that achieves mistake bound $\eta T + o(T)$. We point out that the mistake bound achieved by our algorithm is qualitatively tight for computationally efficient algorithms; this follows from the fact that, even in the offline setting, achieving 0-1 error better than $\eta$ requires super-polynomial time under standard complexity assumptions. We extend our online learning model to a $k$-arm contextual bandit setting where the rewards---instead of satisfying commonly used realizability assumptions---are consistent, in expectation, with some linear ranking function with weight vector $\boldsymbol{w}^\ast$. Given a list of contexts $\boldsymbol{x}_1,\ldots \boldsymbol{x}_k$, if $\boldsymbol{w}^*\cdot \boldsymbol{x}_i > \boldsymbol{w}^* \cdot \boldsymbol{x}_j$, the expected reward of action $i$ must be larger than that of $j$ by at least $\Delta$. We use our Massart online learner to design an efficient bandit algorithm that obtains expected reward at least $(1-1/k)~ \Delta T - o(T)$ bigger than choosing a random action at every round.
Keywords
Online Learning, Massart noise

Reviews and Discussion

Review
Rating: 3

The paper studies online linear classification under Massart noise, where each label may be flipped with rate at most $\eta$, in the case where the dataset is separable with a margin of $\gamma$. The paper uses a scaled LeakyReLU function as the surrogate loss and guarantees a mistake bound of $\eta T + O(T^{3/4}/\gamma)$. The binary classification task is extended to a $k$-armed bandit setting under the assumption that rewards are monotonic and separated by $\Delta$, where an expected reward of $(1 - 1/k)\Delta T - o(T)$ is recovered. The latter result generalizes the former when $k = 2$, with a slightly worse dependence in the $o(T)$ term.

Questions for Authors

The required assumption in Definition 1.5 seems very restrictive if we are considering a $k$-armed bandit, as it requires pairwise $\Delta$-separability. $\Delta$-separability between every arm and the best arm is more common. I wonder whether Definition 1.5 can be relaxed to the standard assumption.

Claims and Evidence

The paper focuses on computationally efficient algorithms with provable guarantees, in contrast to previously established algorithms whose bounds scale as $O(\sqrt{T})$ but which are computationally intractable. The claims are supported by the main theorems.

Methods and Evaluation Criteria

The paper uses classical and well-acknowledged metrics in the field, namely mistake bounds and reward bounds, for evaluation.

Theoretical Claims

All checked

Experimental Design and Analysis

Not applicable

Supplementary Material

All checked

Relation to Broader Literature

The paper contributes a new computationally efficient online algorithm with theoretical guarantees for linear classification with labels subject to Massart noise. The bound matches previous offline results.

Missing Essential References

Well referenced.

Other Strengths and Weaknesses

The paper is well written and easy to follow. The intuition for selecting the LeakyReLU with appropriate parameters is clearly demonstrated.

Other Comments or Suggestions

  • Typo at line 234, left column: '+' instead of '-' before $t$ (i.e., $|t| + t$).

  • Typo in the equation starting at line 312, left column: indexing with $t$ instead of $i$.

  • I did not follow line 310, right column, in the proof of Theorem 1.9. In particular, from the definition of $R(T)$ at line 308 (right) and the definition of $\bar{R}(T)$ at line 268 (left), it seems one concludes $\mathbb{E}[R(T)] \le O(GD\sqrt{T}) = \bar{R}(T)$ with appropriate $G, D$ substituted. What choices allow the claimed bound $\bar{R}(T) = O(\sqrt{T}/\tau)$?

  • A related confusion: $\eta$ has clashing interpretations, as the step size in Fact 2.4 and as the noise level in Definition 1.1; $G$ is then related to $\eta$ (in the noise sense)...

Author Response

We thank Reviewer qMz2 for the positive feedback on the paper's writing and clarity of intuition. We will fix the typos pointed out by the reviewer.

Clarification on the $\bar{R}(T)$ Derivation: We apologize for the confusion regarding the derivation involving $\bar{R}(T)$ around Line 310. The bound indeed stems from applying Fact 2.4 (the online gradient descent regret bound). In our setting, the parameter values are as follows: the domain diameter $D$ is equal to 1, and the gradient norm $G$ is related to the Lipschitz constant of our reweighted loss, which is $O(1/\tau)$. Substituting these into Fact 2.4 gives the bound $O(\sqrt{T}/\tau)$.
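For concreteness, the substitution the response describes can be written out, stating the online gradient descent regret bound of Fact 2.4 up to constants (the exact constant in the paper may differ):

```latex
\bar{R}(T) \;\le\; O\!\left(G D \sqrt{T}\right),
\qquad D = 1, \quad G = O(1/\tau)
\;\;\Longrightarrow\;\;
\bar{R}(T) \;\le\; O\!\left(\sqrt{T}/\tau\right).
```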

We acknowledge the inconsistent reuse of the symbol $\eta$ for both the Massart noise level and the step size (Fact 2.4). We will change the notation for the step size throughout the paper.

Restrictiveness of Assumption (Definition 1.5): The reviewer correctly notes that our assumption of pairwise separability by $\Delta$ for all pairs of arms in the bandit setting is stronger than the more standard assumption requiring only separability between the best arm and all other arms. Relaxing our assumption to the standard "best vs. others" gap introduces significant technical challenges in adapting our Massart noise learner. While this is a very interesting and important direction, we leave it open for future work.

Review
Rating: 3

The paper studies online linear classification with Massart noise and designs computationally efficient algorithms for this problem. The paper also extends the model to a $k$-arm contextual bandit setting.

Update after rebuttal

After rebuttal, I maintain my score.

Questions for Authors

  1. Could you explain why you define a new loss function in step (d) of Algorithm 1? What is the problem with using the LeakyReLU directly?

Claims and Evidence

In general, the claims are supported by sufficient evidence. The theoretical results seem sound.

Methods and Evaluation Criteria

The performance measure, the mistake bound, makes sense as it is the measure that is studied in the most similar works.

Theoretical Claims

I checked the general soundness and read all the proofs.

Experimental Design and Analysis

The paper does not include experiments.

Supplementary Material

I reviewed all supplementary materials.

Relation to Broader Literature

The contributions of this paper fall within the broader field of Online Linear Classification under Massart noise. The most relevant prior work is by Ben-David et al. (2009), who proposed inefficient algorithms. By contrast, this paper presents efficient algorithms.

Missing Essential References

I don’t know any specific part of the literature that is not addressed.

Other Strengths and Weaknesses

  1. The paper is well written.
  2. This paper gave the first efficient algorithm achieving a mistake bound of $\eta T + o(T)$ in online linear classification with Massart noise.

Other Comments or Suggestions

There is a typo at line 234: $|t| - t$ should be $|t| + t$.

Author Response

We thank Reviewer WUUa for the positive feedback on the clarity and soundness of our theoretical results. We will fix the typos pointed out. In response to the reviewer’s question about the use of the Leaky ReLU:

The standard LeakyReLU loss penalizes points that are further away from the decision boundary more than points that are close. By reweighting with the margin, we equalize the contribution of all points to the objective, leading to minimizing the misclassification error. Making the reweighting depend on the current vector $w$ would result in a non-convex objective, but treating it as a constant at each step results in an online sequence of convex objectives, and we prove that our method converges to the desired solution. This is explained in lines 181-191 (left column) of the submission.
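As an illustration of this point, here is a minimal sketch, not the paper's actual pseudocode: the exact LeakyReLU form, the weight-capping threshold `tau`, and all function names are assumptions for illustration. It shows how freezing the reweighting at an auxiliary vector `u = w` keeps each per-round objective convex in `w`:

```python
import numpy as np

def leaky_relu(t, lam):
    """LeakyReLU with negative slope lam: (1 - lam) * max(t, 0) + lam * min(t, 0)."""
    return (1 - lam) * max(t, 0.0) + lam * min(t, 0.0)

def reweighted_loss(w, u, x, y, lam, tau):
    """Margin-reweighted LeakyReLU surrogate (illustrative form).

    The weight depends only on the frozen auxiliary vector u, not on w,
    so the loss is convex (piecewise linear) in w; tau > 0 caps the
    weight at 1/tau so that gradients stay bounded.
    """
    weight = 1.0 / max(abs(np.dot(u, x)), tau)  # equalizes each point's contribution
    return weight * leaky_relu(-y * np.dot(w, x), lam)

def ogd_step(w, x, y, lam, tau, step_size):
    """One online gradient descent step on the convex per-round objective."""
    u = w.copy()  # freeze the reweighting at the current iterate
    weight = 1.0 / max(abs(np.dot(u, x)), tau)
    t = -y * np.dot(w, x)
    slope = (1 - lam) if t >= 0 else lam      # subgradient of LeakyReLU at t
    w_new = w - step_size * weight * slope * (-y) * x
    norm = np.linalg.norm(w_new)
    return w_new / norm if norm > 1 else w_new  # project back onto the unit ball
```

Had the weight been `1.0 / max(abs(np.dot(w, x)), tau)` instead, the objective would be non-convex in `w`; freezing it at `u` is what turns each round into a convex problem to which the standard online gradient descent analysis applies.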

Review
Rating: 3

This paper considers an online learning setting where context-label pairs are generated with Massart noise. More specifically, while in standard online classification, both the context and label are generated adversarially, the authors consider a setting where the context is generated adversarially, but the label is determined based on the context at that time and is subject to Massart noise.

They first derive a mistake bound for this online classification setting that matches a computational lower bound in the offline setting. Technically, they achieve this by considering a new loss function for the online learning algorithm, in which a LeakyReLU function is weighted by the margin of the sample at each time step, which allows them to establish the above mistake bound.

By leveraging this technique, they further derive a desirable mistake bound for the $k$-armed contextual bandit under assumptions weaker than realizability for the reward function. When the number of arms is 2, this bound matches existing results.

Questions for Authors

no

Claims and Evidence

Yes, all propositions and claims in the paper are accompanied by proofs or corresponding references.

Methods and Evaluation Criteria

Yes, the proposed algorithms are variants of existing algorithms in online classification and are thus valid.
Moreover, the evaluation criteria (mistake bound and expected reward) are standard in the literature.

Theoretical Claims

Yes, I have reviewed the proofs in the main body and confirmed their validity with no issues.

Experimental Design and Analysis

NA

Supplementary Material

The reviewer did not review the supplementary material.

Relation to Broader Literature

In the traditional online (linear) classification setup, the setting where both the context and label are generated adversarially has been extensively studied. However, the traditional assumption that the label is generated completely independently of the context is overly pessimistic. To address this issue, the authors introduce the Massart noise assumption, which has been well-studied in the offline setting, and consider a scenario where there is some dependence between the context and the label, and this is an interesting aspect of this paper. Furthermore, instead of directly using the LeakyReLU function, which is known to be effective in existing studies, they consider a variant weighted by the margin of the sample at each time step.
By doing so, they establish a mistake upper bound that matches a computational lower bound, and this is of interest to the community.

Missing Essential References

no

Other Strengths and Weaknesses

This paper is overall well written.
In particular, the problem setting, algorithms, definitions, and main results are presented in a highly clear manner.
Other strengths are discussed in Relation To Broader Scientific Literature.

A minor weakness is the presence of numerous apparent typos.
For example,

  • In Line 72, the phrase "chooses an action $\alpha = 1, \dots, k$" is unclear.
  • In Line 80, $\mathbb{E}[r_i | x^{(t)}]$ should be $\mathbb{E}[r_i | x_i^{(t)}]$.

Additionally, some notations are quite confusing.
For example,

  • In Fact 2.1, the expression $\frac{1}{2}((1-2\lambda)|t| - t)$ is difficult to interpret.
  • In Fact 2.4, the regret upper bound $GD/3\sqrt{T}$ is presented in a fraction format that makes it unclear.

It would be desirable to revise these points for clarity.

Other Comments or Suggestions

no

Ethics Review Issues

a

Author Response

We thank the reviewer for the comments. We will fix the typos and address the reviewer’s suggestions to improve clarity.

Review
Rating: 3

They present a computationally efficient algorithm that achieves mistake bound $\eta T + o(T)$, where $\eta$ is the probability of flipping the ground-truth label in the Massart noise model. Their algorithm is based on performing online gradient descent on a sequence of reweighted LeakyReLU loss functions.

Next, they consider a semi-random $k$-arm contextual bandit problem where, given a list of contexts, the rewards are consistent with some halfspace $w^*$ in expectation. That is, given a list of contexts $x_1, \ldots, x_k$, if $w^* \cdot x_i > w^* \cdot x_j$, the expected reward of action $i$ is larger than that of action $j$ by at least $\Delta$. They used their online Massart learner to obtain an efficient bandit algorithm that obtains roughly $(1-1/k)\Delta T$ more reward than playing at random at every round. Question: what is a lower bound here, i.e., what is the maximum reward possible?

Questions for Authors

Same as above.

Claims and Evidence

Their results are definitely novel and interesting, although I did not check all the proofs.

Methods and Evaluation Criteria

Theoretical paper, and the claims seem reasonable to me.

Theoretical Claims

I did not check all the proofs but the claims seem reasonable to me.

Experimental Design and Analysis

Theoretical paper

Supplementary Material

I did not check the appendix.

Relation to Broader Literature

For the class of $\gamma$-margin linear classifiers, they present the first computationally efficient algorithm that achieves mistake bound $\eta T + o(T)$.

Missing Essential References

I cannot think of anything that is missing.

Other Strengths and Weaknesses

Their results are definitely novel and interesting, although I did not check all the proofs. In terms of writing of the paper, at some places, it was very hard to understand what is going on, here are some comments:

Lemma 2.2: you did not define $\bar{R}$. Lines 245-260: what is $u$? It is not defined. Algorithm 2: what is the function $G$?

Although the results are novel and interesting, I think the write-up can be improved a lot.

Other Comments or Suggestions

Same as above.

Author Response

We thank Reviewer WzcY for the positive feedback on the paper's novelty. We respond to the reviewer’s questions below.

$\bar{R}$ in Lemma 2.2: Thank you for pointing this out. $\bar{R}$ refers to the regret bound used in the online gradient descent analysis.

Vector $u$ in Lines 245-260: The vector $u$ in Lines 245-260 represents an auxiliary vector that is independent of the current vector $w$ when computing gradients. The value of $u$ is chosen to be the vector $w$, but we define it this way so that the reweighting does not change the gradient with respect to $w$.

Function $G$ in Algorithm 2: In Algorithm 2, $G$ represents the loss function being minimized at each step. Its specific form, incorporating the reweighted LeakyReLU loss and bandit feedback, is constructed in steps 2 and 3b. We use $\ell(w)$ as shorthand for the instance of this loss function at a particular step, simplifying the presentation of the regret analysis.

Lower Bound Question: The theoretical maximum expected reward achievable by any algorithm (even computationally inefficient ones, or those with oracle access to the true separator $w^*$) corresponds to always selecting the optimal arm based on the current context $x^t$. If we assume, without loss of generality, that arm 1 is the optimal arm in expectation at every round $t$, the maximum possible expected reward is $\mathbb{E}[\sum_{t=1}^T r_1^t]$. Establishing tight bounds on the maximum achievable reward specifically for computationally efficient algorithms in this precise semi-random setting is challenging, and related questions remain open even in offline contexts. For example, in the special case where $k=2$ and $r_i^t \in \{\pm 1\}$, the expected reward gain of $(1/2)\Delta T$ achieved by our algorithm is indeed a meaningful quantity. As discussed in Remarks 1.6 and 1.10, this relates directly to the mistake bound achievable in the underlying binary classification problem. Given the known computational hardness results (as stated in the aforementioned remarks in the submission), the maximum expected reward that can be achieved by computationally efficient algorithms is at most $\Delta T$.

Final Decision

The paper studies online linear classification and contextual bandits in the presence of Massart noise. In this setting, the paper gives an algorithm that is $o(T)$ away from the best possible offline algorithms. There are a few weaknesses, such as the lack of a discussion of possible lower bounds and of a further comparison with more recent offline works (some of which the authors say are inspired by this paper).

The authors should take into account the feedback from the reviewers and rebuttal in preparing the revision.