PaperHub
Rating: 8.2 / 10 · Spotlight · 4 reviewers
Individual ratings: 5, 5, 5, 5 (min 5, max 5, std 0.0)
Confidence: 3.5
Novelty: 2.8 · Quality: 3.5 · Clarity: 3.3 · Significance: 3.3
NeurIPS 2025

Online Strategic Classification With Noise and Partial Feedback

OpenReview · PDF
Submitted: 2025-05-09 · Updated: 2025-10-29

Abstract

In this paper, we study an online strategic classification problem, where a principal aims to learn an accurate binary linear classifier from sequentially arriving agents. For each agent, the principal announces a classifier. The agent can strategically exercise costly manipulations on his features to be classified as the favorable positive class. The principal is unaware of the true feature-label distribution, but observes all reported features and only the labels of positively classified agents. We assume that the true feature-label distribution is given by a halfspace model subject to arbitrary feature-dependent bounded noise (i.e., Massart noise). This problem faces the combined challenges of agents' strategic feature manipulations, partial label observations, and label noise. We tackle these challenges with a novel learning algorithm. We show that the proposed algorithm yields classifiers that converge to the clairvoyant optimal one and attains a regret rate of $O(\sqrt{T})$ up to poly-logarithmic and constant factors over $T$ cycles.
Keywords
strategic classification · online optimization · learning halfspaces with noise

Reviews & Discussion

Official Review
Rating: 5

This paper considers the strategic classification model with the following extension: candidates arrive online, the principal observes only the manipulated features, and the label only if the candidate was classified positively. Candidate features originate from an unknown distribution and are not exactly separable by a linear hyperplane; rather, the labels can be noisy with respect to the plane (Massart noise that is correlated with features). Like other works in the online strategic setting [Chen, 2020], they assume agents can manipulate within some radius. The key departure from existing online models is the observability of the labels conditioned on acceptance.

The key technical contribution of the work is an online algorithm that achieves $\sqrt{T}$ regret with respect to the offline optimum, which can be informally characterized as shifting the optimal non-strategic classifier by the width of the manipulation radius. The algorithm is composed of three parts: an initialization phase in which the principal positively classifies both sides of the plane (on every other step) to collect data; a refinement stage that explores a boundary near the one found in initialization -- both of these can incur large regret and thus run for only $O(\log T)$ rounds; and lastly, the bulk of the algorithm, where the enhancement proceeds over batches and proxy features are used to approximate the gradient (in lieu of costly exploration).

Strengths and Weaknesses

In strategic classification papers, I find the model to be crucial, and I find the model here to be fairly reasonable and well justified. Indeed, in many settings only positively classified agents reveal their true label. While this is the main difference from existing works, I don't think it's a negative, as it does lead to a meaningful/challenging problem. I also found the paper to be focused on answering the crucial question of sub-linear regret. Overall, I think it's a good paper.

My main criticism would be in the exposition of the paper - especially in the theory section. For example, in the initialization part of the algorithm, it is not clear what the significance of the acute angle perspective is. Some diagrams or high-level description of the main ideas would help for the first and second components of the algorithm.

Questions

  • Do you have any thoughts on the setting where each agent's manipulation radius $\gamma$ is different? So one can either think of this as being unknown (perhaps too hard) or belonging to some range. I know prior literature also makes this assumption -- so not a major criticism -- but I am curious about it since afaik the clairvoyant classifier in this setting can no longer be easily characterized.

  • Could you explain why, in the refinement stage, we need to consider strategic manipulation and form a linear optimization problem? My understanding is that we set a threshold $b$ and agents within margin $b$ (according to true features) of the current classifier are positively classified. Can this not be achieved by shifting the boundary by $\gamma$?

Limitations

I don't see any negative social impact of this work.

Justification for Final Rating

I had a positive view of this work and the author rebuttal maintained this perspective.

Formatting Issues

None

Author Response

We sincerely thank you for finding our model crucial, reasonable and well-justified and acknowledging that our paper is good. We also appreciate your thoughtful comments. We provide our feedback as follows.

  • Explanation for the significance of each algorithm:

We agree that we should explain the sub-algorithms in more detail. Here let us provide further explanations for the significance of each sub-algorithm below, which we also plan to add to our manuscript.

Initialization Algorithm. The Initialization Algorithm aims to output an initial vector $\bar w_0$ such that the inner product of $\bar w_0$ and the coefficient vector of the optimal classifier $w^\star$ is lower bounded by a positive constant ($\langle w^\star, \bar w_0 \rangle \ge c_1(1-2\bar\eta) > 0$) with high probability (notably, we want to clarify that the angle between $\bar w_0$ and $w^\star$ should be strictly less than $\pi/2$, not just less than or equal to $\pi/2$). This guarantee is crucial for the subsequent Refinement stage. The Refinement Algorithm relies on the localization technique, which in turn requires that the hyperplane deployed in every round remains close to the optimal one. However, since $w^\star$ is actually unavailable, for each coefficient vector $w_{0,i}$ in Refinement, we cannot measure $\langle w^\star, w_{0,i} \rangle$ directly. Instead, we constrain every $w_{0,i}$ to lie in the set $\mathcal{K}_0 = \{w \mid \langle w, \bar w_0 \rangle \ge c_1(1-2\bar\eta)\}$, which, by construction, contains $w^\star$ with high probability. Thus, every hyperplane used in Refinement remains provably close to the optimal one, enabling effective localization without ever seeing $w^\star$.

Refinement Algorithm. The Refinement Algorithm aims to output a vector $w_1$ such that the angle between $w_1$ and the coefficient vector of the optimal classifier $w^\star$ is at most $\pi/4$ with high probability. This guarantee is important for the subsequent Enhancement stage. The Enhancement Algorithm relies on the effective construction of proxy features, which in turn requires that the coefficient vector $w_{k,i}$ deployed in each iteration always forms an acute angle with $w^\star$ ($\theta(w_{k,i}, w^\star) \le \pi/2$). Still, since $w^\star$ is actually unavailable, we cannot directly measure $\theta(w_{1,i}, w^\star)$ in batch 1. Instead, we constrain every $w_{1,i}$ to lie in the set $\mathcal{K}_1 = \{w \mid \langle w, w_1 \rangle \ge \cos(\pi/4)\}$, which is equivalent to $\theta(w_{1,i}, w_1) \le \pi/4$ for all $i \in [T_1]$. Then, by the triangle inequality of angles, as we have proved in Appendix A.3, Lemma 12, if $\theta(w_1, w^\star) \le \pi/4$ and $\theta(w_1, w_{1,i}) \le \pi/4$, then $\theta(w_{1,i}, w^\star) \le \pi/2$ for all $i \in [T_1]$, enabling the construction of proxy features.

Enhancement Algorithm. During the Batched Enhancement phase, each batch $k$ receives an initial vector $w_k$ from the last batch, which is required to satisfy $\theta(w^\star, w_k) \le \pi/2^{k+1}$. Next, in each iteration $i$ of batch $k$, we constrain the deployed coefficient vector $w_{k,i}$ to lie in the set $\mathcal{K}_k = \{w \mid \langle w, w_k \rangle \ge \cos(\pi/2^{k+1})\}$, which means $\theta(w_{k,i}, w_k) \le \pi/2^{k+1}$ for all $i \in [T_k]$. Then, by the triangle inequality of angles, as we have proved in Appendix A.3, Lemma 12, we can show that $\theta(w^\star, w_{k,i}) \le \theta(w^\star, w_k) + \theta(w_{k,i}, w_k) \le \pi/2^k$. This statement is quite critical: on one hand, it controls the expected cumulative error in batch $k$ to be $O(\frac{1}{2^k} \cdot T_k)$. On the other hand, the condition that $\theta(w_{k,i}, w_k) \le \pi/2^k$, together with our localized online mirror descent method, ensures that batch $k$ outputs a vector $w_{k+1}$ that satisfies $\theta(w^\star, w_{k+1}) \le \pi/2^{k+2}$ with high probability, which is required by the next batch.
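
The triangle inequality of angles invoked in the rebuttal (Appendix A.3, Lemma 12) is a standard property of angular distance on the sphere; it can be checked numerically. The sketch below is purely illustrative and independent of the paper's code (all names are ours):

```python
import numpy as np

def angle(u, v):
    # theta(u, v): the angle between two nonzero vectors, in [0, pi].
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

def triangle_inequality_holds(trials=1000, dim=5, seed=0):
    # Check theta(w_star, w_ki) <= theta(w_star, w_k) + theta(w_k, w_ki)
    # on random triples, mirroring how Lemma 12 is used in the batched analysis.
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        w_star, w_k, w_ki = rng.normal(size=(3, dim))
        if angle(w_star, w_ki) > angle(w_star, w_k) + angle(w_k, w_ki) + 1e-9:
            return False
    return True
```

With $\theta(w^\star, w_k) \le \pi/2^{k+1}$ and $\theta(w_{k,i}, w_k) \le \pi/2^{k+1}$, this inequality yields the bound $\theta(w^\star, w_{k,i}) \le \pi/2^k$ used above.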

  • Discussions on unknown or heterogeneous maximum manipulation radius $\gamma$:

In our paper we consider the setting where the manipulation radius (or manipulation budget/maximum manipulation distance) $\gamma$ is known, which is common practice in the literature (Chen et al. (2020), Shen et al. (2024)). However, considering an unknown or heterogeneous $\gamma$ is a quite interesting direction for future study.

For a homogeneous unknown $\gamma$, Ahmadi et al. (2021) proposed a strategic perceptron algorithm with an unknown manipulation radius. Their key idea is running a binary search to find a proper estimate of this radius. However, this idea relies heavily on their assumption that the positive and negative classes are strictly separated by a margin without any noise. Their algorithm design requires either knowing the value of the margin or having an estimation process for it. These requirements cannot be fulfilled in the noisy setting without any margin. Therefore, it is hard to extend Ahmadi et al. (2021)'s binary search approach to our setting with unknown $\gamma$.

If the maximum manipulation distance is heterogeneous but known, then adapting our algorithm seems straightforward. That is, each agent arriving at time $t$ has a maximum manipulation distance $\gamma_t$ that potentially differs from other agents' manipulation distances, but every $\gamma_t$ is known to the principal prior to decision-making. In this case, our construction of proxy features and pairwise contrastive inference can directly use the known $\gamma_t$ for each agent, in place of the previous homogeneous $\gamma$.

  • Explanations on the Refinement stage:

As we have explained at the beginning, we need the Refinement Algorithm to output a vector $w_1$ such that $\theta(w_1, w^\star) \le \pi/4$ with high probability, which is crucial for the subsequent Enhancement Algorithm. To achieve this, we form an online linear optimization problem with localization following Zhang et al. (2020), who show that in the non-strategic setting, this approach can effectively learn halfspaces under Massart noise. Our refinement stage applies classifiers as if there were no manipulation and focuses on the localization region right above the classification hyperplane. Agents within the localization region are naturally positively classified, so they truthfully report their features. Therefore, we can collect true data within the localization region, enabling the approach in Zhang et al. (2020) with theoretical guarantees. However, although this collects true data within the localization region, it is fairly costly, because unqualified agents below the classification hyperplanes are incentivized to manipulate their features to get positively classified, resulting in high instantaneous regret. Fortunately, the total regret of the Refinement stage is controlled because this stage lasts for only $O(\log T)$ cycles.

However, this cannot be achieved by simply shifting the boundary by $\gamma$. Although shifting the boundary by $\gamma$ can prevent excessive manipulation from unqualified agents, it also makes it difficult to target the localization region in terms of the true features. In this case, all agents with true features within $\gamma$ distance below the announced classification hyperplane will strategically manipulate their features onto the classification hyperplane (see Figure 1(a) in our paper). These manipulating agents include both agents whose true features are within the localization band and agents whose true features are outside it, and their reported features all lie on the classification hyperplane, so it is difficult to tell whose true features are indeed within the localization band. Our Enhancement algorithm resolves this issue by combining proxy features and pairwise contrastive inference.
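
The pooling effect described here can be illustrated with a small numerical sketch (our own code and hypothetical parameters, not the paper's): `report` implements the projection-style best response for a unit-norm $w$, and every agent whose true margin lies in $[-\gamma, 0)$ reports a feature exactly on the hyperplane, so the reports alone cannot reveal whether the true feature was inside the localization band.

```python
import numpy as np

def report(x, w, m, gamma):
    """Myopic best response to sgn(<w, x> + m), with w unit-norm:
    agents within gamma below the boundary project onto it."""
    s = np.dot(w, x) + m                 # signed margin of the true feature
    return x - s * w if -gamma <= s < 0 else x

# All manipulated reports pool at margin zero, hiding the true features.
rng = np.random.default_rng(0)
w, m, gamma = np.array([1.0, 0.0]), -0.5, 0.3   # hypothetical parameters
xs = rng.uniform(-1.0, 1.0, size=(2000, 2))
rs = np.array([report(x, w, m, gamma) for x in xs])
manipulated = ~np.all(rs == xs, axis=1)
print(np.allclose(rs[manipulated] @ w + m, 0.0))  # prints True
```

Since all manipulated reports share the same (zero) margin, the reported data alone cannot separate in-band from out-of-band agents, which is what motivates the proxy features and pairwise contrastive inference.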

We appreciate your insightful questions and suggestions. We will integrate our discussions into the camera-ready version to clarify the points you have raised.

Chen, Y., Liu, Y., & Podimata, C. (2020). Learning strategy-aware linear classifiers. Advances in Neural Information Processing Systems, 33, 15265-15276.

Shen, L., Ho-Nguyen, N., Giang-Tran, K. H., & Kılınç-Karzan, F. (2024). Mistake, manipulation and margin guarantees in online strategic classification. arXiv preprint arXiv:2403.18176.

Ahmadi, S., Beyhaghi, H., Blum, A., & Naggita, K. (2021). The strategic perceptron. In Proceedings of the 22nd ACM Conference on Economics and Computation (pp. 6-25).

Zhang, C., Shen, J., & Awasthi, P. (2020). Efficient active learning of sparse halfspaces with arbitrary bounded noise. Advances in Neural Information Processing Systems, 33, 7184-7197.

Comment

Thank you for your detailed response; I remain convinced this is a good paper.

Comment

Thank you for your insightful questions and suggestions. We will integrate our discussions into the camera-ready version to clarify the points you have raised. We are particularly grateful for your inspiring question regarding the unknown or heterogeneous maximum manipulation radius γ\gamma, as it has prompted us to explore an exciting new future research direction.

Official Review
Rating: 5

The paper studies online binary linear classification when agents strategically manipulate their features to achieve positive prediction at quadratic cost, and the learner only sees labels of positively classified agents. The authors assume that the true feature-label relationship is a halfspace model subject to arbitrary but bounded (Massart) noise. Additionally, the distribution of features is assumed to satisfy regularity conditions for 1-dimensional and 2-dimensional projections, including lower and upper density bounds and a sub-exponential tail bound for the inner product of features with any unit vector. This assumption appears in prior works on learning halfspaces with noise.

The proposed algorithm consists of three phases: initialization, refinement, and batched enhancement, utilizing proxy features and pairwise contrastive inference. A regret bound of $\tilde O(\sqrt{T})$ is provided. Numerical experiments support these theoretical findings.

Strengths and Weaknesses

Strengths

  • The problem addressed, online strategic classification with partial feedback, is interesting.
  • The inclusion of Massart noise makes a great addition, and demonstrates the generality of the algorithm.
  • The way the paper handles the challenges of strategic feature manipulation, partial feedback, and label noise is a central contribution. The specific techniques used, constructing proxy features and using pairwise contrastive inference, are novel aspects of their proposed algorithm. In particular, for gradient estimation without true features, gradient estimates over bands around the boundary are needed. Yet, agents in those bands misreport, so the authors (i) construct proxy feature points from the reports and (ii) use pairwise contrastive inference with two shifted classifiers to unbias these estimates.
  • The paper is well-written.

Weaknesses

  • Usually, in online strategic classification, adversarial settings are considered, whereas this paper makes some distributional assumptions. That said, this setting is also interesting.
  • Only quadratic cost is considered.

Comments

  • The paper mentions [Ahmadi et al., 2023, 2024], but it does not mention a closely related work by Cohen et al. 2024 [1], which is chronologically "in between", nor a more recent, subsequent relevant work by Shao et al. 2025 [2].
  • The paper provides overviews and discussions of the Initialization, Refinement, and Batched Enhancement algorithms. However, I suggest that the authors provide even more discussion about them, particularly given the complexity of the methods.

[1] Lee Cohen, Yishay Mansour, Shay Moran, and Han Shao. Learnability gaps of strategic classification. COLT 2024.

[2] Han Shao, Shuo Xie, Kunhe Yang. Should Decision-Makers Reveal Classifiers in Online Strategic Classification? ICML 2025.

Questions

  1. Do you have any intuition for how your algorithm or regret bounds might change under other noise settings?
  2. Do you have an idea on how to extend the partial feedback setting to more complex classes, e.g., finite VC classes (even without noise)?
  3. Are there any guarantees if the budget parameter $\gamma$ is unknown or must be estimated online?
  4. Any guarantees on more general cost functions (e.g., linear, or other convex costs)?

Limitations

yes

Justification for Final Rating

I have read the rebuttal and still think the paper should be accepted.

Formatting Issues

The title of the paper in openreview [Online Strategic Classification with Noise] is not the same as the pdf [Online Strategic Classification with Noise and Partial Feedback], but this could be easily fixed.

Author Response

We are grateful that you find our work interesting, novel and well-written. We also sincerely thank you for your insightful questions. We provide our feedback as follows.

  • Citations:

Thank you for providing the two related papers. We will add these two papers to our ''Related Literature'' section.

  • More discussions on each algorithm:

We agree that we should add more discussions on the algorithms. Due to space constraints, we provide our explanations for the significance of each sub-algorithm in the answers to Reviewer SQmo, and plan to update our manuscript accordingly.

  • Intuitions for how the algorithm and bound will change under other noises:

Here we provide some thoughts on online strategic classification under adversarial noise (nature confines the overall error rate of the `best' halfspace to be at most $\bar\eta$, but samples may not necessarily be i.i.d.) and malicious noise (nature randomly draws an instance from the underlying feature-label distribution with a fixed probability $1-\bar\eta$ while returning an arbitrary feature-label pair with probability $\bar\eta$). We conjecture that we could tackle these noise models again based on our proposed ideas of localization, proxy features, and contrastive inference, with a slight difference in the design of the loss function and the gradient. Their regret bounds are also expected to be similar to ours.

Here is our rough intuition. Shen (2021, 2023) adapts the algorithm in Zhang et al. (2020) to learning halfspaces under these alternative types of noise (in the non-strategic, full-feedback setting). The resulting algorithms again use online mirror descent with "localization", and their theoretical guarantees under these alternative noise models are similar to that of Zhang et al. (2020) under Massart noise. Moreover, the type of noise does not appear to influence our construction of proxy features and contrastive inference. Therefore, we conjecture that we can adapt our algorithm to these alternative noise models while attaining similar theoretical guarantees. We leave this extension for future study.

  • Extending the partial feedback setting to more complex classes:

In our setting, we use the contrastive inference method to tackle the partial feedback setting. By contrasting positively-classified agents' responses under two slightly different classifiers, we can fine-tune our classifier and simultaneously maintain a relatively high accuracy of our declared classifiers. However, this method relies on the clear characterization of agents' best responses and a gradient-based optimization algorithm to update the classifier, both of which depend on the linear structure of classifiers. When adapting to more general settings, it may be hard to characterize the agent's strategic manipulation and directions for updating the classifiers in a closed form. These challenges may prevent us from directly extending our setting to more complex classes.

In addition, as you noted, online strategic classification problems within finite VC classes are typically examined under adversarial settings with full feedback. In particular, their approaches often involve discretizing the hypothesis class into multiple experts and then implementing a weighted-majority voting scheme across these experts (see, e.g., Ahmadi et al., 2024; Cohen et al., 2024). However, this cannot be extended to the partial label observation setting either. Specifically, in the adversarial setting with partial feedback, nature can hurt the principal's learning algorithm by repeatedly generating agent examples that will be classified as negative. The principal then never observes these agents' true labels due to partial observation, and thus has no chance to update the classifier.

  • Setting with unknown budget $\gamma$:

In our paper we consider the setting where the manipulation budget (or manipulation radius/maximum manipulation distance) $\gamma$ is known, which is common practice in the literature (Chen et al. (2020), Shen et al. (2024)). However, considering unknown $\gamma$ is a quite interesting direction for future study. Ahmadi et al. (2021) proposed a strategic perceptron algorithm with an unknown manipulation budget. Their key idea is running a binary search to find a proper estimate of this budget. However, this idea relies heavily on their assumption that the positive and negative classes are strictly separated by a margin without any noise. Their algorithm design requires either knowing the value of the margin or having an estimation process for it. These requirements cannot be fulfilled in the noisy setting without any margin. Therefore, it is hard to extend their binary search approach to our setting with unknown $\gamma$.

  • Guarantees on more general cost functions:

Our work can be easily adapted to any norm-based manipulation cost with the regret guarantee unchanged. The adaptation is similar to that of Shen et al. (2024). In this setting, given a classifier $\tilde h(r) = \mathrm{sgn}(\langle w, r \rangle + m)$ and agent manipulation cost $\mathrm{Cost}(x, r) = \frac{2}{\gamma}\|x - r\|$ for any norm $\|\cdot\|$, an agent with true feature $x$ will report $r^\star(x, \tilde h) = x - \frac{\langle w, x \rangle + m}{\|w\|_*} v(w)$ if $-\gamma \|w\|_* \le \langle w, x \rangle + m < 0$; and otherwise $r^\star(x, \tilde h) = x$, where $\|\cdot\|_*$ denotes the dual norm of $\|\cdot\|$ and $v(w) \in \arg\max_{\|v\| \le 1} \langle w, v \rangle$. From this expression we see that when extending the $\ell_2$ norm cost to a general norm cost, the key modification is changing agents' manipulation direction (from $w$ under the $\ell_2$ norm to $v(w)/\|w\|_*$ under a general norm) and converting the distance under the general norm to that under the $\ell_2$ norm (multiplying the general-norm-based distance by a $\|w\|_*$ factor). Therefore, when designing proxy features and contrastive inference, we can accordingly change the projection direction and adjust the offset. These modifications do not change our key ideas of algorithm design.
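
As an illustration of the general-norm best response described here, the sketch below instantiates it for an $\ell_1$ manipulation cost (so the dual norm is $\ell_\infty$, and $v(w)$ is a signed standard basis vector); the function name and parameters are ours, not the paper's:

```python
import numpy as np

def best_response_l1(x, w, m, gamma):
    # Best response to sgn(<w, x> + m) under cost (2/gamma) * ||x - r||_1.
    # For the l1 norm, ||w||_* = ||w||_inf, and v(w), a maximizer of <w, v>
    # over ||v||_1 <= 1, is a signed basis vector at a largest |w_j|.
    dual = np.max(np.abs(w))                  # ||w||_* = ||w||_inf
    s = np.dot(w, x) + m                      # signed margin of the true feature
    if -gamma * dual <= s < 0:
        j = int(np.argmax(np.abs(w)))
        v = np.zeros_like(w)
        v[j] = np.sign(w[j])
        return x - (s / dual) * v             # lands exactly on the hyperplane
    return x
```

For example, with $w = (2, 1)$, $m = -1$, $\gamma = 1$, an agent at $x = (0.2, 0.2)$ has margin $-0.4$ and moves only along the first coordinate to $(0.4, 0.2)$, which lies exactly on the hyperplane, matching the closed form above.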

  • The title:

Our final title is "Online Strategic Classification with Noise and Partial Feedback". We will finalize this in our camera-ready version.

Thanks for your interesting questions about potential extensions of our work. We will add these as our future work direction to the manuscript.

Shen, J. (2021). On the power of localized perceptron for label-optimal learning of halfspaces with adversarial noise. In International Conference on Machine Learning (pp. 9503-9514). PMLR.

Shen, J. (2023). PAC learning of halfspaces with malicious noise in nearly linear time. In International Conference on Artificial Intelligence and Statistics (pp. 30-46). PMLR.

Zhang, C., Shen, J., & Awasthi, P. (2020). Efficient active learning of sparse halfspaces with arbitrary bounded noise. Advances in Neural Information Processing Systems, 33, 7184-7197.

Ahmadi, S., Yang, K., & Zhang, H. (2024). Strategic littlestone dimension: Improved bounds on online strategic classification. Advances in Neural Information Processing Systems, 37, 101696-101724.

Cohen, L., Mansour, Y., Moran, S., & Shao, H. (2024). Learnability gaps of strategic classification. In The Thirty Seventh Annual Conference on Learning Theory (pp. 1223-1259). PMLR.

Chen, Y., Liu, Y., & Podimata, C. (2020). Learning strategy-aware linear classifiers. Advances in Neural Information Processing Systems, 33, 15265-15276.

Shen, L., Ho-Nguyen, N., Giang-Tran, K. H., & Kılınç-Karzan, F. (2024). Mistake, manipulation and margin guarantees in online strategic classification. arXiv preprint arXiv:2403.18176.

Ahmadi, S., Beyhaghi, H., Blum, A., & Naggita, K. (2021). The strategic perceptron. In Proceedings of the 22nd ACM Conference on Economics and Computation (pp. 6-25).

Comment

Thank you for your thorough and careful revisions. You have satisfactorily addressed all of my concerns. I therefore confirm my original recommendation to accept the paper.

Comment

We would like to sincerely thank you once again for your helpful comments and for drawing our attention to the related literature. We will make corresponding revisions to our manuscript in the camera-ready version. We also appreciate your valuable advice and inspiring questions, which will be of great help to us in improving the work.

Official Review
Rating: 5

This paper explores binary classification in a strategic online setting, where a classifier must immediately label incoming agents. Crucially, it receives noisy feedback only for positively classified agents ("apple tasting"). The feature-label relationship uses a linear half-space model with bounded noise, and the classifier aims to minimize regret against the best fixed omniscient classifier.

Following prior work, the authors first show that for half-spaces, agents' best responses can be characterized, proving that an omniscient classifier yields identical outputs whether features are true or manipulated.

They then propose a three-component algorithm. First, an "exploration" subroutine uses contrasting classifiers to find an initial normal vector close to the optimal one. Second, a "localization" subroutine refines this vector by focusing on informative data near the classifier's boundary, where true features are observable due to no manipulation incentives. Despite new complications from strategic reporting, they solve an online linear optimization problem using techniques akin to learning with label noise, achieving only logarithmic regret in this phase.

The third and most novel component tackles strategic manipulation in this refined localization. As manipulable classifiers lead to high regret from misreporting, they introduce a modified classifier that sets a higher bar for positive classification. To address the unobservability of true features in key regions, they develop proxy features to mimic true ones for gradient construction, and pairwise contrastive inference. The latter uses two successive classifiers to statistically infer information about unobservable true data in critical localized regions, and which in turn enables estimation of gradients.

Strengths and Weaknesses

This paper tackles a tougher version of online strategic classification, to accommodate partial, noisy feedback. A key and novel strength of the paper is the development of pairwise contrastive inference to address strategic reporting from the agents. In particular, by employing two slightly different classifiers in succession and contrasting the observed data from each, they cleverly infer statistical information about the true, unobservable features in critical regions. Moreover, the integration of proxy features, designed to mimic true features from reported ones, further strengthens their approach by allowing the construction of effective proxy gradients. In general, I appreciated the technical novelty the authors developed in this paper.

Even the “localization” procedure, which has already been developed for the non-strategic counterpart of the problem, is here devised in a clever way. Namely, it handles strategic interaction via an instantiation of Online Mirror Descent and suffers only logarithmic regret.

The exposition is clear and, despite its technical depth, the treatment maintains conceptual clarity at all points of the main body. The authors explicitly explain what the algorithm subroutines are needed for, and where each piece of the analysis comes into play.

Overall, the contributions and presentation are strong, novel and well-conveyed. The authors design a single algorithm that simultaneously handles strategic feature reports and partial, noisy feedback, suffering $\sqrt{T}$ regret, essentially providing conclusive answers for the analyzed setting.

There is a typo in line 168: inccurs —> incurs.

Questions

If I understand correctly, your results hold for a sequence of myopic agents. I am wondering whether you had some thoughts regarding a repeated game where a single agent aims to maximize the sum of positively classified points (over cycles). I guess that this can be modeled as a Stackelberg game, and that it is also a consideration for the model without noise and with two-sided feedback.

Limitations

Yes

Justification for Final Rating

Thank you for the response to all of my questions and comments. They explain everything I asked clearly and made me understand even better where the additional challenges in future work lie. I, thus, confirm my score.

Formatting Issues

No formatting issues noticed

Author Response

We are grateful that you find our contributions and presentation ''strong, novel and well-conveyed''. We also sincerely thank you for your thought-provoking question. Our response is as follows.

  • Repeated games with a single and non-myopic agent:

Our work indeed focuses on a sequence of myopic agents, who always follow the best response to the declared classifier. Specifically, agents with true features within $\gamma$ distance below the classification hyperplane will project their true features onto the hyperplane as their reported features, while other agents will report truthfully (Lemma 1). Our algorithm heavily leverages the structure of this best response, both in the proxy feature construction and in pairwise contrastive inference. The learning problem becomes much more difficult when the principal repeatedly interacts with the same agent, because the agent may exhibit complicated forward-looking behaviors that drastically deviate from the myopic best response described in our Lemma 1. In particular, the agent may report features that do not appear immediately optimal in order to mislead the principal's learning algorithm, thereby securing future advantages. Under the noisy setting, the agent's feature manipulation direction may even be opposite to the myopically optimal direction, slowing down the learning of the optimal classifier. While one may model the agent's behavior with a dynamic programming (DP) formulation, analyzing the DP model and developing effective learning algorithms appear much more challenging. This is a quite different but very interesting problem that warrants future study.

  • We also thank you for pointing out our typo. We have corrected it in our paper.

Comment

Thank you for the response to all of my questions and comments. They explain everything I asked clearly and made me understand even better where the additional challenges in future work lie. I, thus, confirm my score.

Comment

Thank you very much for the inspiring question on extending the model to a setting involving a single and non-myopic agent. This is truly an interesting problem for future research. In our revision, we will also correct the typo you pointed out. We greatly appreciate your time and valuable feedback. Thank you again.

Review
5

They study online strategic classification (SC) with Massart noise and a partial feedback model (feedback is received only when the example is labeled as positive). In online strategic classification, in each round the principal deploys a classifier and then an agent arrives in round $t$. The agent with features $x$ best-responds to the deployed classifier in order to maximize their utility, that is, the classification that they receive at their manipulated point $x'$ minus the cost of manipulation, i.e., of moving from $x$ to $x'$. Usually in SC, the true label is revealed after the principal announces their classification based on the deployed classifier; however, here they assume partial feedback, that is, the true label is only revealed if the agent is classified as positive. For example, if a student is admitted to the college, only then would the principal see whether her decision was good or not (whether the student was actually a positive example, i.e., a good student).

Massart noise is when the actual labels of the points are flipped with some feature-dependent probability $\eta(x)$, where all values $\eta(x)$ are bounded (at most $\eta \le 1/2$). Some of the previous work in SC assumes the clean examples (before manipulation) are linearly separable; Massart noise relaxes this assumption and considers the scenario where most of the points are linearly separable.
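The noise model described above can be sketched as follows; this is a minimal hypothetical illustration (the flip-rate function `eta` is an assumption, not tied to the paper's setup), where the clean halfspace label is flipped with a feature-dependent rate bounded below $1/2$:

```python
import numpy as np

def massart_labels(X, w_star, eta_fn, rng):
    """Labels under Massart noise: the clean halfspace label sgn(<w*, x>)
    is flipped independently with feature-dependent probability eta(x),
    where eta(x) <= eta_max < 1/2 for every x."""
    clean = np.sign(X @ w_star)
    flip = rng.random(len(X)) < np.array([eta_fn(x) for x in X])
    return np.where(flip, -clean, clean)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
w_star = np.array([1.0, 0.0])
# hypothetical flip rate: noisier near the decision boundary, bounded by 0.4
eta = lambda x: 0.4 * np.exp(-abs(x[0]))
y = massart_labels(X, w_star, eta, rng)
```

Since every flip rate stays below $1/2$, the noisy labels still agree with the clean halfspace labels on a clear majority of points.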

They show sublinear regret in the setting of online SC + Massart noise + partial feedback. Their techniques are mostly a generalization of Zheng et al. (2020) to a strategic setting. There are three steps in their algorithm: initialization, refinement, and a batch contrastive inference step. In each step, they find a linear classifier $w$ that gets closer to the optimal classifier $w^\star$. (Lines 206-217 do a great job of summarizing these steps.)

I think the main difference with Zheng et al. is in the last step of batched enhancement, where they solve a sequence of adaptively constructed online linear optimization problems to refine the reported classifier $w$. Here, they need to estimate the gradients $g_{k,i}$ (you need to define what $g_{k,i}$ is; I didn't find its definition). However, these gradients depend on the true features, but all agents in the localized regions $D_{k,i}$ misreport their features. So, here they need to use proxy features and estimate the true location of the agents by projecting the reported features onto the upper or lower boundary of $D_{k,i}$. They argue that using these proxy features, the gradients will be well-constructed and hence will converge to the optimal $w^\star$ after a bounded number of iterations. [This proxy feature construction is a trick also used in earlier work in online SC.]

Strengths and Weaknesses

I appreciate the write-up and technical details of the paper. In my opinion, this is mostly a combination of the techniques from Zheng et al. (2020) on learning halfspaces under label noise and the techniques from Ahmadi et al. on online learning with strategic input, where they use proxy features to update their linear classifiers. They employ a similar idea to adapt the algorithm by Zheng et al. to an SC setting. I have explained this in detail in the summary. Please correct me if I am wrong.

Questions

Line 168, why do you need the multiplicative factor of 2?

I don’t understand lines 218-220. What is the relationship between batches, indices and t?

Trying to understand lines 232-245, in the first paragraph, it says that you are considering the band above the classifier where the points have no incentive to move. Then in the second paragraph, it says there are challenges due to the strategic nature of the points. Could you please clarify? It seems a bit contradictory.

In line 6 of Alg 2, perhaps you can add an explanation of how you construct w_0?

Limitations

yes, no potential negative societal impact

Justification for Final Rating

The rebuttal has addressed some of my concerns and I decided to raise my score from 4 to 5.

Formatting Issues

no formatting concerns

Author Response

We appreciate that you like our write-up and technical details. We provide our feedback as follows.

  • Comparison with Zhang et al. (2020) and Ahmadi et al. (2021):

While both Zhang et al. (2020) and Ahmadi et al. (2021) inspire separate components of our algorithm, we hope to clarify that our work considers a fundamentally different setting. In particular, naively extending the algorithms in those papers cannot resolve the challenges unique to our setting.

Zhang et al. (2020) propose an algorithm to learn halfspaces under Massart noise with full feedback and without strategic manipulation. According to Zhang et al. (2020), under a noisy setting, data points near the classification hyperplane are the most informative for learning the classifier. Therefore, they utilize a key technique, known as “localization”, that uses only data within a small band near the classification hyperplane. However, our problem involves agents' strategic manipulation. If we ignore the strategic manipulation from agents and naively adopt Zhang et al. (2020)'s method to announce classification hyperplanes, then agents whose true features reside near the announced classifier, qualified or not, will strategically misreport their features to be positively classified. In particular, the unqualified agents who get positively classified through strategic manipulation lead to high regret.

Ahmadi et al. (2021), on the other hand, investigate the problem of linear classification under agents' strategic manipulations with full feedback. Compared with their setting, ours involves Massart noise and partial feedback. If we directly extend Ahmadi et al. (2021)'s method to address agents' strategic manipulations by shifting the classification hyperplane upward in parallel by $\gamma$, then all agents with true features within $\gamma$ distance below the announced classification hyperplane will strategically manipulate their features onto the classification hyperplane (see Figure 1(a) in our paper). These manipulating agents consist of both agents whose true features are within the localization band and agents whose true features are outside the band, and their reported features all lie on the classification hyperplane. So it is difficult to tell whose true features are indeed within the localization band. This poses significant challenges to applying the localization technique.

In short, the joint consideration of label noise, agents' strategic manipulation behavior, and partial feedback introduces challenges that are not present in the existing literature. Our key contribution is a novel contrastive inference method with proxy features that addresses these non-trivial challenges. By declaring two parallel but slightly different classification hyperplanes and comparing agent responses under these two classifiers, we can simultaneously make inferences about true features within the localization band and largely discourage unqualified agents from feature manipulation. This idea is novel and has not been considered by previous works.

  • The definition of the gradient:

We will follow your suggestion and add an explicit definition of the gradient $g_{k,i}$. It is derived from the gradient of the Leaky-ReLU loss function, defined as $\text{LeakyReLU}_{\bar\eta}(-y\langle w, x\rangle)$, where $\text{LeakyReLU}_{\bar\eta}(z)=(1-\bar\eta)z\,\mathbb{I}(z\ge 0)+\bar\eta z\,\mathbb{I}(z<0)$. Specifically, $\mathbf{g}_{k,i}=\left[-\bar{\eta}\,\mathbf{x}_{k,i}\,\mathbb{I}(y_{k,i}=1)+(1-\bar{\eta})\,\mathbf{x}_{k,i}\,\mathbb{I}(y_{k,i}=-1)\right]\mathbb{I}(\mathbf{x}_{k,i}\in D_{k,i})$, as we defined in Lines 270-271. In our previous submission, we only briefly mentioned the loss function in a footnote, without giving its formal definition. This is because it did not appear in our subsequent theoretical analysis. Instead, all of our analyses are based on the linear formulation in terms of the gradients. Take the refinement stage as an example. The theoretical analyses thereof involve upper bounding $\sum_{i=1}^{T_k}\langle w_{0,i},g_{0,i}\rangle - \sum_{i=1}^{T_k}\langle w^\star,g_{0,i}\rangle$, lower bounding $\sum_{i=1}^{T_k}\langle w_{0,i},g_{0,i}\rangle$, and relating $\langle w^\star,-g_{0,i}\rangle$ to the angle between $w_{0,i}$ and $w^\star$. These only leverage the form of $g_{0,i}$, without regard to its connection to the Leaky-ReLU loss.
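The gradient defined above can be sketched as follows; this is a hypothetical illustration (the names `in_band` and `eta_bar` are assumptions), restricted to observed points inside the localization band $D_{k,i}$:

```python
import numpy as np

def leaky_relu_gradient(x, y, w, in_band, eta_bar):
    """Gradient g_{k,i} of LeakyReLU_{eta_bar}(-y<w, x>) restricted to the
    localization band: -eta_bar * x for observed positives (y = 1),
    (1 - eta_bar) * x for observed negatives (y = -1), zero outside the band.
    """
    x = np.asarray(x, dtype=float)
    if not in_band(x, w):          # indicator I(x in D_{k,i})
        return np.zeros_like(x)
    return -eta_bar * x if y == 1 else (1.0 - eta_bar) * x

w = np.array([1.0, 0.0])
in_band = lambda x, w: 0.0 < np.dot(w, x) <= 0.2   # band {x : 0 < <w, x> <= b}
# an observed negative inside the band contributes (1 - eta_bar) * x
g = leaky_relu_gradient([0.1, 0.5], -1, w, in_band, eta_bar=0.4)  # -> [0.06, 0.3]
```

Note the asymmetric weights: an observed positive contributes the small weight $\bar\eta$, while an observed negative contributes the large weight $1-\bar\eta$, which is exactly what makes the expected gradient informative under bounded label noise.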

  • The reason for requiring the multiplication factor of 2:

Consider a general manipulation cost $c\|x-r\|_2$ for a generic constant $c$, where $x$ and $r$ are the true and reported features, respectively. Note that the payoff that an agent can gain from the classification outcome is either $+1$ or $-1$, so the manipulation distance $\|x-r\|_2$ is at most $2/c$. For notational simplicity, we set $\gamma = 2/c$ as the largest manipulation distance, leading to $c = 2/\gamma$. This is the source of the multiplication factor $2$.

  • The relationship between batches, indices, and the global time $t$:

We partition the horizon of $T$ cycles into consecutive batches indexed by $k \in \{\text{init}, 0, 1, 2, \cdots, K\}$. Batch $k$ has $T_k$ iterations indexed by $i \in \{1, 2, \cdots, T_k\}$, where each iteration $i$ contains one or two cycles and performs only one gradient estimate followed by a single parameter update. The superscript $j \in \{1, 2\}$ distinguishes the two cycles inside the same iteration during Initialization and Enhancement. During Refinement, each iteration issues only one cycle. To map any $(k, i, j)$ to the global time step $t$, we use the closed-form formula $t = 2(i-1)+j$ if $k=\text{init}$; $t = 2T_{\text{init}}+i$ if $k=0$; and $t = 2T_{\text{init}}+T_0+2\sum_{k'=1}^{k-1}T_{k'}+2(i-1)+j$ if $k \in \{1, 2, \cdots, K\}$.

With this mapping, every variable indexed by $(k, i, j)$ can be immediately translated to its corresponding global time $t$.
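The mapping can be sketched as a small helper (a hypothetical illustration; `T_batch` holding the batch lengths $T_k$ is an assumed name):

```python
def global_time(k, i, j, T_init, T_batch):
    """Map (batch k, iteration i, cycle j) to the global cycle index t.

    k is 'init', 0, 1, ..., K; i is 1-based within the batch; j in {1, 2}
    (ignored for k = 0, where each iteration issues a single cycle).
    T_batch[k] holds the batch length T_k.
    """
    if k == "init":
        return 2 * (i - 1) + j
    if k == 0:
        return 2 * T_init + i
    earlier = sum(T_batch[kk] for kk in range(1, k))
    return 2 * T_init + T_batch[0] + 2 * earlier + 2 * (i - 1) + j

T_batch = {0: 4, 1: 5}                        # hypothetical batch lengths T_0, T_1
print(global_time("init", 3, 2, 3, T_batch))  # last Initialization cycle -> 6
print(global_time(0, 1, 1, 3, T_batch))       # first Refinement cycle    -> 7
print(global_time(1, 1, 1, 3, T_batch))       # first Enhancement cycle   -> 11
```

With these hypothetical lengths, the cycles enumerate consecutively: $1$-$6$ for Initialization (two cycles per iteration), $7$-$10$ for Refinement (one cycle per iteration), and $11$ onward for Enhancement.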

  • Explanation for lines 232–245:

These two sentences are indeed a bit confusing. Let us clarify here. In our Refinement algorithm, we set the classification hyperplane as if there were no feature manipulation ($\tilde h_{0,i}(r)=\text{sgn}(\langle w_{0,i},r\rangle)$). Then, agents with true features above the hyperplane are already classified as positive, so they have no incentive to manipulate their features ($x_{0,i}=r_{0,i}$). In other words, we can directly observe the true features of agents within the localization region ($\{x \mid 0<\langle w_{0,i}, x\rangle\le b_0\}$). This is the “band with no manipulation” mentioned in the first paragraph. In contrast, agents with true features within $\gamma$ distance below the hyperplane ($\{x \mid -\gamma\le\langle w_{0,i},x \rangle < 0 \}$) do have an incentive to project their features onto the hyperplane and report the projections. These are the “strategic” agents that cause high instantaneous regret during the Refinement phase. Indeed, with this, we can still run the localization approach as usual following Zhang et al. (2020) and refine the coefficient estimate, while bearing the cost of misclassifying unqualified agents. In the camera-ready version, we will remove the sentence "Nonetheless, strategic feature manipulation and partial label ... challenges to the localization approach" in the second paragraph to avoid confusion. Furthermore, to control the regret, we limit the length of the Refinement phase to only $\tilde O(\ln T)$ cycles. For the rest of the decision horizon, we have to balance the need to update classifiers using data with true features within the localization region against the need to prevent high regret due to unqualified agents' strategic manipulation. This is achieved by our novel Enhancement algorithm based on proxy features and pairwise contrastive inference.

  • Explanation for the construction of $\bar w_0$:

We apologize that there is actually a typo in Line 6 of Algorithm 2. The correct expression should be $$\bar w_0=\frac{1}{T_{\text{init}}}\sum_{i=1}^{T_{\text{init}}}\sum_{j=1}^{2}r_{\text{init},i}^{(j)}y_{\text{init},i}^{(j)}\mathbb{I}\left((-1)^{j-1}\langle w_{\text{init},i}, r_{\text{init},i}^{(j)}\rangle>0\right).$$ Let us explain the intuition for this construction. In the non-strategic, noiseless, full-feedback case, $y\langle w^\star, x\rangle = \langle w^\star, yx\rangle>0$ for all $(x,y)$, so $yx$ always forms an acute angle with the optimal normal vector $w^\star$. Analogously, under the noisy feedback setting, we should have $\langle w^\star, \mathbb{E}[yx]\rangle > 0$, so $\mathbb{E}[yx]$ forms an acute angle with $w^\star$. Now, consider our strategic and partial-feedback setting. When we declare a classifier $\tilde h(r)=\text{sgn}(\langle w,r\rangle)$, agents whose true features are originally above the hyperplane ($\langle w,x\rangle>0$) have no incentive to manipulate their features and will report truthfully ($r=x$); then $yr$ is exactly $yx$. Therefore, $$r_{\text{init},i}^{(1)}y_{\text{init},i}^{(1)}\mathbb{I}\left(\langle w_{\text{init},i}, r_{\text{init},i}^{(1)}\rangle>0\right)+r_{\text{init},i}^{(2)}y_{\text{init},i}^{(2)}\mathbb{I}\left(\langle -w_{\text{init},i}, r_{\text{init},i}^{(2)}\rangle>0\right)=x_{\text{init},i}^{(1)}y_{\text{init},i}^{(1)}\mathbb{I}\left(\langle w_{\text{init},i}, x_{\text{init},i}^{(1)}\rangle>0\right)+x_{\text{init},i}^{(2)}y_{\text{init},i}^{(2)}\mathbb{I}\left(\langle -w_{\text{init},i}, x_{\text{init},i}^{(2)}\rangle>0\right)$$ provides information about $\mathbb{E}[yx]$.
By averaging these terms over the $T_{\text{init}}$ cycles, we get the estimator $\bar w_0$ that well approximates $\mathbb{E}[yx]$. So we expect that $\bar w_0$ forms an acute angle with $w^\star$ with high probability if $T_{\text{init}}$ is large enough.
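The construction can be sketched in simulation; this is a hypothetical illustration under simplifying assumptions (standard Gaussian features, a constant Massart flip rate, truthful reports on the accepted side, and the function name `initialization_estimate` is ours):

```python
import numpy as np

def initialization_estimate(T_init, w_star, eta, rng, d=2):
    """Sketch of the initialization estimator: declare w_i in cycle 1 and
    -w_i in cycle 2; agents on the accepted side report truthfully (r = x),
    so averaging r * y over accepted agents approximates E[y x]."""
    w_bar = np.zeros(d)
    for _ in range(T_init):
        w_i = rng.normal(size=d)
        w_i /= np.linalg.norm(w_i)
        for w_declared in (w_i, -w_i):        # the two cycles j = 1, 2
            x = rng.normal(size=d)            # one arriving agent per cycle
            if np.dot(w_declared, x) > 0:     # accepted side: truthful report
                y = np.sign(np.dot(w_star, x))
                if rng.random() < eta:        # Massart flip (constant rate here)
                    y = -y
                w_bar += y * x
    return w_bar / T_init

rng = np.random.default_rng(1)
w_star = np.array([1.0, 0.0])
w_bar = initialization_estimate(500, w_star, eta=0.1, rng=rng)
# w_bar should form an acute angle with w_star: <w_bar, w_star> > 0
```

Because the two cycles accept complementary halfspaces, their accepted agents jointly cover the whole feature space, which is what lets the average approximate the unconditional $\mathbb{E}[yx]$.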

Thank you again for your comments. We will incorporate our discussions into the camera-ready version to clarify the issues you raise.

Comment

Thank you for the rebuttal. I understand your point about how your work differs from earlier work in the literature. I'll consider raising my score during the discussion period.

Also, thank you for responding to my other questions.

Comment

Thank you for your insightful comments. We enjoyed thinking about them during the rebuttal process. We will incorporate our discussions into the camera-ready version to clarify the issues you raised. We are truly grateful for your consideration of raising the score during the discussion period.

Final Decision

This paper makes a strong and timely contribution by studying online strategic classification with Massart noise and partial feedback, a practically motivated and technically challenging setting, with impact on both learning theory and fairness communities. Reviewers highlight that the authors successfully combine techniques from strategic classification and noisy online learning while introducing new tools such as pairwise contrastive inference and proxy feature gradients. The work delivers the first sublinear regret guarantees in this setting, which several reviewers view as a conclusive result. The writing could be improved in places, particularly in explaining intuition behind the algorithmic steps, but overall the paper is technically solid, well-motivated, and advances the state of the art.