PaperHub
Rating: 6.0 / 10 (Poster; 4 reviewers; min 6, max 6, std 0.0)
Individual ratings: 6, 6, 6, 6
Confidence: 3.0
ICLR 2024

The Human-AI Substitution game: active learning from a strategic labeler

Submitted: 2023-09-23 · Updated: 2024-04-15

Abstract

Keywords
active learning, strategic learning

Reviews and Discussion

Review
Rating: 6

This paper discusses the game between a labeler and a learning system, where the labeler's objective is to maximize profit by strategically abstaining from labeling, thereby slowing learning down; the learning system's objective, in turn, is to finish learning as soon as possible, which places higher demands on the active query strategy, namely that it should be robust to abstention. The paper theoretically analyzes these observations and proves that abstention can indeed break sample-efficient active learning, based on which the authors further propose a near-optimal algorithm that is robust to abstention.

Strengths

  • The problem this paper concerns is clearly defined, and all formalizations are clean.

  • The paper is well-organized with clear writing.

  • Modeling the two parties of machine learning, whose objectives conflict, as a game that can then be analyzed with game-theoretic techniques is novel.

Weaknesses

  • The difference between an adversarial labeler and an abstaining labeler is not discussed. The difference between the problem considered here and active learning in an adversarial environment is not clear. Meanwhile, though it is mentioned that the labeler's strategy is identifiable, it is still unclear whether it is the worst case.

  • The motivation of the problem is too specific, such that it can hardly be extended to other applications. It is recommended that real-world experiments be conducted to validate the problem; otherwise it seems too artificial.

  • The theoretical results do not carry much surprising information and contribute little to the algorithm design, since the algorithm design is mainly based on a labeler strategy that is already pre-defined.

Questions

See weaknesses above.

Comment

Thank you for your review, Reviewer k54k! We are confident that we can address your technical concerns, and summarize our responses below:

The difference between an adversarial labeler and an abstaining labeler is not discussed. The difference between the problem considered here and active learning in an adversarial environment is not clear. Meanwhile, though it is mentioned that the labeler's strategy is identifiable, it is still unclear whether it is the worst case.

To clarify, in the learning game, we can deduce that the labeler will not be adversarial. Note that this is not an assumption, but rather follows from observing which labeling strategies can be minimax optimal in the learning game.

To see this, note that if the labeler is adversarial, its payoff will be zero. This is because adversarially generated data will not realize the learning outcome, and thus the learner will not pay the labeler (L67-70). Hence, while adversarial labeling is a valid strategy, it will not be minimax optimal in our learning game and will not be adopted by a labeler as an optimal labeling strategy.

Finally, to your last point, the labeler’s minimax strategy will always be identifiable. Only these strategies ensure the learner’s learning outcome is met, and thus lead to a positive payoff in the learning game.
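
To make this concrete, here is a minimal toy sketch (our own illustration for this response; the point set, threshold class, and per-label payment rule are simplifying assumptions rather than the paper's exact payoff structure) showing why adversarial labeling earns nothing while identifiability-preserving strategies, including abstaining ones, earn a positive payoff:

```python
# Toy instance: 4 points, 5 threshold hypotheses h_t(x) = 1[x >= t], true threshold t* = 2.
X = [0, 1, 2, 3]
H = list(range(5))
h_star = 2

def predict(t, x):
    return int(x >= t)

def payoff(labeler):
    """The labeler is paid one unit per label provided, but only if the learner
    can identify h_star from the provided labels (the 'learning outcome')."""
    answers = {}
    for x in X:
        y = labeler(x)                  # a label in {0, 1}, or None to abstain
        if y is not None:
            answers[x] = y
    version_space = [t for t in H if all(predict(t, x) == y for x, y in answers.items())]
    return len(answers) if version_space == [h_star] else 0

truthful    = lambda x: predict(h_star, x)
adversarial = lambda x: 1 - predict(h_star, x)                       # corrupt every label
abstaining  = lambda x: predict(h_star, x) if x in (1, 2) else None  # keep only the critical labels

print(payoff(truthful))     # 4: outcome realized, paid for every label
print(payoff(adversarial))  # 0: data inconsistent with h_star, so the learner does not pay
print(payoff(abstaining))   # 2: abstains elsewhere, yet h_star remains identifiable
```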

The motivation of the problem is too specific, such that it can hardly be extended to other applications. It is recommended that real-world experiments be conducted to validate the problem; otherwise it seems too artificial.

We believe that our setup is fairly general, involving only a simple twist of the standard active learning setting (L44). It serves to address an inherent tension in any supervised learning setup: if the labeler allows the learner to learn fast (e.g. through active learning), this in turn reduces the final profit of the labeler.

Moreover, we believe our work is timely and well-motivated. Just in recent months, we have seen (in the real world) Hollywood screenwriters go on strike to negotiate a deal that prevents companies from training on their data and replacing them with AI [1]. Thus, we believe it is important to build our understanding of the replacement aspect of learning, which we set out to do in our paper.

[1] WGA Negotiations — Status as of May 1, 2023

The theoretical results do not carry much surprising information and contribute little to the algorithm design, since the algorithm design is mainly based on a labeler strategy that is already pre-defined.

We believe our learning algorithm is novel, since we prove that it is near-minimax optimal in the learning game. Contrary to your point, we do not pre-define the labeler’s strategy. Our learning algorithm is proven to be near-optimal with respect to any labeling strategy in the learning game.

We hope this helped clarify most of the questions. We found your feedback very useful in revising the manuscript, thank you! Please let us know if there are any further questions.

Comment

Hi Reviewer k54k, with the discussion period nearing its end, we hope you could please consider getting back to us soon regarding our response and update above. In particular, we have added some toy experiments in Appendix A on randomly sampled toy instances similar to (bigger versions of) the tabular setting in Table 1.

Please let us know what you think. We would very much appreciate a chance to compose an update to address concerns if there are any, before the end of the discussion period. Thank you again for taking the time!

Review
Rating: 6

The authors study an active learning problem where the labeler can choose to give the correct label or abstain from labeling (refuse to label) for each query, in order to prolong the learner's learning process and hence get more reward. This is motivated by real-world scenarios where AI training data come from human workers who will eventually be replaced by AI once the AI is fully trained. The authors formalize this interaction as a game, propose query complexity notions to measure how long the learning process can be prolonged, and provide an active learning algorithm with a small query complexity. Extensions to approximate learning, noisy labeling, arbitrary labeling, and multiple tasks are discussed.

Strengths

(1) The problem is well-motivated. Although the "teacher wants to prolong learning process" phenomenon is not new in other literature, it is an important, new problem for the active learning community. This paper can potentially lead to many follow-up results.

(2) The various definitions of the query complexity are interesting. Theoretical results are sound.

Weaknesses

W1: My major concern is that this paper has two really strong, restrictive assumptions. To illustrate this, let me describe the following scenario: Suppose there are two hypotheses $h_1$ and $h_2$ that give the same labels on all but one example $x^*$ (e.g., the hypotheses $h_1$ and $h_2$ in Table 1). Consider the case where the labeler correctly labels all examples except for $x^*$, and abstains from labeling $x^*$. In this case, the learner cannot tell whether the true hypothesis is $h_1$ or $h_2$. I can imagine that this scenario can easily happen in a real-world labeler-learner game where the labeler wants to prolong the learning process as long as possible. This is a key tension in the "human-AI substitution" game that motivates this paper. However, the authors simply disallow this scenario by making two strong assumptions in their model:

(1) The learner is not allowed to query an example multiple times. If the learner can query an example multiple times, then the labeler can first abstain from labeling for a while and then eventually give a correct label. It is very natural in practice for a learner to query an example multiple times until he gets a label. But this is not allowed in the authors' model.

(2) Under the assumption that the learner can query each example only once, the authors further assume "Guaranteeing Learning Outcome" (Section 1.1): the labeler must label in a way that guarantees the learner can identify the hypothesis $h^*$ in the end. However, this requires the labeler to know the hypothesis space $\mathcal{H}$ of the learner very well so that he will not abstain from labeling critical examples (e.g., the $x^*$ above). This is a strong assumption -- how can an average human worker know the hypothesis space of a sophisticated machine learner?

I might change my opinions if the authors can provide some strong motivations or real-world scenarios to support these two assumptions.

W2: the presentation of this paper is not very good. See my Questions and Suggestions below.

Questions

Questions:

(Q1) What is the purpose of Section 3.1 (in particular Algorithm 3 and Proposition 3.7)? My rough understanding is that Section 3.1 wants to show that finding the maximal E-VS bisection point $x$ (line 3) in Algorithm 2 can be done efficiently by Algorithm 3 (correct me if I am wrong). If this is the case, then what is the time complexity of Algorithm 3? How does Proposition 3.7 show that finding the maximal E-VS bisection point can be done efficiently?

(Q2) What is the purpose of Section 3.9? The authors say that this section is to compare with EPI-CAL, but only give a formal result for EPI-CAL ($\Omega(\sqrt{m})$ samples in Proposition 3.9) and didn't give any formal result for the algorithm they proposed for comparison.

(Q3) How is this paper's model and result related to the "well documented" phenomenon that "teachers (masters) strategically slow down the training of their apprentices (Garicano & Rayo, 2017)" ?

Suggestions:

(S1) The presentation of Section 5.1 and 5.2 is really confusing.

Section 5.1 is about the upper bound on the multi-task query complexity. Theorem 5.3 gives an upper bound under a regularity assumption and for the $c_{all}$ label cost. Proposition 5.1 is a counterexample where the upper bound doesn't hold without the regularity assumption, but the authors didn't say which label cost this Proposition is for. Proposition 5.2 is an example where the upper bound in Theorem 5.3 doesn't hold for the $c_{one}$ label function.

Section 5.2 is about the lower bound (for the $c_{one}$ cost). Theorem 5.6 gives a lower bound on the multi-task query complexity under two conditions for the $c_{one}$ cost, while Propositions 5.4 and 5.5 are counterexamples where the lower bound does not hold if the two conditions are not met.

I would suggest presenting the theorem first, and then giving the counterexamples. And can the authors give a lower bound result for $c_{all}$?

(S2) $c(T(x))$ in Definition 3.1 is undefined. (It is mentioned later in the paper but not here.)

Comment

Thank you for your detailed review, Reviewer GuAP! We are confident that we can address your technical concerns, and summarize our responses below:

(1) The learner is not allowed to query an example multiple times. If the learner can query an example multiple times, then the labeler can first abstain from labeling for a while and then eventually give a correct label. It is very natural in practice for a learner to query an example multiple times until he gets a label.

Deadlock: We have considered the setting you propose as a possible formulation. However, one issue is that some AL algorithms would deadlock. To see this, at time $t$, an algorithm would query the optimal point $x_t$. If the labeler declines to label and $x_t$ is not removed from the data pool, the algorithm would still query $x_t$ at $t+1$. This is because the state has not changed: $x_t$ would still be the best point to query, and it remains eligible to be queried since it is still in the data pool. Thus, for the setting you suggest, there would have to be an apt way to modify AL algorithms to ensure the process does not hang. By contrast, in our setting, a data point is removed from the pool at each time step, thus ensuring progress.
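
For intuition, here is a minimal toy sketch of the deadlock (our own illustration; the greedy scoring rule, the point set, and the labeler below are assumptions made for the example, not the paper's bisection algorithm):

```python
def toy_query_loop(pool, score, get_label, remove_on_abstain, max_steps=8):
    """Deterministic greedy query loop: `score` is any fixed informativeness
    measure and `get_label(x)` returns a label or None (abstention)."""
    queries = []
    for _ in range(max_steps):
        if not pool:
            break
        x = max(pool, key=score)        # deterministic: same state => same query
        queries.append(x)
        y = get_label(x)
        if y is None and not remove_on_abstain:
            continue                    # state unchanged, so the next query is x again
        pool.remove(x)                  # our setting: every queried point leaves the pool
    return queries

score = lambda x: -x                                 # toy rule: always prefers the smallest point
labeler = lambda x: None if x == 0 else int(x >= 2)  # abstains only on x = 0

print(toy_query_loop([0, 1, 2, 3], score, labeler, remove_on_abstain=False))
# [0, 0, 0, 0, 0, 0, 0, 0] -- the learner re-queries the abstained point forever (deadlock)
print(toy_query_loop([0, 1, 2, 3], score, labeler, remove_on_abstain=True))
# [0, 1, 2, 3]             -- removing the abstained point guarantees progress
```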

Labeler controlled queries: If there is a way around the deadlock issue, we also thought that this setup may grant the labeler a lot of power. If the labeler can hold off on labeling a point indefinitely, the labeler can dictate the order in which the learner receives the labels. This is achieved by refusing to label any point but one (say $x'_t$ at time $t$). Thus, the learner's choice of queries would not matter, as labels can only be received in the order $x'_1, \ldots, x'_n$ decided by the labeler. We thought that this setup may differ substantially from the standard AL setting, and our goal is to study the problem closest to the canonical AL setup (L43-44).

"Guaranteeing Learning Outcome" (Section 1.1): the labeler must label in a way to guarantee that the learner can identify the hypothesis $h^*$ in the end. However, this requires the labeler to know the hypothesis space $\mathcal{H}$ of the learner very well so that he will not abstain from labeling critical examples (e.g., the $x^*$ above). This is a strong assumption -- how can an average human worker know the hypothesis space of a sophisticated machine learner? ... I might change my opinions if the authors can provide some strong motivations or real-world scenarios to support these two assumptions.

Analyzing minimax strategies: For most of the paper, in the learning game, we think of the labeler as a “[resourceful] data labeling company” (L199). This is because in game theory, one needs to consider players who play optimally in order to study the Nash Equilibria/minimax query-complexity.

Dealing with myopic labelers: With that said, in subsections 4.1 and 4.3, we offer some remedies when facing a player that behaves sub-optimally, for instance due to a lack of knowledge about $\mathcal{H}$ as you suggest (assume also that the learner does not communicate $\mathcal{H}$ for some reason):

  1. If the learner knows that the labeler may behave myopically, one idea is to loosen the "learning outcome" to approximate identifiability. Thus, the labeler abstaining on one critical example is not fatal ($x^*$ in your example). In subsection 4.1, we show how to extend our learning algorithms to the PAC learning setting.
  2. In 4.3, we observe that the E-VS can be used to detect when an abstained point leads to non-identifiability (a simplified sketch of this check follows below). Thus, the learner can use this to send a certified "warning" to the labeler: if this critical point (e.g. $x^*$) is abstained upon, it will provably lead to non-identifiability. In this way, the learner can use the E-VS representation to prevent an unaware labeler from prematurely halting the learning process. Indeed, non-identifiability is something neither party wants. The learner wants to learn. The labeler needs to realize the learning outcome in order to be paid.
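
For concreteness, here is a simplified sketch of this check for a small finite class (this only captures the spirit of the warning; it is not the paper's E-VS construction or its oracle-based implementation):

```python
from itertools import combinations

def abstention_is_fatal(x, version_space, pool, predict):
    """Warn if abstaining on x would leave two consistent hypotheses that agree
    on every remaining queryable point, i.e. the truth could never be identified."""
    remaining = [p for p in pool if p != x]
    for h1, h2 in combinations(version_space, 2):
        if all(predict(h1, p) == predict(h2, p) for p in remaining):
            return True
    return False

# Your scenario: h1 and h2 differ only on x_star (point 2 below).
labels = {"h1": {0: 0, 1: 1, 2: 0}, "h2": {0: 0, 1: 1, 2: 1}}
predict = lambda h, p: labels[h][p]

print(abstention_is_fatal(2, ["h1", "h2"], pool=[0, 1, 2], predict=predict))  # True: warn the labeler
print(abstention_is_fatal(0, ["h1", "h2"], pool=[0, 1, 2], predict=predict))  # False: harmless abstention
```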

Worker Unions: In the specific case of AI automation, we have seen in recent news that Hollywood screenwriters have collectively gone on strike, engaging in negotiations with companies through their union ("Writers Guild of America") [1]. Thus, in the AI automation setting, we are inclined to think that the labeler may in fact correspond to a union of workers, which may be more resourceful than a single worker.

[1] WGA Negotiations — Status as of May 1, 2023

Finally, as we write in the introduction (paragraph at L29), this paper studies a general conflict in incentives that can arise in any learner-labeler relationship. Active learning enables the learner to learn fast, but in doing so, leads to lower labeler profit. This calls into question a fundamental assumption in active learning: who is to say that the labeler wants the learner to learn fast? We note that the setting of the labeler as an "average human worker" is only one instantiation of this setup (L34).

Comment
What is the purpose of Section 3.1 (in particular Algorithm 3 and Proposition 3.7)? ... If this is the case, then what is the time complexity of Algorithm 3? How does Proposition 3.7 show that finding the maximal E-VS bisection point can be done efficiently?
  1. The purpose of section 3.1 is to develop a way to access the E-VS. We start by noting that the canonical approach of sampling to access the VS (L153-4) will not work for E-VS (L158-9). We prove that in the linear case, it would be akin to sampling from an exponential number of polytopes (Prop 3.5). This motivates finding a new way to access the E-VS, and led us to develop Algorithm 3.
  2. As you may know, the (E-)VS can contain an exponential number of hypotheses. Thus, prior AL works typically make use of oracles to access the VS. Just as in the VS-bisection algorithm, we assume access to a sampling oracle $O$. Now, to overcome the issue in point 1, we observe that the E-VS is a subset of the VS. Hence, it suffices to implement a membership oracle for whether $h \in V$ belongs to the E-VS. We prove in Prop 3.7 that this membership check can be implemented by one carefully set-up call to the C-ERM oracle (see the schematic sketch after this list).
  3. Lastly, less is indeed known about computational efficiency for AL algorithms, even for the original VS bisection algorithm. This is because it differs depending on the instance. Instead, the main focus in AL has been on statistical efficiency, which is also our focus in the paper. We agree with you that exploring the computational efficiency of AL algorithms in general is an important future direction.
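
Schematically, the access pattern described in points 1-2 looks as follows (a rough sketch with both oracles left abstract; the naive rejection loop and the budget are our simplifications, and the reduction of the membership test to a C-ERM call is not reproduced here):

```python
def sample_from_evs(sample_from_vs, in_effective_vs, max_tries=1000):
    """sample_from_vs(): the standard VS sampling oracle O.
    in_effective_vs(h): the membership oracle (in the paper, one C-ERM call)."""
    for _ in range(max_tries):
        h = sample_from_vs()
        if in_effective_vs(h):
            return h
    raise RuntimeError("no E-VS hypothesis found within the sampling budget")
```
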
(Q2) What is the purpose of Section 3.9? The authors say that this section is to compare with EPI-CAL, but only give a formal result for EPI-CAL ($\Omega(\sqrt{m})$ samples in Proposition 3.9) and didn't give any formal result for the algorithm they proposed for comparison.

Thanks for this note, we will be sure to clarify! As we write, our setup and that of EPI-CAL (section 3.3) are different. Thus, in this subsection, we were mainly interested in understanding whether strategic abstention can also significantly enlarge query complexity for EPI-CAL. We answer this in the affirmative. We will be sure to make this clear in the revision, perhaps moving this subsection to the appendix.

(Q3) How is this paper's model and result related to the "well documented" phenomenon that "teachers (masters) strategically slow down the training of their apprentices (Garicano & Rayo, 2017)" ?

In (human) teacher to (human) student relationships, economists have observed that teachers have strategically slowed down learning for their own gain (L57-58). Thus, we think it is conceivable that this may happen analogously in (human) teacher to (machine) student relationships, corresponding to and motivating the interaction we study in our paper.

For more details, please see the Additional Related Works section of the Appendix (L1508-1528); we did not have enough space to further comment on this in the main text.

…Proposition 5.1 is a counterexample where the upper bound doesn't hold without the regularity assumption, but the authors didn't say which label cost this Proposition is for…

Thank you for your helpful suggestion on the reorganization! We will modify the section accordingly. Just to answer your question, Prop 5.1 in fact holds under both costs.

We hope this helped clarify most of the questions. We found your feedback very useful in revising the manuscript, thank you! Please let us know if there are any further questions.

Comment

I am happy with the authors' response and have raised my score from 5 to 6.

I would appreciate it if the authors could add one or two sentences to motivate the assumption that the learner does not query a sample multiple times.

Comment

Thank you for getting back to us, Reviewer GuAP! And yes, we will make sure to include our discussion on the setting you suggested.

Review
Rating: 6

This paper provides theoretical evidence that labelers can slow down learning in an active learning setting. The labeler is compensated according to the number of labels provided, and the learner wants to minimize the cost of learning. The paper proposes a maximal bisection algorithm for this setting and proves its sample complexity is optimal up to logarithmic factors. The paper further studies noisy labels and multi-task learning, improving the realism and applicability of its assumptions.

Strengths

  1. The paper is very well-motivated; the problem is of increasing interest to society as more and more ML applications are deployed in the wild.
  2. Quality: the paper flows smoothly, starting with the minimal setting of single-task learning without noise and generalizing to wider settings. It proves lower bounds for the sample complexity of the problem, and proposes algorithms matching that lower bound up to logarithmic factors.
  3. Significance: this paper is introducing a new interesting setting into active learning literature and proposes near-optimal algorithms for it.

Weaknesses

  1. Clarity: the notation is convoluted; $V$ is used for both the VS and a hypothesis space variable. For instance, in Table 1, what is $V$? It seems introducing a specific notation for the VS (e.g., $\mathbf{V}$) or for the hypothesis space variable (e.g., $\mathcal{V}$) could help.
  2. There are no experiments reported. Experiments, even on synthetic data, could help clarify many of the abstract concepts introduced in the paper. Examples like Table 1 could help practitioners employ the algorithms introduced in the paper in their applications.

Questions

Suggestions: Please consider

  1. moving Table 1 after Definition 2.3, since it uses that definition.
  2. separating the notations of $V$ (see point 1 in the weaknesses section)
Comment

Thank you for your review and your helpful suggestions, Reviewer ciAE! Please see our responses to your questions below:

Clarity: the notation is convoluted; $V$ is used for both the VS and a hypothesis space variable. For instance, in Table 1, what is $V$? It seems introducing a specific notation for the VS (e.g., $\mathbf{V}$) or for the hypothesis space variable (e.g., $\mathcal{V}$) could help.

Thank you for this formatting suggestion! We will be sure to clarify the difference and adapt the presentation accordingly. To clarify, we meant to write $H[S]$ for the version space in Table 1.
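(For readability, we recall the standard version space definition here: $H[S] = \{h \in \mathcal{H} : h(x) = y \text{ for all } (x, y) \in S\}$, i.e. the set of hypotheses consistent with the set $S$ of labeled examples.)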

We hope this helped clarify most of the questions. We found your feedback very useful in revising the manuscript, thank you! Please let us know if there are any further questions.

Comment

Hi Reviewer ciAE, with the discussion period nearing its end, we hope you could please consider getting back to us soon regarding our response and update above. In particular, we have added some toy experiments in Appendix A on randomly sampled toy instances similar to (bigger versions of) the tabular setting in Table 1 (following your suggestion in point 2).

Please let us know what you think. We would very much appreciate a chance to compose an update to address concerns if there are any, before the end of the discussion period. Thank you again for taking the time!

Review
Rating: 6

This paper considers strategic labelers in active learning who may abstain in order to prolong querying, so that they may be compensated with more labeling work. The authors show that abstention can indeed enlarge query complexity, and present a novel minimax game as well as a query complexity measure. A near-optimal algorithm based on E-VS bisection is designed to defend against strategic labeling, and extensions to other active learning settings such as approximate learning, noisy labeling, and multi-task learning are also discussed.

Strengths

The strengths of this paper are the novelty of strategic labelers for active learning tasks, as well as the theoretical soundness of the minimax formulation and the effective version space bisection algorithm. This paper initiates the study of learning from a strategic labeler, which is original. The $O(\log |\mathcal{X}|)$ complexity of the proposed E-VS method clearly and significantly improves on the $\Omega(|\mathcal{X}|)$ of the existing VS algorithm.

Weaknesses

I think the biggest concern that might weaken this paper is its lack of empirical evidence. Despite the theoretical guarantees, some of the settings are probabilistic (PAC or noisy observations, for instance), making empirical evaluation of the proposed methods valuable. I understand there are many great results in this paper and there is a page limit, but including some simple simulation studies could be beneficial.

Questions

I only have a minor question. This paper assumes "the learner's strategy corresponds to some deterministic, querying algorithm". Nevertheless, non-deterministic policies such as Thompson sampling are quite common in some active learning tasks such as Bayesian optimization. How do these non-deterministic policies fit into the framework of this paper?

Details of Ethics Concerns

N/A

Comment

Thank you for your review, Reviewer rYHN! Please see our responses to your questions below:

I only have a minor question. This paper assumes "the learner's strategy corresponds to some deterministic, querying algorithm". Nevertheless, non-deterministic policies such as Thompson sampling are quite common in some active learning tasks such as Bayesian optimization. How do these non-deterministic policies fit into the framework of this paper?

To clarify, Bayesian optimization/Thompson sampling is used for finding a maximum $x \in X$ of some unknown $h$. There, the goal is to adaptively query in order to shrink the uncertainty over $h$, to find one of its maxima with low (simple) regret.

Our setting is quite different in that we want to learn the unknown hypothesis $h$, instead of optimizing $h$. Moreover, our paper studies minimax learning strategies in the learning game, in the face of a strategic labeler who may be playing an optimal labeling strategy. This game-theoretic setup is another major difference that complicates the analysis and differs from the setup of Bayesian optimization.

We hope this helped clarify most of the questions. We found your feedback very useful in revising the manuscript, thank you! Please let us know if there are any further questions.

Comment

Thank you for your timely responses. The clarification for the Bayesian optimization case was helpful. Nevertheless, I (and maybe Reviewers ciAE and k54k as well) am still willing to hear how you will respond to the lack of empirical evidence in your paper.

Comment

Thank you for getting back to us and your continued interest in our paper, we appreciate it! As you know, our paper is intended to be a theoretical paper, and we believe our paper is well-motivated especially in light of recent events. Just in the past few months, we have seen Hollywood screenwriters go on strike to negotiate a deal preventing companies from training on their data and replacing their expertise with AI [1]. Thus, we believe it is high time to build our understanding of the replacement aspect of learning, which is exactly what we set out to do in our paper.

The experiments, which we started from scratch, finished today. Please see Appendix A for experiments on randomly sampled toy instances similar to (bigger versions of) the tabular setting of Table 1 (as suggested by reviewer ciAE). We have colored our edits to the original submission in red.

[1] In Hollywood writers' battle against AI, humans win (for now) (AP News)

Comment

Hi Reviewer rYHN, with the discussion period nearing the end, we hope you could please consider getting back to us soon. Was our response above satisfactory? If not, we would very much appreciate a chance to compose an update to address concerns, before the end of the discussion period. Thank you again for taking the time!

Comment

My comments and concerns are addressed well by the authors' response and the additional experiments conducted during the rebuttal phase. My score does not change, leaning towards acceptance.

Comment

Thank you for getting back to us, reviewer rYHN! We're glad that we have addressed all the concerns in your review.

Please do let us know if there are more things we could clarify and/or address, before the end of the discussion period tomorrow, thanks.

AC Meta-Review

This paper studies active learning with a strategic labeler, which may abstain from labeling to prolong the learning process. The authors introduce the setting, propose an algorithm for it, and analyze it. After I read the reviews, I had several concerns:

  • Motivation: I found the setting strange. As an example, human labelers often abstain due to low price. In this case, the learner has a tunable knob (price) to fix it. A labeling algorithm may abstain because it is uncertain about the label. This is not strategic at all though. That being said, this paper is the first theoretical investigation into classic active learning with a strategic labeler, which may open doors for others in the future. The reviewers like this.

  • Experiments: The submitted paper had no experiments. The authors added some experiments during the rebuttal. The results are not completely convincing and there is only one random policy baseline.

  • Assumptions: Reviewer GuAP was concerned about two strong assumptions in theory. This was resolved in the rebuttal.

This paper has blemishes. However, it is novel and can open doors for others. This is why we lean towards acceptance.

Why Not a Higher Score

This work is mostly theory, which is easier to present as a poster.

Why Not a Lower Score

The work is novel and can open doors for others.

Final Decision

Accept (poster)