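PaperHub
Average rating: 4.4 / 10
Decision: Rejected · 5 reviewers
Lowest: 3 · Highest: 6 · Standard deviation: 1.2
Individual ratings: 3, 5, 5, 3, 6
Confidence: 3.0
ICLR 2024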

Rethinking Teacher-Student Curriculum Learning under the Cooperative Mechanics of Experience

Submitted: 2023-09-21 · Updated: 2024-02-11
TL;DR

We recast the Teacher-Student Curriculum Learning framework through the cooperative game that emerges among the structures a teacher algorithm presents to a learner in order to control its learning dynamics.

Keywords
curriculum learning, teacher-student curriculum learning, cooperative game theory

Reviews and Discussion

Review
Rating: 3

The aim of the paper is to rethink curriculum learning as a game.

Strengths

The paper is clearly presented as a conceptual paper, but the contribution to the community remains less clear.

Weaknesses

The writing style with many words in italics is tedious and distracting. Please fix.

The aim of the paper is to rethink curriculum learning as a game. This has already been demonstrated quite clearly in the work by [Silver et al., Nature, 2017] on AlphaGo Zero, and a body of work building on their results exists, although the authors do not mention this. It is difficult to find a clear contribution or clear experimental evidence. Also, no clear link to related work has been provided, nor a comparison of this algorithm to others.

Questions

I would advise the authors to study the literature more carefully, to formulate their theory more clearly, to perform more extensive experiments, to use a more approachable writing style, and to make the contributions more explicit.

Comment

I have read the reviews and responses. I think my score is accurate, and will not change it.

Review
Rating: 5

This paper analyses when and how teacher-student curriculum learning (TSCL) approaches work from a new data-centric perspective. Cooperative game theory is used to understand and interpret the key components of TSCL. It is shown that for every TSCL problem, there exists an equivalent cooperative game, and that the learning progression objective and the teacher bandit policy in TSCL methods can be interpreted as an approximation of player marginal contribution and a fair allocation mechanism, respectively. The experiments show that the proposed ordered value-proportional curriculum mechanism can consistently find a (near-)optimal curriculum even when TSCL fails. It is also shown that in settings with significant negative interactions, TSCL cannot produce good curricula.

Strengths

  • The problem studied in the paper is well-motivated. Investigating when and how TSCL algorithms work receives comparatively little attention (compared to studying algorithmic improvements in TSCL) and is an important research problem.
  • Understanding TSCL algorithms from a data-centric perspective looks new to me. Using cooperative game theory to analyse how the teacher-student interactions affect the performance of the curricula found by TSCL algorithms is very interesting.
  • The experimental results seem to show that the proposed game-theoretic and data-centric interpretation of TSCL can be applied to some simple supervised learning and reinforcement learning problems to understand the conditions under which TSCL is effective.

Weaknesses

  • My main concern with the paper is that the practical use of analysing TSCL algorithms from the proposed data-centric perspective is unclear to me. Given a TSCL problem, it does not seem straightforward to use the game-theoretic analysis presented to understand when and how TSCL works. It is also not clear to me whether the formal grounding provided here is more or less useful than existing works that aim to understand when and how curriculum learning works, as mentioned in the paper (e.g., Wu et al. (2020) and Lee et al. (2021)).
  • As acknowledged in the paper, all experiments only consider a very small number of "units of experience". In most complex RL tasks, the learner algorithm needs access to hundreds and thousands of units of experience. I do not see how the theory presented can help analyse the TSCL approaches in these more complex settings, such as understanding the conditions under which TSCL is effective.
  • This is a minor problem with respect to the presentation of the paper. I personally found the extensive use of italic type text quite distracting when reading the paper.

Questions

  • I do not really understand how the RL experiment (Section 5.4) answers the research question of "how units’ interactions impact TSCL’s prospects to find useful curricula". The main result seems to just be that the estimated Nowak & Radzik values are in line with folk knowledge that an optimal curriculum exists. Given a particular TSCL method, how do you use a similar analysis to evaluate whether the curriculum found by that method is useful or not?
  • It is mentioned that the experimental result in classical games "contrasts with folk knowledge in population-based training". How should this result be interpreted? It should be briefly summarised in the main paper.
  • Can you explain the practical use of analysing TSCL algorithms from the proposed game-theoretic and data-centric perspective?
Comment

We thank the reviewer for highlighting our work's originality and overall significance for an important problem that we agree receives little attention in the literature.

Usefulness: We would like to understand the reviewer’s point about the usefulness of our approach when contrasted to previous work. As mentioned in the manuscript, our work fundamentally differs from Wu et al. [1] and Lee et al. [2]. Both works are (computationally expensive) empirical investigations of the effect of curricula, while ours introduces a systematic and formal approach, grounded in game-theoretic arguments, to investigate TSCL. We believe the main contribution of our work is foundational. We hope it serves as a starting point to better understand the problem space and develop novel algorithmic contributions that address the issues we uncovered.

Scalability and Computational Limitations: We share the reviewer’s concerns about the scalability of our work. Our experiments considered a small number of units because cooperative solution concepts are computationally hard problems with well-known complexity classes [1]. However, the central argument of our work is that, whether we can efficiently compute these solution concepts or not, the cooperative game among units of experience that we develop reveals the impact data has on TSCL's capacity to find curricula in relatively simple settings.

[1] Deng, Xiaotie, and Christos H. Papadimitriou. “On the Complexity of Cooperative Solution Concepts.” Mathematics of Operations Research, vol. 19, no. 2, 1994, pp. 257–66.
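To make the cost concrete, the following is a minimal sketch (not taken from the paper) of an exact Shapley value computation for a toy cooperative game among three units of experience. The unit names, the `TOY_SCORES` table, and the helper names are hypothetical; the table stands in for the score a learner would reach after training on each subset of units. The loop visits all n! orderings, which is why exact computation is only feasible for a handful of units.

```python
from itertools import permutations
from math import factorial


def exact_shapley(players, value):
    """Average each player's marginal contribution over all n! orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the players before it.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Hypothetical characteristic function: the score reached by a learner
# trained on each subset of units (made-up numbers for illustration).
TOY_SCORES = {
    frozenset(): 0.0,
    frozenset({"a"}): 0.2,
    frozenset({"b"}): 0.3,
    frozenset({"c"}): 0.1,
    frozenset({"a", "b"}): 0.6,
    frozenset({"a", "c"}): 0.3,
    frozenset({"b", "c"}): 0.4,
    frozenset({"a", "b", "c"}): 0.9,
}


def toy_value(coalition):
    return TOY_SCORES[frozenset(coalition)]


shapley = exact_shapley(["a", "b", "c"], toy_value)
print(shapley)
```

In this sketch the values sum to the score of the full set of units (the efficiency property), and each additional unit multiplies the number of orderings to evaluate, each of which would require retraining the learner in a real setting.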

The RL Setting: The particular case of RL that the reviewer alludes to highlights one of the main contributions of our work, the concept of a unit of experience. Through this abstraction, we unveil the cooperative game that happens at different levels (i.e., instances, batches, trajectories, tasks/environments or benchmarks). As the reviewer correctly points out, the learner has access to thousands of units at the instance level. Undoubtedly, computing the exact effect of every unit on the learner at that level is intractable. However, the problem becomes computationally more tractable as the abstraction level increases. Our notion of a unit of experience and the cooperative game we unveil apply equally to every level. Moreover, our proposition could also explain the effectiveness of other strategies in RL, like Prioritized Experience Replay [1], that compute similar estimates at the instance level. However, we do not investigate this connection in this work and leave it to future investigations.

[1] Schaul, Tom, et al. “Prioritized Experience Replay.” 4th International Conference on Learning Representations, 2016.

Our Work & Population-based Training: As we explained in Appendix B, if one interprets TSCL as a meta-strategy solver over a population of opponents, our results of selecting TitForTat instead of AlwaysDefect contrast with other meta-strategy solver methods [1] that prescribe playing against the (meta) Nash (strongest strategy) of the population. While we consider this an exciting finding, we deferred it to the appendices due to a space constraint.

[1] Lanctot, Marc, et al. “A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning.” Advances in Neural Information Processing Systems, 2017

RL Experiments & TSCL failures: It is correct that the estimated Nowak & Radzik values align with folk knowledge that an optimal curriculum exists. To our knowledge, this grounded quantification is novel. Moreover, and to the reviewer’s central argument, what validates the idea of “how units’ interactions impact TSCL” is that, using the value-proportional mechanism we derived from Nowak & Radzik values, we can find a curriculum in this task, while the multi-armed bandit approach fails to do so (Fig. 3b). The results from the Nowak & Radzik interaction metric further confirm this analysis (Fig. 4b).
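As a rough illustration of why an order-aware value matters here, the sketch below computes values for a toy game whose characteristic function is defined on ordered sequences of units. It assumes the usual reading of the Nowak & Radzik value, in which a unit's marginal contribution is measured when it is appended to the ordered sequence of its predecessors and then averaged over all orderings. The unit names and scores are hypothetical, and this is not the paper's value-proportional mechanism.

```python
from itertools import permutations
from math import factorial


def nowak_radzik_values(players, ordered_value):
    """Average marginal contribution of each player when appended to its
    ordered predecessors, over all orderings (assumed reading of the value)."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        for k, p in enumerate(order):
            prefix = order[:k]
            totals[p] += ordered_value(prefix + (p,)) - ordered_value(prefix)
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Hypothetical scores: training on "easy" before "hard" helps far more
# than the reverse order (made-up numbers for illustration).
TOY_ORDERED_SCORES = {
    (): 0.0,
    ("easy",): 0.3,
    ("hard",): 0.1,
    ("easy", "hard"): 0.9,
    ("hard", "easy"): 0.4,
}


def toy_ordered_value(sequence):
    return TOY_ORDERED_SCORES[tuple(sequence)]


players = ["easy", "hard"]
values = nowak_radzik_values(players, toy_ordered_value)
best_order = max(permutations(players), key=toy_ordered_value)
print(values, best_order)
```

A set-based characteristic function would assign a single score to {easy, hard} and could not distinguish the two orderings; the ordered form keeps that information.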

Minor: We apologize for our excessive use of italics to highlight concepts and ideas throughout the paper. We have significantly scaled back their use in the updated version of the manuscript.

Comment

Thank you for your response. My concern about the practical limit and scalability issue still remains. I'll keep my score the same.

Review
Rating: 5

The paper proposes a data-centric perspective to analyze the mechanics of teacher-student interactions in TSCL, using cooperative game theory. The set of experiences presented by the teacher to the learner, as well as their order, influence the performance of the curriculum in TSCL. The paper demonstrates that for every TSCL problem, there exists an equivalent cooperative game, and key components of the TSCL framework can be reinterpreted using game-theoretic principles. Experiments covering supervised learning, reinforcement learning, and classical games are conducted to estimate the cooperative values of experiences and construct curricula.

Strengths

  • Originality: The paper introduces a novel perspective on TSCL by analyzing the mechanics of teacher-student interactions using cooperative game theory. This approach is unique and provides a fresh understanding of TSCL.
  • Quality: The paper conducts experiments covering supervised learning, reinforcement learning, and classical games to estimate the cooperative values of experiences and construct curricula. This empirical evaluation enhances the quality of the research and validates the proposed framework.
  • Clarity: The paper clearly presents the concepts and principles of TSCL, as well as the application of cooperative game theory. The experimental setup and results are well-explained, making it easy for readers to understand the research.
  • Significance: The paper's findings shed light on the underlying mechanisms of TSCL and provide insights into its broader applicability in machine learning. This has significant implications for curriculum learning approaches and can contribute to the development of more effective learning algorithms.

Weaknesses

  • Lack of comparison: The paper does not compare the proposed data-centric perspective with existing approaches or frameworks in the field of TSCL. A comparative analysis could provide insights into the advantages and limitations of the proposed approach.
  • Limited scope of experiments: While the paper conducts experiments covering supervised learning, reinforcement learning, and classical games, the scope of these experiments may not be comprehensive enough to fully explore the effectiveness and applicability of the proposed framework. Including a wider range of learning scenarios and domains could strengthen the empirical evaluation.

Questions

My main concern is about the experiments. TSCL is mainly intended for hard problems in which the student cannot learn the final tasks well directly; however, the experiments here consist almost entirely of simple tasks. Could you provide a comparative analysis of the proposed data-centric perspective with existing approaches or frameworks in the field of TSCL? This would help in understanding the advantages and limitations of the proposed approach.

Details of Ethics Concerns

No ethics concerns.

Comment

Comparison to Existing Algorithms: The field of curriculum learning, and approaches that adopt the teacher-student framework, is vast [1, 2] and algorithms are evaluated on a wide range of dissimilar tasks. In many cases, the notion of curriculum among those tasks is not clear. In our experimental setting, we favored evaluation across different machine learning paradigms to showcase the broad applicability of our framework. From an algorithmic perspective, we selected the multi-armed bandit approach as it is used in the foundational work on TSCL [3,4] and is commonly utilized throughout the literature [5,6,7,8]. If the reviewer has suggestions on what algorithm/family of algorithms within TSCL would be better suited for this comparison, we would gladly consider it.

Simplicity of the Tasks: We agree with the reviewer that the tasks we selected are relatively simple. However, there is a notion of curriculum among them. Moreover, as our experiments highlight, even in those simple tasks, multi-armed bandit approaches to TSCL fail to uncover the existing curriculum. It is not unreasonable to assume that these results will hold for more complicated tasks where the interactions among them are less straightforward.

[1] Soviany, Petru, et al. “Curriculum Learning: A Survey.” International Journal of Computer Vision, vol. 130, no. 6, June 2022, pp. 1526–65, https://doi.org/10.1007/s11263-022-01611-x.

[2] Wang, Xin, et al. “A Survey on Curriculum Learning.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, Sept. 2022, pp. 4555–76, https://doi.org/10.1109/TPAMI.2021.3069908.

[3] Graves, Alex, et al. “Automated Curriculum Learning for Neural Networks.” Proceedings of the 34th International Conference on Machine Learning, edited by Doina Precup and Yee Whye Teh, vol. 70, PMLR, 2017, pp. 1311–20, https://proceedings.mlr.press/v70/graves17a.html.

[4] Matiisen, Tambet, et al. “Teacher-Student Curriculum Learning.” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 9, Sept. 2020, pp. 3732–40, https://doi.org/10.1109/TNNLS.2019.2934906.

[5] Colas, Cédric, et al. “Language as a Cognitive Tool to Imagine Goals in Curiosity-Driven Exploration.” arXiv [cs.AI], 21 Feb. 2020, http://arxiv.org/abs/2002.09253.

[6] Portelas, Rémy, et al. “Teacher Algorithms for Curriculum Learning of Deep RL in Continuously Parameterized Environments.” arXiv [cs.LG], 16 Oct. 2019, http://arxiv.org/abs/1910.07224.

[7] Racaniere, Sebastien, et al. “Automated Curricula through Setter-Solver Interactions.” arXiv [cs.LG], 27 Sept. 2019, http://arxiv.org/abs/1909.12892.

[8] Turchetta, Matteo, et al. “Safe Reinforcement Learning via Curriculum Induction.” Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 12151–62.

Review
Rating: 3

This paper provided a game-theoretic angle for understanding teacher-student curriculum learning. The paper proposed a data-centric view of the learning process and treated each learning sample as a unit of experience, which could be a batch, a dataset, or even an environment. Each unit of experience is then formulated as a player in the game-theoretic framework. The paper considered the teacher as an agent that selects units of experience over time with the goal of steering the learner's policy to meet some target objective, e.g., maximizing performance on a target evaluation set. Using a multi-armed bandit as the selection policy for units of experience, the paper argued that curriculum learning achieves monotonic model improvement when the game is cooperative in nature. Extensive experiments on supervised learning and reinforcement learning, among others, demonstrate that the new game-theoretic angle is indeed helpful in gaining better insight into curriculum learning.

Strengths

(1) The paper provided a new perspective of curriculum learning and laid down a foundational framework for assessing the functionality of each individual curriculum based on the cooperative game theoretic framework. The new angle has its own novelty and was never studied before. This opens another road for researchers in this area to gain better understanding of the problem.

(2) Apart from the game-theoretic framework, the work also performed extensive experiments on real-world data such as MNIST to demonstrate the rationality behind the proposed framework, which is pretty interesting and illustrative.

(3) I'd like to highlight that the proposed framework applies to a broad class of learning paradigms, including supervised learning, reinforcement learning, among others. It's easy to see that the proposed methodology can be extended to many other scenarios such as active learning, online learning etc. Therefore, the paper is general enough.

Weaknesses

(1) While the paper laid down a solid framework for understanding curriculum learning, it lacks a theoretical understanding of the problem, and there are no affirmative theoretical results coming out of it. It provided some generic methodology, but the conclusions are drawn from empirical results, and I am not sure what theoretical questions can be further studied under this topic. This makes the paper somewhat weak and less satisfying.

(2) The framework proposed in this paper is nice and elegant, but I am very concerned about the practical side of the methodology - it requires training a model multiple times and evaluating the value of each unit of experience. This is very time consuming if each unit of experience is a large dataset or an RL environment. Therefore, it limits the generalizability of the proposed approach.

Questions

(1) Please derive solid theoretical results to support the proposed method, or at least provide some theoretical insights.

(2) Please discuss the generalizability of the proposed method - see comment (2) in the weaknesses part.

Comment

We thank the reviewer for the insightful review and the positive feedback on the novelty, broad applicability and general strengths of the paper's core framework and potential of its foundational nature.

Further Theoretical Questions: Regarding the reviewer’s concern about the lack of future theoretical avenues that can be derived from our work, we can mention, for instance, the connection between notions of convex games [1] and sub/super modularity in discrete combinatorial optimization [2,3,4]. However, we considered this connection and further alternative theoretical investigations outside of the scope of the present work. There is a significant communication complexity in presenting ideas at the intersection of these three generally disconnected fields. As the reviewer mentioned, our work presents a foundational framework to further study TSCL and its relationship with other areas like continual learning, transfer learning, or multitask learning under cooperative game-theoretic ideas. That is why we believe our results warrant further communication to the community.

The revised version of the manuscript now more clearly points to these more theoretical avenues that should be a continuation of the path we have outlined here.

Scalability & Computational Limitations: We share the reviewer’s concerns about the scalability of our work. Undoubtedly, cooperative solution concepts are computationally hard problems with well-known complexity classes [4]. However, recent advances [5, 6] give us hope of obtaining better approximations of the cooperative game we designed. On the other hand, the central argument of our work is that, whether we can efficiently compute these solution concepts or not, the cooperative game among units of experience that we develop impacts TSCL's capacity to find curricula in relatively simple settings. We believe our work should serve as a starting point to develop novel algorithmic contributions that address the problems this data-centric perspective uncovered.

[1] Shapley, Lloyd S. “Cores of Convex Games.” International Journal of Game Theory, 1971

[2] Bach, Francis. “Learning with Submodular Functions: A Convex Optimization Perspective.”

[3] Dughmi, Shaddin. “Submodular Functions: Extensions, Distributions, and Algorithms. A Survey.” arXiv [cs.DS], 1 Dec. 2009

[4] Deng, Xiaotie, and Christos H. Papadimitriou. “On the Complexity of Cooperative Solution Concepts.” Mathematics of Operations Research, vol. 19, no. 2, 1994, pp. 257–66.

[5] Yan, Tom, and Ariel D. Procaccia. “If You Like Shapley Then You’ll Love the Core.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 6, May 2021, pp. 5751–59.

[6] Mitchell, Rory, et al. “Sampling Permutations for Shapley Value Estimation.” Journal of Machine Learning Research: JMLR, vol. 23, no. 43, 2022, pp. 1–46
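For readers unfamiliar with approximation by sampling, here is a minimal sketch of the plain Monte Carlo baseline: draw orderings uniformly at random and average marginal contributions. It is not the method of the cited works, which study more refined estimators, and the toy characteristic function, unit indices, and parameter names below are assumptions made for illustration.

```python
import random


def sampled_shapley(players, value, num_samples, seed=0):
    """Estimate Shapley values from uniformly sampled orderings."""
    rng = random.Random(seed)
    players = list(players)
    totals = {p: 0.0 for p in players}
    for _ in range(num_samples):
        order = players[:]
        rng.shuffle(order)
        coalition = frozenset()
        previous = value(coalition)
        for p in order:
            coalition = coalition | {p}
            current = value(coalition)
            # Marginal contribution of p in this sampled ordering.
            totals[p] += current - previous
            previous = current
    return {p: total / num_samples for p, total in totals.items()}


def toy_value(coalition):
    # Hypothetical score: diminishing returns in the number of units,
    # plus a bonus when units 0 and 1 are both present.
    bonus = 0.2 if {0, 1} <= coalition else 0.0
    return len(coalition) ** 0.5 / 10 + bonus


estimates = sampled_shapley(range(8), toy_value, num_samples=200)
print(estimates)
```

With 8 units, exact enumeration would need 8! = 40320 orderings, while the sketch uses 200 samples. Each sampled ordering still requires evaluating the characteristic function once per unit, which in a TSCL setting would mean training or evaluating the learner, so sampling reduces the cost without removing it.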

Review
Rating: 6

This paper studies the underlying mechanics of the teacher-student interactions in Teacher-Student Curriculum Learning (TSCL). The authors employ cooperative game theory to describe how the composition of the set of experiences presented by the teacher to the learner, as well as their order, influences the performance of the curricula found by TSCL approaches. To do so, the authors demonstrate that for every TSCL problem, there exists an equivalent cooperative game, and that several key components of the TSCL framework can be reinterpreted using game-theoretic principles. The authors also conducted experiments covering supervised learning, reinforcement learning, and classical games.

Strengths

This paper proposes a novel, promising perspective for theoretically understanding teacher-student learning based on cooperative games. This is an exciting advance for me.

The authors provide rigorous theory with detailed proofs.

The experiments cover a wide range of cases - supervised learning, reinforcement learning, and classical games.

The paper is well written.

Weaknesses

It is much appreciated that the authors discussed the limitations of the proposed method. However, this discussion is still not enough. Readers would be very interested in a discussion of how the proposed method differs from existing algorithms and what advantages it offers over them.

Questions

Please address the above.

Comment

We thank the reviewer for the positive feedback on the novelty and broad applicability of our work.

Comparison to Existing Algorithms: The field of curriculum learning, and approaches that adopt the teacher-student framework, is vast [1, 2] and algorithms are evaluated on a wide range of dissimilar tasks. In many cases, the notion of curriculum among those tasks is not clear. In our experimental setting, we selected the multi-armed bandit approach as it is used in the foundational work on TSCL [3,4] and is commonly utilized throughout the literature [5,6,7,8]. If the reviewer has suggestions on what algorithm/family of algorithms within TSCL would be better suited for this comparison, we would gladly consider it.

[1] Soviany, Petru, et al. “Curriculum Learning: A Survey.” International Journal of Computer Vision, vol. 130, no. 6, June 2022, pp. 1526–65, https://doi.org/10.1007/s11263-022-01611-x.

[2] Wang, Xin, et al. “A Survey on Curriculum Learning.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, Sept. 2022, pp. 4555–76, https://doi.org/10.1109/TPAMI.2021.3069908.

[3] Graves, Alex, et al. “Automated Curriculum Learning for Neural Networks.” Proceedings of the 34th International Conference on Machine Learning, edited by Doina Precup and Yee Whye Teh, vol. 70, PMLR, 2017, pp. 1311–20, https://proceedings.mlr.press/v70/graves17a.html.

[4] Matiisen, Tambet, et al. “Teacher-Student Curriculum Learning.” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 9, Sept. 2020, pp. 3732–40, https://doi.org/10.1109/TNNLS.2019.2934906.

[5] Colas, Cédric, et al. “Language as a Cognitive Tool to Imagine Goals in Curiosity-Driven Exploration.” arXiv [cs.AI], 21 Feb. 2020, http://arxiv.org/abs/2002.09253.

[6] Portelas, Rémy, et al. “Teacher Algorithms for Curriculum Learning of Deep RL in Continuously Parameterized Environments.” arXiv [cs.LG], 16 Oct. 2019, http://arxiv.org/abs/1910.07224.

[7] Racaniere, Sebastien, et al. “Automated Curricula through Setter-Solver Interactions.” arXiv [cs.LG], 27 Sept. 2019, http://arxiv.org/abs/1909.12892.

[8] Turchetta, Matteo, et al. “Safe Reinforcement Learning via Curriculum Induction.” Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 12151–62.

AC Meta-Review

The paper proposes a novel methodology to study the teacher-student curriculum learning framework using cooperative game theory. The reviewers acknowledged that the proposed methodology provides a new perspective to understanding the mechanics of teacher-student curriculum learning. However, the reviewers pointed out several weaknesses in the paper, and raised concerns related to the limited practicality of the proposed methodology and lack of new theoretical results. We want to thank the authors for their detailed responses. Based on the raised concerns and follow-up discussions, unfortunately, the final decision is a rejection. Nevertheless, the reviewers have provided detailed and constructive feedback. We hope the authors can incorporate this feedback when preparing future revisions of the paper.

Why Not a Higher Score

The reviewers pointed out several weaknesses in the paper, and raised concerns related to the limited practicality of the proposed methodology and lack of new theoretical results. A majority of the reviewers think that the work is not yet ready for publication.

Why Not a Lower Score

N/A

Final Decision

Reject