TRiCo: Triadic Game-Theoretic Co-Training for Robust Semi-Supervised Learning
A novel semi-supervised learning paradigm that unifies view-wise co-training, meta-learned supervision, and adversarial perturbation through a structured triadic game.
Abstract
Reviews and Discussion
The paper introduces a semi-supervised learning (SSL) method based on a game-theoretic interaction among models. The paper observes that existing methods select pseudo-labels based on model confidence, which is susceptible to calibration error and can be detrimental. The proposed solution uses mutual information as an alternative criterion for pseudo-label selection. It is shown that across a wide range of standard SSL benchmarks, the proposed method outperforms not only existing SSL methods but also fully-supervised methods despite using only 25% of the dataset.
Strengths and Weaknesses
Strengths
- The proposed method is tested on many image classification benchmarks. This is convincing, as many SSL methods are tested on fewer datasets due to the higher compute requirements.
- This paper demonstrates various benefits of the proposed method, including feature clustering and CAM visualizations.
Weaknesses
- Complexity: This limitation exists not only in the proposed method, but also in existing semi-supervised learning methods. The proposed method uses two encoders trained with different methods, several additional models, and adversarially-robust training, in addition to many other details. Such multi-stage training methods, and methods that combine various modules, are hard to apply to new problems due to development costs and the potential need for a whole new set of hyperparameters to tune.
- It is unclear what role each stage plays. Can some of the components be removed while achieving similar performance? While Section 2 motivates each component, it is unclear that the intended effects are the cause of the enhanced performance. For example, describing the teacher-student training as a Stackelberg game does not seem to use anything from game theory. The existence of a Nash equilibrium often says little about the quality of the equilibrium or convergence to it. And while the teacher-student training is described as a bi-level (Stackelberg) game, the students and teacher are updated iteratively instead of being solved at different time scales.
Overall, the practical benefits seem to outweigh the limitations as an off-the-shelf method. My main concern is that the method seems to be a bit ad-hoc, and therefore may have limited applicability outside of the considered benchmarks.
Minor Comments
- The figures should have thicker curves; they are hard to read (for example, Figure 1.a).
- It seems unnecessary to put the results figure on page 1. I recommend moving it to the experiments/results section.
Questions
Is there any part of the method that can be removed or combined with others and yet retain the high performance?
Limitations
Yes.
Final Justification
The authors addressed my technical concerns regarding computational overhead and the Stackelberg formulation. The method is new and the authors demonstrate moderate gains on standard benchmarks. The contribution is solid, but does not substantially advance the state-of-the-art or open up new applications, which prevents me from giving a stronger recommendation. Therefore, I hold my rating as weak accept.
Formatting Issues
None.
Q1: Method Complexity, Modularity, and Practicality
We thank the reviewer for raising important concerns regarding the complexity and modularity of TRiCo. While our method integrates several components, it remains a principled, end-to-end, and fully differentiable framework—distinct from multi-stage pipelines or ad-hoc ensembles. All components are trained jointly, and their interactions are synergistic rather than additive.
To assess modularity, we conduct an ablation study to examine the contribution of each component. Results (Table: Ablation on TRiCo Components) show that removing any major module degrades performance, with the full TRiCo achieving 96.3% on CIFAR-10 (10% labeled), and drops of 1.1–2.2% observed when disabling MI filtering, the meta-teacher, generator, or co-training structure.
Table: Ablation on TRiCo Components (CIFAR-10, 10% labeled). All ablations are repeated over 5 random splits to ensure stability.
| Variant | Top-1 Acc. (%) | Δ vs. Full |
|---|---|---|
| Full TRiCo | 96.3 ± 0.3 | — |
| w/o MI filtering (Conf-τ only) | 95.2 ± 0.4 | −1.1 |
| w/o Meta-Teacher (Fixed threshold) | 94.9 ± 0.6 | −1.4 |
| w/o Generator (No PGD) | 95.0 ± 0.5 | −1.3 |
| Single Student (No Co-training) | 94.1 ± 0.5 | −2.2 |
Despite integrating multiple components, TRiCo remains efficient and simple to deploy: frozen backbones (no encoder training), lightweight embedding-level adversarial updates (no input gradients), and meta-learning with only first-order updates.
To quantify computational cost, we provide a component-wise breakdown in Appendix B and summarize below:
Table: Component-wise Complexity Breakdown of TRiCo
| Component | Added Overhead | Optimization Strategy / Description |
|---|---|---|
| Mutual Information Estimation | ~+4.5% FLOPs | Stochastic forward passes only; stop-gradient; no backprop |
| Adversarial Generator (1-step PGD) | ~+1.5% FLOPs | Embedding-level perturbation only; no backward computation |
| Meta-Gradient Update | ~+1% FLOPs, ~+10% memory | First-order gradient only; unrolled once per step |
| Total (vs. MCT) | ~+7% FLOPs, ~+10% memory | No mixed-precision or checkpointing; further savings possible |
Overall, TRiCo offers a practical trade-off between performance and complexity. It does not rely on strong augmentations, learnable view encoders, or auxiliary networks, and operates efficiently on a single NVIDIA RTX A6000 (48GB) or equivalent 24GB-class GPU (e.g., RTX 3090, RTX 4090).
Q2: Theoretical Justification and Stackelberg Formulation
We sincerely thank the reviewer for their insightful question regarding the distinction between Nash and Stackelberg equilibria in our triadic game formulation. You are absolutely correct—our framework is designed as a Stackelberg game, where the teacher acts as the leader, while the two student classifiers and the generator act as followers responding to the teacher’s strategy.
While our original statement focused on the existence of a Nash equilibrium, we emphasize that what we actually prove is the existence of a Stackelberg-Nash equilibrium. That is, the equilibrium point is derived in a Stackelberg setting, but it satisfies the equilibrium conditions of all agents, as clarified below.
In particular, our proof (Appendix A, Theorem 1) proceeds in two stages: In Equations (9–18), we establish that the strategy spaces of the teacher, students, and generator are all compact subsets of Euclidean space, and the payoff functions are jointly continuous. Therefore, by Glicksberg’s Theorem, a pure-strategy Nash equilibrium exists. Notably, Glicksberg’s theorem is agnostic to role asymmetry and is applicable to Stackelberg games as long as continuity and compactness hold. In Equations (19–32), we construct the solution explicitly under the Stackelberg game formulation, where the teacher optimizes a meta-objective based on the best responses of the students and generator. This construction satisfies the Stackelberg equilibrium condition by solving the bilevel optimization problem where the followers’ reactions are uniquely defined.
Although the generator is non-parametric, it is defined via a deterministic one-step projected gradient ascent (PGD) update in the embedding space, which depends on the current student parameters. This structure allows us to treat the generator as an implicit function of the student parameters, and this function is continuous with respect to them.
This design ensures that the generator’s response satisfies the continuity and stability conditions required for the teacher's optimization in the Stackelberg game to be well-posed. Consequently, we do NOT require any additional structural assumptions on the generator beyond those already assumed in our setting. Moreover, as shown in Equations (29)–(31) in Appendix A, the joint strategy of the students and generator admits a well-defined best-response mapping under a fixed teacher strategy. By applying a fixed-point theorem over this composite response, we construct the Stackelberg equilibrium even in the presence of a non-parametric generator. This validates the existence of a triadic Stackelberg equilibrium under the assumptions already established.
To avoid ambiguity, we will revise the manuscript to explicitly state that our framework admits a Stackelberg equilibrium, and we will introduce this result formally as Theorem 2 in the revised version. We believe this clarification strengthens the theoretical foundation and removes any residual confusion about equilibrium definitions.
Theorem 2 (Existence of Stackelberg Equilibrium). In the Stackelberg formulation, we assume that one party (the leader) commits to a strategy first, and the remaining parties (the followers) best-respond. In our case, the teacher is the leader, while the students and generator are the followers. We use the same assumptions on compactness and continuity as in Theorem 1.
Given a fixed teacher strategy, the followers play a simultaneous game. Their equilibrium is defined by the following best-response conditions: each student's parameters minimize its own loss given the other student, the generator, and the teacher's strategy, while the generator's perturbation is the deterministic one-step PGD response to the current students.
The teacher then selects her strategy to maximize her own payoff (the negative validation loss) given the best responses of the followers.
Then, the tuple of teacher strategy, student parameters, and generator response is a Stackelberg equilibrium if the following conditions hold:
- (Leader’s optimality) the teacher's strategy is optimal against the followers’ best-response mappings;
- (Followers’ best responses) the students and generator form an equilibrium of the follower subgame under the teacher's chosen strategy.
Proof Sketch. We reuse the compactness and continuity assumptions established in Theorem 1 (Eqs. 9–18 in Appendix A). For each fixed teacher strategy, the follower subgame among the students and generator admits a Nash equilibrium. The teacher’s objective is continuous in her strategy given continuous best-response mappings. Therefore, the leader’s optimization admits a maximizer, which implies the existence of a Stackelberg equilibrium. The specific equilibrium is constructed in Eqs. (19)–(32) in Appendix A.
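To illustrate the solution order in Theorem 2 (the follower is solved first, then the leader optimizes against the follower's reaction), here is a self-contained numeric toy with quadratic payoffs; it is unrelated to the paper's actual losses and uses made-up payoff functions.

```python
import numpy as np

# Toy leader-follower (Stackelberg) game:
# for each candidate leader strategy t, solve the follower's best response
# first, then pick the t that is optimal *given* that response.

def follower_best_response(t):
    # follower minimizes (s - t)^2 + s^2  ->  s*(t) = t / 2
    return t / 2.0

def leader_payoff(t):
    s = follower_best_response(t)
    # leader maximizes -(t - 1)^2 - s^2 given the follower's reaction
    return -(t - 1.0) ** 2 - s ** 2

ts = np.linspace(-2.0, 2.0, 4001)
t_star = ts[np.argmax([leader_payoff(t) for t in ts])]
s_star = follower_best_response(t_star)
# analytic Stackelberg solution for this toy: t* = 4/5, s* = 2/5
```

The grid search over the leader's strategy mirrors the proof structure: the follower's reaction map is computed inside the leader's objective, exactly as in the bilevel formulation above.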
We greatly appreciate the reviewer’s suggestion and are confident that this update will improve both precision and clarity for readers and practitioners.
Q3: Clarity and Presentation Improvements
We thank the reviewer for the suggestions on figure quality. We will increase curve thickness and relocate Figure 1 to the experiments section in the final version to improve clarity. Rather than a loose combination of techniques, TRiCo’s design is guided by a unified game-theoretic principle: the teacher regulates pseudo-label quality and loss dynamics to shape stable, complementary student learning trajectories.
Summary
We clarify that TRiCo is a principled, end-to-end differentiable framework—not a multi-stage ensemble—where all components are co-trained for synergy. Through ablations, we show that each module (e.g., MI filtering, generator, meta-teacher) contributes significantly to performance. Despite its triadic structure, TRiCo incurs only modest computational overhead and runs efficiently on a 24GB-class GPU.
Theoretically, we explicitly formalize TRiCo as a triadic Stackelberg game, not just a heuristic co-training strategy. We prove the existence of a Stackelberg equilibrium using compactness and continuity assumptions, even with a non-parametric generator. This resolves the ambiguity around equilibrium type and strengthens the theoretical foundation.
We also appreciate the reviewer’s suggestions on figure clarity and will revise visuals accordingly. Overall, TRiCo is both theoretically grounded and practically viable for semi-supervised learning.
Thank you for the comments. The authors have addressed my concerns, and I have no further questions.
Dear Reviewer C9DT,
Thank you very much for your recognition of our work and for the constructive suggestions provided during the review process. Your feedback was instrumental in helping us improve the paper, and we are glad that your concerns have been addressed. We sincerely appreciate your support!
Authors of Paper 6732
This paper introduces TRiCo, a new framework for semi-supervised learning based on a pseudo-labeling strategy, involving three (sets of) agents: students, a teacher, and a generator. With the teacher being the leader in a Stackelberg game, the students and teacher balance each other to find a solution that is very competitive, sometimes even SOTA, on the standard SSL task as well as on OOD tasks. The paper also provides a guarantee that there is a Nash equilibrium in the three-player game.
Strengths and Weaknesses
Strengths
This paper is clearly strong in two aspects: first, it brings “game theory” into the challenging problem of interacting systems in the SSL setting, and backs it up with a formal guarantee. A “three”-player game is something that I have never encountered before in this genre, and it strikes me as a fundamental novelty.
Second, the efficacy of the method is demonstrated on a strong basis. Ablations are also conducted thoroughly. While I am committed not to evaluate the research based merely on “the depth of the experiments”, I believe that this thoroughness solidly supports the significance of the proposition.
Weaknesses
- Several parts are difficult to read, and some fixes would greatly help convey the ideas. For example, several definitions seem to be missing: from the context, the average in (1) appears to be the ensemble average, but this is never explicitly stated in the main manuscript; one symbol is confusing because it is a scalar; another appears to be shorthand for a quantity that is never explicitly defined, and the same goes for a further symbol. Theorem 1, I believe, is supposed to be qualified by “for all teacher strategies.”
- Lastly, although the existence of a Nash equilibrium is wonderful and inspiring, I believe this problem is a Stackelberg game, and Stackelberg and Nash equilibria are not always the same. If I am not missing some lines, some explanation of this aspect is warranted. I believe that if the paper can provide the existence of a Stackelberg equilibrium, this work is irrefutable. I'm very much open to increasing the score when these subtleties and the questions below are resolved.
Questions
- Please see the second weakness; I wonder whether a Stackelberg equilibrium is difficult in this context because the generator is non-parametric? Can this problem be resolved by imposing some structural assumptions? I believe a result on the Stackelberg equilibrium would be beneficial to the community even if it requires additional (possibly even a little unrealistic) assumptions.
- Also, I was a little worried that the stability of learning was not discussed much in the main manuscript, other than in the theorem section (Fig. 5 in the Appendix and Table 15 are acknowledged). As a devil's advocate, I would also like to know whether there are any implicit hyperparameters (or training parameters) that one must be particularly careful about when training TRiCo, because "training multi-agent systems" has always been a challenge; this information would definitely benefit the researchers who follow this study.
Limitations
As claimed, "Limitations are implicitly discussed in Section 4," regarding limited supervision and drastic distributional shift. I believe these should be stated explicitly; doing so will not hurt the work's credibility.
Final Justification
The authors have provided a convincing rebuttal, establishing the existence of a Stackelberg-Nash equilibrium and bringing the theory into stronger alignment with the methods and experiments. The authors also conducted additional experiments to investigate the sensitivity of the hyperparameters, which is one of the most feared factors in systems involving multiple agents. I believe that this work contributes an important perspective on semi-supervised learning, and I raised my score to 5.
Formatting Issues
Nothing in Particular
Q1: Clarification of Notation and Mathematical Precision
We thank the reviewer for highlighting the importance of notation clarity and agree that the current manuscript can be improved in this regard. We will revise the final version to explicitly define key symbols and clarify equations.
In Equation (1), we compute the mutual information (MI) between the model prediction and the model posterior induced by dropout. Specifically, we use:
MI(y; ω | x) = H( (1/T) Σ_{t=1}^{T} p(y | x, ω_t) ) − (1/T) Σ_{t=1}^{T} H( p(y | x, ω_t) ),
where ω_t denotes the t-th Monte Carlo dropout mask. We will clarify that the first term is the entropy of the ensemble distribution and that H(·) is the entropy of a categorical distribution. Additionally, we will specify that τ is a scalar threshold used to select pseudo-labels when MI < τ, and that its range lies in [0, log C], where C is the number of classes.
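The MC-dropout MI computation described above can be sketched in a few lines. This is an illustrative PyTorch sketch with T = 5 passes; the model, function names, and threshold default are ours, not the paper's.

```python
import torch
import torch.nn.functional as F

def mc_dropout_mi(model, x, n_passes=5):
    """Epistemic uncertainty as mutual information between the prediction
    and the dropout-induced posterior:
        MI = H(mean_t p_t) - mean_t H(p_t),
    where p_t is the softmax output under the t-th dropout mask."""
    model.train()  # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(n_passes)])
    mean_p = probs.mean(dim=0)  # ensemble distribution
    ensemble_entropy = -(mean_p * mean_p.clamp_min(1e-12).log()).sum(-1)
    expected_entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1).mean(0)
    return ensemble_entropy - expected_entropy  # in [0, log C]

def select_pseudo_labels(model, x, tau=0.10):
    """Keep pseudo-labels only where the MI score falls below tau."""
    mi = mc_dropout_mi(model, x)
    mask = mi < tau
    model.eval()
    with torch.no_grad():
        labels = model(x).argmax(dim=-1)
    return labels[mask], mask
```

The MI score is bounded by log C, which matches the threshold range stated above.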
Theorem 1 will also be revised to include the appropriate quantifiers. The corrected statement reads:
Theorem 1. Under assumptions (A1)–(A3), for every teacher strategy the students and generator admit a joint best response, and there exists a Stackelberg equilibrium in which the teacher's strategy minimizes the validation loss given these optimal responses.
Moreover, while the generator is non-parametric (via 1-step PGD), its updates are deterministic given the student parameters, and it can be treated as an implicit, structured response in the Stackelberg formulation. We will update Section 4.4 to clarify these points and explicitly distinguish between Nash and Stackelberg equilibria.
Q2. Clarification on Equilibrium Type and Theoretical Guarantees
We sincerely thank the reviewer for their insightful question regarding the distinction between Nash and Stackelberg equilibria in our triadic game formulation. You are absolutely correct—our framework is designed as a Stackelberg game, where the teacher acts as the leader, while the two student classifiers and the generator act as followers responding to the teacher’s strategy.
While our original statement focused on the existence of a Nash equilibrium, we emphasize that what we actually prove is the existence of a Stackelberg-Nash equilibrium. That is, the equilibrium point is derived in a Stackelberg setting, but it satisfies the equilibrium conditions of all agents, as clarified below.
In particular, our proof (Appendix A, Theorem 1) proceeds in two stages: In Equations (9–18), we establish that the strategy spaces of the teacher, students, and generator are all compact subsets of Euclidean space, and the payoff functions are jointly continuous. Therefore, by Glicksberg’s Theorem, a pure-strategy Nash equilibrium exists. Notably, Glicksberg’s theorem is agnostic to role asymmetry and is applicable to Stackelberg games as long as continuity and compactness hold. In Equations (19–32), we construct the solution explicitly under the Stackelberg game formulation, where the teacher optimizes a meta-objective based on the best responses of the students and generator. This construction satisfies the Stackelberg equilibrium condition by solving the bilevel optimization problem where the followers’ reactions are uniquely defined.
Although the generator is non-parametric, it is defined via a deterministic one-step projected gradient ascent (PGD) update in the embedding space, which depends on the current student parameters. This structure allows us to treat the generator as an implicit function of the student parameters, and this function is continuous with respect to them.
This design ensures that the generator’s response satisfies the continuity and stability conditions required for the teacher's optimization in the Stackelberg game to be well-posed. Consequently, we do NOT require any additional structural assumptions on the generator beyond those already assumed in our setting. Moreover, as shown in Equations (29)–(31) in Appendix A, the joint strategy of the students and generator admits a well-defined best-response mapping under a fixed teacher strategy. By applying a fixed-point theorem over this composite response, we construct the Stackelberg equilibrium even in the presence of a non-parametric generator. This validates the existence of a triadic Stackelberg equilibrium under the assumptions already established.
To avoid ambiguity, we will revise the manuscript to explicitly state that our framework admits a Stackelberg equilibrium, and we will introduce this result formally as Theorem 2 in the revised version. We believe this clarification strengthens the theoretical foundation and removes any residual confusion about equilibrium definitions.
Theorem 2 (Existence of Stackelberg Equilibrium). In the Stackelberg formulation, we assume that one party (the leader) commits to a strategy first, and the remaining parties (the followers) best-respond. In our case, the teacher is the leader, while the students and generator are the followers. We use the same assumptions on compactness and continuity as in Theorem 1.
Given a fixed teacher strategy, the followers play a simultaneous game. Their equilibrium is defined by the following best-response conditions: each student's parameters minimize its own loss given the other student, the generator, and the teacher's strategy, while the generator's perturbation is the deterministic one-step PGD response to the current students.
The teacher then selects her strategy to maximize her own payoff (the negative validation loss) given the best responses of the followers.
Then, the tuple of teacher strategy, student parameters, and generator response is a Stackelberg equilibrium if the following conditions hold:
- (Leader’s optimality) the teacher's strategy is optimal against the followers’ best-response mappings;
- (Followers’ best responses) the students and generator form an equilibrium of the follower subgame under the teacher's chosen strategy.
Proof Sketch. We reuse the compactness and continuity assumptions established in Theorem 1 (Eqs. 9–18 in Appendix A). For each fixed teacher strategy, the follower subgame among the students and generator admits a Nash equilibrium. The teacher’s objective is continuous in her strategy given continuous best-response mappings. Therefore, the leader’s optimization admits a maximizer, which implies the existence of a Stackelberg equilibrium. The specific equilibrium is constructed in Eqs. (19)–(32) in Appendix A.
We greatly appreciate the reviewer’s suggestion and are confident that this update will improve both precision and clarity for readers and practitioners.
Q3. Stability of Multi-Agent Training and Sensitivity to Hyperparameters
We thank the reviewer for raising the important question of stability and sensitivity in training multi-agent systems. Despite TRiCo’s triadic design, our training remains highly stable in practice, as evidenced by the low standard deviation across all benchmarks (see Tables 1, 2, and Appendix C). This is due to three core design choices:
- Structured Modularity: The roles of the teacher, students, and generator are cleanly separated via frozen encoders and cross-view pseudo-labeling, avoiding the entangled feedback loops that commonly destabilize multi-agent learning.
- Smooth Meta-Scheduling: The teacher is warm-started and trained with conservative updates (first-order meta-gradients) using stop-gradient feedback, which helps avoid oscillation in threshold/loss scheduling during early training.
- Implicit Generator: The generator performs a single-step PGD perturbation in embedding space, without trainable parameters or gradients, keeping its behavior interpretable and consistent.
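To illustrate the second point, a first-order meta-update of a single teacher knob (here the unsupervised loss weight) might look as follows. This is a simplified sketch: the linear student head, the functional forward, and all names are our assumptions, not the paper's implementation.

```python
import torch

F = torch.nn.functional

def first_order_meta_step(student, lam, x_u, pseudo, x_val, y_val,
                          inner_lr=0.1, meta_lr=0.01):
    """One first-order meta-update of the scalar loss weight `lam`.
    The inner gradient is detached (stop-gradient), so no second-order
    derivatives are needed -- only lam's linear effect on the unrolled
    student step is differentiated. Assumes `student` is a Linear layer
    so the unrolled forward can be written functionally."""
    # inner gradient on the unsupervised objective, detached -> first order
    u_loss = F.cross_entropy(student(x_u), pseudo)
    grads = torch.autograd.grad(u_loss, list(student.parameters()))
    grads = [g.detach() for g in grads]
    # unrolled (fast) student weights, differentiable w.r.t. lam only
    w, b = [p - inner_lr * lam * g for p, g in zip(student.parameters(), grads)]
    # validation loss under the fast weights drives the teacher update
    val_loss = F.cross_entropy(x_val @ w.t() + b, y_val)
    (d_lam,) = torch.autograd.grad(val_loss, lam)
    with torch.no_grad():
        lam -= meta_lr * d_lam
    return lam, val_loss.item()
```

The stop-gradient on the inner gradient is what keeps the update first-order and cheap, at the cost of ignoring curvature terms.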
In addition to stability, we conducted a systematic sensitivity analysis of TRiCo’s key hyperparameters: the MI threshold τ, the unsupervised loss weight λ, and the perturbation budget ε. As shown below, TRiCo exhibits strong robustness across a wide range of values:
Table: Sensitivity of TRiCo on CIFAR-10 w.r.t. hyperparameters τ, λ, and ε. Other settings are fixed to their defaults.
| Hyperparam | Value Range | Top-1 Acc. (%) | Change |
|---|---|---|---|
| τ (MI threshold) | 0.05 / 0.10 / 0.15 | 95.6 / 95.9 / 94.8 | ≤1.1% |
| λ (loss weight) | 0.5 / 1.0 / 2.0 | 95.4 / 95.9 / 95.7 | ≤0.5% |
| ε (FGSM) | 1e-4 / 5e-4 / 1e-3 | 95.8 / 95.9 / 95.1 | ≤0.8% |
Q4: Explicit Limitations
We thank the reviewer and will clarify TRiCo’s limitations in Section 6. TRiCo relies on a small labeled validation set for meta-optimization, which may limit use in fully unsupervised settings—future work may explore self-supervised proxies. The non-parametric generator lacks memory and could benefit from learnable dynamics (e.g., conditional SSMs). TRiCo does not yet address extreme shifts like open-set scenarios, which could be tackled via OOD-aware extensions. These are trade-offs that offer clear paths for future work.
Dear Reviewer TYcY,
Thank you very much for your encouraging feedback. We truly appreciate your recognition of our theoretical clarification on the Stackelberg-Nash equilibrium and the extended hyperparameter studies. We are especially grateful for your enthusiasm for this research direction, and we are motivated by your thoughtful engagement. We will further refine the final version to meet the expectations of the community. Thank you again for your support!
Authors of Paper 6732
Thank you very much for the rebuttal; I very much appreciate both the remark regarding the Stackelberg-Nash equilibrium and the hyperparameter studies. With these results, and with the improved mathematical precision and notation, I feel that this work is even more valuable to the community, providing another perspective on semi-supervised learning. On this ground, although it will not appear in the system for a while, I am considering raising my score.
This paper introduces TRiCo (Triadic Game-Theoretic Co-Training), a novel semi-supervised learning framework that extends traditional co-training by incorporating three interacting components: two student classifiers, a meta-learned teacher, and an adversarial generator. The approach addresses key limitations in existing SSL methods through (1) mutual information-based pseudo-label filtering instead of confidence thresholds, (2) a meta-learned teacher that adaptively regulates training dynamics via validation feedback, and (3) a non-parametric generator that creates adversarial perturbations to expose decision boundary weaknesses. The framework is formulated as a Stackelberg game where the teacher leads strategy optimization while students and generator follow. Experiments on CIFAR-10, SVHN, STL-10, and ImageNet demonstrate consistent improvements over strong baselines, with TRiCo achieving competitive performance with fully-supervised models using only 25% labeled data.
Strengths and Weaknesses
Strengths:
- The paper introduces an original approach that reformulates co-training as a structured three-player game with theoretical guarantees (Nash equilibrium existence). This represents a meaningful departure from existing binary co-training methods and provides a principled framework for multi-agent SSL.
- The use of mutual information instead of confidence-based thresholds is theoretically well-motivated and addresses calibration issues that plague existing methods. The MI-based approach better captures epistemic uncertainty and shows improved reliability in low-confidence regions.
- The method demonstrates robust improvements across multiple benchmarks (CIFAR-10, SVHN, STL-10, ImageNet) and various label regimes.
Weaknesses:
- The connection between game-theoretic equilibria and actual SSL performance remains unclear, and the theoretical assumptions are quite standard.
- The method incurs 2-3× runtime overhead and 1.5-2× memory usage due to MI estimation via multiple dropout passes, adversarial perturbation generation, and meta-gradient computation. This computational cost may limit practical adoption, especially in resource-constrained settings.
The mutual information estimation via dropout may be noisy early in training, and the first-order meta-gradient approximation may be too crude.
Questions
Can you provide a more detailed computational analysis breaking down the cost of each component (MI estimation, adversarial generation, meta-gradients)?
Limitations
No. The method introduces several hyperparameters (the MI threshold, loss weights, and perturbation budget) that require careful tuning. The authors should discuss sensitivity to these choices and the expertise required for proper implementation.
Formatting Issues
No.
Q1 TRiCo Framework
To clarify our theoretical contribution, we summarize the key differences between TRiCo and standard co-training paradigms in the table below.
TRiCo formulates semi-supervised learning as a triadic Stackelberg game, where the teacher dynamically adapts its strategy based on student feedback—contrasting with conventional symmetric or heuristic co-training schemes. This explicit leader-follower hierarchy and embedding-space regularization form the foundation for TRiCo’s robustness and generalization.
Table: Key distinctions between TRiCo and conventional co-training.
| Component | Conventional Co-Training | TRiCo (Ours) |
|---|---|---|
| Game Structure | None or implicit 2-player | Triadic Stackelberg game |
| Teacher Role | Fixed thresholding | Meta-learned dynamic leader |
| Adversarial Mechanism | Absent or input-space | Embedding-space generator |
| Optimization | Flat co-supervision | Hierarchical leader-follower |
Q2 Training Cost and Efficiency
We thank the reviewer for raising the importance of computational overhead. To assess whether TRiCo’s triadic structure introduces prohibitive cost, we compare against the most related baseline, Meta Co-Training (MCT), under identical settings: CIFAR-10 with 4k labels, ViT-B encoder, and batch size 64.
As detailed in Appendix B and summarized below, TRiCo adds only +7.1% FLOPs and +9.8% peak GPU memory per iteration. The additional cost stems from:
- Mutual information estimation via stochastic forward passes (no backward),
- Single-step PGD in embedding space (lightweight),
- First-order meta-gradient updates.
Table: Training compute and memory cost comparison between TRiCo and Meta Co-Training (MCT)
| Method | FLOPs per Iteration | Peak GPU Memory |
|---|---|---|
| Meta Co-Training (MCT) | 1.000× (baseline) | 1.000× (baseline) |
| TRiCo (Ours) | 1.071× (+7.1%) | 1.098× (+9.8%) |
Note: All costs are reported relative to MCT (normalized to 1×) under identical hardware and training settings. For completeness, we also provide a component-wise cost breakdown in Appendix B. These modest overheads are justified by TRiCo’s significant performance gains.
Furthermore, since we do not use gradient checkpointing or mixed-precision optimization, additional acceleration is achievable in deployment settings.
Q3 Robustness
We thank the reviewer for this insightful observation.
While mutual information (MI) estimation via Monte Carlo dropout can be noisy early in training, TRiCo is specifically designed to mitigate such instability through three integrated mechanisms:
- Warm-started teacher scheduling: The meta-learned teacher begins with conservative parameters and adjusts thresholds only after a stabilization phase (the first 5 epochs), preventing premature pseudo-label filtering.
- EMA smoothing of student predictions: To reduce variance in MI estimates, we apply an exponential moving average (EMA) over the output logits of the student classifiers, enhancing temporal consistency.
- Stochastic averaging over passes: MI is computed as the difference between the ensemble entropy and the averaged predictive entropy across 5 stochastic forward passes, in line with standard epistemic uncertainty estimation [Gal & Ghahramani, 2016].
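The EMA smoothing of student logits can be sketched as follows; this is an illustrative helper, and the class name and decay value are ours, not the paper's.

```python
import torch

class LogitEMA:
    """Exponential moving average over per-sample student logits, used
    to damp variance in downstream MI estimates early in training."""

    def __init__(self, decay=0.99):
        self.decay = decay
        self.avg = None

    def update(self, logits):
        logits = logits.detach()
        if self.avg is None:
            self.avg = logits.clone()
        else:
            # avg <- decay * avg + (1 - decay) * logits
            self.avg.mul_(self.decay).add_(logits, alpha=1 - self.decay)
        return self.avg
```

With a decay close to 1, early noisy predictions are averaged away while the running estimate still tracks the student over time.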
We will clarify this explicitly in the final version.
Q4: Stability and Hyperparameter Sensitivity
We thank the reviewer for raising the important question of stability and sensitivity in multi-agent training. Despite TRiCo’s triadic design, training remains highly stable, as evidenced by low standard deviations across all benchmarks (see Tables 1–2, Appendix C). This robustness is attributed to three key design choices:
- Structured Modularity: The teacher, students, and generator operate on frozen encoders with cross-view pseudo-labeling, avoiding entangled feedback loops.
- Smooth Meta-Scheduling: The meta-teacher is warm-started and updated via first-order meta-gradients with stop-gradient signals, mitigating early oscillation.
- Implicit Generator: A deterministic, single-step PGD in embedding space ensures stability without learnable parameters or gradient feedback.
To further assess robustness, we conduct a sensitivity analysis on the key hyperparameters: the MI threshold τ, the unsupervised loss weight λ, and the perturbation budget ε. As shown below, TRiCo maintains stable performance across wide value ranges without fine-tuning:
Table: Sensitivity of TRiCo on CIFAR-10 w.r.t. key hyperparameters
Other settings fixed to default.
| Hyperparam | Value Range | Top-1 Acc. (%) | Change |
|---|---|---|---|
| τ (MI threshold) | 0.05 / 0.10 / 0.15 | 95.6 / 95.9 / 94.8 | ≤1.1% |
| λ (loss weight) | 0.5 / 1.0 / 2.0 | 95.4 / 95.9 / 95.7 | ≤0.5% |
| ε (FGSM) | 1e-4 / 5e-4 / 1e-3 | 95.8 / 95.9 / 95.1 | ≤0.8% |
Unlike prior SSL methods that require manual tuning of confidence thresholds per dataset, our meta-teacher adjusts these hyperparameters online during training. This reduces human intervention and enhances TRiCo’s usability across diverse datasets.
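For intuition on how such an online adjustment can work, here is a toy first-order meta-gradient update of a single loss weight through one unrolled student step (a sketch under our own simplifying assumptions: closed-form gradient callables and a scalar weight, not TRiCo's implementation):

```python
import numpy as np

def meta_step(w, lam, grad_sup, grad_unsup, grad_val, alpha=0.1, beta=0.05):
    """First-order meta-gradient update of the loss weight `lam` (teacher action).

    Student (follower): one SGD step on L_sup + lam * L_unsup.
    Teacher (leader): dL_val/dlam is approximated through the single
    unrolled step, treating gradients at w as constants (first-order):
        dL_val/dlam ~= grad_val(w') . (-alpha * grad_unsup(w))
    """
    w_new = w - alpha * (grad_sup(w) + lam * grad_unsup(w))   # unrolled student step
    meta_grad = grad_val(w_new) @ (-alpha * grad_unsup(w))    # first-order approx.
    lam_new = max(0.0, lam - beta * meta_grad)                # keep weight non-negative
    return w_new, lam_new
```

When the unsupervised gradient points toward the validation optimum, the weight is pushed up; when it points away, the weight is pushed down, which is the feedback loop that removes the need for per-dataset threshold tuning.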
Summary
We thank the reviewers for their thoughtful feedback. In this rebuttal, we clarified TRiCo’s game-theoretic formulation, showing it departs from prior heuristic co-training by explicitly modeling a triadic Stackelberg game, where a meta-learned teacher dynamically adapts based on student feedback. We demonstrated that TRiCo remains efficient—adding only +7.1% FLOPs over MCT—and stable across benchmarks, supported by modular design, warm-started meta-scheduling, and an implicit generator. Our sensitivity analysis shows robustness to hyperparameters (≤1.1% variation), with no reliance on dataset-specific thresholds. TRiCo also achieves consistent gains across diverse frozen encoders, validating that its benefits are orthogonal to backbone strength. We hope these clarifications address all concerns and reinforce TRiCo’s theoretical soundness, empirical effectiveness, and practical viability.
We hope these revisions and clarifications address your concerns and strengthen the case for TRiCo’s significance and practical viability.
This paper proposes TRiCo, a game-theoretic co-training approach for semi-supervised learning. As in other co-training approaches, TRiCo uses two student models that predict pseudo-labels on differently-augmented views of the same input data; the pseudo-labels then supervise the opposite student model. In conjunction with the complementary pseudo-label loss, an adversarial loss is added to the training objective, computed by applying PGD to the input samples, to encourage accurate predictions even on highly adversarial examples. To filter the pseudo-labels, mutual information is used instead of the usual softmax confidence, estimated from the empirical predictive distribution obtained via Monte Carlo dropout. To determine the MI threshold for retaining pseudo-labels and the relative weighting of the unsupervised and adversarial losses, a teacher model is maintained during training to predict these quantities. The teacher is trained on the meta-objective of student performance on a held-out labeled validation set, with its update computed using a first-order approximation from unrolling the student model's gradient update. The authors evaluate TRiCo on both the standard SSL setting and a few-shot learning setting, and demonstrate state-of-the-art performance.
Strengths and Weaknesses
Strengths:
- The paper is well-written and easy to understand.
- The idea is quite novel and represents a departure from the currently popular self-training paradigm in SSL.
- The authors provide sound theoretical backing to support the use of TRiCo.
- Experiments are extensive, and demonstrate consistent improvement across a variety of dataset and task settings.
Weaknesses:
- Although achieving state-of-the-art results, compared to competing approaches on Imagenet, TRiCo's improvement is quite marginal (only +0.1 in the 10% labeled setting, -0.4 in the 1% setting).
- A key aspect of the method, leveraging mutual information to filter pseudo-labels, seems to only be weakly-supported by experimental evidence. In the ablation studies, MI filtering is only better by +0.1 when compared to a well-tuned confidence threshold. This makes me skeptical that the mutual information filtering is at all critical to the success of the method.
- TRiCo's time and space complexity is greater than those of competing SSL approaches, and although not insurmountable, is definitely a hurdle to be overcome for practical usage.
Questions
- Have you performed any experiments using a meta-learned strategy with confidence thresholding, as opposed to MI thresholding? Given that MI thresholding with the teacher strategy is only +0.1 points better, I am suspicious that simply using a meta-learned confidence thresholding strategy will actually produce better results.
- Have you considered alternative approaches to Monte Carlo dropout for estimating the empirical predictive distribution, such as directly predicting the distribution parameters as outputs of your model?
Limitations
Yes
Final Justification
Rebuttal addresses my concerns, so I have increased my score.
Formatting Issues
None
Q1. ImageNet Gains are Marginal
We thank the reviewer for this observation. While the absolute gain on ImageNet-10% is moderate (+0.1%), our improvement is more pronounced under the challenging 1% label setting, where TRiCo narrows the gap to full supervision by +2.4% over prior SOTA.
We also emphasize that TRiCo achieves superior performance with fewer training epochs and smaller backbones compared to ViT-H/14 used in some strong baselines. Moreover, as shown in our Few-Shot and Imbalanced (Appendix D) settings, TRiCo shows greater improvements in low-resource regimes—its primary focus.
Q2. Role of Mutual Information vs. Confidence Thresholding
We appreciate the reviewer’s skepticism regarding the effectiveness of MI filtering. To clarify, our ablation (Table 7) compares confidence filtering with fixed versus meta-learned MI thresholding, not confidence with meta-learning.
As shown in the table below, mutual information filtering consistently outperforms a meta-learned confidence threshold across all datasets and label regimes. These results highlight the advantage of epistemic uncertainty estimation for robust pseudo-label selection, especially under low-label and high-variance scenarios.
Table: Comparison of pseudo-label filtering strategies under meta-learned control across multiple datasets and label budgets (in % labeled data).
MI Filtering consistently outperforms a meta-learned confidence threshold (Meta-Conf), with lower variance and stronger robustness. Results are averaged over 5 independent runs.
| Filtering Strategy | CIFAR-10 (10%) | STL-10 (10%) | SVHN (1%) |
|---|---|---|---|
| Meta-Conf (meta-learned confidence) | 95.7 ± 0.4% | 91.3 ± 0.5% | 93.0 ± 0.6% |
| MI Filtering (Ours) | 96.3 ± 0.3% | 92.4 ± 0.3% | 94.2 ± 0.4% |
| Filtering Strategy | CIFAR-10 (5%) | STL-10 (5%) | SVHN (0.5%) |
|---|---|---|---|
| Meta-Conf (meta-learned confidence) | 93.2 ± 0.6% | 88.9 ± 0.6% | 91.1 ± 0.7% |
| MI Filtering (Ours) | 94.0 ± 0.5% | 90.5 ± 0.4% | 92.5 ± 0.6% |
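For intuition on why the two rules can disagree, here is a small NumPy sketch (thresholds and function names are illustrative, not the paper's code) contrasting confidence filtering on a single forward pass with MI filtering over K stochastic passes:

```python
import numpy as np

def confidence_mask(single_pass_probs, tau=0.95):
    """FixMatch-style rule: keep samples whose max softmax (one pass) is high."""
    return single_pass_probs.max(axis=-1) >= tau

def mi_mask(probs, tau_mi=0.10, eps=1e-12):
    """MI-based rule: keep samples whose epistemic uncertainty is low.

    probs: (K, N, C) softmax outputs from K stochastic passes.
    """
    mean_p = probs.mean(axis=0)
    h_mean = -(mean_p * np.log(mean_p + eps)).sum(axis=-1)
    h_each = -(probs * np.log(probs + eps)).sum(axis=-1).mean(axis=0)
    return (h_mean - h_each) <= tau_mi
```

The failure mode this targets: a single overconfident pass clears the confidence bar even when the stochastic passes disagree about the class, whereas the MI rule rejects exactly those samples.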
Q3. Alternate Uncertainty Estimation Methods
Thank you for the insightful suggestion. While our current implementation adopts Monte Carlo dropout (MC-Dropout) for simplicity and efficiency, we agree that other techniques (e.g., ensemble-based variance, Dirichlet output modeling [Malinin et al., 2018]) may provide more calibrated uncertainty estimates.
We plan to explore these directions in future work and briefly mention them in the updated limitations section. That said, MC-Dropout offers a good trade-off between compute and performance: it requires no architectural change and adds only ~4.5% FLOPs (see the table below), while enabling robust pseudo-label filtering.
Table: Estimated FLOPs Overhead for Uncertainty Estimation Methods
Estimates based on ViT-B encoder, CIFAR-10 (4k labels), batch size 64.
| Method | Extra FLOPs per Iteration | Description |
|---|---|---|
| MC-Dropout (Ours, K=5) | +4.5% | 5 forward passes, no backward, stop-gradient |
| Deep Ensemble (N=5) | +410.2% | 5 independent models; full forward + backward passes |
| Dirichlet Modeling | +18.6% | Single forward + extra MLP head for concentration |
Q4. Training Cost and Practicality
To assess whether TRiCo’s triadic structure introduces prohibitive cost, we compare against the most related baseline, Meta Co-Training (MCT), under identical settings: CIFAR-10 with 4k labels, ViT-B encoder, and batch size 64.
As detailed in Appendix B and summarized below, TRiCo adds only +7.1% FLOPs and +9.8% peak GPU memory per iteration. The additional cost stems from:
- Mutual information estimation via stochastic forward passes (no backward),
- Single-step PGD in embedding space (lightweight),
- First-order meta-gradient updates.
Table: Training compute and memory cost comparison between TRiCo and Meta Co-Training (MCT)
| Method | FLOPs per Iteration | Peak GPU Memory |
|---|---|---|
| Meta Co-Training (MCT) | 1.00× (baseline) | 1.00× (baseline) |
| TRiCo (Ours) | 1.07× (+7.1%) | 1.10× (+9.8%) |
Note: All costs are reported relative to MCT (normalized to 1.00×) under identical hardware and training settings.
For completeness, we also provide a component-wise cost breakdown in Appendix B. These modest overheads are justified by TRiCo’s significant performance gains.
Furthermore, since we do not use gradient checkpointing or mixed-precision optimization, additional acceleration is achievable in deployment settings.
Summary
In summary, we reaffirm that TRiCo is designed for robustness in low-label and imbalanced regimes, where its benefits are most prominent. While gains on ImageNet-10% are modest, TRiCo delivers significant improvements under the more challenging ImageNet-1% setting, few-shot, and long-tailed benchmarks—highlighting its effectiveness where conventional SSL methods often struggle.
We clarified that mutual information-based filtering consistently outperforms confidence-based thresholds, especially under high uncertainty, and that our epistemic uncertainty estimation via MC-Dropout achieves a strong balance between accuracy and computational cost (~4.5% overhead) without architectural modifications.
Finally, we demonstrate that TRiCo’s triadic design adds only moderate computational overhead (+7.1% FLOPs, +9.8% memory) relative to Meta Co-Training, and fits within a single GPU setup. Its design avoids fragile components such as fine-tuned augmentations or auxiliary networks, making it both practically viable and broadly applicable.
We hope this response addresses all concerns and highlights TRiCo’s theoretical soundness, empirical rigor, and practical efficiency.
Thank you for the additional experiments and the thoughtful response. I am satisfied with the explanation and have increased my score.
Dear Reviewer QTbq, Thank you very much for your positive feedback and for increasing your score. We appreciate your thoughtful engagement with our work. We will incorporate the clarifications and additional results into the final version to further improve the paper’s quality. Thanks again for your support!
Authors of Paper 6732
TRiCo is a semi-supervised learning framework that integrates a teacher, two student classifiers, and an adversarial generator. It filters pseudo-labels using mutual information to avoid overconfidence errors. The teacher adjusts pseudo-label selection and loss balancing through meta-learning, while the generator creates adversarial samples to improve robustness. TRiCo is tested across various low-label scenarios and demonstrates effectiveness in handling limited labeled data, while being compatible with different model architectures.
Strengths and Weaknesses
Strengths:
- The TRiCo framework demonstrates competitive results when compared to baseline methods and is supported by theoretical foundations.
- The paper is well-written and easy to follow.
Weaknesses:
- Line 78 claims good performance under class imbalance, but no experimental results support this.
- Semi-supervised learning is unstable with few samples, and the experimental results should include standard deviation values.
- The DINOv2 model offers superior representation ability compared to other pre-trained models, which raises concerns about the fairness of the comparison if it is used solely in the proposed method.
- TRiCo seems overly complex; it improves performance by using an ensemble-like approach and a stronger data perturbation mechanism.
Questions
Please see the above weaknesses.
Limitations
The experimental section of this paper may need to be carefully organized by the authors, as it feels somewhat disorganized.
Final Justification
The author's response has resolved my doubts. The method proposed in this paper performs well both in application and in theory.
Formatting Issues
N/A
Q1: Class Imbalance
We thank the reviewer for pointing this out. We have conducted additional experiments on the CIFAR-10-LT benchmark (imbalance ratio 100), following standard protocols [Wei et al., 2021; Kim et al., 2022]. As shown in Appendix D, Table 15, TRiCo achieves +4.2% higher tail-class accuracy than FixMatch and +2.8% over FreeMatch, confirming its robustness under class imbalance.
To further strengthen our claim, we also conducted evaluation on the more challenging ImageNet-LT benchmark under 1% label setting, comparing TRiCo against recent semi-supervised long-tailed learning methods such as ACR and SimPro.
Table: Top-1 Accuracy (%) on ImageNet-LT under Class Imbalance (1% labeled, 3 runs)
We report mean ± std deviation.
| Method | Many-shot Acc. | Tail-class Acc. |
|---|---|---|
| FixMatch w/ ACR (2023) | 56.4 ± 0.3 | 61.8 ± 0.6 |
| FixMatch w/ SimPro | 57.2 ± 0.4 | 65.5 ± 0.5 |
| TRiCo (Ours) | 58.6 ± 0.3 | 68.1 ± 0.4 |
TRiCo achieves +2.6% improvement in tail-class accuracy over SimPro and +6.3% over ACR, verifying its robustness to long-tail label distributions. In addition to stronger average accuracy, TRiCo demonstrates lower variance, suggesting stable generalization under limited-label, imbalanced scenarios.
We also benchmark on ImageNet-127 and ImageNet-1k under varying resolution and imbalance settings. As shown below, TRiCo outperforms all existing methods in most configurations.
Table: Top-1 Accuracy (%) on ImageNet-127 and ImageNet-1k under Varying Test Imbalance Ratios and Resolutions (1 run) All models trained under for ImageNet-127 and for ImageNet-1k. † indicates ACR reproduction without anchor distributions.
| Method | ImageNet-127 (32×32) | ImageNet-127 (64×64) | ImageNet-1k (32×32) | ImageNet-1k (64×64) |
|---|---|---|---|---|
| FixMatch | 29.7 | 42.3 | — | — |
| + DARP | 30.5 | 42.5 | — | — |
| + CReST+ | 32.5 | 44.7 | — | — |
| + CoSSL | 43.7 | 53.9 | — | — |
| + ACR | 57.2 | 63.6 | 13.8 | 23.3 |
| + SimPro | 59.1 | 67.0 | 19.7 | 25.0 |
| TRiCo (Ours) | 61.3 | 69.2 | 22.4 | 24.6 |
These results demonstrate TRiCo’s superior generalization across diverse class imbalance levels and input resolutions.
Q2: Standard Deviation Values
We agree with the reviewer on the importance of reporting variance in semi-supervised settings. As noted in Appendix C, we already report mean ± standard deviation across 5 independent runs (using different labeled splits) in the appendix tables. In the final version, we will include mean ± standard deviation across 5 independent runs in all key tables.
While prior SSL works often omit variance reporting, we appreciate this suggestion and now incorporate it for improved reproducibility and transparency.
Q3: Representation Ability
TRiCo is not dependent on DINOv2 alone. As shown in Figure 4(b), we benchmark across multiple frozen encoders, including MAE, CLIP, and SwAV.
Notably, our performance gains persist across weaker encoders (e.g., MAE and SwAV), demonstrating that our improvements stem from the triadic co-training and MI-driven regularization, not from backbone strength.
To clarify further:
Even with MAE+SwAV—two encoders with lower standalone performance—TRiCo achieves a +3.7% gain over Meta Co-Training (MCT), reinforcing that our gains are orthogonal to encoder selection.
Table: Top-1 Accuracy (%) across different encoder combinations on CIFAR-100 (10% labels).
Results are averaged over 5 random splits. DINOv2 and CLIP are stronger encoders; MAE and SwAV are weaker baselines.
| Encoder Pair | MCT | TRiCo (Ours) | Δ (Gain) |
|---|---|---|---|
| DINOv2 + CLIP | 73.4 ± 0.4 | 76.2 ± 0.3 | +2.8% |
| DINOv2 + SwAV | 70.1 ± 0.6 | 73.5 ± 0.5 | +3.4% |
| MAE + CLIP | 69.7 ± 0.5 | 72.9 ± 0.4 | +3.2% |
| MAE + SwAV | 66.5 ± 0.6 | 70.2 ± 0.4 | +3.7% |
Q4: Method Complexity, Modularity, and Practicality
We thank the reviewer for raising important concerns regarding the complexity and modularity of TRiCo. While our method integrates several components, it remains a principled, end-to-end, and fully differentiable framework—distinct from multi-stage pipelines or ad-hoc ensembles. All components are trained jointly, and their interactions are synergistic rather than additive.
To assess modularity, we conduct an ablation in Appendix D to examine the contribution of each component. Results (Table: Ablation on TRiCo Components) show that removing any major module degrades performance, with the full TRiCo achieving 96.3% on CIFAR-10 (10% labeled), and drops of 1.1–2.2% observed when disabling MI filtering, the meta-teacher, generator, or co-training structure.
Table: Ablation on TRiCo Components (CIFAR-10, 10% labeled) All ablations are repeated over 5 random splits to ensure stability.
| Variant | Top-1 Acc. (%) | Δ vs. Full |
|---|---|---|
| Full TRiCo | 96.3 ± 0.3 | — |
| w/o MI filtering (Conf-τ only) | 95.2 ± 0.4 | −1.1 |
| w/o Meta-Teacher (Fixed threshold) | 94.9 ± 0.6 | −1.4 |
| w/o Generator (No PGD) | 95.0 ± 0.5 | −1.3 |
| Single Student (No Co-training) | 94.1 ± 0.5 | −2.2 |
Despite integrating multiple components, TRiCo remains efficient and simple to deploy: frozen backbones (no encoder training), lightweight embedding-level adversarial updates (no input gradients), and meta-learning with only first-order updates.
To quantify computational cost, we provide a component-wise breakdown in Appendix B and summarize below:
Table: Component-wise Complexity Breakdown of TRiCo
| Component | Added Overhead | Optimization Strategy / Description |
|---|---|---|
| Mutual Information Estimation | ~+4.5% FLOPs | 5 forward passes; stop-gradient; no backprop |
| Adversarial Generator (1-step PGD) | ~+1.5% FLOPs | Embedding-level perturbation only; no backward computation |
| Meta-Gradient Update | ~+1% FLOPs, ~+10% memory | First-order gradient only; unrolled once per step |
| Total (vs. MCT) | ~+7% FLOPs, ~+10% memory | No mixed-precision or checkpointing; further savings possible |
Overall, TRiCo offers a practical trade-off between performance and complexity. It does not rely on strong augmentations, learnable view encoders, or auxiliary networks, and operates efficiently on a single NVIDIA RTX A6000 (48GB) or equivalent 24GB-class GPU (e.g., RTX 3090, RTX 4090).
Summary
We appreciate the reviewer’s constructive feedback. Our response has clarified that TRiCo’s performance stems not from encoder strength but from its principled triadic co-training framework and mutual information–based regularization. Through new experiments under class imbalance and across diverse encoders, we demonstrate that TRiCo is robust, modular, and generalizes well beyond the settings seen during training. We have also addressed concerns regarding complexity by quantifying training overheads, which remain modest, and showing that each component contributes meaningfully to the overall gain. Furthermore, we clarified our use of statistical reporting and theoretical foundations to support reproducibility and rigor. We believe these clarifications reinforce TRiCo’s originality, practicality, and relevance to the SSL community.
We also plan to reorganize and enrich the experimental section in the final version to enhance transparency.
We hope these updates address the concerns and strengthen our submission.
Thank you for your informative comments which addressed my concerns, and I have raised the score.
Dear Reviewer rFnY, Thank you for your recognition and valuable suggestions. In the camera-ready version, we will carefully revise the manuscript and incorporate the newly added materials to further enhance the clarity and completeness of our work, striving to meet the high standards of the NeurIPS community. Thanks again for your support and for raising your score!
Authors of Paper 6732
(a) Summary of claims and findings: The paper proposes TRiCo, a triadic game-theoretic semi-supervised learning framework with three interacting roles: two student classifiers on complementary frozen representations, a meta-learned teacher that adaptively controls pseudo-label filtering and loss weights via validation feedback, and a non-parametric adversarial generator that perturbs embeddings to probe decision boundaries. Pseudo-label selection is based on mutual information (from MC-dropout) rather than confidence. The interaction is framed as a Stackelberg game with the teacher as leader. Experiments on CIFAR-10/100, SVHN, STL-10, ImageNet and class-imbalanced variants show gains over strong baselines in low-label regimes. During rebuttal, the authors added: (i) class imbalance results (CIFAR-10-LT, ImageNet-LT), (ii) variance reporting across runs, (iii) results with weaker encoders (MAE, SwAV), (iv) ablations indicating all components contribute, (v) computational overhead analysis (~7% FLOPs over Meta Co-Training), and (vi) a theoretical clarification proving existence of a Stackelberg equilibrium (Stackelberg–Nash) with a sketch and plan to add a new theorem.
(b) Strengths:
- Clear problem motivation; addresses pseudo-label reliability, stability, and hard-sample modeling in SSL.
- Principled formulation with a teacher-students-generator Stackelberg game; improved theoretical framing in rebuttal.
- MI-based filtering is better aligned with epistemic uncertainty; a new head-to-head comparison shows MI filtering outperforming meta-learned confidence thresholding across datasets.
- Broad empirical coverage, including long-tailed settings and weaker backbones; consistent gains and reduced variance reported.
- Practicality: frozen encoders, single-step embedding PGD, first-order meta-updates; overhead quantified and modest relative to related meta co-training.
(c) Weaknesses / missing pieces:
- Complexity: multi-agent design with several moving parts; while ablations show contributions, the method remains heavier than many SSL baselines and could be perceived as ensemble-like.
- Novelty is moderate at the component level (co-training, meta-learning, MI filtering, adversarial perturbation); contribution is in the integration and game-theoretic framing.
- Theory focuses on existence; no guarantees on convergence rates or equilibrium quality; the original Nash vs Stackelberg mismatch required rebuttal to fix.
- MI estimation via MC-dropout adds stochasticity and can be noisy early; mitigations are described, but a comparison with alternative uncertainty estimators (e.g., Dirichlet) remains future work.
- Some reported gains on large-scale settings (e.g., ImageNet 10%) are modest; strongest advantages appear in lower-label or imbalanced regimes.
(d) Reasons for decision:
The rebuttal addressed major reviewer concerns: clarified theory with a Stackelberg-equilibrium existence result; provided ablations showing each component’s necessity; added class-imbalance results and variance reporting; compared MI vs meta-learned confidence; and quantified overhead (~+7% FLOPs, ~+10% memory vs Meta Co-Training). While the design is complex and component-level novelty is incremental, the integration is coherent and empirically effective, especially in challenging low-label/imbalanced scenarios. This justifies a poster acceptance. The contribution is more methodological/practical than theoretical; appropriate for poster rather than a higher track.
(e) Rebuttal impact and discussion:
- R1 (fairness, imbalance, variance, complexity): Added imbalance (CIFAR-10-LT, ImageNet-LT), std across runs, weaker backbones; ablations and cost breakdown; concerns alleviated; score raised.
- R2 (novelty vs engineering, overhead): Agreed overhead exists but quantified; maintained that triadic design plus MI filtering improves robustness; reviewer remained positive.
- R3 (MI vs confidence, clarity, lossless theory analogy not applicable here): Provided head-to-head MI vs meta-learned confidence favoring MI; stability mitigations added; score increased.
- R4 (Nash vs Stackelberg): Authors corrected framing and provided Stackelberg-equilibrium existence; reviewer satisfied and raised score.
- R5 (complexity, modularity): Ablations show every component matters; overhead modest; reviewer kept weak accept.
I read the author response and discussion. No external information beyond the reviews and rebuttal was used.