Co-Evolution Learning
Abstract
Reviews and Discussion
This paper tackles a key challenge in advancing generative and representation models: the dependence on high-quality, diverse data for training. To address this limitation, the authors introduce a co-evolution framework that enables generative and representation models to improve each other. Both models progressively strengthen their performance by iterating through this mutual enhancement process.
Strengths
- The idea of co-evolution is interesting. It combines the two tasks in a unified framework so that the corresponding models can improve each other through a mutual enhancement process.
- The paper is well-organized, starting with a clear introduction of the current limitations and a detailed breakdown of the design of the proposed framework.
Weaknesses
- The use of a milder data augmentation strategy may have a limited impact on enhancing dataset diversity. Additionally, there is no ablation study (not even in Table 8) verifying the effectiveness of this approach, leaving its actual contribution to performance unclear.
- An interesting observation in Table 2 is that using a weak generation model leads to a decline in the performance of the trained representation model. However, no analysis of this phenomenon or its potential risks is provided, which would be valuable for understanding the limitations and stability of the proposed framework.
- In the experiments across different datasets in Section 4.3, the generation model implementations vary, yet no clear explanation is provided for these choices.
- In the co-evolution experiments, it is unclear whether the generation model is trained from scratch or utilizes pre-trained generative capabilities. This lack of clarification makes it difficult to discern the true source of the observed training benefits.
Questions
Please refer to the weaknesses.
In this work, the authors propose to simultaneously learn a representation model and a generative model through a mutual feedback loop. One path (R2G) uses the embeddings provided by the representation model to guide the learning of the generative model. The other path (G2R) leverages the generated images as augmented data to train the representation model. The combination of both is referred to as co-evolution (CORE). The experiments show that this setting improves the performance of both the generative and the representation model.
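For concreteness, a minimal sketch of this alternating loop as I understand it (the module definitions, losses, toy data, and update schedule are all illustrative assumptions, not the authors' code):

```python
# Minimal sketch of the CORE alternating loop as described in the review.
# All names, losses, and the schedule are illustrative assumptions.
import torch
import torch.nn as nn

latent_dim, image_dim = 16, 64

# Toy stand-ins for the representation model (encoder) and generative model.
encoder = nn.Sequential(nn.Linear(image_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
generator = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, image_dim))

opt_enc = torch.optim.Adam(encoder.parameters(), lr=1e-3)
opt_gen = torch.optim.Adam(generator.parameters(), lr=1e-3)

real_images = torch.randn(128, image_dim)  # stand-in for a real dataset

for round_idx in range(3):  # co-evolution rounds
    # R2G: embeddings from the (frozen) representation model guide the generator.
    for _ in range(100):
        with torch.no_grad():
            z = encoder(real_images)  # semantic condition, no gradient to encoder
        recon = generator(z)
        loss_g = (recon - real_images).pow(2).mean()  # surrogate for conditional likelihood
        opt_gen.zero_grad()
        loss_g.backward()
        opt_gen.step()

    # G2R: samples from the (frozen) generator augment representation training.
    for _ in range(100):
        z = torch.randn(128, latent_dim)  # sample latents
        with torch.no_grad():
            fake = generator(z)  # generated images as augmented data
        z_hat = encoder(fake)
        loss_r = (z_hat - z).pow(2).mean()  # latent reconstruction loss
        opt_enc.zero_grad()
        loss_r.backward()
        opt_enc.step()
```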
Strengths
- The proposed approach is easy to understand and provides a moderate performance improvement.
- The paper is well structured and presented.
- Some of the experiments provide useful insights.
Weaknesses
- In my opinion the novelty is very limited. R2G is equivalent to an autoencoder with a pretrained and fixed encoder, and G2R is equivalent to an autoencoder with a reconstruction loss in the latent space, i.e. with $z = E_1(x)$ and loss $\| E_2(D(z)) - z \|$, where the first encoder $E_1$ and the decoder $D$ are pretrained and fixed. These settings and their combination (i.e. CORE) have been used extensively in the context of autoencoders and image-to-image translation models (and cross-modal translation models); [A-D] are some early examples with a similar setting that come to mind (formalized in the sketch after the references below). The main difference is the use of more modern generative models (diffusion), but that is not novel in my view.
[A] Unsupervised cross-domain image generation, ICLR 2017
[B] MUNIT: Multimodal Unsupervised Image-to-Image Translation, ECCV 2018
[C] Perceptual Generative Autoencoders, ICML 2020
[D] Mix and match networks: encoder-decoder alignment for zero-pair image translation, CVPR 2018
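For concreteness, the latent reconstruction reading of G2R can be written as follows (the notation $E_1$, $E_2$, $D$ is mine, not the paper's; $E_1$ and $D$ are pretrained and fixed, and only $E_2$ is trained):

```latex
\mathcal{L}_{\mathrm{G2R}}(E_2) \;=\;
\mathbb{E}_{x \sim p_{\mathrm{data}}}
\Big\| \, E_2\big(D(E_1(x))\big) - E_1(x) \, \Big\|_2^2
```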
Questions
Please address my concern about the novelty, and justify why the proposed model is significantly different from autoencoders with latent reconstruction loss.
This paper proposes a co-learning framework called CORE to jointly learn representation and generative models. Specifically, it has two components: R2G, which uses a pretrained representation (vision) encoder to project data into a latent space $z$ and learns a generative model by maximizing the log-likelihood conditioned on $z$; and G2R, which samples diverse data points that can be used to learn a better latent representation. Experiments show that co-evolving these two components can improve task performance for both representation and generative tasks.
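In symbols, the R2G objective as summarized above would read (a sketch in my notation, not the paper's: $f_\phi$ is the pretrained encoder, $p_\theta$ the generative model):

```latex
z = f_{\phi}(x), \qquad
\max_{\theta}\; \mathbb{E}_{x \sim p_{\mathrm{data}}}
\big[\, \log p_{\theta}(x \mid z) \,\big]
```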
Strengths
-- The paper empirically shows that co-training can improve the training efficiency of generative models by 30%.
-- The proposed Co-evolution of Representation models and Generative models (CORE) framework is novel and interesting.
Weaknesses
-- The paper is a bit hard to follow; for example, after reading the introduction it is still not clear what the main contribution of the framework is.
-- Experiments are conducted only on small-scale datasets (CIFAR-10/100, etc.) that both SoTA generative models and representation learning methods have already mastered, making it hard to tell whether the gains come from parameter tuning or from joint learning.
Questions
-- How practical is it to implement this framework, given that the learning is iterative rather than end-to-end?
The paper introduces a co-evolution framework (CORE) that jointly trains generative and representation models to enhance each other iteratively. The framework leverages semantic embeddings from representation models to improve the semantic consistency of generated data and utilizes diverse generated data to enrich representations. The reviewers question the paper's novelty and the scale of its experiments. The authors did not provide a rebuttal to address these concerns, leading to a decision to reject the paper.
Additional Comments from Reviewer Discussion
No rebuttal was provided.
Reject