D2R2: Diffusion-based Representation with Random Distance Matching for Tabular Few-shot Learning
We propose a novel framework for tabular few-shot learning, comprising a diffusion-based model with random distance matching for representation learning, and an instance-wise iterative prototype scheme for few-shot classification.
Abstract
Reviews and Discussion
This paper proposes a novel approach named Diffusion-based Representation with Random Distance matching (D2R2) for tabular few-shot learning. It leverages the powerful expression ability of diffusion models to extract essential semantic knowledge crucial for the denoising process. During the training process of the designed diffusion model, it introduces a random distance matching to preserve distance information in the embeddings, thereby improving the effectiveness of classification. During the classification stage, it introduces an instance-wise iterative prototype scheme to improve performance by accommodating the multimodality of embeddings and increasing clustering robustness. Experiments demonstrate the effectiveness of the proposed method. The main contributions of this paper are:
- It may be the first to propose a specifically designed diffusion method to learn semantic knowledge for tabular data.
- It proposes an innovative framework, D2R2, to extract representations in tabular few-shot learning.
- It introduces a novel classifier with instance-wise iteration prototypes to further improve few-shot classification performance.
Strengths
- The problem studied in this paper is interesting and valuable.
- This paper is well written and in good shape, which makes it easy to follow.
- Experimental results are promising and can validate the effectiveness of the proposed method.
Weaknesses
- In my opinion, the introduction does not sufficiently describe the shortcomings of existing tabular few-shot learning methods. Specifically, the authors should further emphasize the problems that heterogeneous feature types cause and that SOTA tabular few-shot learning methods ignore, to demonstrate the necessity and innovation of the proposed method.
- Building on the first issue, diffusion models should be introduced in response to the drawbacks of existing methods, rather than solely because of their strong expressiveness. The authors should state the motivation for leveraging diffusion models.
- Semi-supervised learning and self-supervised learning are introduced somewhat abruptly in this paper. The authors should explain the relationship between them and tabular few-shot learning, as well as their relevance to the D2R2 proposed in this paper.
Questions
- The authors should further emphasize the problems that heterogeneous feature types cause and that SOTA tabular few-shot learning methods ignore, to demonstrate the necessity and innovation of the proposed method.
- The authors should state the motivation for leveraging diffusion models.
- The authors should explain the relationship between semi-supervised learning, self-supervised learning, and tabular few-shot learning, as well as their relevance to the D2R2 proposed in this paper.
- Tabular few-shot learning does not seem to emphasize the variation of categories between the training (base) and test (novel) datasets. Does this indicate that it ignores the requirement for the algorithm to generalize?
Limitations
The authors point out the limitations of this work. No obvious potential negative societal impact remains.
Thanks for your comments!
Q1.
We plan to add more explanations about shortcomings of existing methods as follows.
Firstly, tabular data comprises heterogeneous features, which makes it important to model continuous and categorical features simultaneously. The current SOTA method STUNT[28] in our paper, and other existing few-shot learning methods designed for images or texts, treat all features as the same type, neglecting the unique information in different features. Failing to address heterogeneous features hinders the model's ability to capture complex data patterns and relationships. Moreover, because of its heterogeneous features, tabular data lacks straightforward augmentation techniques, which are easy to implement for images, as demonstrated by UMTRA[21].
Secondly, unlike images and texts, tabular data lacks strong spatial and sequential relationships between features. STUNT[28] generates pseudo-labels based on the assumption that features are substitutable, such as "occupation" acting as a proxy for "income", but this assumption may not universally apply to tabular data. Besides, the columns of tabular data can be arbitrarily permuted, so a model needs to be robust to such permutations, which existing methods ignore.
Thirdly, existing methods ignore both the influence of column permutation and the multimodal behavior within the same class.
Q2.
We will add more explanations about motivations as follows.
Existing methods fail to address the issues in Q1, so we propose a novel approach, D2R2, to tackle the above challenges in tabular data. Considering the first and second challenges above, we avoid relying on proxy methods or augmentation methods, which are limited by the lack of feature relationships or augmentation techniques in tabular data. Instead, we create an information bottleneck for extracting semantic knowledge, named D2R2, which leverages the strong expressiveness of the diffusion model and distance information from pairwise comparison. Notably, we handle numerical and categorical data types separately. For example, we introduce two distinct noises and generate two distinct random projections for the different feature types. Adding small noise to the input also enhances the model's robustness.
For the third challenge in Q1, we propose an instance-wise iterative prototype classifier, which can construct stable prototypes while revealing the multimodal behavior within the same class.
Q3.
Based on the unique characteristics of tabular data, as detailed in Q4, we address scenarios where there is an unlabeled training set and a very limited labeled support set (K-shot) to help predict class labels for a testing query set. This follows the problem definition in [28] and is stated in Section 3 of our paper. Such scenarios are common in critical applications like credit risk assessment[1] and diagnosing patients with rare or new diseases[2]. Based on our research, three types of methods can address these scenarios: few-shot learning, semi-supervised learning, and self-supervised learning. Meta-learning is one of the most common techniques in few-shot learning, and the SOTA method [28] for few-shot tabular learning in our paper follows this scheme. Our method, on the other hand, innovates within the prototype scheme by enhancing embedding generation and prototype classification. In semi-supervised learning, a model is trained on labeled data and then used to predict labels for the unlabeled data, and the predicted labels are treated as true labels in further training rounds. This method aligns with our problem definition; however, it requires more labeled data than few-shot learning. Self-supervised learning excels at generating robust representations. Since our approach involves generating representations, we compare our method with the SOTA self-supervised learning methods[54,4,47] in the tabular domain. The mentioned methods serve to position our work within the broader context of existing methodologies that address similar problems. We have already outlined the relevant methods in Section 2, and we will include the more detailed discussion above in the updated version.
[1] Chen N, Ribeiro B, Chen A. Financial credit risk assessment: a recent review[J]. Artificial Intelligence Review, 2016, 45: 1-23.
[2] Cereda D, Tirani M, Rovida F, et al. The early phase of the COVID-19 outbreak in Lombardy, Italy[J]. arXiv preprint arXiv:2003.09320, 2020.
Q4.
Our problem setting follows the definition provided in [28], where there is no variation of categories between the training (base) and test (novel) datasets, but the training set is unlabeled, which is a more common scenario in tabular data. Tabular datasets consist of numerical or categorical inputs that have specific and explicit meanings, and classes are well-defined and consistent across the dataset. It is rare to encounter new categories based solely on existing features. New categories usually emerge only with the introduction of new features. This differs from domains like image or text data, which have spatial and sequential relationships. For example, new image categories can be effectively classified based on the learned spatial structure patterns between pixels. Our study focuses on the most common scenario in tabular few-shot learning, as mentioned in Section 3 of our paper and supported by the problem setting in [28].
However, we also conducted experiments to test how our method generalizes when categories vary between the training (base) and test (novel) datasets. For the 10-class classification dataset 'optdigits', we randomly removed one class from the training set and then evaluated on the test set. The experimental results are as follows, demonstrating the robustness and effectiveness of our method.
| Scenario | shot=1 | shot=5 |
|---|---|---|
| Remove Class #1 | 75.21 | 87.37 |
| Remove Class #2 | 72.15 | 84.63 |
| Remove Class #6 | 77.16 | 88.62 |
| Non-Remove | 81.13 | 90.73 |
Thanks for the response. The response addresses the majority of my concerns. Therefore, I update my score to "Weak Accept".
Thank you for your positive feedback and for recognizing the value of our work! We will certainly enhance the clarity of our manuscript based on your suggestions.
The paper proposes a new diffusion-based representation learning method for tabular data, namely D2R2, specifically for few-shot learning. It is the first paper to use the diffusion model for tabular data representation learning. The method trains a conditional diffusion model with a combined loss, consisting of the vanilla diffusion reconstruction loss and a random distance matching loss, to learn an embedding function that encodes a representation of the data at different noise levels. The resulting embedding is used as the conditional information for the diffusion model. Further, random linear projections are applied to the original data, and the random distance matching loss preserves distances between the embedding space and the random projection space. Two distinct projections are used for numerical and categorical data. After training, instance-wise iterative prototypes are generated for more accurate and stable few-shot learning. The paper conducts experiments on 7 datasets from OpenML-CC18, showing performance superior to a variety of baselines.
Strengths
- The paper introduced the first diffusion-based method for tabular few-shot representation learning.
- The proposed method has superior performance compared to SOTA methods.
- The paper is well-written and easy to follow.
Weaknesses
- Even though D2R2 is the first method to utilize the diffusion model for tabular representation learning, it is not the first paper to analyze or utilize the diffusion model for representation learning. The paper lacks discussion of related works in related areas such as [1] [2] [3].
- After training, the paper only extracts features of clean tabular data during few-shot learning, while the embedding function is learned for different noise levels. There is no analysis of the effect of different noise levels as input to the embedding function for few-shot learning.
- The method is limited to tabular data only, for example through the specific projection head design for numerical and categorical data.
[1] Xinlei Chen, Zhuang Liu, Saining Xie, Kaiming He. Deconstructing Denoising Diffusion Models for Self-Supervised Learning. 2024.
[2] Sarthak Mittal, Korbinian Abstreiter, Stefan Bauer, Bernhard Schölkopf, Arash Mehrjou. Diffusion Based Representation Learning. 2023.
[3] Xingyi Yang, Xinchao Wang. Diffusion Model as Representation Learner. 2023.
Questions
- See weakness 1. What's the relationship and key novelty in the method compared to existing methods?
- Will other noise levels be more useful for few-shot learning? Or will a combination of different noise levels be better?
Limitations
The limitations are discussed, and the paper states that it discusses potential negative societal impact.
Thanks for your comments!
Q1. What's the relationship and key novelty in the method compared to existing methods?
Existing papers on representation learning with diffusion models are all designed for image data. Paper [1] introduces a "latent Denoising Autoencoder" (LDAE) architecture where the learned representations are used for a denoising autoencoder. Unlike LDAE, we learn a latent z that supports the diffusion process rather than an autoencoder reconstruction process, which is less suitable for tabular data as discussed in [4]. The success of the representations in paper [2] depends on the quality of the generated images, meaning accurate image generation is crucial for performance. Generating high-quality data is especially difficult for tabular data due to its varied features and the absence of strong spatial or sequential relationships [5]. Our approach, however, does not rely on the quality of generated data, making our model more effective with tabular data. Paper [3] extracts representations from noisy data at a specific time step t in the diffusion process, using a student model trained with task labels. Our method differs by deriving representations directly from the original feature space without requiring a student model, eliminating the need for labels during training. This is especially critical for tabular few-shot learning, where labeled data is scarce.
Moreover, unlike existing papers, our work modifies diffusion models according to the characteristics of tabular data, including heterogeneous features, the lack of strong spatial and sequential relationships between features, the influence of column permutation, and multimodality within the same class. We design a conditional diffusion process with distance matching to extract representations from the original space, which is not influenced by the quality of newly generated samples. Besides, tabular columns can be arbitrarily permuted without altering the underlying information, so models for tabular data need to be robust to such permutations. We perturb the original data with two distinct types of relatively small noise, for numerical and categorical input respectively (Section 4.1), which enhances the robustness of the performance. Moreover, deriving representations solely from the "diffusion-only" model is not effective enough, as shown in our manuscript's ablation study (Table 2). Therefore, during diffusion training we also match pairwise distances in the embedding space to pairwise distances in a randomly projected space, using different projection matrices for the different feature types. Additionally, we propose the instance-wise iterative prototype to address multimodality in tabular data.
[4] Nam J, Tack J, Lee K, et al. Stunt: Few-shot tabular learning with self-generated tasks from unlabeled tables[J]. arXiv preprint arXiv:2303.00918, 2023.
[5] Xu L, Skoularidou M, Cuesta-Infante A, et al. Modeling tabular data using conditional gan[J]. Advances in neural information processing systems, 2019, 32.
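A minimal sketch of the distance matching idea described in the response to Q1 above, assuming a Gaussian random projection and a mean-squared gap between pairwise Euclidean distances. The function names, the projection scaling, and the exact form of the loss are illustrative assumptions and are not taken from the paper; the paper uses separate projections for numerical and categorical features, whereas this sketch handles a single feature block.

```python
import math
import random


def pairwise_distances(rows):
    """Euclidean distance between every pair of rows (pairs with i < j)."""
    dists = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            dists.append(math.dist(rows[i], rows[j]))
    return dists


def random_projection(rows, out_dim, rng):
    """Project rows with a Gaussian random matrix scaled by 1/sqrt(out_dim) (assumed scaling)."""
    in_dim = len(rows[0])
    matrix = [[rng.gauss(0.0, 1.0) / math.sqrt(out_dim) for _ in range(out_dim)]
              for _ in range(in_dim)]
    return [[sum(row[k] * matrix[k][m] for k in range(in_dim)) for m in range(out_dim)]
            for row in rows]


def distance_matching_loss(embeddings, projected):
    """Mean squared gap between pairwise distances in the two spaces (assumed loss form)."""
    d_emb = pairwise_distances(embeddings)
    d_proj = pairwise_distances(projected)
    return sum((a - b) ** 2 for a, b in zip(d_emb, d_proj)) / len(d_emb)
```

In a training loop, `embeddings` would come from the embedding network and `projected` from `random_projection` applied to the raw features, with this term added to the noise reconstruction loss.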
Q2. Will other noise levels be more useful for few-shot learning? Or will a combination of different noise levels be better?
The timestep t = 1, ..., T indicates the noise level in diffusion models. In our experimental analysis, we observed that performance improves as t increases during the initial stages. However, this improvement plateaus after the first few timesteps, and performance then stabilizes with minimal fluctuations, as shown in the following table. We speculate that this is because, as t approaches 0, the noise level decreases, so that only the fine-grained details are lost. Hence, the representation learns to keep the information needed to recover these fine-grained details. In contrast, as t approaches T, the noise level rises, and the mutual information between xt and x0 begins to decrease. In these situations, effective denoising requires that all information about x0 be thoroughly encoded, which includes the most information about the data class. Thus, as stated in line 200 of our manuscript, "We focus on larger timesteps thus extract the low-frequency semantic information rather than details." In our experiments, we selected the last step (T) for all datasets to ensure stable performance. We will include a discussion of this observation in the updated version of our manuscript.
| Dataset | Shot | First Step | Middle Step | 80th Percentile Step | Last Step | Average of All Steps |
|---|---|---|---|---|---|---|
| optdigits | 1 | 72.39 | 79.21 | 80.93 | 81.13 | 80.66 |
| optdigits | 5 | 84.57 | 88.67 | 91.73 | 90.73 | 89.66 |
| cmc | 1 | 37.19 | 42.26 | 42.86 | 42.88 | 42.74 |
| cmc | 5 | 36.28 | 42.73 | 43.04 | 43.39 | 42.37 |
In the updated version, we will include the above discussion following your valuable comments.
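The relation between timestep and noise level discussed in this response can be illustrated with the standard DDPM-style forward process, in which the noisy sample is the clean sample scaled by sqrt(alpha_bar_t) plus Gaussian noise scaled by sqrt(1 - alpha_bar_t). The sketch below is only an illustration: the linear beta schedule and its endpoints are assumptions, since the thread does not state which schedule D2R2 uses.

```python
import math


def signal_and_noise_scales(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return (sqrt(alpha_bar_t), sqrt(1 - alpha_bar_t)) for t = 1..num_steps.

    Assumes a linear beta schedule; the endpoints are illustrative defaults.
    """
    scales = []
    alpha_bar = 1.0
    for step in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * step / max(num_steps - 1, 1)
        alpha_bar *= 1.0 - beta  # cumulative product of (1 - beta)
        scales.append((math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)))
    return scales
```

Under this assumed schedule the signal scale shrinks and the noise scale grows monotonically with t, which matches the qualitative argument that larger timesteps leave less information about the clean sample in the noisy one.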
Thanks for the response. I've read through the other reviewers' feedback and responses as well. I agree with some reviewers' opinions that the paper lacks method motivation, experiments on scalability, and some technical details, but I am overall satisfied with the authors' responses to the existing problems and will keep my score as it is.
Thank you for your positive feedback and for recognizing the value of our work! We appreciate your acknowledgment of our responses in addressing the existing problems.
The paper introduces a novel method, Diffusion-based Representation and Random Distance Matching (D2R2), to address the challenge of few-shot learning on tabular data. This approach leverages the robust representational capacity of diffusion models to extract essential semantic knowledge from tabular data, thereby enhancing the performance of downstream classification tasks. Additionally, the paper presents the Random Distance Matching (RDM) loss function, which preserves distance information within embedding vectors during diffusion model training, further boosting classification performance. During the classification phase, the authors propose an instance-based iterative prototype scheme to accommodate the multimodal characteristics of the embedding vectors, thus improving clustering robustness. Extensive experiments conducted on various few-shot learning benchmarks for tabular data demonstrate that D2R2 surpasses current benchmark methods. In conclusion, this paper introduces an innovative few-shot learning method for tabular data, achieving significant advancements in extracting effective semantic representations and improving classification performance.
Strengths
- The use of diffusion models to extract effective semantic representations from tabular data is a novel approach that overcomes the limitations of traditional supervised learning methods.
- The introduction of the Random Distance Matching (RDM) loss function, which preserves distance information within the embedding vectors during diffusion model training, further enhances classification performance.
- Extensive experiments conducted on multiple few-shot learning benchmarks demonstrate that D2R2 outperforms the latest benchmark methods, validating the effectiveness of the proposed method.
- The paper provides comprehensive explanations and descriptions of the various components of the method, including the expressive power of the diffusion model, the role of the RDM loss function, and the principles of the iterative prototype scheme, making the methodology and process clear and understandable.
Weaknesses
- The paper lacks a clear description of the input format for tabular data. Specifically, it is unclear whether the tabular data is input in text form or as images. Clarification on this point is essential, as the input format may significantly impact performance.
- The paper utilizes a diffusion model to enhance the semantic representational capacity of tabular data. However, it remains uncertain whether the diffusion model is effective across all types of tabular data. Detailed demonstrations and explanations of how the model captures and enhances these semantic features are needed to substantiate its broad applicability.
- The authors introduce a diffusion model that differs from existing methods. It is crucial to provide additional information regarding the model's parameter count and FLOPs (floating-point operations) to allow for a thorough comparison of its computational efficiency with that of other models. This would help in understanding the trade-offs between performance gains and computational costs.
Questions
Please see the Weakness section.
Limitations
The author has already provided a description of the limitations, but it needs to be further elaborated.
Thanks for your comments!
Q1. The paper lacks a clear description of the input format for tabular data.
In our study, "tabular data" refers to datasets organized in tables, a structured format that presents information in rows and columns. A table consists of instances (rows) and features (columns). The features of an instance may or may not be strongly related. The features of tabular data can be a mixture of numerical values and categorical indicators. For example, the "income" dataset classifies individuals' incomes based on features like "age" (numerical), "education" (categorical), and so on. Table 3 in the Appendix provides detailed information on each dataset.
Tabular data is widely used in practical applications, and few-shot learning is particularly important in this context, as limited labeled tabular data is inherently common in many real-world applications, such as financial fraud detection [1], disease diagnosis [2], and social science [3]. Unlike CV and NLP data, tabular data contains a mixture of numerical and categorical features; it lacks spatial and sequential relationships between columns, making it more challenging to extract semantic knowledge; its column order can be arbitrarily permuted without affecting the tabular information; and tabular embeddings within the same class may exhibit multiple modes, as demonstrated in Figure 3 of our manuscript. The significance and challenges of tabular few-shot learning motivate us to conduct this work.
Q2. Is the diffusion model effective across all types of tabular data?
From the experimental analysis, our datasets (seven in the paper and two additional datasets in the response to Reviewer wij1's Q1) involve various types of tabular data. These include varying sample sizes (from 768 to 48842), feature dimensions (from 8 to 24482), feature types (numerical, categorical, or a mixture of both), and numbers of classes (from 2 to 10). Our experimental results demonstrate the effectiveness of our proposed method and its scalability across these various types of tabular data.
From a theoretical perspective, the reasons that the designed diffusion model can extract semantic knowledge are twofold. Firstly, the diffusion model, with its powerful expressiveness, encodes the information needed for denoising. Specifically, in conditional diffusion models, the noise reconstruction loss trains the noise prediction function to predict the true noise given the noisy sample, the known timestep, and the conditional information. If the conditional information were the clean sample itself, we could expect the noise prediction function to recover the true noise almost perfectly. By replacing the conditional information with a function that maps the sample to an embedding space of lower dimension than the sample, we introduce an information bottleneck into the noise reconstruction process. This forces the function to extract the information that is effective for denoising, leading to representation learning through the noise reconstruction loss.
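The bottleneck argument above can be written compactly as follows. The notation is assumed for illustration and is not taken from the paper: $\epsilon_\theta$ is the noise prediction function, $f_\phi$ is the embedding function (written here with the clean sample $x_0$ as its only input, which is an assumption), $x_t$ is the noisy sample at timestep $t$, and $d' < d$ expresses the lower-dimensional embedding space.

```latex
\mathcal{L}_{\text{noise}}
  = \mathbb{E}_{x_0,\, t,\, \epsilon}
    \left\| \epsilon - \epsilon_\theta\!\left(x_t,\, t,\, f_\phi(x_0)\right) \right\|_2^2,
\qquad
f_\phi : \mathbb{R}^{d} \to \mathbb{R}^{d'}, \quad d' < d .
```

Because $\epsilon_\theta$ only sees the clean sample through $f_\phi$, minimizing this loss rewards embeddings that retain whatever is most useful for denoising.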
Q3. What are the model's computational costs?
The model complexity is manageable because our framework uses two neural networks, the embedding network and the noise prediction network, both of which are 3-layer MLPs in our paper. These settings (hidden dimensions, embedding dimensions, and model structure) can be adjusted to meet different needs. In terms of computational efficiency, the main time-consuming factor after the embedding model is the calculation of the instance-wise iterative prototype. For an N-way K-shot problem with L iterations and a query set of size Q, the computational complexity is O(NKLQ). In few-shot learning, where N, K, and L are small, the computational complexity is linear in the size of the query set. In the updated version, we will include the above discussion following your valuable comments.
[1] West J, Bhattacharya M. Intelligent financial fraud detection: a comprehensive review[J]. Computers & security, 2016, 57: 47-66.
[2] Kim S J, Choi S J, Jang J S, et al. Innovative nanosensor for disease diagnosis[J]. Accounts of Chemical Research, 2017, 50(7): 1587-1596.
[3] Hicks D. The four literatures of social science[J]. Handbook of quantitative science and technology research: The use of publication and patent statistics in studies of S&T systems, 2004: 473-496.
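To make the O(NKLQ) cost stated in the response to Q3 concrete, the sketch below shows one loop structure with that cost. It is not the authors' instance-wise iterative prototype scheme, whose details are not given in this thread. As a stand-in, it uses a query-seeded, mean-shift-style refinement: for each query and each class, the prototype starts at the query and is repeatedly replaced by a distance-weighted mean of that class's support embeddings. The function names and the `temperature` parameter are hypothetical.

```python
import math


def refine_prototype(query, support, num_iters, temperature=1.0):
    """Start at the query and take mean-shift-style steps over one class's support set."""
    proto = list(query)
    for _ in range(num_iters):  # L iterations
        # K squared distances from the current prototype to the support embeddings.
        sq = [sum((s - p) ** 2 for s, p in zip(row, proto)) for row in support]
        smallest = min(sq)  # subtract the minimum for numerical stability
        weights = [math.exp(-(d - smallest) / temperature) for d in sq]
        total = sum(weights)
        proto = [sum(w * row[m] for w, row in zip(weights, support)) / total
                 for m in range(len(proto))]
    return proto


def classify(queries, support_by_class, num_iters=3):
    """Assign each query to the class whose refined prototype is nearest."""
    predictions = []
    for query in queries:  # Q queries
        best_label, best_dist = None, float("inf")
        for label, support in support_by_class.items():  # N classes
            proto = refine_prototype(query, support, num_iters)
            dist = math.dist(query, proto)
            if dist < best_dist:
                best_label, best_dist = label, dist
        predictions.append(best_label)
    return predictions
```

The nested loops perform Q × N × L × K distance evaluations, so with N, K, and L fixed the running time grows linearly in the number of queries. Seeding at the query lets a class with several modes contribute the mode nearest to that query, which is one simple way to accommodate multimodal classes.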
Dear authors, I greatly appreciate your responses and the additional results presented in the PDF. I think the authors addressed all my comments and I think this work should be accepted.
We sincerely appreciate your positive feedback and are pleased that our responses addressed your comments. Thank you for recommending our work for acceptance!
The paper proposed a few-shot learning framework for tabular data by designing a diffusion based semantic knowledge encoder and introducing a random distance matching mechanism to preserve distance information in the embeddings. During classification, an instance-wise iterative prototype scheme is utilized to improve performance by accommodating the multimodality of embeddings and increasing clustering robustness. Extensive experiments on multiple datasets show that D2R2 achieves state-of-the-art performance compared to other schemes.
Strengths
- The proposed approach that combines the diffusion model with random distance matching for tabular few-shot learning is relatively unexplored compared to classical few-shot learning tasks.
- The paper is well-written and organized, with detailed explanations of the challenges, methodology, and experimental results.
Weaknesses
- The feature dimensions of the benchmark datasets are relatively low, which leaves the scalability to large datasets or those with extremely high dimensionality not fully explored.
- The pseudo-label validation scheme for hyperparameter selection relies on the assumption that the clustering of raw features can provide a reliable proxy for true labels, which might not always hold in practice.
Questions
- How does the proposed method scale with larger datasets or those with higher feature dimensions? Are there any practical limitations or considerations for scaling?
- Could the authors provide more empirical evidence or analysis on the pseudo-label validation scheme, such as when the clustering might not accurately reflect the true class structure?
Limitations
No potential negative societal impact is observed.
Thanks for your comments!
Q1. Does the proposed method scale with larger datasets or those with higher feature dimensions?
In our experiments, we utilized seven datasets that are widely used in tabular data research, as referenced in popular papers [28], [2], and [54] in our paper. Among these, the "income" dataset has the most samples (48842), and the "pixel" dataset has the highest feature dimension (240).
To further investigate the scalability of our method, we added two new datasets to our experiments. The "nomao" dataset has a larger size of 34465 samples, and the "breast" dataset has a significantly higher feature dimension of 24482. The following table shows the results, in which our method consistently outperforms the other baselines. These results will be included in the updated version of our paper, demonstrating that our method scales well to larger and higher-dimensional datasets.
| Dataset | Shot | CatBoost | KNN | SubTab | VIME | SCARF | RTDL | STUNT | D2R2 |
|---|---|---|---|---|---|---|---|---|---|
| nomao | 1-shot | 0.636 | 0.635 | 0.676 | 0.647 | 0.689 | 0.683 | 0.715 | 0.794 |
| nomao | 5-shot | 0.753 | 0.737 | 0.761 | 0.749 | 0.776 | 0.736 | 0.814 | 0.826 |
| breast | 1-shot | 0.697 | 0.718 | 0.729 | 0.701 | 0.753 | 0.763 | 0.769 | 0.776 |
| breast | 5-shot | 0.770 | 0.794 | 0.827 | 0.858 | 0.844 | 0.796 | 0.868 | 0.882 |
Q2. Could the authors provide more empirical evidence or analysis on the pseudo-label validation scheme?
In the few-shot learning scenario, particularly in 1-shot classification, there are no labeled samples available for validation, which leads most researchers to rely on fixed parameters. However, we argue that a better set of hyper-parameters can always be identified for a specific dataset. To this end, we generate pseudo-labels for a validation set using the fundamental principles of the soft k-means algorithm. Firstly, achieving complete consistency between the validation labels and the true labels is indeed challenging. Nonetheless, existing studies have demonstrated satisfactory performance despite minor discrepancies between the validation and test sets [1]. It is important to note that we have consciously avoided over-tuning the hyper-parameters on the validation set, limiting our adjustments to only three parameters, as detailed in the appendix. Secondly, we posit that raw features possess inherent clustering properties, and the soft k-means method has been proven effective and widely adopted in various contexts, as supported by references [3][4]. Our empirical evidence further substantiates this claim.
In our experiments, the accuracy of the pseudo-labels was evaluated on several datasets, all showing relatively high accuracy. This leads us to believe that pseudo-labels serve as a relatively reliable proxy for true labels during validation. For the "income" dataset, where the pseudo-label accuracy was comparatively low relative to the number of classes, we conducted experiments without using a validation set but with fixed hyper-parameters. The results indicated that selecting parameters using a pseudo-label validation set achieved better performance than using fixed hyper-parameters. The robustness of our method, demonstrated across multiple datasets, makes our approach applicable to a wide variety of tabular datasets.
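A minimal sketch of how pseudo-labels could be produced with soft k-means on raw features, as described in the response to Q2. The initialization, the `temperature` parameter, the number of iterations, and the final hard assignment are illustrative assumptions; the thread does not specify these details.

```python
import math
import random


def soft_kmeans_pseudo_labels(rows, num_clusters, num_iters=20, temperature=1.0, seed=0):
    """Cluster rows with soft k-means and return one hard pseudo-label per row."""
    rng = random.Random(seed)
    # Assumed initialization: pick distinct rows as the starting centers.
    centers = [list(row) for row in rng.sample(rows, num_clusters)]
    dim = len(rows[0])
    resp = []
    for _ in range(num_iters):
        # Soft assignment: a softmax over negative squared distances to the centers.
        resp = []
        for row in rows:
            sq = [sum((x - c) ** 2 for x, c in zip(row, center)) for center in centers]
            smallest = min(sq)  # subtract the minimum for numerical stability
            weights = [math.exp(-(d - smallest) / temperature) for d in sq]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # Update: each center becomes the responsibility-weighted mean of all rows.
        for k in range(num_clusters):
            mass = sum(r[k] for r in resp)
            if mass > 0.0:
                centers[k] = [sum(r[k] * row[m] for r, row in zip(resp, rows)) / mass
                              for m in range(dim)]
    # Hard pseudo-label: the cluster with the largest responsibility.
    return [max(range(num_clusters), key=lambda k: r[k]) for r in resp]
```

The cluster indices returned here are arbitrary, so comparing them with true labels (as in the pseudo-label accuracy table) requires matching clusters to classes first, for example through the few labeled support samples.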
Table 1: Accuracy of pseudo-labels
| Dataset | DNA | Income | Karhunen | Optdigits | Nomao | Breast |
|---|---|---|---|---|---|---|
| Accuracy | 0.37 | 0.52 | 0.53 | 0.47 | 0.59 | 0.63 |
Table 2: Fixed hyper-parameters vs. tuned hyper-parameters
| Setting | Hyper-parameter 1 | Hyper-parameter 2 | Hyper-parameter 3 | Accuracy |
|---|---|---|---|---|
| Fix | 0.1 | 10 | 0.1 | 0.742 |
| Fix | 0.1 | 80 | 0.1 | 0.755 |
| Fix | 0.5 | 10 | 0.1 | 0.739 |
| Fix | 0.5 | 80 | 0.5 | 0.721 |
| Fix | 0.9 | 10 | 0.1 | 0.732 |
| Fix | 0.9 | 80 | 0.9 | 0.743 |
| Tune | 0.775 | 10 | 0.325 | 0.758 |
[1] Peng X, Usman B, Kaushik N, et al. Visda: The visual domain adaptation challenge[J]. arXiv preprint arXiv:1710.06924, 2017.
[3] Ferraro M B, Giordani P. Soft clustering[J]. Wiley Interdisciplinary Reviews: Computational Statistics, 2020, 12(1): e1480.
[4] Singh V K, Tiwari N, Garg S. Document clustering using k-means, heuristic k-means and fuzzy c-means[C]//2011 International conference on computational intelligence and communication networks. IEEE, 2011: 297-301.
Thanks for the authors' response. The authors did additional experiments to answer my questions and they partially address my question related to the dimension of studied data. I have updated my scores.
Thank you for your positive feedback and for recognizing the value of our work! We will incorporate the new experimental results into our revised version.
This paper presents a diffusion-based representation learning method for tabular few-shot learning. It received four weak accepts after the response. Its merits, including the interesting idea, good writing, and sufficient experiments, are well recognized by the reviewers. The authors addressed the concerns about the motivation and unclear technical details well in their response. I agree with the reviewers and think this work should be accepted. Please also incorporate these details in the revised manuscript.