PaperHub

ICLR 2025 · Withdrawn
Overall rating: 4.0 / 10 (4 reviewers; ratings 5, 5, 3, 3; min 3, max 5, std 1.0)
Confidence: 3.5 · Correctness: 2.3 · Contribution: 2.0 · Presentation: 2.3

LLM-Guided Self-Supervised Tabular Learning With Task-Specific Pre-text Tasks

OpenReview · PDF
Submitted: 2024-09-26 · Updated: 2024-11-25

Abstract

Keywords
Self-supervised learning · Representation learning · Tabular data · Large language model

Reviews and Discussion

Review 1 (Rating: 5)

This paper introduces LLMs into self-supervised learning for tabular data. The authors propose the TST-LLM framework, which uses LLMs to generate task-specific pre-training tasks based on natural language descriptions of downstream tasks. The method consists of two main steps: first using LLMs to discover relevant features based on task descriptions, then using these features as supervision signals to train representations. Experiments on 22 benchmark datasets show significant improvements over existing methods.
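
To make the first step concrete, the snippet below is a minimal sketch of how an LLM could be prompted with a downstream-task description and column metadata to propose derived features. It is an illustrative reconstruction, not the paper's actual prompt or code; the prompt wording, the example task and columns, and the gpt-3.5-turbo model choice are all assumptions.

```python
# Illustrative sketch of LLM-based feature discovery (not the authors' code).
# Assumes an OpenAI-style chat API and a hypothetical downstream task.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

task_description = "Predict whether a patient is readmitted within 30 days."
columns = ["age", "num_prior_visits", "blood_pressure", "bmi"]

prompt = (
    f"Downstream task: {task_description}\n"
    f"Available columns: {', '.join(columns)}\n"
    "Propose new features, written as arithmetic expressions over the columns, "
    "that are likely to be predictive for this task. Return one expression per line."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)

# Each returned line is a candidate feature, e.g. "bmi * age".
candidate_features = response.choices[0].message.content.strip().splitlines()
print(candidate_features)
```

The candidate expressions would then be evaluated on the table to produce the supervision signals mentioned above.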

Strengths

  1. The method is well designed: it combines multiple components, including feature discovery, feature selection, and multi-task learning, into an end-to-end solution.
  2. Thorough experimental validation, with comprehensive comparative experiments and ablation studies across multiple datasets.

Weaknesses

  1. Limited novelty. The core idea is essentially using LLMs for feature engineering with prompt design, then using these features as self-supervision signals. Using LLMs as feature generators has been explored in many other fields - this paper simply applies it to tabular data.

  2. At the methodology level, the framework is quite straightforward and lacks deep technical innovation. The feature selection and multi-task learning components use basic methods without proposing new improvements.

  3. Lacks theoretical foundation. There's no analysis of why LLM-generated features help improve performance; no guarantees on feature quality; the whole method feels more like an empirical attempt.

Questions

Please refer to the weaknesses section.

Review 2 (Rating: 5)

This work focuses on learning semantic representations for tabular data, specifically tailored for downstream tasks. It aims to narrow the gap between representation learning objectives and downstream task objectives. Specifically, the paper proposes a two-stage procedure: (1) applying existing models (GPT-3.5) to extract new task-related features, and (2) using pre-text training to predict the values of these new features as the target. In contrast to existing methods for learning semantic representations over tables, this work aims to solve the mismatch between pretext tasks and downstream applications. The experimental results demonstrate the superiority of the proposed method.
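
As a rough illustration of stage (2), the sketch below trains an encoder with one regression head per discovered feature, so the pre-text task is to predict the values of the LLM-proposed features from the raw row. The two-layer MLP encoder follows a later reviewer's remark about the paper's encoder; the dimensions, optimizer settings, and dummy tensors are assumptions, and this is not the authors' implementation.

```python
# Sketch of the pre-text objective: regress LLM-discovered feature values
# from the raw tabular input (illustrative only, with dummy data).
import torch
import torch.nn as nn

n_raw, n_new, d_hidden = 10, 4, 64          # assumed sizes

encoder = nn.Sequential(                     # two-layer MLP encoder
    nn.Linear(n_raw, d_hidden), nn.ReLU(),
    nn.Linear(d_hidden, d_hidden),
)
heads = nn.ModuleList([nn.Linear(d_hidden, 1) for _ in range(n_new)])  # one head per new feature

opt = torch.optim.Adam(list(encoder.parameters()) + list(heads.parameters()), lr=1e-3)

x = torch.randn(256, n_raw)                  # raw table rows (dummy)
z = torch.randn(256, n_new)                  # values of the discovered features, used as targets

for _ in range(100):
    h = encoder(x)
    loss = sum(nn.functional.mse_loss(head(h).squeeze(-1), z[:, k])
               for k, head in enumerate(heads))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After pre-training, the encoder output would serve as the representation for the downstream task.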

Strengths

  • The idea of using existing models to extract new features is interesting. It can be used as feature augmentation for tabular data.

  • The proposed method shows superior performance to baseline methods employed in this work.

Weaknesses

  • The proposed training method is tailored to a specific downstream task, which limits its applicability to a broader spectrum of problems.

  • Lack of comparison against existing pretraining methods for tables (e.g., TUTA, TabLLM, XTab, TabPFN). Since the model is trained with objectives tailored to the downstream task, the learned representation naturally performs better than other representation learning methods. Adding comparisons against existing pretraining methods would provide a more thorough examination.

  • The small-scale training restricts the broader applicability of the proposed method. The training datasets are limited in comparison to existing self-supervised training approaches.

Questions

  • Will the author(s) provide a comparative analysis against existing self-supervised training methods, such as mask-and-predict approaches?

  • Do the author(s) plan to scale up the model? In the paper, the trained encoder is only a two-layer MLP.

Review 3 (Rating: 3)

The paper introduces TST-LLM, a framework that improves self-supervised learning by aligning pre-text tasks with downstream tasks. It uses task descriptions and data meta-information to discover relevant features, treating their values as ground-truth labels for pre-text prediction. TST-LLM outperforms methods such as STUNT and LFR on 22 benchmark datasets, achieving win ratios of 95% and 81%, respectively.

Strengths

  1. The method is well-explained and easy to understand. The detailed description of how features are discovered and integrated into the learning process is commendable.

  2. TST-LLM shows superior performance compared to the baselines. The model utilizes features from the LLM and achieves high performance across various datasets.

Weaknesses

  1. While the paper compares TST-LLM to several self-supervised learning methods, it overlooks some highly competitive supervised learning approaches (e.g., XGBoost [1], CatBoost [2], TabR [3], FT-Transformer [4], ModernNCA [5]), as well as general learning methods like TabPFN [6], Tp-Berta [7], and XTFormer [8]. These methods, along with LLM-based feature engineering techniques (CAAFE [9], OCTree [10], FeatLLM [11]), are already highly effective in tabular data tasks. A comparison to these methods would be necessary to justify the need for self-supervised learning with TST-LLM.

  2. TST-LLM seems to rely mainly on features generated by LLM-based methods like CAAFE [9], OCTree [10], and FeatLLM [11], and applies self-supervised learning on top of them. This does not offer significant novelty compared to those existing methods, as it mainly repurposes the generated features for a different training objective.

  3. The ablation study does not fully explore the potential of automatic feature engineering methods like OpenFE, which can generate a large number of features within the same time budget as LLM-based methods. A fairer comparison would apply TST-LLM's feature selection approach to the most informative features generated by OpenFE, rather than selecting features at random (see the sketch after this list).

  4. The method appears to be more effective on datasets with clear semantic relationships that can be easily articulated by LLMs. This limits the applicability of TST-LLM to domains where such semantic relationships are less obvious or harder to define, restricting its generalizability compared to other tabular data methods.
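
As a concrete reading of the comparison suggested in weakness 3, candidate features from an automatic generator such as OpenFE could be ranked by informativeness (e.g., mutual information with the downstream label) before the pre-text stage, instead of being sampled at random. The snippet below is only a sketch of that selection step on dummy data; it is not TST-LLM's or OpenFE's actual selection procedure.

```python
# Sketch: keep the top-k generated features by mutual information with the
# downstream label (dummy stand-in for generator output; assumed k = 10).
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
candidate_features = rng.normal(size=(500, 50))   # e.g., features produced by a generator
y = rng.integers(0, 2, size=500)                  # downstream classification labels

mi = mutual_info_classif(candidate_features, y, random_state=0)
top_k = np.argsort(mi)[::-1][:10]                 # indices of the 10 most informative features
selected = candidate_features[:, top_k]
print(top_k)
```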

[1] XGBoost: A Scalable Tree Boosting System

[2] CatBoost: Unbiased Boosting with Categorical Features

[3] TabR: Tabular Deep Learning Meets Nearest Neighbors in 2023

[4] Revisiting Deep Learning Models for Tabular Data

[5] Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later

[6] TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

[7] Making Pre-trained Language Models Great on Tabular Prediction

[8] Cross-Table Pretraining towards a Universal Function Space for Heterogeneous Tabular Data

[9] Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering

[10] Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning (OCTree)

[11] Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning (FeatLLM)

Questions

See weaknesses.

Review 4 (Rating: 3)

The paper presents TST-LLM, a novel framework for self-supervised representation learning tailored for tabular data. By defining pre-text tasks in a task-specific manner using natural language descriptions, TST-LLM effectively identifies and combines relevant features, leading to enhanced performance across various downstream tasks. The framework was evaluated on 22 diverse datasets, demonstrating its ability to outperform existing methods in both classification and regression tasks.

Strengths

  • Task-Specific Approach: TST-LLM utilizes natural language descriptions to create pre-text tasks that align closely with downstream objectives, improving relevance and performance.

  • Diverse Dataset Evaluation: The framework was tested on a wide range of datasets, ensuring robustness and generalizability across different types of tabular data.

Weaknesses

  • Lack of Novelty: The idea of tailoring pre-text tasks to specific downstream objectives has been explored in various prior frameworks, making the contributions appear incremental rather than groundbreaking. Additionally, the reliance on existing techniques such as supervised contrastive learning further diminishes the uniqueness of the approach, as many prior works have already demonstrated similar strategies. Overall, the integration of known methods does not provide a significant advance, raising questions about the paper's originality.

  • Complexity of Implementation: The task-specific design may require more intricate setup and understanding compared to simpler, task-agnostic approaches.

Questions

  1. What does it mean to treat the discovered features as the ground-truth labels?

  2. Have you tried any other open-source LLMs (e.g., Llama, Mistral) as the backbone model? How is the performance?

Withdrawal Notice

Thank you for the comment. We will further improve the paper based on the feedback.