PaperHub

Taiji Suzuki

~Taiji_Suzuki1

Total papers: 41
Avg. submissions per year: 20.5
Average score: 6.2
Accepted: 35/41

Conference distribution: ICLR 19, NeurIPS 13, ICML 8, COLM 1

Published Papers (41)

2025 (26 papers)

State Size Independent Statistical Error Bound for Discrete Diffusion Models. NeurIPS 2025, Poster. Avg. score 7.3 (4 reviews).
Flow matching achieves almost minimax optimal convergence. ICLR 2025, Poster. Avg. score 6.0 (4 reviews).
Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models. ICML 2025, Poster. Avg. score 4.9 (4 reviews).
Transformers Provably Solve Parity Efficiently with Chain of Thought. ICLR 2025, Oral. Avg. score 8.7 (3 reviews).
Weighted Point Set Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric. ICLR 2025, Spotlight. Avg. score 7.3 (3 reviews).
State Space Models are Provably Comparable to Transformers in Dynamic Token Selection. ICLR 2025, Poster. Avg. score 5.8 (4 reviews).
Hessian-guided Perturbed Wasserstein Gradient Flows for Escaping Saddle Points. NeurIPS 2025, Poster. Avg. score 8.2 (4 reviews).
Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency. NeurIPS 2025, Poster. Avg. score 6.8 (4 reviews).
Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation. ICML 2025, Poster. Avg. score 7.0 (3 reviews).
Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression. ICLR 2025, Poster. Avg. score 6.7 (3 reviews).
Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression. NeurIPS 2025, Poster. Avg. score 6.8 (4 reviews).
Direct Distributional Optimization for Provable Alignment of Diffusion Models. ICLR 2025, Poster. Avg. score 7.0 (5 reviews).
Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble. ICML 2025, Poster. Avg. score 5.5 (4 reviews).
From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers. NeurIPS 2025, Spotlight. Avg. score 6.4 (5 reviews).
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent. ICLR 2025, Spotlight. Avg. score 7.3 (3 reviews).
Nonlinear transformers can perform inference-time feature learning. ICML 2025, Poster. Avg. score 6.6 (4 reviews).
Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel. NeurIPS 2025, Poster. Avg. score 6.8 (4 reviews).
Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning. ICML 2025, Poster. Avg. score 5.5 (4 reviews).
Quantifying Memory Utilization with Effective State-Size. ICLR 2025, Rejected. Avg. score 5.6 (5 reviews).
Quantifying Memory Utilization with Effective State-Size. ICML 2025, Poster. Avg. score 4.9 (4 reviews).
Provable In-Context Vector Arithmetic via Retrieving Task Concepts. ICML 2025, Poster. Avg. score 6.3 (3 reviews).
Label Noise Gradient Descent Improves Generalization in the Low SNR Regime. ICLR 2025, Rejected. Avg. score 5.6 (5 reviews).
How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime? NeurIPS 2025, Poster. Avg. score 6.8 (5 reviews).
The Role of Label Noise in the Feature Learning Process. ICLR 2025, Rejected. Avg. score 5.3 (3 reviews).
On the Role of Label Noise in the Feature Learning Process. ICML 2025, Poster. Avg. score 6.3 (3 reviews).
When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars. COLM 2025, Poster. Avg. score 6.3 (4 reviews).

2024 (15 papers)

Minimax optimality of convolutional neural networks for infinite dimensional input-output problems and separation from kernel methods. ICLR 2024, Poster. Avg. score 7.3 (3 reviews).
Tensor methods to learn the Green's function to solve high-dimensional PDE. ICLR 2024, Rejected. Avg. score 3.5 (4 reviews).
Optimal criterion for feature learning of two-layer linear neural network in high dimensional interpolation regime. ICLR 2024, Poster. Avg. score 6.3 (4 reviews).
Pretrained Transformer Efficiently Learns Low-Dimensional Target Functions In-Context. NeurIPS 2024, Poster. Avg. score 5.3 (4 reviews).
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit. NeurIPS 2024, Poster. Avg. score 6.5 (4 reviews).
Improved statistical and computational complexity of the mean-field Langevin dynamics under structured data. ICLR 2024, Poster. Avg. score 6.7 (3 reviews).
Transformers are Minimax Optimal Nonparametric In-Context Learners. NeurIPS 2024, Poster. Avg. score 5.8 (4 reviews).
Understanding Convergence and Generalization in Federated Learning through Feature Learning Theory. ICLR 2024, Poster. Avg. score 5.8 (4 reviews).
Mean Field Langevin Actor-Critic: Faster Convergence and Global Optimality beyond Lazy Learning. ICLR 2024, Rejected. Avg. score 5.2 (5 reviews).
Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization. NeurIPS 2024, Poster. Avg. score 5.6 (5 reviews).
Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning. NeurIPS 2024, Poster. Avg. score 6.3 (4 reviews).
Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective. ICLR 2024, Rejected. Avg. score 5.0 (5 reviews).
Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems. ICLR 2024, Spotlight. Avg. score 6.8 (5 reviews).
Koopman-based generalization bound: New aspect for full-rank weights. ICLR 2024, Poster. Avg. score 6.3 (3 reviews).
On the Comparison between Multi-modal and Single-modal Contrastive Learning. NeurIPS 2024, Poster. Avg. score 5.7 (3 reviews).