Taiji Suzuki

~Taiji_Suzuki1

Total papers: 41

Submissions per year: 20.5

Average rating: 6.2

Accepted: 35/41

Venue distribution

ICLR: 19

NeurIPS: 13

ICML: 8

COLM: 1

Papers (41)

2025 (26 papers)

State Size Independent Statistical Error Bound for Discrete Diffusion Models

NeurIPS 2025 Poster

Flow matching achieves almost minimax optimal convergence

ICLR 2025 Poster

Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models

ICML 2025 Poster

Transformers Provably Solve Parity Efficiently with Chain of Thought

Weighted Point Set Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric

ICLR 2025 Spotlight

State Space Models are Provably Comparable to Transformers in Dynamic Token Selection

ICLR 2025 Poster

Hessian-guided Perturbed Wasserstein Gradient Flows for Escaping Saddle Points

NeurIPS 2025 Poster

Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency

NeurIPS 2025 Poster

Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation

ICML 2025 Poster

Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression

ICLR 2025 Poster

Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression

NeurIPS 2025 Poster

Direct Distributional Optimization for Provable Alignment of Diffusion Models

ICLR 2025 Poster

Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble

ICML 2025 Poster

From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers

NeurIPS 2025 Spotlight

On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent

ICLR 2025 Spotlight

Nonlinear transformers can perform inference-time feature learning

ICML 2025 Poster

Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel

NeurIPS 2025 Poster

Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning

ICML 2025 Poster

Quantifying Memory Utilization with Effective State-Size

ICLR 2025 Rejected

Quantifying Memory Utilization with Effective State-Size

ICML 2025 Poster

Provable In-Context Vector Arithmetic via Retrieving Task Concepts

ICML 2025 Poster

Label Noise Gradient Descent Improves Generalization in the Low SNR Regime

ICLR 2025 Rejected

How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime?

NeurIPS 2025 Poster

The Role of Label Noise in the Feature Learning Process

ICLR 2025 Rejected

On the Role of Label Noise in the Feature Learning Process

ICML 2025 Poster

When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars

COLM 2025 Poster

2024 (15 papers)

Minimax optimality of convolutional neural networks for infinite dimensional input-output problems and separation from kernel methods

ICLR 2024 Poster

Tensor methods to learn the Green's function to solve high-dimensional PDE

ICLR 2024 Rejected

Optimal criterion for feature learning of two-layer linear neural network in high dimensional interpolation regime

ICLR 2024 Poster

Pretrained Transformer Efficiently Learns Low-Dimensional Target Functions In-Context

NeurIPS 2024 Poster

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

NeurIPS 2024 Poster

Improved statistical and computational complexity of the mean-field Langevin dynamics under structured data

ICLR 2024 Poster

Transformers are Minimax Optimal Nonparametric In-Context Learners

NeurIPS 2024 Poster

Understanding Convergence and Generalization in Federated Learning through Feature Learning Theory

ICLR 2024 Poster

Mean Field Langevin Actor-Critic: Faster Convergence and Global Optimality beyond Lazy Learning

ICLR 2024 Rejected

Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization

NeurIPS 2024 Poster

Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning

NeurIPS 2024 Poster

Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective

ICLR 2024 Rejected

Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems

ICLR 2024 Spotlight

Koopman-based generalization bound: New aspect for full-rank weights

ICLR 2024 Poster

On the Comparison between Multi-modal and Single-modal Contrastive Learning

NeurIPS 2024 Poster

Co-authors (20)

Wei Huang: 13 papers

Kazusato Oko: 9 papers

Naoki Nishikawa: 6 papers

Atsushi Nitanda: 6 papers

Yujin Song: 5 papers