Taiji Suzuki
~Taiji_Suzuki1
Total papers: 41
Average submissions per year: 20.5
Average rating
Acceptance: 35/41
Venue distribution
ICLR: 19
NeurIPS: 13
ICML: 8
COLM: 1
Published papers (41)
2025 (26 papers)
4
State Size Independent Statistical Error Bound for Discrete Diffusion Models
NeurIPS 2025 Poster
4
Flow matching achieves almost minimax optimal convergence
ICLR 2025 Poster
4
Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models
ICML 2025 Poster
3
Transformers Provably Solve Parity Efficiently with Chain of Thought
ICLR 2025 Oral
3
Weighted Point Set Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric
ICLR 2025 Spotlight
4
State Space Models are Provably Comparable to Transformers in Dynamic Token Selection
ICLR 2025 Poster
4
Hessian-guided Perturbed Wasserstein Gradient Flows for Escaping Saddle Points
NeurIPS 2025 Poster
4
Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency
NeurIPS 2025 Poster
3
Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation
ICML 2025 Poster
3
Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression
ICLR 2025 Poster
4
Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression
NeurIPS 2025 Poster
5
Direct Distributional Optimization for Provable Alignment of Diffusion Models
ICLR 2025 Poster
4
Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble
ICML 2025 Poster
5
From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers
NeurIPS 2025 Spotlight
3
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
ICLR 2025 Spotlight
4
Nonlinear transformers can perform inference-time feature learning
ICML 2025 Poster
4
Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel
NeurIPS 2025 Poster
4
Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning
ICML 2025 Poster
5
Quantifying Memory Utilization with Effective State-Size
ICLR 2025 Rejected
4
Quantifying Memory Utilization with Effective State-Size
ICML 2025 Poster
3
Provable In-Context Vector Arithmetic via Retrieving Task Concepts
ICML 2025 Poster
5
Label Noise Gradient Descent Improves Generalization in the Low SNR Regime
ICLR 2025 Rejected
5
How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime?
NeurIPS 2025 Poster
3
The Role of Label Noise in the Feature Learning Process
ICLR 2025 Rejected
3
On the Role of Label Noise in the Feature Learning Process
ICML 2025 Poster
4
When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars
COLM 2025 Poster
2024 (15 papers)
3
Minimax optimality of convolutional neural networks for infinite dimensional input-output problems and separation from kernel methods
ICLR 2024 Poster
4
Tensor methods to learn the Green's function to solve high-dimensional PDE
ICLR 2024 Rejected
4
Optimal criterion for feature learning of two-layer linear neural network in high dimensional interpolation regime
ICLR 2024 Poster
4
Pretrained Transformer Efficiently Learns Low-Dimensional Target Functions In-Context
NeurIPS 2024 Poster
4
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
NeurIPS 2024 Poster
3
Improved statistical and computational complexity of the mean-field Langevin dynamics under structured data
ICLR 2024 Poster
4
Transformers are Minimax Optimal Nonparametric In-Context Learners
NeurIPS 2024 Poster
4
Understanding Convergence and Generalization in Federated Learning through Feature Learning Theory
ICLR 2024 Poster
5
Mean Field Langevin Actor-Critic: Faster Convergence and Global Optimality beyond Lazy Learning
ICLR 2024 Rejected
5
Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization
NeurIPS 2024 Poster
4
Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning
NeurIPS 2024 Poster
5
Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective
ICLR 2024 Rejected
5
Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems
ICLR 2024 Spotlight
3
Koopman-based generalization bound: New aspect for full-rank weights
ICLR 2024 Poster
3
On the Comparison between Multi-modal and Single-modal Contrastive Learning
NeurIPS 2024 Poster