Mingze Wang
~Mingze_Wang2
9
Total papers
4.5
Submissions per year
Average rating
Accepted: 6/9
Venue distribution
NeurIPS
4
ICLR
4
ICML
1
Publications (9 papers)
2025 (4 papers)
5
On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks
NeurIPS 2025 Spotlight
5
How Transformers Implement Induction Heads: Approximation and Optimization Analysis
ICLR 2025 Rejected
5
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late In Training
ICLR 2025 Spotlight
4
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
ICML 2025 Poster
2024 (5 papers)
3
Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling
ICLR 2024 Withdrawn
4
The Noise Geometry of Stochastic Gradient Descent: A Quantitative and Analytical Characterization
ICLR 2024 Withdrawn
3
Improving Generalization and Convergence by Enhancing Implicit Regularization
NeurIPS 2024 Poster
3
Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling
NeurIPS 2024 Poster
3
Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent
NeurIPS 2024 Poster