Impact Index

70.25/100

Top 2.6%

Site-wide rank #1,677

Papers published: 14

Average rating: 5.7

Average annual output: 4.7 papers/year

zhou Xun

Researcher @ ByteDance Inc. · China · OpenReview

Research Areas

bytedance

6.5

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

ICLR 2026 Poster

5.0

Scaling Law for Quantization-Aware Training

ICLR 2026 Rejected

4.4

Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models

ICLR 2026 Rejected

4.0

Stepsize anything: A unified learning rate schedule for budgeted-iteration training

NeurIPS 2025 Poster

7.2

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

ICML 2025 Poster

Corresponding author

7.0

Model Merging in Pre-training of Large Language Models

NeurIPS 2025 Poster

6.4

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization

NeurIPS 2025 Poster

6.0

Ultra-Sparse Memory Network

ICLR 2025 Poster

Corresponding author

4.8

Investigating the Overlooked Hessian Structure: From CNNs to LLMs

Collaborators (20)

zhou Xun

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Scaling Law for Quantization-Aware Training

Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models

Towards Simple and Provable Parameter-Free Adaptive Gradient Methods

Conda: Column-Normalized Adam for Training Large Language Models Faster

MARS: Unleashing the Power of Variance Reduction for Training Large Models

Stepsize anything: A unified learning rate schedule for budgeted-iteration training

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Model Merging in Pre-training of Large Language Models

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization

Ultra-Sparse Memory Network

Investigating the Overlooked Hessian Structure: From CNNs to LLMs