Impact Index

73.06/100

Top 2.2%

Site-wide rank #1,389

Papers published: 19

Average rating: 5.6

Average annual output: 6.3 papers/year

Zhihang Yuan

Researcher @ ByteDance Inc. · China · OpenReview

Research Areas

Hardware and software co-optimization · Neural Network Quantization · Acceleration of Deep Learning · Efficient AI Algorithm

5.0

PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling

ICLR 2026 · Withdrawn

Third author

4.0

Bootstrapping Language and Numerical Feedback for Reinforcement Learning in LLMs

ICLR 2026 · Rejected

Third author

4.0

R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing

NeurIPS 2025 · Poster

6.8

SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification

NeurIPS 2025 · Poster

6.3

ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models

ICLR 2025 · Rejected

First author

6.3

MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods

ICLR 2025 · Poster

6.2

OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting

ICLR 2025 · Poster

6.1

MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance

ICML 2025 · Poster

5.8

MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization

ICLR 2025 · Rejected

Corresponding author

5.5

RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization

ICML 2025 · Poster

5.0

I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models

ICLR 2025 · Rejected

4.9

MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design

ICML 2025 · Poster

Third author

4.0

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

Collaborators (20)

Zhihang Yuan

PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling

Bootstrapping Language and Numerical Feedback for Reinforcement Learning in LLMs

SplitMeanFlow: Interval Splitting Consistency in Few-Step Generative Modeling

BMAttn: Block-Aligned Mixed-Precision Attention Quantization for LLM Inference

INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats

R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing

SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification

ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models

MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods

OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting

MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance

MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization

RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization

I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models

MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training