Xianzhi Yu
~Xianzhi_Yu1
8
Total papers
8.0
Average submissions per year
Average rating
Acceptance: 6/8
Venue distribution
NeurIPS
4
ICLR
2
COLM
1
ICML
1
Published papers (8)
2025: 8 papers
3
FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs for Efficient Inference
ICLR 2025, Rejected
4
L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models
NeurIPS 2025, Poster
4
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
COLM 2025, Poster
4
A Simple Linear Patch Revives Layer-Pruned Large Language Models
NeurIPS 2025, Poster
5
MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE
NeurIPS 2025, Spotlight
4
AttentionPredictor: Temporal Patterns Matter for KV Cache Compression
NeurIPS 2025, Poster
4
FlatQuant: Flatness Matters for LLM Quantization
ICML 2025, Poster
5
FlatQuant: Flatness Matters for LLM Quantization
ICLR 2025, Rejected