Beidi Chen (~Beidi_Chen1)

Total papers: 30
Avg. submissions per year: 15.0
Average score: n/a
Accepted: 25/30 (≈ 83%)

Venue distribution: NeurIPS 13, ICLR 12, ICML 3, COLM 2
Published papers (30)

2025 (14 papers)
Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation (ICML 2025, Poster; score 3)
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding (ICLR 2025, Poster; score 5)
Kinetics: Rethinking Test-Time Scaling Law (NeurIPS 2025, Poster; score 4)
FACTOR: Factoring Complexity and Context Length in Long-Context Model Evaluation (ICLR 2025, Rejected; score 4)
Memory Mosaics (ICLR 2025, Poster; score 4)
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation (NeurIPS 2025, Spotlight; score 4)
GSM-$\infty$: How Do your LLMs Behave over Infinitely Increasing Reasoning Complexity and Context Length? (ICML 2025, Poster; score 5)
Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts (NeurIPS 2025, Poster; score 4)
It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF (ICLR 2025, Rejected; score 3)
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference (ICLR 2025, Rejected; score 4)
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference (ICML 2025, Spotlight; score 4)
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding (ICLR 2025, Poster; score 4)
Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity (ICLR 2025, Poster; score 5)
MagicPIG: LSH Sampling for Efficient LLM Generation (ICLR 2025, Spotlight; score 5)
2024 (16 papers)
On the Similarity between Attention and SVM on the Token Separation and Selection Behavior (ICLR 2024, Withdrawn; score -)
Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation (COLM 2024, Poster; score 4)
Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt (ICLR 2024, Rejected; score 4)
On the Surprising Effectiveness of Attention Transfer for Vision Transformers (NeurIPS 2024, Poster; score 4)
Efficient Streaming Language Models with Attention Sinks (ICLR 2024, Poster; score 4)
Mini-Sequence Transformers: Optimizing Intermediate Memory for Long Sequences Training (NeurIPS 2024, Poster; score 3)
SpecExec: Massively Parallel Speculative Decoding For Interactive LLM Inference on Consumer Devices (NeurIPS 2024, Poster; score 4)
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution (NeurIPS 2024, Poster; score 4)
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (NeurIPS 2024, Poster; score 4)
JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention (ICLR 2024, Poster; score 4)
Learn To be Efficient: Build Structured Sparsity in Large Language Models (NeurIPS 2024, Spotlight; score 4)
Sirius: Contextual Sparsity with Correction for Efficient LLMs (NeurIPS 2024, Poster; score 4)
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding (COLM 2024, Poster; score 3)
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding (NeurIPS 2024, Poster; score 3)
Sequoia: Scalable and Robust Speculative Decoding (NeurIPS 2024, Spotlight; score 4)
S$^{2}$FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity (NeurIPS 2024, Poster; score 4)