Beidi Chen (~Beidi_Chen1)

Total papers: 30
Avg. submissions per year: 15.0
Average score: n/a
Accepted: 25/30 (≈ 83%)

Venue distribution: NeurIPS 13, ICLR 12, ICML 3, COLM 2
Published papers (30)

2025 (14 papers)
Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation (ICML 2025, Poster; score 3)
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding (ICLR 2025, Poster; score 5)
Kinetics: Rethinking Test-Time Scaling Law (NeurIPS 2025, Poster; score 4)
FACTOR: Factoring Complexity and Context Length in Long-Context Model Evaluation (ICLR 2025, Rejected; score 4)
Memory Mosaics (ICLR 2025, Poster; score 4)
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation (NeurIPS 2025, Spotlight; score 4)
GSM-$\infty$: How Do your LLMs Behave over Infinitely Increasing Reasoning Complexity and Context Length? (ICML 2025, Poster; score 5)
Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts (NeurIPS 2025, Poster; score 4)
It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF (ICLR 2025, Rejected; score 3)
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference (ICLR 2025, Rejected; score 4)
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference (ICML 2025, Spotlight; score 4)
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding (ICLR 2025, Poster; score 4)
Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity (ICLR 2025, Poster; score 5)
MagicPIG: LSH Sampling for Efficient LLM Generation (ICLR 2025, Spotlight; score 5)
2024 (16 papers)
On the Similarity between Attention and SVM on the Token Separation and Selection Behavior (ICLR 2024, Withdrawn; score -)
Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation (COLM 2024, Poster; score 4)
Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt (ICLR 2024, Rejected; score 4)
On the Surprising Effectiveness of Attention Transfer for Vision Transformers (NeurIPS 2024, Poster; score 4)
Efficient Streaming Language Models with Attention Sinks (ICLR 2024, Poster; score 4)
Mini-Sequence Transformers: Optimizing Intermediate Memory for Long Sequences Training (NeurIPS 2024, Poster; score 3)
SpecExec: Massively Parallel Speculative Decoding For Interactive LLM Inference on Consumer Devices (NeurIPS 2024, Poster; score 4)
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution (NeurIPS 2024, Poster; score 4)
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (NeurIPS 2024, Poster; score 4)
JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention (ICLR 2024, Poster; score 4)
Learn To be Efficient: Build Structured Sparsity in Large Language Models (NeurIPS 2024, Spotlight; score 4)
Sirius: Contextual Sparsity with Correction for Efficient LLMs (NeurIPS 2024, Poster; score 4)
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding (COLM 2024, Poster; score 3)
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding (NeurIPS 2024, Poster; score 3)
Sequoia: Scalable and Robust Speculative Decoding (NeurIPS 2024, Spotlight; score 4)
S$^{2}$FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity (NeurIPS 2024, Poster; score 4)