Martin Jaggi
~Martin_Jaggi1
22
Total papers
11.0
Submissions per year
Average rating
Accepted: 18/22
Venue distribution
ICLR
9
NeurIPS
8
COLM
4
ICML
1
Publications (22 papers)
2025 (13 papers)
8
HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation
ICLR 2025 Rejected
4
CoTFormer: A Chain of Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference
ICLR 2025 Poster
4
On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists
ICLR 2025 Rejected
5
GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining
NeurIPS 2025 Poster
4
HyperINF: Unleashing the HyperPower of Schulz's Method for Data Influence Estimation
COLM 2025 Poster
3
URLs Help, Topics Guide: Understanding Metadata Utility in LLM Training
NeurIPS 2025 Poster
3
On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists
ICML 2025 Poster
4
Towards Fully FP8 GEMM LLM Training at Scale
NeurIPS 2025 Poster
4
Intrinsic User-Centric Interpretability through Global Mixture of Experts
ICLR 2025 Poster
4
Attention with Markov: A Curious Case of Single-layer Transformers
ICLR 2025 Spotlight
3
Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs
COLM 2025 Poster
4
Effective Interplay between Sparsity and Quantization: From Theory to Practice
ICLR 2025 Spotlight
4
FineWeb2: One Pipeline to Scale Them All — Adapting Pre-Training Data Processing to Every Language
COLM 2025 Poster
2024 (9 papers)
4
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
ICLR 2024 Rejected
4
Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
NeurIPS 2024 Poster
3
CoBo: Collaborative Learning via Bilevel Optimization
NeurIPS 2024 Poster
4
Personalized Collaborative Fine-Tuning for On-Device Large Language Models
COLM 2024 Poster
4
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
NeurIPS 2024 Poster
6
LASER: Linear Compression in Wireless Distributed Optimization
ICLR 2024 Rejected
4
Layer-wise linear mode connectivity
ICLR 2024 Poster
3
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
NeurIPS 2024 Spotlight
4
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
NeurIPS 2024 Poster