Impact Index

94.2/100

Top 0.3%

Site-wide rank: #201

Papers published: 29

Average rating: 5.9

Average output: 9.7 papers/year

Martin Jaggi

Associate Professor @ EPFL · Switzerland · OpenReview

Research Areas

Optimization · Distributed Training · Large Language Models · Collaborative Learning

Gradient-Normalized Smoothness for Optimization with Approximate Hessians

ICLR 2026 Poster

Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining

ICLR 2026 Poster

Weight Decay may matter more than µP for Learning Rate Transfer in Practice

ICLR 2026 Poster

Benchmarking Optimizers for Large Language Model Pretraining

ICLR 2026 Rejected

A Split-Client Approach to Second-Order Optimization

ICLR 2026 Rejected

$\alpha$-LoRA: Effective Fine-Tuning via Base Model Rescaling

ICLR 2026 Rejected

MXSens: Sensitivity-Aware Mixed-Precision Quantization for Efficient LLM Inference

ICLR 2026 Rejected

FineWeb2: One Pipeline to Scale Them All — Adapting Pre-Training Data Processing to Every Language

COLM 2025 Poster

Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs

COLM 2025 Poster

Effective Interplay between Sparsity and Quantization: From Theory to Practice

ICLR 2025 Spotlight

GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining

NeurIPS 2025 Poster

Attention with Markov: A Curious Case of Single-layer Transformers

ICLR 2025 Spotlight

HyperINF: Unleashing the HyperPower of Schulz's Method for Data Influence Estimation

COLM 2025 Poster

Intrinsic User-Centric Interpretability through Global Mixture of Experts

ICLR 2025 Poster

URLs Help, Topics Guide: Understanding Metadata Utility in LLM Training

NeurIPS 2025 Poster

Towards Fully FP8 GEMM LLM Training at Scale

NeurIPS 2025 Poster

On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists

ICML 2025 Poster

CoTFormer: A Chain of Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference

ICLR 2025 Poster

On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists

ICLR 2025 Rejected

HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation

ICLR 2025 Rejected

Collaborators (20)

Bettina Messmer

Amirkeivan Mohtashami

Matteo Pagliardini

Vinko Sabolčec

El Mahdi Chayti