ICLR 2025
Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs
TL;DR
Layer-wise LLM quantization based on layer importance: less critical layers are quantized to lower bit widths and key layers to higher bit widths, improving memory efficiency. The approach supports any quantization technique and enables fractional (decimal-point) bit quantization for low-memory settings.
Abstract
Keywords
Layerwise Quantization of LLMs based on layer importance, memory-constraint quantization, variable decimal-point bit quantization based on memory availability, reduced model size for resource-efficient NLP systems
Reviews and Discussion
Desk Rejected by the Program Chairs
Desk Rejection Reason
Violates the anonymity policy.