PaperHub

No rating data available yet

ICLR 2025

Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs

Submitted: 2024-09-28 · Updated: 2024-10-28
TL;DR

Layer-wise LLM quantization based on layer importance, assigning lower-bit precision to less critical layers and higher-bit precision to key layers, enables memory-efficient deployment. The approach works with any quantization technique and allows fractional (decimal-point) average bit-widths for low-memory settings.
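The mixed-precision allocation described in the TL;DR can be illustrated with a small sketch: given per-layer importance scores and a fractional average-bit budget, the most important layers get the higher bit-width and the rest get the lower one. This is a minimal sketch under assumed names and values (the function `allocate_bits`, the 2-/4-bit choices, and the importance scores are illustrative assumptions, not the paper's implementation).

```python
# Minimal sketch of importance-ordered, layer-wise bit allocation.
# All names, bit choices, and scores below are illustrative assumptions.

def allocate_bits(importance, target_avg_bits, low_bits=2, high_bits=4):
    """Assign high_bits to the most important layers and low_bits to the rest
    so the average bit-width approximates target_avg_bits (may be fractional)."""
    n = len(importance)
    # Fraction of layers that must be high-precision to meet the budget.
    frac_high = (target_avg_bits - low_bits) / (high_bits - low_bits)
    n_high = max(0, min(n, round(frac_high * n)))
    # Rank layers by importance, most important first.
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)
    bits = [low_bits] * n
    for i in order[:n_high]:
        bits[i] = high_bits
    return bits


# Example: 8 layers with made-up importance scores and a 3.25-bit budget.
importance = [0.9, 0.2, 0.7, 0.1, 0.8, 0.3, 0.6, 0.4]
print(allocate_bits(importance, target_avg_bits=3.25))  # mix of 2- and 4-bit layers
```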

Abstract

Keywords
Layer-wise quantization of LLMs based on layer importance, memory-constrained quantization, variable decimal-point bit quantization based on memory availability, reduced model size for resource-efficient NLP systems

Reviews and Discussion

Desk Rejection

Reason for Desk Rejection

Violates the anonymity policy.