Shiwei Liu
~Shiwei_Liu2
Total papers: 25
Avg. submissions per year: 12.5
Average rating:
Accepted: 15/25
Venue distribution: ICLR 15, NeurIPS 7, ICML 3
Publications (25)
2025 (13 papers)

- Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN (ICLR 2025, Poster; avg. rating 5)
- OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning (ICLR 2025, Rejected; avg. rating 4)
- From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications (ICML 2025, Poster; avg. rating 4)
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients (ICLR 2025, Rejected; avg. rating 4)
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients (ICLR 2025, Withdrawn; avg. rating 4)
- Composable Interventions for Language Models (ICLR 2025, Poster; avg. rating 5)
- The Curse of Depth in Large Language Models (NeurIPS 2025, Poster; avg. rating 4)
- SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training (ICLR 2025, Poster; avg. rating 3)
- AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs (NeurIPS 2025, Poster; avg. rating 4)
- Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More (ICML 2025, Poster; avg. rating 3)
- LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning (ICML 2025, Poster; avg. rating 4)
- Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning (NeurIPS 2025, Rejected; avg. rating 3)
- GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling (NeurIPS 2025, Poster; avg. rating 4)
2024 (12 papers)

- Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity (ICLR 2024, Rejected; avg. rating 3)
- Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once (ICLR 2024, Rejected; avg. rating 3)
- Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding (NeurIPS 2024, Poster; avg. rating 3)
- E2ENet: Dynamic Sparse Feature Fusion for Accurate and Efficient 3D Medical Image Segmentation (NeurIPS 2024, Poster; avg. rating 3)
- AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models (NeurIPS 2024, Poster; avg. rating 4)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning (ICLR 2024, Poster; avg. rating 4)
- Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective (ICLR 2024, Rejected; avg. rating 4)
- ($\texttt{PASS}$) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork (ICLR 2024, Rejected; avg. rating 4)
- BiDST: Dynamic Sparse Training is a Bi-Level Optimization Problem (ICLR 2024, Rejected; avg. rating 3)
- Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs (ICLR 2024, Poster; avg. rating 3)
- NeurRev: Train Better Sparse Neural Network Practically via Neuron Revitalization (ICLR 2024, Poster; avg. rating 3)
- Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity (ICLR 2024, Rejected; avg. rating 5)