Shiwei Liu
~Shiwei_Liu2
Total papers: 25
Avg. submissions per year: 12.5
Average rating:
Accepted: 15/25
Venue distribution: ICLR 15, NeurIPS 7, ICML 3
Publications (25)
2025 (13 papers)

- Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN (ICLR 2025, Poster; avg. rating 5)
- OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning (ICLR 2025, Rejected; avg. rating 4)
- From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications (ICML 2025, Poster; avg. rating 4)
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients (ICLR 2025, Rejected; avg. rating 4)
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients (ICLR 2025, Withdrawn; avg. rating 4)
- Composable Interventions for Language Models (ICLR 2025, Poster; avg. rating 5)
- The Curse of Depth in Large Language Models (NeurIPS 2025, Poster; avg. rating 4)
- SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training (ICLR 2025, Poster; avg. rating 3)
- AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs (NeurIPS 2025, Poster; avg. rating 4)
- Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More (ICML 2025, Poster; avg. rating 3)
- LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning (ICML 2025, Poster; avg. rating 4)
- Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning (NeurIPS 2025, Rejected; avg. rating 3)
- GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling (NeurIPS 2025, Poster; avg. rating 4)
2024 (12 papers)

- Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity (ICLR 2024, Rejected; avg. rating 3)
- Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once (ICLR 2024, Rejected; avg. rating 3)
- Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding (NeurIPS 2024, Poster; avg. rating 3)
- E2ENet: Dynamic Sparse Feature Fusion for Accurate and Efficient 3D Medical Image Segmentation (NeurIPS 2024, Poster; avg. rating 3)
- AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models (NeurIPS 2024, Poster; avg. rating 4)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning (ICLR 2024, Poster; avg. rating 4)
- Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective (ICLR 2024, Rejected; avg. rating 4)
- ($\texttt{PASS}$) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork (ICLR 2024, Rejected; avg. rating 4)
- BiDST: Dynamic Sparse Training is a Bi-Level Optimization Problem (ICLR 2024, Rejected; avg. rating 3)
- Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs (ICLR 2024, Poster; avg. rating 3)
- NeurRev: Train Better Sparse Neural Network Practically via Neuron Revitalization (ICLR 2024, Poster; avg. rating 3)
- Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity (ICLR 2024, Rejected; avg. rating 5)