Fazl Barez
~Fazl_Barez1
Total papers: 16
Submissions per year: 8.0
Average rating
Accepted: 6/16
Venue distribution
ICLR: 12
NeurIPS: 2
ICML: 1
COLM: 1
Papers (16)
2025 (10 papers)
4
Scaling Sparse Feature Circuits For Studying In-Context Learning
ICLR 2025 (Rejected)
4
Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders
ICLR 2025 (Rejected)
3
Attacking Audio Language Models with Best-of-N Jailbreaking
ICLR 2025 (Rejected)
4
Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models
ICLR 2025 (Rejected)
4
Best-of-N Jailbreaking
NeurIPS 2025 (Poster)
5
Plan B: Training LLMs to fail less severely
ICLR 2025 (Withdrawn)
4
PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data
ICML 2025 (Poster)
4
Towards Interpreting Visual Information Processing in Vision-Language Models
ICLR 2025 (Poster)
3
Rethinking Safety in LLM Fine-tuning: An Optimization Perspective
COLM 2025 (Poster)
6
PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
ICLR 2025 (Rejected)
2024 (6 papers)
4
Understanding Addition in Transformers
ICLR 2024 (Poster)
3
What does GPT store in its MLP weights? A case study of long-range dependencies
ICLR 2024 (Rejected)
-
Value-Evolutionary-Based Reinforcement Learning
ICLR 2024 (Withdrawn)
5
Neuron to Graph: Interpreting Language Model Neurons at Scale
ICLR 2024 (Rejected)
3
Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders
ICLR 2024 (Withdrawn)
4
Interpreting Learned Feedback Patterns in Large Language Models
NeurIPS 2024 (Poster)