PaperHub

Philip Torr

~Philip_Torr1

Total papers: 77
Submissions per year: 38.5
Average rating: 5.5
Accepted: 40/77
Venue distribution
ICLR: 51
NeurIPS: 17
ICML: 5
COLM: 4

Papers (77)

2025 (42 papers)

6.0
4

Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models

ICLR 2025 (Rejected)
-

Language Models' Internal Conflicts: Layer-wise Usable Information For Detecting Model (Un)answerability

ICLR 2025 (Withdrawn)
7.5
4

Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models

ICLR 2025 (Spotlight)
7.0
4

Towards Interpreting Visual Information Processing in Vision-Language Models

ICLR 2025 (Poster)
4.0
4

SAGE: Scalable Ground Truth Evaluations for Large Sparse Autoencoders

ICLR 2025 (Withdrawn)
5.8
4

Do as I do (Safely): Mitigating Task-Specific Fine-tuning Risks in Large Language Models

ICLR 2025 (Poster)
4.4
4

PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data

ICML 2025 (Poster)
5.0
6

PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning

ICLR 2025 (Rejected)
7.1
5

On the Coexistence and Ensembling of Watermarks

NeurIPS 2025 (Poster)
6.5
4

On the Coexistence and Ensembling of Watermarks

ICLR 2025 (Rejected)
6.4
4

MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents

NeurIPS 2025 (Poster)
6.4
4

Too Late to Recall: Explaining the Two-Hop Problem in Multimodal Knowledge Retrieval

NeurIPS 2025 (Poster)
8.2
4

Can Knowledge-Graph-based Retrieval Augmented Generation Really Retrieve What You Need?

NeurIPS 2025 (Spotlight)
6.8
4

Towards Certification of Uncertainty Calibration under Adversarial Attacks

ICLR 2025 (Poster)
5.5
4

MAD-Sherlock: Multi-Agent Debates for Out-of-Context Misinformation Detection

ICLR 2025 (Rejected)
5.5
6

Flex3D: Feed-Forward 3D Generation with Flexible Reconstruction Model and Input View Curation

ICLR 2025 (Rejected)
7.0
3

Focus On This, Not That! Steering LLMs with Adaptive Feature Specification

ICML 2025 (Poster)
4.8
4

SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training?

ICLR 2025 (Rejected)
5.5
4

Focus On This, Not That! Steering LLMs With Adaptive Feature Specification

ICLR 2025 (Rejected)
2.5
4

Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap

ICLR 2025 (Rejected)
3.5
4

Cracking the Collective Mind: Adversarial Manipulation in Multi-Agent Systems

ICLR 2025 (Withdrawn)
2.5
4

Questioning Simplicity Bias Assumptions

ICLR 2025 (Withdrawn)
5.5
3

Flex3D: Feed-Forward 3D Generation with Flexible Reconstruction Model and Input View Curation

ICML 2025 (Poster)
7.0
3

MALT: Improving Reasoning with Multi-Agent LLM Training

COLM 2025 (Poster)
4.3
4

LLM Jailbreak Detection for (Almost) Free!

ICLR 2025 (Withdrawn)
3.5
4

Learning Visual Prompts for Guiding the Attention of Vision Transformers

ICLR 2025 (Rejected)
6.3
3

Rethinking Safety in LLM Fine-tuning: An Optimization Perspective

COLM 2025 (Poster)
4.9
4

Minimalist Concept Erasure in Generative Models

ICML 2025 (Poster)
4.7
3

Incrementally Adapting Generative Vision-Language Models with Task Codebook

ICLR 2025 (Rejected)
5.0
4

REVIP: Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge

ICLR 2025 (Withdrawn)
5.8
4

Semantic Score Distillation Sampling for Compositional Text-to-3D Generation

ICLR 2025 (Rejected)
4.3
4

FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models

ICLR 2025 (Withdrawn)
6.5
4

True Multimodal In-Context Learning Needs Attention to the Visual Context

COLM 2025 (Poster)
6.8
4

Shh, don't say that! Domain Certification in LLMs

ICLR 2025 (Poster)
6.6
4

Mixture of Experts Made Intrinsically Interpretable

ICML 2025 (Poster)
6.8
4

Towards Reliable Identification of Diffusion-based Image Manipulations

NeurIPS 2025 (Poster)
3.5
4

A Scalable Communication Protocol for Networks of Large Language Models

ICLR 2025 (Withdrawn)
3.5
4

FAIRMINDSIM: ALIGNMENT OF BEHAVIOR, EMOTION, AND BELIEF IN HUMANS AND LLM AGENTS AMID ETHICAL DILEMMAS

ICLR 2025 (Rejected)
7.3
4

Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention

NeurIPS 2025 (Poster)
4.4
5

Can Editing LLMs Inject Harm?

ICLR 2025 (Rejected)
4.3
4

CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

ICLR 2025 (Rejected)
4.3
4

OASIS: Open Agents Social Interaction Simulations on a Large Scale

ICLR 2025 (Rejected)

2024 (35 papers)

6.5
4

When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations

ICLR 2024 (Poster)
5.8
4

Set-based Neural Network Encoding Without Weight Tying

NeurIPS 2024 (Poster)
3.0
3

Central Force Field: Unifying Generative and Discriminative Models While Harmonizing Energy-Based and Score-Based Models

ICLR 2024 (Withdrawn)
4.8
4

Fine-tuning can cripple foundation models; preserving features may be the solution

ICLR 2024 (Withdrawn)
7.0
5

Select to Perfect: Imitating desired behavior from large multi-agent data

ICLR 2024 (Poster)
5.8
5

Efficient Lifelong Model Evaluation in an Era of Rapid Progress

NeurIPS 2024 (Poster)
4.5
4

From Malicious to Marvelous: The Art of Adversarial Attack as Diffusion

ICLR 2024 (Withdrawn)
7.5
4

Influencer Backdoor Attack on Semantic Segmentation

ICLR 2024 (Spotlight)
5.5
4

Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators

ICLR 2024 (Rejected)
6.3
4

RanDumb: Random Representations Outperform Online Continually Learned Representations

NeurIPS 2024 (Poster)
6.5
4

Universal In-Context Approximation By Prompting Fully Recurrent Models

NeurIPS 2024 (Poster)
6.3
4

Generalized Temporal Difference Learning Models for Supervised Learning

ICLR 2024 (Rejected)
4.3
3

Online Continual Learning Without the Storage Constraint

ICLR 2024 (Withdrawn)
6.0
4

Real-Fake: Effective Training Data Synthesis Through Distribution Matching

ICLR 2024 (Poster)
6.8
5

An Image Is Worth 1000 Lies: Transferability of Adversarial Images across Prompts on Vision-Language Models

ICLR 2024 (Spotlight)
6.0
4

PORF: POSE RESIDUAL FIELD FOR ACCURATE NEURAL SURFACE RECONSTRUCTION

ICLR 2024 (Poster)
6.8
4

Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation

ICLR 2024 (Poster)
4.0
4

Segment, Select, Correct: A Framework for Weakly-Supervised Referring Segmentation

ICLR 2024 (Rejected)
3.0
3

Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders

ICLR 2024 (Withdrawn)
5.8
4

Secret Collusion among AI Agents: Multi-Agent Deception via Steganography

NeurIPS 2024 (Poster)
5.8
4

What Makes and Breaks Safety Fine-tuning? A Mechanistic Study

NeurIPS 2024 (Poster)
6.7
3

Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images

ICLR 2024 (Poster)
7.3
3

Illusory Attacks: Information-theoretic detectability matters in adversarial attacks

ICLR 2024 (Spotlight)
5.0
5

From Categories to Classifier: Name-Only Continual Learning by Exploring the Web

ICLR 2024 (Withdrawn)
5.5
4

Label Delay in Online Continual Learning

NeurIPS 2024 (Poster)
5.5
4

Efficient Certification of Physics-Informed Neural Networks

ICLR 2024 (Rejected)
6.8
4

No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

NeurIPS 2024 (Poster)
-

PatchCraft: Learning Optimized Image Patch for Enhanced Visual Attention of CLIP

ICLR 2024 (Withdrawn)
5.0
5

Direct3D: Scalable Image-to-3D Generation via 3D Latent Diffusion Transformer

NeurIPS 2024 (Poster)
5.3
4

Interpreting Learned Feedback Patterns in Large Language Models

NeurIPS 2024 (Poster)
4.0
4

Modeling Annotation Delay In Continual Learning

ICLR 2024 (Rejected)
4.3
4

Understanding Graph Transformers by Generalized Propagation

ICLR 2024 (Rejected)
5.8
4

AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments

ICLR 2024 (Rejected)
7.0
3

Stop Reasoning! When Multimodal LLM with Chain-of-Thought Reasoning Meets Adversarial Image

COLM 2024 (Poster)
5.5
4

Can Large Language Model Agents Simulate Human Trust Behavior?

NeurIPS 2024 (Poster)