Philip Torr
~Philip_Torr1
77
Total papers
38.5
Average submissions per year
Average rating
Acceptance: 40/77
Venue distribution
ICLR
51
NeurIPS
17
ICML
5
COLM
4
Published papers (77)
2025: 42 papers
4
Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models
ICLR 2025 Rejected
-
Language Models' Internal Conflicts: Layer-wise Usable Information For Detecting Model (Un)answerability
ICLR 2025 Withdrawn
4
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
ICLR 2025 Spotlight
4
Towards Interpreting Visual Information Processing in Vision-Language Models
ICLR 2025 Poster
4
SAGE: Scalable Ground Truth Evaluations for Large Sparse Autoencoders
ICLR 2025 Withdrawn
4
Do as I do (Safely): Mitigating Task-Specific Fine-tuning Risks in Large Language Models
ICLR 2025 Poster
4
PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data
ICML 2025 Poster
6
PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
ICLR 2025 Rejected
5
On the Coexistence and Ensembling of Watermarks
NeurIPS 2025 Poster
4
On the Coexistence and Ensembling of Watermarks
ICLR 2025 Rejected
4
MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents
NeurIPS 2025 Poster
4
Too Late to Recall: Explaining the Two-Hop Problem in Multimodal Knowledge Retrieval
NeurIPS 2025 Poster
4
Can Knowledge-Graph-based Retrieval Augmented Generation Really Retrieve What You Need?
NeurIPS 2025 Spotlight
4
Towards Certification of Uncertainty Calibration under Adversarial Attacks
ICLR 2025 Poster
4
MAD-Sherlock: Multi-Agent Debates for Out-of-Context Misinformation Detection
ICLR 2025 Rejected
6
Flex3D: Feed-Forward 3D Generation with Flexible Reconstruction Model and Input View Curation
ICLR 2025 Rejected
3
Focus On This, Not That! Steering LLMs with Adaptive Feature Specification
ICML 2025 Poster
4
SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training?
ICLR 2025 Rejected
4
Focus On This, Not That! Steering LLMs With Adaptive Feature Specification
ICLR 2025 Rejected
4
Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap
ICLR 2025 Rejected
4
Cracking the Collective Mind: Adversarial Manipulation in Multi-Agent Systems
ICLR 2025 Withdrawn
4
Questioning Simplicity Bias Assumptions
ICLR 2025 Withdrawn
3
Flex3D: Feed-Forward 3D Generation with Flexible Reconstruction Model and Input View Curation
ICML 2025 Poster
3
MALT: Improving Reasoning with Multi-Agent LLM Training
COLM 2025 Poster
4
LLM Jailbreak Detection for (Almost) Free!
ICLR 2025 Withdrawn
4
Learning Visual Prompts for Guiding the Attention of Vision Transformers
ICLR 2025 Rejected
3
Rethinking Safety in LLM Fine-tuning: An Optimization Perspective
COLM 2025 Poster
4
Minimalist Concept Erasure in Generative Models
ICML 2025 Poster
3
Incrementally Adapting Generative Vision-Language Models with Task Codebook
ICLR 2025 Rejected
4
REVIP: Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge
ICLR 2025 Withdrawn
4
Semantic Score Distillation Sampling for Compositional Text-to-3D Generation
ICLR 2025 Rejected
4
FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models
ICLR 2025 Withdrawn
4
True Multimodal In-Context Learning Needs Attention to the Visual Context
COLM 2025 Poster
4
Shh, don't say that! Domain Certification in LLMs
ICLR 2025 Poster
4
Mixture of Experts Made Intrinsically Interpretable
ICML 2025 Poster
4
Towards Reliable Identification of Diffusion-based Image Manipulations
NeurIPS 2025 Poster
4
A Scalable Communication Protocol for Networks of Large Language Models
ICLR 2025 Withdrawn
4
FAIRMINDSIM: ALIGNMENT OF BEHAVIOR, EMOTION, AND BELIEF IN HUMANS AND LLM AGENTS AMID ETHICAL DILEMMAS
ICLR 2025 Rejected
4
Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention
NeurIPS 2025 Poster
5
Can Editing LLMs Inject Harm?
ICLR 2025 Rejected
4
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents
ICLR 2025 Rejected
4
OASIS: Open Agents Social Interaction Simulations on a Large Scale
ICLR 2025 Rejected
2024: 35 papers
4
When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations
ICLR 2024 Poster
4
Set-based Neural Network Encoding Without Weight Tying
NeurIPS 2024 Poster
3
Central Force Field: Unifying Generative and Discriminative Models While Harmonizing Energy-Based and Score-Based Models
ICLR 2024 Withdrawn
4
Fine-tuning can cripple foundation models; preserving features may be the solution
ICLR 2024 Withdrawn
5
Select to Perfect: Imitating desired behavior from large multi-agent data
ICLR 2024 Poster
5
Efficient Lifelong Model Evaluation in an Era of Rapid Progress
NeurIPS 2024 Poster
4
From Malicious to Marvelous: The Art of Adversarial Attack as Diffusion
ICLR 2024 Withdrawn
4
Influencer Backdoor Attack on Semantic Segmentation
ICLR 2024 Spotlight
4
Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators
ICLR 2024 Rejected
4
RanDumb: Random Representations Outperform Online Continually Learned Representations
NeurIPS 2024 Poster
4
Universal In-Context Approximation By Prompting Fully Recurrent Models
NeurIPS 2024 Poster
4
Generalized Temporal Difference Learning Models for Supervised Learning
ICLR 2024 Rejected
3
Online Continual Learning Without the Storage Constraint
ICLR 2024 Withdrawn
4
Real-Fake: Effective Training Data Synthesis Through Distribution Matching
ICLR 2024 Poster
5
An Image Is Worth 1000 Lies: Transferability of Adversarial Images across Prompts on Vision-Language Models
ICLR 2024 Spotlight
4
PORF: POSE RESIDUAL FIELD FOR ACCURATE NEURAL SURFACE RECONSTRUCTION
ICLR 2024 Poster
4
Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation
ICLR 2024 Poster
4
Segment, Select, Correct: A Framework for Weakly-Supervised Referring Segmentation
ICLR 2024 Rejected
3
Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders
ICLR 2024 Withdrawn
4
Secret Collusion among AI Agents: Multi-Agent Deception via Steganography
NeurIPS 2024 Poster
4
What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
NeurIPS 2024 Poster
3
Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images
ICLR 2024 Poster
3
Illusory Attacks: Information-theoretic detectability matters in adversarial attacks
ICLR 2024 Spotlight
5
From Categories to Classifier: Name-Only Continual Learning by Exploring the Web
ICLR 2024 Withdrawn
4
Label Delay in Online Continual Learning
NeurIPS 2024 Poster
4
Efficient Certification of Physics-Informed Neural Networks
ICLR 2024 Rejected
4
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
NeurIPS 2024 Poster
-
PatchCraft: Learning Optimized Image Patch for Enhanced Visual Attention of CLIP
ICLR 2024 Withdrawn
5
Direct3D: Scalable Image-to-3D Generation via 3D Latent Diffusion Transformer
NeurIPS 2024 Poster
4
Interpreting Learned Feedback Patterns in Large Language Models
NeurIPS 2024 Poster
4
Modeling Annotation Delay In Continual Learning
ICLR 2024 Rejected
4
Understanding Graph Transformers by Generalized Propagation
ICLR 2024 Rejected
4
AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments
ICLR 2024 Rejected
3
Stop Reasoning! When Multimodal LLM with Chain-of-Thought Reasoning Meets Adversarial Image
COLM 2024 Poster
4
Can Large Language Model Agents Simulate Human Trust Behavior?
NeurIPS 2024 Poster