Impact index

94.87/100

Top 0.3%

Site-wide rank #174

Papers published: 35

Average rating: 5.6

Average output: 11.7 papers/year

Ruoxi Jia

Assistant Professor @ Virginia Tech · United States · OpenReview

Research areas

Machine learning · Security · Privacy

Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice

ICLR 2026 Poster

Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead

ICLR 2026 Poster

Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks

ICLR 2026 Poster

Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance

ICLR 2026 Poster

CASPO: Confidence-aware Step-wise Preference Optimization for Reliable Reasoning in Large Language Models

ICLR 2026 Rejected

Inference-Time Personalized Safety Control via Paired Difference-in-Means Intervention

ICLR 2026 Poster

Data Valuation and Selection in a Federated Model Marketplace

ICLR 2026 Rejected

ARTS: Alleviating Hallucinations in Large Vision–Language Models via Redundancy-Aware Token Selection

ICLR 2026 Rejected

AdaDeDup: Adaptive Hybrid Data Pruning for Efficient Object Detection Training

ICLR 2026 Withdrawn

Distilling Reasoning into Student LLMs: Local Naturalness for Selecting Teacher Data

ICLR 2026 Withdrawn

Capturing the Temporal Dependence of Training Data Influence

Data Shapley in One Training Run

AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories

ICLR 2025 Spotlight

LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models

COLM 2025 Poster

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal

ICLR 2025 Poster

Data-Centric Human Preference with Rationales for Direct Preference Alignment

COLM 2025 Poster

Mind Control through Causal Inference: Predicting Clean Images from Poisoned Data

ICLR 2025 Poster

LLMs Can Plan Only If We Tell Them

ICLR 2025 Poster

Probing Hidden Knowledge Holes in Unlearned LLMs

NeurIPS 2025 Poster

LLMs Can Reason Faster Only If We Let Them

ICML 2025 Poster

Just Enough Shifts: Mitigating Over-Refusal in Aligned Language Models with Targeted Representation Fine-Tuning

ICML 2025 Poster

AutoScale: Automatic Prediction of Compute-optimal Data Compositions for Training LLMs

ICLR 2025 Rejected

AutoScale: Scale-Aware Data Mixing for Pre-Training LLMs

COLM 2025 Poster

LLM Spark: Critical Thinking Evaluation of Large Language Models

ICLR 2025 Rejected

Data-Centric Human Preference Optimization with Rationales

ICLR 2025 Rejected

SCOPE: Scalable and Adaptive Evaluation of Misguided Safety Refusal in LLMs

ICLR 2025 Rejected

Fast and Noise-Robust Diffusion Solvers for Inverse Problems: A Frequentist Approach

ICLR 2025 Rejected

CONCORD: Concept-informed Diffusion for Dataset Distillation

ICLR 2025 Withdrawn

Collaborators (20)

Postdoc advisor · 8 papers

Jiachen T. Wang