Influence Index

87.53/100

Top 0.8%

Site-wide rank: #488

Papers published: 35

Average rating: 5.0

Average annual output: 11.7 papers/year

Pengfei Liu

Associate Professor @ Shanghai Jiaotong University · China · OpenReview

Research Areas

Alignment in Large Language Model · Evaluation · Benchmark · Pretraining Model

SR-Scientist: Scientific Equation Discovery With Agentic AI

ICLR 2026 · Poster

MegaScience: Pushing the Frontiers of Open Post-Training Datasets for Science Reasoning

ICLR 2026 · Rejected

InnovatorBench: Evaluating Agents’ Ability to Conduct Innovative AI Research

ICLR 2026 · Poster

ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of AI Research

ICLR 2026 · Rejected

GeneVLM: Automated Parsing Executable Digital Gene from a Single Image

ICLR 2026 · Withdrawn

LIMI: Less is More for Agency

ICLR 2026 · Rejected

Efficient Agent Training for Computer Use

ICLR 2026 · Poster

Proximal Supervised Fine-Tuning

ICLR 2026 · Poster

DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery

ICLR 2026 · Rejected

One RL to See Them All: Visual Triple Unified Reinforcement Learning

ICLR 2026 · Rejected

Discovering Architectures via an Evolutionary Agentic Framework

ICLR 2026 · Rejected

Visual Programmability: A Guide for Code-as-Thought in Chart Understanding

ICLR 2026 · Withdrawn

Attention Localization Through Separator Tokens: Unlocking Long Numerical Sequence Processing in LLMs

ICLR 2026 · Withdrawn

ARGO: Asynchronous Rollout with Human Guidance for Research Agent Optimization

ICLR 2026 · Withdrawn

Deep Cognition: A Multi-Agent Framework for Collaborative Research with Real-Time Cognitive Oversight

ICLR 2026 · Withdrawn

One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling

ICLR 2026 · Withdrawn

Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model

ICLR 2025 · Spotlight

Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale

ICML 2025 · Poster

Progress or Regress? Self-Improvement Reversal in Post-training

ICLR 2025 · Poster

On Evaluating LLM Alignment by Evaluating LLMs as Judges

NeurIPS 2025 · Poster

LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling

NeurIPS 2025 · Poster

RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing

COLM 2025 · Poster

LIMO: Less is More for Reasoning

COLM 2025 · Poster

Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale

ICLR 2025 · Rejected

OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniverse Computation Balance

ICML 2025 · Poster

FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

COLM 2025 · Poster

BeHonest: Benchmarking Honesty in Large Language Models

ICLR 2025 · Rejected

OmniBal: Towards Fast Instruct-Tuning for Vision-Language Models via Omniverse Computation Balance

ICLR 2025 · Withdrawn

CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

ICLR 2025 · Rejected

ToMValley: Evaluating the Theory of Mind Reasoning of LLMs in Realistic Social Context

ICLR 2025 · Withdrawn

Collaborators (20)