Tong Zhang
~Tong_Zhang2
Total papers: 26
Submissions per year: 13.0
Average rating
Accepted: 23/26
Venue distribution
NeurIPS: 11
ICLR: 10
ICML: 5
Published papers (26)
2025 (14 papers)
4
AdaGrad under Anisotropic Smoothness
ICLR 2025 Poster
4
Catoni Contextual Bandits are Robust to Heavy-tailed Rewards
ICML 2025 Spotlight
4
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
NeurIPS 2025 Poster
4
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
ICLR 2025 Rejected
5
Logarithmic Regret for Online KL-Regularized Reinforcement Learning
ICML 2025 Poster
4
Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods
ICML 2025 Poster
4
Personalized Visual Instruction Tuning
ICLR 2025 Poster
4
ASGO: Adaptive Structured Gradient Optimization
NeurIPS 2025 Poster
3
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
NeurIPS 2025 Poster
3
MA-LoT: Model-Collaboration Lean-based Long Chain-of-Thought Reasoning Enhances Formal Theorem Proving
ICML 2025 Poster
4
Thinking vs. Doing: Improving Agent Reasoning by Scaling Test-Time Interaction
NeurIPS 2025 Poster
4
Building Math Agents with Multi-Turn Iterative Preference Learning
ICLR 2025 Poster
4
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
ICML 2025 Oral
4
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
NeurIPS 2025 Poster
2024 (12 papers)
4
Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithms
NeurIPS 2024 Poster
4
Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise
ICLR 2024 Poster
4
Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs
NeurIPS 2024 Poster
4
Reverse Diffusion Monte Carlo
ICLR 2024 Poster
5
Recursive Score Estimation Accelerates Diffusion-Based Monte Carlo
ICLR 2024 Rejected
4
A Sober Look at the Robustness of CLIPs to Spurious Features
NeurIPS 2024 Poster
4
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
NeurIPS 2024 Poster
5
Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference
NeurIPS 2024 Spotlight
4
Active Prompting with Chain-of-Thought for Large Language Models
ICLR 2024 Rejected
3
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
NeurIPS 2024 Poster
4
Towards Robust Offline Reinforcement Learning under Diverse Data Corruption
ICLR 2024 Spotlight
3
Spurious Feature Diversification Improves Out-of-distribution Generalization
ICLR 2024 Poster