NeurIPS

为您找到 9,778 篇相关研究

全部论文

为您找到 9,778 篇相关研究

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated notable success in enhancing the reasoning performance of large language models (LLMs), particularly in mathematics and programming tasks. It is widely believed that, similar to how traditional RL helps agents to explore and learn new strategies, RLVR enables LLMs to continuously self-improve, thus acquiring novel reasoning abilities that exceed the capacity of the corresponding base models. In this study, we take a critical look at \textit{the current state of RLVR} by systematically probing the reasoning capability boundaries of RLVR-trained LLMs across diverse model families, RL algorithms, and math/coding/visual reasoning benchmarks, using pass@\textit{k} at large \textit{k} values as the evaluation metric. While RLVR improves sampling efficiency towards the correct path, we surprisingly find that current training does \emph{not} elicit fundamentally new reasoning patterns. We observe that while RLVR-trained models outperform their base models at smaller values of $k$ (\eg, $k$=1), base models achieve higher pass@$k$ score when $k$ is large. Moreover, we observe that the reasoning capability boundary of LLMs often narrows as RLVR training progresses. Further coverage and perplexity analysis shows that the reasoning paths generated by RLVR models are already included in the base models' sampling distribution, suggesting that their reasoning abilities originate from and are \textit{bounded} by the base model. From this perspective, treating the base model as an upper bound, our quantitative analysis shows that six popular RLVR algorithms perform similarly and remain far from optimal in fully leveraging the potential of the base model. In contrast, we find that distillation can introduce new reasoning patterns from the teacher and genuinely expand the model’s reasoning capabilities. Taken together, our findings suggest that current RLVR methods have not fully realized the potential of RL to elicit genuinely novel reasoning abilities in LLMs. This underscores the need for improved RL paradigms—such as continual scaling and multi-turn agent-environment interaction—to unlock this potential.

NeurIPS 2025Oral

Yang Yue et al.

9.4

Learning Linear Attention in Polynomial Time

Previous research has explored the expressivity of Transformer models in simulating Boolean circuits or Turing machines. However, the efficient learnability of Transformers from data has remained an open question. Our study addresses this gap by providing the first polynomial-time learnability results (specifically strong, agnostic PAC learning) for single-layer Transformers with linear attention. We show that learning the optimal multi head linear attention can be recast as finding the optimal kernel predictor in a suitably defined RKHS. Moving to generalization, we construct an algorithm that, given a dataset, checks in polynomial time whether the set of best fit multi head linear attention networks on this data all perform an identical computation--a powerful notion for out of distribution generalization. We empirically validate our theoretical findings on several canonical tasks: learning random linear attention networks, key--value associations, and learning to execute finite automata. Our findings bridge a critical gap between theoretical expressivity and learnability of Transformer models.

NeurIPS 2025Oral

Morris Yau et al.

9.1

FlexOLMo: Open Language Models for Flexible Data Use

We introduce FlexOLMo, a new class of language models (LMs) that supports (1) distributed training without data sharing, where different model parameters are independently trained on private datasets, and (2) data-flexible inference, where these parameters along with their associated data can be easily included or excluded from model inferences with no further training. FlexOLMo employs a mixture-of-experts (MoE) architecture where each expert is trained independently on private datasets and later integrated through a new nonparametric routing without any joint training across datasets. FlexOLMo is trained on FLEXMIX, a corpus we curate comprising seven restricted sets, either real or realistic approximations, alongside publicly available datasets. We evaluate models with up to 37 billion parameters (20 billion active) on 31 diverse downstream tasks. We show that a general expert trained on public data can be effectively combined with independently trained experts from other data owners significantly benefiting from these restricted sets (an average 41% relative improvement) while allowing flexible opt-out at inference time (e.g., for users without appropriate licenses or permissions). Our approach also outperforms prior model merging methods by 10.1% on average and surpasses the standard MoE trained without data restrictions using the same training FLOPs. Altogether, FlexOLMo enables training on restricted data while keeping data local and supports fine-grained control of data access at inference.

NeurIPS 2025Spotlight

Weijia Shi et al.

9.1

OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers

Lip synchronization is the task of aligning a speaker’s lip movements in video with corresponding speech audio, and it is essential for creating realistic, expressive video content. However, existing methods often rely on reference frames and masked-frame inpainting, which limit their robustness to identity consistency, pose variations, facial occlusions, and stylized content. In addition, since audio signals provide weaker conditioning than visual cues, lip shape leakage from the original video will affect lip sync quality. In this paper, we present OmniSync, a universal lip synchronization framework for diverse visual scenarios. Our approach introduces a mask-free training paradigm using Diffusion Transformer models for direct frame editing without explicit masks, enabling unlimited-duration inference while maintaining natural facial dynamics and preserving character identity. During inference, we propose a flow-matching-based progressive noise initialization to ensure pose and identity consistency, while allowing precise mouth-region editing. To address the weak conditioning signal of audio, we develop a Dynamic Spatiotemporal Classifier-Free Guidance (DS-CFG) mechanism that adaptively adjusts guidance strength over time and space. We also establish the AIGC-LipSync Benchmark, the first evaluation suite for lip synchronization in diverse AI-generated videos. Extensive experiments demonstrate that OmniSync significantly outperforms prior methods in both visual quality and lip sync accuracy, achieving superior results in both real-world and AI-generated videos.

NeurIPS 2025Spotlight

Ziqiao Peng et al.

9.1

Affine-Invariant Global Non-Asymptotic Convergence Analysis of BFGS under Self-Concordance

In this paper, we establish global non-asymptotic convergence guarantees for the BFGS quasi-Newton method without requiring strong convexity or the Lipschitz continuity of the gradient or Hessian. Instead, we consider the setting where the objective function is strictly convex and strongly self-concordant. For an arbitrary initial point and any arbitrary positive-definite initial Hessian approximation, we prove global linear and superlinear convergence guarantees for BFGS when the step size is determined using a line search scheme satisfying the weak Wolfe conditions. Moreover, all our global guarantees are affine-invariant, with the convergence rates depending solely on the initial error and the strongly self-concordant constant. Our results extend the global non-asymptotic convergence theory of BFGS beyond traditional assumptions and, for the first time, establish affine-invariant convergence guarantees—aligning with the inherent affine invariance of the BFGS method.

NeurIPS 2025Spotlight

Qiujiang Jin et al.

9.1

Corporate Needs You to Find the Difference: Revisiting Submodular and Supermodular Ratio Optimization Problems

We consider the following question: given a submodular or supermodular set function $f:2^V \to \mathbb{R}$, how should one minimize or maximize its average value $f(S)/|S|$ over non-empty subsets $S\subseteq V$? This problem generalizes several well-known objectives including Densest Subgraph (DSG), Densest Supermodular Set (DSS), and Submodular Function Minimization (SFM). Motivated by recent applications [39, 31], we formalize two new broad problems: the Unrestricted Sparsest Submodular Set (USSS) and Unrestricted Densest Supermodular Set (UDSS) which allow negative and non-monotone functions. Using classical results we observe that DSS, SFM, USSS, UDSS, and MNP are all equivalent under strongly polynomial-time reductions. This equivalence enables algorithmic cross-over: methods designed for one problem can be repurposed to solve others efficiently. In particular we use the perspective of the minimum norm point in the base polyhedron of a sub/supermodular function which, via Fujishige's results, yields the dense decomposition as a byproduct. Via this perspective we show that a recent converging heuristic for DSS, \textsc{SuperGreedy++} [15, 29], and Wolfe’s minimum norm point algorithm are both universal solvers for all of these problems. On the theoretical front, we explain the observation made in recent work [39, 31] that \textsc{SuperGreedy++} appears to work well even in settings beyond DSS. Surprisingly, we also show that this simple algorithm can be used for Submodular Function Minimization, including for example that it can act as an effective minimum $st$ cut algorithm. On the empirical front, we explore the utility of several different algorithms including Fujishige-Wolfe min-norm point algorithm for recent problems. We conduct over 400 experiments across seven problem types and large-scale synthetic and real-world datasets (up to $\approx 100$ million edges). Our results reveal that methods historically considered inefficient, such as convex-programming methods, flow-based solvers, and Fujishige-Wolfe’s algorithm, outperform state-of-the-art task-specific baselines by orders of magnitude on concrete problems like HNSN [39]. These findings challenge prevailing assumptions and demonstrate that with the right framing, general optimization algorithms can be both scalable and state-of-the-art for supermodular and submodular ratio problems.

NeurIPS 2025Spotlight

Elfarouk Harb et al.

9.1

Mesh-RFT: Enhancing Mesh Generation via Fine-grained Reinforcement Fine-Tuning

Existing pretrained models for 3D mesh generation often suffer from data biases and produce low-quality results, while global reinforcement learning (RL) methods rely on object-level rewards that struggle to capture local structure details. To address these challenges, we present $\textbf{Mesh-RFT}$, a novel fine-grained reinforcement fine-tuning framework that employs Masked Direct Preference Optimization (M-DPO) to enable localized refinement via quality-aware face masking. To facilitate efficient quality evaluation, we introduce an objective topology-aware scoring system to evaluate geometric integrity and topological regularity at both object and face levels through two metrics: Boundary Edge Ratio (BER) and Topology Score (TS). By integrating these metrics into a fine-grained RL strategy, Mesh-RFT becomes the first method to optimize mesh quality at the granularity of individual faces, resolving localized errors while preserving global coherence. Experiment results show that our M-DPO approach reduces Hausdorff Distance (HD) by 24.6\% and improves Topology Score (TS) by 3.8\% over pre-trained models, while outperforming global DPO methods with a 17.4\% HD reduction and 4.9\% TS gain. These results demonstrate Mesh-RFT’s ability to improve geometric integrity and topological regularity, achieving new state-of-the-art performance in production-ready mesh generation.

NeurIPS 2025Spotlight

Jian Liu et al.

9.1

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

Recent advancements in large language models (LLMs) underscore the need for more comprehensive evaluation methods to accurately assess their reasoning capabilities. Existing benchmarks are often domain-specific and thus cannot fully capture an LLM’s general reasoning potential. To address this limitation, we introduce the **Knowledge Orthogonal Reasoning Gymnasium (KORGym)**, a dynamic evaluation platform inspired by KOR-Bench and Gymnasium. KORGym offers over fifty games in either textual or visual formats and supports interactive, multi-turn assessments with reinforcement learning scenarios. Using KORGym, we conduct extensive experiments on 19 LLMs and 8 VLMs, revealing consistent reasoning patterns within model families and demonstrating the superior performance of closed-source models. Further analysis examines the effects of modality, reasoning strategies, reinforcement learning techniques, and response length on model performance. We expect KORGym to become a valuable resource for advancing LLM reasoning research and developing evaluation methodologies suited to complex, interactive environments.

NeurIPS 2025Spotlight

Jiajun Shi et al.

9.1

Optimal Mistake Bounds for Transductive Online Learning

We resolve a 30-year-old open problem concerning the power of unlabeled data in online learning by tightly quantifying the gap between transductive and standard online learning. We prove that for every concept class $\mathcal{H}$ with Littlestone dimension $d$, the transductive mistake bound is at least $\Omega(\sqrt{d})$. This establishes an exponential improvement over previous lower bounds of $\Omega(\log \log d)$, $\Omega(\sqrt{\log d})$, and $\Omega(\log d)$, respectively due to Ben-David, Kushilevitz, and Mansour (1995, 1997) and Hanneke, Moran, and Shafer (2023). We also show that our bound is tight: for every $d$, there exists a class of Littlestone dimension $d$ with transductive mistake bound $O(\sqrt{d})$. Our upper bound also improves the previous best known upper bound of $(2/3) \cdot d$ from Ben-David et al. (1997). These results demonstrate a quadratic gap between transductive and standard online learning, thereby highlighting the benefit of advanced access to the unlabeled instance sequence. This stands in stark contrast to the PAC setting, where transductive and standard learning exhibit similar sample complexities.

NeurIPS 2025Oral

Zachary Chase et al.

9.1

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Vision-language models are integral to computer vision research, yet many high-performing models remain closed-source, obscuring their data, design and training recipe. The research community has responded by using distillation from black-box models to label training data, achieving strong benchmark results, at the cost of measurable scientific progress. However, without knowing the details of the teacher model and its data sources, scientific progress remains difficult to measure. In this paper, we study building a Perception Language Model (PLM) in a fully open and reproducible framework for transparent research in image and video understanding. We analyze standard training pipelines without distillation from proprietary models and explore large-scale synthetic data to identify critical data gaps, particularly in detailed video understanding. To bridge these gaps, we release 2.8M human-labeled instances of fine-grained video question-answer pairs and spatio-temporally grounded video captions. Additionally, we introduce PLM–VideoBench, a suite for evaluating challenging video understanding tasks focusing on the ability to reason about ''what'', ''where'', ''when'', and ''how'' of a video. We make our work fully reproducible by providing data, training recipes, code & models.

NeurIPS 2025Spotlight

Jang Hyun Cho et al.

8.9

Spectral Perturbation Bounds for Low-Rank Approximation with Applications to Privacy

A central challenge in machine learning is to understand how noise or measurement errors affect low-rank approximations, particularly in the spectral norm. This question is especially important in differentially private low-rank approximation, where one aims to preserve the top-$p$ structure of a data-derived matrix while ensuring privacy. Prior work often analyzes Frobenius norm error or changes in reconstruction quality, but these metrics can over- or under-estimate true subspace distortion. The spectral norm, by contrast, captures worst-case directional error and provides the strongest utility guarantees. We establish new high-probability spectral-norm perturbation bounds for symmetric matrices that refine the classical Eckart--Young--Mirsky theorem and explicitly capture interactions between a matrix $A \in \mathbb{R}^{n \times n}$ and an arbitrary symmetric perturbation $E$. Under mild eigengap and norm conditions, our bounds yield sharp estimates for $\| (A + E)_p - A_p \|$, where $A_p$ is the best rank-$p$ approximation of $A$, with improvements of up to a factor of $\sqrt{n}$. As an application, we derive improved utility guarantees for differentially private PCA, resolving an open problem in the literature. Our analysis relies on a novel contour bootstrapping method from complex analysis and extends it to a broad class of spectral functionals, including polynomials and matrix exponentials. Empirical results on real-world datasets confirm that our bounds closely track the actual spectral error under diverse perturbation regimes.

NeurIPS 2025Oral

Phuc Tran et al.

8.8

On Union-Closedness of Language Generation

We investigate language generation in the limit – a model by Kleinberg and Mullainathan and extended by Li, Raman, and Tewari. While Kleinberg and Mullainathan proved generation is possible for all countable collections, Li, Raman, and Tewari defined a hierarchy of generation notions (uniform, non-uniform, and generatable) and explored their feasibility for uncountable collections. Our first set of results resolve two open questions of Li et al. by proving finite unions of generatable or non-uniformly generatable classes need not be generatable. These follow from a stronger result: there is non-uniformly generatable class and a uniformly generatable class whose union is non-generatable. This adds to the aspects along which language generation in the limit is different from traditional tasks in statistical learning theory like classification, which are closed under finite unions. In particular, it implies that given two generators for different collections, one cannot combine them to obtain a single "more powerful" generator, prohibiting this notion of boosting. Our construction also addresses a third of Li et al.'s open questions on whether there are uncountable classes that are non-uniformly generatable and do not satisfy the eventually unbounded closure (EUC) condition introduced by Li et al. Our approach utilizes carefully constructed classes along with a novel diagonalization argument that could be of independent interest in the growing area of language generation.

NeurIPS 2025Poster

Steve Hanneke et al.

8.8

Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better

Vision-language-action (VLA) models provide a powerful approach to training control policies for physical systems, such as robots, by combining end-to-end learning with transfer of semantic knowledge from web-scale vision-language model (VLM) training. However, the constraints of real-time control are often at odds with the design of VLMs: the most powerful VLMs have tens or hundreds of billions of parameters, presenting an obstacle to real-time inference, and operate on discrete tokens rather than the continuous-valued outputs that are required for controlling robots. To address this challenge, recent VLA models have used specialized modules for efficient continuous control, such as action experts or continuous output heads, which typically require adding new untrained parameters to the pretrained VLM backbone. While these modules improve real-time and control capabilities, it remains an open question whether they preserve or degrade the semantic knowledge contained in the pretrained VLM, and what effect they have on the VLA training dynamics. In this paper, we study this question in the context of VLAs that include a continuous diffusion or flow matching action expert, showing that naively including such experts significantly harms both training speed and knowledge transfer. We provide an extensive analysis of various design choices, their impact on performance and knowledge transfer, and propose a technique for insulating the VLM backbone during VLA training that mitigates this issue. Videos are available at https://pi.website/research/knowledge_insulation and open-source model weights are available at https://github.com/Physical-Intelligence/openpi.

NeurIPS 2025Spotlight

Danny Driess et al.

8.8

Fixed-Point RNNs: Interpolating from Diagonal to Dense

Linear recurrent neural networks (RNNs) and state-space models (SSMs) such as Mamba have become promising alternatives to softmax-attention as sequence mixing layers in Transformer architectures. Current models, however, do not exhibit the full state-tracking expressivity of RNNs because they rely on channel-wise (i.e. diagonal) sequence mixing. In this paper, we investigate parameterizations of a large class of dense linear RNNs as fixed-points of parallelizable diagonal linear RNNs. The resulting models can naturally trade expressivity for efficiency at a fixed number of parameters and achieve state-of-the-art results on the state-tracking benchmarks $A_5$ and $S_5$, while matching performance on copying and other tasks.

NeurIPS 2025Spotlight

Sajad Movahedi et al.

8.8

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model

Unlocking deep and interpretable biological reasoning from complex genomic data remains a major AI challenge limiting scientific progress. While current DNA foundation models excel at representing sequences, they struggle with multi-step reasoning and lack transparent, biologically meaningful explanations. BioReason addresses this by tightly integrating a DNA foundation model with a large language model (LLM), enabling the LLM to directly interpret and reason over genomic information. Through supervised fine-tuning and reinforcement learning, BioReason learns to produce logical, biologically coherent deductions. It achieves major performance gains, boosting KEGG-based disease pathway prediction accuracy from 86% to 98% and improving variant effect prediction by an average of 15% over strong baselines. BioReason can reason over unseen biological entities and explain its decisions step by step, offering a transformative framework for interpretable, mechanistic AI in biology. All data, code, and checkpoints are available at [https://github.com/bowang-lab/BioReason](https://github.com/bowang-lab/BioReason).

NeurIPS 2025Poster

Adibvafa Fallahpour et al.

8.8

Exploitation of a Latent Mechanism in Graph Contrastive Learning: Representation Scattering

Graph Contrastive Learning (GCL) has emerged as a powerful approach for generating graph representations without the need for manual annotation. Most advanced GCL methods fall into three main frameworks: node discrimination, group discrimination, and bootstrapping schemes, all of which achieve comparable performance. However, the underlying mechanisms and factors that contribute to their effectiveness are not yet fully understood. In this paper, we revisit these frameworks and reveal a common mechanism—representation scattering—that significantly enhances their performance. Our discovery highlights an essential feature of GCL and unifies these seemingly disparate methods under the concept of representation scattering. To leverage this insight, we introduce Scattering Graph Representation Learning (SGRL), a novel framework that incorporates a new representation scattering mechanism designed to enhance representation diversity through a center-away strategy. Additionally, consider the interconnected nature of graphs, we develop a topology-based constraint mechanism that integrates graph structural properties with representation scattering to prevent excessive scattering. We extensively evaluate SGRL across various downstream tasks on benchmark datasets, demonstrating its efficacy and superiority over existing GCL methods. Our findings underscore the significance of representation scattering in GCL and provide a structured framework for harnessing this mechanism to advance graph representation learning. The code of SGRL is at https://github.com/hedongxiao-tju/SGRL.

NeurIPS 2024Oral

Dongxiao He et al.

8.7

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models

We present Audio Flamingo 3 (AF3), a fully open state-of-the-art (SOTA) large audio-language model that advances reasoning and understanding across speech, sound, and music. AF3 introduces: (i) AF-Whisper, a unified audio encoder trained using a novel strategy for joint representation learning across all 3 modalities of speech, sound, and music; (ii) flexible, on-demand thinking, allowing the model to do chain-of-thought-type reasoning before answering; (iii) multi-turn, multi-audio chat; (iv) long audio understanding and reasoning (including speech) up to 10 minutes; and (v) voice-to-voice interaction. To enable these capabilities, we propose several large-scale training datasets curated using novel strategies, including AudioSkills-XL, LongAudio-XL, AF-Think, and AF-Chat, and train AF3 with a novel five-stage curriculum-based training strategy. Trained on only open-source audio data, AF3 achieves new SOTA results on over 20+ (long) audio understanding and reasoning benchmarks, surpassing both open-weight and closed-source models trained on much larger datasets.

NeurIPS 2025Spotlight

Sreyan Ghosh et al.

8.7

STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis

We present STARFlow, a scalable generative model based on normalizing flows that achieves strong performance on high-resolution image synthesis. STARFlow's main building block is Transformer Autoregressive Flow (TARFlow), which combines normalizing flows with Autoregressive Transformer architectures and has recently achieved impressive results in image modeling. In this work, we first establish the theoretical universality of TARFlow for modeling continuous distributions. Building on this foundation, we introduce a set of architectural and algorithmic innovations that significantly enhance the scalability: (1) a deep-shallow design where a deep Transformer block captures most of the model’s capacity, followed by a few shallow Transformer blocks that are computationally cheap yet contribute non-negligibly, (2) learning in the latent space of pretrained autoencoders, which proves far more effective than modeling pixels directly, and (3) a novel guidance algorithm that substantially improves sample quality. Crucially, our model remains a single, end-to-end normalizing flow, allowing exact maximum likelihood training in continuous space without discretization. STARFlow achieves competitive results in both class- and text-conditional image generation, with sample quality approaching that of state-of-the-art diffusion models. To our knowledge, this is the **first** successful demonstration of normalizing flows at this scale and resolution. Code and weights available at https://github.com/apple/ml-starflow.

NeurIPS 2025Spotlight

Jiatao Gu et al.

8.7

The Generative Leap: Tight Sample Complexity for Efficiently Learning Gaussian Multi-Index Models

In this work we consider generic Gaussian Multi-index models, in which the labels only depend on the (Gaussian) $d$-dimensional inputs through their projection onto a low-dimensional $r = O_d(1)$ subspace, and we study efficient agnostic estimation procedures for this hidden subspace. We introduce the *generative leap* exponent, a natural extension of the generative exponent from Damian et al. 2024 to the multi-index setting. We show that a sample complexity of $n=\Theta(d^{1 \vee k^\star/2})$ is necessary in the class of algorithms captured by the Low-Degree-Polynomial framework; and also sufficient, by giving a sequential estimation procedure based on a spectral U-statistic over appropriate Hermite tensors.

NeurIPS 2025Spotlight

Alex Damian et al.

8.7

OpenCUA: Open Foundations for Computer-Use Agents

Vision-language models have demonstrated impressive capabilities as computer-use agents (CUAs) capable of automating diverse computer tasks. As their commercial potential grows, critical details of the most capable CUA systems remain closed. As these agents will increasingly mediate digital interactions and execute consequential decisions on our behalf, the research community needs access to open CUA frameworks to study their capabilities, limitations, and risks. To bridge this gap, we propose OpenCUA, a comprehensive open-source framework for scaling CUA data and foundation models. Our framework consists of: (1) an annotation infrastructure that seamlessly captures human computer-use demonstrations; (2) AgentNet, the first large-scale computer-use task dataset spanning 3 operating systems and 200+ applications and websites; (3) a scalable pipeline that transforms demonstrations into state–action pairs with reflective long Chain-of-Thought reasoning that sustain robust performance gains as data scales. Our end-to-end agent models demonstrate strong performance across CUA benchmarks. In particular, OpenCUA-72B achieves an average success rate of 45.0% on OSWorld‑Verified, establishing a new state-of-the-art (SOTA) among open-source models. Further analysis confirms that our approach generalizes well across domains and benefits significantly from increased test-time computation. We release our annotation tool, datasets, code, and models to build open foundations for further CUA research.

NeurIPS 2025Spotlight

Xinyuan Wang et al.

共 9,778 篇论文，第 1 / 489 页

全部论文

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Learning Linear Attention in Polynomial Time

FlexOLMo: Open Language Models for Flexible Data Use

OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers

Affine-Invariant Global Non-Asymptotic Convergence Analysis of BFGS under Self-Concordance

Corporate Needs You to Find the Difference: Revisiting Submodular and Supermodular Ratio Optimization Problems

Mesh-RFT: Enhancing Mesh Generation via Fine-grained Reinforcement Fine-Tuning

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

Optimal Mistake Bounds for Transductive Online Learning

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Spectral Perturbation Bounds for Low-Rank Approximation with Applications to Privacy

On Union-Closedness of Language Generation

Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better

Fixed-Point RNNs: Interpolating from Diagonal to Dense

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model

Exploitation of a Latent Mechanism in Graph Contrastive Learning: Representation Scattering

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models

STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis

The Generative Leap: Tight Sample Complexity for Efficiently Learning Gaussian Multi-Index Models

OpenCUA: Open Foundations for Computer-Use Agents

筛选条件

会议

年份

状态