
Showing 1–50 of 430 results for author: Bai, L

Searching in archive cs.
  1. arXiv:2511.14396  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    Continuous Vision-Language-Action Co-Learning with Semantic-Physical Alignment for Behavioral Cloning

    Authors: Xiuxiu Qi, Yu Yang, Jiannong Cao, Luyao Bai, Chongshan Fan, Chengtai Cao, Hongpeng Wang

    Abstract: Language-conditioned manipulation facilitates human-robot interaction via behavioral cloning (BC), which learns control policies from human demonstrations and serves as a cornerstone of embodied AI. Overcoming compounding errors in sequential action decisions remains a central challenge to improving BC performance. Existing approaches mitigate compounding errors through data augmentation, expressi…

    Submitted 18 November, 2025; originally announced November 2025.

    Comments: Accepted at AAAI 2026; the project website is available at https://qhemu.github.io/CCoL/

  2. arXiv:2511.14366  [pdf, ps, other]

    cs.CL

    ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning

    Authors: Hongwei Liu, Junnan Liu, Shudong Liu, Haodong Duan, Yuqiang Li, Mao Su, Xiaohong Liu, Guangtao Zhai, Xinyu Fang, Qianhong Ma, Taolin Zhang, Zihan Ma, Yufeng Zhao, Peiheng Zhou, Linchen Xiao, Wenlong Zhang, Shijie Zhou, Xingjian Ma, Siqi Sun, Jiaye Ge, Meng Li, Yuhong Liu, Jianxin Dong, Jiaying Li, Hui Wu , et al. (11 additional authors not shown)

    Abstract: The rapid advancement of Large Language Models (LLMs) has led to performance saturation on many established benchmarks, questioning their ability to distinguish frontier models. Concurrently, existing high-difficulty benchmarks often suffer from narrow disciplinary focus, oversimplified answer formats, and vulnerability to data contamination, creating a fidelity gap with real-world scientific inqu…

    Submitted 18 November, 2025; originally announced November 2025.

    Comments: 39 pages

  3. arXiv:2511.13612  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    P1: Mastering Physics Olympiads with Reinforcement Learning

    Authors: Jiacheng Chen, Qianjia Cheng, Fangchen Yu, Haiyuan Wan, Yuchen Zhang, Shenghe Zheng, Junchi Yao, Qingyang Zhang, Haonan He, Yun Luo, Yufeng Zhao, Futing Wang, Li Sheng, Chengxing Xie, Yuxin Zuo, Yizhuo Li, Wenxauan Zeng, Yulun Wu, Rui Huang, Dongzhan Zhou, Kai Chen, Yu Qiao, Lei Bai, Yu Cheng, Ning Ding , et al. (3 additional authors not shown)

    Abstract: Recent progress in large language models (LLMs) has moved the frontier from puzzle-solving to science-grade reasoning, the kind needed to tackle problems whose answers must stand against nature, not merely fit a rubric. Physics is the sharpest test of this shift: it binds symbols to reality in a fundamental way and serves as the cornerstone of most modern technologies. In this work, we manage to a…

    Submitted 17 November, 2025; originally announced November 2025.

  4. arXiv:2511.13035  [pdf, ps, other]

    cs.LG cs.AI

    One-Step Generative Policies with Q-Learning: A Reformulation of MeanFlow

    Authors: Zeyuan Wang, Da Li, Yulin Chen, Ye Shi, Liang Bai, Tianyuan Yu, Yanwei Fu

    Abstract: We introduce a one-step generative policy for offline reinforcement learning that maps noise directly to actions via a residual reformulation of MeanFlow, making it compatible with Q-learning. While one-step Gaussian policies enable fast inference, they struggle to capture complex, multimodal action distributions. Existing flow-based methods improve expressivity but typically rely on distillation…

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: Accepted at AAAI 2026 (Poster)

  5. arXiv:2511.08291  [pdf, ps, other]

    cs.CV

    SynWeather: Weather Observation Data Synthesis across Multiple Regions and Variables via a General Diffusion Transformer

    Authors: Kaiyi Xu, Junchao Gong, Zhiwang Zhou, Zhangrui Li, Yuandong Pu, Yihao Liu, Ben Fei, Fenghua Ling, Wenlong Zhang, Lei Bai

    Abstract: With the advancement of meteorological instruments, abundant data has become available. Current approaches typically focus on single-variable, single-region tasks and primarily rely on deterministic modeling. This limits unified synthesis across variables and regions, overlooks cross-variable complementarity, and often leads to over-smoothed results. To address the above challenges, we introduce Sy…

    Submitted 14 November, 2025; v1 submitted 11 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI-26 Oral

  6. arXiv:2511.06947  [pdf, ps, other]

    cs.CV cs.AI

    FoCLIP: A Feature-Space Misalignment Framework for CLIP-Based Image Manipulation and Detection

    Authors: Yulin Chen, Zeyuan Wang, Tianyuan Yu, Yingmei Wei, Liang Bai

    Abstract: The well-aligned attribute of CLIP-based models enables effective applications such as CLIPscore, a widely adopted image quality assessment metric. However, such a CLIP-based metric is vulnerable because of its delicate multimodal alignment. In this work, we propose FoCLIP, a feature-space misalignment framework for fooling CLIP-based image quality metrics. Based on the stochastic gradient desc…

    Submitted 10 November, 2025; originally announced November 2025.

    Comments: 15 pages, 9 figures, published at PRCV

  7. arXiv:2511.05873  [pdf, ps, other]

    eess.IV cs.AI cs.CV cs.RO

    EndoIR: Degradation-Agnostic All-in-One Endoscopic Image Restoration via Noise-Aware Routing Diffusion

    Authors: Tong Chen, Xinyu Ma, Long Bai, Wenyang Wang, Yue Sun, Luping Zhou

    Abstract: Endoscopic images often suffer from diverse and co-occurring degradations such as low lighting, smoke, and bleeding, which obscure critical clinical details. Existing restoration methods are typically task-specific and often require prior knowledge of the degradation type, limiting their robustness in real-world clinical use. We propose EndoIR, an all-in-one, degradation-agnostic diffusion-based f…

    Submitted 10 November, 2025; v1 submitted 8 November, 2025; originally announced November 2025.

  8. arXiv:2511.01618  [pdf, ps, other]

    cs.CV cs.CL

    Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models

    Authors: Xiaoyu Zhan, Wenxuan Huang, Hao Sun, Xinyu Fu, Changfeng Ma, Shaosheng Cao, Bohan Jia, Shaohui Lin, Zhenfei Yin, Lei Bai, Wanli Ouyang, Yuanqi Li, Jie Guo, Yanwen Guo

    Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have significantly improved 2D visual understanding, prompting interest in their application to complex 3D reasoning tasks. However, it remains unclear whether these models can effectively capture the detailed spatial information required for robust real-world performance, especially cross-view consistency, a key requirement for accurate…

    Submitted 3 November, 2025; originally announced November 2025.

  9. arXiv:2511.01409  [pdf, ps, other]

    cs.CL

    LiveSearchBench: An Automatically Constructed Benchmark for Retrieval and Reasoning over Dynamic Knowledge

    Authors: Heng Zhou, Ao Yu, Yuchen Fan, Jianing Shi, Li Kang, Hejia Geng, Yongting Zhang, Yutao Fan, Yuhao Wu, Tiancheng He, Yiran Qin, Lei Bai, Zhenfei Yin

    Abstract: Evaluating large language models (LLMs) on question answering often relies on static benchmarks that reward memorization and understate the role of retrieval, failing to capture the dynamic nature of world knowledge. We present LiveSearchBench, an automated pipeline for constructing retrieval-dependent benchmarks from recent knowledge updates. Our method computes deltas between successive Wikidata…

    Submitted 6 November, 2025; v1 submitted 3 November, 2025; originally announced November 2025.

  10. arXiv:2511.00998  [pdf, ps, other]

    cs.RO

    GauDP: Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies

    Authors: Ziye Wang, Li Kang, Yiran Qin, Jiahua Ma, Zhanglin Peng, Lei Bai, Ruimao Zhang

    Abstract: Effective coordination in embodied multi-agent systems remains a fundamental challenge, particularly in scenarios where agents must balance individual perspectives with global environmental awareness. Existing approaches often struggle to balance fine-grained local control with comprehensive scene understanding, resulting in limited scalability and compromised collaboration quality.…

    Submitted 2 November, 2025; originally announced November 2025.

    Comments: Accepted by NeurIPS 2025. Project page: https://ziyeeee.github.io/gaudp.io/

  11. arXiv:2510.24987  [pdf, ps, other]

    q-bio.QM cs.LG q-bio.GN

    scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration

    Authors: Jianle Sun, Chaoqi Liang, Ran Wei, Peng Zheng, Lei Bai, Wanli Ouyang, Hongliang Yan, Peng Ye

    Abstract: Advances in single-cell sequencing have enabled high-resolution profiling of diverse molecular modalities, while integrating unpaired multi-omics single-cell data remains challenging. Existing approaches either rely on pair information or prior correspondences, or require computing a global pairwise coupling matrix, limiting their scalability and flexibility. In this paper, we introduce a scalable…

    Submitted 28 October, 2025; originally announced October 2025.

    Comments: Accepted at NeurIPS 2025 (Spotlight)

  12. arXiv:2510.21847  [pdf, ps, other]

    cs.LG

    SynCast: Synergizing Contradictions in Precipitation Nowcasting via Diffusion Sequential Preference Optimization

    Authors: Kaiyi Xu, Junchao Gong, Wenlong Zhang, Ben Fei, Lei Bai, Wanli Ouyang

    Abstract: Precipitation nowcasting based on radar echoes plays a crucial role in monitoring extreme weather and supporting disaster prevention. Although deep learning approaches have achieved significant progress, they still face notable limitations. For example, deterministic models tend to produce over-smoothed predictions, which struggle to capture extreme events and fine-scale precipitation patterns. Pr…

    Submitted 22 October, 2025; originally announced October 2025.

  13. arXiv:2510.18705  [pdf, ps, other]

    cs.CV

    A Renaissance of Explicit Motion Information Mining from Transformers for Action Recognition

    Authors: Peiqin Zhuang, Lei Bai, Yichao Wu, Ding Liang, Luping Zhou, Yali Wang, Wanli Ouyang

    Abstract: Recently, action recognition has been dominated by transformer-based methods, thanks to their spatiotemporal contextual aggregation capacities. However, despite the significant progress achieved on scene-related datasets, they do not perform well on motion-sensitive datasets due to the lack of elaborate motion modeling designs. Meanwhile, we observe that the widely-used cost volume in traditional…

    Submitted 22 October, 2025; v1 submitted 21 October, 2025; originally announced October 2025.

    Comments: Accepted by Pattern Recognition. We have always been curious to see whether our designs could be beneficial in other scenarios, such as embedding them into the DiT model or 3D-VAE for video generation. If you are interested, why not give it a shot?

  14. arXiv:2510.16880  [pdf, ps, other]

    cs.CE

    Chem-R: Learning to Reason as a Chemist

    Authors: Weida Wang, Benteng Chen, Di Zhang, Wanhao Liu, Shuchen Pu, Ben Gao, Jin Zeng, Xiaoyong Wei, Tianshu Yu, Shuzhou Sun, Tianfan Fu, Wanli Ouyang, Lei Bai, Jiatong Li, Zifu Wang, Yuqiang Li, Shufei Zhang

    Abstract: Although large language models (LLMs) have significant potential to advance chemical discovery, current LLMs lack core chemical knowledge, produce unreliable reasoning trajectories, and exhibit suboptimal performance across diverse chemical tasks. To address these challenges, we propose Chem-R, a generalizable Chemical Reasoning model designed to emulate the deliberative processes of chemists. Che…

    Submitted 22 October, 2025; v1 submitted 19 October, 2025; originally announced October 2025.

    Comments: 9 pages, 5 figures, 14 tables

  15. arXiv:2510.15978  [pdf, ps, other]

    cs.LG cs.AI physics.ao-ph

    DAWP: A framework for global observation forecasting via Data Assimilation and Weather Prediction in satellite observation space

    Authors: Junchao Gong, Jingyi Xu, Ben Fei, Fenghua Ling, Wenlong Zhang, Kun Chen, Wanghan Xu, Weidong Yang, Xiaokang Yang, Lei Bai

    Abstract: Weather prediction is a critical task for human society, and impressive progress has been made by training artificial intelligence weather prediction (AIWP) methods with reanalysis data. However, reliance on reanalysis data leaves AIWPs with shortcomings, including data assimilation biases and temporal discrepancies. To liberate AIWPs from reanalysis data, observation forecasting emerges…

    Submitted 12 October, 2025; originally announced October 2025.

    Journal ref: https://neurips.cc/virtual/2025/poster/120074

  16. arXiv:2510.15600  [pdf, ps, other]

    cs.AI cs.CL

    Unleashing Scientific Reasoning for Bio-experimental Protocol Generation via Structured Component-based Reward Mechanism

    Authors: Haoran Sun, Yankai Jiang, Zhenyu Tang, Yaning Pan, Shuang Gu, Zekai Lin, Lilong Wang, Wenjie Lou, Lei Liu, Lei Bai, Xiaosong Wang

    Abstract: The foundation of reproducible science lies in protocols that are precise, logically ordered, and executable. The autonomous generation of these protocols through natural language queries could greatly improve the efficiency of the reproduction process. However, current leading large language models (LLMs) often generate incomplete or inconsistent protocols, limiting their utility. To address this…

    Submitted 17 October, 2025; originally announced October 2025.

  17. arXiv:2510.15232  [pdf, ps, other]

    cs.LG cs.CL

    FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain

    Authors: Tiansheng Hu, Tongyan Hu, Liuyang Bai, Yilun Zhao, Arman Cohan, Chen Zhao

    Abstract: Recent LLMs have demonstrated promising ability in solving finance-related problems. However, applying LLMs in real-world finance applications remains challenging due to their high-risk and high-stakes nature. This paper introduces FinTrust, a comprehensive benchmark specifically designed for evaluating the trustworthiness of LLMs in finance applications. Our benchmark focuses on a wide range of al…

    Submitted 16 October, 2025; originally announced October 2025.

    Comments: EMNLP 2025 Main

  18. arXiv:2510.13451  [pdf, ps, other]

    cs.CR

    Toward Efficient Inference Attacks: Shadow Model Sharing via Mixture-of-Experts

    Authors: Li Bai, Qingqing Ye, Xinwei Zhang, Sen Zhang, Zi Liang, Jianliang Xu, Haibo Hu

    Abstract: Machine learning models are often vulnerable to inference attacks that expose sensitive information from their training data. The shadow model technique is commonly employed in such attacks, for example membership inference. However, the need for a large number of shadow models leads to high computational costs, limiting their practical applicability. Such inefficiency mainly stems from the independent tr…

    Submitted 15 October, 2025; originally announced October 2025.

    Comments: To appear in NeurIPS 2025

  19. arXiv:2510.08959  [pdf, ps, other]

    cs.AI

    DualResearch: Entropy-Gated Dual-Graph Retrieval for Answer Reconstruction

    Authors: Jinxin Shi, Zongsheng Cao, Runmin Ma, Yusong Hu, Jie Zhou, Xin Li, Lei Bai, Liang He, Bo Zhang

    Abstract: The deep-research framework orchestrates external tools to perform complex, multi-step scientific reasoning that exceeds the native limits of a single large language model. However, it still suffers from context pollution, weak evidentiary support, and brittle execution paths. To address these issues, we propose DualResearch, a retrieval and fusion framework that matches the epistemic structure of…

    Submitted 9 October, 2025; originally announced October 2025.

    Comments: 16 pages, 6 figures, 5 tables, Under Review

  20. arXiv:2510.08529  [pdf, ps, other]

    cs.CL cs.AI

    CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards

    Authors: Xiangyuan Xue, Yifan Zhou, Guibin Zhang, Zaibin Zhang, Yijiang Li, Chen Zhang, Zhenfei Yin, Philip Torr, Wanli Ouyang, Lei Bai

    Abstract: Self-evolution is a central research topic in enabling large language model (LLM)-based agents to continually improve their capabilities after pretraining. Recent research has witnessed a transition from reinforcement learning (RL)-free to RL-based methods. Current RL-based methods either rely on dense external reward signals or extract intrinsic reward signals from LLMs themselves. However, these…

    Submitted 9 October, 2025; originally announced October 2025.

  21. arXiv:2510.08521  [pdf, ps, other]

    cs.AI

    FlowSearch: Advancing deep research with dynamic structured knowledge flow

    Authors: Yusong Hu, Runmin Ma, Yue Fan, Jinxin Shi, Zongsheng Cao, Yuhao Zhou, Jiakang Yuan, Xiangchao Yan, Wenlong Zhang, Lei Bai, Bo Zhang

    Abstract: Deep research is an inherently challenging task that demands both breadth and depth of thinking. It involves navigating diverse knowledge spaces and reasoning over complex, multi-step dependencies, which presents substantial challenges for agentic systems. To address this, we propose FlowSearch, a multi-agent framework that actively constructs and evolves a dynamic structured knowledge flow to dri…

    Submitted 9 October, 2025; originally announced October 2025.

  22. arXiv:2510.08511  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    AutoMLGen: Navigating Fine-Grained Optimization for Coding Agents

    Authors: Shangheng Du, Xiangchao Yan, Dengyang Jiang, Jiakang Yuan, Yusong Hu, Xin Li, Liang He, Bo Zhang, Lei Bai

    Abstract: Large language models (LLMs) have shown impressive performance in general programming tasks. However, in Machine Learning Engineering (MLE) scenarios such as AutoML and Kaggle competitions, achieving high performance depends heavily on expert intervention and repeated adjustments rather than simply generating correct code. When applied directly to these tasks, LLMs often lack fine-grained domain p…

    Submitted 9 October, 2025; originally announced October 2025.

  23. arXiv:2510.07961  [pdf, ps, other]

    cs.CV

    Latent Harmony: Synergistic Unified UHD Image Restoration via Latent Space Regularization and Controllable Refinement

    Authors: Yidi Liu, Xueyang Fu, Jie Huang, Jie Xiao, Dong Li, Wenlong Zhang, Lei Bai, Zheng-Jun Zha

    Abstract: Ultra-High Definition (UHD) image restoration faces a trade-off between computational efficiency and high-frequency detail retention. While Variational Autoencoders (VAEs) improve efficiency via latent-space processing, their Gaussian constraint often discards degradation-specific high-frequency information, hurting reconstruction fidelity. To overcome this, we propose Latent Harmony, a two-stage…

    Submitted 24 October, 2025; v1 submitted 9 October, 2025; originally announced October 2025.

    Comments: Accepted by NeurIPS 2025

  24. arXiv:2510.04006  [pdf, ps, other]

    cs.LG nlin.CD physics.ao-ph

    Incorporating Multivariate Consistency in ML-Based Weather Forecasting with Latent-space Constraints

    Authors: Hang Fan, Yi Xiao, Yongquan Qu, Fenghua Ling, Ben Fei, Lei Bai, Pierre Gentine

    Abstract: Data-driven machine learning (ML) models have recently shown promise in surpassing traditional physics-based approaches for weather forecasting, leading to a so-called second revolution in weather forecasting. However, most ML-based forecast models treat reanalysis as the truth and are trained under variable-specific loss weighting, ignoring their physical coupling and spatial structure. Over long…

    Submitted 4 October, 2025; originally announced October 2025.

  25. arXiv:2510.02752  [pdf, ps, other]

    cs.CL

    The Path of Self-Evolving Large Language Models: Achieving Data-Efficient Learning via Intrinsic Feedback

    Authors: Hangfan Zhang, Siyuan Xu, Zhimeng Guo, Huaisheng Zhu, Shicheng Liu, Xinrun Wang, Qiaosheng Zhang, Yang Chen, Peng Ye, Lei Bai, Shuyue Hu

    Abstract: Reinforcement learning (RL) has demonstrated potential in enhancing the reasoning capabilities of large language models (LLMs), but such training typically demands substantial efforts in creating and annotating data. In this work, we explore improving LLMs through RL with minimal data. Our approach alternates between the LLM proposing a task and then attempting to solve it. To minimize data depend…

    Submitted 3 October, 2025; originally announced October 2025.

  26. arXiv:2510.00844  [pdf, ps, other]

    cs.AI

    Learning Compact Representations of LLM Abilities via Item Response Theory

    Authors: Jianhao Chen, Chenxu Wang, Gengrui Zhang, Peng Ye, Lei Bai, Wei Hu, Yuzhong Qu, Shuyue Hu

    Abstract: Recent years have witnessed a surge in the number of large language models (LLMs), yet efficiently managing and utilizing these vast resources remains a significant challenge. In this work, we explore how to learn compact representations of LLM abilities that can facilitate downstream tasks, such as model routing and performance prediction on new benchmarks. We frame this problem as estimating the…

    Submitted 1 October, 2025; originally announced October 2025.

  27. arXiv:2509.26016  [pdf, ps, other]

    cs.CV

    GeoLink: Empowering Remote Sensing Foundation Model with OpenStreetMap Data

    Authors: Lubian Bai, Xiuyuan Zhang, Siqi Zhang, Zepeng Zhang, Haoyu Wang, Wei Qin, Shihong Du

    Abstract: Integrating ground-level geospatial data with rich geographic context, like OpenStreetMap (OSM), into remote sensing (RS) foundation models (FMs) is essential for advancing geospatial intelligence and supporting a broad spectrum of tasks. However, the modality gap between RS and OSM data, including differences in data structure, content, and spatial granularity, makes effective synergy highly challeng…

    Submitted 30 September, 2025; originally announced September 2025.

    Comments: NeurIPS 2025

  28. arXiv:2509.25300  [pdf, ps, other]

    cs.LG cs.AI

    Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning

    Authors: Zelin Tan, Hejia Geng, Mulei Zhang, Xiaohang Yu, Guancheng Wan, Yifan Zhou, Qiang He, Xiangyuan Xue, Heng Zhou, Yutao Fan, Zhongzhi Li, Zaibin Zhang, Guibin Zhang, Chen Zhang, Zhenfei Yin, Lei Bai

    Abstract: While scaling laws for large language models (LLMs) during pre-training have been extensively studied, their behavior under reinforcement learning (RL) post-training remains largely unexplored. This paper presents a systematic empirical investigation of scaling behaviors in RL-based post-training, with a particular focus on mathematical reasoning. Based on 54 experiments across diverse model sizes…

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: V1 version

  29. arXiv:2509.25210  [pdf, ps, other]

    cs.LG cs.AI physics.ao-ph

    STCast: Adaptive Boundary Alignment for Global and Regional Weather Forecasting

    Authors: Hao Chen, Tao Han, Jie Zhang, Song Guo, Lei Bai

    Abstract: To obtain finer regional forecasts, many works have explored regional integration from the global atmosphere, e.g., by solving boundary equations in physics-based methods or cropping regions from global forecasts in data-driven methods. However, the effectiveness of these methods is often constrained by static and imprecise regional boundaries, resulting in poor generalization ability. To addres…

    Submitted 21 September, 2025; originally announced September 2025.

  30. arXiv:2509.24855  [pdf, ps, other]

    cs.AI

    PhysicsMinions: Winning Gold Medals in the Latest Physics Olympiads with a Coevolutionary Multimodal Multi-Agent System

    Authors: Fangchen Yu, Junchi Yao, Ziyi Wang, Haiyuan Wan, Youling Huang, Bo Zhang, Shuyue Hu, Dongzhan Zhou, Ning Ding, Ganqu Cui, Lei Bai, Wanli Ouyang, Peng Ye

    Abstract: Physics is central to understanding and shaping the real world, and the ability to solve physics problems is a key indicator of real-world physical intelligence. Physics Olympiads, renowned as the crown of competitive physics, provide a rigorous testbed requiring complex reasoning and deep multimodal understanding, yet they remain largely underexplored in AI research. Existing approaches are predo…

    Submitted 29 September, 2025; originally announced September 2025.

  31. arXiv:2509.24771  [pdf, ps, other]

    cs.CL

    LatentEvolve: Self-Evolving Test-Time Scaling in Latent Space

    Authors: Guibin Zhang, Fanci Meng, Guancheng Wan, Zherui Li, Kun Wang, Zhenfei Yin, Lei Bai, Shuicheng Yan

    Abstract: Test-time Scaling (TTS) has been demonstrated to significantly enhance the reasoning capabilities of Large Language Models (LLMs) during the inference phase without altering model parameters. However, existing TTS methods are largely independent, implying that LLMs have not yet evolved to progressively learn how to scale more effectively. With the objective of evolving LLMs to learn "how to scale…

    Submitted 29 September, 2025; originally announced September 2025.

  32. arXiv:2509.24285  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    SCI-Verifier: Scientific Verifier with Thinking

    Authors: Shenghe Zheng, Chenyu Huang, Fangchen Yu, Junchi Yao, Jingqi Ye, Tao Chen, Yun Luo, Ning Ding, Lei Bai, Ganqu Cui, Peng Ye

    Abstract: As large language models (LLMs) are increasingly applied to scientific reasoning, the complexity of answer formats and the diversity of equivalent expressions make answer verification a critical yet challenging task. Existing verification studies in scientific domains suffer from two major limitations: (a) the absence of systematic evaluation standards and insufficient disciplinary coverage, which…

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: This paper focuses on LLM-as-a-Judge, and the project is currently in progress

  33. arXiv:2509.23915  [pdf, ps, other]

    cs.CV

    Revisit the Imbalance Optimization in Multi-task Learning: An Experimental Analysis

    Authors: Yihang Guo, Tianyuan Yu, Liang Bai, Yanming Guo, Yirun Ruan, William Li, Weishi Zheng

    Abstract: Multi-task learning (MTL) aims to build general-purpose vision systems by training a single network to perform multiple tasks jointly. While promising, its potential is often hindered by "unbalanced optimization", where task interference leads to subpar performance compared to single-task models. To facilitate research in MTL, this paper presents a systematic experimental analysis to dissect the f…

    Submitted 28 September, 2025; originally announced September 2025.

  34. arXiv:2509.23141  [pdf, ps, other]

    cs.CV

    Earth-Agent: Unlocking the Full Landscape of Earth Observation with Agents

    Authors: Peilin Feng, Zhutao Lv, Junyan Ye, Xiaolei Wang, Xinjie Huo, Jinhua Yu, Wanghan Xu, Wenlong Zhang, Lei Bai, Conghui He, Weijia Li

    Abstract: Earth observation (EO) is essential for understanding the evolving states of the Earth system. Although recent MLLMs have advanced EO research, they still lack the capability to tackle complex tasks that require multi-step reasoning and the use of domain-specific tools. Agent-based methods offer a promising direction, but current attempts remain in their infancy, confined to RGB perception, shallo…

    Submitted 16 October, 2025; v1 submitted 27 September, 2025; originally announced September 2025.

  35. arXiv:2509.22186  [pdf, ps, other]

    cs.CV cs.CL

    MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

    Authors: Junbo Niu, Zheng Liu, Zhuangcheng Gu, Bin Wang, Linke Ouyang, Zhiyuan Zhao, Tao Chu, Tianyao He, Fan Wu, Qintong Zhang, Zhenjiang Jin, Guang Liang, Rui Zhang, Wenzheng Zhang, Yuan Qu, Zhifei Ren, Yuefeng Sun, Yuanhong Zheng, Dongsheng Ma, Zirui Tang, Boyu Niu, Ziyang Miao, Hejun Dong, Siyi Qian, Junyuan Zhang , et al. (36 additional authors not shown)

    Abstract: We introduce MinerU2.5, a 1.2B-parameter document parsing vision-language model that achieves state-of-the-art recognition accuracy while maintaining exceptional computational efficiency. Our approach employs a coarse-to-fine, two-stage parsing strategy that decouples global layout analysis from local content recognition. In the first stage, the model performs efficient layout analysis on downsamp…

    Submitted 29 September, 2025; v1 submitted 26 September, 2025; originally announced September 2025.

    Comments: Technical Report; GitHub Repo: https://github.com/opendatalab/MinerU Hugging Face Model: https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B Hugging Face Demo: https://huggingface.co/spaces/opendatalab/MinerU

  36. arXiv:2509.21320  [pdf, ps, other]

    cs.CL

    SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines

    Authors: Yizhou Wang, Chen Tang, Han Deng, Jiabei Xiao, Jiaqi Liu, Jianyu Wu, Jun Yao, Pengze Li, Encheng Su, Lintao Wang, Guohang Zhuang, Yuchen Ren, Ben Fei, Ming Hu, Xin Chen, Dongzhan Zhou, Junjun He, Xiangyu Yue, Zhenfei Yin, Jiamin Wu, Qihao Zheng, Yuhao Zhou, Huihui Xu, Chenglong Ma, Yan Lu , et al. (7 additional authors not shown)

    Abstract: We present a scientific reasoning foundation model that aligns natural language with heterogeneous scientific representations. The model is pretrained on a 206B-token corpus spanning scientific text, pure sequences, and sequence-text pairs, then aligned via SFT on 40M instructions, annealed cold-start bootstrapping to elicit long-form chain-of-thought, and reinforcement learning with task-specific…

    Submitted 29 October, 2025; v1 submitted 25 September, 2025; originally announced September 2025.

    Comments: technical report

  37. arXiv:2509.21193  [pdf, ps, other]

    cs.CL cs.AI

    Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning

    Authors: Xiangru Tang, Wanghan Xu, Yujie Wang, Zijie Guo, Daniel Shao, Jiapeng Chen, Cixuan Zhang, Ziyi Wang, Lixin Zhang, Guancheng Wan, Wenlong Zhang, Lei Bai, Zhenfei Yin, Philip Torr, Hanrui Wang, Di Jin

    Abstract: Large language models (LLMs) have recently shown strong progress on scientific reasoning, yet two major bottlenecks remain. First, explicit retrieval fragments reasoning, imposing a hidden "tool tax" of extra tokens and steps. Second, multi-agent pipelines often dilute strong solutions by averaging across all candidates. We address these challenges with a unified framework that combines implicit r…

    Submitted 25 September, 2025; originally announced September 2025.

  38. arXiv:2509.21129  [pdf, ps, other]

    cs.LG cs.CR

    EvoMail: Self-Evolving Cognitive Agents for Adaptive Spam and Phishing Email Defense

    Authors: Wei Huang, De-Tian Chu, Lin-Yuan Bai, Wei Kang, Hai-Tao Zhang, Bo Li, Zhi-Mo Han, Jing Ge, Hai-Feng Lin

    Abstract: Modern email spam and phishing attacks have evolved far beyond keyword blacklists or simple heuristics. Adversaries now craft multi-modal campaigns that combine natural-language text with obfuscated URLs, forged headers, and malicious attachments, adapting their strategies within days to bypass filters. Traditional spam detection systems, which rely on static rules or single-modality models, strug…

    Submitted 25 September, 2025; originally announced September 2025.

  39. arXiv:2509.18578  [pdf, ps, other]

    cs.CR

    MER-Inspector: Assessing model extraction risks from an attack-agnostic perspective

    Authors: Xinwei Zhang, Haibo Hu, Qingqing Ye, Li Bai, Huadi Zheng

    Abstract: Information leakage issues in machine learning-based Web applications have attracted increasing attention. While the risk of data privacy leakage has been rigorously analyzed, the theory of model function leakage, known as Model Extraction Attacks (MEAs), has not been well studied. In this paper, we are the first to understand MEAs theoretically from an attack-agnostic perspective and to propose a…

    Submitted 22 September, 2025; originally announced September 2025.

    Comments: Published in ACM WWW 2025

  40. COLA: Context-aware Language-driven Test-time Adaptation

    Authors: Aiming Zhang, Tianyuan Yu, Liang Bai, Jun Tang, Yanming Guo, Yirun Ruan, Yun Zhou, Zhihe Lu

    Abstract: Test-time adaptation (TTA) has gained increasing popularity due to its efficacy in addressing the "distribution shift" issue while simultaneously protecting data privacy. However, most prior methods assume that a paired source-domain model and a target domain sharing the same label space coexist, heavily limiting their applicability. In this paper, we investigate a more general source model capabl…

    Submitted 22 September, 2025; originally announced September 2025.

    Journal ref: IEEE Trans. Image Process. (2025)

  41. arXiv:2509.15185  [pdf, ps, other]

    cs.CV

    Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation

    Authors: Xiaoyu Yue, Zidong Wang, Yuqing Wang, Wenlong Zhang, Xihui Liu, Wanli Ouyang, Lei Bai, Luping Zhou

    Abstract: Recent studies have demonstrated the importance of high-quality visual representations in image generation and have highlighted the limitations of generative models in image understanding. As a generative paradigm originally designed for natural language, autoregressive models face similar challenges. In this work, we present the first systematic investigation into the mechanisms of applying the n…

    Submitted 18 September, 2025; originally announced September 2025.

    Comments: Accepted by NeurIPS 2025

  42. arXiv:2509.14201  [pdf, ps, other]

    eess.SP cs.NI eess.SY

    Active Inference Framework for Closed-Loop Sensing, Communication, and Control in UAV Systems

    Authors: Guangjin Pan, Liping Bai, Zhuojun Tian, Hui Chen, Mehdi Bennis, Henk Wymeersch

    Abstract: Integrated sensing and communication (ISAC) is a core technology for 6G, and its application to closed-loop sensing, communication, and control (SCC) enables various services. Existing SCC solutions often treat sensing and control separately, leading to suboptimal performance and resource usage. In this work, we introduce the active inference framework (AIF) into SCC-enabled unmanned aerial vehicl…

    Submitted 17 September, 2025; originally announced September 2025.

    Comments: 5 pages, 2 figures

  43. arXiv:2509.10441  [pdf, ps, other]

    cs.CV

    InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis

    Authors: Tao Han, Wanghan Xu, Junchao Gong, Xiaoyu Yue, Song Guo, Luping Zhou, Lei Bai

    Abstract: Arbitrary-resolution image generation provides a consistent visual experience across devices and has extensive applications for producers and consumers. Current diffusion models increase computational demand quadratically with resolution, causing 4K image generation delays of over 100 seconds. To solve this, we explore the second generation upon the latent diffusion models, where the fixed latent gen…

    Submitted 12 September, 2025; originally announced September 2025.

    Comments: Accepted by ICCV 2025

  44. arXiv:2509.08736  [pdf, ps, other]

    cs.LG

    ChemBOMAS: Accelerated BO in Chemistry with LLM-Enhanced Multi-Agent System

    Authors: Dong Han, Zhehong Ai, Pengxiang Cai, Shanya Lu, Jianpeng Chen, Zihao Ye, Shuzhou Sun, Ben Gao, Lingli Ge, Weida Wang, Xiangxin Zhou, Xihui Liu, Mao Su, Wanli Ouyang, Lei Bai, Dongzhan Zhou, Tao Xu, Yuqiang Li, Shufei Zhang

    Abstract: Bayesian optimization (BO) is a powerful tool for scientific discovery in chemistry, yet its efficiency is often hampered by the sparse experimental data and vast search space. Here, we introduce ChemBOMAS: a large language model (LLM)-enhanced multi-agent system that accelerates BO through synergistic data- and knowledge-driven strategies. Firstly, the data-driven strategy involves an 8B-scale LL…

    Submitted 10 November, 2025; v1 submitted 10 September, 2025; originally announced September 2025.

  45. arXiv:2509.07894  [pdf, ps, other]

    cs.AI

    HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark?

    Authors: Fangchen Yu, Haiyuan Wan, Qianjia Cheng, Yuchen Zhang, Jiacheng Chen, Fujun Han, Yulun Wu, Junchi Yao, Ruilizhen Hu, Ning Ding, Yu Cheng, Tao Chen, Lei Bai, Dongzhan Zhou, Yun Luo, Ganqu Cui, Peng Ye

    Abstract: Recently, the physical capabilities of (M)LLMs have garnered increasing attention. However, existing benchmarks for physics suffer from two major gaps: they neither provide systematic and up-to-date coverage of real-world physics competitions such as physics Olympiads, nor enable direct performance comparison with humans. To bridge these gaps, we present HiPhO, the first benchmark dedicated to hig…

    Submitted 19 September, 2025; v1 submitted 9 September, 2025; originally announced September 2025.

  46. arXiv:2509.04394  [pdf, ps, other]

    cs.LG cs.CV

    Transition Models: Rethinking the Generative Learning Objective

    Authors: Zidong Wang, Yiyuan Zhang, Xiaoyu Yue, Xiangyu Yue, Yangguang Li, Wanli Ouyang, Lei Bai

    Abstract: A fundamental dilemma in generative modeling persists: iterative diffusion models achieve outstanding fidelity, but at a significant computational cost, while efficient few-step alternatives are constrained by a hard quality ceiling. This conflict between generation steps and output quality arises from restrictive training objectives that focus exclusively on either infinitesimal dynamics (PF-ODEs…

    Submitted 4 September, 2025; originally announced September 2025.

    Comments: The code is released at https://github.com/WZDTHU/TiM

  47. arXiv:2509.03044   

    cs.CV

    DCDB: Dynamic Conditional Dual Diffusion Bridge for Ill-posed Multi-Tasks

    Authors: Chengjie Huang, Jiafeng Yan, Jing Li, Lu Bai

    Abstract: Conditional diffusion models have made impressive progress in the field of image processing, but the characteristics of constructing data distribution pathways make it difficult to exploit the intrinsic correlation between tasks in multi-task scenarios, which is even worse in ill-posed tasks with a lack of training data. In addition, traditional static condition control makes it difficult for netw…

    Submitted 8 November, 2025; v1 submitted 3 September, 2025; originally announced September 2025.

    Comments: The article contains factual errors

  48. arXiv:2509.02547  [pdf, ps, other]

    cs.AI cs.CL

    The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

    Authors: Guibin Zhang, Hejia Geng, Xiaohang Yu, Zhenfei Yin, Zaibin Zhang, Zelin Tan, Heng Zhou, Zhongzhi Li, Xiangyuan Xue, Yijiang Li, Yifan Zhou, Yang Chen, Chen Zhang, Yutao Fan, Zihu Wang, Songtao Huang, Francisco Piedrahita-Velez, Yue Liao, Hongru Wang, Mengyue Yang, Heng Ji, Jun Wang, Shuicheng Yan, Philip Torr, Lei Bai

    Abstract: The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL), reframing LLMs from passive sequence generators into autonomous, decision-making agents embedded in complex, dynamic worlds. This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov Decision Proc…

    Submitted 8 November, 2025; v1 submitted 2 September, 2025; originally announced September 2025.

  49. arXiv:2508.21148  [pdf, ps, other]

    cs.CL cs.AI

    A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

    Authors: Ming Hu, Chenglong Ma, Wei Li, Wanghan Xu, Jiamin Wu, Jucheng Hu, Tianbin Li, Guohang Zhuang, Jiaqi Liu, Yingzhou Lu, Ying Chen, Chaoyang Zhang, Cheng Tan, Jie Ying, Guocheng Wu, Shujian Gao, Pengcheng Chen, Jiashi Lin, Haitao Wu, Lulu Chen, Fengxiang Wang, Yuanyuan Zhang, Xiangyu Zhao, Feilong Tang, Encheng Su , et al. (95 additional authors not shown)

    Abstract: Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research, yet their progress is shaped by the complex nature of scientific data. This survey presents a comprehensive, data-centric synthesis that reframes the development of Sci-LLMs as a co-evolution between models and their underlying data substrate. We formulate a un…

    Submitted 18 October, 2025; v1 submitted 28 August, 2025; originally announced August 2025.

  50. arXiv:2508.18124  [pdf, ps, other]

    cs.LG cs.AI

    CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics

    Authors: Weida Wang, Dongchen Huang, Jiatong Li, Tengchao Yang, Ziyang Zheng, Di Zhang, Dong Han, Benteng Chen, Binzhao Luo, Zhiyu Liu, Kunling Liu, Zhiyuan Gao, Shiqi Geng, Wei Ma, Jiaming Su, Xin Li, Shuchen Pu, Yuhan Shui, Qianjia Cheng, Zhihao Dou, Dongfei Cui, Changyong He, Jin Zeng, Zeke Xie, Mao Su , et al. (10 additional authors not shown)

    Abstract: We introduce CMPhysBench, a novel benchmark designed to assess the proficiency of Large Language Models (LLMs) in Condensed Matter Physics. CMPhysBench is composed of more than 520 meticulously curated graduate-level questions covering both representative subfields and foundational theoretical frameworks of condensed matter physics, such as magnetism, superconductivity, strongly correlated sys…

    Submitted 29 August, 2025; v1 submitted 25 August, 2025; originally announced August 2025.

    Comments: 29 pages, 7 figures