
Showing 1–50 of 6,269 results for author: Wang, S

Searching in archive cs.
  1. arXiv:2511.15605  [pdf, ps, other]

    cs.RO cs.CL cs.CV

    SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models

    Authors: Senyu Fei, Siyin Wang, Li Ji, Ao Li, Shiduo Zhang, Liming Liu, Jinlong Hou, Jingjing Gong, Xianzhong Zhao, Xipeng Qiu

    Abstract: Vision-Language-Action (VLA) models excel in robotic manipulation but are constrained by their heavy reliance on expert demonstrations, leading to demonstration bias and limiting performance. Reinforcement learning (RL) is a vital post-training strategy to overcome these limits, yet current VLA-RL methods, including group-based optimization approaches, are crippled by severe reward sparsity. Relyi… ▽ More

    Submitted 19 November, 2025; originally announced November 2025.

  2. arXiv:2511.15580  [pdf, ps, other]

    cs.CV cs.AI

    CompTrack: Information Bottleneck-Guided Low-Rank Dynamic Token Compression for Point Cloud Tracking

    Authors: Sifan Zhou, Yichao Cao, Jiahao Nie, Yuqian Fu, Ziyu Zhao, Xiaobo Lu, Shuo Wang

    Abstract: 3D single object tracking (SOT) in LiDAR point clouds is a critical task in computer vision and autonomous driving. Despite great success having been achieved, the inherent sparsity of point clouds introduces a dual-redundancy challenge that limits existing trackers: (1) vast spatial redundancy from background noise impairs accuracy, and (2) informational redundancy within the foreground hinders e… ▽ More

    Submitted 19 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI 2026 (Oral)

  3. arXiv:2511.15203  [pdf, ps, other]

    cs.CR cs.AI

    Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks

    Authors: Zimo Ji, Xunguang Wang, Zongjie Li, Pingchuan Ma, Yudong Gao, Daoyuan Wu, Xincheng Yan, Tian Tian, Shuai Wang

    Abstract: Large Language Model (LLM)-based agents with function-calling capabilities are increasingly deployed, but remain vulnerable to Indirect Prompt Injection (IPI) attacks that hijack their tool calls. In response, numerous IPI-centric defense frameworks have emerged. However, these defenses are fragmented, lacking a unified taxonomy and comprehensive evaluation. In this Systematization of Knowledge (S… ▽ More

    Submitted 19 November, 2025; originally announced November 2025.

  4. CoroAMU: Unleashing Memory-Driven Coroutines through Latency-Aware Decoupled Operations

    Authors: Zhuolun Jiang, Songyue Wang, Xiaokun Pei, Tianyue Lu, Mingyu Chen

    Abstract: Modern data-intensive applications face memory latency challenges exacerbated by disaggregated memory systems. Recent work shows that coroutines are promising in effectively interleaving tasks and hiding memory latency, but they struggle to balance latency-hiding efficiency with runtime overhead. We present CoroAMU, a hardware-software co-designed system for memory-centric coroutines. It introduce… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

    Journal ref: Proceedings of the 2025 International Conference on Parallel Architecture and Compilation (PACT). USA: IEEE Computer Society, 2025, p. 431-444

  5. arXiv:2511.14510  [pdf, ps, other]

    cs.LG

    CLO: Efficient LLM Inference System with CPU-Light KVCache Offloading via Algorithm-System Co-Design

    Authors: Jiawei Yi, Ping Gong, Youhui Bai, Jiaqi Ruan, Shengnan Wang, Pengcheng Wang, Haibo Wang, Weiguang Wang, Xia Zhu, Feng Wu, Cheng Li

    Abstract: The growth of million-token LLMs exposes the scalability limits of inference systems, where the KVCache dominates memory usage and data transfer overhead. Recent offloading systems migrate the KVCache to CPU memory and incorporate top-k attention to reduce the volume of data transferred from the CPU, while further applying system-level optimizations such as on-GPU caching and prefetching to lower… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.
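
    The top-k attention idea referenced in this abstract (fetching only the most relevant cached keys/values so that less data crosses the CPU-GPU link) can be pictured with a minimal sketch. This is a generic illustration, not CLO's actual system; the single-query shapes, scaled dot-product scoring, and torch.topk-based selection are all assumptions.

```python
import torch

def topk_attention(q, k_cache, v_cache, k=64):
    """Generic top-k sparse attention over a (conceptually CPU-resident) KV cache:
    only the k highest-scoring entries are gathered for the softmax, so only a
    small slice of the cache has to cross the CPU-GPU link."""
    scores = k_cache @ q / q.shape[-1] ** 0.5      # (T,) scaled dot-product scores
    k = min(k, scores.shape[0])
    top_scores, idx = torch.topk(scores, k)        # indices of the most relevant tokens
    attn = torch.softmax(top_scores, dim=-1)       # normalize over the selected subset only
    return attn @ v_cache[idx]                     # (d,) attention output

q = torch.randn(128)                               # query for the current decoding step
k_cache = torch.randn(4096, 128)                   # cached keys for 4096 past tokens
v_cache = torch.randn(4096, 128)
out = topk_attention(q, k_cache, v_cache, k=32)
print(out.shape)                                   # torch.Size([128])
```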

  6. arXiv:2511.14414  [pdf, ps, other]

    cs.HC

    PACEE: Supporting Children's Personal Emotion Education through Parent-AI Collaboration

    Authors: Yu Mei, Xutong Wang, Ziyao Zhang, Yiming Fu, Shiyi Wang, Qingyang Wan, Qinghuan Lan, Chang Liu, Jie Cai, Chun Yu, Yuanchun Shi

    Abstract: Emotion education is a crucial lesson for children aged 3 to 6. However, existing technologies primarily focus on promoting emotion education from the child's perspective, often neglecting the central role of parents in guiding early childhood emotion development. In this work, we conducted co-design sessions with five experienced kindergarten teachers and five parents to identify parental challen… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

  7. arXiv:2511.14348  [pdf, ps, other]

    cs.LG physics.comp-ph

    Enforcing hidden physics in physics-informed neural networks

    Authors: Nanxi Chen, Sifan Wang, Rujin Ma, Airong Chen, Chuanjie Cui

    Abstract: Physics-informed neural networks (PINNs) represent a new paradigm for solving partial differential equations (PDEs) by integrating physical laws into the learning process of neural networks. However, despite their foundational role, the hidden irreversibility implied by the Second Law of Thermodynamics is often neglected during training, leading to unphysical solutions or even training failures in… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.
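
    For readers unfamiliar with the PINN setup this abstract builds on, the core mechanism is adding a PDE-residual term to the training loss via automatic differentiation. The sketch below uses the 1-D heat equation u_t = u_xx and a small MLP purely as an illustrative example; it does not reflect the irreversibility constraints this paper actually proposes.

```python
import torch

# Small MLP taking (x, t) and returning u(x, t); the architecture is arbitrary.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)

def heat_residual(xt):
    """PDE residual u_t - u_xx of the 1-D heat equation at collocation points xt = (x, t)."""
    xt = xt.clone().requires_grad_(True)
    u = net(xt)
    du = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]   # (N, 2): columns are u_x, u_t
    u_x, u_t = du[:, 0:1], du[:, 1:2]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][:, 0:1]
    return u_t - u_xx

xt = torch.rand(256, 2)                         # random collocation points in [0, 1]^2
loss_physics = heat_residual(xt).pow(2).mean()  # physics term; data/boundary losses omitted
loss_physics.backward()                         # gradients flow into the network weights
```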

  8. arXiv:2511.14310  [pdf, ps, other]

    cs.CV

    Iterative Diffusion-Refined Neural Attenuation Fields for Multi-Source Stationary CT Reconstruction: NAF Meets Diffusion Model

    Authors: Jiancheng Fang, Shaoyu Wang, Junlin Wang, Weiwen Wu, Yikun Zhang, Qiegen Liu

    Abstract: Multi-source stationary computed tomography (CT) has recently attracted attention for its ability to achieve rapid image reconstruction, making it suitable for time-sensitive clinical and industrial applications. However, practical systems are often constrained by ultra-sparse-view sampling, which significantly degrades reconstruction quality. Traditional methods struggle under ultra-sparse-view s… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

  9. arXiv:2511.14256  [pdf, ps, other]

    cs.AI cs.IR

    PathMind: A Retrieve-Prioritize-Reason Framework for Knowledge Graph Reasoning with Large Language Models

    Authors: Yu Liu, Xixun Lin, Yanmin Shang, Yangxi Li, Shi Wang, Yanan Cao

    Abstract: Knowledge graph reasoning (KGR) is the task of inferring new knowledge by performing logical deductions on knowledge graphs. Recently, large language models (LLMs) have demonstrated remarkable performance in complex reasoning tasks. Despite promising success, current LLM-based KGR methods still face two critical limitations. First, existing methods often extract reasoning paths indiscriminately, w… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

    Comments: AAAI 2026, Long Paper, Oral

  10. arXiv:2511.14227  [pdf, ps, other]

    cs.AI cs.LG

    DevPiolt: Operation Recommendation for IoT Devices at Xiaomi Home

    Authors: Yuxiang Wang, Siwen Wang, Haowei Han, Ao Wang, Boya Liu, Yong Zhao, Chengbo Wu, Bin Zhu, Bin Qin, Xiaokai Zhou, Xiao Yan, Jiawei Jiang, Bo Du

    Abstract: Operation recommendation for IoT devices refers to generating personalized device operations for users based on their context, such as historical operations, environment information, and device status. This task is crucial for enhancing user satisfaction and corporate profits. Existing recommendation models struggle with complex operation logic, diverse user preferences, and sensitivity to suboptima… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

  11. arXiv:2511.14157  [pdf, ps, other]

    cs.CV

    Learning Representation and Synergy Invariances: A Provable Framework for Generalized Multimodal Face Anti-Spoofing

    Authors: Xun Lin, Shuai Wang, Yi Yu, Zitong Yu, Jiale Zhou, Yizhong Liu, Xiaochun Cao, Alex Kot, Yefeng Zheng

    Abstract: Multimodal Face Anti-Spoofing (FAS) methods, which integrate multiple visual modalities, often suffer even more severe performance degradation than unimodal FAS when deployed in unseen domains. This is mainly due to two overlooked risks that affect cross-domain multimodal generalization. The first is the modal representation invariant risk, i.e., whether representations remain generalizable under… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

  12. arXiv:2511.14131  [pdf, ps, other]

    cs.AI

    Run, Ruminate, and Regulate: A Dual-process Thinking System for Vision-and-Language Navigation

    Authors: Yu Zhong, Zihao Zhang, Rui Zhang, Lingdong Huang, Haihan Gao, Shuo Wang, Da Li, Ruijian Han, Jiaming Guo, Shaohui Peng, Di Huang, Yunji Chen

    Abstract: Vision-and-Language Navigation (VLN) requires an agent to dynamically explore complex 3D environments following human instructions. Recent research underscores the potential of harnessing large language models (LLMs) for VLN, given their commonsense knowledge and general reasoning capabilities. Despite their strengths, a substantial gap in task completion performance persists between LLM-based app… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

  13. arXiv:2511.13998  [pdf, ps, other]

    cs.SE cs.AI

    LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering

    Authors: Jielin Qiu, Zuxin Liu, Zhiwei Liu, Rithesh Murthy, Jianguo Zhang, Haolin Chen, Shiyu Wang, Ming Zhu, Liangwei Yang, Juntao Tan, Roshan Ram, Akshara Prabhakar, Tulika Awalgaonkar, Zixiang Chen, Zhepeng Cen, Cheng Qian, Shelby Heinecke, Weiran Yao, Silvio Savarese, Caiming Xiong, Huan Wang

    Abstract: As large language models (LLMs) evolve into sophisticated autonomous agents capable of complex software development tasks, evaluating their real-world capabilities becomes critical. While existing benchmarks like LoCoBench~\cite{qiu2025locobench} assess long-context code understanding, they focus on single-turn evaluation and cannot capture the multi-turn interactive nature, tool usage patterns, a… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: 54 pages

  14. arXiv:2511.13757  [pdf, ps, other]

    cs.LG cs.AI

    VitalBench: A Rigorous Multi-Center Benchmark for Long-Term Vital Sign Prediction in Intraoperative Care

    Authors: Xiuding Cai, Xueyao Wang, Sen Wang, Yaoyao Zhu, Jiao Chen, Yu Yao

    Abstract: Intraoperative monitoring and prediction of vital signs are critical for ensuring patient safety and improving surgical outcomes. Despite recent advances in deep learning models for medical time-series forecasting, several challenges persist, including the lack of standardized benchmarks, incomplete data, and limited cross-center validation. To address these challenges, we introduce VitalBench, a… ▽ More

    Submitted 14 November, 2025; originally announced November 2025.

    Comments: Accepted by IEEE Sensors Journal

  15. arXiv:2511.13326  [pdf, ps, other]

    stat.AP cs.AI

    TacEleven: generative tactic discovery for football open play

    Authors: Siyao Zhao, Hao Ma, Zhiqiang Pu, Jingjing Huang, Yi Pan, Shijie Wang, Zhi Ming

    Abstract: Creating offensive advantages during open play is fundamental to football success. However, due to the highly dynamic and long-sequence nature of open play, the potential tactic space grows exponentially as the sequence progresses, making automated tactic discovery extremely challenging. To address this, we propose TacEleven, a generative framework for football open-play tactic discovery developed… ▽ More

    Submitted 18 November, 2025; v1 submitted 17 November, 2025; originally announced November 2025.

  16. arXiv:2511.13246  [pdf, ps, other]

    cs.CR cs.NI

    A Secure Semantic Communication System Based on Knowledge Graph

    Authors: Qin Guo, Haonan Tong, Sihua Wang, Peiyuan Si, Jun Zhao, Changchuan Yin

    Abstract: This study proposes a novel approach to ensure the security of textual data transmission in a semantic communication system. In the proposed system, a sender transmits textual information to a receiver, while a potential eavesdropper attempts to intercept the information. At the sender side, the text is initially preprocessed, where each sentence is annotated with its corresponding topic, and subs… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: accepted by IEEE Journal of Communications and Networks (JCN)

  17. arXiv:2511.13133  [pdf, ps, other]

    cs.LG cs.AI

    Soft Conflict-Resolution Decision Transformer for Offline Multi-Task Reinforcement Learning

    Authors: Shudong Wang, Xinfei Wang, Chenhao Zhang, Shanchen Pang, Haiyuan Gui, Wenhao Ji, Xiaojian Liao

    Abstract: Multi-task reinforcement learning (MTRL) seeks to learn a unified policy for diverse tasks, but often suffers from gradient conflicts across tasks. Existing masking-based methods attempt to mitigate such conflicts by assigning task-specific parameter masks. However, our empirical study shows that coarse-grained binary masks have the problem of over-suppressing key conflicting parameters, hindering… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.
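
    The "gradient conflict" this abstract targets is commonly defined as a negative inner product between per-task gradients. The sketch below shows that check together with a PCGrad-style projection as one well-known, purely illustrative remedy; it is not the soft conflict-resolution masking the paper itself introduces.

```python
import torch

def resolve_conflict(g_i, g_j):
    """If per-task gradients g_i and g_j conflict (negative dot product),
    project g_i onto the plane orthogonal to g_j (PCGrad-style illustration)."""
    dot = torch.dot(g_i, g_j)
    if dot < 0:                                  # conflict: tasks pull in opposing directions
        g_i = g_i - dot / g_j.norm() ** 2 * g_j  # remove the conflicting component
    return g_i

g_task_a = torch.tensor([1.0, 0.5])
g_task_b = torch.tensor([-1.0, 0.8])
print(resolve_conflict(g_task_a, g_task_b))      # g_task_a with the conflict projected out
```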

  18. arXiv:2511.13118  [pdf, ps, other]

    cs.CL cs.AI

    Extracting Events Like Code: A Multi-Agent Programming Framework for Zero-Shot Event Extraction

    Authors: Quanjiang Guo, Sijie Wang, Jinchuan Zhang, Ben Zhang, Zhao Kang, Ling Tian, Ke Yan

    Abstract: Zero-shot event extraction (ZSEE) remains a significant challenge for large language models (LLMs) due to the need for complex reasoning and domain-specific understanding. Direct prompting often yields incomplete or structurally invalid outputs--such as misclassified triggers, missing arguments, and schema violations. To address these limitations, we present Agent-Event-Coder (AEC), a novel multi-… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: 11 pages, 5 figures, accepted by AAAI 2026 (Oral)

  19. arXiv:2511.12988  [pdf, ps, other]

    cs.CV cs.AI

    UNSEEN: Enhancing Dataset Pruning from a Generalization Perspective

    Authors: Furui Xu, Shaobo Wang, Jiajun Zhang, Chenghao Sun, Haixiang Tang, Linfeng Zhang

    Abstract: The growing scale of datasets in deep learning has introduced significant computational challenges. Dataset pruning addresses this challenge by constructing a compact but informative coreset from the full dataset with comparable performance. Previous approaches typically establish scoring metrics based on specific criteria to identify representative samples. However, these methods predominantly re… ▽ More

    Submitted 17 November, 2025; v1 submitted 17 November, 2025; originally announced November 2025.

    Comments: AAAI 2026, 13 pages, 9 figures, 5 tables
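
    The score-and-select pipeline that this abstract describes as the prior approach can be summarized in a few lines: compute a scalar score per sample, then keep a fixed fraction. The use of a proxy-model loss as the score and the 30% keep ratio below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def prune_by_score(scores, keep_ratio=0.3, keep="hardest"):
    """Classic score-based coreset selection: rank samples by a scalar
    difficulty score and retain a fixed fraction of them."""
    n_keep = int(len(scores) * keep_ratio)
    order = np.argsort(scores)                    # ascending: easy -> hard
    idx = order[-n_keep:] if keep == "hardest" else order[:n_keep]
    return np.sort(idx)

scores = np.random.rand(10_000)                   # stand-in for per-sample proxy losses
coreset = prune_by_score(scores, keep_ratio=0.3)
print(coreset.size)                               # 3000 retained sample indices
```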

  20. arXiv:2511.12941  [pdf, ps, other]

    cs.RO

    GUIDE: Gaussian Unified Instance Detection for Enhanced Obstacle Perception in Autonomous Driving

    Authors: Chunyong Hu, Qi Luo, Jianyun Xu, Song Wang, Qiang Li, Sheng Yang

    Abstract: In the realm of autonomous driving, accurately detecting surrounding obstacles is crucial for effective decision-making. Traditional methods primarily rely on 3D bounding boxes to represent these obstacles, which often fail to capture the complexity of irregularly shaped, real-world objects. To overcome these limitations, we present GUIDE, a novel framework that utilizes 3D Gaussians for instance… ▽ More

    Submitted 16 November, 2025; originally announced November 2025.

  21. arXiv:2511.12639  [pdf, ps, other]

    cs.CV

    Medical Knowledge Intervention Prompt Tuning for Medical Image Classification

    Authors: Ye Du, Nanxi Yu, Shujun Wang

    Abstract: Vision-language foundation models (VLMs) have shown great potential in feature transfer and generalization across a wide spectrum of medical-related downstream tasks. However, fine-tuning these models is resource-intensive due to their large number of parameters. Prompt tuning has emerged as a viable solution to mitigate memory usage and reduce training time while maintaining competitive performan… ▽ More

    Submitted 16 November, 2025; originally announced November 2025.

    Comments: IEEE Transactions on Medical Imaging (Early Access) July 2025

  22. arXiv:2511.12502  [pdf, ps, other]

    cs.LG cs.CV

    BSO: Binary Spiking Online Optimization Algorithm

    Authors: Yu Liang, Yu Yang, Wenjie Wei, Ammar Belatreche, Shuai Wang, Malu Zhang, Yang Yang

    Abstract: Binary Spiking Neural Networks (BSNNs) offer promising efficiency advantages for resource-constrained computing. However, their training algorithms often require substantial memory overhead due to latent weights storage and temporal processing requirements. To address this issue, we propose Binary Spiking Online (BSO) optimization algorithm, a novel online training algorithm that significantly red… ▽ More

    Submitted 16 November, 2025; originally announced November 2025.

  23. arXiv:2511.12188  [pdf, ps, other]

    cs.LG

    Scaling Law Analysis in Federated Learning: How to Select the Optimal Model Size?

    Authors: Xuanyu Chen, Nan Yang, Shuai Wang, Dong Yuan

    Abstract: The recent success of large language models (LLMs) has sparked a growing interest in training large-scale models. As the model size continues to scale, concerns are growing about the depletion of high-quality, well-curated training data. This has led practitioners to explore training approaches like Federated Learning (FL), which can leverage the abundant data on edge devices while maintaining pri… ▽ More

    Submitted 15 November, 2025; originally announced November 2025.

    Comments: The extended version of the paper "Scaling Law Analysis in Federated Learning: How to Select the Optimal Model Size?". Accepted by AAAI2026
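
    For orientation, the canonical parametric scaling law studied in centralized training is shown below; whether this paper adopts exactly this form in the federated setting is not stated in the truncated abstract.

```latex
% Chinchilla-style scaling law: expected loss as a function of model size N
% and number of training tokens D (E, A, B, alpha, beta are fitted constants).
\[
  L(N, D) \;=\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}} .
\]
% Minimizing L under a fixed compute budget C \approx 6 N D gives the
% compute-optimal model size N^{*}(C), which mirrors the "optimal model size"
% question posed in the title.
```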

  24. arXiv:2511.12180  [pdf, ps, other]

    cs.LG stat.ML

    Understanding InfoNCE: Transition Probability Matrix Induced Feature Clustering

    Authors: Ge Cheng, Shuo Wang, Yun Zhang

    Abstract: Contrastive learning has emerged as a cornerstone of unsupervised representation learning across vision, language, and graph domains, with InfoNCE as its dominant objective. Despite its empirical success, the theoretical underpinnings of InfoNCE remain limited. In this work, we introduce an explicit feature space to model augmented views of samples and a transition probability matrix to capture da… ▽ More

    Submitted 15 November, 2025; originally announced November 2025.

    Comments: 31 pages, 8 figures
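
    The InfoNCE objective analyzed here has a standard form, reproduced below for reference (anchor z_i, positive z_i^+, N in-batch candidates, temperature τ); the transition-probability-matrix analysis is the paper's own contribution and is not sketched.

```latex
% Standard InfoNCE loss for an anchor z_i with positive z_i^+ among N candidates,
% temperature \tau, and sim(.,.) typically cosine similarity:
\[
  \mathcal{L}_{\mathrm{InfoNCE}}
  = -\log \frac{\exp\bigl(\mathrm{sim}(z_i, z_i^{+})/\tau\bigr)}
               {\sum_{j=1}^{N} \exp\bigl(\mathrm{sim}(z_i, z_j)/\tau\bigr)} .
\]
```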

  25. arXiv:2511.12169  [pdf, ps, other]

    cs.AI

    Incremental Maintenance of DatalogMTL Materialisations

    Authors: Kaiyue Zhao, Dingqi Chen, Shaoyu Wang, Pan Hu

    Abstract: DatalogMTL extends the classical Datalog language with metric temporal logic (MTL), enabling expressive reasoning over temporal data. While existing reasoning approaches, such as materialisation based and automata based methods, offer soundness and completeness, they lack support for handling efficient dynamic updates, a crucial requirement for real-world applications that involve frequent data up… ▽ More

    Submitted 19 November, 2025; v1 submitted 15 November, 2025; originally announced November 2025.

    Comments: Accepted as oral paper at the main track of AAAI 2026

  26. arXiv:2511.12114  [pdf, ps, other]

    cs.IR

    Continuous-time Discrete-space Diffusion Model for Recommendation

    Authors: Chengyi Liu, Xiao Chen, Shijie Wang, Wenqi Fan, Qing Li

    Abstract: In the era of information explosion, Recommender Systems (RS) are essential for alleviating information overload and providing personalized user experiences. Recent advances in diffusion-based generative recommenders have shown promise in capturing the dynamic nature of user preferences. These approaches explore a broader range of user interests by progressively perturbing the distribution of user… ▽ More

    Submitted 15 November, 2025; originally announced November 2025.

    Comments: Accepted by WSDM 2026

  27. arXiv:2511.12056  [pdf, ps, other]

    cs.CV cs.AI cs.DC

    PipeDiT: Accelerating Diffusion Transformers in Video Generation with Task Pipelining and Model Decoupling

    Authors: Sijie Wang, Qiang Wang, Shaohuai Shi

    Abstract: Video generation has been advancing rapidly, and diffusion transformer (DiT) based models have demonstrated remarkable capabilities. However, their practical deployment is often hindered by slow inference speeds and high memory consumption. In this paper, we propose a novel pipelining framework named PipeDiT to accelerate video generation, which is equipped with three main innovations. Fir… ▽ More

    Submitted 15 November, 2025; originally announced November 2025.

  28. arXiv:2511.12046  [pdf, ps, other]

    cs.CR cs.AI cs.CV cs.LG

    BackWeak: Backdooring Knowledge Distillation Simply with Weak Triggers and Fine-tuning

    Authors: Shanmin Wang, Dongdong Zhao

    Abstract: Knowledge Distillation (KD) is essential for compressing large models, yet relying on pre-trained "teacher" models downloaded from third-party repositories introduces serious security risks -- most notably backdoor attacks. Existing KD backdoor methods are typically complex and computationally intensive: they employ surrogate student models and simulated distillation to guarantee transferability,… ▽ More

    Submitted 15 November, 2025; originally announced November 2025.

  29. arXiv:2511.11990  [pdf, ps, other]

    cs.AI

    Improving Autoformalization Using Direct Dependency Retrieval

    Authors: Shaoqi Wang, Lu Yu, Chunjie Yang

    Abstract: The convergence of deep learning and formal mathematics has spurred research in formal verification. Statement autoformalization, a crucial first step in this process, aims to translate informal descriptions into machine-verifiable representations but remains a significant challenge. The core difficulty lies in the fact that existing methods often suffer from a lack of contextual awareness, leadin… ▽ More

    Submitted 14 November, 2025; originally announced November 2025.

  30. arXiv:2511.11793  [pdf, ps, other]

    cs.CL

    MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

    Authors: MiroMind Team, Song Bai, Lidong Bing, Carson Chen, Guanzheng Chen, Yuntao Chen, Zhe Chen, Ziyi Chen, Jifeng Dai, Xuan Dong, Wenhan Dou, Yue Deng, Yunjie Fu, Junqi Ge, Chenxia Han, Tammy Huang, Zhenhang Huang, Jerry Jiao, Shilei Jiang, Tianyu Jiao, Xiaoqi Jian, Lei Lei, Ruilin Li, Ryan Luo, Tiantong Li , et al. (30 additional authors not shown)

    Abstract: We present MiroThinker v1.0, an open-source research agent designed to advance tool-augmented reasoning and information-seeking capabilities. Unlike previous agents that only scale up model size or context length, MiroThinker explores interaction scaling at the model level, systematically training the model to handle deeper and more frequent agent-environment interactions as a third dimension of p… ▽ More

    Submitted 18 November, 2025; v1 submitted 14 November, 2025; originally announced November 2025.

    Comments: Technical Report

  31. arXiv:2511.11770  [pdf, ps, other]

    cs.AI cs.LG

    Learning to Refine: An Agentic RL Approach for Iterative SPARQL Query Construction

    Authors: Floris Vossebeld, Shenghui Wang

    Abstract: Generating complex, logically-sound SPARQL queries for multi-hop questions remains a critical bottleneck for Knowledge Graph Question Answering, as the brittle nature of one-shot generation by Large Language Models (LLMs) hinders reliable interaction with structured data. Current methods lack the adaptive policies needed to dynamically debug queries based on real-time execution feedback. This pape… ▽ More

    Submitted 14 November, 2025; originally announced November 2025.

    MSC Class: 68P20; 68T42 ACM Class: H.3.3; I.2.4

  32. arXiv:2511.11672  [pdf, ps, other]

    cs.DC

    OSGym: Super-Scalable Distributed Data Engine for Generalizable Computer Agents

    Authors: Zengyi Qin, Jinyuan Chen, Yunze Man, Shengcao Cao, Ziqi Pang, Zhuoyuan Wang, Xin Sun, Gen Lin, Han Fang, Ling Zhu, Zixin Xie, Zibu Wei, Tianshu Ran, Haoran Geng, Xander Wu, Zachary Bright, Qizhen Sun, Rui Wang, Yuyang Cai, Song Wang, Jiace Zhao, Han Cao, Yeyang Zhou, Tianrui Liu, Ray Pan , et al. (7 additional authors not shown)

    Abstract: We introduce OSGym, a super-scalable distributed data engine for training agents across diverse computer-related tasks. OSGym efficiently scales to over a thousand operating system (OS) replicas at an academia-affordable cost, serving as dynamic runtime environments for intelligent agents. It offers three key advantages. (1) Scalability: Despite the intensive resource requirements of running multi… ▽ More

    Submitted 11 November, 2025; originally announced November 2025.

  33. arXiv:2511.11356  [pdf, ps, other]

    cs.CR

    SEAL: Subspace-Anchored Watermarks for LLM Ownership

    Authors: Yanbo Dai, Zongjie Li, Zhenlan Ji, Shuai Wang

    Abstract: Large language models (LLMs) have achieved remarkable success across a wide range of natural language processing tasks, demonstrating human-level performance in text generation, reasoning, and question answering. However, training such models requires substantial computational resources, large curated datasets, and sophisticated alignment procedures. As a result, they constitute highly valuable in… ▽ More

    Submitted 14 November, 2025; originally announced November 2025.

  34. arXiv:2511.11257  [pdf]

    cs.AI cs.CE cs.LG

    AIonopedia: an LLM agent orchestrating multimodal learning for ionic liquid discovery

    Authors: Yuqi Yin, Yibo Fu, Siyuan Wang, Peng Sun, Hongyu Wang, Xiaohui Wang, Lei Zheng, Zhiyong Li, Zhirong Liu, Jianji Wang, Zhaoxi Sun

    Abstract: The discovery of novel Ionic Liquids (ILs) is hindered by critical challenges in property prediction, including limited data, poor model accuracy, and fragmented workflows. Leveraging the power of Large Language Models (LLMs), we introduce AIonopedia, to the best of our knowledge, the first LLM agent for IL discovery. Powered by an LLM-augmented multimodal domain foundation model for ILs, AIonoped… ▽ More

    Submitted 14 November, 2025; originally announced November 2025.

  35. arXiv:2511.11031  [pdf, ps, other]

    cs.CV cs.MM

    Accelerating Controllable Generation via Hybrid-grained Cache

    Authors: Lin Liu, Huixia Ben, Shuo Wang, Jinda Lu, Junxiang Qiu, Shengeng Tang, Yanbin Hao

    Abstract: Controllable generative models have been widely used to improve the realism of synthetic visual content. However, such models must handle control conditions and content generation computational requirements, resulting in generally low generation efficiency. To address this issue, we propose a Hybrid-Grained Cache (HGC) approach that reduces computational overhead by adopting cache strategies with… ▽ More

    Submitted 14 November, 2025; originally announced November 2025.

  36. arXiv:2511.10851  [pdf, ps, other]

    cs.DS cs.CC math.NT

    A number-theoretic conjecture implying faster algorithms for polynomial factorization and integer factorization

    Authors: Chris Umans, Siki Wang

    Abstract: The fastest known algorithm for factoring a degree $n$ univariate polynomial over a finite field $\mathbb{F}_q$ runs in time $O(n^{3/2 + o(1)}\text{polylog } q)$, and there is a reason to believe that the $3/2$ exponent represents a ''barrier'' inherent in algorithms that employ a so-called baby-steps-giant-steps strategy. In this paper, we propose a new strategy with the potential to overcome the… ▽ More

    Submitted 13 November, 2025; originally announced November 2025.
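
    The baby-steps-giant-steps trade-off the abstract refers to is easiest to see in its classical discrete-logarithm form, sketched below; the polynomial-factorization variant uses the same square-root balancing but is not reproduced here.

```latex
% Classical baby-steps-giant-steps for the discrete logarithm g^x = h in a
% group of order n: write x = i m + j with m = \lceil \sqrt{n} \rceil, so that
\[
  h \, (g^{-m})^{i} = g^{j}, \qquad 0 \le i, j < m .
\]
% Precompute the m "baby steps" g^j in a table, then scan the m "giant steps"
% h (g^{-m})^i until one matches: O(\sqrt{n}) group operations instead of O(n).
% The abstract attributes the 3/2-exponent barrier in polynomial factorization
% to an analogous square-root balancing.
```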

  37. arXiv:2511.10721  [pdf, ps, other]

    cs.CV cs.LG

    Fast Data Attribution for Text-to-Image Models

    Authors: Sheng-Yu Wang, Aaron Hertzmann, Alexei A Efros, Richard Zhang, Jun-Yan Zhu

    Abstract: Data attribution for text-to-image models aims to identify the training images that most significantly influenced a generated output. Existing attribution methods involve considerable computational resources for each query, making them impractical for real-world applications. We propose a novel approach for scalable and efficient data attribution. Our key idea is to distill a slow, unlearning-base… ▽ More

    Submitted 13 November, 2025; originally announced November 2025.

    Comments: NeurIPS 2025 camera ready. Project page: https://peterwang512.github.io/FastGDA

  38. arXiv:2511.10352  [pdf, ps, other]

    cs.CV

    FOUND: Fourier-based von Mises Distribution for Robust Single Domain Generalization in Object Detection

    Authors: Mengzhu Wang, Changyuan Deng, Shanshan Wang, Nan Yin, Long Lan, Liang Yang

    Abstract: Single Domain Generalization (SDG) for object detection aims to train a model on a single source domain that can generalize effectively to unseen target domains. While recent methods like CLIP-based semantic augmentation have shown promise, they often overlook the underlying structure of feature distributions and frequency-domain characteristics that are critical for robustness. In this paper, we… ▽ More

    Submitted 13 November, 2025; originally announced November 2025.

  39. arXiv:2511.10303  [pdf, ps, other]

    cs.CL

    Rectify Evaluation Preference: Improving LLMs' Critique on Math Reasoning via Perplexity-aware Reinforcement Learning

    Authors: Changyuan Tian, Zhicong Lu, Shuang Qian, Nayu Liu, Peiguang Li, Li Jin, Leiyi Hu, Zhizhao Zeng, Sirui Wang, Ke Zeng, Zhi Guo

    Abstract: To improve Multi-step Mathematical Reasoning (MsMR) of Large Language Models (LLMs), it is crucial to obtain scalable supervision from the corpus by automatically critiquing mistakes in the reasoning process of MsMR and rendering a final verdict of the problem-solution. Most existing methods rely on crafting high-quality supervised fine-tuning demonstrations for critiquing capability enhancement a… ▽ More

    Submitted 13 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI2026

  40. arXiv:2511.10222  [pdf, ps, other]

    cs.SD cs.AI

    Speech-Audio Compositional Attacks on Multimodal LLMs and Their Mitigation with SALMONN-Guard

    Authors: Yudong Yang, Xuezhen Zhang, Zhifeng Han, Siyin Wang, Jimin Zhuang, Zengrui Jin, Jing Shao, Guangzhi Sun, Chao Zhang

    Abstract: Recent progress in large language models (LLMs) has enabled understanding of both speech and non-speech audio, but exposes new safety risks emerging from complex audio inputs that are inadequately handled by current safeguards. We introduce SACRED-Bench (Speech-Audio Composition for RED-teaming) to evaluate the robustness of LLMs under complex audio-based attacks. Unlike existing perturbation-bas… ▽ More

    Submitted 14 November, 2025; v1 submitted 13 November, 2025; originally announced November 2025.

  41. arXiv:2511.10038  [pdf, ps, other]

    cs.AI

    Efficient Thought Space Exploration through Strategic Intervention

    Authors: Ziheng Li, Hengyi Cai, Xiaochi Wei, Yuchen Li, Shuaiqiang Wang, Zhi-Hong Deng, Dawei Yin

    Abstract: While large language models (LLMs) demonstrate emerging reasoning capabilities, current inference-time expansion methods incur prohibitive computational costs by exhaustive sampling. Through analyzing decoding trajectories, we observe that most next-token predictions align well with the golden output, except for a few critical tokens that lead to deviations. Inspired by this phenomenon, we propose… ▽ More

    Submitted 13 November, 2025; originally announced November 2025.

    Comments: AAAI 2026

  42. arXiv:2511.09722  [pdf, ps, other]

    stat.ML cs.LG stat.AP

    Masked Mineral Modeling: Continent-Scale Mineral Prospecting via Geospatial Infilling

    Authors: Sujay Nair, Evan Coleman, Sherrie Wang, Elsa Olivetti

    Abstract: Minerals play a critical role in the advanced energy technologies necessary for decarbonization, but characterizing mineral deposits hidden underground remains costly and challenging. Inspired by recent progress in generative modeling, we develop a learning method which infers the locations of minerals by masking and infilling geospatial maps of resource availability. We demonstrate this technique… ▽ More

    Submitted 12 November, 2025; originally announced November 2025.

    Comments: 7 pages, 6 figures, includes 23 pages of Supplementary Materials for paper accepted to AAAI2026

  43. arXiv:2511.09690  [pdf, ps, other]

    cs.CL

    Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

    Authors: Omnilingual ASR team, Gil Keren, Artyom Kozhevnikov, Yen Meng, Christophe Ropers, Matthew Setzler, Skyler Wang, Ife Adebara, Michael Auli, Can Balioglu, Kevin Chan, Chierh Cheng, Joe Chuang, Caley Droof, Mark Duppenthaler, Paul-Ambroise Duquenne, Alexander Erben, Cynthia Gao, Gabriel Mejia Gonzalez, Kehan Lyu, Sagar Miglani, Vineel Pratap, Kaushik Ram Sadagopan, Safiyyah Saleem, Arina Turkatenko , et al. (8 additional authors not shown)

    Abstract: Automatic speech recognition (ASR) has advanced in high-resource languages, but most of the world's 7,000+ languages remain unsupported, leaving thousands of long-tail languages behind. Expanding ASR coverage has been costly and limited by architectures that restrict language support, making extension inaccessible to most--all while entangled with ethical concerns when pursued without community co… ▽ More

    Submitted 12 November, 2025; originally announced November 2025.

  44. arXiv:2511.09602  [pdf, ps, other]

    cs.RO

    ScaleADFG: Affordance-based Dexterous Functional Grasping via Scalable Dataset

    Authors: Sizhe Wang, Yifan Yang, Yongkang Luo, Daheng Li, Wei Wei, Yan Zhang, Peiying Hu, Yunjin Fu, Haonan Duan, Jia Sun, Peng Wang

    Abstract: Dexterous functional tool-use grasping is essential for effective robotic manipulation of tools. However, existing approaches face significant challenges in efficiently constructing large-scale datasets and ensuring generalizability to everyday object scales. These issues primarily arise from size mismatches between robotic and human hands, and the diversity in real-world object scales. To address… ▽ More

    Submitted 12 November, 2025; originally announced November 2025.

    Comments: Accepted by IEEE Robotics and Automation Letters

  45. arXiv:2511.09407  [pdf, ps, other]

    cs.CL

    CARE-Bench: A Benchmark of Diverse Client Simulations Guided by Expert Principles for Evaluating LLMs in Psychological Counseling

    Authors: Bichen Wang, Yixin Sun, Junzhe Wang, Hao Yang, Xing Fu, Yanyan Zhao, Si Wei, Shijin Wang, Bing Qin

    Abstract: The mismatch between the growing demand for psychological counseling and the limited availability of services has motivated research into the application of Large Language Models (LLMs) in this domain. Consequently, there is a need for a robust and unified benchmark to assess the counseling competence of various LLMs. Existing works, however, are limited by unprofessional client simulation, static… ▽ More

    Submitted 12 November, 2025; originally announced November 2025.

  46. arXiv:2511.09211  [pdf, ps, other]

    cs.LG

    Parameter-Free Clustering via Self-Supervised Consensus Maximization (Extended Version)

    Authors: Lijun Zhang, Suyuan Liu, Siwei Wang, Shengju Yu, Xueling Zhu, Miaomiao Li, Xinwang Liu

    Abstract: Clustering is a fundamental task in unsupervised learning, but most existing methods heavily rely on hyperparameters such as the number of clusters or other sensitive settings, limiting their applicability in real-world scenarios. To address this long-standing challenge, we propose a novel and fully parameter-free clustering framework via Self-supervised Consensus Maximization, named SCMax. Our fr… ▽ More

    Submitted 13 November, 2025; v1 submitted 12 November, 2025; originally announced November 2025.

  47. arXiv:2511.09109  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning

    Authors: Wenda Wei, Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Lixin Su, Shuaiqiang Wang, Dawei Yin, Maarten de Rijke, Xueqi Cheng

    Abstract: Retrieval-augmented generation (RAG) has proven to be effective in mitigating hallucinations in large language models, yet its effectiveness remains limited in complex, multi-step reasoning scenarios. Recent efforts have incorporated search-based interactions into RAG, enabling iterative reasoning with real-time retrieval. Most approaches rely on outcome-based supervision, offering no explicit gui… ▽ More

    Submitted 13 November, 2025; v1 submitted 12 November, 2025; originally announced November 2025.

  48. arXiv:2511.09047  [pdf, ps, other]

    cs.LG

    Preference is More Than Comparisons: Rethinking Dueling Bandits with Augmented Human Feedback

    Authors: Shengbo Wang, Hong Sun, Ke Li

    Abstract: Interactive preference elicitation (IPE) aims to substantially reduce human effort while acquiring human preferences in wide personalization systems. Dueling bandit (DB) algorithms enable optimal decision-making in IPE building on pairwise comparisons. However, they remain inefficient when human feedback is sparse. Existing methods address sparsity by heavily relying on parametric reward models, w… ▽ More

    Submitted 12 November, 2025; originally announced November 2025.

    Comments: Extended version of our AAAI 2026 paper
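
    A minimal picture of the pairwise-comparison setting behind dueling bandits: repeatedly pick two arms, observe which one noisy human feedback prefers, and track empirical win rates. The uniform-exploration, Borda-score scheme below is purely illustrative and is not the algorithm proposed in the paper.

```python
import random

def duel(i, j, pref):
    """Simulated human feedback: True if arm i is preferred over arm j."""
    return random.random() < pref[i][j]

def explore_then_commit(n_arms, pref, n_rounds=2000):
    """Uniform pairwise exploration, then commit to the best empirical Borda
    score (average probability of winning a comparison)."""
    wins = [[0] * n_arms for _ in range(n_arms)]    # pairwise win counts
    for _ in range(n_rounds):
        i, j = random.sample(range(n_arms), 2)
        if duel(i, j, pref):
            wins[i][j] += 1
        else:
            wins[j][i] += 1

    def borda(a):
        rates = []
        for b in range(n_arms):
            if b == a:
                continue
            games = wins[a][b] + wins[b][a]
            rates.append(wins[a][b] / games if games else 0.5)
        return sum(rates) / len(rates)

    return max(range(n_arms), key=borda)

# Toy preference matrix: arm 0 beats the others with probability 0.7.
pref = [[0.5, 0.7, 0.7],
        [0.3, 0.5, 0.6],
        [0.3, 0.4, 0.5]]
print(explore_then_commit(3, pref))   # almost always prints 0
```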

  49. arXiv:2511.09042  [pdf, ps, other]

    cs.LG

    GeoGNN: Quantifying and Mitigating Semantic Drift in Text-Attributed Graphs

    Authors: Liangwei Yang, Jing Ma, Jianguo Zhang, Zhiwei Liu, Jielin Qiu, Shirley Kokane, Shiyu Wang, Haolin Chen, Rithesh Murthy, Ming Zhu, Huan Wang, Weiran Yao, Caiming Xiong, Shelby Heinecke

    Abstract: Graph neural networks (GNNs) on text-attributed graphs (TAGs) typically encode node texts using pretrained language models (PLMs) and propagate these embeddings through linear neighborhood aggregation. However, the representation spaces of modern PLMs are highly non-linear and geometrically structured, where textual embeddings reside on curved semantic manifolds rather than flat Euclidean spaces… ▽ More

    Submitted 12 November, 2025; originally announced November 2025.

    Comments: 10 pages
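
    The "linear neighborhood aggregation" the abstract questions is, in its simplest form, a degree-normalized average of neighbors' text embeddings. The toy sketch below uses random vectors as stand-ins for PLM embeddings and a hand-built adjacency matrix.

```python
import numpy as np

def mean_aggregate(x, adj):
    """One linear propagation step: each node averages its own and its
    neighbors' (PLM-produced) embeddings, i.e. D^-1 (A + I) X."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)        # node degrees
    return (a_hat / deg) @ x                      # row-normalized aggregation

n_nodes, dim = 5, 768
x = np.random.randn(n_nodes, dim)                 # stand-ins for PLM text embeddings
adj = np.zeros((n_nodes, n_nodes))
adj[0, 1] = adj[1, 0] = 1                         # a tiny path graph 0-1-2
adj[1, 2] = adj[2, 1] = 1
h = mean_aggregate(x, adj)
print(h.shape)                                    # (5, 768)
```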

  50. arXiv:2511.08620  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Learn More, Forget Less: A Gradient-Aware Data Selection Approach for LLM

    Authors: Yibai Liu, Shihang Wang, Zeming Liu, Zheming Song, Junzhe Wang, Jingjing Liu, Qingjie Liu, Yunhong Wang

    Abstract: Although large language models (LLMs) have achieved impressive results across numerous tasks, supervised fine-tuning (SFT) remains essential for adapting these models to specialized domains. However, SFT for domain specialization can be resource-intensive and sometimes leads to a deterioration in performance over general capabilities due to catastrophic forgetting (CF). To address these issues… ▽ More

    Submitted 7 November, 2025; originally announced November 2025.

    Comments: Under review