Showing 1–50 of 11,731 results for author: Li, Y

Searching in archive cs.
  1. arXiv:2511.15613  [pdf, ps, other]

    cs.CV cs.CL

    When to Think and When to Look: Uncertainty-Guided Lookback

    Authors: Jing Bi, Filippos Bellos, Junjia Guo, Yayuan Li, Chao Huang, Yunlong Tang, Luchuan Song, Susan Liang, Zhongfei Zhang, Jason J. Corso, Chenliang Xu

    Abstract: Test-time thinking (that is, generating explicit intermediate reasoning chains) is known to boost performance in large language models and has recently shown strong gains for large vision language models (LVLMs). However, despite these promising results, there is still no systematic analysis of how thinking actually affects visual reasoning. We provide the first such analysis with a large scale, c… ▽ More

    Submitted 19 November, 2025; originally announced November 2025.

  2. arXiv:2511.15407  [pdf, ps, other]

    cs.AI cs.CV

    IPR-1: Interactive Physical Reasoner

    Authors: Mingyu Zhang, Lifeng Zhuo, Tianxi Tan, Guocan Xie, Xian Nie, Yan Li, Renjie Zhao, Zizhu He, Ziyu Wang, Jiting Cai, Yong-Lu Li

    Abstract: Humans learn by observing, interacting with environments, and internalizing physics and causality. Here, we aim to ask whether an agent can similarly acquire human-like reasoning from interaction and keep improving with more experience. We study this in a Game-to-Unseen (G2U) setting, curating 1,000+ heterogeneous games with diverse physical and causal mechanisms, and evaluate at three human-like… ▽ More

    Submitted 19 November, 2025; originally announced November 2025.

    Comments: 11 pages, 5 figures

  3. arXiv:2511.15351  [pdf, ps, other]

    cs.AI cs.CV

    Octopus: Agentic Multimodal Reasoning with Six-Capability Orchestration

    Authors: Yifu Guo, Zishan Xu, Zhiyuan Yao, Yuquan Lu, Jiaye Lin, Sen Hu, Zhenheng Tang, Yingchao Li, Huacan Wang, Ronghao Chen

    Abstract: Existing multimodal reasoning models and frameworks suffer from fundamental architectural limitations: most lack the human-like ability to autonomously explore diverse reasoning pathways-whether in direct inference, tool-driven visual exploration, programmatic visual manipulation, or intrinsic visual imagination. Consequently, they struggle to adapt to dynamically changing capability requirements… ▽ More

    Submitted 19 November, 2025; originally announced November 2025.

  4. arXiv:2511.15174  [pdf, ps, other]

    cs.LG cs.AI

    FaultDiffusion: Few-Shot Fault Time Series Generation with Diffusion Model

    Authors: Yi Xu, Zhigang Chen, Rui Wang, Yangfan Li, Fengxiao Tang, Ming Zhao, Jiaqi Liu

    Abstract: In industrial equipment monitoring, fault diagnosis is critical for ensuring system reliability and enabling predictive maintenance. However, the scarcity of fault data, due to the rarity of fault events and the high cost of data annotation, significantly hinders data-driven approaches. Existing time-series generation models, optimized for abundant normal data, struggle to capture fault distributi… ▽ More

    Submitted 19 November, 2025; originally announced November 2025.

    Comments: 4 figures, 5 tables, 8 pages

  5. arXiv:2511.15090  [pdf, ps, other]

    cs.DB cs.AI cs.CV

    BBox DocVQA: A Large Scale Bounding Box Grounded Dataset for Enhancing Reasoning in Document Visual Question Answer

    Authors: Wenhan Yu, Wang Chen, Guanqiang Qi, Weikang Li, Yang Li, Lei Sha, Deguo Xia, Jizhou Huang

    Abstract: Document Visual Question Answering (DocVQA) is a fundamental task for multimodal document understanding and a key testbed for vision language reasoning. However, most existing DocVQA datasets are limited to the page level and lack fine grained spatial grounding, constraining the interpretability and reasoning capability of Vision Language Models (VLMs). To address this gap, we introduce BBox DocVQ… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

    Comments: 22 pages, 4 figures

  6. arXiv:2511.15085  [pdf, ps, other]

    cs.CV

    TiCAL: Typicality-Based Consistency-Aware Learning for Multimodal Emotion Recognition

    Authors: Wen Yin, Siyu Zhan, Cencen Liu, Xin Hu, Guiduo Duan, Xiurui Xie, Yuan-Fang Li, Tao He

    Abstract: Multimodal Emotion Recognition (MER) aims to accurately identify human emotional states by integrating heterogeneous modalities such as visual, auditory, and textual data. Existing approaches predominantly rely on unified emotion labels to supervise model training, often overlooking a critical challenge: inter-modal emotion conflicts, wherein different modalities within the same sample may express… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

    Comments: 11 pages, 5 figures

  7. arXiv:2511.14813  [pdf, ps, other]

    cs.LG

    DEVAL: A Framework for Evaluating and Improving the Derivation Capability of Large Language Models

    Authors: Yifan Li, Qin Li, Min Zhang, Min Zhang, Peixin Wang

    Abstract: Assessing the reasoning ability of Large Language Models (LLMs) over data remains an open and pressing research question. Compared with LLMs, human reasoning can derive corresponding modifications to the output based on certain kinds of changes to the input. This reasoning pattern, which relies on abstract rules that govern relationships between changes of data, has not been comprehensively descri… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

  8. arXiv:2511.14766  [pdf, ps, other]

    cs.IR cs.MM

    OTCR: Optimal Transmission, Compression and Representation for Multimodal Information Extraction

    Authors: Yang Li, Yajiao Wang, Wenhao Hu, Zhixiong Zhang, Mengting Zhang

    Abstract: Multimodal Information Extraction (MIE) requires fusing text and visual cues from visually rich documents. While recent methods have advanced multimodal representation learning, most implicitly assume modality equivalence or treat modalities in a largely uniform manner, still relying on generic fusion paradigms. This often results in indiscriminate incorporation of multimodal signals and insuffici… ▽ More

    Submitted 17 September, 2025; originally announced November 2025.

    Comments: 5 pages, 3 figures

  9. arXiv:2511.14559  [pdf, ps, other]

    q-bio.BM cs.AI cs.LG q-bio.QM

    Apo2Mol: 3D Molecule Generation via Dynamic Pocket-Aware Diffusion Models

    Authors: Xinzhe Zheng, Shiyu Jiang, Gustavo Seabra, Chenglong Li, Yanjun Li

    Abstract: Deep generative models are rapidly advancing structure-based drug design, offering substantial promise for generating small molecule ligands that bind to specific protein targets. However, most current approaches assume a rigid protein binding pocket, neglecting the intrinsic flexibility of proteins and the conformational rearrangements induced by ligand binding, limiting their applicability in pr… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI 2026

  10. arXiv:2511.14515  [pdf, ps, other]

    cs.SD cs.AI cs.CV

    IMSE: Efficient U-Net-based Speech Enhancement using Inception Depthwise Convolution and Amplitude-Aware Linear Attention

    Authors: Xinxin Tang, Bin Qin, Yufang Li

    Abstract: Achieving a balance between lightweight design and high performance remains a significant challenge for speech enhancement (SE) tasks on resource-constrained devices. Existing state-of-the-art methods, such as MUSE, have established a strong baseline with only 0.51M parameters by introducing a Multi-path Enhanced Taylor (MET) transformer and Deformable Embedding (DE). However, an in-depth analysis… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

  11. arXiv:2511.14423  [pdf, ps, other]

    cs.CL

    Unified Defense for Large Language Models against Jailbreak and Fine-Tuning Attacks in Education

    Authors: Xin Yi, Yue Li, Dongsheng Shi, Linlin Wang, Xiaoling Wang, Liang He

    Abstract: Large Language Models (LLMs) are increasingly integrated into educational applications. However, they remain vulnerable to jailbreak and fine-tuning attacks, which can compromise safety alignment and lead to harmful outputs. Existing studies mainly focus on general safety evaluations, with limited attention to the unique safety requirements of educational scenarios. To address this gap, we constru… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

  12. arXiv:2511.14366  [pdf, ps, other]

    cs.CL

    ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning

    Authors: Hongwei Liu, Junnan Liu, Shudong Liu, Haodong Duan, Yuqiang Li, Mao Su, Xiaohong Liu, Guangtao Zhai, Xinyu Fang, Qianhong Ma, Taolin Zhang, Zihan Ma, Yufeng Zhao, Peiheng Zhou, Linchen Xiao, Wenlong Zhang, Shijie Zhou, Xingjian Ma, Siqi Sun, Jiaye Ge, Meng Li, Yuhong Liu, Jianxin Dong, Jiaying Li, Hui Wu, et al. (11 additional authors not shown)

    Abstract: The rapid advancement of Large Language Models (LLMs) has led to performance saturation on many established benchmarks, questioning their ability to distinguish frontier models. Concurrently, existing high-difficulty benchmarks often suffer from narrow disciplinary focus, oversimplified answer formats, and vulnerability to data contamination, creating a fidelity gap with real-world scientific inqu… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

    Comments: 39 pages

  13. arXiv:2511.14302  [pdf, ps, other]

    cs.CV cs.AI

    SAM-Fed: SAM-Guided Federated Semi-Supervised Learning for Medical Image Segmentation

    Authors: Sahar Nasirihaghighi, Negin Ghamsarian, Yiping Li, Marcel Breeuwer, Raphael Sznitman, Klaus Schoeffmann

    Abstract: Medical image segmentation is clinically important, yet data privacy and the cost of expert annotation limit the availability of labeled data. Federated semi-supervised learning (FSSL) offers a solution but faces two challenges: pseudo-label reliability depends on the strength of local models, and client devices often require compact or heterogeneous architectures due to limited computational reso… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

  14. arXiv:2511.14271  [pdf, ps, other]

    cs.CV

    Let Language Constrain Geometry: Vision-Language Models as Semantic and Spatial Critics for 3D Generation

    Authors: Weimin Bai, Yubo Li, Weijian Luo, Zeqiang Lai, Yequan Wang, Wenzheng Chen, He Sun

    Abstract: Text-to-3D generation has advanced rapidly, yet state-of-the-art models, encompassing both optimization-based and feed-forward architectures, still face two fundamental limitations. First, they struggle with coarse semantic alignment, often failing to capture fine-grained prompt details. Second, they lack robust 3D spatial understanding, leading to geometric inconsistencies and catastrophic failur… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

  15. arXiv:2511.14256  [pdf, ps, other]

    cs.AI cs.IR

    PathMind: A Retrieve-Prioritize-Reason Framework for Knowledge Graph Reasoning with Large Language Models

    Authors: Yu Liu, Xixun Lin, Yanmin Shang, Yangxi Li, Shi Wang, Yanan Cao

    Abstract: Knowledge graph reasoning (KGR) is the task of inferring new knowledge by performing logical deductions on knowledge graphs. Recently, large language models (LLMs) have demonstrated remarkable performance in complex reasoning tasks. Despite promising success, current LLM-based KGR methods still face two critical limitations. First, existing methods often extract reasoning paths indiscriminately, w… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

    Comments: AAAI 2026, Long Paper, Oral

  16. TailCue: Exploring Animal-inspired Robotic Tail for Automated Vehicles Interaction

    Authors: Yuan Li, Xinyue Gui, Ding Xia, Mark Colley, Takeo Igarashi

    Abstract: Automated vehicles (AVs) are gradually becoming part of our daily lives. However, effective communication between road users and AVs remains a significant challenge. Although various external human-machine interfaces (eHMIs) have been developed to facilitate interactions, psychological factors, such as a lack of trust and inadequate emotional signaling, may still deter users from confidently engag… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

  17. arXiv:2511.14237  [pdf, ps, other]

    cs.CV

    Breaking the Passive Learning Trap: An Active Perception Strategy for Human Motion Prediction

    Authors: Juncheng Hu, Zijian Zhang, Zeyu Wang, Guoyu Wang, Yingji Li, Kedi Lyu

    Abstract: Forecasting 3D human motion is an important embodiment of fine-grained understanding and cognition of human behavior by artificial agents. Current approaches excessively rely on implicit network modeling of spatiotemporal relationships and motion characteristics, falling into the passive learning trap that results in redundant and monotonous 3D coordinate information acquisition while lacking acti… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

    Comments: 8 pages, 3 figures

  18. arXiv:2511.14198  [pdf, ps, other]

    cs.CY cs.AI

    DiverseClaire: Simulating Students to Improve Introductory Programming Course Materials for All CS1 Learners

    Authors: Wendy Wong, Yuchao Jiang, Yuekang Li

    Abstract: Although CS programs are booming, introductory courses like CS1 still adopt one-size-fits-all formats that can exacerbate cognitive load and discourage learners with autism, ADHD, dyslexia and other neurological conditions. These call for compassionate pedagogies and Universal Design For Learning (UDL) to create learning environments and materials where cognitive diversity is welcomed. To addres…

    Submitted 18 November, 2025; originally announced November 2025.

    Comments: 2 pages

    ACM Class: K.3.1

  19. arXiv:2511.14179  [pdf, ps, other]

    cs.CV

    DoGCLR: Dominance-Game Contrastive Learning Network for Skeleton-Based Action Recognition

    Authors: Yanshan Li, Ke Ma, Miaomiao Wei, Linhui Dai

    Abstract: Existing self-supervised contrastive learning methods for skeleton-based action recognition often process all skeleton regions uniformly, and adopt a first-in-first-out (FIFO) queue to store negative samples, which leads to motion information loss and non-optimal negative sample selection. To address these challenges, this paper proposes Dominance-Game Contrastive Learning network for skeleton-bas… ▽ More

    Submitted 19 November, 2025; v1 submitted 18 November, 2025; originally announced November 2025.

    Comments: 14 pages, 7 figures, journal

  20. arXiv:2511.14166  [pdf, ps, other]

    cs.CL cs.AI

    Selective Weak-to-Strong Generalization

    Authors: Hao Lang, Fei Huang, Yongbin Li

    Abstract: Future superhuman models will surpass the ability of humans and humans will only be able to weakly supervise superhuman models. To alleviate the issue of lacking high-quality data for model alignment, some works on weak-to-strong generalization (W2SG) finetune a strong pretrained model with a weak supervisor so that it can generalize beyond weak supervision. However, the invariable use of…

    Submitted 18 November, 2025; originally announced November 2025.

    Comments: AAAI2025 Special Track on AI Alignment

  21. arXiv:2511.13983  [pdf, ps, other]

    cs.CE

    MoMoE: A Mixture of Expert Agent Model for Financial Sentiment Analysis

    Authors: Peng Shu, Junhao Chen, Zhengliang Liu, Hanqi Jiang, Yi Pan, Khanh Nhu Nguyen, Zihao Wu, Huaqin Zhao, Yiwei Li, Enze Shi, ShaoChen Xu

    Abstract: We present a novel approach called Mixture of Mixture of Expert (MoMoE) that combines the strengths of Mixture-of-Experts (MoE) architectures with collaborative multi-agent frameworks. By modifying the LLaMA 3.1 8B architecture to incorporate MoE layers in each agent of a layered collaborative structure, we create an ensemble of specialized expert agents that iteratively refine their outputs. Each… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

  22. arXiv:2511.13789  [pdf, ps, other]

    cs.CR cs.AI

    Uncovering and Aligning Anomalous Attention Heads to Defend Against NLP Backdoor Attacks

    Authors: Haotian Jin, Yang Li, Haihui Fan, Lin Shen, Xiangfang Li, Bo Li

    Abstract: Backdoor attacks pose a serious threat to the security of large language models (LLMs), causing them to exhibit anomalous behavior under specific trigger conditions. The design of backdoor triggers has evolved from fixed triggers to dynamic or implicit triggers. This increased flexibility in trigger design makes it challenging for defenders to identify their specific forms accurately. Most existin… ▽ More

    Submitted 16 November, 2025; originally announced November 2025.

  23. arXiv:2511.13733  [pdf, ps, other]

    eess.SP cs.LG q-bio.NC

    THD-BAR: Topology Hierarchical Derived Brain Autoregressive Modeling for EEG Generic Representations

    Authors: Wenchao Yang, Weidong Yan, Wenkang Liu, Yulan Ma, Yang Li

    Abstract: Large-scale pre-trained models hold significant potential for learning universal EEG representations. However, most existing methods, particularly autoregressive (AR) frameworks, primarily rely on straightforward temporal sequencing of multi-channel EEG data, which fails to capture the rich physiological characteristics inherent to EEG signals. Moreover, their time-centered modeling approach also… ▽ More

    Submitted 5 November, 2025; originally announced November 2025.

  24. arXiv:2511.13647  [pdf, ps, other]

    cs.CV

    Part-X-MLLM: Part-aware 3D Multimodal Large Language Model

    Authors: Chunshi Wang, Junliang Ye, Yunhan Yang, Yang Li, Zizhuo Lin, Jun Zhu, Zhuo Chen, Yawei Luo, Chunchao Guo

    Abstract: We introduce Part-X-MLLM, a native 3D multimodal large language model that unifies diverse 3D tasks by formulating them as programs in a structured, executable grammar. Given an RGB point cloud and a natural language prompt, our model autoregressively generates a single, coherent token sequence encoding part-level bounding boxes, semantic descriptions, and edit commands. This structured output ser… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

  25. arXiv:2511.13612  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    P1: Mastering Physics Olympiads with Reinforcement Learning

    Authors: Jiacheng Chen, Qianjia Cheng, Fangchen Yu, Haiyuan Wan, Yuchen Zhang, Shenghe Zheng, Junchi Yao, Qingyang Zhang, Haonan He, Yun Luo, Yufeng Zhao, Futing Wang, Li Sheng, Chengxing Xie, Yuxin Zuo, Yizhuo Li, Wenxauan Zeng, Yulun Wu, Rui Huang, Dongzhan Zhou, Kai Chen, Yu Qiao, Lei Bai, Yu Cheng, Ning Ding, et al. (3 additional authors not shown)

    Abstract: Recent progress in large language models (LLMs) has moved the frontier from puzzle-solving to science-grade reasoning-the kind needed to tackle problems whose answers must stand against nature, not merely fit a rubric. Physics is the sharpest test of this shift, which binds symbols to reality in a fundamental way, serving as the cornerstone of most modern technologies. In this work, we manage to a… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

  26. arXiv:2511.13587  [pdf, ps, other]

    cs.CV cs.AI

    VVS: Accelerating Speculative Decoding for Visual Autoregressive Generation via Partial Verification Skipping

    Authors: Haotian Dong, Ye Li, Rongwei Lu, Chen Tang, Shu-Tao Xia, Zhi Wang

    Abstract: Visual autoregressive (AR) generation models have demonstrated strong potential for image generation, yet their next-token-prediction paradigm introduces considerable inference latency. Although speculative decoding (SD) has been proven effective for accelerating visual AR models, its "draft one step, then verify one step" paradigm prevents a direct reduction of the forward passes, thus restrictin… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

  27. arXiv:2511.13361  [pdf, ps, other]

    cs.AI cs.MA

    MedDCR: Learning to Design Agentic Workflows for Medical Coding

    Authors: Jiyang Zheng, Islam Nassar, Thanh Vu, Xu Zhong, Yang Lin, Tongliang Liu, Long Duong, Yuan-Fang Li

    Abstract: Medical coding converts free-text clinical notes into standardized diagnostic and procedural codes, which are essential for billing, hospital operations, and medical research. Unlike ordinary text classification, it requires multi-step reasoning: extracting diagnostic concepts, applying guideline constraints, mapping to hierarchical codebooks, and ensuring cross-document consistency. Recent advanc… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

  28. arXiv:2511.13144  [pdf, ps, other]

    cs.LG

    Personalized Federated Learning with Bidirectional Communication Compression via One-Bit Random Sketching

    Authors: Jiacheng Cheng, Xu Zhang, Guanghui Qiu, Yifang Zhang, Yinchuan Li, Kaiyuan Feng

    Abstract: Federated Learning (FL) enables collaborative training across decentralized data, but faces key challenges of bidirectional communication overhead and client-side data heterogeneity. To address communication costs while embracing data heterogeneity, we propose pFed1BS, a novel personalized federated learning framework that achieves extreme communication compression through one-bit random sketching… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: Accepted in AAAI 2026

  29. arXiv:2511.13116  [pdf, ps, other]

    cs.LG cs.AI

    Synthetic Forgetting without Access: A Few-shot Zero-glance Framework for Machine Unlearning

    Authors: Qipeng Song, Nan Yang, Ziqi Xu, Yue Li, Wei Shao, Feng Xia

    Abstract: Machine unlearning aims to eliminate the influence of specific data from trained models to ensure privacy compliance. However, most existing methods assume full access to the original training dataset, which is often impractical. We address a more realistic yet challenging setting: few-shot zero-glance, where only a small subset of the retained data is available and the forget set is entirely inac… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

  30. Unidirectional-Road-Network-Based Global Path Planning for Cleaning Robots in Semi-Structured Environments

    Authors: Yong Li, Hui Cheng

    Abstract: Practical global path planning is critical for commercializing cleaning robots working in semi-structured environments. In the literature, global path planning methods for free space usually focus on path length and neglect the traffic rule constraints of the environments, which leads to high-frequency re-planning and increases collision risks. In contrast, those for structured environments are de… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: 2023 IEEE International Conference on Robotics and Automation (ICRA)

  31. APP: A* Post-Processing Algorithm for Robots with Bidirectional Shortcut and Path Perturbation

    Authors: Yong Li, Hui Cheng

    Abstract: Paths generated by A* and other graph-search-based planners are widely used in the robotic field. Due to the restricted node-expansion directions, the resulting paths are usually not the shortest. Besides, unnecessary heading changes, or zig-zag patterns, exist even when no obstacle is nearby, which is inconsistent with the human intuition that the path segments should be straight in wide-open spa… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

    Journal ref: IEEE Robotics and Automation Letters (Volume: 8, Issue: 11, November 2023)

  32. Mitigating Recommendation Biases via Group-Alignment and Global-Uniformity in Representation Learning

    Authors: Miaomiao Cai, Min Hou, Lei Chen, Le Wu, Haoyue Bai, Yong Li, Meng Wang

    Abstract: Collaborative Filtering (CF) plays a crucial role in modern recommender systems, leveraging historical user-item interactions to provide personalized suggestions. However, CF-based methods often encounter biases due to imbalances in training data. This phenomenon makes CF-based methods tend to prioritize recommending popular items and performing unsatisfactorily on inactive users. Existing works a…

    Submitted 17 November, 2025; originally announced November 2025.

  33. arXiv:2511.12945  [pdf, ps, other]

    cs.LG

    APT: Affine Prototype-Timestamp For Time Series Forecasting Under Distribution Shift

    Authors: Yujie Li, Zezhi Shao, Chengqing Yu, Yisong Fu, Tao Sun, Yongjun Xu, Fei Wang

    Abstract: Time series forecasting under distribution shift remains challenging, as existing deep learning models often rely on local statistical normalization (e.g., mean and variance) that fails to capture global distribution shift. Methods like RevIN and its variants attempt to decouple distribution and pattern but still struggle with missing values, noisy observations, and invalid channel-wise affine tra… ▽ More

    Submitted 16 November, 2025; originally announced November 2025.

  34. arXiv:2511.12932  [pdf, ps, other]

    cs.CV

    Text2Traffic: A Text-to-Image Generation and Editing Method for Traffic Scenes

    Authors: Feng Lv, Haoxuan Feng, Zilu Zhang, Chunlong Xia, Yanfeng Li

    Abstract: With the rapid advancement of intelligent transportation systems, text-driven image generation and editing techniques have demonstrated significant potential in providing rich, controllable visual scene data for applications such as traffic monitoring and autonomous driving. However, several challenges remain, including insufficient semantic richness of generated traffic elements, limited camera v… ▽ More

    Submitted 16 November, 2025; originally announced November 2025.

  35. arXiv:2511.12910  [pdf]

    cs.RO

    TOPP-DWR: Time-Optimal Path Parameterization of Differential-Driven Wheeled Robots Considering Piecewise-Constant Angular Velocity Constraints

    Authors: Yong Li, Yujun Huang, Yi Chen, Hui Cheng

    Abstract: Differential-driven wheeled robots (DWR) represent the quintessential type of mobile robots and find extensive applications across the robotic field. Most high-performance control approaches for DWR explicitly utilize the linear and angular velocities of the trajectory as control references. However, existing research on time-optimal path parameterization (TOPP) for mobile robots usually neglect…

    Submitted 16 November, 2025; originally announced November 2025.

    Report number: IROS20251376

    Journal ref: 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)

  36. arXiv:2511.12882  [pdf, ps, other]

    cs.RO cs.AI

    Towards High-Consistency Embodied World Model with Multi-View Trajectory Videos

    Authors: Taiyi Su, Jian Zhu, Yaxuan Li, Chong Ma, Zitai Huang, Hanli Wang, Yi Xu

    Abstract: Embodied world models aim to predict and interact with the physical world through visual observations and actions. However, existing models struggle to accurately translate low-level actions (e.g., joint positions) into precise robotic movements in predicted frames, leading to inconsistencies with real-world physical interactions. To address these limitations, we propose MTV-World, an embodied wor… ▽ More

    Submitted 19 November, 2025; v1 submitted 16 November, 2025; originally announced November 2025.

    Comments: 15 pages, 23 figures

  37. arXiv:2511.12838  [pdf, ps, other]

    cs.LG cs.AI

    Connectivity-Guided Sparsification of 2-FWL GNNs: Preserving Full Expressivity with Improved Efficiency

    Authors: Rongqin Chen, Fan Mo, Pak Lon Ip, Shenghui Zhang, Dan Wu, Ye Li, Leong Hou U

    Abstract: Higher-order Graph Neural Networks (HOGNNs) based on the 2-FWL test achieve superior expressivity by modeling 2- and 3-node interactions, but at $\mathcal{O}(n^3)$ computational cost. However, this computational burden is typically mitigated by existing efficiency methods at the cost of reduced expressivity. We propose Co-Sparsify, a connectivity-aware sparsification framework that elimin…

    Submitted 16 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI 2026

  38. arXiv:2511.12791  [pdf, ps, other]

    cs.LG cs.AI

    Optimal Look-back Horizon for Time Series Forecasting in Federated Learning

    Authors: Dahao Tang, Nan Yang, Yanli Li, Zhiyu Zhu, Zhibo Jin, Dong Yuan

    Abstract: Selecting an appropriate look-back horizon remains a fundamental challenge in time series forecasting (TSF), particularly in the federated learning scenarios where data is decentralized, heterogeneous, and often non-independent. While recent work has explored horizon selection by preserving forecasting-relevant information in an intrinsic space, these approaches are primarily restricted to central… ▽ More

    Submitted 17 November, 2025; v1 submitted 16 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI-26 as Oral Presentation

  39. arXiv:2511.12609  [pdf, ps, other]

    cs.CL cs.AI cs.CV

    Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data

    Authors: Yunxin Li, Xinyu Chen, Shenyuan Jiang, Haoyuan Shi, Zhenyu Liu, Xuanyu Zhang, Nanhao Deng, Zhenran Xu, Yicheng Ma, Meishan Zhang, Baotian Hu, Min Zhang

    Abstract: We present Uni-MoE 2.0 from the Lychee family. As a fully open-source omnimodal large model (OLM), it substantially advances Lychee's Uni-MoE series in language-centric multimodal understanding, reasoning, and generating. Based on the Qwen2.5-7B dense architecture, we build Uni-MoE-2.0-Omni from scratch through three core contributions: dynamic-capacity Mixture-of-Experts (MoE) design, a progressi… ▽ More

    Submitted 16 November, 2025; originally announced November 2025.

    Comments: 47 pages, 10 figures, Project Website: https://idealistxy.github.io/Uni-MoE-v2.github.io/; Codes: https://github.com/HITsz-TMG/Uni-MoE

  40. arXiv:2511.12603  [pdf, ps, other]

    cs.LG cs.AI eess.SY

    PID-controlled Langevin Dynamics for Faster Sampling of Generative Models

    Authors: Hongyi Chen, Jianhai Shu, Jingtao Ding, Yong Li, Xiao-Ping Zhang

    Abstract: Langevin dynamics sampling suffers from extremely low generation speed, fundamentally limited by numerous fine-grained iterations to converge to the target distribution. We introduce PID-controlled Langevin Dynamics (PIDLD), a novel sampling acceleration algorithm that reinterprets the sampling process using control-theoretic principles. By treating energy gradients as feedback signals, PIDLD comb… ▽ More

    Submitted 16 November, 2025; originally announced November 2025.

    Comments: NeurIPS 2025 poster paper

  41. arXiv:2511.12590  [pdf, ps, other]

    cs.CV cs.AI

    Fine-Grained Representation for Lane Topology Reasoning

    Authors: Guoqing Xu, Yiheng Li, Yang Yang

    Abstract: Precise modeling of lane topology is essential for autonomous driving, as it directly impacts navigation and control decisions. Existing methods typically represent each lane with a single query and infer topological connectivity based on the similarity between lane queries. However, this kind of design struggles to accurately model complex lane structures, leading to unreliable topology predictio… ▽ More

    Submitted 18 November, 2025; v1 submitted 16 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI 2026

  42. arXiv:2511.12410  [pdf, ps, other]

    cs.CV

    Self-Supervised Visual Prompting for Cross-Domain Road Damage Detection

    Authors: Xi Xiao, Zhuxuanzi Wang, Mingqiao Mo, Chen Liu, Chenrui Ma, Yanshu Li, Smita Krishnaswamy, Xiao Wang, Tianyang Wang

    Abstract: The deployment of automated pavement defect detection is often hindered by poor cross-domain generalization. Supervised detectors achieve strong in-domain accuracy but require costly re-annotation for new environments, while standard self-supervised methods capture generic features and remain vulnerable to domain shift. We propose \ours, a self-supervised framework that visually probes targ…

    Submitted 15 November, 2025; originally announced November 2025.

    Comments: Accepted by WACV 2026

  43. arXiv:2511.12385  [pdf, ps, other]

    cs.CR cs.SE

    GenSIaC: Toward Security-Aware Infrastructure-as-Code Generation with Large Language Models

    Authors: Yikun Li, Matteo Grella, Daniel Nahmias, Gal Engelberg, Dan Klein, Giancarlo Guizzardi, Thijs van Ede, Andrea Continella

    Abstract: In recent years, Infrastructure as Code (IaC) has emerged as a critical approach for managing and provisioning IT infrastructure through code and automation. IaC enables organizations to create scalable and consistent environments, effectively managing servers and development settings. However, the growing complexity of cloud infrastructures has led to an increased risk of misconfigurations and se…

    Submitted 15 November, 2025; originally announced November 2025.

  44. arXiv:2511.12149  [pdf, ps, other]

    cs.CR cs.AI cs.CV

    AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models

    Authors: Jiayu Li, Yunhan Zhao, Xiang Zheng, Zonghuan Xu, Yige Li, Xingjun Ma, Yu-Gang Jiang

    Abstract: Vision-Language-Action (VLA) models enable robots to interpret natural-language instructions and perform diverse tasks, yet their integration of perception, language, and control introduces new safety vulnerabilities. Despite growing interest in attacking such models, the effectiveness of existing techniques remains unclear due to the absence of a unified evaluation framework. One major issue is t…

    Submitted 15 November, 2025; originally announced November 2025.

  45. arXiv:2511.12009  [pdf, ps, other]

    cs.DC

    High-Performance N-Queens Solver on GPU: Iterative DFS with Zero Bank Conflicts

    Authors: Guangchao Yao, Yali Li

    Abstract: The counting of solutions to the N-Queens problem is a classic NP-complete problem with extremely high computational complexity. As of now, the academic community has rigorously verified the number of solutions only up to N <= 26. In 2016, the research team led by Preußer solved the 27-Queens problem using FPGA hardware, which took approximately one year, though the result remains unverified indep…

    Submitted 14 November, 2025; originally announced November 2025.

  46. arXiv:2511.11999  [pdf, ps, other]

    cs.SE

    WITNESS: A lightweight and practical approach to fine-grained predictive mutation testing

    Authors: Zeyu Lu, Peng Zhang, Chun Yong Chong, Shan Gao, Yibiao Yang, Yanhui Li, Lin Chen, Yuming Zhou

    Abstract: Existing fine-grained predictive mutation testing studies predominantly rely on deep learning, which faces two critical limitations in practice: (1) Exorbitant computational costs. The deep learning models adopted in these studies demand significant computational resources for training and inference acceleration. This introduces high costs and undermines the cost-reduction goal of predictive mutat…

    Submitted 14 November, 2025; originally announced November 2025.

  47. arXiv:2511.11896  [pdf, ps, other]

    cs.CR cs.AI cs.SE

    VULPO: Context-Aware Vulnerability Detection via On-Policy LLM Optimization

    Authors: Youpeng Li, Fuxun Yu, Xinda Wang

    Abstract: The widespread reliance on open-source software dramatically increases the risk of vulnerability exploitation, underscoring the need for effective and scalable vulnerability detection (VD). Existing VD techniques, whether traditional machine learning-based or LLM-based approaches like prompt engineering, supervised fine-tuning, or off-policy preference optimization, remain fundamentally limited in…

    Submitted 18 November, 2025; v1 submitted 14 November, 2025; originally announced November 2025.

  48. arXiv:2511.11710  [pdf, ps, other]

    cs.CV eess.IV

    Target-Balanced Score Distillation

    Authors: Zhou Xu, Qi Wang, Yuxiao Yang, Luyuan Zhang, Zhang Liang, Yang Li

    Abstract: Score Distillation Sampling (SDS) enables 3D asset generation by distilling priors from pretrained 2D text-to-image diffusion models, but vanilla SDS suffers from over-saturation and over-smoothing. To mitigate this issue, recent variants have incorporated negative prompts. However, these methods face a critical trade-off: limited texture optimization, or significant texture gains with shape disto…

    Submitted 12 November, 2025; originally announced November 2025.

  49. arXiv:2511.11692  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    AnchorDS: Anchoring Dynamic Sources for Semantically Consistent Text-to-3D Generation

    Authors: Jiayin Zhu, Linlin Yang, Yicong Li, Angela Yao

    Abstract: Optimization-based text-to-3D methods distill guidance from 2D generative models via Score Distillation Sampling (SDS), but implicitly treat this guidance as static. This work shows that ignoring source dynamics yields inconsistent trajectories that suppress or merge semantic cues, leading to "semantic over-smoothing" artifacts. As such, we reformulate text-to-3D optimization as mapping a dynamica…

    Submitted 12 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI 2026. Project page: https://jyzhu.top/AnchorDS_Webpage/

  50. arXiv:2511.11690  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    Doubly Debiased Test-Time Prompt Tuning for Vision-Language Models

    Authors: Fei Song, Yi Li, Rui Wang, Jiahuan Zhou, Changwen Zheng, Jiangmeng Li

    Abstract: Test-time prompt tuning for vision-language models has demonstrated impressive generalization capabilities under zero-shot settings. However, tuning the learnable prompts solely based on unlabeled test data may induce prompt optimization bias, ultimately leading to suboptimal performance on downstream tasks. In this work, we analyze the underlying causes of prompt optimization bias from both the m…

    Submitted 12 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI 2026