Showing 1–50 of 5,190 results for author: Chen, X

Searching in archive cs. Search in all archives.
  1. arXiv:2507.17613  [pdf, ps, other]

    cs.CV

    InvRGB+L: Inverse Rendering of Complex Scenes with Unified Color and LiDAR Reflectance Modeling

    Authors: Xiaoxue Chen, Bhargav Chandaka, Chih-Hao Lin, Ya-Qin Zhang, David Forsyth, Hao Zhao, Shenlong Wang

    Abstract: We present InvRGB+L, a novel inverse rendering model that reconstructs large, relightable, and dynamic scenes from a single RGB+LiDAR sequence. Conventional inverse graphics methods rely primarily on RGB observations and use LiDAR mainly for geometric information, often resulting in suboptimal material estimates due to visible light interference. We find that LiDAR's intensity values-captured with…

    Submitted 23 July, 2025; originally announced July 2025.

    Comments: Accepted to ICCV 2025

  2. arXiv:2507.17448  [pdf, ps, other]

    cs.CE cs.AI physics.chem-ph

    Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning

    Authors: Situo Zhang, Hanqi Li, Lu Chen, Zihan Zhao, Xuanze Lin, Zichen Zhu, Bo Chen, Xin Chen, Kai Yu

    Abstract: Retrosynthesis planning, essential in organic synthesis and drug discovery, has greatly benefited from recent AI-driven advancements. Nevertheless, existing methods frequently face limitations in both applicability and explainability. Traditional graph-based and sequence-to-sequence models often lack generalized chemical knowledge, leading to predictions that are neither consistently accurate nor…

    Submitted 23 July, 2025; originally announced July 2025.

    Comments: Preprint

  3. arXiv:2507.17265  [pdf, ps, other]

    cs.GR cs.HC

    Visualization-Driven Illumination for Density Plots

    Authors: Xin Chen, Yunhai Wang, Huaiwei Bao, Kecheng Lu, Jaemin Jo, Chi-Wing Fu, Jean-Daniel Fekete

    Abstract: We present a novel visualization-driven illumination model for density plots, a new technique to enhance density plots by effectively revealing the detailed structures in high- and medium-density regions and outliers in low-density regions, while avoiding artifacts in the density field's colors. When visualizing large and dense discrete point samples, scatterplots and dot density maps often suffer…

    Submitted 23 July, 2025; originally announced July 2025.

  4. arXiv:2507.17242  [pdf]

    cs.HC eess.SP q-bio.NC

    High-Density EEG Enables the Fastest Visual Brain-Computer Interfaces

    Authors: Gege Ming, Weihua Pei, Sen Tian, Xiaogang Chen, Xiaorong Gao, Yijun Wang

    Abstract: Brain-computer interface (BCI) technology establishes a direct communication pathway between the brain and external devices. Current visual BCI systems suffer from insufficient information transfer rates (ITRs) for practical use. Spatial information, a critical component of visual perception, remains underexploited in existing systems because the limited spatial resolution of recording methods hin…

    Submitted 23 July, 2025; originally announced July 2025.

  5. arXiv:2507.17220  [pdf, ps, other]

    cs.CV cs.RO

    PIG-Nav: Key Insights for Pretrained Image Goal Navigation Models

    Authors: Jiansong Wan, Chengming Zhou, Jinkua Liu, Xiangge Huang, Xiaoyu Chen, Xiaohan Yi, Qisen Yang, Baiting Zhu, Xin-Qiang Cai, Lixing Liu, Rushuai Yang, Chuheng Zhang, Sherif Abdelfattah, Hayong Shin, Pushi Zhang, Li Zhao, Jiang Bian

    Abstract: Recent studies have explored pretrained (foundation) models for vision-based robotic navigation, aiming to achieve generalizable navigation and positive transfer across diverse environments while enhancing zero-shot performance in unseen settings. In this work, we introduce PIG-Nav (Pretrained Image-Goal Navigation), a new approach that further investigates pretraining strategies for vision-based…

    Submitted 23 July, 2025; originally announced July 2025.

  6. arXiv:2507.17147  [pdf, ps, other]

    cs.CL

    CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards

    Authors: Cheng Liu, Yifei Lu, Fanghua Ye, Jian Li, Xingyu Chen, Feiliang Ren, Zhaopeng Tu, Xiaolong Li

    Abstract: Role-Playing Language Agents (RPLAs) have emerged as a significant application direction for Large Language Models (LLMs). Existing approaches typically rely on prompt engineering or supervised fine-tuning to enable models to imitate character behaviors in specific scenarios, but often neglect the underlying cognitive mechanisms driving these behaviors. Inspired by cognitive psychology, we…

    Submitted 22 July, 2025; originally announced July 2025.

  7. arXiv:2507.16782  [pdf, ps, other]

    cs.CV

    Task-Specific Zero-shot Quantization-Aware Training for Object Detection

    Authors: Changhao Li, Xinrui Chen, Ji Wang, Kang Zhao, Jianfei Chen

    Abstract: Quantization is a key technique to reduce network size and computational complexity by representing the network parameters with a lower precision. Traditional quantization methods rely on access to original training data, which is often restricted due to privacy concerns or security challenges. Zero-shot Quantization (ZSQ) addresses this by using synthetic data generated from pre-trained models, e…

    Submitted 22 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025

  8. arXiv:2507.16696  [pdf, ps, other]

    cs.LG cs.AI cs.MM cs.SD

    FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation

    Authors: Pingyi Fan, Anbai Jiang, Shuwei Zhang, Zhiqiang Lv, Bing Han, Xinhu Zheng, Wenrui Liang, Junjie Li, Wei-Qiang Zhang, Yanmin Qian, Xie Chen, Cheng Lu, Jia Liu

    Abstract: With the rapid deployment of SCADA systems, how to effectively analyze industrial signals and detect abnormal states is an urgent need for the industry. Due to the significant heterogeneity of these signals, which we summarize as the M5 problem, previous works only focus on small sub-problems and employ specialized models, failing to utilize the synergies between modalities and the powerful scalin…

    Submitted 22 July, 2025; originally announced July 2025.

    Comments: 11 pages, 6 figures

  9. arXiv:2507.16666  [pdf, ps, other]

    cs.IT eess.SP

    Reconfigurable Intelligent Surface-Enabled Green and Secure Offloading for Mobile Edge Computing Networks

    Authors: Tong-Xing Zheng, Xinji Wang, Xin Chen, Di Mao, Jia Shi, Cunhua Pan, Chongwen Huang, Haiyang Ding, Zan Li

    Abstract: This paper investigates a multi-user uplink mobile edge computing (MEC) network, where the users offload partial tasks securely to an access point under the non-orthogonal multiple access policy with the aid of a reconfigurable intelligent surface (RIS) against a multi-antenna eavesdropper. We formulate a non-convex optimization problem of minimizing the total energy consumption subject to secure…

    Submitted 22 July, 2025; originally announced July 2025.

    Comments: 15 pages, 9 figures, accepted by IEEE Internet of Things Journal

  10. arXiv:2507.16534  [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.LG

    Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report

    Authors: Shanghai AI Lab: Xiaoyang Chen, Yunhao Chen, Zeren Chen, Zhiyun Chen, Hanyun Cui, Yawen Duan, Jiaxuan Guo, Qi Guo, Xuhao Hu, Hong Huang, Lige Huang, Chunxiao Li, Juncheng Li, Qihao Lin, Dongrui Liu, Xinmin Liu, Zicheng Liu, Chaochao Lu, Xiaoya Lu, Jingjing Qu, Qibing Ren, Jing Shao, Jingwei Shi, et al. (13 additional authors not shown)

    Abstract: To understand and identify the unprecedented risks posed by rapidly advancing artificial intelligence (AI) models, this report presents a comprehensive assessment of their frontier risks. Drawing on the E-T-C analysis (deployment environment, threat source, enabling capability) from the Frontier AI Risk Management Framework (v1.0) (SafeWork-F1-Framework), we identify critical risks in seven areas:…

    Submitted 22 July, 2025; originally announced July 2025.

    Comments: 97 pages, 37 figures

  11. arXiv:2507.16518  [pdf, ps, other]

    cs.CV cs.CL cs.LG

    C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning

    Authors: Xiuwei Chen, Wentao Hu, Hanhui Li, Jun Zhou, Zisheng Chen, Meng Cao, Yihan Zeng, Kui Zhang, Yu-Jie Yuan, Jianhua Han, Hang Xu, Xiaodan Liang

    Abstract: Recent advances in multimodal large language models (MLLMs) have shown impressive reasoning capabilities. However, further enhancing existing MLLMs necessitates high-quality vision-language datasets with carefully curated task complexities, which are both costly and challenging to scale. Although recent self-improving models that iteratively refine themselves offer a feasible solution, they still…

    Submitted 22 July, 2025; originally announced July 2025.

  12. arXiv:2507.16271  [pdf, ps, other]

    cs.CL

    Beyond Isolated Dots: Benchmarking Structured Table Construction as Deep Knowledge Extraction

    Authors: Tianyun Zhong, Guozhao Mo, Yanjiang Liu, Yihan Chen, Lingdi Kong, Xuanang Chen, Yaojie Lu, Hongyu Lin, Ben He, Le Sun

    Abstract: With the emergence of large language models (LLMs), there is an expectation that LLMs can effectively extract explicit information from complex real-world documents (e.g., papers, reports). However, most LLMs generate paragraph-style answers that are chaotic, disorganized, and untraceable. To bridge this gap, we introduce the Arranged and Organized Extraction Benchmark (AOE), a new bilingual bench…

    Submitted 22 July, 2025; originally announced July 2025.

  13. arXiv:2507.16158  [pdf, ps, other]

    cs.CV

    AMMNet: An Asymmetric Multi-Modal Network for Remote Sensing Semantic Segmentation

    Authors: Hui Ye, Haodong Chen, Zeke Zexi Hu, Xiaoming Chen, Yuk Ying Chung

    Abstract: Semantic segmentation in remote sensing (RS) has advanced significantly with the incorporation of multi-modal data, particularly the integration of RGB imagery and the Digital Surface Model (DSM), which provides complementary contextual and structural information about the ground object. However, integrating RGB and DSM often faces two major limitations: increased computational complexity due to a…

    Submitted 21 July, 2025; originally announced July 2025.

  14. arXiv:2507.15428  [pdf, ps, other]

    cs.CV cs.AI

    EgoPrune: Efficient Token Pruning for Egomotion Video Reasoning in Embodied Agent

    Authors: Jiaao Li, Kaiyuan Li, Chen Gao, Yong Li, Xinlei Chen

    Abstract: Egomotion videos are first-person recordings where the view changes continuously due to the agent's movement. As they serve as the primary visual input for embodied AI agents, making egomotion video reasoning more efficient is therefore essential for real-world deployment. Recent advances in vision-language models have enabled strong multimodal reasoning capabilities, but their computational cost…

    Submitted 21 July, 2025; originally announced July 2025.

  15. arXiv:2507.15301  [pdf, ps, other]

    cs.IT

    A Novel Two-Dimensional Smoothing Algorithm

    Authors: Xufeng Chen, Liang Yan, Xiaoshan Gao

    Abstract: Smoothing and filtering two-dimensional sequences are fundamental tasks in fields such as computer vision. Conventional filtering algorithms often rely on the selection of the filtering window, limiting their applicability in certain scenarios. To this end, we propose a novel Two-Dimensional Smoothing (TDS) algorithm for the smoothing and filtering problem of two-dimensional sequences. Typically,…

    Submitted 21 July, 2025; originally announced July 2025.

  16. arXiv:2507.15223  [pdf, ps, other]

    cs.CV

    Hierarchical Part-based Generative Model for Realistic 3D Blood Vessel

    Authors: Siqi Chen, Guoqing Zhang, Jiahao Lai, Bingzhi Shen, Sihong Zhang, Caixia Dong, Xuejin Chen, Yang Li

    Abstract: Advancements in 3D vision have increased the impact of blood vessel modeling on medical applications. However, accurately representing the complex geometry and topology of blood vessels remains a challenge due to their intricate branching patterns, curvatures, and irregular shapes. In this study, we propose a hierarchical part-based framework for 3D vessel generation that separates the global bin…

    Submitted 20 July, 2025; originally announced July 2025.

  17. arXiv:2507.14902  [pdf, ps, other]

    cs.IR cs.CV

    U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs

    Authors: Xiaojie Li, Chu Li, Shi-Zhe Chen, Xi Chen

    Abstract: Universal multimodal retrieval (UMR), which aims to address complex retrieval tasks where both queries and candidates span diverse modalities, has been significantly advanced by the emergence of MLLMs. While state-of-the-art MLLM-based methods in the literature predominantly adopt contrastive learning principles, they often differ in their specific training recipes. Despite their success, the mech…

    Submitted 20 July, 2025; originally announced July 2025.

    Comments: Technical Report (in progress)

  18. arXiv:2507.14855  [pdf, ps, other]

    cs.CV

    An Uncertainty-aware DETR Enhancement Framework for Object Detection

    Authors: Xingshu Chen, Sicheng Yu, Chong Cheng, Hao Wang, Ting Tian

    Abstract: This paper investigates the problem of object detection with a focus on improving both the localization accuracy of bounding boxes and explicitly modeling prediction uncertainty. Conventional detectors rely on deterministic bounding box regression, ignoring uncertainty in predictions and limiting model robustness. In this paper, we propose an uncertainty-aware enhancement framework for DETR-based…

    Submitted 20 July, 2025; originally announced July 2025.

  19. arXiv:2507.14801  [pdf, ps, other]

    cs.CV

    Exploring Scalable Unified Modeling for General Low-Level Vision

    Authors: Xiangyu Chen, Kaiwen Zhu, Yuandong Pu, Shuo Cao, Xiaohui Li, Wenlong Zhang, Yihao Liu, Yu Qiao, Jiantao Zhou, Chao Dong

    Abstract: Low-level vision involves a wide spectrum of tasks, including image restoration, enhancement, stylization, and feature extraction, which differ significantly in both task formulation and output domains. To address the challenge of unified modeling across such diverse tasks, we propose a Visual task Prompt-based Image Processing (VPIP) framework that leverages input-target image pairs as visual pro…

    Submitted 19 July, 2025; originally announced July 2025.

  20. arXiv:2507.14520  [pdf, ps, other]

    cs.AI

    What if Othello-Playing Language Models Could See?

    Authors: Xinyi Chen, Yifei Yuan, Jiaang Li, Serge Belongie, Maarten de Rijke, Anders Søgaard

    Abstract: Language models are often said to face a symbol grounding problem. While some argue that world understanding can emerge from text alone, others suggest grounded learning is more efficient. We explore this through Othello, where the board state defines a simplified, rule-based world. Building on prior work, we introduce VISOTHELLO, a multi-modal model trained on move histories and board images. Usi…

    Submitted 19 July, 2025; originally announced July 2025.

    Comments: ICML 2025 Assessing World Models Workshop

  21. arXiv:2507.14447  [pdf, ps, other]

    cs.AI cs.CL

    Routine: A Structural Planning Framework for LLM Agent System in Enterprise

    Authors: Guancheng Zeng, Xueyi Chen, Jiawang Hu, Shaohua Qi, Yaxuan Mao, Zhantao Wang, Yifan Nie, Shuang Li, Qiuyang Feng, Pengxu Qiu, Yujia Wang, Wenqiang Han, Linyan Huang, Gang Li, Jingjing Mo, Haowen Hu

    Abstract: The deployment of agent systems in an enterprise environment is often hindered by several challenges: common models lack domain-specific process knowledge, leading to disorganized plans, missing key tools, and poor execution stability. To address this, this paper introduces Routine, a multi-step agent planning framework designed with a clear structure, explicit instructions, and seamless parameter…

    Submitted 22 July, 2025; v1 submitted 18 July, 2025; originally announced July 2025.

    Comments: 26 pages, 8 figures, 5 tables

  22. arXiv:2507.13681  [pdf, ps, other]

    cs.CL cs.AI

    LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues

    Authors: Haoyang Li, Zhanchao Xu, Yiming Li, Xuejia Chen, Darian Li, Anxin Tian, Qingfa Xiao, Cheng Deng, Jun Wang, Qing Li, Lei Chen, Mingxuan Yuan

    Abstract: Multi-turn dialogues are essential in many real-world applications of large language models, such as chatbots and virtual assistants. As conversation histories become longer, existing large language models face increasing computational and memory challenges, which hinder their ability to provide efficient and responsive interactions. Most current acceleration methods either compress the context or…

    Submitted 18 July, 2025; originally announced July 2025.

  23. arXiv:2507.13095  [pdf, ps, other]

    cs.SE

    A Conceptual Framework for Requirements Engineering of Pretrained-Model-Enabled Systems

    Authors: Dongming Jin, Zhi Jin, Linyu Li, Xiaohong Chen

    Abstract: Recent advances in large pretrained models have led to their widespread integration as core components in modern software systems. The trend is expected to continue in the foreseeable future. Unlike traditional software systems governed by deterministic logic, systems powered by pretrained models exhibit distinctive and emergent characteristics, such as ambiguous capability boundaries, context-dep…

    Submitted 17 July, 2025; originally announced July 2025.

    Comments: 5 pages, 1 figure

  24. arXiv:2507.12930  [pdf, ps, other]

    cs.CL cs.AI

    Making Language Model a Hierarchical Classifier and Generator

    Authors: Yihong Wang, Zhonglin Jiang, Ningyuan Xi, Yue Zhao, Qingqing Gu, Xiyuan Chen, Hao Wu, Sheng Xu, Hange Zhou, Yong Chen, Luo Ji

    Abstract: Decoder-only language models, such as GPT and LLaMA, generally decode on the last layer. Motivated by humans' hierarchical thinking capability, we propose that a hierarchical decoder architecture could be built with different layers decoding texts simultaneously. Due to limited time and computational resources, we choose to adapt a pretrained language model into this form of hierarchical decoder…

    Submitted 17 July, 2025; originally announced July 2025.

  25. arXiv:2507.12704  [pdf, ps, other]

    cs.LG cs.IR

    PinFM: Foundation Model for User Activity Sequences at a Billion-scale Visual Discovery Platform

    Authors: Xiangyi Chen, Kousik Rajesh, Matthew Lawhon, Zelun Wang, Hanyu Li, Haomiao Li, Saurabh Vishwas Joshi, Pong Eksombatchai, Jaewon Yang, Yi-Ping Hsu, Jiajing Xu, Charles Rosenberg

    Abstract: User activity sequences have emerged as one of the most important signals in recommender systems. We present a foundational model, PinFM, for understanding user activity sequences across multiple applications at a billion-scale visual discovery platform. We pretrain a transformer model with 20B+ parameters using extensive user activity data, then fine-tune it for specific applications, efficiently…

    Submitted 16 July, 2025; originally announced July 2025.

    Comments: RecSys 2025

  26. arXiv:2507.12470  [pdf, other]

    cs.DS

    DNA Probe Computing System for Solving NP-Complete Problems

    Authors: Jin Xu, XiaoLong Shi, Xin Chen, Fang Wang, Sirui Li, Pali Ye, Boliang Zhang, Di Deng, Zheng Kou, Xiaoli Qiang

    Abstract: Efficiently solving NP-complete problems, such as protein structure prediction, cryptographic decryption, and vulnerability detection, remains a central challenge in computer science. Traditional electronic computers, constrained by the Turing machine's one-dimensional data processing and sequential operations, struggle to address these issues effectively. To overcome this bottleneck, computational…

    Submitted 20 April, 2025; originally announced July 2025.

    Comments: 11 pages, 4 figures

  27. arXiv:2507.12194  [pdf, ps, other]

    cs.RO

    UniLGL: Learning Uniform Place Recognition for FOV-limited/Panoramic LiDAR Global Localization

    Authors: Hongming Shen, Xun Chen, Yulin Hui, Zhenyu Wu, Wei Wang, Qiyang Lyu, Tianchen Deng, Danwei Wang

    Abstract: Existing LGL methods typically consider only partial information (e.g., geometric features) from LiDAR observations or are designed for homogeneous LiDAR sensors, overlooking the uniformity in LGL. In this work, a uniform LGL method is proposed, termed UniLGL, which simultaneously achieves spatial and material uniformity, as well as sensor-type uniformity. The key idea of the proposed method is to…

    Submitted 16 July, 2025; originally announced July 2025.

  28. arXiv:2507.12168  [pdf, ps, other]

    cs.GR

    Shape Adaptation for 3D Hairstyle Retargeting

    Authors: Lu Yu, Zhong Ren, Youyi Zheng, Xiang Chen, Kun Zhou

    Abstract: It is demanding to author an existing hairstyle for novel characters in games and VR applications. However, it is a non-trivial task for artists due to the complicated hair geometries and spatial interactions to preserve. In this paper, we present an automatic shape adaptation method to retarget 3D hairstyles. We formulate the adaptation process as a constrained optimization problem, where all the…

    Submitted 18 July, 2025; v1 submitted 16 July, 2025; originally announced July 2025.

  29. arXiv:2507.12092  [pdf, ps, other]

    eess.IV cs.CV

    Benchmarking and Explaining Deep Learning Cortical Lesion MRI Segmentation in Multiple Sclerosis

    Authors: Nataliia Molchanova, Alessandro Cagol, Mario Ocampo-Pineda, Po-Jui Lu, Matthias Weigel, Xinjie Chen, Erin Beck, Charidimos Tsagkas, Daniel Reich, Colin Vanden Bulcke, Anna Stolting, Serena Borrelli, Pietro Maggi, Adrien Depeursinge, Cristina Granziera, Henning Mueller, Pedro M. Gordaliza, Meritxell Bach Cuadra

    Abstract: Cortical lesions (CLs) have emerged as valuable biomarkers in multiple sclerosis (MS), offering high diagnostic specificity and prognostic relevance. However, their routine clinical integration remains limited due to subtle magnetic resonance imaging (MRI) appearance, challenges in expert annotation, and a lack of standardized automated methods. We propose a comprehensive multi-centric benchmark o…

    Submitted 16 July, 2025; originally announced July 2025.

  30. arXiv:2507.12083  [pdf, ps, other]

    cs.CV cs.RO

    Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics

    Authors: Muleilan Pei, Shaoshuai Shi, Xuesong Chen, Xu Liu, Shaojie Shen

    Abstract: Motion forecasting for on-road traffic agents presents both a significant challenge and a critical necessity for ensuring safety in autonomous driving systems. In contrast to most existing data-driven approaches that directly predict future trajectories, we rethink this task from a planning perspective, advocating a "First Reasoning, Then Forecasting" strategy that explicitly incorporates behavior…

    Submitted 16 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025

  31. arXiv:2507.11988  [pdf, ps, other]

    cs.AI

    Aime: Towards Fully-Autonomous Multi-Agent Framework

    Authors: Yexuan Shi, Mingyu Wang, Yunxiang Cao, Hongjie Lai, Junjian Lan, Xin Han, Yu Wang, Jie Geng, Zhenan Li, Zihao Xia, Xiang Chen, Chen Li, Jian Xu, Wenbo Duan, Yuanshuo Zhu

    Abstract: Multi-Agent Systems (MAS) powered by Large Language Models (LLMs) are emerging as a powerful paradigm for solving complex, multifaceted problems. However, the potential of these systems is often constrained by the prevalent plan-and-execute framework, which suffers from critical limitations: rigid plan execution, static agent capabilities, and inefficient communication. These weaknesses hinder the…

    Submitted 16 July, 2025; v1 submitted 16 July, 2025; originally announced July 2025.

    Comments: 14 pages, 1 figure

  32. arXiv:2507.11911  [pdf, ps, other]

    cs.HC cs.IR cs.LG

    AFPM: Alignment-based Frame Patch Modeling for Cross-Dataset EEG Decoding

    Authors: Xiaoqing Chen, Siyang Li, Dongrui Wu

    Abstract: Electroencephalogram (EEG) decoding models for brain-computer interfaces (BCIs) struggle with cross-dataset learning and generalization due to channel layout inconsistencies, non-stationary signal distributions, and limited neurophysiological prior integration. To address these issues, we propose a plug-and-play Alignment-Based Frame-Patch Modeling (AFPM) framework, which has two main components:…

    Submitted 16 July, 2025; originally announced July 2025.

  33. arXiv:2507.11875  [pdf, ps, other]

    cs.CL

    DualReward: A Dynamic Reinforcement Learning Framework for Cloze Tests Distractor Generation

    Authors: Tianyou Huang, Xinglu Chen, Jingshen Zhang, Xinying Qiu, Ruiying Niu

    Abstract: This paper introduces DualReward, a novel reinforcement learning framework for automatic distractor generation in cloze tests. Unlike conventional approaches that rely primarily on supervised learning or static generative models, our method employs a dual reward structure with adaptive scaling that differentiates between human-created gold standard distractors and model-generated candidates. The f…

    Submitted 15 July, 2025; originally announced July 2025.

    Comments: Accepted to CCL 2025

  34. arXiv:2507.11839  [pdf, ps, other]

    cs.LG q-bio.QM

    Protenix-Mini: Efficient Structure Predictor via Compact Architecture, Few-Step Diffusion and Switchable pLM

    Authors: Chengyue Gong, Xinshi Chen, Yuxuan Zhang, Yuxuan Song, Hao Zhou, Wenzhi Xiao

    Abstract: Lightweight inference is critical for biomolecular structure prediction and other downstream tasks, enabling efficient real-world deployment and inference-time scaling for large-scale applications. In this work, we address the challenge of balancing model efficiency and prediction accuracy by making several key modifications, 1) Multi-step AF3 sampler is replaced by a few-step ODE sampler, signifi…

    Submitted 15 July, 2025; originally announced July 2025.

  35. arXiv:2507.11099  [pdf, ps, other]

    cs.CV

    A Survey on Interpretability in Visual Recognition

    Authors: Qiyang Wan, Chengzhi Gao, Ruiping Wang, Xilin Chen

    Abstract: In recent years, visual recognition methods have advanced significantly, finding applications across diverse fields. While researchers seek to understand the mechanisms behind the success of these models, there is also a growing impetus to deploy them in critical areas like autonomous driving and medical diagnostics to better diagnose failures, which promotes the development of interpretability re…

    Submitted 15 July, 2025; originally announced July 2025.

    Comments: 20 pages, 7 figures, 2 tables. Under review

  36. arXiv:2507.10934  [pdf, ps, other]

    cs.DB cs.LG

    Towards Practical Benchmarking of Data Cleaning Techniques: On Generating Authentic Errors via Large Language Models

    Authors: Xinyuan Liu, Jiahui Chen, Bocheng Hu, Yu Sun, Xinyang Chen, Shaoxu Song

    Abstract: Data quality remains an important challenge in data-driven systems, as errors in tabular data can severely compromise downstream analytics and machine learning performance. Although numerous error detection algorithms have been proposed, the lack of diverse, real-world error datasets limits comprehensive evaluation. Manual error annotation is both time-consuming and inconsistent, motivating the ex…

    Submitted 14 July, 2025; originally announced July 2025.

  37. arXiv:2507.09955  [pdf, ps, other]

    cs.AI

    DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models

    Authors: Luolin Xiong, Haofen Wang, Xi Chen, Lu Sheng, Yun Xiong, Jingping Liu, Yanghua Xiao, Huajun Chen, Qing-Long Han, Yang Tang

    Abstract: DeepSeek, a Chinese Artificial Intelligence (AI) startup, has released their V3 and R1 series models, which attracted global attention due to their low cost, high performance, and open-source advantages. This paper begins by reviewing the evolution of large AI models focusing on paradigm shifts, the mainstream Large Language Model (LLM) paradigm, and the DeepSeek paradigm. Subsequently, the paper…

    Submitted 14 July, 2025; originally announced July 2025.

  38. arXiv:2507.09469  [pdf, ps, other]

    cs.RO

    mmE-Loc: Facilitating Accurate Drone Landing with Ultra-High-Frequency Localization

    Authors: Haoyang Wang, Jingao Xu, Xinyu Luo, Ting Zhang, Xuecheng Chen, Ruiyang Duan, Jialong Chen, Yunhao Liu, Jianfeng Zheng, Weijie Hong, Xinlei Chen

    Abstract: For precise, efficient, and safe drone landings, ground platforms should accurately locate descending drones in real time and guide them to designated spots. While mmWave sensing combined with cameras improves localization accuracy, the lower sampling frequency of traditional frame cameras compared to mmWave radar creates bottlenecks in system throughput. In this work, we upgrade traditional frame camer…

    Submitted 14 July, 2025; v1 submitted 12 July, 2025; originally announced July 2025.

    Comments: 17 pages, 34 figures. Journal extended version of arXiv:2502.14992

  39. arXiv:2507.09184  [pdf, ps, other]

    cs.CV

    MCA-LLaVA: Manhattan Causal Attention for Reducing Hallucination in Large Vision-Language Models

    Authors: Qiyan Zhao, Xiaofeng Zhang, Yiheng Li, Yun Xing, Xiaosong Yuan, Feilong Tang, Sinan Fan, Xuhang Chen, Xuyao Zhang, Dahan Wang

    Abstract: Hallucinations pose a significant challenge in Large Vision Language Models (LVLMs), with misalignment between multimodal features identified as a key contributing factor. This paper reveals the negative impact of the long-term decay in Rotary Position Encoding (RoPE), used for positional modeling in LVLMs, on multimodal alignment. Concretely, under long-term decay, instruction tokens exhibit unev…

    Submitted 22 July, 2025; v1 submitted 12 July, 2025; originally announced July 2025.

    Comments: Accepted in ACM MM 2025

  40. arXiv:2507.09068  [pdf, ps, other]

    cs.CV cs.AI cs.IR cs.LG cs.MM

    Infinite Video Understanding

    Authors: Dell Zhang, Xiangyu Chen, Jixiang Luo, Mengxi Jia, Changzhi Sun, Ruilong Ren, Jingren Liu, Hao Sun, Xuelong Li

    Abstract: The rapid advancements in Large Language Models (LLMs) and their multimodal extensions (MLLMs) have ushered in remarkable progress in video understanding. However, a fundamental challenge persists: effectively processing and comprehending video content that extends beyond minutes or hours. While recent efforts like Video-XL-2 have demonstrated novel architectural solutions for extreme efficiency,…

    Submitted 23 July, 2025; v1 submitted 11 July, 2025; originally announced July 2025.

  41. arXiv:2507.09009  [pdf]

    cs.LG cs.AI

    Multimodal Cardiovascular Risk Profiling Using Self-Supervised Learning of Polysomnography

    Authors: Zhengxiao He, Huayu Li, Geng Yuan, William D. S. Killgore, Stuart F. Quan, Chen X. Chen, Ao Li

    Abstract: Methods: We developed a self-supervised deep learning model that extracts meaningful patterns from multi-modal signals (Electroencephalography (EEG), Electrocardiography (ECG), and respiratory signals). The model was trained on data from 4,398 participants. Projection scores were derived by contrasting embeddings from individuals with and without CVD outcomes. External validation was conducted in…

    Submitted 11 July, 2025; originally announced July 2025.

  42. arXiv:2507.08885  [pdf, ps, other]

    cs.RO cs.AI

    AirScape: An Aerial Generative World Model with Motion Controllability

    Authors: Baining Zhao, Rongze Tang, Mingyuan Jia, Ziyou Wang, Fanghang Man, Xin Zhang, Yu Shang, Weichen Zhang, Chen Gao, Wei Wu, Xin Wang, Xinlei Chen, Yong Li

    Abstract: How to enable robots to predict the outcomes of their own motion intentions in three-dimensional space has been a fundamental problem in embodied intelligence. To explore more general spatial imagination capabilities, here we present AirScape, the first world model designed for six-degree-of-freedom aerial agents. AirScape predicts future observation sequences based on current visual inputs and mo…

    Submitted 10 July, 2025; originally announced July 2025.

  43. arXiv:2507.08854  [pdf, ps, other]

    physics.chem-ph cs.LG

    DiffNMR: Diffusion Models for Nuclear Magnetic Resonance Spectra Elucidation

    Authors: Qingsong Yang, Binglan Wu, Xuwei Liu, Bo Chen, Wei Li, Gen Long, Xin Chen, Mingjun Xiao

    Abstract: Nuclear Magnetic Resonance (NMR) spectroscopy is a central characterization method for molecular structure elucidation, yet interpreting NMR spectra to deduce molecular structures remains challenging due to the complexity of spectral data and the vastness of the chemical space. In this work, we introduce DiffNMR, a novel end-to-end framework that leverages a conditional discrete diffusion model fo…

    Submitted 9 July, 2025; originally announced July 2025.

  44. arXiv:2507.08772  [pdf, ps, other]

    cs.CV

    From One to More: Contextual Part Latents for 3D Generation

    Authors: Shaocong Dong, Lihe Ding, Xiao Chen, Yaokun Li, Yuxin Wang, Yucheng Wang, Qi Wang, Jaehyeok Kim, Chenjian Gao, Zhanpeng Huang, Zibin Wang, Tianfan Xue, Dan Xu

    Abstract: Recent advances in 3D generation have transitioned from multi-view 2D rendering approaches to 3D-native latent diffusion frameworks that exploit geometric priors in ground truth data. Despite progress, three key limitations persist: (1) Single-latent representations fail to capture complex multi-part geometries, causing detail degradation; (2) Holistic latent coding neglects part independence and…

    Submitted 11 July, 2025; originally announced July 2025.

    Comments: Project page: https://hkdsc.github.io/project/copart

  45. arXiv:2507.08297  [pdf, ps, other]

    cs.CL

    KAT-V1: Kwai-AutoThink Technical Report

    Authors: Zizheng Zhan, Ken Deng, Huaixi Tang, Wen Xiang, Kun Wu, Weihao Li, Wenqiang Zhu, Jingxuan Xu, Lecheng Huang, Zongxian Feng, Shaojie Wang, Shangpeng Yan, Xuxing Chen, Jiaheng Liu, Zhongyuan Peng, Zuchen Gao, Haoyang Huang, Xiaojiang Zhang, Jinghui Wang, Zheng Lin, Mengtong Li, Huiming Wang, Ziqi Zhan, Yanan Wu, Yuanxing Zhang, et al. (5 additional authors not shown)

    Abstract: We present Kwaipilot-AutoThink (KAT), an open-source 40B large language model developed to address the overthinking problem in reasoning-intensive tasks, where an automatic thinking training paradigm is proposed to dynamically switch between reasoning and non-reasoning modes based on task complexity. Specifically, first, we construct the dual-regime dataset based on a novel tagging pipeline and a…

    Submitted 21 July, 2025; v1 submitted 11 July, 2025; originally announced July 2025.

  46. arXiv:2507.08182  [pdf, ps, other]

    cs.LG

    CTRLS: Chain-of-Thought Reasoning via Latent State-Transition

    Authors: Junda Wu, Yuxin Xiong, Xintong Li, Zhengmian Hu, Tong Yu, Rui Wang, Xiang Chen, Jingbo Shang, Julian McAuley

    Abstract: Chain-of-thought (CoT) reasoning enables large language models (LLMs) to break down complex problems into interpretable intermediate steps, significantly enhancing model transparency and performance in reasoning tasks. However, conventional CoT methods rely on heuristic sampling without structured modeling of reasoning transitions, constraining their ability to systematically explore and discover…

    Submitted 10 July, 2025; originally announced July 2025.

    Comments: 10 pages

  47. arXiv:2507.08178  [pdf, ps, other]

    eess.IV cs.CV

    Cracking Instance Jigsaw Puzzles: An Alternative to Multiple Instance Learning for Whole Slide Image Analysis

    Authors: Xiwen Chen, Peijie Qiu, Wenhui Zhu, Hao Wang, Huayu Li, Xuanzhao Dong, Xiaotong Sun, Xiaobing Yu, Yalin Wang, Abolfazl Razi, Aristeidis Sotiras

    Abstract: While multiple instance learning (MIL) has shown to be a promising approach for histopathological whole slide image (WSI) analysis, its reliance on permutation invariance significantly limits its capacity to effectively uncover semantic correlations between instances within WSIs. Based on our empirical and theoretical investigations, we argue that approaches that are not permutation-invariant but…

    Submitted 10 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025

  48. arXiv:2507.07957  [pdf, ps, other]

    cs.CL cs.AI

    MIRIX: Multi-Agent Memory System for LLM-Based Agents

    Authors: Yu Wang, Xi Chen

    Abstract: Although memory capabilities of AI agents are gaining increasing attention, existing solutions remain fundamentally limited. Most rely on flat, narrowly scoped memory components, constraining their ability to personalize, abstract, and reliably recall user-specific information over time. To this end, we introduce MIRIX, a modular, multi-agent memory system that redefines the future of AI memory by…

    Submitted 10 July, 2025; originally announced July 2025.

  49. arXiv:2507.07317  [pdf, ps, other]

    cs.CV

    ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation

    Authors: Sherry X. Chen, Yi Wei, Luowei Zhou, Suren Kumar

    Abstract: Recent advances in instruction-guided image editing underscore the need for effective automated evaluation. While Vision-Language Models (VLMs) have been explored as judges, open-source models struggle with alignment, and proprietary models lack transparency and cost efficiency. Additionally, no public training datasets exist to fine-tune open-source VLMs, only small benchmarks with diverse evalua…

    Submitted 9 July, 2025; originally announced July 2025.

    Comments: International Conference on Computer Vision (ICCV) 2025

  50. arXiv:2507.07131  [pdf]

    eess.IV cs.CV q-bio.TO

    Wrist bone segmentation in X-ray images using CT-based simulations

    Authors: Youssef ElTantawy, Alexia Karantana, Xin Chen

    Abstract: Plain X-ray is one of the most common image modalities for clinical diagnosis (e.g. bone fracture, pneumonia, cancer screening, etc.). X-ray image segmentation is an essential step for many computer-aided diagnostic systems, yet it remains challenging. Deep-learning-based methods have achieved superior performance in medical image segmentation tasks but often require a large amount of high-quality…

    Submitted 8 July, 2025; originally announced July 2025.

    Comments: 4 pages