Showing 1–50 of 3,166 results for author: Chen, C

Searching in archive cs.
  1. arXiv:2507.17479  [pdf, ps, other]

    cs.CV cs.LG

    SRMambaV2: Biomimetic Attention for Sparse Point Cloud Upsampling in Autonomous Driving

    Authors: Chuang Chen, Xiaolin Qin, Jing Hu, Wenyi Ge

    Abstract: Upsampling LiDAR point clouds in autonomous driving scenarios remains a significant challenge due to the inherent sparsity and complex 3D structures of the data. Recent studies have attempted to address this problem by converting the complex 3D spatial scenes into 2D image super-resolution tasks. However, due to the sparse and blurry feature representation of range images, accurately reconstructin…

    Submitted 23 July, 2025; originally announced July 2025.

  2. arXiv:2507.17131  [pdf, ps, other]

    cs.LG cs.AI

    Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance

    Authors: Yufei He, Ruoyu Li, Alex Chen, Yue Liu, Yulin Chen, Yuan Sui, Cheng Chen, Yi Zhu, Luca Luo, Frank Yang, Bryan Hooi

    Abstract: Large language model (LLM) agents often struggle in environments where rules and required domain knowledge frequently change, such as regulatory compliance and user risk screening. Current approaches, like offline fine-tuning and standard prompting, are insufficient because they cannot effectively adapt to new knowledge during actual operation. To address this limitation, we propose the Adaptive R…

    Submitted 22 July, 2025; originally announced July 2025.

  3. arXiv:2507.16713  [pdf, ps, other]

    cs.RO cs.AI cs.CL

    Experience is the Best Teacher: Grounding VLMs for Robotics through Self-Generated Memory

    Authors: Guowei Lan, Kaixian Qu, René Zurbrügg, Changan Chen, Christopher E. Mower, Haitham Bou-Ammar, Marco Hutter

    Abstract: Vision-language models (VLMs) have been widely adopted in robotics to enable autonomous planning. However, grounding VLMs, originally trained on internet data, to diverse real-world robots remains a challenge. This paper presents ExpTeach, a framework that grounds VLMs to physical robots by building a self-generated memory of real-world experiences. In ExpTeach, the VLM autonomously plans actions,…

    Submitted 22 July, 2025; originally announced July 2025.

  4. arXiv:2507.15502  [pdf, ps, other]

    cs.HC

    FollowUpBot: An LLM-Based Conversational Robot for Automatic Postoperative Follow-up

    Authors: Chen Chen, Jianing Yin, Jiannong Cao, Zhiyuan Wen, Mingjin Zhang, Weixun Gao, Xiang Wang, Haihua Shu

    Abstract: Postoperative follow-up plays a crucial role in monitoring recovery and identifying complications. However, traditional approaches, typically involving bedside interviews and manual documentation, are time-consuming and labor-intensive. Although existing digital solutions, such as web questionnaires and intelligent automated calls, can alleviate the workload of nurses to a certain extent, they eit…

    Submitted 21 July, 2025; originally announced July 2025.

  5. A Steel Surface Defect Detection Method Based on Lightweight Convolution Optimization

    Authors: Cong Chen, Ming Chen, Hoileong Lee, Yan Li, Jiyang Yu

    Abstract: Surface defect detection of steel, especially the recognition of multi-scale defects, has always been a major challenge in industrial manufacturing. Steel surfaces not only have defects of various sizes and shapes, which limit the accuracy of traditional image processing and detection methods in complex environments. However, traditional defect detection methods face issues of insufficient accurac…

    Submitted 21 July, 2025; originally announced July 2025.

    Journal ref: International Journal of Advanced Computer Science and Applications (IJACSA), 16(6), 2025

  6. arXiv:2507.15364  [pdf, ps, other]

    eess.SP cs.AI cs.LG

    EEG-based Epileptic Prediction via a Two-stage Channel-aware Set Transformer Network

    Authors: Ruifeng Zheng, Cong Chen, Shuang Wang, Yiming Liu, Lin You, Jindong Lu, Ruizhe Zhu, Guodao Zhang, Kejie Huang

    Abstract: Epilepsy is a chronic, noncommunicable brain disorder, and sudden seizure onsets can significantly impact patients' quality of life and health. However, wearable seizure-predicting devices are still limited, partly due to the bulky size of EEG-collecting devices. To relieve the problem, we proposed a novel two-stage channel-aware Set Transformer Network that could perform seizure prediction with f…

    Submitted 21 July, 2025; originally announced July 2025.

  7. arXiv:2507.14677  [pdf, ps, other]

    cs.LG

    Revisiting Graph Contrastive Learning on Anomaly Detection: A Structural Imbalance Perspective

    Authors: Yiming Xu, Zhen Peng, Bin Shi, Xu Hua, Bo Dong, Song Wang, Chen Chen

    Abstract: The superiority of graph contrastive learning (GCL) has prompted its application to anomaly detection tasks for more powerful risk warning systems. Unfortunately, existing GCL-based models tend to excessively prioritize overall detection performance while neglecting robustness to structural imbalance, which can be problematic for many real-world networks following power-law degree distributions. P…

    Submitted 19 July, 2025; originally announced July 2025.

    Comments: Accepted by AAAI2025

  8. arXiv:2507.14497  [pdf, ps, other]

    cs.CV cs.CL

    Efficient Whole Slide Pathology VQA via Token Compression

    Authors: Weimin Lyu, Qingqiao Hu, Kehan Qi, Zhan Shi, Wentao Huang, Saumya Gupta, Chao Chen

    Abstract: Whole-slide images (WSIs) in pathology can reach up to 10,000 x 10,000 pixels, posing significant challenges for multimodal large language models (MLLMs) due to long context length and high computational demands. Previous methods typically focus on patch-level analysis or slide-level classification using CLIP-based models with multi-instance learning, but they lack the generative capabilities needed…

    Submitted 19 July, 2025; originally announced July 2025.

  9. arXiv:2507.14467  [pdf, ps, other]

    math.DS cs.LG

    Learning Stochastic Hamiltonian Systems via Stochastic Generating Function Neural Network

    Authors: Chen Chen, Lijin Wang, Yanzhao Cao, Xupeng Cheng

    Abstract: In this paper, we propose a novel neural network model for learning stochastic Hamiltonian systems (SHSs) from observational data, termed the stochastic generating function neural network (SGFNN). SGFNN preserves the symplectic structure of the underlying stochastic Hamiltonian system and produces symplectic predictions. Our model utilizes the autoencoder framework to identify the randomness of the lat…

    Submitted 18 July, 2025; originally announced July 2025.

  10. arXiv:2507.14430  [pdf, ps, other]

    cs.CL

    X-Intelligence 3.0: Training and Evaluating Reasoning LLM for Semiconductor Display

    Authors: Xiaolin Yan, Yangxing Liu, Jiazhang Zheng, Chi Liu, Mingyu Du, Caisheng Chen, Haoyang Liu, Ming Ding, Yuan Li, Qiuping Liao, Linfeng Li, Zhili Mei, Siyu Wan, Li Li, Ruyi Zhong, Jiangling Yu, Xule Liu, Huihui Hu, Jiameng Yue, Ruohui Cheng, Qi Yang, Liangqing Wu, Ke Zhu, Chi Zhang, Chufei Jing , et al. (31 additional authors not shown)

    Abstract: Large language models (LLMs) have recently achieved significant advances in reasoning and demonstrated their advantages in solving challenging problems. Yet, their effectiveness in the semiconductor display industry remains limited due to a lack of domain-specific training and expertise. To bridge this gap, we present X-Intelligence 3.0, the first high-performance reasoning model specifically deve…

    Submitted 22 July, 2025; v1 submitted 18 July, 2025; originally announced July 2025.

    Comments: Technical Report

  11. arXiv:2507.13814  [pdf, ps, other]

    cs.MA

    CodeEdu: A Multi-Agent Collaborative Platform for Personalized Coding Education

    Authors: Jianing Zhao, Peng Gao, Jiannong Cao, Zhiyuan Wen, Chen Chen, Jianing Yin, Ruosong Yang, Bo Yuan

    Abstract: Large Language Models (LLMs) have demonstrated considerable potential in improving coding education by providing support for code writing, explanation, and debugging. However, existing LLM-based approaches generally fail to assess students' abilities, design learning plans, provide personalized material aligned with individual learning goals, and enable interactive learning. Current work mostly us…

    Submitted 18 July, 2025; originally announced July 2025.

    Comments: 4 pages, 4 figures. Demo video available at: https://youtu.be/9iIVmTT4CVk

  12. arXiv:2507.13575  [pdf, ps, other]

    cs.LG cs.AI

    Apple Intelligence Foundation Language Models: Tech Report 2025

    Authors: Hanzhi Zhou, Erik Hornberger, Pengsheng Guo, Xiyou Zhou, Saiwen Wang, Xin Wang, Yifei He, Xuankai Chang, Rene Rauch, Louis D'hauwe, John Peebles, Alec Doane, Kohen Chia, Jenna Thibodeau, Zi-Yi Dou, Yuanyang Zhang, Ruoming Pang, Reed Li, Zhifeng Chen, Jeremy Warner, Zhaoyang Xu, Sophy Lee, David Mizrahi, Ramsey Tantawi, Chris Chaney , et al. (370 additional authors not shown)

    Abstract: We introduce two multilingual, multimodal foundation language models that power Apple Intelligence features across Apple devices and services: (i) a 3B-parameter on-device model optimized for Apple silicon through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training; and (ii) a scalable server model built on a novel Parallel-Track Mixture-of-Experts (PT-MoE) transform…

    Submitted 17 July, 2025; originally announced July 2025.

  13. arXiv:2507.12889  [pdf, ps, other]

    cs.CV

    Camera-based implicit mind reading by capturing higher-order semantic dynamics of human gaze within environmental context

    Authors: Mengke Song, Yuge Xie, Qi Cui, Luming Li, Xinyu Liu, Guotao Wang, Chenglizhao Chen, Shanchen Pang

    Abstract: Emotion recognition, as a step toward mind reading, seeks to infer internal states from external cues. Most existing methods rely on explicit signals, such as facial expressions, speech, or gestures, that reflect only bodily responses and overlook the influence of environmental context. These cues are often voluntary, easy to mask, and insufficient for capturing deeper, implicit emotions. Physiological signa…

    Submitted 17 July, 2025; originally announced July 2025.

  14. arXiv:2507.11848  [pdf, ps, other]

    cs.HC cs.AI q-bio.QM

    Interactive Hybrid Rice Breeding with Parametric Dual Projection

    Authors: Changjian Chen, Pengcheng Wang, Fei Lyu, Zhuo Tang, Li Yang, Long Wang, Yong Cai, Feng Yu, Kenli Li

    Abstract: Hybrid rice breeding crossbreeds different rice lines and cultivates the resulting hybrids in fields to select those with desirable agronomic traits, such as higher yields. Recently, genomic selection has emerged as an efficient way for hybrid rice breeding. It predicts the traits of hybrids based on their genes, which helps exclude many undesired hybrids, largely reducing the workload of field cu…

    Submitted 15 July, 2025; originally announced July 2025.

  15. arXiv:2507.11558  [pdf, ps, other]

    cs.CV cs.AI

    Reprogramming Vision Foundation Models for Spatio-Temporal Forecasting

    Authors: Changlu Chen, Yanbin Liu, Chaoxi Niu, Ling Chen, Tianqing Zhu

    Abstract: Foundation models have achieved remarkable success in natural language processing and computer vision, demonstrating strong capabilities in modeling complex patterns. While recent efforts have explored adapting large language models (LLMs) for time-series forecasting, LLMs primarily capture one-dimensional sequential dependencies and struggle to model the richer spatio-temporal (ST) correlations e…

    Submitted 14 July, 2025; originally announced July 2025.

  16. arXiv:2507.11152  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Latent Space Consistency for Sparse-View CT Reconstruction

    Authors: Duoyou Chen, Yunqing Chen, Can Zhang, Zhou Wang, Cheng Chen, Ruoxiu Xiao

    Abstract: Computed Tomography (CT) is a widely utilized imaging modality in clinical settings. Using densely acquired rotational X-ray arrays, CT can capture 3D spatial features. However, it is confronted with challenges such as significant time consumption and high radiation exposure. CT reconstruction methods based on sparse-view X-ray images have garnered substantial attention from researchers as they pr…

    Submitted 15 July, 2025; originally announced July 2025.

    Comments: ACMMM2025 Accepted

  17. arXiv:2507.11090  [pdf, ps, other]

    cs.SI

    Enhance Stability of Network by Edge Anchor

    Authors: Hongbo Qiu, Renjie Sun, Chen Chen, Xiaoyang Wang

    Abstract: With the rapid growth of online social networks, strengthening their stability has emerged as a key research focus. This study aims to identify influential relationships that significantly impact community stability. In this paper, we introduce and explore the anchor trussness reinforcement problem to reinforce the overall user engagement of networks by anchoring some edges. Specifically, for a gi…

    Submitted 15 July, 2025; originally announced July 2025.

  18. arXiv:2507.11075  [pdf]

    cs.CV cs.AI

    Joint angle model based learning to refine kinematic human pose estimation

    Authors: Chang Peng, Yifei Zhou, Huifeng Xi, Shiqing Huang, Chuangye Chen, Jianming Yang, Bao Yang, Zhenyu Jiang

    Abstract: Marker-free human pose estimation (HPE) has found increasing applications in various fields. Current HPE suffers from occasional errors in keypoint recognition and random fluctuation in keypoint trajectories when analyzing kinematic human poses. The performance of existing deep learning-based models for HPE refinement is considerably limited by inaccurate training datasets in which the keypoints a…

    Submitted 15 July, 2025; originally announced July 2025.

    ACM Class: I.4.9; I.5.4; J.3

  19. arXiv:2507.10928  [pdf, ps, other]

    cs.NI cs.DC

    Arcturus: A Cloud Overlay Network for Global Accelerator with Enhanced Performance and Stability

    Authors: Matthew Yang Liu, Chuang Chen, Pengcheng Lv, Hui Guo, Yanan Zhang, Cong Wang, Yusen Li, Zhenyu Li, Yu-Chu Tian

    Abstract: Global Accelerator (GA) services play a vital role in ensuring low-latency, high-reliability communication for real-time interactive applications. However, existing GA offerings are tightly bound to specific cloud providers, resulting in high costs, rigid deployment, and limited flexibility, especially for large-scale or budget-sensitive deployments. Arcturus is a cloud-native GA framework that re…

    Submitted 14 July, 2025; originally announced July 2025.

  20. arXiv:2507.10017  [pdf, ps, other]

    cs.DB

    Efficient Temporal Simple Path Graph Generation

    Authors: Zhiyang Tang, Yanping Wu, Xiangjun Zai, Chen Chen, Xiaoyang Wang, Ying Zhang

    Abstract: Interactions between two entities often occur at specific timestamps, which can be modeled as a temporal graph. Exploring the relationships between vertices based on temporal paths is one of the fundamental tasks. In this paper, we conduct the first research to propose and investigate the problem of generating the temporal simple path graph (tspG), which is the subgraph consisting of all temporal…

    Submitted 14 July, 2025; originally announced July 2025.

  21. arXiv:2507.09140  [pdf, ps, other]

    cs.GR

    Interactive Drawing Guidance for Anime Illustrations with Diffusion Model

    Authors: Chuang Chen, Xiaoxuan Xie, Yongming Zhang, Tianyu Zhang, Haoran Xie

    Abstract: Creating high-quality anime illustrations presents notable challenges, particularly for beginners, due to the intricate styles and fine details inherent in anime art. We present an interactive drawing guidance system specifically designed for anime illustrations to address this issue. It offers real-time guidance to help users refine their work and streamline the creative process. Our system is bu…

    Submitted 12 July, 2025; originally announced July 2025.

    Comments: 9 pages, 7 figures. In proceedings of NICOGRAPH International 2025

  22. arXiv:2507.09010  [pdf, ps, other]

    cs.AR cs.AI

    Hybrid Systolic Array Accelerator with Optimized Dataflow for Edge Large Language Model Inference

    Authors: Chun-Ting Chen, HanGyeol Mun, Jian Meng, Mohamed S. Abdelfattah, Jae-sun Seo

    Abstract: Edge inference for large language models (LLMs) offers secure, low-latency, and cost-effective inference solutions. We emphasize that an edge accelerator should achieve high area efficiency and minimize external memory access (EMA) during the memory-bound decode stage, while maintaining high energy efficiency during the compute-intensive prefill stage. This paper proposes an edge LLM inference acce…

    Submitted 11 July, 2025; originally announced July 2025.

    Comments: Accepted as a conference paper at the 2025 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)

  23. arXiv:2507.09009  [pdf]

    cs.LG cs.AI

    Multimodal Cardiovascular Risk Profiling Using Self-Supervised Learning of Polysomnography

    Authors: Zhengxiao He, Huayu Li, Geng Yuan, William D. S. Killgore, Stuart F. Quan, Chen X. Chen, Ao Li

    Abstract: Methods: We developed a self-supervised deep learning model that extracts meaningful patterns from multi-modal signals (Electroencephalography (EEG), Electrocardiography (ECG), and respiratory signals). The model was trained on data from 4,398 participants. Projection scores were derived by contrasting embeddings from individuals with and without CVD outcomes. External validation was conducted in…

    Submitted 11 July, 2025; originally announced July 2025.

  24. arXiv:2507.08784  [pdf, ps, other]

    cs.LG math.OC

    Greedy Low-Rank Gradient Compression for Distributed Learning with Convergence Guarantees

    Authors: Chuyan Chen, Yutong He, Pengrui Li, Weichen Jia, Kun Yuan

    Abstract: Distributed optimization is pivotal for large-scale signal processing and machine learning, yet communication overhead remains a major bottleneck. Low-rank gradient compression, in which the transmitted gradients are approximated by low-rank matrices to reduce communication, offers a promising remedy. Existing methods typically adopt either randomized or greedy compression strategies: randomized a…

    Submitted 20 July, 2025; v1 submitted 11 July, 2025; originally announced July 2025.

    Comments: 18 pages, 5 figures

  25. arXiv:2507.08749  [pdf, ps, other]

    cs.LG

    Modeling Partially Observed Nonlinear Dynamical Systems and Efficient Data Assimilation via Discrete-Time Conditional Gaussian Koopman Network

    Authors: Chuanqi Chen, Zhongrui Wang, Nan Chen, Jin-Long Wu

    Abstract: A discrete-time conditional Gaussian Koopman network (CGKN) is developed in this work to learn surrogate models that can perform efficient state forecast and data assimilation (DA) for high-dimensional complex dynamical systems, e.g., systems governed by nonlinear partial differential equations (PDEs). Focusing on nonlinear partially observed systems that are common in many engineering and earth s…

    Submitted 11 July, 2025; originally announced July 2025.

  26. arXiv:2507.08513  [pdf, ps, other]

    cs.GR cs.CV

    Advancing Multimodal LLMs by Large-Scale 3D Visual Instruction Dataset Generation

    Authors: Liu He, Xiao Zeng, Yizhi Song, Albert Y. C. Chen, Lu Xia, Shashwat Verma, Sankalp Dayal, Min Sun, Cheng-Hao Kuo, Daniel Aliaga

    Abstract: Multimodal Large Language Models (MLLMs) struggle with accurately capturing camera-object relations, especially for object orientation, camera viewpoint, and camera shots. This stems from the fact that existing MLLMs are trained on images with limited diverse camera-object relations and corresponding textual descriptions. To address this, we propose a synthetic generation pipeline to create large-…

    Submitted 11 July, 2025; originally announced July 2025.

  27. arXiv:2507.06607  [pdf, ps, other]

    cs.CL cs.LG

    Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation

    Authors: Liliang Ren, Congcong Chen, Haoran Xu, Young Jin Kim, Adam Atkinson, Zheng Zhan, Jiankai Sun, Baolin Peng, Liyuan Liu, Shuohang Wang, Hao Cheng, Jianfeng Gao, Weizhu Chen, Yelong Shen

    Abstract: Recent advances in language modeling have demonstrated the effectiveness of State Space Models (SSMs) for efficient sequence modeling. While hybrid architectures such as Samba and the decoder-decoder architecture, YOCO, have shown promising performance gains over Transformers, prior works have not investigated the efficiency potential of representation sharing between SSM layers. In this paper, we…

    Submitted 16 July, 2025; v1 submitted 9 July, 2025; originally announced July 2025.

  28. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu , et al. (3284 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 22 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  29. arXiv:2507.06244  [pdf]

    cs.CR cs.PF

    A Comparative Study and Implementation of Key Derivation Functions Standardized by NIST and IEEE

    Authors: Abel C. H. Chen

    Abstract: Since many applications and services require pseudorandom numbers (PRNs), it is feasible to generate specific PRNs under given key values and input messages using Key Derivation Functions (KDFs). These KDFs are primarily constructed based on Message Authentication Codes (MACs), where the MAC serves as a core component in the generation of pseudorandom numbers. In light of this, the study first exa…

    Submitted 23 June, 2025; originally announced July 2025.

    Comments: in Chinese language

  30. arXiv:2507.05613  [pdf]

    cs.AI

    Domain adaptation of large language models for geotechnical applications

    Authors: Lei Fan, Fangxue Liu, Cheng Chen

    Abstract: Recent developments in large language models (LLMs) are opening up new opportunities in geotechnical engineering and engineering geology. While general-purpose LLMs possess broad capabilities, effective application in geotechnics often requires domain-specific adaptation. Such tailored LLMs are increasingly employed to streamline geotechnical workflows. This paper presents the first survey of the…

    Submitted 7 July, 2025; originally announced July 2025.

  31. arXiv:2507.05588  [pdf]

    cs.CV

    Semi-Supervised Defect Detection via Conditional Diffusion and CLIP-Guided Noise Filtering

    Authors: Shuai Li, Shihan Chen, Wanru Geng, Zhaohua Xu, Xiaolu Liu, Can Dong, Zhen Tian, Changlin Chen

    Abstract: In the realm of industrial quality inspection, defect detection stands as a critical component, particularly in high-precision, safety-critical sectors such as automotive components, aerospace, and medical devices. Traditional methods, reliant on manual inspection or early image processing algorithms, suffer from inefficiencies, high costs, and limited robustness. This paper introduces a semi-super…

    Submitted 7 July, 2025; originally announced July 2025.

  32. arXiv:2507.05227  [pdf, ps, other]

    cs.RO cs.CV cs.LG cs.MM eess.SY

    NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving

    Authors: Qucheng Peng, Chen Bai, Guoxiang Zhang, Bo Xu, Xiaotong Liu, Xiaoyin Zheng, Chen Chen, Cheng Lu

    Abstract: Autonomous driving systems have made significant advances in Q&A, perception, prediction, and planning based on local visual information, yet they struggle to incorporate broader navigational context that human drivers routinely utilize. We address this critical gap between local sensor data and global navigation information by proposing NavigScene, an auxiliary navigation-guided natural language…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: Accepted by ACM Multimedia 2025

  33. arXiv:2507.04999  [pdf, ps, other]

    cs.CV

    Robust Incomplete-Modality Alignment for Ophthalmic Disease Grading and Diagnosis via Labeled Optimal Transport

    Authors: Qinkai Yu, Jianyang Xie, Yitian Zhao, Cheng Chen, Lijun Zhang, Liming Chen, Jun Cheng, Lu Liu, Yalin Zheng, Yanda Meng

    Abstract: Multimodal ophthalmic imaging-based diagnosis integrates color fundus image with optical coherence tomography (OCT) to provide a comprehensive view of ocular pathologies. However, the uneven global distribution of healthcare resources often results in real-world clinical scenarios encountering incomplete multimodal data, which significantly compromises diagnostic accuracy. Existing commonly used p…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: MICCAI 2025

  34. arXiv:2507.04984  [pdf, ps, other]

    cs.CV

    TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation

    Authors: Zonglin Lyu, Chen Chen

    Abstract: Video Frame Interpolation (VFI) aims to predict the intermediate frame $I_n$ (we use $n$ to denote time in videos to avoid notation overload with the timestep $t$ in diffusion models) based on two consecutive neighboring frames $I_0$ and $I_1$. Recent approaches apply diffusion models (both image-based and video-based) in this task and achieve strong performance. However, image-based diffusion model…

    Submitted 7 July, 2025; originally announced July 2025.

  35. arXiv:2507.04736  [pdf, ps, other]

    cs.AI cs.AR cs.PL

    ChipSeek-R1: Generating Human-Surpassing RTL with LLM via Hierarchical Reward-Driven Reinforcement Learning

    Authors: Zhirong Chen, Kaiyan Chang, Zhuolin Li, Xinyang He, Chujie Chen, Cangyuan Li, Mengdi Wang, Haobo Xu, Yinhe Han, Ying Wang

    Abstract: Large Language Models (LLMs) show significant potential for automating Register-Transfer Level (RTL) code generation. However, current approaches face a critical challenge: they cannot simultaneously optimize for functional correctness and hardware quality (Power, Performance, Area - PPA). Methods based on supervised fine-tuning often generate functionally correct but PPA-suboptimal code, lacking…

    Submitted 7 July, 2025; originally announced July 2025.

  36. arXiv:2507.04509  [pdf, ps, other]

    cs.CV cs.AI

    MVL-Loc: Leveraging Vision-Language Model for Generalizable Multi-Scene Camera Relocalization

    Authors: Zhendong Xiao, Wu Wei, Shujie Ji, Shan Yang, Changhao Chen

    Abstract: Camera relocalization, a cornerstone capability of modern computer vision, accurately determines a camera's position and orientation (6-DoF) from images and is essential for applications in augmented reality (AR), mixed reality (MR), autonomous driving, delivery drones, and robotic navigation. Unlike traditional deep learning-based methods that regress camera pose from images in a single scene, wh…

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: PRCV

  37. arXiv:2507.04294  [pdf, ps, other]

    cs.IR

    BiFair: A Fairness-aware Training Framework for LLM-enhanced Recommender Systems via Bi-level Optimization

    Authors: Jiaming Zhang, Yuyuan Li, Yiqun Xu, Li Zhang, Xiaohua Feng, Zhifei Ren, Chaochao Chen

    Abstract: Large Language Model-enhanced Recommender Systems (LLM-enhanced RSs) have emerged as a powerful approach to improving recommendation quality by leveraging LLMs to generate item representations. Despite these advancements, the integration of LLMs raises severe fairness concerns. Existing studies reveal that LLM-based RSs exhibit greater unfairness than traditional RSs, yet fairness issues in LLM-en…

    Submitted 6 July, 2025; originally announced July 2025.

  38. arXiv:2507.04107  [pdf, ps, other]

    cs.CV

    VICI: VLM-Instructed Cross-view Image-localisation

    Authors: Xiaohan Zhang, Tavis Shore, Chen Chen, Oscar Mendez, Simon Hadfield, Safwan Wshah

    Abstract: In this paper, we present a high-performing solution to the UAVM 2025 Challenge, which focuses on matching narrow FOV street-level images to corresponding satellite imagery using the University-1652 dataset. As panoramic Cross-View Geo-Localisation nears peak performance, it becomes increasingly important to explore more practical problem formulations. Real-world scenarios rarely offer panoramic s…

    Submitted 22 July, 2025; v1 submitted 5 July, 2025; originally announced July 2025.

  39. arXiv:2507.03704  [pdf, ps, other]

    cs.CL cs.AI

    Controlling Thinking Speed in Reasoning Models

    Authors: Zhengkai Lin, Zhihang Fu, Ze Chen, Chao Chen, Liang Xie, Wenxiao Wang, Deng Cai, Zheng Wang, Jieping Ye

    Abstract: Human cognition is theorized to operate in two modes: fast, intuitive System 1 thinking and slow, deliberate System 2 thinking. While current Large Reasoning Models (LRMs) excel at System 2 thinking, their inability to perform fast thinking leads to high computational overhead and latency. In this work, we enable LRMs to approximate human intelligence through dynamic thinking speed adjustment, opt…

    Submitted 4 July, 2025; originally announced July 2025.

  40. arXiv:2507.02824  [pdf, ps, other]

    eess.SP cs.AI cs.LG

    DNN-Based Precoding in RIS-Aided mmWave MIMO Systems With Practical Phase Shift

    Authors: Po-Heng Chou, Ching-Wen Chen, Wan-Jen Huang, Walid Saad, Yu Tsao, Ronald Y. Chang

    Abstract: In this paper, the precoding design is investigated for maximizing the throughput of millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems with obstructed direct communication paths. In particular, a reconfigurable intelligent surface (RIS) is employed to enhance MIMO transmissions, considering mmWave characteristics related to line-of-sight (LoS) and multipath effects. The tradit…

    Submitted 3 July, 2025; v1 submitted 3 July, 2025; originally announced July 2025.

    Comments: 5 pages, 4 figures, 2 tables, accepted by 2024 IEEE Globecom Workshops

  41. arXiv:2507.02768  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment

    Authors: Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, Chao-Han Huck Yang, Sung-Feng Huang, Chih-Kai Yang, Chee-En Yu, Chun-Wei Chen, Wei-Chih Chen, Chien-yu Huang, Yi-Cheng Lin, Yu-Xiang Lin, Chi-An Fu, Chun-Yi Kuan, Wenze Ren, Xuanjun Chen, Wei-Ping Huang, En-Pei Hu, Tzu-Quan Lin, Yuan-Kuei Wu, Kuan-Po Huang, Hsiao-Ying Huang, Huang-Cheng Chou, Kai-Wei Chang, Cheng-Han Chiang , et al. (3 additional authors not shown)

    Abstract: We introduce DeSTA2.5-Audio, a general-purpose Large Audio Language Model (LALM) designed for robust auditory perception and instruction-following, without requiring task-specific audio instruction-tuning. Recent LALMs typically augment Large Language Models (LLMs) with auditory capabilities by training on large-scale, manually curated or LLM-synthesized audio-instruction datasets. However, these…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: Model and code available at: https://github.com/kehanlu/DeSTA2.5-Audio

  42. arXiv:2507.01923  [pdf, ps, other]

    cs.CL

    Decision-Oriented Text Evaluation

    Authors: Yu-Shiang Huang, Chuan-Ju Wang, Chung-Chi Chen

    Abstract: Natural language generation (NLG) is increasingly deployed in high-stakes domains, yet common intrinsic evaluation methods, such as n-gram overlap or sentence plausibility, weakly correlate with actual decision-making efficacy. We propose a decision-oriented framework for evaluating generated text by directly measuring its influence on human and large language model (LLM) decision outcomes. Using…

    Submitted 3 July, 2025; v1 submitted 2 July, 2025; originally announced July 2025.

  43. arXiv:2507.01535  [pdf, ps, other]

    cs.CV

    TrackingMiM: Efficient Mamba-in-Mamba Serialization for Real-time UAV Object Tracking

    Authors: Bingxi Liu, Calvin Chen, Junhao Li, Guyang Yu, Haoqian Song, Xuchen Liu, Jinqiang Cui, Hong Zhang

    Abstract: The Vision Transformer (ViT) model has long struggled with the challenge of quadratic complexity, a limitation that becomes especially critical in unmanned aerial vehicle (UAV) tracking systems, where data must be processed in real time. In this study, we explore the recently proposed State-Space Model, Mamba, leveraging its computational efficiency and capability for long-sequence modeling to eff…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 12 pages

  44. arXiv:2507.01422  [pdf, ps, other]

    cs.CV cs.AI

    DocShaDiffusion: Diffusion Model in Latent Space for Document Image Shadow Removal

    Authors: Wenjie Liu, Bingshu Wang, Ze Wang, C. L. Philip Chen

    Abstract: Document shadow removal is a crucial task in the field of document image enhancement. However, existing methods tend to remove shadows with constant color background and ignore color shadows. In this paper, we first design a diffusion model in latent space for document image shadow removal, called DocShaDiffusion. It translates shadow images from pixel space to latent space, enabling the model to…

    Submitted 2 July, 2025; originally announced July 2025.

  45. arXiv:2507.00839  [pdf, ps, other]

    cs.DB

    RapidStore: An Efficient Dynamic Graph Storage System for Concurrent Queries

    Authors: Chiyu Hao, Jixian Su, Shixuan Sun, Hao Zhang, Sen Gao, Jianwen Zhao, Chenyi Zhang, Jieru Zhao, Chen Chen, Minyi Guo

    Abstract: Dynamic graph storage systems are essential for real-time applications such as social networks and recommendation, where graph data continuously evolves. However, they face significant challenges in efficiently handling concurrent read and write operations. We find that existing methods suffer from write queries interfering with read efficiency, substantial time and space overhead due to per-edge…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 17 pages, 18 figures

  46. arXiv:2507.00398  [pdf, ps, other]

    eess.IV cs.CV

    Accurate and Efficient Fetal Birth Weight Estimation from 3D Ultrasound

    Authors: Jian Wang, Qiongying Ni, Hongkui Yu, Ruixuan Yao, Jinqiao Ying, Bin Zhang, Xingyi Yang, Jin Peng, Jiongquan Chen, Junxuan Yu, Wenlong Shi, Chaoyu Chen, Zhongnuo Yan, Mingyuan Luo, Gaocheng Cai, Dong Ni, Jing Lu, Xin Yang

    Abstract: Accurate fetal birth weight (FBW) estimation is essential for optimizing delivery decisions and reducing perinatal mortality. However, clinical methods for FBW estimation are inefficient, operator-dependent, and challenging to apply in cases of complex fetal anatomy. Existing deep learning methods are based on 2D standard ultrasound (US) images or videos that lack spatial information, limiting the…

    Submitted 30 June, 2025; originally announced July 2025.

    Comments: Accepted by MICCAI 2025

  47. arXiv:2506.23999  [pdf, ps, other]

    cs.RO

    Predictive Risk Analysis and Safe Trajectory Planning for Intelligent and Connected Vehicles

    Authors: Zeyu Han, Mengchi Cai, Chaoyi Chen, Qingwen Meng, Guangwei Wang, Ying Liu, Qing Xu, Jianqiang Wang, Keqiang Li

    Abstract: The safe trajectory planning of intelligent and connected vehicles is a key component in autonomous driving technology. Modeling the environment risk information by field is a promising and effective approach for safe trajectory planning. However, existing risk assessment theories only analyze the risk by current information, ignoring future prediction. This paper proposes a predictive risk analys…

    Submitted 30 June, 2025; originally announced June 2025.

  48. arXiv:2506.23388  [pdf, ps, other]

    cs.GR cs.CG cs.MS math.MG

    Escher Tile Deformation via Closed-Form Solution

    Authors: Crane He Chen, Vladimir G. Kim

    Abstract: We present a real-time deformation method for Escher tiles -- interlocking organic forms that seamlessly tessellate the plane following symmetry rules. We formulate the problem as determining a periodic displacement field. The goal is to deform Escher tiles without introducing gaps or overlaps. The resulting displacement field is obtained in closed form by an analytical solution. Our method proces…

    Submitted 29 June, 2025; originally announced June 2025.

    Journal ref: SIGGRAPH 2025

  49. arXiv:2506.23100  [pdf, ps, other]

    cs.SE

    Repair Ingredients Are All You Need: Improving Large Language Model-Based Program Repair via Repair Ingredients Search

    Authors: Jiayi Zhang, Kai Huang, Jian Zhang, Yang Liu, Chunyang Chen

    Abstract: Automated Program Repair (APR) techniques aim to automatically fix buggy programs. Among these, Large Language Model-based (LLM-based) approaches have shown great promise. Recent advances demonstrate that directly leveraging LLMs can achieve leading results. However, these techniques remain suboptimal in generating contextually relevant and accurate patches, as they often overlook repair ingredien…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Accepted by ICSE 2026. Jiayi Zhang and Kai Huang contributed equally to this work

  50. arXiv:2506.23009  [pdf, ps, other]

    cs.CV

    MusiXQA: Advancing Visual Music Understanding in Multimodal Large Language Models

    Authors: Jian Chen, Wenye Ma, Penghang Liu, Wei Wang, Tengwei Song, Ming Li, Chenguang Wang, Ruiyi Zhang, Changyou Chen

    Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable visual reasoning abilities in natural images, text-rich documents, and graphic designs. However, their ability to interpret music sheets remains underexplored. To bridge this gap, we introduce MusiXQA, the first comprehensive dataset for evaluating and advancing MLLMs in music sheet understanding. MusiXQA features high-quality synth…

    Submitted 28 June, 2025; originally announced June 2025.