Showing 1–50 of 5,389 results for author: Yang, Y

Searching in archive cs.
  1. arXiv:2507.17089  [pdf, ps, other]

    cs.CV cs.RO

    IONext: Unlocking the Next Era of Inertial Odometry

    Authors: Shanshan Zhang, Siyue Wang, Tianshui Wen, Qi Zhang, Ziheng Zhou, Lingxiang Zheng, Yu Yang

    Abstract: Researchers have increasingly adopted Transformer-based models for inertial odometry. While Transformers excel at modeling long-range dependencies, their limited sensitivity to local, fine-grained motion variations and lack of inherent inductive biases often hinder localization accuracy and generalization. Recent studies have shown that incorporating large-kernel convolutions and Transformer-inspi…

    Submitted 22 July, 2025; originally announced July 2025.

  2. arXiv:2507.16865  [pdf, ps, other]

    cs.RO

    ResKACNNet: A Residual ChebyKAN Network for Inertial Odometry

    Authors: Shanshan Zhang, Tianshui Wen, Siyue Wang, Qi Zhang, Ziheng Zhou, Huiru Zheng, Lingxiang Zheng, Yu Yang

    Abstract: Inertial Measurement Unit (IMU) has become a key technology for achieving low-cost and precise positioning. However, traditional CNN-based inertial positioning methods struggle to capture the nonlinear motion characteristics and long-term dependencies in IMU data. To address this limitation, we propose a novel inertial positioning network with a generic backbone called ResChebyKAN, which leverages…

    Submitted 21 July, 2025; originally announced July 2025.

  3. arXiv:2507.16672  [pdf]

    cs.LG cs.AI

    Meta-Learning for Cold-Start Personalization in Prompt-Tuned LLMs

    Authors: Yushang Zhao, Huijie Shen, Dannier Li, Lu Chang, Chengrui Zhou, Yinuo Yang

    Abstract: Generative, explainable, and flexible recommender systems, derived using Large Language Models (LLM) are promising and poorly adapted to the cold-start user situation, where there is little to no history of interaction. The current solutions i.e. supervised fine-tuning and collaborative filtering are dense-user-item focused and would be expensive to maintain and update. This paper introduces a met…

    Submitted 22 July, 2025; originally announced July 2025.

  4. arXiv:2507.16632  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Step-Audio 2 Technical Report

    Authors: Boyong Wu, Chao Yan, Chen Hu, Cheng Yi, Chengli Feng, Fei Tian, Feiyu Shen, Gang Yu, Haoyang Zhang, Jingbei Li, Mingrui Chen, Peng Liu, Wang You, Xiangyu Tony Zhang, Xingyuan Li, Xuerui Yang, Yayue Deng, Yechang Huang, Yuxin Li, Yuxin Zhang, Zhao You, Brian Li, Changyi Wan, Hanpeng Hu, Jiangjie Zhen, et al. (84 additional authors not shown)

    Abstract: This paper presents Step-Audio 2, an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation. By integrating a latent audio encoder and reasoning-centric reinforcement learning (RL), Step-Audio 2 achieves promising performance in automatic speech recognition (ASR) and audio understanding. To facilitate genuine end-to-end speech convers…

    Submitted 22 July, 2025; originally announced July 2025.

  5. arXiv:2507.16433  [pdf, ps, other]

    stat.ME cs.LG

    Adaptive Multi-task Learning for Multi-sector Portfolio Optimization

    Authors: Qingliang Fan, Ruike Wu, Yanrong Yang

    Abstract: Accurate transfer of information across multiple sectors to enhance model estimation is both significant and challenging in multi-sector portfolio optimization involving a large number of assets in different classes. Within the framework of factor modeling, we propose a novel data-adaptive multi-task learning methodology that quantifies and learns the relatedness among the principal temporal subsp…

    Submitted 22 July, 2025; originally announced July 2025.

  6. arXiv:2507.16414  [pdf, ps, other]

    cs.AI

    Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework

    Authors: Hongyi Tang, Zhihao Zhu, Yi Yang

    Abstract: The performance of large language models (LLMs) is closely tied to their training data, which can include copyrighted material or private information, raising legal and ethical concerns. Additionally, LLMs face criticism for dataset contamination and internalizing biases. To address these issues, the Pre-Training Data Detection (PDD) task was proposed to identify if specific data was included in a…

    Submitted 22 July, 2025; originally announced July 2025.

  7. arXiv:2507.16331  [pdf, ps, other]

    cs.CL

    Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny

    Authors: Chuanhao Yan, Fengdi Che, Xuhan Huang, Xu Xu, Xin Li, Yizhi Li, Xingwei Qu, Jingzhe Shi, Zhuangzhuang He, Chenghua Lin, Yaodong Yang, Binhang Yuan, Hang Zhao, Yu Qiao, Bowen Zhou, Jie Fu

    Abstract: Existing informal language-based (e.g., human language) Large Language Models (LLMs) trained with Reinforcement Learning (RL) face a significant challenge: their verification processes, which provide crucial training signals, are neither reliable nor scalable. In fact, the prevalent large proprietary models could hardly generate verifiable programs. A promising yet largely uncharted alternative is…

    Submitted 22 July, 2025; originally announced July 2025.

  8. arXiv:2507.16213  [pdf, ps, other]

    cs.CV cs.AI

    Advancing Visual Large Language Model for Multi-granular Versatile Perception

    Authors: Wentao Xiang, Haoxian Tan, Cong Wei, Yujie Zhong, Dengjie Li, Yujiu Yang

    Abstract: Perception is a fundamental task in the field of computer vision, encompassing a diverse set of subtasks that can be systematically categorized into four distinct groups based on two dimensions: prediction type and instruction type. Notably, existing researches often focus solely on a limited subset of these potential combinations, which constrains their applicability and versatility across variou…

    Submitted 22 July, 2025; originally announced July 2025.

    Comments: To appear in ICCV 2025

  9. arXiv:2507.16121  [pdf, ps, other]

    cs.RO

    DWSFormer: A Lightweight Inertial Odometry Network for Complex Motion Modeling

    Authors: Shanshan Zhang, Qi Zhang, Siyue Wang, Tianshui Wen, Ziheng Zhou, Lingxiang Zheng, Yu Yang

    Abstract: Inertial odometry (IO) directly estimates the position of a carrier from inertial sensor measurements and serves as a core technology for the widespread deployment of consumer grade localization systems. While existing IO methods can accurately reconstruct simple and near linear motion trajectories, they often fail to account for drift errors caused by complex motion patterns such as turning. This…

    Submitted 21 July, 2025; originally announced July 2025.

  10. arXiv:2507.16120  [pdf, ps, other]

    cs.RO

    FTIN: Frequency-Time Integration Network for Inertial Odometry

    Authors: Shanshan Zhang, Qi Zhang, Siyue Wang, Tianshui Wen, Ziheng Zhou, Lingxiang Zheng, Yu Yang

    Abstract: In recent years, machine learning has achieved significant advancements in inertial odometry. However, most existing inertial odometry methods primarily rely on CNNs in the time domain. These methods often struggle to capture long-term dependency in inertial measurement unit data, thereby constraining the potential for further improvements in localization accuracy. To address these issues, we prop…

    Submitted 21 July, 2025; originally announced July 2025.

  11. arXiv:2507.15863  [pdf, ps, other]

    cs.CL cs.AI

    eSapiens's DEREK Module: Deep Extraction & Reasoning Engine for Knowledge with LLMs

    Authors: Isaac Shi, Zeyuan Li, Fan Liu, Wenli Wang, Lewei He, Yang Yang, Tianyu Shi

    Abstract: We present the DEREK (Deep Extraction & Reasoning Engine for Knowledge) Module, a secure and scalable Retrieval-Augmented Generation pipeline designed specifically for enterprise document question answering. Designed and implemented by eSapiens, the system ingests heterogeneous content (PDF, Office, web), splits it into 1,000-token overlapping chunks, and indexes them in a hybrid HNSW+BM25 store.…

    Submitted 13 July, 2025; originally announced July 2025.

    Comments: 8 pages; 1 figure; 5 tables

  12. arXiv:2507.15800  [pdf, ps, other]

    eess.SP cs.IT

    Fluid Antenna-enabled Near-Field Integrated Sensing, Computing and Semantic Communication for Emerging Applications

    Authors: Yinchao Yang, Jingxuan Zhou, Zhaohui Yang, Mohammad Shikh-Bahaei

    Abstract: The integration of sensing and communication (ISAC) is a key enabler for next-generation technologies. With high-frequency bands and large-scale antenna arrays, the Rayleigh distance extends, necessitating near-field (NF) models where waves are spherical. Although NF-ISAC improves both sensing and communication, it also poses challenges such as high data volume and potential privacy risks. To addr…

    Submitted 21 July, 2025; originally announced July 2025.

    Comments: Accepted by IEEE Transactions on Cognitive Communications and Networking

  13. arXiv:2507.15493  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    GR-3 Technical Report

    Authors: Chilam Cheang, Sijin Chen, Zhongren Cui, Yingdong Hu, Liqun Huang, Tao Kong, Hang Li, Yifeng Li, Yuxiao Liu, Xiao Ma, Hao Niu, Wenxuan Ou, Wanli Peng, Zeyu Ren, Haixin Shi, Jiawen Tian, Hongtao Wu, Xin Xiao, Yuyang Xiao, Jiafeng Xu, Yichu Yang

    Abstract: We report our recent progress towards building generalist robot policies, the development of GR-3. GR-3 is a large-scale vision-language-action (VLA) model. It showcases exceptional capabilities in generalizing to novel objects, environments, and instructions involving abstract concepts. Furthermore, it can be efficiently fine-tuned with minimal human trajectory data, enabling rapid and cost-effec…

    Submitted 22 July, 2025; v1 submitted 21 July, 2025; originally announced July 2025.

    Comments: Tech report. Authors are listed in alphabetical order. Project page: https://seed.bytedance.com/GR3/

  14. arXiv:2507.15293  [pdf, ps, other]

    cs.RO

    RepILN: Reparameterized Inertial Localization Network

    Authors: Shanshan Zhang, Tianshui Wen, Siyue Wang, Qi Zhang, Ziheng Zhou, Lingxiang Zheng, Yu Yang

    Abstract: Inertial localization is regarded as a promising positioning solution for consumer-grade IoT devices due to its cost-effectiveness and independence from external infrastructure. However, data-driven inertial localization methods often rely on increasingly complex network architectures to improve accuracy, which challenges the limited computational resources of IoT devices. Moreover, these methods…

    Submitted 21 July, 2025; originally announced July 2025.

  15. arXiv:2507.15257  [pdf, ps, other]

    cs.CV

    MinCD-PnP: Learning 2D-3D Correspondences with Approximate Blind PnP

    Authors: Pei An, Jiaqi Yang, Muyao Peng, You Yang, Qiong Liu, Xiaolin Wu, Liangliang Nan

    Abstract: Image-to-point-cloud (I2P) registration is a fundamental problem in computer vision, focusing on establishing 2D-3D correspondences between an image and a point cloud. The differential perspective-n-point (PnP) has been widely used to supervise I2P registration networks by enforcing the projective constraints on 2D-3D correspondences. However, differential PnP is highly sensitive to noise and outl…

    Submitted 21 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025

  16. arXiv:2507.15150  [pdf, ps, other]

    cs.CV

    Event-based Graph Representation with Spatial and Motion Vectors for Asynchronous Object Detection

    Authors: Aayush Atul Verma, Arpitsinh Vaghela, Bharatesh Chakravarthi, Kaustav Chanda, Yezhou Yang

    Abstract: Event-based sensors offer high temporal resolution and low latency by generating sparse, asynchronous data. However, converting this irregular data into dense tensors for use in standard neural networks diminishes these inherent advantages, motivating research into graph representations. While such methods preserve sparsity and support asynchronous inference, their performance on downstream tasks…

    Submitted 20 July, 2025; originally announced July 2025.

  17. arXiv:2507.15066  [pdf, ps, other]

    cs.LG cs.AI cs.MM

    Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback

    Authors: Yiyuan Yang, Zichuan Liu, Lei Song, Kai Ying, Zhiguang Wang, Tom Bamford, Svitlana Vyetrenko, Jiang Bian, Qingsong Wen

    Abstract: Time series anomaly detection is critical across various domains, yet current approaches often limit analysis to mere binary anomaly classification without detailed categorization or further explanatory reasoning. To address these limitations, we propose a novel task, Time-series Reasoning for Anomaly (Time-RA) that transforms classical time series anomaly detection from a discriminative into a ge…

    Submitted 20 July, 2025; originally announced July 2025.

    Comments: Under review. 19 pages, 8 figures, 12 tables

  18. arXiv:2507.14743  [pdf, ps, other]

    cs.CV

    InterAct-Video: Reasoning-Rich Video QA for Urban Traffic

    Authors: Joseph Raj Vishal, Rutuja Patil, Manas Srinivas Gowda, Katha Naik, Yezhou Yang, Bharatesh Chakravarthi

    Abstract: Traffic monitoring is crucial for urban mobility, road safety, and intelligent transportation systems (ITS). Deep learning has advanced video-based traffic monitoring through video question answering (VideoQA) models, enabling structured insight extraction from traffic videos. However, existing VideoQA models struggle with the complexity of real-world traffic scenes, where multiple concurrent even…

    Submitted 19 July, 2025; originally announced July 2025.

  19. arXiv:2507.14485  [pdf, ps, other]

    cs.CV cs.AI

    Benefit from Reference: Retrieval-Augmented Cross-modal Point Cloud Completion

    Authors: Hongye Hou, Liu Zhan, Yang Yang

    Abstract: Completing the whole 3D structure based on an incomplete point cloud is a challenging task, particularly when the residual point cloud lacks typical structural characteristics. Recent methods based on cross-modal learning attempt to introduce instance images to aid the structure feature learning. However, they still focus on each particular input class, limiting their generation abilities. In this…

    Submitted 19 July, 2025; originally announced July 2025.

  20. arXiv:2507.14088  [pdf, ps, other]

    cs.LG

    DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration

    Authors: Xiyun Li, Yining Ding, Yuhua Jiang, Yunlong Zhao, Runpeng Xie, Shuang Xu, Yuanhua Ni, Yiqin Yang, Bo Xu

    Abstract: Real-time human-artificial intelligence (AI) collaboration is crucial yet challenging, especially when AI agents must adapt to diverse and unseen human behaviors in dynamic scenarios. Existing large language model (LLM) agents often fail to accurately model the complex human mental characteristics such as domain intentions, especially in the absence of direct communication. To address this limitat…

    Submitted 18 July, 2025; originally announced July 2025.

    Journal ref: cogsci-2025

  21. arXiv:2507.14046  [pdf, ps, other]

    eess.IV cs.CV cs.LG

    D2IP: Deep Dynamic Image Prior for 3D Time-sequence Pulmonary Impedance Imaging

    Authors: Hao Fang, Hao Yu, Sihao Teng, Tao Zhang, Siyi Yuan, Huaiwu He, Zhe Liu, Yunjie Yang

    Abstract: Unsupervised learning methods, such as Deep Image Prior (DIP), have shown great potential in tomographic imaging due to their training-data-free nature and high generalization capability. However, their reliance on numerous network parameter iterations results in high computational costs, limiting their practical application, particularly in complex 3D or time-sequence tomographic imaging tasks. T…

    Submitted 18 July, 2025; originally announced July 2025.

    Comments: 11 pages, 9 figures

  22. arXiv:2507.14031  [pdf, ps, other]

    cs.CV cs.ET cs.LG

    QuantEIT: Ultra-Lightweight Quantum-Assisted Inference for Chest Electrical Impedance Tomography

    Authors: Hao Fang, Sihao Teng, Hao Yu, Siyi Yuan, Huaiwu He, Zhe Liu, Yunjie Yang

    Abstract: Electrical Impedance Tomography (EIT) is a non-invasive, low-cost bedside imaging modality with high temporal resolution, making it suitable for bedside monitoring. However, its inherently ill-posed inverse problem poses significant challenges for accurate image reconstruction. Deep learning (DL)-based approaches have shown promise but often rely on complex network architectures with a large numbe…

    Submitted 18 July, 2025; originally announced July 2025.

    Comments: 10 pages, 12 figures

  23. arXiv:2507.13685  [pdf, ps, other]

    cs.LG

    Kolmogorov-Arnold Networks-based GRU and LSTM for Loan Default Early Prediction

    Authors: Yue Yang, Zihan Su, Ying Zhang, Chang Chuan Goh, Yuxiang Lin, Anthony Graham Bellotti, Boon Giin Lee

    Abstract: This study addresses a critical challenge in time series anomaly detection: enhancing the predictive capability of loan default models more than three months in advance to enable early identification of default events, helping financial institutions implement preventive measures before risk events materialize. Existing methods have significant drawbacks, such as their lack of accuracy in early pre…

    Submitted 18 July, 2025; originally announced July 2025.

  24. arXiv:2507.13575  [pdf, ps, other]

    cs.LG cs.AI

    Apple Intelligence Foundation Language Models: Tech Report 2025

    Authors: Hanzhi Zhou, Erik Hornberger, Pengsheng Guo, Xiyou Zhou, Saiwen Wang, Xin Wang, Yifei He, Xuankai Chang, Rene Rauch, Louis D'hauwe, John Peebles, Alec Doane, Kohen Chia, Jenna Thibodeau, Zi-Yi Dou, Yuanyang Zhang, Ruoming Pang, Reed Li, Zhifeng Chen, Jeremy Warner, Zhaoyang Xu, Sophy Lee, David Mizrahi, Ramsey Tantawi, Chris Chaney, et al. (370 additional authors not shown)

    Abstract: We introduce two multilingual, multimodal foundation language models that power Apple Intelligence features across Apple devices and services: (i) a 3B-parameter on-device model optimized for Apple silicon through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training; and (ii) a scalable server model built on a novel Parallel-Track Mixture-of-Experts (PT-MoE) transform…

    Submitted 17 July, 2025; originally announced July 2025.

  25. arXiv:2507.13540  [pdf, ps, other]

    cs.LG

    Provable Low-Frequency Bias of In-Context Learning of Representations

    Authors: Yongyi Yang, Hidenori Tanaka, Wei Hu

    Abstract: In-context learning (ICL) enables large language models (LLMs) to acquire new behaviors from the input sequence alone without any parameter updates. Recent studies have shown that ICL can surpass the original meaning learned in pretraining stage through internalizing the structure the data-generating process (DGP) of the prompt into the hidden representations. However, the mechanisms by which LLMs…

    Submitted 17 July, 2025; originally announced July 2025.

  26. arXiv:2507.13388  [pdf]

    cs.GR

    DLSF: Dual-Layer Synergistic Fusion for High-Fidelity Image Synthesis

    Authors: Zhen-Qi Chen, Yuan-Fu Yang

    Abstract: With the rapid advancement of diffusion-based generative models, Stable Diffusion (SD) has emerged as a state-of-the-art framework for high-fidelity image synthesis. However, existing SD models suffer from suboptimal feature aggregation, leading to incomplete semantic alignment and loss of fine-grained details, especially in highly textured and complex scenes. To address these limitations, we pr…

    Submitted 15 July, 2025; originally announced July 2025.

  27. arXiv:2507.13370  [pdf, ps, other]

    cs.SI cs.AI cs.MA

    H-NeiFi: Non-Invasive and Consensus-Efficient Multi-Agent Opinion Guidance

    Authors: Shijun Guo, Haoran Xu, Yaming Yang, Ziyu Guan, Wei Zhao, Xinyi Zhang, Yishan Song, Jiwei Chen

    Abstract: The openness of social media enables the free exchange of opinions, but it also presents challenges in guiding opinion evolution towards global consensus. Existing methods often directly modify user views or enforce cross-group connections. These intrusive interventions undermine user autonomy, provoke psychological resistance, and reduce the efficiency of global consensus. Additionally, due to th…

    Submitted 11 July, 2025; originally announced July 2025.

  28. arXiv:2507.13344  [pdf, ps, other]

    cs.CV

    Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

    Authors: Yudong Jin, Sida Peng, Xuan Wang, Tao Xie, Zhen Xu, Yifan Yang, Yujun Shen, Hujun Bao, Xiaowei Zhou

    Abstract: This paper addresses the challenge of high-fidelity view synthesis of humans with sparse-view videos as input. Previous methods solve the issue of insufficient observation by leveraging 4D diffusion models to generate videos at novel viewpoints. However, the generated videos from these models often lack spatio-temporal consistency, thus degrading view synthesis quality. In this paper, we propose a…

    Submitted 17 July, 2025; originally announced July 2025.

    Comments: Project page: https://diffuman4d.github.io/

  29. arXiv:2507.13260  [pdf, ps, other]

    cs.CV cs.AI

    Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy

    Authors: Yiting Yang, Hao Luo, Yuan Sun, Qingsen Yan, Haokui Zhang, Wei Dong, Guoqing Wang, Peng Wang, Yang Yang, Hengtao Shen

    Abstract: A prevalent approach in Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViT) involves freezing the majority of the backbone parameters and solely learning low-rank adaptation weight matrices to accommodate downstream tasks. These low-rank matrices are commonly derived through the multiplication structure of down-projection and up-projection matrices, exemplified by metho…

    Submitted 17 July, 2025; originally announced July 2025.

    Comments: This paper is accepted by ICCV 2025

  30. arXiv:2507.13140  [pdf, ps, other]

    cs.NI

    RIDAS: A Multi-Agent Framework for AI-RAN with Representation- and Intention-Driven Agents

    Authors: Kuiyuan Ding, Caili Guo, Yang Yang, Jianzhang Guo

    Abstract: Sixth generation (6G) networks demand tight integration of artificial intelligence (AI) into radio access networks (RANs) to meet stringent quality of service (QoS) and resource efficiency requirements. Existing solutions struggle to bridge the gap between high level user intents and the low level, parameterized configurations required for optimal performance. To address this challenge, we propose…

    Submitted 17 July, 2025; originally announced July 2025.

    Comments: 6 pages, 7 figures

  31. arXiv:2507.12841  [pdf, ps, other]

    cs.CV

    AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning

    Authors: Yiming Ren, Zhiqiang Lin, Yu Li, Gao Meng, Weiyun Wang, Junjie Wang, Zicheng Lin, Jifeng Dai, Yujiu Yang, Wenhai Wang, Ruihang Chu

    Abstract: Controllable captioning is essential for precise multimodal alignment and instruction following, yet existing models often lack fine-grained control and reliable evaluation protocols. To address this gap, we present the AnyCap Project, an integrated solution spanning model, dataset, and evaluation. We introduce AnyCapModel (ACM), a lightweight plug-and-play framework that enhances the controllabil…

    Submitted 17 July, 2025; originally announced July 2025.

  32. arXiv:2507.12832  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results

    Authors: Yuki Kondo, Norimichi Ukita, Riku Kanayama, Yuki Yoshida, Takayuki Yamaguchi, Xiang Yu, Guang Liang, Xinyao Liu, Guan-Zhang Wang, Wei-Ta Chu, Bing-Cheng Chuang, Jia-Hua Lee, Pin-Tseng Kuo, I-Hsuan Chu, Yi-Shein Hsiao, Cheng-Han Wu, Po-Yi Wu, Jui-Chien Tsou, Hsuan-Chi Liu, Chun-Yi Lee, Yuan-Fu Yang, Kosuke Shigematsu, Asuka Shin, Ba Tran

    Abstract: Small Multi-Object Tracking (SMOT) is particularly challenging when targets occupy only a few dozen pixels, rendering detection and appearance-based association unreliable. Building on the success of the MVA2023 SOD4SB challenge, this paper introduces the SMOT4SB challenge, which leverages temporal information to address limitations of single-frame detection. Our three main contributions are: (1)…

    Submitted 17 July, 2025; originally announced July 2025.

    Comments: This paper is the official challenge report for SMOT4SB and is published in the proceedings of MVA 2025 (19th International Conference on Machine Vision and Applications). Official challenge page: https://www.mva-org.jp/mva2025/challenge

  33. arXiv:2507.12795  [pdf, ps, other]

    cs.CV cs.AI

    City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning

    Authors: Penglei Sun, Yaoxian Song, Xiangru Zhu, Xiang Liu, Qiang Wang, Yue Liu, Changqun Xia, Tiefeng Li, Yang Yang, Xiaowen Chu

    Abstract: Scene understanding enables intelligent agents to interpret and comprehend their environment. While existing large vision-language models (LVLMs) for scene understanding have primarily focused on indoor household tasks, they face two significant limitations when applied to outdoor large-scale scene understanding. First, outdoor scenarios typically encompass larger-scale environments observed throu…

    Submitted 17 July, 2025; originally announced July 2025.

  34. arXiv:2507.12780  [pdf, ps, other]

    cs.CV cs.LG

    Compact Vision Transformer by Reduction of Kernel Complexity

    Authors: Yancheng Wang, Yingzhen Yang

    Abstract: Self-attention and transformer architectures have become foundational components in modern deep learning. Recent efforts have integrated transformer blocks into compact neural architectures for computer vision, giving rise to various efficient vision transformers. In this work, we introduce Transformer with Kernel Complexity Reduction, or KCR-Transformer, a compact transformer block equipped with…

    Submitted 17 July, 2025; originally announced July 2025.

  35. arXiv:2507.12714  [pdf, ps, other]

    cs.CV cs.GR

    NeuraLeaf: Neural Parametric Leaf Models with Shape and Deformation Disentanglement

    Authors: Yang Yang, Dongni Mao, Hiroaki Santo, Yasuyuki Matsushita, Fumio Okura

    Abstract: We develop a neural parametric model for 3D leaves for plant modeling and reconstruction that are essential for agriculture and computer graphics. While neural parametric models are actively studied for humans and animals, plant leaves present unique challenges due to their diverse shapes and flexible deformation. To this problem, we introduce a neural parametric model for leaves, NeuraLeaf. Capit…

    Submitted 16 July, 2025; originally announced July 2025.

    Comments: IEEE/CVF International Conference on Computer Vision (ICCV 2025), Project: https://neuraleaf-yang.github.io/

  36. arXiv:2507.12508  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    MindJourney: Test-Time Scaling with World Models for Spatial Reasoning

    Authors: Yuncong Yang, Jiageng Liu, Zheyuan Zhang, Siyuan Zhou, Reuben Tan, Jianwei Yang, Yilun Du, Chuang Gan

    Abstract: Spatial reasoning in 3D space is central to human cognition and indispensable for embodied tasks such as navigation and manipulation. However, state-of-the-art vision-language models (VLMs) struggle frequently with tasks as simple as anticipating how a scene will look after an egocentric motion: they perceive 2D images but lack an internal model of 3D dynamics. We therefore propose MindJourney, a…

    Submitted 16 July, 2025; originally announced July 2025.

    Comments: Project Page: https://umass-embodied-agi.github.io/MindJourney

  37. arXiv:2507.12472  [pdf, ps, other]

    cs.SE cs.CL

    A Survey of AIOps in the Era of Large Language Models

    Authors: Lingzhe Zhang, Tong Jia, Mengxi Jia, Yifan Wu, Aiwei Liu, Yong Yang, Zhonghai Wu, Xuming Hu, Philip S. Yu, Ying Li

    Abstract: As large language models (LLMs) grow increasingly sophisticated and pervasive, their application to various Artificial Intelligence for IT Operations (AIOps) tasks has garnered significant attention. However, a comprehensive understanding of the impact, potential, and limitations of LLMs in AIOps remains in its infancy. To address this gap, we conducted a detailed survey of LLM4AIOps, focusing on…

    Submitted 22 June, 2025; originally announced July 2025.

    Comments: Accepted By CSUR, an extended version of "A Survey of AIOps for Failure Management in the Era of Large Language Models" [arXiv:2406.11213]

  38. arXiv:2507.12356  [pdf, ps, other]

    cs.CL cs.HC cs.SD

    Exploring Gender Bias in Alzheimer's Disease Detection: Insights from Mandarin and Greek Speech Perception

    Authors: Liu He, Yuanchao Li, Rui Feng, XinRan Han, Yin-Long Liu, Yuwei Yang, Zude Zhu, Jiahong Yuan

    Abstract: Gender bias has been widely observed in speech perception tasks, influenced by the fundamental voicing differences between genders. This study reveals a gender bias in the perception of Alzheimer's Disease (AD) speech. In a perception experiment involving 16 Chinese listeners evaluating both Chinese and Greek speech, we identified that male speech was more frequently identified as AD, with this bi…

    Submitted 16 July, 2025; originally announced July 2025.

    Comments: 12 pages, 5 figures, conference or other essential info

  39. arXiv:2507.12103  [pdf, ps, other]

    cs.CV cs.CY

    DeepShade: Enable Shade Simulation by Text-conditioned Image Generation

    Authors: Longchao Da, Xiangrui Liu, Mithun Shivakoti, Thirulogasankar Pranav Kutralingam, Yezhou Yang, Hua Wei

    Abstract: Heatwaves pose a significant threat to public health, especially as global warming intensifies. However, current routing systems (e.g., online maps) fail to incorporate shade information due to the difficulty of estimating shades directly from noisy satellite imagery and the limited availability of training data for generative models. In this paper, we address these challenges through two main con…

    Submitted 23 July, 2025; v1 submitted 16 July, 2025; originally announced July 2025.

    Comments: 7 pages, 4 figures. Accepted to IJCAI 2025

    MSC Class: 68T45; 68U10; 62H35

    ACM Class: I.2.10; I.4.8; I.5.1

  40. arXiv:2507.11910  [pdf, ps, other]

    cs.CV

    SEPose: A Synthetic Event-based Human Pose Estimation Dataset for Pedestrian Monitoring

    Authors: Kaustav Chanda, Aayush Atul Verma, Arpitsinh Vaghela, Yezhou Yang, Bharatesh Chakravarthi

    Abstract: Event-based sensors have emerged as a promising solution for addressing challenging conditions in pedestrian and traffic monitoring systems. Their low-latency and high dynamic range allow for improved response time in safety-critical situations caused by distracted walking or other unusual movements. However, the availability of data covering such scenarios remains limited. To address this gap, we…

    Submitted 16 July, 2025; originally announced July 2025.

    Comments: Accepted at the 28th IEEE International Conference on Intelligent Transportation Systems (ITSC 2025)

  41. arXiv:2507.11903  [pdf, ps, other]

    cs.HC cs.MM

    Unveiling the Visual Rhetoric of Persuasive Cartography: A Case Study of the Design of Octopus Maps

    Authors: Daocheng Lin, Yifan Wang, Yutong Yang, Xingyu Lan

    Abstract: When designed deliberately, data visualizations can become powerful persuasive tools, influencing viewers' opinions, values, and actions. While researchers have begun studying this issue (e.g., to evaluate the effects of persuasive visualization), we argue that a fundamental mechanism of persuasion resides in rhetorical construction, a perspective inadequately addressed in current visualization re…

    Submitted 16 July, 2025; originally announced July 2025.

  42. arXiv:2507.11865  [pdf, ps, other]

    cs.LG

    A Policy-Improved Deep Deterministic Policy Gradient Framework for the Discount Order Acceptance Strategy of Ride-hailing Drivers

    Authors: Hanwen Dai, Chang Gao, Fang He, Congyuan Ji, Yanni Yang

    Abstract: The rapid expansion of platform integration has emerged as an effective solution to mitigate market fragmentation by consolidating multiple ride-hailing platforms into a single application. To address heterogeneous passenger preferences, third-party integrators provide Discount Express service delivered by express drivers at lower trip fares. For the individual platform, encouraging broader partic…

    Submitted 15 July, 2025; originally announced July 2025.

  43. arXiv:2507.11841  [pdf, ps, other]

    cs.HC cs.CY

    "Mapping What I Feel": Understanding Affective Geovisualization Design Through the Lens of People-Place Relationships

    Authors: Xingyu Lan, Yutong Yang, Yifan Wang

    Abstract: Affective visualization design is an emerging research direction focused on communicating and influencing emotion through visualization. However, as revealed by previous research, this area is highly interdisciplinary and involves theories and practices from diverse fields and disciplines, thus awaiting analysis from more fine-grained angles. To address this need, this work focuses on a pioneering…

    Submitted 15 July, 2025; originally announced July 2025.

  44. arXiv:2507.11597  [pdf]

    cs.CY cs.AI cs.HC

    AI, Humans, and Data Science: Optimizing Roles Across Workflows and the Workforce

    Authors: Richard Timpone, Yongwei Yang

    Abstract: AI is transforming research. It is being leveraged to construct surveys, synthesize data, conduct analysis, and write summaries of the results. While the promise is to create efficiencies and increase quality, the reality is not always as clear-cut. Leveraging our framework of Truth, Beauty, and Justice (TBJ), which we use to evaluate AI, machine learning and computational models for effective and…

    Submitted 15 July, 2025; originally announced July 2025.

    Comments: Paper prepared for the 2025 European Survey Research Association Conference; 30 pages, 5 tables and 4 figures

  45. arXiv:2507.11470  [pdf, ps, other]

    cs.HC

    REVA: Supporting LLM-Generated Programming Feedback Validation at Scale Through User Attention-based Adaptation

    Authors: Xiaohang Tang, Sam Wong, Zicheng He, Yalong Yang, Yan Chen

    Abstract: This paper introduces REVA, a human-AI system that expedites instructor review of voluminous AI-generated programming feedback by sequencing submissions to minimize cognitive context shifts and propagating instructor-driven revisions across semantically similar instances. REVA introduces a novel approach to human-AI collaboration in educational feedback by adaptively learning from instructors' att…

    Submitted 15 July, 2025; originally announced July 2025.

  46. arXiv:2507.11352  [pdf, ps, other]

    cs.AI cs.FL

    Foundation Models for Logistics: Toward Certifiable, Conversational Planning Interfaces

    Authors: Yunhao Yang, Neel P. Bhatt, Christian Ellis, Alvaro Velasquez, Zhangyang Wang, Ufuk Topcu

    Abstract: Logistics operators, from battlefield coordinators rerouting airlifts ahead of a storm to warehouse managers juggling late trucks, often face life-critical decisions that demand both domain expertise and rapid and continuous replanning. While popular methods like integer programming yield logistics plans that satisfy user-defined logical constraints, they are slow and assume an idealized mathemati…

    Submitted 15 July, 2025; originally announced July 2025.

  47. arXiv:2507.11097  [pdf, ps, other]

    cs.CL

    The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs

    Authors: Zichen Wen, Jiashu Qu, Dongrui Liu, Zhiyuan Liu, Ruixi Wu, Yicun Yang, Xiangqi Jin, Haoyun Xu, Xuyang Liu, Weijia Li, Chaochao Lu, Jing Shao, Conghui He, Linfeng Zhang

    Abstract: Diffusion-based large language models (dLLMs) have recently emerged as a powerful alternative to autoregressive LLMs, offering faster inference and greater interactivity via parallel decoding and bidirectional modeling. However, despite strong performance in code generation and text infilling, we identify a fundamental safety concern: existing alignment mechanisms fail to safeguard dLLMs against c…

    Submitted 15 July, 2025; originally announced July 2025.

    Comments: 21 pages, 9 figures, work in progress

  48. arXiv:2507.10630  [pdf]

    cs.AI cs.SE

    Enhancing the Capabilities of Large Language Models for API calls through Knowledge Graphs

    Authors: Ye Yang, Xue Xiao, Ping Yin, Taotao Xie

    Abstract: API calls by large language models (LLMs) offer a cutting-edge approach for data analysis. However, their ability to effectively utilize tools via API calls remains underexplored in knowledge-intensive domains like meteorology. This paper introduces KG2data, a system that integrates knowledge graphs, LLMs, ReAct agents, and tool-use technologies to enable intelligent data acquisition and query han…

    Submitted 14 July, 2025; originally announced July 2025.

  49. arXiv:2507.10427  [pdf, ps, other]

    cs.HC cs.RO

    Towards Emotion Co-regulation with LLM-powered Socially Assistive Robots: Integrating LLM Prompts and Robotic Behaviors to Support Parent-Neurodivergent Child Dyads

    Authors: Jing Li, Felix Schijve, Sheng Li, Yuye Yang, Jun Hu, Emilia Barakova

    Abstract: Socially Assistive Robotics (SAR) has shown promise in supporting emotion regulation for neurodivergent children. Recently, there has been increasing interest in leveraging advanced technologies to assist parents in co-regulating emotions with their children. However, limited research has explored the integration of large language models (LLMs) with SAR to facilitate emotion co-regulation between…

    Submitted 14 July, 2025; originally announced July 2025.

    Comments: Submission for the IROS 2025 conference

  50. arXiv:2507.10313  [pdf]

    cs.SD eess.AS

    DQLoRA: A Lightweight Domain-Aware Denoising ASR via Adapter-guided Distillation

    Authors: Yiru Yang

    Abstract: We present a demo of DQLoRA, an Adapter-Guided Distillation framework for robust speech recognition under low-resource and noisy conditions. Our method employs a frozen Whisper model as the teacher to provide semantic supervision, and a lightweight Wav2Vec2 student equipped with QLoRA-based Adapters. Training is conducted on the FLEURS dataset augmented with DNS-style noise. The student is optimiz…

    Submitted 14 July, 2025; originally announced July 2025.