Showing 1–50 of 2,329 results for author: Xu, X

Searching in archive cs.

  1. arXiv:2507.17688  [pdf, ps, other]

    cs.HC cs.LG

    Mindfulness Meditation and Respiration: Accelerometer-Based Respiration Rate and Mindfulness Progress Estimation to Enhance App Engagement and Mindfulness Skills

    Authors: Mohammad Nur Hossain Khan, David Creswell, Jordan Albert, Patrick O'Connell, Shawn Fallon, Mathew Polowitz, Xuhai "Orson" Xu, Bashima Islam

    Abstract: Mindfulness training is widely recognized for its benefits in reducing depression, anxiety, and loneliness. With the rise of smartphone-based mindfulness apps, digital meditation has become more accessible, but sustaining long-term user engagement remains a challenge. This paper explores whether respiration biosignal feedback and mindfulness skill estimation enhance system usability and skill deve…

    Submitted 23 July, 2025; originally announced July 2025.

    Comments: Accepted in Proc. ACM Interact. Mob. Wearable Ubiquitous Technology (IMWUT)

  2. arXiv:2507.17577  [pdf, ps, other]

    cs.CV cs.CR cs.LG

    Boosting Ray Search Procedure of Hard-label Attacks with Transfer-based Priors

    Authors: Chen Ma, Xinjie Xu, Shuyu Cheng, Qi Xuan

    Abstract: One of the most practical and challenging types of black-box adversarial attacks is the hard-label attack, where only the top-1 predicted label is available. One effective approach is to search for the optimal ray direction from the benign image that minimizes the $\ell_p$-norm distance to the adversarial region. The unique advantage of this approach is that it transforms the hard-label attack int…

    Submitted 23 July, 2025; originally announced July 2025.

    Comments: Published at ICLR 2025 (Spotlight paper)

    ACM Class: I.2.6; I.5.1; G.1.6

  3. arXiv:2507.17554  [pdf, ps, other]

    cs.CV

    An h-space Based Adversarial Attack for Protection Against Few-shot Personalization

    Authors: Xide Xu, Sandesh Kamath, Muhammad Atif Butt, Bogdan Raducanu

    Abstract: The versatility of diffusion models in generating customized images from few samples raises significant privacy concerns, particularly regarding unauthorized modifications of private content. This concerning issue has renewed the efforts in developing protection mechanisms based on adversarial attacks, which generate effective perturbations to poison diffusion models. Our work is motivated by the…

    Submitted 23 July, 2025; originally announced July 2025.

    Comments: 32 pages, 15 figures. Accepted by ACM Multimedia 2025

  4. Exploring Spatial Diversity for Region-based Active Learning

    Authors: Lile Cai, Xun Xu, Lining Zhang, Chuan-Sheng Foo

    Abstract: State-of-the-art methods for semantic segmentation are based on deep neural networks trained on large-scale labeled datasets. Acquiring such datasets would incur large annotation costs, especially for dense pixel-level prediction tasks like semantic segmentation. We consider region-based active learning as a strategy to reduce annotation costs while maintaining high performance. In this setting, b…

    Submitted 23 July, 2025; originally announced July 2025.

    Comments: published in IEEE Transactions on Image Processing, 2021

  5. Exploring Active Learning for Semiconductor Defect Segmentation

    Authors: Lile Cai, Ramanpreet Singh Pahwa, Xun Xu, Jie Wang, Richard Chang, Lining Zhang, Chuan-Sheng Foo

    Abstract: The development of X-Ray microscopy (XRM) technology has enabled non-destructive inspection of semiconductor structures for defect identification. Deep learning is widely used as the state-of-the-art approach to perform visual analysis tasks. However, deep-learning-based models require large amounts of annotated data to train. Such data can be time-consuming and expensive to obtain, especially for dense…

    Submitted 23 July, 2025; originally announced July 2025.

    Comments: accepted to ICIP 2022

  6. arXiv:2507.17006  [pdf, ps, other]

    quant-ph cs.CR math-ph

    Quantitative Quantum Soundness for Bipartite Compiled Bell Games via the Sequential NPA Hierarchy

    Authors: Igor Klep, Connor Paddock, Marc-Olivier Renou, Simon Schmidt, Lucas Tendick, Xiangling Xu, Yuming Zhao

    Abstract: Compiling Bell games under cryptographic assumptions replaces the need for physical separation, allowing nonlocality to be probed with a single untrusted device. While Kalai et al. (STOC'23) showed that this compilation preserves quantum advantages, its quantitative quantum soundness has remained an open problem. We address this gap with two primary contributions. First, we establish the first qua…

    Submitted 22 July, 2025; originally announced July 2025.

    Comments: 41 pages, 1 figure; comments welcome. We refer to Cui, Falor, Natarajan, and Zhang for an independent parallel work on the same topic

  7. arXiv:2507.16779  [pdf, ps, other]

    eess.IV cs.CV

    Improving U-Net Confidence on TEM Image Data with L2-Regularization, Transfer Learning, and Deep Fine-Tuning

    Authors: Aiden Ochoa, Xinyuan Xu, Xing Wang

    Abstract: With ever-increasing data volumes, it is essential to develop automated approaches for identifying nanoscale defects in transmission electron microscopy (TEM) images. However, compared to features in conventional photographs, nanoscale defects in TEM images exhibit far greater variation due to the complex contrast mechanisms and intricate defect structures. These challenges often result in much le…

    Submitted 22 July, 2025; originally announced July 2025.

    Comments: Accepted into the ICCV 2025 CV4MS Workshop

  8. arXiv:2507.16389  [pdf, ps, other]

    cs.CV cs.AI

    From Flat to Round: Redefining Brain Decoding with Surface-Based fMRI and Cortex Structure

    Authors: Sijin Yu, Zijiao Chen, Wenxuan Wu, Shengxian Chen, Zhongliang Liu, Jingxin Nie, Xiaofen Xing, Xiangmin Xu, Xin Zhang

    Abstract: Reconstructing visual stimuli from human brain activity (e.g., fMRI) bridges neuroscience and computer vision by decoding neural representations. However, existing methods often overlook critical brain structure-function relationships, flattening spatial information and neglecting individual anatomical variations. To address these issues, we propose (1) a novel sphere tokenizer that explicitly mod…

    Submitted 22 July, 2025; originally announced July 2025.

    Comments: 18 pages, 14 figures, ICCV Findings 2025

  9. arXiv:2507.16331  [pdf, ps, other]

    cs.CL

    Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny

    Authors: Chuanhao Yan, Fengdi Che, Xuhan Huang, Xu Xu, Xin Li, Yizhi Li, Xingwei Qu, Jingzhe Shi, Zhuangzhuang He, Chenghua Lin, Yaodong Yang, Binhang Yuan, Hang Zhao, Yu Qiao, Bowen Zhou, Jie Fu

    Abstract: Existing informal language-based (e.g., human language) Large Language Models (LLMs) trained with Reinforcement Learning (RL) face a significant challenge: their verification processes, which provide crucial training signals, are neither reliable nor scalable. In fact, the prevalent large proprietary models could hardly generate verifiable programs. A promising yet largely uncharted alternative is…

    Submitted 22 July, 2025; originally announced July 2025.

  10. arXiv:2507.16238  [pdf, ps, other]

    cs.CV

    Positive Style Accumulation: A Style Screening and Continuous Utilization Framework for Federated DG-ReID

    Authors: Xin Xu, Chaoyue Ren, Wei Liu, Wenke Huang, Bin Yang, Zhixi Yu, Kui Jiang

    Abstract: The Federated Domain Generalization for Person re-identification (FedDG-ReID) aims to learn a global server model that can be effectively generalized to source and target domains through distributed source domain data. Existing methods mainly improve the diversity of samples through style transformation, which to some extent enhances the generalization performance of the model. However, we discove…

    Submitted 22 July, 2025; originally announced July 2025.

    Comments: 10 pages, 3 figures, accepted at ACM MM 2025, Submission ID: 4394

    ACM Class: I.4.9; I.2.10

  11. arXiv:2507.15770  [pdf, ps, other]

    cs.AI

    A Framework for Analyzing Abnormal Emergence in Service Ecosystems Through LLM-based Agent Intention Mining

    Authors: Yifan Shen, Zihan Zhao, Xiao Xue, Yuwei Guo, Qun Ma, Deyu Zhou, Ming Zhang

    Abstract: With the rise of service computing, cloud computing, and IoT, service ecosystems are becoming increasingly complex. The intricate interactions among intelligent agents make abnormal emergence analysis challenging, as traditional causal methods focus on individual trajectories. Large language models offer new possibilities for Agent-Based Modeling (ABM) through Chain-of-Thought (CoT) reasoning to r…

    Submitted 21 July, 2025; originally announced July 2025.

  12. arXiv:2507.15205  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation

    Authors: Xinran Li, Xiujuan Xu, Jiaqi Qiao

    Abstract: Emotion Recognition in Conversation (ERC) is a practical and challenging task. This paper proposes a novel multimodal approach, the Long-Short Distance Graph Neural Network (LSDGNN). Based on the Directed Acyclic Graph (DAG), it constructs a long-distance graph neural network and a short-distance graph neural network to obtain multimodal features of distant and nearby utterances, respectively. To…

    Submitted 20 July, 2025; originally announced July 2025.

    Comments: Accepted by the 28th European Conference on Artificial Intelligence (ECAI 2025)

  13. arXiv:2507.15106  [pdf, ps, other]

    cs.AI cs.RO

    From Kicking to Causality: Simulating Infant Agency Detection with a Robust Intrinsic Reward

    Authors: Xia Xu, Jochen Triesch

    Abstract: While human infants robustly discover their own causal efficacy, standard reinforcement learning agents remain brittle, as their reliance on correlation-based rewards fails in noisy, ecologically valid scenarios. To address this, we introduce the Causal Action Influence Score (CAIS), a novel intrinsic reward rooted in causal inference. CAIS quantifies an action's influence by measuring the 1-Wasse…

    Submitted 20 July, 2025; originally announced July 2025.

    Comments: 13 pages, 5 figures

    MSC Class: F.2.2

  14. PaperBridge: Crafting Research Narratives through Human-AI Co-Exploration

    Authors: Runhua Zhang, Yang Ouyang, Leixian Shen, Yuying Tang, Xiaojuan Ma, Huamin Qu, Xian Xu

    Abstract: Researchers frequently need to synthesize their own publications into coherent narratives that demonstrate their scholarly contributions. To suit diverse communication contexts, exploring alternative ways to organize one's work while maintaining coherence is particularly challenging, especially in interdisciplinary fields like HCI where individual researchers' publications may span diverse domains…

    Submitted 19 July, 2025; originally announced July 2025.

    Comments: Conditionally accepted by UIST'25

  15. arXiv:2507.14503  [pdf, ps, other]

    cs.LG cs.CV

    Generative Distribution Distillation

    Authors: Jiequan Cui, Beier Zhu, Qingshan Xu, Xiaogang Xu, Pengguang Chen, Xiaojuan Qi, Bei Yu, Hanwang Zhang, Richang Hong

    Abstract: In this paper, we formulate the knowledge distillation (KD) as a conditional generative problem and propose the Generative Distribution Distillation (GenDD) framework. A naive GenDD baseline encounters two major challenges: the curse of high-dimensional optimization and the lack of semantic supervision from labels. To address these issues, we introduce a Split Tokenizatio…

    Submitted 19 July, 2025; originally announced July 2025.

    Comments: Technical report

  16. arXiv:2507.13359  [pdf, ps, other]

    cs.CV

    Open-Vocabulary Object Detection in UAV Imagery: A Review and Future Perspectives

    Authors: Yang Zhou, Junjie Li, CongYang Ou, Dawei Yan, Haokui Zhang, Xizhe Xue

    Abstract: Due to its extensive applications, aerial image object detection has long been a hot topic in computer vision. In recent years, advancements in Unmanned Aerial Vehicles (UAV) technology have further propelled this field to new heights, giving rise to a broader range of application requirements. However, traditional UAV aerial object detection methods primarily focus on detecting predefined categor…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: 27 pages, 5 figures

  17. arXiv:2507.13123  [pdf, ps, other]

    cs.SE

    Detecting LLM-generated Code with Subtle Modification by Adversarial Training

    Authors: Xin Yin, Xinrui Li, Chao Ni, Xiaodan Xu, Xiaohu Yang

    Abstract: With the rapid development of Large Language Models (LLMs), their powerful code-generation capabilities have been widely applied in tasks like code completion and automated development, demonstrating the value of improving coding efficiency. However, the extensive use of LLM-generated code also raises several new challenges. On the one hand, issues such as the regulation of code provenance, copyri…

    Submitted 17 July, 2025; originally announced July 2025.

  18. arXiv:2507.12952  [pdf, ps, other]

    cs.CV

    LoViC: Efficient Long Video Generation with Context Compression

    Authors: Jiaxiu Jiang, Wenbo Li, Jingjing Ren, Yuping Qiu, Yong Guo, Xiaogang Xu, Han Wu, Wangmeng Zuo

    Abstract: Despite recent advances in diffusion transformers (DiTs) for text-to-video generation, scaling to long-duration content remains challenging due to the quadratic complexity of self-attention. While prior efforts -- such as sparse attention and temporally autoregressive models -- offer partial relief, they often compromise temporal coherence or scalability. We introduce LoViC, a DiT-based framework…

    Submitted 17 July, 2025; originally announced July 2025.

    Comments: Project page: https://jiangjiaxiu.github.io/lovic/

  19. arXiv:2507.11761  [pdf, ps, other]

    cs.CV cs.AI

    Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning

    Authors: Fan Shi, Bin Li, Xiangyang Xue

    Abstract: Abstract visual reasoning (AVR) enables humans to quickly discover and generalize abstract rules to new scenarios. Designing intelligent systems with human-like AVR abilities has been a long-standing topic in the artificial intelligence community. Deep AVR solvers have recently achieved remarkable success in various AVR tasks. However, they usually use task-specific designs or parameters in differ…

    Submitted 15 July, 2025; originally announced July 2025.

  20. arXiv:2507.11261  [pdf, ps, other]

    cs.CV

    ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition

    Authors: Ronggang Huang, Haoxin Yang, Yan Cai, Xuemiao Xu, Huaidong Zhang, Shengfeng He

    Abstract: 3D visual grounding aims to identify and localize objects in a 3D space based on textual descriptions. However, existing methods struggle with disentangling targets from anchors in complex multi-anchor queries and resolving inconsistencies in spatial descriptions caused by perspective variations. To tackle these challenges, we propose ViewSRD, a framework that formulates 3D visual grounding as a s…

    Submitted 15 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025

  21. arXiv:2507.09094  [pdf, ps, other]

    cs.NI eess.SP

    Transformer based Collaborative Reinforcement Learning for Fluid Antenna System (FAS)-enabled 3D UAV Positioning

    Authors: Xiaoren Xu, Hao Xu, Dongyu Wei, Walid Saad, Mehdi Bennis, Mingzhe Chen

    Abstract: In this paper, a novel three-dimensional (3D) positioning framework for fluid antenna system (FAS)-enabled unmanned aerial vehicles (UAVs) is developed. In the proposed framework, a set of controlled UAVs cooperatively estimate the real-time 3D position of a target UAV. Here, the active UAV transmits a measurement signal to the passive UAVs via the reflection from the target UAV. Each passive UAV e…

    Submitted 11 July, 2025; originally announced July 2025.

  22. arXiv:2507.09076  [pdf, ps, other]

    cs.CL cs.AI

    Dynamic Parameter Memory: Temporary LoRA-Enhanced LLM for Long-Sequence Emotion Recognition in Conversation

    Authors: Jialong Mai, Xiaofen Xing, Yawei Li, Zhipeng Li, Jingyuan Xing, Xiangmin Xu

    Abstract: Recent research has focused on applying speech large language models (SLLMs) to improve speech emotion recognition (SER). However, the inherently high frame rate of the speech modality severely limits the signal processing and understanding capabilities of SLLMs. For example, an SLLM with a 4K context window can only process 80 seconds of audio at a 50 Hz feature sampling rate before reaching its capacity li…

    Submitted 11 July, 2025; originally announced July 2025.

    Comments: submitted to EMNLP 2025

    MSC Class: 68T50; ACM Class: I.2.7; H.5.2

  23. arXiv:2507.08648  [pdf, ps, other]

    cs.CV cs.AI

    DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images

    Authors: Haoran Sun, Haoyu Bian, Shaoning Zeng, Yunbo Rao, Xu Xu, Lin Mei, Jianping Gou

    Abstract: Common knowledge indicates that the process of constructing image datasets usually depends on the time-intensive and inefficient method of manual collection and annotation. Large models offer a solution via data generation. Nonetheless, real-world data are obviously more valuable than artificial-intelligence-generated data, particularly in constructing image datasets. For this reason, we…

    Submitted 11 July, 2025; originally announced July 2025.

  24. arXiv:2507.08403  [pdf, ps, other]

    cs.NI cs.AI cs.DC cs.LG eess.SY

    Towards AI-Native RAN: An Operator's Perspective of 6G Day 1 Standardization

    Authors: Nan Li, Qi Sun, Lehan Wang, Xiaofei Xu, Jinri Huang, Chunhui Liu, Jing Gao, Yuhong Huang, Chih-Lin I

    Abstract: Artificial Intelligence/Machine Learning (AI/ML) has become the most certain and prominent feature of 6G mobile networks. Unlike 5G, where AI/ML was not natively integrated but rather an add-on feature over existing architecture, 6G shall incorporate AI from the onset to address its complexity and support ubiquitous AI applications. Based on our extensive mobile network operation and standardizati…

    Submitted 11 July, 2025; originally announced July 2025.

  25. arXiv:2507.07323  [pdf, ps, other]

    cs.LG

    Optimizing Model Splitting and Device Task Assignment for Deceptive Signal Assisted Private Multi-hop Split Learning

    Authors: Dongyu Wei, Xiaoren Xu, Yuchen Liu, H. Vincent Poor, Mingzhe Chen

    Abstract: In this paper, deceptive signal-assisted private split learning is investigated. In our model, several edge devices jointly perform collaborative training, and some eavesdroppers aim to collect the model and data information from devices. To prevent the eavesdroppers from collecting model and data information, a subset of devices can transmit deceptive signals. Therefore, it is necessary to determ…

    Submitted 9 July, 2025; originally announced July 2025.

  26. arXiv:2507.07320  [pdf, ps, other]

    cs.LG

    Optimizing Communication and Device Clustering for Clustered Federated Learning with Differential Privacy

    Authors: Dongyu Wei, Xiaoren Xu, Shiwen Mao, Mingzhe Chen

    Abstract: In this paper, a secure and communication-efficient clustered federated learning (CFL) design is proposed. In our model, several base stations (BSs) with heterogeneous task-handling capabilities and multiple users with non-independent and identically distributed (non-IID) data jointly perform CFL training incorporating differential privacy (DP) techniques. Since each BS can process only a subset o…

    Submitted 9 July, 2025; originally announced July 2025.

  27. arXiv:2507.07257  [pdf, ps, other]

    cs.AI astro-ph.IM cs.CL cs.MA

    Open Source Planning & Control System with Language Agents for Autonomous Scientific Discovery

    Authors: Licong Xu, Milind Sarkar, Anto I. Lonappan, Íñigo Zubeldia, Pablo Villanueva-Domingo, Santiago Casas, Christian Fidler, Chetana Amancharla, Ujjwal Tiwari, Adrian Bayer, Chadi Ait Ekioui, Miles Cranmer, Adrian Dimitrov, James Fergusson, Kahaan Gandhi, Sven Krippendorf, Andrew Laverick, Julien Lesgourgues, Antony Lewis, Thomas Meier, Blake Sherwin, Kristen Surrao, Francisco Villaescusa-Navarro, Chi Wang, Xueqing Xu, et al. (1 additional author not shown)

    Abstract: We present a multi-agent system for automation of scientific research tasks, cmbagent (https://github.com/CMBAgents/cmbagent). The system is formed by about 30 Large Language Model (LLM) agents and implements a Planning & Control strategy to orchestrate the agentic workflow, with no human-in-the-loop at any point. Each agent specializes in a different task (performing retrieval on scientific paper…

    Submitted 11 July, 2025; v1 submitted 9 July, 2025; originally announced July 2025.

    Comments: Accepted contribution to the ICML 2025 Workshop on Machine Learning for Astrophysics. Code: https://github.com/CMBAgents/cmbagent Videos: https://www.youtube.com/@cmbagent HuggingFace: https://huggingface.co/spaces/astropilot-ai/cmbagent Cloud: https://cmbagent.cloud

  28. arXiv:2507.07155  [pdf, ps, other]

    astro-ph.IM astro-ph.CO cs.AI

    Evaluating Retrieval-Augmented Generation Agents for Autonomous Scientific Discovery in Astrophysics

    Authors: Xueqing Xu, Boris Bolliet, Adrian Dimitrov, Andrew Laverick, Francisco Villaescusa-Navarro, Licong Xu, Íñigo Zubeldia

    Abstract: We evaluate 9 Retrieval Augmented Generation (RAG) agent configurations on 105 Cosmology Question-Answer (QA) pairs that we built specifically for this purpose. The RAG configurations are manually evaluated by a human expert, that is, a total of 945 generated answers were assessed. We find that currently the best RAG agent configuration is with OpenAI embedding and generative model, yielding 91.4\%…

    Submitted 9 July, 2025; originally announced July 2025.

    Comments: Accepted contribution (spotlight) to the ICML 2025 Workshop on Machine Learning for Astrophysics; codes: https://huggingface.co/datasets/ASTROANTS/CosmoPaperQA, https://github.com/CMBAgents/cmbagent, https://github.com/CMBAgents/scirag

  29. arXiv:2507.06719  [pdf, ps, other]

    cs.CV cs.RO

    A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding

    Authors: Zhenyang Liu, Sixiao Zheng, Siyu Chen, Cairong Zhao, Longfei Liang, Xiangyang Xue, Yanwei Fu

    Abstract: Open-vocabulary 3D visual grounding aims to localize target objects based on free-form language queries, which is crucial for embodied AI applications such as autonomous navigation, robotics, and augmented reality. Learning 3D language fields through neural representations enables accurate understanding of 3D scenes from limited viewpoints and facilitates the localization of target objects in comp…

    Submitted 9 July, 2025; originally announced July 2025.

  30. arXiv:2507.06710  [pdf, ps, other]

    cs.RO

    Spatial-Temporal Aware Visuomotor Diffusion Policy Learning

    Authors: Zhenyang Liu, Yikai Wang, Kuanning Wang, Longfei Liang, Xiangyang Xue, Yanwei Fu

    Abstract: Visual imitation learning is effective for robots to learn versatile tasks. However, many existing methods rely on behavior cloning with supervised historical trajectories, limiting their 3D spatial and 4D spatiotemporal awareness. Consequently, these methods struggle to capture the 3D structures and 4D spatiotemporal relationships necessary for real-world deployment. In this work, we propose 4D D…

    Submitted 13 July, 2025; v1 submitted 9 July, 2025; originally announced July 2025.

  31. arXiv:2507.06510  [pdf, ps, other]

    cs.CV

    Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection

    Authors: Yupeng Hu, Changxing Ding, Chang Sun, Shaoli Huang, Xiangmin Xu

    Abstract: Open vocabulary Human-Object Interaction (HOI) detection is a challenging task that detects all <human, verb, object> triplets of interest in an image, even those that are not pre-defined in the training set. Existing approaches typically rely on output features generated by large Vision-Language Models (VLMs) to enhance the generalization ability of interaction representations. However, the visua…

    Submitted 8 July, 2025; originally announced July 2025.

    Comments: ICCV 2025

  32. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3284 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 22 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  33. arXiv:2507.05577  [pdf, ps, other]

    cs.IR cs.CL cs.LG

    Beyond Retrieval: Ensembling Cross-Encoders and GPT Rerankers with LLMs for Biomedical QA

    Authors: Shashank Verma, Fengyi Jiang, Xiangning Xue

    Abstract: Biomedical semantic question answering rooted in information retrieval can play a crucial role in keeping up to date with vast, rapidly evolving and ever-growing biomedical literature. A robust system can help researchers, healthcare professionals and even layman users access relevant knowledge grounded in evidence. The BioASQ 2025 Task13b Challenge serves as an important benchmark, offering a com…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: Paper submitted to CLEF 2025 CEUR-WS

  34. arXiv:2507.05268  [pdf, ps, other]

    q-bio.NC cs.CV eess.SY

    Cross-Subject DD: A Cross-Subject Brain-Computer Interface Algorithm

    Authors: Xiaoyuan Li, Xinru Xue, Bohan Zhang, Ye Sun, Shoushuo Xi, Gang Liu

    Abstract: Brain-computer interface (BCI) based on motor imagery (MI) enables direct control of external devices by decoding the electroencephalogram (EEG) generated in the brain during imagined movements. However, due to inter-individual variability in brain activity, existing BCI models exhibit poor adaptability across subjects, thereby limiting their generalizability and widespread application. To address…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 20 pages, 9 figures

  35. arXiv:2507.05260  [pdf, ps, other]

    cs.CV cs.LG cs.RO

    Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations

    Authors: Xiang Xu, Lingdong Kong, Song Wang, Chuanwei Zhou, Qingshan Liu

    Abstract: LiDAR representation learning aims to extract rich structural and semantic information from large-scale, readily available datasets, reducing reliance on costly human annotations. However, existing LiDAR representation strategies often overlook the inherent spatiotemporal cues in LiDAR sequences, limiting their effectiveness. In this work, we propose LiMA, a novel long-term image-to-LiDAR Memory A…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: ICCV 2025; 26 pages, 12 figures, 10 tables; Code at http://github.com/Xiangxu-0103/LiMA

  36. arXiv:2507.04451  [pdf, ps, other]

    cs.CV

    CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step

    Authors: Zheyuan Liu, Munan Ning, Qihui Zhang, Shuo Yang, Zhongrui Wang, Yiwei Yang, Xianzhe Xu, Yibing Song, Weihua Chen, Fan Wang, Li Yuan

    Abstract: Current text-to-image (T2I) generation models struggle to align spatial composition with the input text, especially in complex scenes. Even layout-based approaches yield suboptimal spatial control, as their generation process is decoupled from layout planning, making it difficult to refine the layout during synthesis. We present CoT-Diff, a framework that brings step-by-step CoT-style reasoning in…

    Submitted 6 July, 2025; originally announced July 2025.

  37. arXiv:2507.03407  [pdf]

    cs.AI q-bio.QM

    Artificial intelligence in drug discovery: A comprehensive review with a case study on hyperuricemia, gout arthritis, and hyperuricemic nephropathy

    Authors: Junwei Su, Cheng Xin, Ao Shang, Shan Wu, Zhenzhen Xie, Ruogu Xiong, Xiaoyu Xu, Cheng Zhang, Guang Chen, Yau-Tuen Chan, Guoyi Tang, Ning Wang, Yong Xu, Yibin Feng

    Abstract: This paper systematically reviews recent advances in artificial intelligence (AI), with a particular focus on machine learning (ML), across the entire drug discovery pipeline. Due to the inherent complexity, escalating costs, prolonged timelines, and high failure rates of traditional drug discovery methods, there is a critical need to comprehensively understand how AI/ML can be effectively integra…

    Submitted 4 July, 2025; originally announced July 2025.

  38. arXiv:2507.02479  [pdf, ps, other]

    cs.CV cs.AI

    CrowdTrack: A Benchmark for Difficult Multiple Pedestrian Tracking in Real Scenarios

    Authors: Teng Fu, Yuwen Chen, Zhuofan Chen, Mengyang Zhao, Bin Li, Xiangyang Xue

    Abstract: Multi-object tracking is a classic field in computer vision. Among them, pedestrian tracking has extremely high application value and has become the most popular research category. Existing methods mainly use motion or appearance information for tracking, which is often difficult in complex scenarios. For the motion information, mutual occlusions between objects often prevent updating of the motio…

    Submitted 3 July, 2025; originally announced July 2025.

  39. arXiv:2507.02373  [pdf, ps, other]

    cs.CV

    UVLM: Benchmarking Video Language Model for Underwater World Understanding

    Authors: Xizhe Xue, Yang Zhou, Dawei Yan, Ying Li, Haokui Zhang, Rong Xiao

    Abstract: Recently, the remarkable success of large language models (LLMs) has achieved a profound impact on the field of artificial intelligence. Numerous advanced works based on LLMs have been proposed and applied in various scenarios. Among them, video language models (VidLMs) are particularly widely used. However, existing works primarily focus on terrestrial scenarios, overlooking the highly demanding…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 13 pages, 4 figures, 3 tables

  40. arXiv:2507.02057  [pdf, ps, other]

    cs.CR cs.AI

    MGC: A Compiler Framework Exploiting Compositional Blindness in Aligned LLMs for Malware Generation

    Authors: Lu Yan, Zhuo Zhang, Xiangzhe Xu, Shengwei An, Guangyu Shen, Zhou Xuan, Xuan Chen, Xiangyu Zhang

    Abstract: Large language models (LLMs) have democratized software development, reducing the expertise barrier for programming complex applications. This accessibility extends to malicious software development, raising significant security concerns. While LLM providers have implemented alignment mechanisms to prevent direct generation of overtly malicious code, these safeguards predominantly evaluate individ…

    Submitted 2 July, 2025; originally announced July 2025.

  41. arXiv:2507.02029  [pdf, ps, other]

    cs.RO

    RoboBrain 2.0 Technical Report

    Authors: BAAI RoboBrain Team, Mingyu Cao, Huajie Tan, Yuheng Ji, Minglan Lin, Zhiyu Li, Zhou Cao, Pengwei Wang, Enshen Zhou, Yi Han, Yingbo Tang, Xiangqi Xu, Wei Guo, Yaoxu Lyu, Yijie Xu, Jiayu Shi, Mengfei Du, Cheng Chi, Mengdi Zhao, Xiaoshuai Hao, Junkai Zhao, Xiaojie Zhang, Shanyu Rong, Huaihai Lyu, Zhengliang Cai, et al. (27 additional authors not shown)

    Abstract: We introduce RoboBrain 2.0, our latest generation of embodied vision-language foundation models, designed to unify perception, reasoning, and planning for complex embodied tasks in physical environments. It comes in two variants: a lightweight 7B model and a full-scale 32B model, featuring a heterogeneous architecture with a vision encoder and a language model. Despite its compact size, RoboBrain…

    Submitted 14 July, 2025; v1 submitted 2 July, 2025; originally announced July 2025.

  42. arXiv:2507.01485  [pdf, ps, other]

    cs.RO cs.AI cs.MA q-bio.QM

    BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments

    Authors: Yibo Qiu, Zan Huang, Zhiyu Wang, Handi Liu, Yiling Qiao, Yifeng Hu, Shu'ang Sun, Hangke Peng, Ronald X Xu, Mingzhai Sun

    Abstract: Large language models (LLMs) and vision-language models (VLMs) have the potential to transform biological research by enabling autonomous experimentation. Yet, their application remains constrained by rigid protocol design, limited adaptability to dynamic lab conditions, inadequate error handling, and high operational complexity. Here we introduce BioMARS (Biological Multi-Agent Robotic System), a…

    Submitted 2 July, 2025; originally announced July 2025.

  43. arXiv:2507.01424  [pdf, ps, other]

    cs.RO

    TriVLA: A Triple-System-Based Unified Vision-Language-Action Model for General Robot Control

    Authors: Zhenyang Liu, Yongchong Gu, Sixiao Zheng, Xiangyang Xue, Yanwei Fu

    Abstract: Recent advancements in vision-language models (VLMs) for common-sense reasoning have led to the development of vision-language-action (VLA) models, enabling robots to perform generalized manipulation. Although existing autoregressive VLA methods design a specific architecture like dual-system to leverage large-scale pretrained knowledge, they tend to capture static information, often neglecting th…

    Submitted 3 July, 2025; v1 submitted 2 July, 2025; originally announced July 2025.

  44. arXiv:2507.01376  [pdf, ps, other]

    cs.AI

    AI Agents and Agentic AI-Navigating a Plethora of Concepts for Future Manufacturing

    Authors: Yinwang Ren, Yangyang Liu, Tang Ji, Xun Xu

    Abstract: AI agents are autonomous systems designed to perceive, reason, and act within dynamic environments. With the rapid advancements in generative AI (GenAI), large language models (LLMs) and multimodal large language models (MLLMs) have significantly improved AI agents' capabilities in semantic comprehension, complex reasoning, and autonomous decision-making. At the same time, the rise of Agentic AI h…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Submitted to JMS (March 2025)

  45. arXiv:2507.00432  [pdf, ps, other]

    cs.AI cs.CL

    Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

    Authors: Maggie Huan, Yuetai Li, Tuney Zheng, Xiaoyu Xu, Seungone Kim, Minxin Du, Radha Poovendran, Graham Neubig, Xiang Yue

    Abstract: Math reasoning has become the poster child of progress in large language models (LLMs), with new models rapidly surpassing human-level performance on benchmarks like MATH and AIME. But as math leaderboards improve week by week, it is worth asking: do these gains reflect broader problem-solving ability or just narrow overfitting? To answer this question, we evaluate over 20 open-weight reasoning-tu…

    Submitted 1 July, 2025; originally announced July 2025.

  46. arXiv:2507.00401  [pdf, ps, other]

    cs.CV cs.LG

    Few-shot Classification as Multi-instance Verification: Effective Backbone-agnostic Transfer across Domains

    Authors: Xin Xu, Eibe Frank, Geoffrey Holmes

    Abstract: We investigate cross-domain few-shot learning under the constraint that fine-tuning of backbones (i.e., feature extractors) is impossible or infeasible -- a scenario that is increasingly common in practical use cases. Handling the low-quality and static embeddings produced by frozen, "black-box" backbones leads to a problem representation of few-shot classification as a series of multiple instance…

    Submitted 30 June, 2025; originally announced July 2025.

  47. arXiv:2507.00389  [pdf, ps, other]

    cs.CL

    Causal Prompting for Implicit Sentiment Analysis with Large Language Models

    Authors: Jing Ren, Wenhao Zhou, Bowen Li, Mujie Liu, Nguyen Linh Dan Le, Jiade Cen, Liping Chen, Ziqi Xu, Xiwei Xu, Xiaodong Li

    Abstract: Implicit Sentiment Analysis (ISA) aims to infer sentiment that is implied rather than explicitly stated, requiring models to perform deeper reasoning over subtle contextual cues. While recent prompting-based methods using Large Language Models (LLMs) have shown promise in ISA, they often rely on majority voting over chain-of-thought (CoT) reasoning paths without evaluating their causal validity, m…

    Submitted 30 June, 2025; originally announced July 2025.

  48. arXiv:2507.00042  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Catastrophic Forgetting Mitigation via Discrepancy-Weighted Experience Replay

    Authors: Xinrun Xu, Jianwen Yang, Qiuhong Zhang, Zhanbiao Lian, Zhiming Ding, Shan Jiang

    Abstract: Continually adapting edge models in cloud-edge collaborative object detection for traffic monitoring suffers from catastrophic forgetting, where models lose previously learned knowledge when adapting to new data distributions. This is especially problematic in dynamic traffic environments characterised by periodic variations (e.g., day/night, peak hours), where past knowledge remains valuable. Exi…

    Submitted 23 June, 2025; originally announced July 2025.

    Comments: ICANN 2025

  49. arXiv:2506.24005  [pdf, ps, other]

    cs.LG

    Provably Efficient and Agile Randomized Q-Learning

    Authors: He Wang, Xingyu Xu, Yuejie Chi

    Abstract: While Bayesian-based exploration often demonstrates superior empirical performance compared to bonus-based methods in model-based reinforcement learning (RL), its theoretical understanding remains limited for model-free settings. Existing provable algorithms either suffer from computational intractability or rely on stage-wise policy updates which reduce responsiveness and slow down the learning p…

    Submitted 30 June, 2025; originally announced June 2025.

  50. arXiv:2506.23490  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    UltraTwin: Towards Cardiac Anatomical Twin Generation from Multi-view 2D Ultrasound

    Authors: Junxuan Yu, Yaofei Duan, Yuhao Huang, Yu Wang, Rongbo Ling, Weihao Luo, Ang Zhang, Jingxian Xu, Qiongying Ni, Yongsong Zhou, Binghan Li, Haoran Dou, Liping Liu, Yanfen Chu, Feng Geng, Zhe Sheng, Zhifeng Ding, Dingxin Zhang, Rui Huang, Yuhang Zhang, Xiaowei Xu, Tao Tan, Dong Ni, Zhongshan Gou, Xin Yang

    Abstract: Echocardiography is routine for cardiac examination. However, 2D ultrasound (US) struggles with accurate metric calculation and direct observation of 3D cardiac structures. Moreover, 3D US is limited by low resolution, small field of view and scarce availability in practice. Constructing the cardiac anatomical twin from 2D images is promising to provide precise treatment planning and clinical quan…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Accepted by MICCAI 2025