Showing 1–50 of 617 results for author: Yan, X

Searching in archive cs.
  1. arXiv:2511.15550  [pdf, ps, other]

    cs.RO

    UltraDP: Generalizable Carotid Ultrasound Scanning with Force-Aware Diffusion Policy

    Authors: Ruoqu Chen, Xiangjie Yan, Kangchen Lv, Gao Huang, Zheng Li, Xiang Li

    Abstract: Ultrasound scanning is a critical imaging technique for real-time, non-invasive diagnostics. However, variations in patient anatomy and complex human-in-the-loop interactions pose significant challenges for autonomous robotic scanning. Existing ultrasound scanning robots are commonly limited to relatively low generalization and inefficient data utilization. To overcome these limitations, we presen…

    Submitted 19 November, 2025; originally announced November 2025.

  2. arXiv:2511.15203  [pdf, ps, other]

    cs.CR cs.AI

    Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks

    Authors: Zimo Ji, Xunguang Wang, Zongjie Li, Pingchuan Ma, Yudong Gao, Daoyuan Wu, Xincheng Yan, Tian Tian, Shuai Wang

    Abstract: Large Language Model (LLM)-based agents with function-calling capabilities are increasingly deployed, but remain vulnerable to Indirect Prompt Injection (IPI) attacks that hijack their tool calls. In response, numerous IPI-centric defense frameworks have emerged. However, these defenses are fragmented, lacking a unified taxonomy and comprehensive evaluation. In this Systematization of Knowledge (S…

    Submitted 19 November, 2025; originally announced November 2025.

  3. arXiv:2511.14227  [pdf, ps, other]

    cs.AI cs.LG

    DevPiolt: Operation Recommendation for IoT Devices at Xiaomi Home

    Authors: Yuxiang Wang, Siwen Wang, Haowei Han, Ao Wang, Boya Liu, Yong Zhao, Chengbo Wu, Bin Zhu, Bin Qin, Xiaokai Zhou, Xiao Yan, Jiawei Jiang, Bo Du

    Abstract: Operation recommendation for IoT devices refers to generating personalized device operations for users based on their context, such as historical operations, environment information, and device status. This task is crucial for enhancing user satisfaction and corporate profits. Existing recommendation models struggle with complex operation logic, diverse user preferences, and sensitive to suboptima…

    Submitted 18 November, 2025; originally announced November 2025.

  4. arXiv:2511.11587  [pdf]

    cs.HC cs.AI cs.CE cs.GR cs.MA

    MedBuild AI: An Agent-Based Hybrid Intelligence Framework for Reshaping Agency in Healthcare Infrastructure Planning through Generative Design for Medical Architecture

    Authors: Yiming Zhang, Yuejia Xu, Ziyao Wang, Xin Yan, Xiaosai Hao

    Abstract: Globally, disparities in healthcare infrastructure remain stark, leaving countless communities without access to even basic services. Traditional infrastructure planning is often slow and inaccessible, and although many architects are actively delivering humanitarian and aid-driven hospital projects worldwide, these vital efforts still fall far short of the sheer scale and urgency of demand. This…

    Submitted 18 November, 2025; v1 submitted 17 October, 2025; originally announced November 2025.

    Comments: 24 pages, 16 figures. Submitted to the IJAC Special Issue "Rebalance and Reciprocity"

    MSC Class: 68T07; 68T40 ACM Class: I.2.10; J.2

  5. arXiv:2511.11406  [pdf, ps, other]

    cs.CV

    Disentangling Emotional Bases and Transient Fluctuations: A Low-Rank Sparse Decomposition Approach for Video Affective Analysis

    Authors: Feng-Qi Cui, Jinyang Huang, Ziyu Jia, Xinyu Li, Xin Yan, Xiaokang Zhou, Meng Wang

    Abstract: Video-based Affective Computing (VAC), vital for emotion analysis and human-computer interaction, suffers from model instability and representational degradation due to complex emotional dynamics. Since the meaning of different emotional fluctuations may differ under different emotional contexts, the core limitation is the lack of a hierarchical structural mechanism to disentangle distinct affecti…

    Submitted 14 November, 2025; originally announced November 2025.

  6. arXiv:2511.11380  [pdf, ps, other]

    cs.LG

    When Genes Speak: A Semantic-Guided Framework for Spatially Resolved Transcriptomics Data Clustering

    Authors: Jiangkai Long, Yanran Zhu, Chang Tang, Kun Sun, Yuanyuan Liu, Xuesong Yan

    Abstract: Spatial transcriptomics enables gene expression profiling with spatial context, offering unprecedented insights into the tissue microenvironment. However, most computational models treat genes as isolated numerical features, ignoring the rich biological semantics encoded in their symbols. This prevents a truly deep understanding of critical biological characteristics. To overcome this limitation,…

    Submitted 14 November, 2025; originally announced November 2025.

    Comments: AAAI'2026 poster paper. 12 pages, 8 figures

  7. arXiv:2511.09262  [pdf, ps, other]

    cs.DB cs.DC

    CheetahGIS: Architecting a Scalable and Efficient Streaming Spatial Query Processing System

    Authors: Jiaping Cao, Ting Sun, Man Lung Yiu, Xiao Yan, Bo Tang

    Abstract: Spatial data analytics systems are widely studied in both academia and industry. However, existing systems are limited when handling a large number of moving objects and real-time spatial queries. In this work, we architect a scalable and efficient system, CheetahGIS, to process streaming spatial queries over massive moving objects. In particular, CheetahGIS is built upon Apache Flink Stateful F…

    Submitted 12 November, 2025; originally announced November 2025.

  8. arXiv:2511.08704  [pdf, ps, other]

    cs.CV cs.LG

    Rethinking generative image pretraining: How far are we from scaling up next-pixel prediction?

    Authors: Xinchen Yan, Chen Liang, Lijun Yu, Adams Wei Yu, Yifeng Lu, Quoc V. Le

    Abstract: This paper investigates the scaling properties of autoregressive next-pixel prediction, a simple, end-to-end yet under-explored framework for unified vision models. Starting with images at resolutions of 32x32, we train a family of Transformers using IsoFlops profiles across compute budgets up to 7e19 FLOPs and evaluate three distinct target metrics: next-pixel prediction objective, ImageNet class…

    Submitted 11 November, 2025; originally announced November 2025.

  9. arXiv:2511.06048  [pdf, ps, other]

    cs.CL cs.LG

    Visual Exploration of Feature Relationships in Sparse Autoencoders with Curated Concepts

    Authors: Xinyuan Yan, Shusen Liu, Kowshik Thopalli, Bei Wang

    Abstract: Sparse autoencoders (SAEs) have emerged as a powerful tool for uncovering interpretable features in large language models (LLMs) through the sparse directions they learn. However, the sheer number of extracted directions makes comprehensive exploration intractable. While conventional embedding techniques such as UMAP can reveal global structure, they suffer from limitations including high-dimensio…

    Submitted 8 November, 2025; originally announced November 2025.

    Comments: 8 pages (5 main paper + 3 references), 2 figures, published at the Mechanistic Interpretability Workshop at NeurIPS 2025

  10. arXiv:2511.03410  [pdf, ps, other]

    cs.CL

    Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG

    Authors: Longpeng Qiu, Ting Li, Shuai Mao, Nan Yang, Xiaohui Yan

    Abstract: Input errors in question-answering (QA) systems often lead to incorrect responses. Large language models (LLMs) struggle with this task, frequently failing to interpret user intent (misinterpretation) or unnecessarily altering the original question's structure (over-correction). We propose QuestionRAG, a framework that tackles these problems. To address misinterpretation, it enriches the input wit…

    Submitted 5 November, 2025; originally announced November 2025.

    Comments: EMNLP 2025 Industry Track

  11. arXiv:2510.27243  [pdf, ps, other]

    cs.DB

    Approximate Diverse $k$-nearest Neighbor Search in Vector Database

    Authors: Jiachen Zhao, Xiao Yan, Eric Lo

    Abstract: Approximate $k$-nearest neighbor search (A$k$-NNS) is a core operation in vector databases, underpinning applications such as retrieval-augmented generation (RAG) and image retrieval. In these scenarios, users often prefer diverse result sets to minimize redundancy and enhance information value. However, existing greedy-based diverse methods frequently yield sub-optimal results, failing to adequat…

    Submitted 31 October, 2025; originally announced October 2025.

  12. arXiv:2510.27141  [pdf, ps, other]

    cs.DB cs.IR

    Compass: General Filtered Search across Vector and Structured Data

    Authors: Chunxiao Ye, Xiao Yan, Eric Lo

    Abstract: The increasing prevalence of hybrid vector and relational data necessitates efficient, general support for queries that combine high-dimensional vector search with complex relational filtering. However, existing filtered search solutions are fundamentally limited by specialized indices, which restrict arbitrary filtering and hinder integration with general-purpose DBMSs. This work introduces \text…

    Submitted 11 November, 2025; v1 submitted 30 October, 2025; originally announced October 2025.

  13. arXiv:2510.27077  [pdf]

    cs.CL

    Contrastive Knowledge Transfer and Robust Optimization for Secure Alignment of Large Language Models

    Authors: Jiasen Zheng, Huajun Zhang, Xu Yan, Ran Hao, Chong Peng

    Abstract: This paper addresses the limitations of large-scale language models in safety alignment and robustness by proposing a fine-tuning method that combines contrastive distillation with noise-robust training. The method freezes the backbone model and transfers the knowledge boundaries of the teacher model to the student model through distillation, thereby improving semantic consistency and alignment ac…

    Submitted 30 October, 2025; originally announced October 2025.

  14. arXiv:2510.26996  [pdf, ps, other]

    cs.CV

    MoME: Mixture of Visual Language Medical Experts for Medical Imaging Segmentation

    Authors: Arghavan Rezvani, Xiangyi Yan, Anthony T. Wu, Kun Han, Pooya Khosravi, Xiaohui Xie

    Abstract: In this study, we propose MoME, a Mixture of Visual Language Medical Experts, for Medical Image Segmentation. MoME adapts the successful Mixture of Experts (MoE) paradigm, widely used in Large Language Models (LLMs), for medical vision-language tasks. The architecture enables dynamic expert selection by effectively utilizing multi-scale visual features tailored to the intricacies of medical imager…

    Submitted 30 October, 2025; originally announced October 2025.

  15. arXiv:2510.20952  [pdf, ps, other]

    cs.LG

    LLM-Integrated Bayesian State Space Models for Multimodal Time-Series Forecasting

    Authors: Sungjun Cho, Changho Shin, Suenggwan Jo, Xinya Yan, Shourjo Aditya Chaudhuri, Frederic Sala

    Abstract: Forecasting in the real world requires integrating structured time-series data with unstructured textual information, but existing methods are architecturally limited by fixed input/output horizons and are unable to model or quantify uncertainty. We address this challenge by introducing LLM-integrated Bayesian State space models (LBS), a novel probabilistic framework for multimodal temporal foreca…

    Submitted 23 October, 2025; originally announced October 2025.

    Comments: 15 pages, 8 figures

  16. arXiv:2510.20162  [pdf, ps, other]

    cs.CV

    TOMCAT: Test-time Comprehensive Knowledge Accumulation for Compositional Zero-Shot Learning

    Authors: Xudong Yan, Songhe Feng

    Abstract: Compositional Zero-Shot Learning (CZSL) aims to recognize novel attribute-object compositions based on the knowledge learned from seen ones. Existing methods suffer from performance degradation caused by the distribution shift of label space at test time, which stems from the inclusion of unseen compositions recombined from attributes and objects. To overcome the challenge, we propose a novel appr…

    Submitted 22 October, 2025; originally announced October 2025.

    Comments: Accepted to NeurIPS 2025

  17. arXiv:2510.18855  [pdf, ps, other]

    cs.CL cs.AI

    Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model

    Authors: Ling Team, Anqi Shen, Baihui Li, Bin Hu, Bin Jing, Cai Chen, Chao Huang, Chao Zhang, Chaokun Yang, Cheng Lin, Chengyao Wen, Congqi Li, Deng Zhao, Dingbo Yuan, Donghai You, Fagui Mao, Fanzhuang Meng, Feng Xu, Guojie Li, Guowei Wang, Hao Dai, Haonan Zheng, Hong Liu, Jia Guo, Jiaming Liu, et al. (79 additional authors not shown)

    Abstract: We present Ring-1T, the first open-source, state-of-the-art thinking model with a trillion-scale parameter. It features 1 trillion total parameters and activates approximately 50 billion per token. Training such models at a trillion-parameter scale introduces unprecedented challenges, including train-inference misalignment, inefficiencies in rollout processing, and bottlenecks in the RL system. To…

    Submitted 25 October, 2025; v1 submitted 21 October, 2025; originally announced October 2025.

    Comments: Technical Report

  18. arXiv:2510.18342  [pdf, ps, other]

    cs.AI

    ShortcutBreaker: Low-Rank Noisy Bottleneck with Global Perturbation Attention for Multi-Class Unsupervised Anomaly Detection

    Authors: Peng Tang, Xiaoxiao Yan, Xiaobin Hu, Yuning Cui, Donghao Luo, Jiangning Zhang, Pengcheng Xu, Jinlong Peng, Qingdong He, Feiyue Huang, Song Xue, Tobias Lasser

    Abstract: Multi-class unsupervised anomaly detection (MUAD) has garnered growing research interest, as it seeks to develop a unified model for anomaly detection across multiple classes, i.e., eliminating the need to train separate models for distinct objects and thereby saving substantial computational resources. Under the MUAD setting, while advanced Transformer-based architectures have brought significant…

    Submitted 21 October, 2025; originally announced October 2025.

    Comments: Under Review

  19. arXiv:2510.15162  [pdf, ps, other]

    cs.CV cs.CL

    Train a Unified Multimodal Data Quality Classifier with Synthetic Data

    Authors: Weizhi Wang, Rongmei Lin, Shiyang Li, Colin Lockard, Ritesh Sarkhel, Sanket Lokegaonkar, Jingbo Shang, Xifeng Yan, Nasser Zalmout, Xian Li

    Abstract: Multimodal Large Language Models (MLLMs) are continually pre-trained on a mixture of image-text caption data and interleaved document data, while high-quality data filtering for image-text interleaved document data is under-explored. We propose to train an efficient MLLM as a Unified Multimodal Data Quality Classifier to Filter both high-quality image-text caption and interleaved data…

    Submitted 16 October, 2025; originally announced October 2025.

    Comments: EMNLP 2025 Findings

  20. arXiv:2510.13139  [pdf, ps, other]

    cs.CY cs.CE cs.CL cs.MA

    Addressing the alignment problem in transportation policy making: an LLM approach

    Authors: Xiaoyu Yan, Tianxing Dai, Yu Marco Nie

    Abstract: A key challenge in transportation planning is that the collective preferences of heterogeneous travelers often diverge from the policies produced by model-driven decision tools. This misalignment frequently results in implementation delays or failures. Here, we investigate whether large language models (LLMs), noted for their capabilities in reasoning and simulating human decision-making, can help…

    Submitted 15 October, 2025; originally announced October 2025.

  21. arXiv:2510.12157  [pdf, ps, other]

    cs.LG

    Self-Verifying Reflection Helps Transformers with CoT Reasoning

    Authors: Zhongwei Yu, Wannian Xia, Xue Yan, Bo Xu, Haifeng Zhang, Yali Du, Jun Wang

    Abstract: Advanced large language models (LLMs) frequently reflect in reasoning chain-of-thoughts (CoTs), where they self-verify the correctness of current solutions and explore alternatives. However, given recent findings that LLMs detect limited errors in CoTs, how reflection contributes to empirical improvements remains unclear. To analyze this issue, in this paper, we present a minimalistic reasoning fr…

    Submitted 14 October, 2025; originally announced October 2025.

    Comments: Accepted by NeurIPS 2025

  22. arXiv:2510.11958  [pdf, ps, other]

    cs.CL cs.AI

    Direct Multi-Token Decoding

    Authors: Xuan Luo, Weizhi Wang, Xifeng Yan

    Abstract: Decoder-only transformers have become the standard architecture for large language models (LLMs) due to their strong performance. Recent studies suggest that, in pre-trained LLMs, early, middle, and late layers may serve distinct roles: Early layers focus on understanding the input context, middle layers handle task-specific processing, and late layers convert abstract representations into output…

    Submitted 13 October, 2025; originally announced October 2025.

  23. arXiv:2510.08521  [pdf, ps, other]

    cs.AI

    FlowSearch: Advancing deep research with dynamic structured knowledge flow

    Authors: Yusong Hu, Runmin Ma, Yue Fan, Jinxin Shi, Zongsheng Cao, Yuhao Zhou, Jiakang Yuan, Xiangchao Yan, Wenlong Zhang, Lei Bai, Bo Zhang

    Abstract: Deep research is an inherently challenging task that demands both breadth and depth of thinking. It involves navigating diverse knowledge spaces and reasoning over complex, multi-step dependencies, which presents substantial challenges for agentic systems. To address this, we propose FlowSearch, a multi-agent framework that actively constructs and evolves a dynamic structured knowledge flow to dri…

    Submitted 9 October, 2025; originally announced October 2025.

  24. arXiv:2510.08511  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    AutoMLGen: Navigating Fine-Grained Optimization for Coding Agents

    Authors: Shangheng Du, Xiangchao Yan, Dengyang Jiang, Jiakang Yuan, Yusong Hu, Xin Li, Liang He, Bo Zhang, Lei Bai

    Abstract: Large language models (LLMs) have shown impressive performance in general programming tasks. However, in Machine Learning Engineering (MLE) scenarios such as AutoML and Kaggle competitions, achieving high performance depends heavily on expert intervention and repeated adjustments rather than simply generating correct code. When applied directly to these tasks, LLMs often lack fine-grained domain p…

    Submitted 9 October, 2025; originally announced October 2025.

  25. arXiv:2510.07905  [pdf, ps, other]

    eess.IV cs.CV cs.MM

    SatFusion: A Unified Framework for Enhancing Satellite IoT Images via Multi-Temporal and Multi-Source Data Fusion

    Authors: Yufei Tong, Guanjie Cheng, Peihan Wu, Yicheng Zhu, Kexu Lu, Feiyi Chen, Meng Xi, Junqin Huang, Xueqiang Yan, Junfan Wang, Shuiguang Deng

    Abstract: With the rapid advancement of the digital society, the proliferation of satellites in the Satellite Internet of Things (Sat-IoT) has led to the continuous accumulation of large-scale multi-temporal and multi-source images across diverse application scenarios. However, existing methods fail to fully exploit the complementary information embedded in both temporal and source dimensions. For example,…

    Submitted 4 November, 2025; v1 submitted 9 October, 2025; originally announced October 2025.

  26. arXiv:2510.05975  [pdf, ps, other]

    cs.DS

    Fast-Convergent Proximity Graphs for Approximate Nearest Neighbor Search

    Authors: Binhong Li, Xiao Yan, Shangqi Lu

    Abstract: Approximate nearest neighbor (ANN) search in high-dimensional metric spaces is a fundamental problem with many applications. Over the past decade, proximity graph (PG)-based indexes have demonstrated superior empirical performance over alternatives. However, these methods often lack theoretical guarantees regarding the quality of query results, especially in the worst-case scenarios. In this paper…

    Submitted 13 October, 2025; v1 submitted 7 October, 2025; originally announced October 2025.

    Comments: Accepted to ACM SIGMOD 2026. This is the preprint version before camera-ready submission

  27. arXiv:2510.03274  [pdf, ps, other]

    cs.LG cs.AI

    Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models

    Authors: Tianao Zhang, Zhiteng Li, Xianglong Yan, Haotong Qin, Yong Guo, Yulun Zhang

    Abstract: Diffusion large language models (dLLMs), which offer bidirectional context and flexible masked-denoising generation, are emerging as a compelling alternative to autoregressive (AR) LLMs. However, like AR LLMs, their model sizes continue to grow, motivating weight compression for deployment. Although post-training quantization (PTQ) is effective for AR LLMs, directly transferring it to dLLMs at 2-b…

    Submitted 27 September, 2025; originally announced October 2025.

  28. arXiv:2510.03267  [pdf, ps, other]

    cs.LG cs.AI

    PT$^2$-LLM: Post-Training Ternarization for Large Language Models

    Authors: Xianglong Yan, Chengzhu Bao, Zhiteng Li, Tianao Zhang, Kaicheng Yang, Haotong Qin, Ruobing Xie, Xingwu Sun, Yulun Zhang

    Abstract: Large Language Models (LLMs) have shown impressive capabilities across diverse tasks, but their large memory and compute demands hinder deployment. Ternarization has gained attention as a promising compression technique, delivering substantial size reduction and high computational efficiency. However, its potential in the post-training quantization (PTQ) setting remains underexplored, due to the c…

    Submitted 26 September, 2025; originally announced October 2025.

  29. arXiv:2509.26340  [pdf, ps, other]

    cs.LG

    Memory-Driven Self-Improvement for Decision Making with Large Language Models

    Authors: Xue Yan, Zijing Ou, Mengyue Yang, Yan Song, Haifeng Zhang, Yingzhen Li, Jun Wang

    Abstract: Large language models (LLMs) have emerged as effective action policies for sequential decision-making (SDM) tasks due to their extensive prior knowledge. However, this broad yet general knowledge is often insufficient for specific decision-making tasks with limited task-related data, making it challenging to efficiently adapt LLMs to specific SDM tasks. To address this challenge, we propose a memo…

    Submitted 30 September, 2025; originally announced September 2025.

  30. arXiv:2509.25538  [pdf, ps, other]

    cs.LG cond-mat.mtrl-sci cs.AI

    Steering an Active Learning Workflow Towards Novel Materials Discovery via Queue Prioritization

    Authors: Marcus Schwarting, Logan Ward, Nathaniel Hudson, Xiaoli Yan, Ben Blaiszik, Santanu Chaudhuri, Eliu Huerta, Ian Foster

    Abstract: Generative AI poses both opportunities and risks for solving inverse design problems in the sciences. Generative tools provide the ability to expand and refine a search space autonomously, but do so at the cost of exploring low-quality regions until sufficiently fine tuned. Here, we propose a queue prioritization algorithm that combines generative modeling and active learning in the context of a d…

    Submitted 29 September, 2025; originally announced September 2025.

  31. arXiv:2509.23582  [pdf, ps, other]

    cs.CV

    RobuQ: Pushing DiTs to W1.58A2 via Robust Activation Quantization

    Authors: Kaicheng Yang, Xun Zhang, Haotong Qin, Yucheng Lin, Kaisen Yang, Xianglong Yan, Yulun Zhang

    Abstract: Diffusion Transformers (DiTs) have recently emerged as a powerful backbone for image generation, demonstrating superior scalability and performance over U-Net architectures. However, their practical deployment is hindered by substantial computational and memory costs. While Quantization-Aware Training (QAT) has shown promise for U-Nets, its application to DiTs faces unique challenges, primarily du…

    Submitted 27 September, 2025; originally announced September 2025.

    Comments: The code and models will be available at https://github.com/racoonykc/RobuQ

  32. arXiv:2509.22393  [pdf, ps, other]

    cs.CV

    Text Adversarial Attacks with Dynamic Outputs

    Authors: Wenqiang Wang, Siyuan Liang, Xiao Yan, Xiaochun Cao

    Abstract: Text adversarial attack methods are typically designed for static scenarios with fixed numbers of output labels and a predefined label space, relying on extensive querying of the victim model (query-based attacks) or the surrogate model (transfer-based attacks). To address this gap, we introduce the Textual Dynamic Outputs Attack (TDOA) method, which employs a clustering-based surrogate model trai…

    Submitted 26 September, 2025; originally announced September 2025.

  33. arXiv:2509.20427  [pdf, ps, other]

    cs.CV

    Seedream 4.0: Toward Next-generation Multimodal Image Generation

    Authors: Team Seedream, Yunpeng Chen, Yu Gao, Lixue Gong, Meng Guo, Qiushan Guo, Zhiyao Guo, Xiaoxia Hou, Weilin Huang, Yixuan Huang, Xiaowen Jian, Huafeng Kuang, Zhichao Lai, Fanshi Li, Liang Li, Xiaochen Lian, Chao Liao, Liyang Liu, Wei Liu, Yanzuo Lu, Zhengxiong Luo, Tongtong Ou, Guang Shi, Yichun Shi, et al. (26 additional authors not shown)

    Abstract: We introduce Seedream 4.0, an efficient and high-performance multimodal image generation system that unifies text-to-image (T2I) synthesis, image editing, and multi-image composition within a single framework. We develop a highly efficient diffusion transformer with a powerful VAE that can also reduce the number of image tokens considerably. This allows for efficient training of our model, and en…

    Submitted 28 September, 2025; v1 submitted 24 September, 2025; originally announced September 2025.

    Comments: Seedream 4.0 Technical Report

  34. arXiv:2509.19628  [pdf, ps, other]

    cs.CE cs.CL q-fin.CP

    Multimodal Language Models with Modality-Specific Experts for Financial Forecasting from Interleaved Sequences of Text and Time Series

    Authors: Ross Koval, Nicholas Andrews, Xifeng Yan

    Abstract: Text and time series data offer complementary views of financial markets: news articles provide narrative context about company events, while stock prices reflect how markets react to those events. However, despite their complementary nature, effectively integrating these interleaved modalities for improved forecasting remains challenging. In this work, we propose a unified neural architecture tha…

    Submitted 23 September, 2025; originally announced September 2025.

    Comments: Preprint

    ACM Class: I.2.7; J.4

  35. arXiv:2509.16588  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    SQS: Enhancing Sparse Perception Models via Query-based Splatting in Autonomous Driving

    Authors: Haiming Zhang, Yiyao Zhu, Wending Zhou, Xu Yan, Yingjie Cai, Bingbing Liu, Shuguang Cui, Zhen Li

    Abstract: Sparse Perception Models (SPMs) adopt a query-driven paradigm that forgoes explicit dense BEV or volumetric construction, enabling highly efficient computation and accelerated inference. In this paper, we introduce SQS, a novel query-based splatting pre-training specifically designed to advance SPMs in autonomous driving. SQS introduces a plug-in module that predicts 3D Gaussian representations fr…

    Submitted 20 September, 2025; originally announced September 2025.

    Comments: NeurIPS 2025 (Spotlight)

  36. arXiv:2509.16552  [pdf, ps, other]

    cs.CV cs.RO

    ST-GS: Vision-Based 3D Semantic Occupancy Prediction with Spatial-Temporal Gaussian Splatting

    Authors: Xiaoyang Yan, Muleilan Pei, Shaojie Shen

    Abstract: 3D occupancy prediction is critical for comprehensive scene understanding in vision-centric autonomous driving. Recent advances have explored utilizing 3D semantic Gaussians to model occupancy while reducing computational overhead, but they remain constrained by insufficient multi-view spatial interaction and limited multi-frame temporal consistency. To overcome these issues, in this paper, we pro…

    Submitted 20 September, 2025; originally announced September 2025.

  37. arXiv:2509.13832  [pdf, ps, other]

    cs.RO

    UltraHiT: A Hierarchical Transformer Architecture for Generalizable Internal Carotid Artery Robotic Ultrasonography

    Authors: Teng Wang, Haojun Jiang, Yuxuan Wang, Zhenguo Sun, Xiangjie Yan, Xiang Li, Gao Huang

    Abstract: Carotid ultrasound is crucial for the assessment of cerebrovascular health, particularly the internal carotid artery (ICA). While previous research has explored automating carotid ultrasound, none has tackled the challenging ICA. This is primarily due to its deep location, tortuous course, and significant individual variations, which greatly increase scanning complexity. To address this, we propos…

    Submitted 8 October, 2025; v1 submitted 17 September, 2025; originally announced September 2025.

  38. arXiv:2509.13164  [pdf, ps, other]

    cs.RO eess.SY

    TeraSim-World: Worldwide Safety-Critical Data Synthesis for End-to-End Autonomous Driving

    Authors: Jiawei Wang, Haowei Sun, Xintao Yan, Shuo Feng, Jun Gao, Henry X. Liu

    Abstract: Safe and scalable deployment of end-to-end (E2E) autonomous driving requires extensive and diverse data, particularly safety-critical events. Existing data are mostly generated from simulators with a significant sim-to-real gap or collected from on-road testing that is costly and unsafe. This paper presents TeraSim-World, an automated pipeline that synthesizes realistic and geographically diverse…

    Submitted 17 September, 2025; v1 submitted 16 September, 2025; originally announced September 2025.

    Comments: 8 pages, 6 figures

  39. arXiv:2509.12815  [pdf, ps, other]

    cs.CV

    Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation

    Authors: Biwen Lei, Yang Li, Xinhai Liu, Shuhui Yang, Lixin Xu, Jingwei Huang, Ruining Tang, Haohan Weng, Jian Liu, Jing Xu, Zhen Zhou, Yiling Zhu, Jiankai Xing, Jiachen Xu, Changfeng Ma, Xinhao Yan, Yunhan Yang, Chunshi Wang, Duoteng Xu, Xueqi Ma, Yuguang Chen, Jing Li, Mingxin Yang, Sheng Zhang, Yifei Feng, et al. (75 additional authors not shown)

    Abstract: The creation of high-quality 3D assets, a cornerstone of modern game development, has long been characterized by labor-intensive and specialized workflows. This paper presents Hunyuan3D Studio, an end-to-end AI-powered content creation platform designed to revolutionize the game production pipeline by automating and streamlining the generation of game-ready 3D assets. At its core, Hunyuan3D Studio…

    Submitted 16 September, 2025; originally announced September 2025.

    Comments: Technical Report

  40. arXiv:2509.12519  [pdf, ps, other]

    cs.CE cs.CL q-fin.CP

    Context-Aware Language Models for Forecasting Market Impact from Sequences of Financial News

    Authors: Ross Koval, Nicholas Andrews, Xifeng Yan

    Abstract: Financial news plays a critical role in the information diffusion process in financial markets and is a known driver of stock prices. However, the information in each news article is not necessarily self-contained, often requiring a broader understanding of the historical news coverage for accurate interpretation. Further, identifying and incorporating the most relevant contextual information pres…

    Submitted 15 September, 2025; originally announced September 2025.

    Comments: Preprint

    ACM Class: I.2.7; J.4

  41. arXiv:2509.12086  [pdf, ps, other]

    cs.DB cs.DS cs.IR

    SAQ: Pushing the Limits of Vector Quantization through Code Adjustment and Dimension Segmentation

    Authors: Hui Li, Shiyuan Deng, Xiao Yan, Xiangyu Zhi, James Cheng

    Abstract: Approximate Nearest Neighbor Search (ANNS) plays a critical role in applications such as search engines, recommender systems, and RAG for LLMs. Vector quantization (VQ), a crucial technique for ANNS, is commonly used to reduce space overhead and accelerate distance computations. However, despite significant research advances, state-of-the-art VQ methods still face challenges in balancing encoding…

    Submitted 15 September, 2025; originally announced September 2025.

    Comments: 13 pages, 12 figures, accepted by SIGMOD

  42. arXiv:2509.08643  [pdf, ps, other]

    cs.GR cs.CV

    X-Part: high fidelity and structure coherent shape decomposition

    Authors: Xinhao Yan, Jiachen Xu, Yang Li, Changfeng Ma, Yunhan Yang, Chunshi Wang, Zibo Zhao, Zeqiang Lai, Yunfei Zhao, Zhuo Chen, Chunchao Guo

    Abstract: Generating 3D shapes at part level is pivotal for downstream applications such as mesh retopology, UV mapping, and 3D printing. However, existing part-based generation methods often lack sufficient controllability and suffer from poor semantically meaningful decomposition. To this end, we introduce X-Part, a controllable generative model designed to decompose a holistic 3D object into semantically…

    Submitted 23 September, 2025; v1 submitted 10 September, 2025; originally announced September 2025.

    Comments: Tech Report, Project Page: https://yanxinhao.github.io/Projects/X-Part/

  43. arXiv:2509.07019  [pdf, ps, other]

    cs.LG cs.AI

    An efficient deep reinforcement learning environment for flexible job-shop scheduling

    Authors: Xinquan Wu, Xuefeng Yan, Mingqiang Wei, Donghai Guan

    Abstract: The Flexible Job-shop Scheduling Problem (FJSP) is a classical combinatorial optimization problem that has a wide range of applications in the real world. In order to generate fast and accurate scheduling solutions for FJSP, various deep reinforcement learning (DRL) scheduling methods have been developed. However, these methods are mainly focused on the design of DRL scheduling Agent, overlooking…

    Submitted 6 September, 2025; originally announced September 2025.

  44. arXiv:2509.06784  [pdf, ps, other]

    cs.CV

    P3-SAM: Native 3D Part Segmentation

    Authors: Changfeng Ma, Yang Li, Xinhao Yan, Jiachen Xu, Yunhan Yang, Chunshi Wang, Zibo Zhao, Yanwen Guo, Zhuo Chen, Chunchao Guo

    Abstract: Segmenting 3D assets into their constituent parts is crucial for enhancing 3D understanding, facilitating model reuse, and supporting various applications such as part generation. However, current methods face limitations such as poor robustness when dealing with complex objects and cannot fully automate the process. In this paper, we propose a native 3D point-promptable part segmentation model te…

    Submitted 25 September, 2025; v1 submitted 8 September, 2025; originally announced September 2025.

    Comments: Tech Report. Project Page: https://murcherful.github.io/P3-SAM/

  45. arXiv:2509.06341  [pdf, ps, other]

    cs.AI

    Evaluating Multi-Turn Bargain Skills in LLM-Based Seller Agent

    Authors: Issue Yishu Wang, Kakam Chong, Xiaofeng Wang, Xu Yan, DeXin Kong, Chen Ju, Ming Chen, Shuai Xiao, Shuguang Han, jufeng chen

    Abstract: In online second-hand marketplaces, multi-turn bargaining is a crucial part of seller-buyer interactions. Large Language Models (LLMs) can act as seller agents, negotiating with buyers on behalf of sellers under given business constraints. A critical ability for such agents is to track and accurately interpret cumulative buyer intents across long negotiations, which directly impacts bargaining eff…

    Submitted 8 September, 2025; originally announced September 2025.

  46. arXiv:2509.05469  [pdf, ps, other]

    cs.AI cs.CV cs.CY cs.HC

    From Image Generation to Infrastructure Design: a Multi-agent Pipeline for Street Design Generation

    Authors: Chenguang Wang, Xiang Yan, Yilong Dai, Ziyi Wang, Susu Xu

    Abstract: Realistic visual renderings of street-design scenarios are essential for public engagement in active transportation planning. Traditional approaches are labor-intensive, hindering collective deliberation and collaborative decision-making. While AI-assisted generative design shows transformative potential by enabling rapid creation of design scenarios, existing generative approaches typically requi…

    Submitted 5 September, 2025; originally announced September 2025.

    Comments: 21 pages, 8 figures

  47. arXiv:2509.02036  [pdf]

    cs.CL cs.AI

    DeepSeek performs better than other Large Language Models in Dental Cases

    Authors: Hexian Zhang, Xinyu Yan, Yanqi Yang, Lijian Jin, Ping Yang, Junwen Wang

    Abstract: Large language models (LLMs) hold transformative potential in healthcare, yet their capacity to interpret longitudinal patient narratives remains inadequately explored. Dentistry, with its rich repository of structured clinical data, presents a unique opportunity to rigorously assess LLMs' reasoning abilities. While several commercial LLMs already exist, DeepSeek, a model that gained significant a…

    Submitted 2 September, 2025; originally announced September 2025.

    Comments: Abstract word count: 171; Total word count: 3130; Total number of tables: 2; Total number of figures: 3; Number of references: 32

  48. DiskJoin: Large-scale Vector Similarity Join with SSD

    Authors: Yanqi Chen, Xiao Yan, Alexandra Meliou, Eric Lo

    Abstract: Similarity join--a widely used operation in data science--finds all pairs of items that have distance smaller than a threshold. Prior work has explored distributed computation methods to scale similarity join to large data volumes but these methods require a cluster deployment, and efficiency suffers from expensive inter-machine communication. On the other hand, disk-based solutions are more cost-…

    Submitted 10 October, 2025; v1 submitted 25 August, 2025; originally announced August 2025.

    Comments: Accepted at SIGMOD 2026

  49. arXiv:2508.17700  [pdf]

    cs.LG

    Adaptive Ensemble Learning with Gaussian Copula for Load Forecasting

    Authors: Junying Yang, Gang Lu, Xiaoqing Yan, Peng Xia, Di Wu

    Abstract: Machine learning (ML) is capable of accurate load forecasting from complete data. However, there are many uncertainties that affect data collection, leading to sparsity. This article proposes a model called Adaptive Ensemble Learning with Gaussian Copula to deal with sparsity, which contains three modules: data complementation, ML construction, and adaptive ensemble. First, it applies Gaussian Cop…

    Submitted 25 August, 2025; originally announced August 2025.

  50. arXiv:2508.16263  [pdf, ps, other]

    cs.DB cs.IR

    Attribute Filtering in Approximate Nearest Neighbor Search: An In-depth Experimental Study

    Authors: Mocheng Li, Xiao Yan, Baotong Lu, Yue Zhang, James Cheng, Chenhao Ma

    Abstract: With the growing integration of structured and unstructured data, new methods have emerged for performing similarity searches on vectors while honoring structured attribute constraints, i.e., a process known as Filtering Approximate Nearest Neighbor (Filtering ANN) search. Since many of these algorithms have only appeared in recent years and are designed to work with a variety of base indexing met…

    Submitted 20 September, 2025; v1 submitted 22 August, 2025; originally announced August 2025.

    Comments: 15 pages, 15 figures, Accepted at SIGMOD 2026