Showing 1–50 of 4,197 results for author: Zhang, S

Searching in archive cs.
  1. arXiv:2507.17448  [pdf, ps, other]

    cs.CE cs.AI physics.chem-ph

    Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning

    Authors: Situo Zhang, Hanqi Li, Lu Chen, Zihan Zhao, Xuanze Lin, Zichen Zhu, Bo Chen, Xin Chen, Kai Yu

    Abstract: Retrosynthesis planning, essential in organic synthesis and drug discovery, has greatly benefited from recent AI-driven advancements. Nevertheless, existing methods frequently face limitations in both applicability and explainability. Traditional graph-based and sequence-to-sequence models often lack generalized chemical knowledge, leading to predictions that are neither consistently accurate nor…

    Submitted 23 July, 2025; originally announced July 2025.

    Comments: Preprint

  2. arXiv:2507.17271  [pdf, ps, other]

    cs.SE

    Seed&Steer: Guiding Large Language Models with Compilable Prefix and Branch Signals for Unit Test Generation

    Authors: Shuaiyu Zhou, Zhengran Zeng, Xiaoling Zhou, Rui Xie, Shikun Zhang, Wei Ye

    Abstract: Unit tests play a vital role in the software development lifecycle. Recent advances in Large Language Model (LLM)-based approaches have significantly improved automated test generation, garnering attention from both academia and industry. We revisit LLM-based unit test generation from a novel perspective by decoupling prefix generation and assertion generation. To characterize their respective cha…

    Submitted 23 July, 2025; originally announced July 2025.

  3. arXiv:2507.17205  [pdf, ps, other]

    cs.CV

    VBCD: A Voxel-Based Framework for Personalized Dental Crown Design

    Authors: Linda Wei, Chang Liu, Wenran Zhang, Zengji Zhang, Shaoting Zhang, Hongsheng Li

    Abstract: The design of restorative dental crowns from intraoral scans is labor-intensive for dental technicians. To address this challenge, we propose a novel voxel-based framework for automated dental crown design (VBCD). The VBCD framework generates an initial coarse dental crown from voxelized intraoral scans, followed by a fine-grained refiner incorporating distance-aware supervision to improve accurac…

    Submitted 23 July, 2025; originally announced July 2025.

  4. arXiv:2507.17089  [pdf, ps, other]

    cs.CV cs.RO

    IONext: Unlocking the Next Era of Inertial Odometry

    Authors: Shanshan Zhang, Siyue Wang, Tianshui Wen, Qi Zhang, Ziheng Zhou, Lingxiang Zheng, Yu Yang

    Abstract: Researchers have increasingly adopted Transformer-based models for inertial odometry. While Transformers excel at modeling long-range dependencies, their limited sensitivity to local, fine-grained motion variations and lack of inherent inductive biases often hinder localization accuracy and generalization. Recent studies have shown that incorporating large-kernel convolutions and Transformer-inspi…

    Submitted 22 July, 2025; originally announced July 2025.

  5. arXiv:2507.16865  [pdf, ps, other]

    cs.RO

    ResKACNNet: A Residual ChebyKAN Network for Inertial Odometry

    Authors: Shanshan Zhang, Tianshui Wen, Siyue Wang, Qi Zhang, Ziheng Zhou, Huiru Zheng, Lingxiang Zheng, Yu Yang

    Abstract: The Inertial Measurement Unit (IMU) has become a key technology for achieving low-cost and precise positioning. However, traditional CNN-based inertial positioning methods struggle to capture the nonlinear motion characteristics and long-term dependencies in IMU data. To address this limitation, we propose a novel inertial positioning network with a generic backbone called ResChebyKAN, which leverages…

    Submitted 21 July, 2025; originally announced July 2025.

  6. arXiv:2507.16696  [pdf, ps, other]

    cs.LG cs.AI cs.MM cs.SD

    FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation

    Authors: Pingyi Fan, Anbai Jiang, Shuwei Zhang, Zhiqiang Lv, Bing Han, Xinhu Zheng, Wenrui Liang, Junjie Li, Wei-Qiang Zhang, Yanmin Qian, Xie Chen, Cheng Lu, Jia Liu

    Abstract: With the rapid deployment of SCADA systems, how to effectively analyze industrial signals and detect abnormal states is an urgent need for the industry. Due to the significant heterogeneity of these signals, which we summarize as the M5 problem, previous works only focus on small sub-problems and employ specialized models, failing to utilize the synergies between modalities and the powerful scalin…

    Submitted 22 July, 2025; originally announced July 2025.

    Comments: 11 pages, 6 figures

  7. arXiv:2507.16663  [pdf, ps, other]

    cs.CL cs.AI

    Self-Contradiction as Self-Improvement: Mitigating the Generation-Understanding Gap in MLLMs

    Authors: Yujin Han, Hao Chen, Andi Han, Zhiheng Wang, Xinyu Lin, Yingya Zhang, Shiwei Zhang, Difan Zou

    Abstract: Despite efforts to unify multimodal generation and understanding tasks in a single model, we show these MLLMs exhibit self-contradiction where generation produces images deemed misaligned with input prompts based on the model's own understanding. We define a Nonunified score that quantifies such self-contradiction. Our empirical results reveal that the self-contradiction mainly arises from weak ge…

    Submitted 22 July, 2025; originally announced July 2025.

    Comments: 19 pages, 9 figures, 3 tables

  8. arXiv:2507.16336  [pdf, ps, other]

    cond-mat.mtrl-sci cond-mat.dis-nn cs.CC cs.LG

    Constructing material network representations for intelligent amorphous alloys design

    Authors: S. -Y. Zhang, J. Tian, S. -L. Liu, H. -M. Zhang, H. -Y. Bai, Y. -C. Hu, W. -H. Wang

    Abstract: Designing high-performance amorphous alloys is demanding for various applications. However, this process relies heavily on empirical laws and extensive trial and error. The high-cost and low-efficiency nature of the traditional strategies prevents effective sampling in the enormous material space. Here, we propose material networks to accelerate the discovery of binary and ternary amorphous alloys. The ne…

    Submitted 22 July, 2025; originally announced July 2025.

    Comments: 5 figures

  9. arXiv:2507.16190  [pdf, ps, other]

    cs.SD eess.AS

    LABNet: A Lightweight Attentive Beamforming Network for Ad-hoc Multichannel Microphone Invariant Real-Time Speech Enhancement

    Authors: Haoyin Yan, Jie Zhang, Chengqian Jiang, Shuang Zhang

    Abstract: Multichannel speech enhancement (SE) aims to restore clean speech from noisy measurements by leveraging spatiotemporal signal features. In ad-hoc array conditions, microphone invariance (MI) requires systems to handle different microphone numbers and array geometries. From a practical perspective, multichannel recordings inevitably increase the computational burden for edge-device applications, hi…

    Submitted 21 July, 2025; originally announced July 2025.

  10. arXiv:2507.16124  [pdf, ps, other]

    cs.RO cs.AI

    Benchmarking LLM Privacy Recognition for Social Robot Decision Making

    Authors: Dakota Sullivan, Shirley Zhang, Jennica Li, Heather Kirkorian, Bilge Mutlu, Kassem Fawaz

    Abstract: Social robots are embodied agents that interact with people while following human communication norms. These robots interact using verbal and non-verbal cues, and share the physical environments of people. While social robots have previously utilized rule-based systems or probabilistic models for user interaction, the rapid evolution of large language models (LLMs) presents new opportunities to de…

    Submitted 21 July, 2025; originally announced July 2025.

    Comments: 18 pages, 7 figures. Dakota Sullivan and Shirley Zhang contributed equally to this work

  11. arXiv:2507.16121  [pdf, ps, other]

    cs.RO

    DWSFormer: A Lightweight Inertial Odometry Network for Complex Motion Modeling

    Authors: Shanshan Zhang, Qi Zhang, Siyue Wang, Tianshui Wen, Ziheng Zhou, Lingxiang Zheng, Yu Yang

    Abstract: Inertial odometry (IO) directly estimates the position of a carrier from inertial sensor measurements and serves as a core technology for the widespread deployment of consumer-grade localization systems. While existing IO methods can accurately reconstruct simple and near-linear motion trajectories, they often fail to account for drift errors caused by complex motion patterns such as turning. This…

    Submitted 21 July, 2025; originally announced July 2025.

  12. arXiv:2507.16120  [pdf, ps, other]

    cs.RO

    FTIN: Frequency-Time Integration Network for Inertial Odometry

    Authors: Shanshan Zhang, Qi Zhang, Siyue Wang, Tianshui Wen, Ziheng Zhou, Lingxiang Zheng, Yu Yang

    Abstract: In recent years, machine learning has achieved significant advancements in inertial odometry. However, most existing inertial odometry methods primarily rely on CNNs in the time domain. These methods often struggle to capture long-term dependency in inertial measurement unit data, thereby constraining the potential for further improvements in localization accuracy. To address these issues, we prop…

    Submitted 21 July, 2025; originally announced July 2025.

  13. arXiv:2507.16116  [pdf, ps, other]

    cs.CV

    PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation

    Authors: Yaofang Liu, Yumeng Ren, Aitor Artola, Yuxuan Hu, Xiaodong Cun, Xiaotong Zhao, Alan Zhao, Raymond H. Chan, Suiyun Zhang, Rui Liu, Dandan Tu, Jean-Michel Morel

    Abstract: The rapid advancement of video diffusion models has been hindered by fundamental limitations in temporal modeling, particularly the rigid synchronization of frame evolution imposed by conventional scalar timestep variables. While task-specific adaptations and autoregressive models have sought to address these challenges, they remain constrained by computational inefficiency, catastrophic forgettin…

    Submitted 21 July, 2025; originally announced July 2025.

    Comments: Code is open-sourced at https://github.com/Yaofang-Liu/Pusa-VidGen

  14. arXiv:2507.15724  [pdf, ps, other]

    cs.CV

    A Practical Investigation of Spatially-Controlled Image Generation with Transformers

    Authors: Guoxuan Xia, Harleen Hanspal, Petru-Daniel Tudosiu, Shifeng Zhang, Sarah Parisot

    Abstract: Enabling image generation models to be spatially controlled is an important area of research, empowering users to better generate images according to their own fine-grained specifications via, e.g., edge maps or poses. Although this task has seen impressive improvements in recent times, a focus on rapidly producing stronger models has come at the cost of detailed and fair scientific comparison. Differ…

    Submitted 21 July, 2025; originally announced July 2025.

    Comments: preprint

  15. arXiv:2507.15293  [pdf, ps, other]

    cs.RO

    RepILN: Reparameterized Inertial Localization Network

    Authors: Shanshan Zhang, Tianshui Wen, Siyue Wang, Qi Zhang, Ziheng Zhou, Lingxiang Zheng, Yu Yang

    Abstract: Inertial localization is regarded as a promising positioning solution for consumer-grade IoT devices due to its cost-effectiveness and independence from external infrastructure. However, data-driven inertial localization methods often rely on increasingly complex network architectures to improve accuracy, which challenges the limited computational resources of IoT devices. Moreover, these methods…

    Submitted 21 July, 2025; originally announced July 2025.

  16. arXiv:2507.15223  [pdf, ps, other]

    cs.CV

    Hierarchical Part-based Generative Model for Realistic 3D Blood Vessel

    Authors: Siqi Chen, Guoqing Zhang, Jiahao Lai, Bingzhi Shen, Sihong Zhang, Caixia Dong, Xuejin Chen, Yang Li

    Abstract: Advancements in 3D vision have increased the impact of blood vessel modeling on medical applications. However, accurately representing the complex geometry and topology of blood vessels remains a challenge due to their intricate branching patterns, curvatures, and irregular shapes. In this study, we propose a hierarchical part-based framework for 3D vessel generation that separates the global bin…

    Submitted 20 July, 2025; originally announced July 2025.

  17. arXiv:2507.14969  [pdf, ps, other]

    cs.SE

    Think Like an Engineer: A Neuro-Symbolic Collaboration Agent for Generative Software Requirements Elicitation and Self-Review

    Authors: Sai Zhang, Zhenchang Xing, Jieshan Chen, Dehai Zhao, Zizhong Zhu, Xiaowang Zhang, Zhiyong Feng, Xiaohong Li

    Abstract: The vision of End-User Software Engineering (EUSE) is to empower non-professional users with full control over the software development lifecycle. It aims to enable users to drive generative software development using only natural language requirements. However, since end-users often lack knowledge of software engineering, their requirement descriptions are frequently ambiguous, raising significan…

    Submitted 20 July, 2025; originally announced July 2025.

    ACM Class: D.2.1

  18. arXiv:2507.14850  [pdf, ps, other]

    cs.LG cs.AI cs.RO

    Hierarchical Multi-Agent Reinforcement Learning with Control Barrier Functions for Safety-Critical Autonomous Systems

    Authors: H. M. Sabbir Ahmad, Ehsan Sabouni, Alexander Wasilkoff, Param Budhraja, Zijian Guo, Songyuan Zhang, Chuchu Fan, Christos Cassandras, Wenchao Li

    Abstract: We address the problem of safe policy learning in multi-agent safety-critical autonomous systems. In such systems, it is necessary for each agent to meet the safety requirements at all times while also cooperating with other agents to accomplish the task. Toward this end, we propose a safe Hierarchical Multi-Agent Reinforcement Learning (HMARL) approach based on Control Barrier Functions (CBFs). O…

    Submitted 20 July, 2025; originally announced July 2025.

  19. arXiv:2507.14815  [pdf, ps, other]

    cs.CL

    FastLongSpeech: Enhancing Large Speech-Language Models for Efficient Long-Speech Processing

    Authors: Shoutao Guo, Shaolei Zhang, Qingkai Fang, Zhengrui Ma, Min Zhang, Yang Feng

    Abstract: The rapid advancement of Large Language Models (LLMs) has spurred significant progress in Large Speech-Language Models (LSLMs), enhancing their capabilities in both speech understanding and generation. While existing LSLMs often concentrate on augmenting speech generation or tackling a diverse array of short-speech tasks, the efficient processing of long-form speech remains a critical yet underexp…

    Submitted 20 July, 2025; originally announced July 2025.

    Comments: The code is at https://github.com/ictnlp/FastLongSpeech. This model is at https://huggingface.co/ICTNLP/FastLongSpeech. The dataset is at https://huggingface.co/datasets/ICTNLP/LongSpeech-Eval

  20. arXiv:2507.14459  [pdf, ps, other]

    cs.CV

    VisGuard: Securing Visualization Dissemination through Tamper-Resistant Data Retrieval

    Authors: Huayuan Ye, Juntong Chen, Shenzhuo Zhang, Yipeng Zhang, Changbo Wang, Chenhui Li

    Abstract: The dissemination of visualizations is primarily in the form of raster images, which often results in the loss of critical information such as source code, interactive features, and metadata. While previous methods have proposed embedding metadata into images to facilitate Visualization Image Data Retrieval (VIDR), most existing methods lack practicability since they are fragile to common image ta…

    Submitted 18 July, 2025; originally announced July 2025.

    Comments: 9 pages, IEEE VIS 2025

  21. arXiv:2507.14198  [pdf, ps, other]

    cs.CL cs.AI

    Retention analysis of edited knowledge after fine-tuning

    Authors: Fufang Wen, Shichang Zhang

    Abstract: Large language models (LLMs) store vast amounts of knowledge, which often requires updates to correct factual errors, incorporate newly acquired information, or adapt model behavior. Model editing methods have emerged as efficient solutions for such updates, offering localized and precise knowledge modification at significantly lower computational cost than continual training. In parallel, LLMs ar…

    Submitted 14 July, 2025; originally announced July 2025.

  22. arXiv:2507.13839  [pdf]

    cs.CL cs.HC

    The Expressions of Depression and Anxiety in Chinese Psycho-counseling: Usage of First-person Singular Pronoun and Negative Emotional Words

    Authors: Lizhi Ma, Tong Zhao, Shuai Zhang, Nirui Song, Hongliang He, Anqi Li, Ran Feng, Huachuan Qiu, Jingsong Ma, Zhenzhong Lan

    Abstract: This study explores the relationship between linguistic expressions and psychological states of depression and anxiety within Chinese psycho-counseling interactions, focusing specifically on the usage of first-person singular pronouns and negative emotional words. Utilizing a corpus derived from 735 online counseling sessions, the analysis employed a general linear mixed-effect model to assess lin…

    Submitted 18 July, 2025; originally announced July 2025.

  23. arXiv:2507.13793  [pdf, other]

    cs.CL

    An Enhanced Model-based Approach for Short Text Clustering

    Authors: Enhao Cheng, Shoujia Zhang, Jianhua Yin, Xuemeng Song, Tian Gan, Liqiang Nie

    Abstract: Short text clustering has become increasingly important with the popularity of social media like Twitter, Google+, and Facebook. Existing methods can be broadly categorized into two paradigms: topic model-based approaches and deep representation learning-based approaches. This task is inherently challenging due to the sparse, large-scale, and high-dimensional characteristics of the short text data…

    Submitted 18 July, 2025; originally announced July 2025.

  24. arXiv:2507.13765  [pdf, ps, other]

    cs.LG

    Dual-Center Graph Clustering with Neighbor Distribution

    Authors: Enhao Cheng, Shoujia Zhang, Jianhua Yin, Li Jin, Liqiang Nie

    Abstract: Graph clustering is crucial for unraveling intricate data structures, yet it presents significant challenges due to its unsupervised nature. Recently, goal-directed clustering techniques have yielded impressive results, with contrastive learning methods leveraging pseudo-labels garnering considerable attention. Nonetheless, pseudo-labels are unreliable as a supervision signal and existing goal-direct…

    Submitted 18 July, 2025; originally announced July 2025.

    Comments: ECAI-2025

  25. arXiv:2507.13659  [pdf, ps, other]

    cs.CV cs.AI cs.LG cs.NE

    When Person Re-Identification Meets Event Camera: A Benchmark Dataset and An Attribute-guided Re-Identification Framework

    Authors: Xiao Wang, Qian Zhu, Shujuan Wu, Bo Jiang, Shiliang Zhang, Yaowei Wang, Yonghong Tian, Bin Luo

    Abstract: Researchers have recently proposed using event cameras for person re-identification (ReID) due to their promising performance and better balance in terms of privacy protection, and event camera-based person ReID has attracted significant attention. Currently, mainstream event-based person ReID algorithms primarily focus on fusing visible light and event stream, as well as preserving privacy. Although si…

    Submitted 18 July, 2025; originally announced July 2025.

  26. arXiv:2507.12791  [pdf, ps, other]

    math.NA cs.DS math.PR math.ST

    Analysis of Langevin midpoint methods using an anticipative Girsanov theorem

    Authors: Matthew S. Zhang

    Abstract: We introduce a new method for analyzing midpoint discretizations of stochastic differential equations (SDEs), which are frequently used in Markov chain Monte Carlo (MCMC) methods for sampling from a target measure $\pi \propto \exp(-V)$. Borrowing techniques from Malliavin calculus, we compute estimates for the Radon-Nikodym derivative for processes on $L^2([0, T); \mathbb{R}^d)$ which may anticipate…

    Submitted 17 July, 2025; originally announced July 2025.

  27. arXiv:2507.12507  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training

    Authors: Mingjie Liu, Shizhe Diao, Jian Hu, Ximing Lu, Xin Dong, Hao Zhang, Alexander Bukharin, Shaokun Zhang, Jiaqi Zeng, Makesh Narsimhan Sreedhar, Gerald Shen, David Mosallanezhad, Di Zhang, Jonas Yang, June Yang, Oleksii Kuchaiev, Guilin Liu, Zhiding Yu, Pavlo Molchanov, Yejin Choi, Jan Kautz, Yi Dong

    Abstract: Recent advancements in reasoning-focused language models such as OpenAI's O1 and DeepSeek-R1 have shown that scaling test-time computation, through chain-of-thought reasoning and iterative exploration, can yield substantial improvements on complex tasks like mathematics and code generation. These breakthroughs have been driven by large-scale reinforcement learning (RL), particularly when combined wi…

    Submitted 16 July, 2025; originally announced July 2025.

    Comments: 14 pages, 7 figures

  28. arXiv:2507.12022  [pdf, ps, other]

    cs.CV

    Dataset Ownership Verification for Pre-trained Masked Models

    Authors: Yuechen Xie, Jie Song, Yicheng Shan, Xiaoyan Zhang, Yuanyu Wan, Shengxuming Zhang, Jiarui Duan, Mingli Song

    Abstract: High-quality open-source datasets have emerged as a pivotal catalyst driving the swift advancement of deep learning, while facing the looming threat of potential exploitation. Protecting these datasets is of paramount importance for the interests of their owners. The verification of dataset ownership has evolved into a crucial approach in this domain; however, existing verification techniques are…

    Submitted 16 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025

  29. arXiv:2507.11980  [pdf, ps, other]

    cs.CV

    EC-Diff: Fast and High-Quality Edge-Cloud Collaborative Inference for Diffusion Models

    Authors: Jiajian Xie, Shengyu Zhang, Zhou Zhao, Fan Wu, Fei Wu

    Abstract: Diffusion Models have shown remarkable proficiency in image and video synthesis. As increases in model size and latency limit user experience, a hybrid edge-cloud collaborative framework was recently proposed to realize fast inference and high-quality generation, where the cloud model initiates high-quality semantic planning and the edge model expedites later-stage refinement. However, excessive cloud d…

    Submitted 16 July, 2025; originally announced July 2025.

    Comments: 21 pages, 8 figures. arXiv admin note: text overlap with arXiv:2408.12588 by other authors

  30. arXiv:2507.11554  [pdf, ps, other]

    cs.CV cs.AI

    Inversion-DPO: Precise and Efficient Post-Training for Diffusion Models

    Authors: Zejian Li, Yize Li, Chenye Meng, Zhongni Liu, Yang Ling, Shengyuan Zhang, Guang Yang, Changyuan Yang, Zhiyuan Yang, Lingyun Sun

    Abstract: Recent advancements in diffusion models (DMs) have been propelled by alignment methods that post-train models to better conform to human preferences. However, these approaches typically require computation-intensive training of a base model and a reward model, which not only incurs substantial computational overhead but may also compromise model accuracy and training efficiency. To address these l…

    Submitted 18 July, 2025; v1 submitted 13 July, 2025; originally announced July 2025.

  31. arXiv:2507.10628  [pdf, ps, other]

    cs.LG cs.AI

    GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning

    Authors: Ziru Liu, Cheng Gong, Xinyu Fu, Yaofang Liu, Ran Chen, Shoubo Hu, Suiyun Zhang, Rui Liu, Qingfu Zhang, Dandan Tu

    Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as a powerful paradigm for facilitating the self-improvement of large language models (LLMs), particularly in the domain of complex reasoning tasks. However, prevailing on-policy RL methods often contend with significant training instability and inefficiency. This is primarily due to a capacity-difficulty mismatch, where th…

    Submitted 16 July, 2025; v1 submitted 14 July, 2025; originally announced July 2025.

    Comments: Code available at https://github.com/hkgc-1/GHPO

  32. arXiv:2507.10532  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

    Authors: Mingqi Wu, Zhihao Zhang, Qiaole Dong, Zhiheng Xi, Jun Zhao, Senjie Jin, Xiaoran Fan, Yuhao Zhou, Yanwei Fu, Qin Liu, Songyang Zhang, Qi Zhang

    Abstract: The reasoning capabilities of large language models (LLMs) have been a longstanding focus of research. Recent works have further enhanced these capabilities using reinforcement learning (RL), with many new methods claiming significant improvements with minimal or no external supervision. Surprisingly, some studies even suggest that random or incorrect reward signals can enhance reasoning performan…

    Submitted 14 July, 2025; originally announced July 2025.

    Comments: 26 pages

  33. arXiv:2507.09619  [pdf, ps, other]

    cs.CV

    Generate Aligned Anomaly: Region-Guided Few-Shot Anomaly Image-Mask Pair Synthesis for Industrial Inspection

    Authors: Yilin Lu, Jianghang Lin, Linhuang Xie, Kai Zhao, Yansong Qu, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

    Abstract: Anomaly inspection plays a vital role in industrial manufacturing, but the scarcity of anomaly samples significantly limits the effectiveness of existing methods in tasks such as localization and classification. While several anomaly synthesis approaches have been introduced for data augmentation, they often struggle with low realism, inaccurate mask alignment, and poor generalization. To overcome…

    Submitted 13 July, 2025; originally announced July 2025.

  34. arXiv:2507.09612  [pdf, ps, other]

    cs.CV

    Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive

    Authors: You Huang, Lichao Chen, Jiayi Ji, Liujuan Cao, Shengchuan Zhang, Rongrong Ji

    Abstract: Interactive segmentation (IS) improves annotation efficiency by segmenting target regions from user prompts, with widespread applications in real-world scenarios. Current approaches face a critical trade-off: dense-token methods achieve superior accuracy and detail preservation but suffer from prohibitively slow processing on CPU devices, while the Segment Anything Model (SAM) advances the field w…

    Submitted 13 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025

  35. arXiv:2507.09315  [pdf, ps, other]

    cs.SE cs.AI

    Enhancing Interpretability in Software Change Management with Chain-of-Thought Reasoning

    Authors: Yongqian Sun, Weihua Kuang, Chao Shen, Xidao Wen, Tinghua Zheng, Heng Liu, Shenglin Zhang, Bo Wu, Dan Pei

    Abstract: In modern online services, frequent software changes introduce significant risks. To tackle this challenge, we propose SCELM (Software Change Evaluation and Lifecycle Management), an end-to-end automated framework for software change management. SCELM aims to manage software changes efficiently and precisely, significantly reducing service failures and economic losses.

    Submitted 12 July, 2025; originally announced July 2025.

    Comments: 22 pages, 19 figures

  36. arXiv:2507.09205  [pdf, ps, other]

    cs.CL

    Advancing Large Language Models for Tibetan with Curated Data and Continual Pre-Training

    Authors: Leiyu Pan, Bojian Xiong, Lei Yang, Renren Jin, Shaowei Zhang, Yue Chen, Ling Shi, Jiang Zhou, Junru Wu, Zhen Wang, Jianxiang Peng, Juesi Xiao, Tianyu Dong, Zhuowen Han, Zhuo Chen, Yuqi Ren, Deyi Xiong

    Abstract: Large language models have achieved remarkable progress across many languages. However, Tibetan, as a representative low-resource language, is particularly underrepresented in existing models due to the scarcity of high-quality training corpora. To address this gap, we curate the largest Tibetan pre-training corpus to date, aggregating data from diverse sources and applying a dedicated data cleani…

    Submitted 23 July, 2025; v1 submitted 12 July, 2025; originally announced July 2025.

  37. arXiv:2507.09104  [pdf, ps, other]

    cs.CL cs.AI

    CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards

    Authors: Taolin Zhang, Maosong Cao, Alexander Lam, Songyang Zhang, Kai Chen

    Abstract: Recently, the role of LLM-as-judge in evaluating large language models has gained prominence. However, current judge models suffer from narrow specialization and limited robustness, undermining their capacity for comprehensive evaluations. In this work, we present CompassJudger-2, a novel generalist judge model that overcomes these limitations via a task-driven, multi-domain data curation strategy…

    Submitted 11 July, 2025; originally announced July 2025.

  38. arXiv:2507.09070  [pdf, ps, other]

    eess.AS cs.SD

    SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment

    Authors: Shivam Mehta, Yingru Liu, Zhenyu Tang, Kainan Peng, Vimal Manohar, Shun Zhang, Mike Seltzer, Qing He, Mingbo Ma

    Abstract: Zero-shot voice conversion (VC) synthesizes speech in a target speaker's voice while preserving linguistic and paralinguistic content. However, timbre leakage, where source speaker traits persist, remains a challenge, especially in neural codec and LLM-based VC, where quantized representations entangle speaker identity with content. We introduce SemAlignVC, an architecture designed to prevent timbre…

    Submitted 11 July, 2025; originally announced July 2025.

    Comments: 6 pages, 2 figures, Accepted at the ISCA Speech Synthesis Workshop (SSW) 2025

    MSC Class: 68T07 ACM Class: I.2.7; I.2.6; G.3; H.5.5

  39. arXiv:2507.08920  [pdf, ps, other]

    q-bio.BM cs.AI

    AMix-1: A Pathway to Test-Time Scalable Protein Foundation Model

    Authors: Changze Lv, Jiang Zhou, Siyu Long, Lihao Wang, Jiangtao Feng, Dongyu Xue, Yu Pei, Hao Wang, Zherui Zhang, Yuchen Cai, Zhiqiang Gao, Ziyuan Ma, Jiakai Hu, Chaochen Gao, Jingjing Gong, Yuxuan Song, Shuyi Zhang, Xiaoqing Zheng, Deyi Xiong, Lei Bai, Ya-Qin Zhang, Wei-Ying Ma, Bowen Zhou, Hao Zhou

    Abstract: We introduce AMix-1, a powerful protein foundation model built on Bayesian Flow Networks and empowered by a systematic training methodology, encompassing pretraining scaling laws, emergent capability analysis, in-context learning mechanism, and test-time scaling algorithm. To guarantee robust scalability, we establish a predictive scaling law and reveal the progressive emergence of structural unde…

    Submitted 11 July, 2025; originally announced July 2025.

  40. arXiv:2507.08382  [pdf, ps, other]

    cs.LG

    Two-cluster test

    Authors: Xinying Liu, Lianyu Hu, Mudi Jiang, Simeng Zhang, Jun Lou, Zengyou He

    Abstract: Cluster analysis is a fundamental research issue in statistics and machine learning. In many modern clustering methods, we need to determine whether two subsets of samples come from the same cluster. Since these subsets are usually generated by certain clustering procedures, the deployment of classic two-sample tests in this context would yield extremely small p-values, leading to inflated Type-…

    Submitted 14 July, 2025; v1 submitted 11 July, 2025; originally announced July 2025.

  41. arXiv:2507.07855  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Principled Foundations for Preference Optimization

    Authors: Wenxuan Zhou, Shujian Zhang, Brice Magdalou, John Lambert, Ehsan Amid, Richard Nock, Andrew Hard

    Abstract: In this paper, we show that direct preference optimization (DPO) is a very specific form of a connection between two major theories in the ML context of learning from preferences: loss functions (Savage) and stochastic choice (Doignon-Falmagne and Machina). The connection is established for all of Savage's losses and at this level of generality, (i) it includes support for abstention on the choice…

    Submitted 10 July, 2025; originally announced July 2025.

    ACM Class: I.2.6; I.2.7

  42. arXiv:2507.07526  [pdf, ps, other]

    cs.SD eess.AS

    DMF2Mel: A Dynamic Multiscale Fusion Network for EEG-Driven Mel Spectrogram Reconstruction

    Authors: Cunhang Fan, Sheng Zhang, Jingjing Zhang, Enrui Liu, Xinhui Li, Minggang Zhao, Zhao Lv

    Abstract: Decoding speech from brain signals is a challenging research problem. Although existing technologies have made progress in reconstructing the mel spectrograms of auditory stimuli at the word or letter level, there remain core challenges in the precise reconstruction of minute-level continuous imagined speech: traditional models struggle to balance the efficiency of temporal dependency modeling and…

    Submitted 10 July, 2025; originally announced July 2025.

    Comments: Accepted by ACM MM 2025

  43. arXiv:2507.07396  [pdf, ps, other]

    cs.MM cs.LG cs.SD eess.AS

    IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing

    Authors: Zeyang Song, Shimin Zhang, Yuhong Chou, Jibin Wu, Haizhou Li

    Abstract: Spiking Neural Networks (SNNs), inspired by biological neural mechanisms, represent a promising neuromorphic computing paradigm that offers energy-efficient alternatives to traditional Artificial Neural Networks (ANNs). Despite proven effectiveness, SNN architectures have struggled to achieve competitive performance on large-scale speech processing tasks. Two key challenges hinder progress: (1) the…

    Submitted 9 July, 2025; originally announced July 2025.

    Comments: Under review of TNNLS

  44. arXiv:2507.07270  [pdf, ps, other]

    cs.SD cs.MM eess.AS

    Audio-Visual Speech Separation via Bottleneck Iterative Network

    Authors: Sidong Zhang, Shiv Shankar, Trang Nguyen, Andrea Fanelli, Madalina Fiterau

    Abstract: Integration of information from non-auditory cues can significantly improve the performance of speech-separation models. Often such models use deep modality-specific networks to obtain unimodal features, and risk being too costly or lightweight but lacking capacity. In this work, we present an iterative representation refinement approach called Bottleneck Iterative Network (BIN), a technique that…

    Submitted 9 July, 2025; originally announced July 2025.

    Comments: Accepted to the 42nd International Conference on Machine Learning Workshop on Machine Learning for Audio

  45. arXiv:2507.06920  [pdf, ps, other]

    cs.CL

    Rethinking Verification for LLM Code Generation: From Generation to Testing

    Authors: Zihan Ma, Taolin Zhang, Maosong Cao, Junnan Liu, Wenwei Zhang, Minnan Luo, Songyang Zhang, Kai Chen

    Abstract: Large language models (LLMs) have recently achieved notable success in code-generation benchmarks such as HumanEval and LiveCodeBench. However, a detailed examination reveals that these evaluation suites often comprise only a limited number of homogeneous test cases, resulting in subtle faults going undetected. This not only artificially inflates measured performance but also compromises accurate…

    Submitted 9 July, 2025; v1 submitted 9 July, 2025; originally announced July 2025.

  46. arXiv:2507.06272  [pdf, ps, other]

    cs.CV cs.AI

    LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance

    Authors: Zhang Li, Biao Yang, Qiang Liu, Shuo Zhang, Zhiyin Ma, Shuo Zhang, Liang Yin, Linger Deng, Yabo Sun, Yuliang Liu, Xiang Bai

    Abstract: While large multi-modal models (LMMs) demonstrate promising capabilities in segmentation and comprehension, they still struggle with two limitations: inaccurate segmentation and hallucinated comprehension. These challenges stem primarily from constraints in weak visual comprehension and a lack of fine-grained perception. To alleviate these limitations, we propose LIRA, a framework that capitalizes…

    Submitted 14 July, 2025; v1 submitted 8 July, 2025; originally announced July 2025.

    Comments: ICCV 2025

  47. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3284 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 22 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  48. arXiv:2507.06138  [pdf, ps, other]

    cs.CL cs.AI

    Coding Triangle: How Does Large Language Model Understand Code?

    Authors: Taolin Zhang, Zihan Ma, Maosong Cao, Junnan Liu, Songyang Zhang, Kai Chen

    Abstract: Large language models (LLMs) have achieved remarkable progress in code generation, yet their true programming competence remains underexplored. We introduce the Code Triangle framework, which systematically evaluates LLMs across three fundamental dimensions: editorial analysis, code implementation, and test case generation. Through extensive experiments on competitive programming benchmarks, we re…

    Submitted 8 July, 2025; originally announced July 2025.

  49. arXiv:2507.06000  [pdf, ps, other]

    cs.HC

    Exploring Collaboration Patterns and Strategies in Human-AI Co-creation through the Lens of Agency: A Scoping Review of the Top-tier HCI Literature

    Authors: Shuning Zhang, Hui Wang, Xin Yi

    Abstract: As Artificial Intelligence (AI) increasingly becomes an active collaborator in co-creation, understanding the distribution and dynamics of agency is paramount. The Human-Computer Interaction (HCI) perspective is crucial for this analysis, as it uniquely reveals the interaction dynamics and specific control mechanisms that dictate how agency manifests in practice. Despite this importance, a systemat…

    Submitted 8 July, 2025; originally announced July 2025.

  50. arXiv:2507.05911  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Differentiable Reward Optimization for LLM based TTS system

    Authors: Changfeng Gao, Zhihao Du, Shiliang Zhang

    Abstract: This paper proposes a novel Differentiable Reward Optimization (DiffRO) method aimed at enhancing the performance of neural codec language model-based text-to-speech (TTS) systems. In contrast to conventional reinforcement learning from human feedback (RLHF) approaches applied to TTS, DiffRO directly computes the rewards based on neural codec tokens, rather than relying on synthesized audio. Furth…

    Submitted 8 July, 2025; originally announced July 2025.