Showing 1–50 of 196 results for author: Niu, X

Searching in archive cs.
  1. arXiv:2510.24657  [pdf, ps, other]

    cs.CV

    Group Relative Attention Guidance for Image Editing

    Authors: Xuanpu Zhang, Xuesong Niu, Ruidong Chen, Dan Song, Jianhao Zeng, Penghui Du, Haoxiang Cao, Kai Wu, An-an Liu

    Abstract: Recently, image editing based on Diffusion-in-Transformer models has undergone rapid development. However, existing editing methods often lack effective control over the degree of editing, limiting their ability to achieve more customized results. To address this limitation, we investigate the MM-Attention mechanism within the DiT model and observe that the Query and Key tokens share a bias vector…

    Submitted 28 October, 2025; originally announced October 2025.

  2. arXiv:2510.17604  [pdf, ps, other]

    cs.RO

    Learned Inertial Odometry for Cycling Based on Mixture of Experts Algorithm

    Authors: Hao Qiao, Yan Wang, Shuo Yang, Xiaoyao Yu, Jian Kuang, Xiaoji Niu

    Abstract: With the rapid growth of bike sharing and the increasing diversity of cycling applications, accurate bicycle localization has become essential. Traditional GNSS-based methods suffer from multipath effects, while existing inertial navigation approaches rely on precise modeling and show limited robustness. Tight Learned Inertial Odometry (TLIO) achieves low position drift by combining raw IMU data w…

    Submitted 20 October, 2025; originally announced October 2025.

  3. arXiv:2510.11892  [pdf, ps, other]

    cs.CL

    R-WoM: Retrieval-augmented World Model For Computer-use Agents

    Authors: Kai Mei, Jiang Guo, Shuaichen Chang, Mingwen Dong, Dongkyu Lee, Xing Niu, Jiarong Jiang

    Abstract: Large Language Models (LLMs) can serve as world models to enhance agent decision-making in digital environments by simulating future states and predicting action outcomes, potentially eliminating costly trial-and-error exploration. However, this capability is fundamentally limited by LLMs' tendency toward hallucination and their reliance on static training knowledge, which can lead to compounding…

    Submitted 13 October, 2025; originally announced October 2025.

  4. arXiv:2510.11072  [pdf, ps, other]

    cs.RO cs.AI cs.LG eess.SY

    PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System

    Authors: Huayi Wang, Wentao Zhang, Runyi Yu, Tao Huang, Junli Ren, Feiyu Jia, Zirui Wang, Xiaojie Niu, Xiao Chen, Jiahe Chen, Qifeng Chen, Jingbo Wang, Jiangmiao Pang

    Abstract: Deploying humanoid robots to interact with real-world environments--such as carrying objects or sitting on chairs--requires generalizable, lifelike motions and robust scene perception. Although prior approaches have advanced each capability individually, combining them in a unified system is still an ongoing challenge. In this work, we present a physical-world humanoid-scene interaction system, Ph…

    Submitted 13 October, 2025; originally announced October 2025.

    Comments: Project website: https://why618188.github.io/physhsi/

  5. arXiv:2510.07958  [pdf, ps, other]

    cs.CL cs.AI

    A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning

    Authors: Fengji Zhang, Xinyao Niu, Chengyang Ying, Guancheng Lin, Zhongkai Hao, Zhou Fan, Chengen Huang, Jacky Keung, Bei Chen, Junyang Lin

    Abstract: Recent advances in Large Language Models (LLMs) and Reinforcement Learning (RL) have led to strong performance in open-domain question answering (QA). However, existing models still struggle with questions that admit multiple valid answers. Standard QA benchmarks, which typically assume a single gold answer, overlook this reality and thus produce inappropriate training signals. Existing attempts t…

    Submitted 9 October, 2025; originally announced October 2025.

  6. arXiv:2510.07861  [pdf, ps, other]

    cs.AI

    Understanding DeepResearch via Reports

    Authors: Tianyu Fan, Xinyao Niu, Yuxiang Zheng, Fengji Zhang, Chengen Huang, Bei Chen, Junyang Lin, Chao Huang

    Abstract: DeepResearch agents represent a transformative AI paradigm, conducting expert-level research through sophisticated reasoning and multi-tool integration. However, evaluating these systems remains critically challenging due to open-ended research scenarios and existing benchmarks that focus on isolated capabilities rather than holistic performance. Unlike traditional LLM tasks, DeepResearch systems…

    Submitted 9 October, 2025; originally announced October 2025.

    Comments: 22 pages, 4 figures

  7. arXiv:2509.17963  [pdf, ps, other]

    cs.ET cs.AR

    Single-Cell Universal Logic-in-Memory Using 2T-nC FeRAM: An Area and Energy-Efficient Approach for Bulk Bitwise Computation

    Authors: Rudra Biswas, Jiahui Duan, Shan Deng, Xuezhong Niu, Yixin Qin, Prapti Panigrahi, Varun Parekh, Rajiv Joshi, Kai Ni, Vijaykrishnan Narayanan

    Abstract: This work presents a novel approach to configure 2T-nC ferroelectric RAM (FeRAM) for performing single cell logic-in-memory operations, highlighting its advantages in energy-efficient computation over conventional DRAM-based approaches. Unlike conventional 1T-1C dynamic RAM (DRAM), which incurs refresh overhead, 2T-nC FeRAM offers a promising alternative as a non-volatile memory solution with low…

    Submitted 22 September, 2025; originally announced September 2025.

    Comments: 6 Pages, 7 Figures, To be presented at System on Chip Conference 2025

  8. arXiv:2509.15492  [pdf, ps, other]

    cs.SD cs.MM eess.AS

    Beyond Video-to-SFX: Video to Audio Synthesis with Environmentally Aware Speech

    Authors: Xinlei Niu, Jianbo Ma, Dylan Harper-Harris, Xiangyu Zhang, Charles Patrick Martin, Jing Zhang

    Abstract: The generation of realistic, context-aware audio is important in real-world applications such as video game development. While existing video-to-audio (V2A) methods mainly focus on Foley sound generation, they struggle to produce intelligible speech. Meanwhile, current environmental speech synthesis approaches remain text-driven and fail to temporally align with dynamic video content. In this pape…

    Submitted 18 September, 2025; originally announced September 2025.

  9. arXiv:2509.13780  [pdf, ps, other]

    cs.RO

    Behavior Foundation Model for Humanoid Robots

    Authors: Weishuai Zeng, Shunlin Lu, Kangning Yin, Xiaojie Niu, Minyue Dai, Jingbo Wang, Jiangmiao Pang

    Abstract: Whole-body control (WBC) of humanoid robots has witnessed remarkable progress in skill versatility, enabling a wide range of applications such as locomotion, teleoperation, and motion tracking. Despite these achievements, existing WBC frameworks remain largely task-specific, relying heavily on labor-intensive reward engineering and demonstrating limited generalization across tasks and skills. Thes…

    Submitted 17 September, 2025; originally announced September 2025.

  10. arXiv:2509.13496  [pdf, ps, other]

    cs.CV cs.LG

    BiasMap: Leveraging Cross-Attentions to Discover and Mitigate Hidden Social Biases in Text-to-Image Generation

    Authors: Rajatsubhra Chakraborty, Xujun Che, Depeng Xu, Cori Faklaris, Xi Niu, Shuhan Yuan

    Abstract: Bias discovery is critical for black-box generative models, especially text-to-image (TTI) models. Existing works predominantly focus on output-level demographic distributions, which do not necessarily guarantee concept representations to be disentangled post-mitigation. We propose BiasMap, a model-agnostic framework for uncovering latent concept-level representational biases in stable diffusion mo…

    Submitted 16 September, 2025; originally announced September 2025.

  11. arXiv:2509.07909  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Uncovering Scaling Laws for Large Language Models via Inverse Problems

    Authors: Arun Verma, Zhaoxuan Wu, Zijian Zhou, Xiaoqiang Lin, Zhiliang Chen, Rachael Hwee Ling Sim, Rui Qiao, Jingtan Wang, Nhung Bui, Xinyuan Niu, Wenyang Hu, Gregory Kang Ruey Lau, Zi-Yu Khoo, Zitong Zhao, Xinyi Xu, Apivich Hemachandra, See-Kiong Ng, Bryan Kian Hsiang Low

    Abstract: Large Language Models (LLMs) are large-scale pretrained models that have achieved remarkable success across diverse domains. These successes have been driven by unprecedented complexity and scale in both data and computations. However, due to the high costs of training such models, brute-force trial-and-error approaches to improve LLMs are not feasible. Inspired by the success of inverse problems…

    Submitted 9 September, 2025; originally announced September 2025.

    Comments: Accepted at EMNLP Findings 2025

  12. arXiv:2509.01299  [pdf, ps, other]

    cs.CV

    Cross-Domain Few-Shot Segmentation via Ordinary Differential Equations over Time Intervals

    Authors: Huan Ni, Qingshan Liu, Xiaonan Niu, Danfeng Hong, Lingli Zhao, Haiyan Guan

    Abstract: Cross-domain few-shot segmentation (CD-FSS) not only enables the segmentation of unseen categories with very limited samples, but also improves cross-domain generalization ability within the few-shot segmentation framework. Currently, existing CD-FSS studies typically design multiple independent modules to enhance the cross-domain generalization ability of feature representations. However, the ind…

    Submitted 1 September, 2025; originally announced September 2025.

  13. arXiv:2508.13486  [pdf, ps, other]

    cs.IT

    A Convergent Primal-Dual Algorithm for Computing Rate-Distortion-Perception Functions

    Authors: Chunhui Chen, Linyi Chen, Xueyan Niu, Hao Wu

    Abstract: Recent advances in Rate-Distortion-Perception (RDP) theory highlight the importance of balancing compression level, reconstruction quality, and perceptual fidelity. While previous work has explored numerical approaches to approximate the information RDP function, the lack of theoretical guarantees remains a major limitation, especially in the presence of complex perceptual constraints that introdu…

    Submitted 18 August, 2025; originally announced August 2025.

  14. arXiv:2508.11485  [pdf]

    cs.RO

    i2Nav-Robot: A Large-Scale Indoor-Outdoor Robot Dataset for Multi-Sensor Fusion Navigation and Mapping

    Authors: Hailiang Tang, Tisheng Zhang, Liqiang Wang, Xin Ding, Man Yuan, Zhiyu Xiang, Jujin Chen, Yuhan Bian, Shuangyan Liu, Yuqing Wang, Guan Wang, Xiaoji Niu

    Abstract: Accurate and reliable navigation is crucial for autonomous unmanned ground vehicles (UGVs). However, current UGV datasets fall short in meeting the demands for advancing navigation and mapping techniques due to limitations in sensor configuration, time synchronization, ground truth, and scenario diversity. To address these challenges, we present i2Nav-Robot, a large-scale dataset designed for multi-…

    Submitted 27 August, 2025; v1 submitted 15 August, 2025; originally announced August 2025.

    Comments: 10 pages, 12 figures

  15. arXiv:2508.06053  [pdf, ps, other]

    cs.RO

    ReNiL: Event-Driven Pedestrian Bayesian Localization Using IMU for Real-World Applications

    Authors: Kaixuan Wu, Yuanzhuo Xu, Zejun Zhang, Weiping Zhu, Jian Zhang, Steve Drew, Xiaoguang Niu

    Abstract: Pedestrian inertial localization is key for mobile and IoT services because it provides infrastructure-free positioning. Yet most learning-based methods depend on fixed sliding-window integration, struggle to adapt to diverse motion scales and cadences, and yield inconsistent uncertainty, limiting real-world use. We present ReNiL, a Bayesian deep-learning framework for accurate, efficient, and unc…

    Submitted 6 November, 2025; v1 submitted 8 August, 2025; originally announced August 2025.

    Comments: This work has been submitted to the ACM for possible publication

  16. arXiv:2508.05949  [pdf, ps, other]

    cs.SE

    A Survey on Task Scheduling in Carbon-Aware Container Orchestration

    Authors: Jialin Yang, Zainab Saad, Jiajun Wu, Xiaoguang Niu, Henry Leung, Steve Drew

    Abstract: The soaring energy demands of large-scale software ecosystems and cloud data centers, accelerated by the intensive training and deployment of large language models, have driven energy consumption and carbon footprint to unprecedented levels. In response, both industry and academia are increasing efforts to reduce the carbon emissions associated with cloud computing through more efficient task sche…

    Submitted 7 August, 2025; originally announced August 2025.

    Comments: Submitted to ACM Computing Surveys

  17. arXiv:2507.18028  [pdf, ps, other]

    cs.CL cs.AI

    NeuralDB: Scaling Knowledge Editing in LLMs to 100,000 Facts with Neural KV Database

    Authors: Weizhi Fei, Hao Shi, Jing Xu, Jingchen Peng, Jiazheng Li, Jingzhao Zhang, Bo Bai, Wei Han, Zhenyuan Chen, Xueyan Niu

    Abstract: Efficiently editing knowledge stored in large language models (LLMs) enables model updates without large-scale training. One possible solution is Locate-and-Edit (L&E), allowing simultaneous modifications of a massive number of facts. However, such editing may compromise the general abilities of LLMs and even result in forgetting edited facts when scaling up to thousands of edits. In this paper,…

    Submitted 23 July, 2025; originally announced July 2025.

  18. arXiv:2507.16177  [pdf, ps, other]

    cs.AR

    A Sparsity-Aware Autonomous Path Planning Accelerator with HW/SW Co-Design and Multi-Level Dataflow Optimization

    Authors: Yifan Zhang, Xiaoyu Niu, Hongzheng Tian, Yanjun Zhang, Bo Yu, Shaoshan Liu, Sitao Huang

    Abstract: Path planning is critical for autonomous driving, generating smooth, collision-free, feasible paths based on perception and localization inputs. However, its computationally intensive nature poses significant challenges for resource-constrained autonomous driving hardware. This paper presents an end-to-end FPGA-based acceleration framework targeting the quadratic programming (QP), core of optimiza…

    Submitted 21 July, 2025; originally announced July 2025.

    Comments: Accepted by ACM Transactions on Architecture and Code Optimization (ACM TACO)

  19. arXiv:2506.08555  [pdf, ps, other]

    cs.CV cs.HC

    Towards Cross-Subject EMG Pattern Recognition via Dual-Branch Adversarial Feature Disentanglement

    Authors: Xinyue Niu, Akira Furui

    Abstract: Cross-subject electromyography (EMG) pattern recognition faces significant challenges due to inter-subject variability in muscle anatomy, electrode placement, and signal characteristics. Traditional methods rely on subject-specific calibration data to adapt models to new users, an approach that is both time-consuming and impractical for large-scale, real-world deployment. This paper presents an ap…

    Submitted 17 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: 6 pages, 3 figures. This work has been accepted for presentation at the IEEE Engineering in Medicine and Biology Conference (EMBC) 2025. New version corrects numerical errors in Table 1. Conclusions are unaffected

  20. arXiv:2505.20683  [pdf, ps, other]

    cs.DB

    In-memory Incremental Maintenance of Provenance Sketches [extended version]

    Authors: Pengyuan Li, Boris Glavic, Dieter Gawlick, Vasudha Krishnaswamy, Zhen Hua Liu, Danica Porobic, Xing Niu

    Abstract: Provenance-based data skipping compactly over-approximates the provenance of a query using so-called provenance sketches and utilizes such sketches to speed-up the execution of subsequent queries by skipping irrelevant data. However, a sketch captured at some time in the past may become stale if the data has been updated subsequently. Thus, there is a need to maintain provenance sketches. In this…

    Submitted 26 May, 2025; originally announced May 2025.

  21. arXiv:2505.18612  [pdf, ps, other]

    cs.CV

    Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter

    Authors: Weizhi Zhong, Huan Yang, Zheng Liu, Huiguo He, Zijian He, Xuesong Niu, Di Zhang, Guanbin Li

    Abstract: Personalized text-to-image generation aims to synthesize images of user-provided concepts in diverse contexts. Despite recent progress in multi-concept personalization, most are limited to object concepts and struggle to customize abstract concepts (e.g., pose, lighting). Some methods have begun exploring multi-concept personalization supporting abstract concepts, but they require test-time fine-t…

    Submitted 27 September, 2025; v1 submitted 24 May, 2025; originally announced May 2025.

    Comments: Code will be released upon paper acceptance

  22. arXiv:2505.12634  [pdf, other]

    cs.RO eess.SP eess.SY

    MSCEKF-MIO: Magnetic-Inertial Odometry Based on Multi-State Constraint Extended Kalman Filter

    Authors: Jiazhu Li, Jian Kuang, Xiaoji Niu

    Abstract: To overcome the limitations of existing indoor odometry technologies, which often cannot simultaneously meet requirements for accuracy, cost-effectiveness, and robustness, this paper proposes a novel magnetometer array-aided inertial odometry approach, MSCEKF-MIO (Multi-State Constraint Extended Kalman Filter-based Magnetic-Inertial Odometry). We construct a magnetic field model by fitting measurement…

    Submitted 20 May, 2025; v1 submitted 18 May, 2025; originally announced May 2025.

    Comments: 10 pages

  23. arXiv:2505.06584  [pdf, ps, other]

    cs.RO cs.AI

    JAEGER: Dual-Level Humanoid Whole-Body Controller

    Authors: Ziluo Ding, Haobin Jiang, Yuxuan Wang, Zhenguo Sun, Yu Zhang, Xiaojie Niu, Ming Yang, Weishuai Zeng, Xinrun Xu, Zongqing Lu

    Abstract: This paper presents JAEGER, a dual-level whole-body controller for humanoid robots that addresses the challenges of training a more robust and versatile policy. Unlike traditional single-controller approaches, JAEGER separates the control of the upper and lower bodies into two independent controllers, so that they can better focus on their distinct tasks. This separation alleviates the dimensional…

    Submitted 16 June, 2025; v1 submitted 10 May, 2025; originally announced May 2025.

    Comments: 15 pages, 2 figures

  24. arXiv:2505.05112  [pdf, ps, other]

    eess.IV cs.CV

    MDAA-Diff: CT-Guided Multi-Dose Adaptive Attention Diffusion Model for PET Denoising

    Authors: Xiaolong Niu, Zanting Ye, Xu Han, Yanchao Huang, Hao Sun, Hubing Wu, Lijun Lu

    Abstract: Acquiring high-quality Positron Emission Tomography (PET) images requires administering high-dose radiotracers, which increases radiation exposure risks. Generating standard-dose PET (SPET) from low-dose PET (LPET) has become a potential solution. However, previous studies have primarily focused on single low-dose PET denoising, neglecting two critical factors: discrepancies in dose response cause…

    Submitted 21 June, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

  25. arXiv:2505.05064  [pdf, ps, other]

    cs.LG

    WaterDrum: Watermarking for Data-centric Unlearning Metric

    Authors: Xinyang Lu, Xinyuan Niu, Gregory Kang Ruey Lau, Bui Thi Cam Nhung, Rachael Hwee Ling Sim, Fanyu Wen, Chuan-Sheng Foo, See-Kiong Ng, Bryan Kian Hsiang Low

    Abstract: Large language model (LLM) unlearning is critical in real-world applications where it is necessary to efficiently remove the influence of private, copyrighted, or harmful data from some users. However, existing utility-centric unlearning metrics (based on model utility) may fail to accurately evaluate the extent of unlearning in realistic settings such as when (a) the forget and retain set have se…

    Submitted 8 May, 2025; originally announced May 2025.

  26. arXiv:2505.01635  [pdf, ps, other]

    cs.ET cs.AI

    Dendritic Computing with Multi-Gate Ferroelectric Field-Effect Transistors

    Authors: A N M Nafiul Islam, Xuezhong Niu, Jiahui Duan, Shubham Kumar, Kai Ni, Abhronil Sengupta

    Abstract: Although inspired by neuronal systems in the brain, artificial neural networks generally employ point-neurons, which offer far less computational complexity than their biological counterparts. Neurons have dendritic arbors that connect to different sets of synapses and offer local non-linear accumulation - playing a pivotal role in processing and learning. Inspired by this, we propose a novel neur…

    Submitted 20 October, 2025; v1 submitted 2 May, 2025; originally announced May 2025.

  27. arXiv:2505.00827  [pdf, ps, other]

    cs.AI

    MIMIC-IV-Ext-22MCTS: A 22 Millions-Event Temporal Clinical Time-Series Dataset with Relative Timestamp for Risk Prediction

    Authors: Jing Wang, Xing Niu, Tong Zhang, Jie Shen, Juyong Kim, Jeremy C. Weiss

    Abstract: A crucial component of developing a reliable clinical risk prediction model is collecting high-quality time series clinical events. In this work, we release such a dataset that consists of 22,588,586 Clinical Time Series events, which we term MIMIC-IV-Ext-22MCTS. Our source data are discharge summaries selected from the well-known yet unstructured MIMIC-IV-Note \cite{Johnson20…

    Submitted 17 November, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

  28. arXiv:2504.12520  [pdf, ps, other]

    math.ST cs.CY

    Interpreting Network Differential Privacy

    Authors: Jonathan Hehir, Xiaoyue Niu, Aleksandra Slavkovic

    Abstract: How do we interpret the differential privacy (DP) guarantee for network data? We take a deep dive into a popular form of network DP ($\varepsilon$--edge DP) to find that many of its common interpretations are flawed. Drawing on prior work for privacy with correlated data, we interpret DP through the lens of adversarial hypothesis testing and demonstrate a gap between the pairs of hypotheses actual…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: 19 pages

  29. arXiv:2504.10826  [pdf, ps, other]

    cs.SD cs.MM eess.AS

    SteerMusic: Enhanced Musical Consistency for Zero-shot Text-Guided and Personalized Music Editing

    Authors: Xinlei Niu, Kin Wai Cheuk, Jing Zhang, Naoki Murata, Chieh-Hsin Lai, Michele Mancusi, Woosung Choi, Giorgio Fabbro, Wei-Hsiang Liao, Charles Patrick Martin, Yuki Mitsufuji

    Abstract: Music editing is an important step in music production, which has broad applications, including game development and film production. Most existing zero-shot text-guided editing methods rely on pretrained diffusion models by involving forward-backward diffusion processes. However, these methods often struggle to preserve the musical content. Additionally, text instructions alone usually fail to ac…

    Submitted 11 November, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by AAAI2026

  30. arXiv:2504.04497  [pdf, other]

    cs.RO

    SELC: Self-Supervised Efficient Local Correspondence Learning for Low Quality Images

    Authors: Yuqing Wang, Yan Wang, Hailiang Tang, Xiaoji Niu

    Abstract: Accurate and stable feature matching is critical for computer vision tasks, particularly in applications such as Simultaneous Localization and Mapping (SLAM). While recent learning-based feature matching methods have demonstrated promising performance in challenging spatiotemporal scenarios, they still face inherent trade-offs between accuracy and computational efficiency in specific settings. In…

    Submitted 6 April, 2025; originally announced April 2025.

    Comments: 8 pages, 4 figures

  31. arXiv:2503.14983  [pdf, other]

    cs.CV

    Semi-KAN: KAN Provides an Effective Representation for Semi-Supervised Learning in Medical Image Segmentation

    Authors: Zanting Ye, Xiaolong Niu, Xuanbin Wu, Wenxiang Yi, Yuan Chang, Lijun Lu

    Abstract: Deep learning-based medical image segmentation has shown remarkable success; however, it typically requires extensive pixel-level annotations, which are both expensive and time-intensive. Semi-supervised medical image segmentation (SSMIS) offers a viable alternative, driven by advancements in CNNs and ViTs. However, these networks often rely on single fixed activation functions and linear modeling…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: 18 pages, 7 figures, 6 tables

  32. arXiv:2503.11586  [pdf, ps, other]

    cs.AI cs.CL

    Broaden your SCOPE! Efficient Multi-turn Conversation Planning for LLMs with Semantic Space

    Authors: Zhiliang Chen, Xinyuan Niu, Chuan-Sheng Foo, Bryan Kian Hsiang Low

    Abstract: Large language models (LLMs) are used in chatbots or AI assistants to hold conversations with a human user. In such applications, the quality (e.g., user engagement, safety) of a conversation is important and can only be exactly known at the end of the conversation. To maximize its expected quality, conversation planning reasons about the stochastic transitions within a conversation to select the…

    Submitted 7 June, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

    Comments: ICLR 2025 Spotlight

  33. arXiv:2503.09039  [pdf, other]

    cs.GT eess.SY

    Incentive Analysis for Agent Participation in Federated Learning

    Authors: Lihui Yi, Xiaochun Niu, Ermin Wei

    Abstract: Federated learning offers a decentralized approach to machine learning, where multiple agents collaboratively train a model while preserving data privacy. In this paper, we investigate the decision-making and equilibrium behavior in federated learning systems, where agents choose between participating in global training or conducting independent local training. The problem is first modeled as a st…

    Submitted 11 March, 2025; originally announced March 2025.

  34. Beyond Overfitting: Doubly Adaptive Dropout for Generalizable AU Detection

    Authors: Yong Li, Yi Ren, Xuesong Niu, Yi Ding, Xiu-Shen Wei, Cuntai Guan

    Abstract: Facial Action Units (AUs) are essential for conveying psychological states and emotional expressions. While automatic AU detection systems leveraging deep learning have progressed, they often overfit to specific datasets and individual features, limiting their cross-domain applicability. To overcome these limitations, we propose a doubly adaptive dropout approach for cross-domain AU detection, whi…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: Accepted by IEEE Transactions on Affective Computing 2025. A novel method for cross-domain facial action unit detection

    Journal ref: IEEE Transactions on Affective Computing 2025

  35. arXiv:2503.04580  [pdf, ps, other]

    cs.RO

    DogLegs: Robust Proprioceptive State Estimation for Legged Robots Using Multiple Leg-Mounted IMUs

    Authors: Yibin Wu, Jian Kuang, Shahram Khorshidi, Xiaoji Niu, Lasse Klingbeil, Maren Bennewitz, Heiner Kuhlmann

    Abstract: Robust and accurate proprioceptive state estimation of the main body is crucial for legged robots to execute tasks in extreme environments where exteroceptive sensors, such as LiDARs and cameras, may become unreliable. In this paper, we propose DogLegs, a state estimation system for legged robots that fuses the measurements from a body-mounted inertial measurement unit (Body-IMU), joint encoders,…

    Submitted 25 July, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: 8 pages, 8 figures

  36. arXiv:2503.03786  [pdf, ps, other]

    q-bio.TO cs.CV eess.IV

    Self is the Best Learner: CT-free Ultra-Low-Dose PET Organ Segmentation via Collaborating Denoising and Segmentation Learning

    Authors: Zanting Ye, Xiaolong Niu, Xu Han, Xuanbin Wu, Wantong Lu, Yijun Lu, Hao Sun, Yanchao Huang, Hubing Wu, Lijun Lu

    Abstract: Organ segmentation in Positron Emission Tomography (PET) plays a vital role in cancer quantification. Low-dose PET (LDPET) provides a safer alternative by reducing radiation exposure. However, the inherent noise and blurred boundaries make organ segmentation more challenging. Additionally, existing PET organ segmentation methods rely on coregistered Computed Tomography (CT) annotations, overlookin…

    Submitted 26 June, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

    Comments: This work has been accepted by MICCAI2025; 9 pages, 5 figures

  37. arXiv:2502.14227  [pdf, other]

    cs.LG cs.AI

    SleepGMUformer: A gated multimodal temporal neural network for sleep staging

    Authors: Chenjun Zhao, Xuesen Niu, Xinglin Yu, Long Chen, Na Lv, Huiyu Zhou, Aite Zhao

    Abstract: Sleep staging is a key method for assessing sleep quality and diagnosing sleep disorders. However, current deep learning methods face challenges: 1) postfusion techniques ignore the varying contributions of different modalities; 2) unprocessed sleep data can interfere with frequency-domain information. To tackle these issues, this paper proposes a gated multimodal temporal neural network for multi…

    Submitted 19 February, 2025; originally announced February 2025.

  38. arXiv:2501.12959  [pdf, other]

    cs.CL

    Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference

    Authors: Weizhi Fei, Xueyan Niu, Guoqing Xie, Yingqing Liu, Bo Bai, Wei Han

    Abstract: Although applications involving long-context inputs are crucial for the effective utilization of large language models (LLMs), they also result in increased computational costs and reduced performance. To address this challenge, we propose an efficient, training-free prompt compression method that retains key information within compressed prompts. We identify specific attention heads in transforme…

    Submitted 5 February, 2025; v1 submitted 22 January, 2025; originally announced January 2025.

  39. arXiv:2501.03079  [pdf, other]

    cs.RO

    Wheel-GINS: A GNSS/INS Integrated Navigation System with a Wheel-mounted IMU

    Authors: Yibin Wu, Jian Kuang, Xiaoji Niu, Cyrill Stachniss, Lasse Klingbeil, Heiner Kuhlmann

    Abstract: A long-term accurate and robust localization system is essential for mobile robots to operate efficiently outdoors. Recent studies have shown the significant advantages of the wheel-mounted inertial measurement unit (Wheel-IMU)-based dead reckoning system. However, it still drifts over extended periods because of the absence of external correction signals. To achieve the goal of long-term accurate…

    Submitted 6 January, 2025; originally announced January 2025.

    Comments: Accepted to IEEE Transactions on Intelligent Transportation Systems

  40. arXiv:2412.18566  [pdf, other]

    cs.CL eess.AS

    Zero-resource Speech Translation and Recognition with LLMs

    Authors: Karel Mundnich, Xing Niu, Prashant Mathur, Srikanth Ronanki, Brady Houston, Veera Raghavendra Elluru, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Anshu Bhatia, Daniel Garcia-Romero, Kyu J. Han, Katrin Kirchhoff

    Abstract: Despite recent advancements in speech processing, zero-resource speech translation (ST) and automatic speech recognition (ASR) remain challenging problems. In this work, we propose to leverage a multilingual Large Language Model (LLM) to perform ST and ASR in languages for which the model has never seen paired audio-text data. We achieve this by using a pre-trained multilingual speech encoder, a m…

    Submitted 30 December, 2024; v1 submitted 24 December, 2024; originally announced December 2024.

    Comments: ICASSP 2025, 5 pages, 2 figures, 2 tables

  41. arXiv:2412.16530  [pdf, other]

    cs.SD cs.CL cs.CV cs.MM eess.AS

    Improving Lip-synchrony in Direct Audio-Visual Speech-to-Speech Translation

    Authors: Lucas Goncalves, Prashant Mathur, Xing Niu, Brady Houston, Chandrashekhar Lavania, Srikanth Vishnubhotla, Lijia Sun, Anthony Ferritto

    Abstract: Audio-Visual Speech-to-Speech Translation typically prioritizes improving translation quality and naturalness. However, an equally critical aspect in audio-visual content is lip-synchrony (ensuring that the movements of the lips match the spoken content), which is essential for maintaining realism in dubbed videos. Despite its importance, the inclusion of lip-synchrony constraints in AVS2S models has been lar…

    Submitted 21 December, 2024; originally announced December 2024.

    Comments: Accepted at ICASSP, 4 pages

  42. arXiv:2412.01253  [pdf, other]

    cs.CL cs.AI cs.LG

    Yi-Lightning Technical Report

    Authors: Alan Wake, Bei Chen, C. X. Lv, Chao Li, Chengen Huang, Chenglin Cai, Chujie Zheng, Daniel Cooper, Fan Zhou, Feng Hu, Ge Zhang, Guoyin Wang, Heng Ji, Howard Qiu, Jiangcheng Zhu, Jun Tian, Katherine Su, Lihuan Zhang, Liying Li, Ming Song, Mou Li, Peng Liu, Qicheng Hu, Shawn Wang, Shijun Zhou, et al. (19 additional authors not shown)

    Abstract: This technical report presents Yi-Lightning, our latest flagship large language model (LLM). It achieves exceptional performance, ranking 6th overall on Chatbot Arena, with particularly strong results (2nd to 4th place) in specialized categories including Chinese, Math, Coding, and Hard Prompts. Yi-Lightning leverages an enhanced Mixture-of-Experts (MoE) architecture, featuring advanced expert seg…

    Submitted 22 January, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

  43. arXiv:2412.00122  [pdf, other]

    cs.CV

    Bridging the Gap: Aligning Text-to-Image Diffusion Models with Specific Feedback

    Authors: Xuexiang Niu, Jinping Tang, Lei Wang, Ge Zhu

    Abstract: Learning from feedback has been shown to enhance the alignment between text prompts and images in text-to-image diffusion models. However, due to the lack of focus in feedback content, especially regarding the object type and quantity, these techniques struggle to accurately match text and images when faced with specified prompts. To address this issue, we propose an efficient fine-tuning method…

    Submitted 28 November, 2024; originally announced December 2024.

  44. arXiv:2411.19248  [pdf, ps, other]

    cs.IT

    Reconfigurable Intelligent Surface-Assisted Multiple-Antenna Coded Caching

    Authors: Xiaofan Niu, Minquan Cheng, Kai Wan, Robert Caiming Qiu, Giuseppe Caire

    Abstract: Reconfigurable Intelligent Surface (RIS) has emerged as a promising technology to enhance the wireless propagation environment for next-generation wireless communication systems. This paper introduces a new RIS-assisted multiple-antenna coded caching problem. Unlike the existing multi-antenna coded caching models, our considered model incorporates a passive RIS with a limited number of elements ai…

    Submitted 13 November, 2025; v1 submitted 28 November, 2024; originally announced November 2024.

    Comments: Submitted to IEEE Trans. Information Theory, 40 pages

  45. arXiv:2411.11262  [pdf, other]

    cs.CV cs.AI

    Cross-Patient Pseudo Bags Generation and Curriculum Contrastive Learning for Imbalanced Multiclassification of Whole Slide Image

    Authors: Yonghuang Wu, Xuan Xie, Xinyuan Niu, Chengqian Zhao, Jinhua Yu

    Abstract: Pathology computing has dramatically improved pathologists' workflow and diagnostic decision-making processes. Although computer-aided diagnostic systems have shown considerable value in whole slide image (WSI) analysis, the problem of multi-classification under sample imbalance remains an intractable challenge. To address this, we propose learning fine-grained information by generating sub-bags w…

    Submitted 17 November, 2024; originally announced November 2024.

    Comments: 9 pages, 4 figures

  46. arXiv:2411.05088  [pdf]

    cs.CL

    Findings of the IWSLT 2024 Evaluation Campaign

    Authors: Ibrahim Said Ahmad, Antonios Anastasopoulos, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, William Chen, Qianqian Dong, Marcello Federico, Barry Haddow, Dávid Javorský, Mateusz Krubiński, Tsz Kin Lam, Xutai Ma, Prashant Mathur, Evgeny Matusov, Chandresh Maurya, John McCrae, Kenton Murray, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, Atul Kr. Ojha, et al. (20 additional authors not shown)

    Abstract: This paper reports on the shared tasks organized by the 21st IWSLT Conference. The shared tasks address 7 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, dialect and low-resource speech translation, and Indic languages. The shared tasks attracted 18 teams whose submissions are documented in…

    Submitted 7 November, 2024; originally announced November 2024.

    Comments: IWSLT 2024; 59 pages

  47. Leveraging Label Semantics and Meta-Label Refinement for Multi-Label Question Classification

    Authors: Shi Dong, Xiaobei Niu, Rui Zhong, Zhifeng Wang, Mingzhang Zuo

    Abstract: Accurate annotation of educational resources is crucial for effective personalized learning and resource recommendation in online education. However, fine-grained knowledge labels often overlap or share similarities, making it difficult for existing multi-label classification methods to differentiate them. The label distribution imbalance due to sparsity of human annotations further intensifies th…

    Submitted 25 April, 2025; v1 submitted 4 November, 2024; originally announced November 2024.

    Journal ref: Knowledge-Based Systems, 2025: 113412

  48. arXiv:2411.01777  [pdf, other]

    cs.CV

    Learning predictable and robust neural representations by straightening image sequences

    Authors: Xueyan Niu, Cristina Savin, Eero P. Simoncelli

    Abstract: Prediction is a fundamental capability of all living organisms, and has been proposed as an objective for learning sensory representations. Recent work demonstrates that in primate visual systems, prediction is facilitated by neural representations that follow straighter temporal trajectories than their initial photoreceptor encoding, which allows for prediction by linear extrapolation. Inspired b…

    Submitted 20 January, 2025; v1 submitted 3 November, 2024; originally announced November 2024.

    Comments: Accepted at NeurIPS 2024

  49. arXiv:2411.00023  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Device-Directed Speech Detection for Follow-up Conversations Using Large Language Models

    Authors: Ognjen Rudovic, Pranay Dighe, Yi Su, Vineet Garg, Sameer Dharur, Xiaochuan Niu, Ahmed H. Abdelaziz, Saurabh Adya, Ahmed Tewfik

    Abstract: Follow-up conversations with virtual assistants (VAs) enable a user to seamlessly interact with a VA without the need to repeatedly invoke it using a keyword (after the first query). Therefore, accurate Device-directed Speech Detection (DDSD) from the follow-up queries is critical for enabling naturalistic user experience. To this end, we explore the notion of Large Language Models (LLMs) and mode…

    Submitted 4 November, 2024; v1 submitted 28 October, 2024; originally announced November 2024.

  50. arXiv:2410.20487  [pdf, other]

    cs.LG cs.AI

    Efficient Diversity-based Experience Replay for Deep Reinforcement Learning

    Authors: Kaiyan Zhao, Yiming Wang, Yuyang Chen, Yan Li, Leong Hou U, Xiaoguang Niu

    Abstract: Experience replay is widely used to improve learning efficiency in reinforcement learning by leveraging past experiences. However, existing experience replay methods, whether based on uniform or prioritized sampling, often suffer from low efficiency, particularly in real-world scenarios with high-dimensional state spaces. To address this limitation, we propose a novel approach, Efficient Diversity…

    Submitted 18 May, 2025; v1 submitted 27 October, 2024; originally announced October 2024.