Showing 1–50 of 127 results for author: Lan, Z

  1. arXiv:2507.13839  [pdf]

    cs.CL cs.HC

    The Expressions of Depression and Anxiety in Chinese Psycho-counseling: Usage of First-person Singular Pronoun and Negative Emotional Words

    Authors: Lizhi Ma, Tong Zhao, Shuai Zhang, Nirui Song, Hongliang He, Anqi Li, Ran Feng, Huachuan Qiu, Jingsong Ma, Zhenzhong Lan

    Abstract: This study explores the relationship between linguistic expressions and psychological states of depression and anxiety within Chinese psycho-counseling interactions, focusing specifically on the usage of first-person singular pronouns and negative emotional words. Utilizing a corpus derived from 735 online counseling sessions, the analysis employed a general linear mixed-effect model to assess lin…

    Submitted 18 July, 2025; originally announced July 2025.

  2. arXiv:2507.10601  [pdf, ps, other]

    q-bio.QM cs.CV cs.LG eess.IV stat.ME

    AGFS-Tractometry: A Novel Atlas-Guided Fine-Scale Tractometry Approach for Enhanced Along-Tract Group Statistical Comparison Using Diffusion MRI Tractography

    Authors: Ruixi Zheng, Wei Zhang, Yijie Li, Xi Zhu, Zhou Lan, Jarrett Rushmore, Yogesh Rathi, Nikos Makris, Lauren J. O'Donnell, Fan Zhang

    Abstract: Diffusion MRI (dMRI) tractography is currently the only method for in vivo mapping of the brain's white matter (WM) connections. Tractometry is an advanced tractography analysis technique for along-tract profiling to investigate the morphology and microstructural properties along the fiber tracts. Tractometry has become an essential tool for studying local along-tract differences between different…

    Submitted 12 July, 2025; originally announced July 2025.

    Comments: 31 pages and 7 figures

  3. arXiv:2507.01663  [pdf, ps, other]

    cs.LG cs.AI

    AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training

    Authors: Zhenyu Han, Ansheng You, Haibo Wang, Kui Luo, Guang Yang, Wenqi Shi, Menglong Chen, Sicheng Zhang, Zeshun Lan, Chunshi Deng, Huazhong Ji, Wenjie Liu, Yu Huang, Yixiang Zhang, Chenyi Pan, Jing Wang, Xin Huang, Chunsheng Li, Jianping Wu

    Abstract: Reinforcement learning (RL) has become a pivotal technology in the post-training phase of large language models (LLMs). Traditional task-colocated RL frameworks suffer from significant scalability bottlenecks, while task-separated RL frameworks face challenges in complex dataflows and the corresponding resource idling and workload imbalance. Moreover, most existing frameworks are tightly coupled w…

    Submitted 2 July, 2025; originally announced July 2025.

  4. arXiv:2505.14796  [pdf, ps, other]

    cs.DC cs.PF

    Extracting Practical, Actionable Energy Insights from Supercomputer Telemetry and Logs

    Authors: Melanie Cornelius, Greg Cross, Shilpika Shilpika, Matthew T. Dearing, Zhiling Lan

    Abstract: As supercomputers grow in size and complexity, power efficiency has become a critical challenge, particularly in understanding GPU power consumption within modern HPC workloads. This work addresses this challenge by presenting a data co-analysis approach using system data collected from the Polaris supercomputer at Argonne National Laboratory. We focus on GPU utilization and power demands, navigat…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: 11 pages, 4 tables, 14 figures

  5. arXiv:2505.14030  [pdf, ps, other]

    cs.RO

    AutoBio: A Simulation and Benchmark for Robotic Automation in Digital Biology Laboratory

    Authors: Zhiqian Lan, Yuxuan Jiang, Ruiqi Wang, Xuanbing Xie, Rongkui Zhang, Yicheng Zhu, Peihang Li, Tianshuo Yang, Tianxing Chen, Haoyu Gao, Xiaokang Yang, Xuelong Li, Hongyuan Zhang, Yao Mu, Ping Luo

    Abstract: Vision-language-action (VLA) models have shown promise as generalist robotic policies by jointly leveraging visual, linguistic, and proprioceptive modalities to generate action trajectories. While recent benchmarks have advanced VLA research in domestic tasks, professional science-oriented domains remain underexplored. We introduce AutoBio, a simulation framework and benchmark designed to evaluate…

    Submitted 28 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  6. arXiv:2505.02184  [pdf, ps, other]

    cs.AI cs.DC cs.PL cs.SE

    Leveraging LLMs to Automate Energy-Aware Refactoring of Parallel Scientific Codes

    Authors: Matthew T. Dearing, Yiheng Tao, Xingfu Wu, Zhiling Lan, Valerie Taylor

    Abstract: While large language models (LLMs) are increasingly used for generating parallel scientific code, most current efforts emphasize functional correctness, often overlooking performance and energy considerations. In this work, we propose LASSI-EE, an automated LLM-based refactoring framework that generates energy-efficient parallel code on a target parallel system for a given parallel code as input.…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: 11 pages, 4 figures

  7. arXiv:2504.20460  [pdf, ps, other]

    cs.IT

    Sequence Reconstruction under Channels with Multiple Bursts of Insertions or Deletions

    Authors: Zhaojun Lan, Yubo Sun, Wenjun Yu, Gennian Ge

    Abstract: The sequence reconstruction problem involves a model where a sequence is transmitted over several identical channels. This model investigates the minimum number of channels required for the unique reconstruction of the transmitted sequence. Levenshtein established that this number exceeds the maximum size of the intersection between the error balls of any two distinct transmitted sequences by one.…

    Submitted 29 April, 2025; originally announced April 2025.

  8. arXiv:2504.13059  [pdf, other]

    cs.RO cs.AI cs.CL

    RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins

    Authors: Yao Mu, Tianxing Chen, Zanxin Chen, Shijia Peng, Zhiqian Lan, Zeyu Gao, Zhixuan Liang, Qiaojun Yu, Yude Zou, Mingkun Xu, Lunkai Lin, Zhiqiang Xie, Mingyu Ding, Ping Luo

    Abstract: In the rapidly advancing field of robotics, dual-arm coordination and complex object manipulation are essential capabilities for developing advanced autonomous systems. However, the scarcity of diverse, high-quality demonstration data and real-world-aligned evaluation benchmarks severely limits such development. To address this, we introduce RoboTwin, a generative digital twin framework that uses…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: CVPR 2025 Highlight. 22 pages. Project page: https://robotwin-benchmark.github.io/

  9. arXiv:2504.10127  [pdf, other]

    cs.AI cs.CL cs.CV

    Breaking the Data Barrier -- Building GUI Agents Through Task Generalization

    Authors: Junlei Zhang, Zichen Ding, Chang Ma, Zijie Chen, Qiushi Sun, Zhenzhong Lan, Junxian He

    Abstract: Graphical User Interface (GUI) agents offer cross-platform solutions for automating complex digital tasks, with significant potential to transform productivity workflows. However, their performance is often constrained by the scarcity of high-quality trajectory data. To address this limitation, we propose training Vision Language Models (VLMs) on data-rich, reasoning-intensive tasks during a dedic…

    Submitted 15 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

    Comments: 24 pages, 11 figures

  10. arXiv:2504.07288  [pdf, other]

    cs.CL

    MDIT: A Model-free Data Interpolation Method for Diverse Instruction Tuning

    Authors: Yangning Li, Zihua Lan, Lv Qingsong, Yinghui Li, Hai-Tao Zheng

    Abstract: As Large Language Models (LLMs) are increasingly applied across various tasks, instruction tuning has emerged as a critical method for enhancing model performance. However, current data management strategies face substantial challenges in generating diverse and comprehensive data, restricting further improvements in model performance. To address this gap, we propose MDIT, a novel model-free data i…

    Submitted 14 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

  11. arXiv:2504.07282  [pdf, ps, other]

    cs.CL

    RAISE: Reinforced Adaptive Instruction Selection For Large Language Models

    Authors: Lv Qingsong, Yangning Li, Zihua Lan, Zishan Xu, Jiwei Tang, Yinghui Li, Wenhao Jiang, Hai-Tao Zheng, Philip S. Yu

    Abstract: In the instruction fine-tuning of large language models (LLMs), it is widely recognized that a few high-quality instructions are superior to a large number of low-quality instructions. At present, many instruction selection methods have been proposed, but most of these methods select instructions based on heuristic quality metrics, and only consider data selection before training. These designs lea…

    Submitted 30 May, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

  12. arXiv:2504.05262  [pdf, other]

    cs.CL

    Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models

    Authors: Yang Yan, Yu Lu, Renjun Xu, Zhenzhong Lan

    Abstract: Despite high benchmark scores, Large Language Models (LLMs) often fail simple problems, raising a critical question: Do LLMs learn mathematical principles or merely memorize patterns? Rather than designing increasingly complex benchmarks like recent works, we investigate this using elementary two-integer addition ($0$ to $2^{64}$), probing two core properties: commutativity ($A+B=B+A$) and composit…

    Submitted 7 April, 2025; originally announced April 2025.

  13. arXiv:2503.21836  [pdf, ps, other]

    cs.CV

    iMedImage Technical Report

    Authors: Ran Wei, ZhiXiong Lan, Qing Yan, Ning Song, Ming Lv, LongQing Ye

    Abstract: Background: Chromosome karyotype analysis is crucial for diagnosing hereditary diseases, yet detecting structural abnormalities remains challenging. While AI has shown promise in medical imaging, its effectiveness varies across modalities. Leveraging advances in Foundation Models that integrate multimodal medical imaging for robust feature extraction and accurate diagnosis, we developed iMedImage,…

    Submitted 26 March, 2025; originally announced March 2025.

  14. arXiv:2503.04812  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning

    Authors: Zhibin Lan, Liqiang Niu, Fandong Meng, Jie Zhou, Jinsong Su

    Abstract: Universal multimodal embedding models play a critical role in tasks such as interleaved image-text retrieval, multimodal RAG, and multimodal clustering. However, our empirical results indicate that existing LMM-based embedding models trained with the standard InfoNCE loss exhibit a high degree of overlap in similarity distribution between positive and negative pairs, making it challenging to disti…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: Preprint

  15. arXiv:2502.13447  [pdf, ps, other]

    cs.CV cs.CL

    Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning

    Authors: Yang Yan, Bingqing Yue, Qiaxuan Li, Man Huang, Jingyu Chen, Zhenzhong Lan

    Abstract: The integration of artificial intelligence in medical imaging has shown tremendous potential, yet the relationship between pre-trained knowledge and performance in cross-modality learning remains unclear. This study investigates how explicitly injecting medical knowledge into the learning process affects the performance of cross-modality classification, focusing on Chest X-ray (CXR) images. We int…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: Accepted by ICASSP'25

  16. arXiv:2502.11161  [pdf, ps, other]

    cs.RO cs.CV

    BFA: Best-Feature-Aware Fusion for Multi-View Fine-grained Manipulation

    Authors: Zihan Lan, Weixin Mao, Haosheng Li, Le Wang, Tiancai Wang, Haoqiang Fan, Osamu Yoshie

    Abstract: In real-world scenarios, multi-view cameras are typically employed for fine-grained manipulation tasks. Existing approaches (e.g., ACT) tend to treat multi-view features equally and directly concatenate them for policy learning. However, this introduces redundant visual information and higher computational costs, leading to ineffective manipulation. For a fine-grained manipulation task, it…

    Submitted 28 June, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

    Comments: 8 pages, 4 figures

  17. arXiv:2502.03796  [pdf, other]

    cs.DC

    Exploring Uncore Frequency Scaling for Heterogeneous Computing

    Authors: Zhong Zheng, Seyfal Sultanov, Michael E. Papka, Zhiling Lan

    Abstract: High-performance computing (HPC) systems are essential for scientific discovery and engineering innovation. However, their growing power demands pose significant challenges, particularly as systems scale to the exascale level. Prior uncore frequency tuning studies have primarily focused on conventional HPC workloads running on homogeneous systems. As HPC advances toward heterogeneous computing, in…

    Submitted 6 February, 2025; originally announced February 2025.

  18. arXiv:2501.12464  [pdf, other]

    cs.DC

    More for Less: Integrating Capability-Predominant and Capacity-Predominant Computing

    Authors: Zhong Zheng, Michael E. Papka, Zhiling Lan

    Abstract: Capability jobs (e.g., large, long-running tasks) and capacity jobs (e.g., small, short-running tasks) are two common types of workloads in high-performance computing (HPC). Different HPC systems are typically deployed to handle distinct computing workloads. For example, Theta at the Argonne Leadership Computing Facility (ALCF) primarily serves capability jobs, while Cori at the National Energy Re…

    Submitted 21 January, 2025; originally announced January 2025.

  19. arXiv:2412.20694  [pdf, other]

    cs.NE cs.AI cs.CL

    QUBE: Enhancing Automatic Heuristic Design via Quality-Uncertainty Balanced Evolution

    Authors: Zijie Chen, Zhanchao Zhou, Yu Lu, Renjun Xu, Lili Pan, Zhenzhong Lan

    Abstract: Solving NP-hard problems traditionally relies on heuristics, yet manually designing effective heuristics for complex problems remains a significant challenge. While recent advancements like FunSearch have shown that large language models (LLMs) can be integrated into evolutionary algorithms (EAs) for heuristic design, their potential is hindered by limitations in balancing exploitation and explora…

    Submitted 20 February, 2025; v1 submitted 29 December, 2024; originally announced December 2024.

  20. arXiv:2412.19915  [pdf]

    q-bio.QM cs.AI

    Identifying Cocoa Pollinators: A Deep Learning Dataset

    Authors: Wenxiu Xu, Saba Ghorbani Bazegar, Dong Sheng, Manuel Toledo-Hernandez, ZhenZhong Lan, Thomas Cherico Wanger

    Abstract: Cocoa is a multi-billion-dollar industry, but research on improving yields through pollination remains limited. New embedded hardware and AI-based data analysis are advancing information on cocoa flower visitors, their identity and implications for yields. We present the first cocoa flower visitor dataset containing 5,792 images of Ceratopogonidae, Formicidae, Aphididae, Araneae, and Encyrtidae, and…

    Submitted 27 December, 2024; originally announced December 2024.

    Comments: The manuscript introduces the first cocoa pollination dataset and an example analysis with YOLOv8 models

  21. arXiv:2412.19102  [pdf, other]

    cs.CL

    "I've Heard of You!": Generate Spoken Named Entity Recognition Data for Unseen Entities

    Authors: Jiawei Yu, Xiang Geng, Yuang Li, Mengxin Ren, Wei Tang, Jiahuan Li, Zhibin Lan, Min Zhang, Hao Yang, Shujian Huang, Jinsong Su

    Abstract: Spoken named entity recognition (NER) aims to identify named entities from speech, playing an important role in speech processing. New named entities appear every day; however, annotating their Spoken NER data is costly. In this paper, we demonstrate that existing Spoken NER systems perform poorly when dealing with previously unseen named entities. To tackle this challenge, we propose a method for…

    Submitted 26 December, 2024; originally announced December 2024.

    Comments: Accepted by ICASSP 2025

  22. arXiv:2412.00171  [pdf, other]

    cs.RO cs.CV

    RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World

    Authors: Weixin Mao, Weiheng Zhong, Zhou Jiang, Dong Fang, Zhongyue Zhang, Zihan Lan, Haosheng Li, Fan Jia, Tiancai Wang, Haoqiang Fan, Osamu Yoshie

    Abstract: Existing robot policies predominantly adopt the task-centric approach, requiring end-to-end task data collection. This results in limited generalization to new tasks and difficulties in pinpointing errors within long-horizon, multi-stage tasks. To address this, we propose RoboMatrix, a skill-centric hierarchical framework designed for scalable robot task planning and execution in open-world enviro…

    Submitted 25 March, 2025; v1 submitted 29 November, 2024; originally announced December 2024.

    Comments: 17 pages, 16 figures

  23. arXiv:2411.06307  [pdf, other]

    cs.SD eess.AS

    Acoustic Volume Rendering for Neural Impulse Response Fields

    Authors: Zitong Lan, Chenhao Zheng, Zhiwei Zheng, Mingmin Zhao

    Abstract: Realistic audio synthesis that captures accurate acoustic phenomena is essential for creating immersive experiences in virtual and augmented reality. Synthesizing the sound received at any position relies on the estimation of impulse response (IR), which characterizes how sound propagates in one scene along different paths before arriving at the listener's position. In this paper, we present Acous…

    Submitted 9 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024 Spotlight

  24. arXiv:2410.19609  [pdf, other]

    cs.CL cs.AI

    OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization

    Authors: Hongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu, Hongming Zhang, Tianqing Fang, Zhenzhong Lan, Dong Yu

    Abstract: The rapid development of large language and multimodal models has sparked significant interest in using proprietary models, such as GPT-4o, to develop autonomous agents capable of handling real-world scenarios like web navigation. Although recent open-source efforts have tried to equip agents with the ability to explore environments and continuously improve over time, they are building text-only a…

    Submitted 25 October, 2024; originally announced October 2024.

  25. arXiv:2410.17897  [pdf, ps, other]

    cs.CL

    Value Residual Learning

    Authors: Zhanchao Zhou, Tianyi Wu, Zhiyun Jiang, Fares Obeid, Zhenzhong Lan

    Abstract: While Transformer models have achieved remarkable success in various domains, the effectiveness of information propagation through deep networks remains a critical challenge. Standard hidden state residuals often fail to adequately preserve initial token-level information in deeper layers. This paper introduces ResFormer, a novel architecture that enhances information flow by incorporating value r…

    Submitted 8 June, 2025; v1 submitted 23 October, 2024; originally announced October 2024.

  26. arXiv:2410.14255  [pdf, other]

    cs.AI cs.CL

    Nova: An Iterative Planning and Search Approach to Enhance Novelty and Diversity of LLM Generated Ideas

    Authors: Xiang Hu, Hongyu Fu, Jinge Wang, Yifeng Wang, Zhikun Li, Renjun Xu, Yu Lu, Yaochu Jin, Lili Pan, Zhenzhong Lan

    Abstract: Scientific innovation is pivotal for humanity, and harnessing large language models (LLMs) to generate research ideas could transform discovery. However, existing LLMs often produce simplistic and repetitive suggestions due to their limited ability in acquiring external knowledge for innovation. To address this problem, we introduce an enhanced planning and search methodology designed to boost the…

    Submitted 27 October, 2024; v1 submitted 18 October, 2024; originally announced October 2024.

  27. arXiv:2410.04439  [pdf, other]

    cs.CV cs.AI

    Empowering Backbone Models for Visual Text Generation with Input Granularity Control and Glyph-Aware Training

    Authors: Wenbo Li, Guohao Li, Zhibin Lan, Xue Xu, Wanru Zhuang, Jiachen Liu, Xinyan Xiao, Jinsong Su

    Abstract: Diffusion-based text-to-image models have demonstrated impressive achievements in diversity and aesthetics but struggle to generate images with legible visual texts. Existing backbone models have limitations such as misspelling, failing to generate texts, and lack of support for Chinese text, but their development shows promising potential. In this paper, we propose a series of methods, aiming to…

    Submitted 6 October, 2024; originally announced October 2024.

  28. arXiv:2410.02745  [pdf, other]

    cs.CV cs.AI cs.CL

    AVG-LLaVA: A Large Multimodal Model with Adaptive Visual Granularity

    Authors: Zhibin Lan, Liqiang Niu, Fandong Meng, Wenbo Li, Jie Zhou, Jinsong Su

    Abstract: Recently, when dealing with high-resolution images, dominant LMMs usually divide them into multiple local images and one global image, which will lead to a large number of visual tokens. In this work, we introduce AVG-LLaVA, an LMM that can adaptively select the appropriate visual granularity based on the input image and instruction. This approach not only reduces the number of visual tokens and s…

    Submitted 4 October, 2024; v1 submitted 20 September, 2024; originally announced October 2024.

    Comments: Preprint

  29. arXiv:2409.20243  [pdf, other]

    cs.CL

    PsyGUARD: An Automated System for Suicide Detection and Risk Assessment in Psychological Counseling

    Authors: Huachuan Qiu, Lizhi Ma, Zhenzhong Lan

    Abstract: As awareness of mental health issues grows, online counseling support services are becoming increasingly prevalent worldwide. Detecting whether users express suicidal ideation in text-based counseling services is crucial for identifying and prioritizing at-risk individuals. However, the lack of domain-specific systems to facilitate fine-grained suicide detection and corresponding risk assessment i…

    Submitted 30 September, 2024; originally announced September 2024.

    Comments: Accepted to EMNLP 2024 main conference

  30. arXiv:2408.15787  [pdf, other]

    cs.CL cs.IR

    Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions

    Authors: Huachuan Qiu, Zhenzhong Lan

    Abstract: Virtual counselors powered by large language models (LLMs) aim to create interactive support systems that effectively assist clients struggling with mental health challenges. To replicate counselor-client conversations, researchers have built an online mental health platform that allows professional counselors to provide clients with text-based counseling services for about an hour per session. No…

    Submitted 28 August, 2024; originally announced August 2024.

  31. arXiv:2408.13459  [pdf, other]

    cs.CV

    Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model

    Authors: Chen Rao, Guangyuan Li, Zehua Lan, Jiakai Sun, Junsheng Luan, Wei Xing, Lei Zhao, Huaizhong Lin, Jianfeng Dong, Dalong Zhang

    Abstract: Current video deblurring methods have limitations in recovering high-frequency information since the regression losses are conservative with high-frequency details. Since Diffusion Models (DMs) have strong capabilities in generating high-frequency details, we consider introducing DMs into the video deblurring task. However, we found that directly applying DMs to the video deblurring task has the f…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024

    ACM Class: I.4.4

  32. arXiv:2408.10496  [pdf, other]

    cs.CV

    GPT-based Textile Pilling Classification Using 3D Point Cloud Data

    Authors: Yu Lu, YuYu Chen, Gang Zhou, Zhenghua Lan

    Abstract: Textile pilling assessment is critical for textile quality control. We collect thousands of 3D point cloud images in the actual test environment of textiles and organize and label them as the TextileNet8 dataset. To the best of our knowledge, it is the first publicly available eight-category 3D point cloud dataset in the field of textile pilling assessment. Based on PointGPT, the GPT-like big model…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 8 pages, 2 figures

  33. arXiv:2407.15083  [pdf, other]

    cs.LG

    Rocket Landing Control with Random Annealing Jump Start Reinforcement Learning

    Authors: Yuxuan Jiang, Yujie Yang, Zhiqian Lan, Guojian Zhan, Shengbo Eben Li, Qi Sun, Jian Ma, Tianwen Yu, Changwu Zhang

    Abstract: Rocket recycling is a crucial pursuit in aerospace technology, aimed at reducing costs and environmental impact in space exploration. The primary focus centers on rocket landing control, involving the guidance of a nonlinear underactuated rocket with limited fuel in real-time. This challenging task prompts the application of reinforcement learning (RL), yet the goal-oriented nature of the problem pose…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: IROS 2024 Oral

  34. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  35. arXiv:2407.02894  [pdf, other]

    cs.CL cs.AI

    Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation

    Authors: Zhibin Lan, Liqiang Niu, Fandong Meng, Jie Zhou, Min Zhang, Jinsong Su

    Abstract: In-image machine translation (IIMT) aims to translate an image containing texts in source language into an image containing translations in target language. In this regard, conventional cascaded methods suffer from issues such as error propagation, massive parameters, and difficulties in deployment and retaining visual characteristics of the input image. Thus, constructing end-to-end models has be…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted to ACL 2024 Findings

  36. arXiv:2407.01894  [pdf, other]

    cs.CV cs.HC

    Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object Detection

    Authors: Zixing Li, Chao Yan, Zhen Lan, Xiaojia Xiang, Han Zhou, Jun Lai, Dengqing Tang

    Abstract: Advanced cognition can be extracted from the human brain using brain-computer interfaces. Integrating these interfaces with computer vision techniques, which possess efficient feature extraction capabilities, can achieve more robust and accurate detection of dim targets in aerial images. However, existing target detection methods primarily concentrate on homogeneous data, lacking efficient and ver…

    Submitted 8 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: 18 pages, 15 figures

  37. LASSI: An LLM-based Automated Self-Correcting Pipeline for Translating Parallel Scientific Codes

    Authors: Matthew T. Dearing, Yiheng Tao, Xingfu Wu, Zhiling Lan, Valerie Taylor

    Abstract: This paper addresses the problem of providing a novel approach to sourcing significant training data for LLMs focused on science and engineering. In particular, a crucial challenge is sourcing parallel scientific codes in the range of millions to billions of codes. To tackle this problem, we propose an automated pipeline framework called LASSI, designed to translate between parallel programming l…

    Submitted 4 May, 2025; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: 8 pages, 1 figure, 7 tables

    Journal ref: 2024 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), pp. 136-143

  38. arXiv:2406.17287  [pdf, other]

    cs.CL cs.AI

    Predicting the Big Five Personality Traits in Chinese Counselling Dialogues Using Large Language Models

    Authors: Yang Yan, Lizhi Ma, Anqi Li, Jingsong Ma, Zhenzhong Lan

    Abstract: Accurate assessment of personality traits is crucial for effective psycho-counseling, yet traditional methods like self-report questionnaires are time-consuming and biased. This study examines whether Large Language Models (LLMs) can predict the Big Five personality traits directly from counseling dialogues and introduces an innovative framework to perform the task. Our framework applies role-play an…

    Submitted 25 June, 2024; originally announced June 2024.

  39. arXiv:2406.15097  [pdf, other]

    cs.NI

    Modeling and Analysis of Application Interference on Dragonfly+

    Authors: Yao Kang, Xin Wang, Neil McGlohon, Misbah Mubarak, Sudheer Chunduri, Zhiling Lan

    Abstract: The Dragonfly class of networks is considered a promising interconnect for next-generation supercomputers. While Dragonfly+ networks offer more path diversity than the original Dragonfly design, they are still prone to performance variability due to their hierarchical architecture and resource sharing design. Event-driven network simulators are indispensable tools for navigating complex system desi…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by SIGSIM PADS 2019

  40. arXiv:2406.15000  [pdf, other]

    cs.CL cs.AI

    Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations

    Authors: Lichao Zhang, Jia Yu, Shuai Zhang, Long Li, Yangyang Zhong, Guanbao Liang, Yuming Yan, Qing Ma, Fangsheng Weng, Fayu Pan, Jing Li, Renjun Xu, Zhenzhong Lan

    Abstract: Large Language Models (LLMs) have significantly advanced user-bot interactions, enabling more complex and coherent dialogues. However, the prevalent text-only modality might not fully exploit the potential for effective user engagement. This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement in chatbot conversations. We cond…

    Submitted 21 June, 2024; originally announced June 2024.

  41. arXiv:2405.18706  [pdf, other]

    cs.CV

    FocSAM: Delving Deeply into Focused Objects in Segmenting Anything

    Authors: You Huang, Zongyu Lan, Liujuan Cao, Xianming Lin, Shengchuan Zhang, Guannan Jiang, Rongrong Ji

    Abstract: The Segment Anything Model (SAM) marks a notable milestone in segmentation models, highlighted by its robust zero-shot capabilities and ability to handle diverse prompts. SAM follows a pipeline that separates interactive segmentation into image preprocessing through a large encoder and interactive inference via a lightweight decoder, ensuring efficient real-time performance. However, SAM faces sta…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024

  42. arXiv:2405.12669  [pdf, other

    cs.CL

    A Survey on Multi-modal Machine Translation: Tasks, Methods and Challenges

    Authors: Huangjun Shen, Liangying Shao, Wenbo Li, Zhibin Lan, Zhanyu Liu, Jinsong Su

    Abstract: In recent years, multi-modal machine translation has attracted significant interest in both academia and industry due to its superior performance. It takes both textual and visual modalities as inputs, leveraging visual context to tackle the ambiguities in source texts. In this paper, we begin by offering an exhaustive overview of 99 prior works, comprehensively summarizing representative studies…

    Submitted 22 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  43. arXiv:2405.04909  [pdf, other

    cs.CV cs.AI

    Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models

    Authors: Zhengxing Lan, Hongbo Li, Lingshan Liu, Bo Fan, Yisheng Lv, Yilong Ren, Zhiyong Cui

    Abstract: Predicting the future trajectories of dynamic traffic actors is a cornerstone task in autonomous driving. Though existing notable efforts have resulted in impressive performance improvements, a gap persists in scene cognition and understanding of complex traffic semantics. This paper proposes Traj-LLM, the first to investigate the potential of using Large Language Models (LLMs) without explici…

    Submitted 8 May, 2024; originally announced May 2024.

  44. arXiv:2404.14070  [pdf

    cs.HC cs.CY

    No General Code of Ethics for All: Ethical Considerations in Human-bot Psycho-counseling

    Authors: Lizhi Ma, Tong Zhao, Huachuan Qiu, Zhenzhong Lan

    Abstract: The pervasive use of AI applications is increasingly influencing our everyday decisions. However, the ethical challenges associated with AI transcend conventional ethics and single-discipline approaches. In this paper, we propose aspirational ethical principles specifically tailored for human-bot psycho-counseling during an era when AI-powered mental health services are continually emerging. We ex…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 54 pages, 11 tables, APA style; the tables are presented after the references

  45. arXiv:2404.13584  [pdf, other

    cs.CV cs.LG

    Rethink Arbitrary Style Transfer with Transformer and Contrastive Learning

    Authors: Zhanjie Zhang, Jiakai Sun, Guangyuan Li, Lei Zhao, Quanwei Zhang, Zehua Lan, Haolin Yin, Wei Xing, Huaizhong Lin, Zhiwen Zuo

    Abstract: Arbitrary style transfer holds widespread attention in research and boasts numerous practical applications. The existing methods, which either employ cross-attention to incorporate deep style attributes into content attributes or use adaptive normalization to adjust content features, fail to generate high-quality stylized images. In this paper, we introduce an innovative technique to improve the q…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by CVIU

  46. Union: An Automatic Workload Manager for Accelerating Network Simulation

    Authors: Xin Wang, Misbah Mubarak, Yao Kang, Robert B. Ross, Zhiling Lan

    Abstract: With the rapid growth of machine learning applications, the workloads of future HPC systems are anticipated to be a mix of scientific simulation, big data analytics, and machine learning applications. Simulation is a great research vehicle for understanding the performance implications of co-running scientific applications with big data and machine learning workloads on large-scale systems. In thi…

    Submitted 3 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  47. Q-adaptive: A Multi-Agent Reinforcement Learning Based Routing on Dragonfly Network

    Authors: Yao Kang, Xin Wang, Zhiling Lan

    Abstract: High-radix interconnects such as Dragonfly and its variants rely on adaptive routing to balance network traffic for optimum performance. Ideally, adaptive routing attempts to forward packets between minimal and non-minimal paths with the least congestion. In practice, current adaptive routing algorithms estimate routing path congestion based on local information such as output queue occupancy. Usi…

    Submitted 3 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  48. MRSch: Multi-Resource Scheduling for HPC

    Authors: Boyang Li, Yuping Fan, Matthew Dearing, Zhiling Lan, Paul Rich, William Allcock, Michael Papka

    Abstract: Emerging workloads in high-performance computing (HPC) are embracing significant changes, such as having diverse resource requirements instead of being CPU-centric. This advancement forces cluster schedulers to consider multiple schedulable resources during decision-making. Existing scheduling studies rely on heuristic or optimization methods, which are limited by an inability to adapt to new scen…

    Submitted 3 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  49. Interpretable Modeling of Deep Reinforcement Learning Driven Scheduling

    Authors: Boyang Li, Zhiling Lan, Michael E. Papka

    Abstract: In the field of high-performance computing (HPC), there has been recent exploration into the use of deep reinforcement learning for cluster scheduling (DRL scheduling), which has demonstrated promising outcomes. However, a significant challenge arises from the lack of interpretability in deep neural networks (DNN), rendering them as black-box models to system managers. This lack of model interpret…

    Submitted 24 March, 2024; originally announced March 2024.

  50. Study of Workload Interference with Intelligent Routing on Dragonfly

    Authors: Yao Kang, Xin Wang, Zhiling Lan

    Abstract: The Dragonfly interconnect is a crucial network technology for supercomputers. To support exascale systems, network resources are shared such that links and routers are not dedicated to any node pair. While link utilization is increased, workload performance is often offset by network contention. Recently, intelligent routing built on reinforcement learning has demonstrated higher network throughput with…

    Submitted 3 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.