
Showing 1–50 of 152 results for author: Zhai, X

Searching in archive cs.
  1. arXiv:2507.15613  [pdf, ps, other]

    cs.CR cs.AI

    Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems

    Authors: Andrii Balashov, Olena Ponomarova, Xiaohua Zhai

    Abstract: Large Language Models (LLMs) deployed in enterprise settings (e.g., as Microsoft 365 Copilot) face novel security challenges. One critical threat is prompt inference attacks: adversaries chain together seemingly benign prompts to gradually extract confidential data. In this paper, we present a comprehensive study of multi-stage prompt inference attacks in an enterprise LLM context. We simulate rea…

    Submitted 21 July, 2025; originally announced July 2025.

    Comments: 26 pages

  2. arXiv:2507.09556  [pdf, ps, other]

    cs.CV

    SeqCSIST: Sequential Closely-Spaced Infrared Small Target Unmixing

    Authors: Ximeng Zhai, Bohan Xu, Yaohong Chen, Hao Wang, Kehua Guo, Yimian Dai

    Abstract: Due to the limitation of the optical lens focal length and the resolution of the infrared detector, distant Closely-Spaced Infrared Small Target (CSIST) groups typically appear as mixing spots in the infrared image. In this paper, we propose a novel task, Sequential CSIST Unmixing, namely detecting all targets in the form of sub-pixel localization from a highly dense CSIST group. However, achievin…

    Submitted 13 July, 2025; originally announced July 2025.

    Comments: Accepted by TGRS

  3. arXiv:2506.22510  [pdf, ps, other]

    cs.CL cs.AI

    Towards Text-free Graph Foundation Models: Rethinking Multi-Domain Graph Contrastive Learning

    Authors: Zihao Zhao, Xinlong Zhai, Jinyu Yang, Chuan Shi

    Abstract: Foundation models have achieved great success in natural language processing (NLP) and computer vision (CV). Their success largely stems from the ability to integrate multi-domain knowledge in pre-training and transfer it to target domains. Considering graph data, especially graphs without textual features, is ubiquitous in real-world applications such as social networks and recommendation systems…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 16 pages, 5 figures

  4. arXiv:2505.19266  [pdf, other]

    cs.AI cs.CY

    Using Large Language Models to Assess Teachers' Pedagogical Content Knowledge

    Authors: Yaxuan Yang, Shiyu Wang, Xiaoming Zhai

    Abstract: Assessing teachers' pedagogical content knowledge (PCK) through performance-based tasks is both time- and effort-consuming. While large language models (LLMs) offer new opportunities for efficient automatic scoring, little is known about whether LLMs introduce construct-irrelevant variance (CIV) in ways similar to or different from traditional machine learning (ML) and human raters. This study exam…

    Submitted 25 May, 2025; originally announced May 2025.

  5. arXiv:2505.11356  [pdf, ps, other]

    cs.LG

    Fractal Graph Contrastive Learning

    Authors: Nero Z. Li, Xuehao Zhai, Zhichao Shi, Boshen Shi, Xuhui Jiang

    Abstract: While Graph Contrastive Learning (GCL) has attracted considerable attention in the field of graph self-supervised learning, its performance heavily relies on data augmentations that are expected to generate semantically consistent positive pairs. Existing strategies typically resort to random perturbations or local structure preservation, yet lack explicit control over global structural consistenc…

    Submitted 22 May, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

  6. arXiv:2505.10643  [pdf, other]

    cs.CL cs.AI cs.CY

    Artificial Intelligence Bias on English Language Learners in Automatic Scoring

    Authors: Shuchen Guo, Yun Wang, Jichao Yu, Xuansheng Wu, Bilgehan Ayik, Field M. Watts, Ehsan Latif, Ninghao Liu, Lei Liu, Xiaoming Zhai

    Abstract: This study investigated potential scoring biases and disparities toward English Language Learners (ELLs) when using automatic scoring systems for middle school students' written responses to science assessments. We specifically focus on examining how unbalanced training data with ELLs contributes to scoring bias and disparities. We fine-tuned BERT with four datasets: responses from (1) ELLs, (2) n…

    Submitted 19 May, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

  7. arXiv:2505.02863  [pdf]

    cs.CY cs.AI

    Understanding University Students' Use of Generative AI: The Roles of Demographics and Personality Traits

    Authors: Newnew Deng, Edward Jiusi Liu, Xiaoming Zhai

    Abstract: The use of generative AI (GAI) among university students is rapidly increasing, yet empirical research on students' GAI use and the factors influencing it remains limited. To address this gap, we surveyed 363 undergraduate and graduate students in the United States, examining their GAI usage and how it relates to demographic variables and personality traits based on the Big Five model (i.e., extra…

    Submitted 19 May, 2025; v1 submitted 3 May, 2025; originally announced May 2025.

  8. arXiv:2505.00032  [pdf]

    cs.CL cs.AI

    MDD-LLM: Towards Accuracy Large Language Models for Major Depressive Disorder Diagnosis

    Authors: Yuyang Sha, Hongxin Pan, Wei Xu, Weiyu Meng, Gang Luo, Xinyu Du, Xiaobing Zhai, Henry H. Y. Tong, Caijuan Shi, Kefeng Li

    Abstract: Major depressive disorder (MDD) impacts more than 300 million people worldwide, highlighting a significant public health issue. However, the uneven distribution of medical resources and the complexity of diagnostic methods have resulted in inadequate attention to this disorder in numerous countries and regions. This paper introduces a high-performance MDD diagnosis tool named MDD-LLM, an AI-driven…

    Submitted 28 April, 2025; originally announced May 2025.

  9. arXiv:2504.14772  [pdf, other]

    cs.CL cs.LG stat.ML

    Knowledge Distillation and Dataset Distillation of Large Language Models: Emerging Trends, Challenges, and Future Directions

    Authors: Luyang Fang, Xiaowei Yu, Jiazhang Cai, Yongkai Chen, Shushan Wu, Zhengliang Liu, Zhenyuan Yang, Haoran Lu, Xilin Gong, Yufang Liu, Terry Ma, Wei Ruan, Ali Abbasi, Jing Zhang, Tao Wang, Ehsan Latif, Wei Liu, Wei Zhang, Soheil Kolouri, Xiaoming Zhai, Dajiang Zhu, Wenxuan Zhong, Tianming Liu, Ping Ma

    Abstract: The exponential growth of Large Language Models (LLMs) continues to highlight the need for efficient strategies to meet ever-expanding computational and data demands. This survey provides a comprehensive analysis of two complementary paradigms: Knowledge Distillation (KD) and Dataset Distillation (DD), both aimed at compressing LLMs while preserving their advanced reasoning capabilities and lingui…

    Submitted 20 April, 2025; originally announced April 2025.

  10. arXiv:2504.10281  [pdf, other]

    cond-mat.mtrl-sci cond-mat.mes-hall cs.AI cs.CV cs.LG

    Zero-shot Autonomous Microscopy for Scalable and Intelligent Characterization of 2D Materials

    Authors: Jingyun Yang, Ruoyan Avery Yin, Chi Jiang, Yuepeng Hu, Xiaokai Zhu, Xingjian Hu, Sutharsika Kumar, Xiao Wang, Xiaohua Zhai, Keran Rong, Yunyue Zhu, Tianyi Zhang, Zongyou Yin, Jing Kong, Neil Zhenqiang Gong, Zhichu Ren, Haozhe Wang

    Abstract: Characterization of atomic-scale materials traditionally requires human experts with months to years of specialized training. Even for trained human operators, accurate and reliable characterization remains challenging when examining newly discovered materials such as two-dimensional (2D) structures. This bottleneck drives demand for fully autonomous experimentation systems capable of comprehendin…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 13 pages, 4 figures

  11. arXiv:2503.21802  [pdf]

    stat.AP cs.LG stat.ML

    Structured and sparse partial least squares coherence for multivariate cortico-muscular analysis

    Authors: Jingyao Sun, Qilu Zhang, Di Ma, Tianyu Jia, Shijie Jia, Xiaoxue Zhai, Ruimou Xie, Ping-Ju Lin, Zhibin Li, Yu Pan, Linhong Ji, Chong Li

    Abstract: Multivariate cortico-muscular analysis has recently emerged as a promising approach for evaluating the corticospinal neural pathway. However, current multivariate approaches encounter challenges such as high dimensionality and limited sample sizes, thus restricting their further applications. In this paper, we propose a structured and sparse partial least squares coherence algorithm (ssPLSC) to ex…

    Submitted 14 June, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  12. arXiv:2503.19889  [pdf]

    cond-mat.mtrl-sci cs.RO

    A Multi-Agent Framework Integrating Large Language Models and Generative AI for Accelerated Metamaterial Design

    Authors: Jie Tian, Martin Taylor Sobczak, Dhanush Patil, Jixin Hou, Lin Pang, Arunachalam Ramanathan, Libin Yang, Xianyan Chen, Yuval Golan, Xiaoming Zhai, Hongyue Sun, Kenan Song, Xianqiao Wang

    Abstract: Metamaterials, renowned for their exceptional mechanical, electromagnetic, and thermal properties, hold transformative potential across diverse applications, yet their design remains constrained by labor-intensive trial-and-error methods and limited data interoperability. Here, we introduce CrossMatAgent -- a novel multi-agent framework that synergistically integrates large language models with st…

    Submitted 6 April, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

  13. arXiv:2503.19786  [pdf, other]

    cs.CL cs.AI

    Gemma 3 Technical Report

    Authors: Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Etienne Pot, Ivo Penchev, Gaël Liu, Francesco Visin, Kathleen Kenealy, Lucas Beyer, Xiaohua Zhai, Anton Tsitsulin , et al. (191 additional authors not shown)

    Abstract: We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achie…

    Submitted 25 March, 2025; originally announced March 2025.

  14. arXiv:2503.15796  [pdf, other]

    cs.LG cs.AI

    Blend the Separated: Mixture of Synergistic Experts for Data-Scarcity Drug-Target Interaction Prediction

    Authors: Xinlong Zhai, Chunchen Wang, Ruijia Wang, Jiazheng Kang, Shujie Li, Boyu Chen, Tengfei Ma, Zikai Zhou, Cheng Yang, Chuan Shi

    Abstract: Drug-target interaction prediction (DTI) is essential in various applications including drug discovery and clinical application. There are two perspectives of input data widely used in DTI prediction: Intrinsic data represents how drugs or targets are constructed, and extrinsic data represents how drugs or targets are related to other biological entities. However, any of the two perspectives of in…

    Submitted 19 March, 2025; originally announced March 2025.

  15. arXiv:2503.11711  [pdf, other]

    cs.LG cs.AI

    Privacy-Preserved Automated Scoring using Federated Learning for Educational Research

    Authors: Ehsan Latif, Xiaoming Zhai

    Abstract: Data privacy remains a critical concern in educational research, requiring strict adherence to ethical standards and regulatory protocols. While traditional approaches rely on anonymization and centralized data collection, they often expose raw student data to security vulnerabilities and impose substantial logistical overhead. In this study, we propose a federated learning (FL) framework for auto…

    Submitted 8 May, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

    Comments: Accepted to AIED25

  16. arXiv:2503.09774  [pdf, other]

    cs.CL

    Efficient Multi-Task Inferencing: Model Merging with Gromov-Wasserstein Feature Alignment

    Authors: Luyang Fang, Ehsan Latif, Haoran Lu, Yifan Zhou, Ping Ma, Xiaoming Zhai

    Abstract: Automatic scoring of student responses enhances efficiency in education, but deploying a separate neural network for each task increases storage demands, maintenance efforts, and redundant computations. To address these challenges, this paper introduces the Gromov-Wasserstein Scoring Model Merging (GW-SMM) method, which merges models based on feature distribution similarities measured via the Grom…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: Submitted to AIED2025

  17. arXiv:2503.09748  [pdf, other]

    cs.CY

    Advancing Education through Tutoring Systems: A Systematic Literature Review

    Authors: Vincent Liu, Ehsan Latif, Xiaoming Zhai

    Abstract: This study systematically reviews the transformative role of Tutoring Systems, encompassing Intelligent Tutoring Systems (ITS) and Robot Tutoring Systems (RTS), in addressing global educational challenges through advanced technologies. As many students struggle with proficiency in core academic areas, Tutoring Systems emerge as promising solutions to bridge learning gaps by delivering personalized…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: A comprehensive study on tutoring systems

  18. arXiv:2502.15576  [pdf, other]

    cs.CL

    Interpreting and Steering LLMs with Mutual Information-based Explanations on Sparse Autoencoders

    Authors: Xuansheng Wu, Jiayi Yuan, Wenlin Yao, Xiaoming Zhai, Ninghao Liu

    Abstract: Large language models (LLMs) excel at handling human queries, but they can occasionally generate flawed or unexpected responses. Understanding their internal states is crucial for understanding their successes, diagnosing their failures, and refining their capabilities. Although sparse autoencoders (SAEs) have shown promise for interpreting LLM internal representations, limited research has explor…

    Submitted 21 February, 2025; originally announced February 2025.

    Comments: Pre-print. 20 pages, 5 figures

  19. arXiv:2502.14786  [pdf, other]

    cs.CV cs.AI

    SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

    Authors: Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, Olivier Hénaff, Jeremiah Harmsen, Andreas Steiner, Xiaohua Zhai

    Abstract: We introduce SigLIP 2, a family of new multilingual vision-language encoders that build on the success of the original SigLIP. In this second iteration, we extend the original image-text training objective with several prior, independently developed techniques into a unified recipe -- this includes captioning-based pretraining, self-supervised losses (self-distillation, masked prediction) and onli…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: Model checkpoints are available at https://github.com/google-research/big_vision/tree/main/big_vision/configs/proj/image_text/README_siglip2.md

  20. arXiv:2502.14133  [pdf, ps, other]

    cs.CL

    Self-Regularization with Sparse Autoencoders for Controllable LLM-based Classification

    Authors: Xuansheng Wu, Wenhao Yu, Xiaoming Zhai, Ninghao Liu

    Abstract: Modern text classification methods heavily rely on contextual embeddings from large language models (LLMs). Compared to human-engineered features, these embeddings provide automatic and effective representations for classification model training. However, they also introduce a challenge: we lose the ability to manually remove unintended features, such as sensitive or task-irrelevant features, to g…

    Submitted 15 June, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: Accepted by SIGKDD 2025

  21. arXiv:2502.07617  [pdf, other]

    cs.CV

    Scaling Pre-training to One Hundred Billion Data for Vision Language Models

    Authors: Xiao Wang, Ibrahim Alabdulmohsin, Daniel Salz, Zhe Li, Keran Rong, Xiaohua Zhai

    Abstract: We provide an empirical investigation of the potential of pre-training vision-language models on an unprecedented scale: 100 billion examples. We find that model performance tends to saturate at this scale on many common Western-centric classification and retrieval benchmarks, such as COCO Captions. Nevertheless, tasks of cultural diversity achieve more substantial gains from the 100-billion scale…

    Submitted 11 February, 2025; originally announced February 2025.

  22. arXiv:2502.07503  [pdf, other]

    cs.AI cs.LG

    Recursive Inference Scaling: A Winning Path to Scalable Inference in Language and Multimodal Systems

    Authors: Ibrahim Alabdulmohsin, Xiaohua Zhai

    Abstract: Inspired by recent findings on the fractal geometry of language, we introduce Recursive INference Scaling (RINS) as a complementary, plug-in recipe for scaling inference time in language and multimodal systems. RINS is a particular form of recursive depth that significantly outperforms +55 other variants, including the recent "repeat-all-over" (RAO) strategy in Mobile LLM (Liu et al., 2024) and la…

    Submitted 8 May, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

  23. arXiv:2501.16450  [pdf, other]

    cs.IR cs.AI

    360Brew: A Decoder-only Foundation Model for Personalized Ranking and Recommendation

    Authors: Hamed Firooz, Maziar Sanjabi, Adrian Englhardt, Aman Gupta, Ben Levine, Dre Olgiati, Gungor Polatkan, Iuliia Melnychuk, Karthik Ramgopal, Kirill Talanine, Kutta Srinivasan, Luke Simon, Natesh Sivasubramoniapillai, Necip Fazil Ayan, Qingquan Song, Samira Sriram, Souvik Ghosh, Tao Song, Tejas Dharamsi, Vignesh Kothapalli, Xiaoling Zhai, Ya Xu, Yu Wang, Yun Dai

    Abstract: Ranking and recommendation systems are the foundation for numerous online experiences, ranging from search results to personalized content delivery. These systems have evolved into complex, multilayered architectures that leverage vast datasets and often incorporate thousands of predictive models. The maintenance and enhancement of these models is a labor intensive process that requires extensive…

    Submitted 7 February, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

  24. arXiv:2501.06704  [pdf, other]

    cs.AI cs.CL

    Fine-tuning ChatGPT for Automatic Scoring of Written Scientific Explanations in Chinese

    Authors: Jie Yang, Ehsan Latif, Yuze He, Xiaoming Zhai

    Abstract: The development of explanations for scientific phenomena is essential in science assessment, but scoring student-written explanations remains challenging and resource-intensive. Large language models (LLMs) have shown promise in addressing this issue, particularly in alphabetic languages like English. However, their applicability to logographic languages is less explored. This study investigates t…

    Submitted 11 January, 2025; originally announced January 2025.

  25. arXiv:2501.05239  [pdf, other]

    cs.CR cs.CV eess.SP

    Is Your Autonomous Vehicle Safe? Understanding the Threat of Electromagnetic Signal Injection Attacks on Traffic Scene Perception

    Authors: Wenhao Liao, Sineng Yan, Youqian Zhang, Xinwei Zhai, Yuanyuan Wang, Eugene Yujun Fu

    Abstract: Autonomous vehicles rely on camera-based perception systems to comprehend their driving environment and make crucial decisions, thereby ensuring vehicles steer safely. However, a significant threat known as Electromagnetic Signal Injection Attacks (ESIA) can distort the images captured by these cameras, leading to incorrect AI decisions and potentially compromising the safety of autonomous vehi…

    Submitted 9 January, 2025; originally announced January 2025.

    Comments: To appear in AAAI 2025

  26. arXiv:2501.00081  [pdf, other]

    cs.HC cs.CY

    Human-Centered Design for AI-based Automatically Generated Assessment Reports: A Systematic Review

    Authors: Ehsan Latif, Ying Chen, Xiaoming Zhai, Yue Yin

    Abstract: This paper provides a comprehensive review of the design and implementation of automatically generated assessment reports (AutoRs) for formative use in K-12 Science, Technology, Engineering, and Mathematics (STEM) classrooms. With the increasing adoption of technology-enhanced assessments, there is a critical need for human-computer interactive tools that efficiently support the interpretation and…

    Submitted 30 December, 2024; originally announced January 2025.

    Comments: Submitted to ETRD

  27. arXiv:2412.21065  [pdf, ps, other]

    cs.CL

    Efficient Multi-Task Inferencing with a Shared Backbone and Lightweight Task-Specific Adapters for Automatic Scoring

    Authors: Ehsan Latif, Xiaoming Zhai

    Abstract: The integration of Artificial Intelligence (AI) in education requires scalable and efficient frameworks that balance performance, adaptability, and cost. This paper addresses these needs by proposing a shared backbone model architecture enhanced with lightweight LoRA adapters for task-specific fine-tuning, targeting the automated scoring of student responses across 27 mutually exclusive tasks. By…

    Submitted 21 June, 2025; v1 submitted 30 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI-iRAISE Workshop

  28. arXiv:2412.05753  [pdf, other]

    cs.CY cs.AI

    Can OpenAI o1 outperform humans in higher-order cognitive thinking?

    Authors: Ehsan Latif, Yifan Zhou, Shuchen Guo, Lehong Shi, Yizhu Gao, Matthew Nyaaba, Arne Bewerdorff, Xiantong Yang, Xiaoming Zhai

    Abstract: This study evaluates the performance of OpenAI's o1-preview model in higher-order cognitive domains, including critical thinking, systematic thinking, computational thinking, data literacy, creative thinking, logical reasoning, and scientific reasoning. Using established benchmarks, we compared the o1-preview model's performance to human participants from diverse educational levels. o1-preview ac…

    Submitted 7 December, 2024; originally announced December 2024.

  29. arXiv:2412.04774  [pdf, other]

    cs.CL

    Foundation Models for Low-Resource Language Education (Vision Paper)

    Authors: Zhaojun Ding, Zhengliang Liu, Hanqi Jiang, Yizhu Gao, Xiaoming Zhai, Tianming Liu, Ninghao Liu

    Abstract: Recent studies show that large language models (LLMs) are powerful tools for working with natural language, bringing advances in many areas of computational linguistics. However, these models face challenges when applied to low-resource languages due to limited training data and difficulty in understanding cultural nuances. Research is now focusing on multilingual models to improve LLM performance…

    Submitted 5 December, 2024; originally announced December 2024.

  30. arXiv:2412.03555  [pdf, other]

    cs.CV

    PaliGemma 2: A Family of Versatile VLMs for Transfer

    Authors: Andreas Steiner, André Susano Pinto, Michael Tschannen, Daniel Keysers, Xiao Wang, Yonatan Bitton, Alexey Gritsenko, Matthias Minderer, Anthony Sherbondy, Shangbang Long, Siyang Qin, Reeve Ingle, Emanuele Bugliarello, Sahar Kazemzadeh, Thomas Mesnard, Ibrahim Alabdulmohsin, Lucas Beyer, Xiaohua Zhai

    Abstract: PaliGemma 2 is an upgrade of the PaliGemma open Vision-Language Model (VLM) based on the Gemma 2 family of language models. We combine the SigLIP-So400m vision encoder that was also used by PaliGemma with the whole range of Gemma 2 models, from the 2B one all the way up to the 27B model. We train these models at three resolutions (224px, 448px, and 896px) in multiple stages to equip them with broa…

    Submitted 4 December, 2024; originally announced December 2024.

  31. arXiv:2411.18266  [pdf]

    eess.AS cs.AI cs.SD eess.SY

    Wearable intelligent throat enables natural speech in stroke patients with dysarthria

    Authors: Chenyu Tang, Shuo Gao, Cong Li, Wentian Yi, Yuxuan Jin, Xiaoxue Zhai, Sixuan Lei, Hongbei Meng, Zibo Zhang, Muzi Xu, Shengbo Wang, Xuhang Chen, Chenxi Wang, Hongyun Yang, Ningli Wang, Wenyu Wang, Jin Cao, Xiaodong Feng, Peter Smielewski, Yu Pan, Wenhui Song, Martin Birchall, Luigi G. Occhipinti

    Abstract: Wearable silent speech systems hold significant potential for restoring communication in patients with speech impairments. However, seamless, coherent speech remains elusive, and clinical efficacy is still unproven. Here, we present an AI-driven intelligent throat (IT) system that integrates throat muscle vibrations and carotid pulse signal sensors with large language model (LLM) processing to ena…

    Submitted 14 March, 2025; v1 submitted 27 November, 2024; originally announced November 2024.

    Comments: 5 figures, 45 references

  32. arXiv:2411.15594  [pdf, other]

    cs.CL cs.AI

    A Survey on LLM-as-a-Judge

    Authors: Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xuehao Zhai, Chengjin Xu, Wei Li, Yinghan Shen, Shengjie Ma, Honghao Liu, Saizhuo Wang, Kun Zhang, Yuanzhuo Wang, Wen Gao, Lionel Ni, Jian Guo

    Abstract: Accurate and consistent evaluation is crucial for decision-making across numerous fields, yet it remains a challenging task due to inherent subjectivity, variability, and scale. Large Language Models (LLMs) have achieved remarkable success across diverse domains, leading to the emergence of "LLM-as-a-Judge," where LLMs are employed as evaluators for complex tasks. With their ability to process div…

    Submitted 9 March, 2025; v1 submitted 23 November, 2024; originally announced November 2024.

    Comments: Project Page: https://awesome-llm-as-a-judge.github.io/

  33. arXiv:2411.14461  [pdf, other]

    cs.CL cs.AI cs.CY

    Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios

    Authors: Shaochen Xu, Yifan Zhou, Zhengliang Liu, Zihao Wu, Tianyang Zhong, Huaqin Zhao, Yiwei Li, Hanqi Jiang, Yi Pan, Junhao Chen, Jin Lu, Wei Zhang, Tuo Zhang, Lu Zhang, Dajiang Zhu, Xiang Li, Wei Liu, Quanzheng Li, Andrea Sikora, Xiaoming Zhai, Zhen Xiang, Tianming Liu

    Abstract: Artificial Intelligence (AI) has become essential in modern healthcare, with large language models (LLMs) offering promising advances in clinical decision-making. Traditional model-based approaches, including those leveraging in-context demonstrations and those with specialized medical fine-tuning, have demonstrated strong performance in medical language processing but struggle with real-time adap…

    Submitted 16 November, 2024; originally announced November 2024.

  34. arXiv:2411.11295  [pdf, other]

    cs.CL cs.AI

    Transcending Language Boundaries: Harnessing LLMs for Low-Resource Language Translation

    Authors: Peng Shu, Junhao Chen, Zhengliang Liu, Hui Wang, Zihao Wu, Tianyang Zhong, Yiwei Li, Huaqin Zhao, Hanqi Jiang, Yi Pan, Yifan Zhou, Constance Owl, Xiaoming Zhai, Ninghao Liu, Claudio Saunt, Tianming Liu

    Abstract: Large Language Models (LLMs) have demonstrated remarkable success across a wide range of tasks and domains. However, their performance in low-resource language translation, particularly when translating into these languages, remains underexplored. This gap poses significant challenges, as linguistic barriers hinder the cultural preservation and development of minority communities. To address this…

    Submitted 18 November, 2024; originally announced November 2024.

  35. arXiv:2411.07407  [pdf, other]

    cs.CL

    Using Generative AI and Multi-Agents to Provide Automatic Feedback

    Authors: Shuchen Guo, Ehsan Latif, Yifan Zhou, Xuan Huang, Xiaoming Zhai

    Abstract: This study investigates the use of generative AI and multi-agent systems to provide automatic feedback in educational contexts, particularly for student constructed responses in science assessments. The research addresses a key gap in the field by exploring how multi-agent systems, called AutoFeedback, can improve the quality of GenAI-generated feedback, overcoming known issues such as over-praise…

    Submitted 11 November, 2024; originally announced November 2024.

  36. arXiv:2410.21418  [pdf, other]

    cs.AI cs.CL

    Large Language Models for Manufacturing

    Authors: Yiwei Li, Huaqin Zhao, Hanqi Jiang, Yi Pan, Zhengliang Liu, Zihao Wu, Peng Shu, Jie Tian, Tianze Yang, Shaochen Xu, Yanjun Lyu, Parker Blenk, Jacob Pence, Jason Rupram, Eliza Banu, Ninghao Liu, Linbing Wang, Wenzhan Song, Xiaoming Zhai, Kenan Song, Dajiang Zhu, Beiwen Li, Xianqiao Wang, Tianming Liu

    Abstract: The rapid advances in Large Language Models (LLMs) have the potential to transform manufacturing industry, offering new opportunities to optimize processes, improve efficiency, and drive innovation. This paper provides a comprehensive exploration of the integration of LLMs into the manufacturing domain, focusing on their potential to automate and enhance various aspects of manufacturing, from prod…

    Submitted 28 October, 2024; originally announced October 2024.

  37. arXiv:2410.21287  [pdf, other]

    cs.CY cs.AI

    A Systematic Assessment of OpenAI o1-Preview for Higher Order Thinking in Education

    Authors: Ehsan Latif, Yifan Zhou, Shuchen Guo, Yizhu Gao, Lehong Shi, Matthew Nyaaba, Gyeonggeon Lee, Liang Zhang, Arne Bewersdorff, Luyang Fang, Xiantong Yang, Huaqin Zhao, Hanqi Jiang, Haoran Lu, Jiaxi Li, Jichao Yu, Weihang You, Zhengliang Liu, Vincent Shung Liu, Hui Wang, Zihao Wu, Jin Lu, Fei Dou, Ping Ma, Ninghao Liu , et al. (2 additional authors not shown)

    Abstract: As artificial intelligence (AI) continues to advance, it demonstrates capabilities comparable to human intelligence, with significant potential to transform education and workforce development. This study evaluates OpenAI o1-preview's ability to perform higher-order cognitive tasks across 14 dimensions, including critical thinking, systems thinking, computational thinking, design thinking, metacog…

    Submitted 11 October, 2024; originally announced October 2024.

    Comments: An assessment of OpenAI o1-Preview for Higher Order Thinking in Education

  38. arXiv:2410.14706  [pdf, other]

    cs.PL cs.LG

    Transformers are Efficient Compilers, Provably

    Authors: Xiyu Zhai, Runlong Zhou, Liao Zhang, Simon Shaolei Du

    Abstract: Transformer-based large language models (LLMs) have demonstrated surprisingly robust performance across a wide range of language-related tasks, including programming language understanding and generation. In this paper, we take the first steps towards a formal investigation of using transformers as compilers from an expressive power perspective. To this end, we introduce a representative programmi…

    Submitted 24 January, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: 65 pages

    MSC Class: 68T07 (Primary)

  39. arXiv:2410.03018  [pdf, other]

    cs.CY cs.AI

    Transforming Teachers' Roles and Agencies in the Era of Generative AI: Perceptions, Acceptance, Knowledge, and Practices

    Authors: Xiaoming Zhai

    Abstract: This paper explores the transformative impact of Generative Artificial Intelligence (GenAI) on teachers' roles and agencies in education, presenting a comprehensive framework that addresses teachers' perceptions, knowledge, acceptance, and practices of GenAI. As GenAI technologies, such as ChatGPT, become increasingly integrated into educational settings, teachers are required to adapt to evolving…

    Submitted 3 October, 2024; originally announced October 2024.

  40. arXiv:2410.01985  [pdf, other]

    cs.AI

    Lost-in-Distance: Impact of Contextual Proximity on LLM Performance in Graph Tasks

    Authors: Hamed Firooz, Maziar Sanjabi, Wenlong Jiang, Xiaoling Zhai

    Abstract: Despite significant advancements, Large Language Models (LLMs) exhibit blind spots that impair their ability to retrieve and process relevant contextual data effectively. We demonstrate that LLM performance in graph tasks with complexities beyond the "needle-in-a-haystack" scenario, where solving the problem requires cross-referencing and reasoning across multiple subproblems jointly, is influenced…

    Submitted 1 January, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

  41. arXiv:2409.18486  [pdf, ps, other]

    cs.CL

    Evaluation of OpenAI o1: Opportunities and Challenges of AGI

    Authors: Tianyang Zhong, Zhengliang Liu, Yi Pan, Yutong Zhang, Yifan Zhou, Shizhe Liang, Zihao Wu, Yanjun Lyu, Peng Shu, Xiaowei Yu, Chao Cao, Hanqi Jiang, Hanxu Chen, Yiwei Li, Junhao Chen, Huawen Hu, Yiheng Liu, Huaqin Zhao, Shaochen Xu, Haixing Dai, Lin Zhao, Ruidong Zhang, Wei Zhao, Zhenyuan Yang, Jingyuan Chen , et al. (50 additional authors not shown)

    Abstract: This comprehensive study evaluates the performance of OpenAI's o1-preview large language model across a diverse array of complex reasoning tasks, spanning multiple domains, including computer science, mathematics, natural sciences, medicine, linguistics, and social sciences. Through rigorous testing, o1-preview demonstrated remarkable capabilities, often achieving human-level or superior performan…

    Submitted 7 July, 2025; v1 submitted 27 September, 2024; originally announced September 2024.

  42. arXiv:2409.12964  [pdf, ps, other]

    cs.IT cs.AI

    OpenRANet: Neuralized Spectrum Access by Joint Subcarrier and Power Allocation with Optimization-based Deep Learning

    Authors: Siya Chen, Chee Wei Tan, Xiangping Zhai, H. Vincent Poor

    Abstract: The next-generation radio access network (RAN), known as Open RAN, is poised to feature an AI-native interface for wireless cellular networks, including emerging satellite-terrestrial systems, making deep learning integral to its operation. In this paper, we address the nonconvex optimization challenge of joint subcarrier and power allocation in Open RAN, with the objective of minimizing the total…

    Submitted 10 February, 2025; v1 submitted 31 August, 2024; originally announced September 2024.

    Comments: This paper has been accepted by the IEEE Transactions on Green Communications and Networking

  43. arXiv:2408.12800  [pdf, other]

    cs.MM

    Cap2Sum: Learning to Summarize Videos by Generating Captions

    Authors: Cairong Zhao, Chutian Wang, Zifan Song, Guosheng Hu, Haonan Chen, Xiaofan Zhai

    Abstract: With the rapid growth of video data on the internet, video summarization is becoming a very important AI technology. However, due to the high labelling cost of video summarization, existing studies have to be conducted on small-scale datasets, leading to limited performance and generalization capacity. In this work, we introduce the use of dense video captions as a supervision signal to train vide…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 13 pages, 4 figures

  44. arXiv:2408.05124  [pdf, other]

    cs.CR cs.CV

    Modeling Electromagnetic Signal Injection Attacks on Camera-based Smart Systems: Applications and Mitigation

    Authors: Youqian Zhang, Michael Cheung, Chunxi Yang, Xinwei Zhai, Zitong Shen, Xinyu Ji, Eugene Y. Fu, Sze-Yiu Chau, Xiapu Luo

    Abstract: Numerous safety- or security-critical systems depend on cameras to perceive their surroundings, further allowing artificial intelligence (AI) to analyze the captured images to make important decisions. However, a concerning attack vector has emerged, namely, electromagnetic waves, which pose a threat to the integrity of these systems. Such attacks enable attackers to manipulate the images remotely…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 13 pages, 10 figures, 4 tables

  45. arXiv:2407.18328  [pdf, ps, other]

    cs.CL cs.CY

    Unveiling Scoring Processes: Dissecting the Differences between LLMs and Human Graders in Automatic Scoring

    Authors: Xuansheng Wu, Padmaja Pravin Saraf, Gyeonggeon Lee, Ehsan Latif, Ninghao Liu, Xiaoming Zhai

    Abstract: Large language models (LLMs) have demonstrated strong potential in performing automatic scoring for constructed response assessments. While human graders typically score constructed responses against given rubrics, the methods by which LLMs assign scores remain largely unclear. It is also uncertain how closely AI's scoring process mirrors that of humans, or whether it adheres to the same grading…

    Submitted 21 February, 2025; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted by Technology, Knowledge, and Learning (TKNL)

  46. arXiv:2407.11983  [pdf, other]

    cs.HC cs.CY

    Generative AI as a Learning Buddy and Teaching Assistant: Pre-service Teachers' Uses and Attitudes

    Authors: Matthew Nyaaba, Lehong Shi, Macharious Nabang, Xiaoming Zhai, Patrick Kyeremeh, Samuel Arthur Ayoberd, Bismark Nyaaba Akanzire

    Abstract: To uncover pre-service teachers' (PSTs') user experience and perceptions of generative artificial intelligence (GenAI) applications, we surveyed 167 Ghanaian PSTs about their specific uses of GenAI as a learning buddy and teaching assistant, and their attitudes towards these applications. Employing exploratory factor analysis (EFA), we identified three key factors shaping PSTs' attitudes towards GenAI: teaching…

    Submitted 3 June, 2024; originally announced July 2024.

  47. arXiv:2407.07726  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    PaliGemma: A versatile 3B VLM for transfer

    Authors: Lucas Beyer, Andreas Steiner, André Susano Pinto, Alexander Kolesnikov, Xiao Wang, Daniel Salz, Maxim Neumann, Ibrahim Alabdulmohsin, Michael Tschannen, Emanuele Bugliarello, Thomas Unterthiner, Daniel Keysers, Skanda Koppula, Fangyu Liu, Adam Grycner, Alexey Gritsenko, Neil Houlsby, Manoj Kumar, Keran Rong, Julian Eisenschlos, Rishabh Kabra, Matthias Bauer, Matko Bošnjak, Xi Chen, Matthias Minderer , et al. (10 additional authors not shown)

    Abstract: PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that transfers effectively. It achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks, including standard VLM benchmarks but also more…

    Submitted 10 October, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: v2 adds Appendix H and I and a few citations

  48. arXiv:2407.05389  [pdf, other]

    cs.CV cs.AI

    Image-Conditional Diffusion Transformer for Underwater Image Enhancement

    Authors: Xingyang Nie, Su Pan, Xiaoyu Zhai, Shifei Tao, Fengzhong Qu, Biao Wang, Huilin Ge, Guojie Xiao

    Abstract: Underwater image enhancement (UIE) has attracted much attention owing to its importance for underwater operation and marine engineering. Motivated by the recent advance in generative models, we propose a novel UIE method based on image-conditional diffusion transformer (ICDT). Our method takes the degraded underwater image as the conditional input and converts it into latent space where ICDT is ap…

    Submitted 7 July, 2024; originally announced July 2024.

  49. arXiv:2407.02362  [pdf, other]

    cs.AR cs.AI cs.LG

    Fast, Scalable, Energy-Efficient Non-element-wise Matrix Multiplication on FPGA

    Authors: Xuqi Zhu, Huaizhi Zhang, JunKyu Lee, Jiacheng Zhu, Chandrajit Pal, Sangeet Saha, Klaus D. McDonald-Maier, Xiaojun Zhai

    Abstract: Modern Neural Network (NN) architectures heavily rely on vast numbers of multiply-accumulate arithmetic operations, constituting the predominant computational cost. Therefore, this paper proposes a high-throughput, scalable, and energy-efficient non-element-wise matrix multiplication unit on FPGAs as a basic component of NNs. We first streamline inter-layer and intra-layer redundancies of MAD…

    Submitted 7 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  50. arXiv:2407.00503  [pdf, other]

    cs.CV

    Toward a Diffusion-Based Generalist for Dense Vision Tasks

    Authors: Yue Fan, Yongqin Xian, Xiaohua Zhai, Alexander Kolesnikov, Muhammad Ferjad Naeem, Bernt Schiele, Federico Tombari

    Abstract: Building generalized models that can solve many computer vision tasks simultaneously is an intriguing direction. Recent works have shown that the image itself can be used as a natural interface for general-purpose visual perception and have demonstrated inspiring results. In this paper, we explore diffusion-based vision generalists, where we unify different types of dense prediction tasks as conditional image g…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Published at CVPR 2024 as a workshop paper