
Showing 1–50 of 238 results for author: Yang, A

Searching in archive cs.
  1. arXiv:2507.13264  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Voxtral

    Authors: Alexander H. Liu, Andy Ehrenberg, Andy Lo, Clément Denoix, Corentin Barreau, Guillaume Lample, Jean-Malo Delignon, Khyathi Raghavi Chandu, Patrick von Platen, Pavankumar Reddy Muddireddy, Sanchit Gandhi, Soham Ghosh, Srijan Mishra, Thomas Foubert, Abhinav Rastogi, Adam Yang, Albert Q. Jiang, Alexandre Sablayrolles, Amélie Héliou, Amélie Martin, Anmol Agarwal, Antoine Roux, Arthur Darcet, Arthur Mensch, Baptiste Bout , et al. (81 additional authors not shown)

    Abstract: We present Voxtral Mini and Voxtral Small, two multimodal audio chat models. Voxtral is trained to comprehend both spoken audio and text documents, achieving state-of-the-art performance across a diverse range of audio benchmarks, while preserving strong text capabilities. Voxtral Small outperforms a number of closed-source models, while being small enough to run locally. A 32K context window enab…

    Submitted 17 July, 2025; originally announced July 2025.

    Comments: 17 pages

  2. arXiv:2507.11042  [pdf, ps, other]

    cs.IR

    Aligned Query Expansion: Efficient Query Expansion for Information Retrieval through LLM Alignment

    Authors: Adam Yang, Gustavo Penha, Enrico Palumbo, Hugues Bouchard

    Abstract: With the breakthroughs in large language models (LLMs), query generation techniques that expand documents and queries with related terms are becoming increasingly popular in the information retrieval field. Such techniques have been shown to improve the effectiveness of traditional lexical retrieval methods by dealing with the vocabulary mismatch problem. Recent work has found that generating quer…

    Submitted 15 July, 2025; originally announced July 2025.

  3. arXiv:2507.08365  [pdf, ps, other]

    cs.LG

    Prediction of Lane Change Intentions of Human Drivers using an LSTM, a CNN and a Transformer

    Authors: Francesco De Cristofaro, Felix Hofbaur, Aixi Yang, Arno Eichberger

    Abstract: Lane changes of preceding vehicles have a great impact on the motion planning of automated vehicles especially in complex traffic situations. Predicting them would benefit the public in terms of safety and efficiency. While many research efforts have been made in this direction, few concentrated on predicting maneuvers within a set time interval compared to predicting at a set prediction time. In…

    Submitted 11 July, 2025; originally announced July 2025.

    Comments: 14 pages, 18 figures

  4. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu , et al. (3284 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 22 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  5. arXiv:2507.04190  [pdf, ps, other]

    cs.CV eess.IV

    Towards Spatially-Varying Gain and Binning

    Authors: Anqi Yang, Eunhee Kang, Wei Chen, Hyong-Euk Lee, Aswin C. Sankaranarayanan

    Abstract: Pixels in image sensors have progressively become smaller, driven by the goal of producing higher-resolution imagery. However, ceteris paribus, a smaller pixel accumulates less light, making image quality worse. This interplay of resolution, noise, and the dynamic range of the sensor and their impact on the eventual quality of acquired imagery is a fundamental concept in photography. In this paper…

    Submitted 5 July, 2025; originally announced July 2025.

  6. arXiv:2507.03328  [pdf, ps, other]

    cs.SE

    scikit-package -- software packaging standards and roadmap for sharing reproducible scientific software

    Authors: S. Lee, C. Myers, A. Yang, T. Zhang, S. J. L. Billinge

    Abstract: Scientific advancement relies on the ability to share and reproduce results. When data analysis or calculations are carried out using software written by scientists there are special challenges around code versions, quality and code sharing. scikit-package provides a roadmap to facilitate code reuse and sharing with minimal effort through tutorials coupled with automated and centralized reusable w…

    Submitted 8 July, 2025; v1 submitted 4 July, 2025; originally announced July 2025.

    Comments: GitHub: https://github.com/scikit-package/scikit-package Doc: https://scikit-package.github.io/scikit-package/

  7. arXiv:2507.01206  [pdf, ps, other]

    cs.RO cs.HC

    2024 NASA SUITS Report: LLM-Driven Immersive Augmented Reality User Interface for Robotics and Space Exploration

    Authors: Kathy Zhuang, Zixun Huang, Yukun Song, Rui Li, Yinuo Zhou, Allen Y. Yang

    Abstract: As modern computing advances, new interaction paradigms have emerged, particularly in Augmented Reality (AR), which overlays virtual interfaces onto physical objects. This evolution poses challenges in machine perception, especially for tasks like 3D object pose estimation in complex, dynamic environments. Our project addresses critical issues in human-robot interaction within mobile AR, focusing…

    Submitted 1 July, 2025; originally announced July 2025.

  8. arXiv:2506.18677  [pdf, ps, other]

    cs.CV

    Reconstructing Tornadoes in 3D with Gaussian Splatting

    Authors: Adam Yang, Nadula Kadawedduwa, Tianfu Wang, Maria Molina, Christopher Metzler

    Abstract: Accurately reconstructing the 3D structure of tornadoes is critically important for understanding and preparing for this highly destructive weather phenomenon. While modern 3D scene reconstruction techniques, such as 3D Gaussian splatting (3DGS), could provide a valuable tool for reconstructing the 3D structure of tornados, at present we are critically lacking a controlled tornado dataset with whi…

    Submitted 23 June, 2025; originally announced June 2025.

  9. arXiv:2506.18221  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    These Are Not All the Features You Are Looking For: A Fundamental Bottleneck in Supervised Pretraining

    Authors: Xingyu Alice Yang, Jianyu Zhang, Léon Bottou

    Abstract: Transfer learning is a cornerstone of modern machine learning, promising a way to adapt models pretrained on a broad mix of data to new tasks with minimal new data. However, a significant challenge remains in ensuring that transferred features are sufficient to handle unseen datasets, amplified by the difficulty of quantifying whether two tasks are "related". To address these challenges, we evalua…

    Submitted 26 June, 2025; v1 submitted 22 June, 2025; originally announced June 2025.

    Comments: 10 pages, 7 figures, Preprint. Under review

  10. arXiv:2506.16055  [pdf, ps, other]

    cs.CL cs.FL

    Knee-Deep in C-RASP: A Transformer Depth Hierarchy

    Authors: Andy Yang, Michaël Cadilhac, David Chiang

    Abstract: It has been observed that transformers with greater depth (that is, more layers) have more capabilities, but can we establish formally which capabilities are gained with greater depth? We answer this question with a theoretical proof followed by an empirical study. First, we consider transformers that round to fixed precision except inside attention. We show that this subclass of transformers is e…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 27 pages, 4 figures

  11. arXiv:2506.10910  [pdf, ps, other]

    cs.CL

    Magistral

    Authors: Mistral-AI, :, Abhinav Rastogi, Albert Q. Jiang, Andy Lo, Gabrielle Berrada, Guillaume Lample, Jason Rute, Joep Barmentlo, Karmesh Yadav, Kartik Khandelwal, Khyathi Raghavi Chandu, Léonard Blier, Lucile Saulnier, Matthieu Dinot, Maxime Darrin, Neha Gupta, Roman Soletskyi, Sagar Vaze, Teven Le Scao, Yihan Wang, Adam Yang, Alexander H. Liu, Alexandre Sablayrolles, Amélie Héliou , et al. (76 additional authors not shown)

    Abstract: We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior models, we follow a ground up approach, relying solely on our own models and infrastructure. Notably, we demonstrate a stack that enabled us to explore the limits of pure RL training of LLMs, present a s…

    Submitted 12 June, 2025; originally announced June 2025.

  12. arXiv:2506.05207  [pdf, ps, other]

    cs.CV

    Follow-Your-Motion: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning

    Authors: Yue Ma, Yulong Liu, Qiyuan Zhu, Ayden Yang, Kunyu Feng, Xinhua Zhang, Zhifeng Li, Sirui Han, Chenyang Qi, Qifeng Chen

    Abstract: Recently, breakthroughs in the video diffusion transformer have shown remarkable capabilities in diverse motion generations. As for the motion-transfer task, current methods mainly use two-stage Low-Rank Adaptations (LoRAs) finetuning to obtain better performance. However, existing adaptation-based motion transfer still suffers from motion inconsistency and tuning inefficiency when applied to larg…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: project page: https://follow-your-motion.github.io/

  13. arXiv:2506.05176  [pdf, ps, other]

    cs.CL

    Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

    Authors: Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, Jingren Zhou

    Abstract: In this work, we introduce the Qwen3 Embedding series, a significant advancement over its predecessor, the GTE-Qwen series, in text embedding and reranking capabilities, built upon the Qwen3 foundation models. Leveraging the Qwen3 LLMs' robust capabilities in multilingual text understanding and generation, our innovative multi-stage training pipeline combines large-scale unsupervised pre-training…

    Submitted 10 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

  14. arXiv:2506.04590  [pdf, ps, other]

    cs.CV

    Follow-Your-Creation: Empowering 4D Creation through Video Inpainting

    Authors: Yue Ma, Kunyu Feng, Xinhua Zhang, Hongyu Liu, David Junhao Zhang, Jinbo Xing, Yinhan Zhang, Ayden Yang, Zeyu Wang, Qifeng Chen

    Abstract: We introduce Follow-Your-Creation, a novel 4D video creation framework capable of both generating and editing 4D content from a single monocular video input. By leveraging a powerful video inpainting foundation model as a generative prior, we reformulate 4D video creation as a video inpainting task, enabling the model to fill in missing content caused by camera trajectory changes or user edits. To…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Project Page: https://follow-your-creation.github.io/

  15. arXiv:2506.03686  [pdf, ps, other]

    cs.DS cs.DC cs.DM

    GenTT: Generate Vectorized Codes for General Tensor Permutation

    Authors: Yaojian Chen, Tianyu Ma, An Yang, Lin Gan, Wenlai Zhao, Guangwen Yang

    Abstract: Tensor permutation is a fundamental operation widely applied in AI, tensor networks, and related fields. However, it is extremely complex, and different shapes and permutation maps can make a huge difference. SIMD permutation began to be studied in 2006, but the best method at that time was to split complex permutations into multiple simple permutations to do SIMD, which might increase the complex…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 11 pages, 9 figures

    ACM Class: I.2.2

  16. arXiv:2506.01939  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

    Authors: Shenzhi Wang, Le Yu, Chang Gao, Chujie Zheng, Shixuan Liu, Rui Lu, Kai Dang, Xionghui Chen, Jianxin Yang, Zhenru Zhang, Yuqiong Liu, An Yang, Andrew Zhao, Yang Yue, Shiji Song, Bowen Yu, Gao Huang, Junyang Lin

    Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful approach to enhancing the reasoning capabilities of Large Language Models (LLMs), while its mechanisms are not yet well understood. In this work, we undertake a pioneering exploration of RLVR through the novel perspective of token entropy patterns, comprehensively analyzing how different tokens influence reasoning perf…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: 25 pages, 17 figures, 2 tables

  17. arXiv:2505.24147  [pdf, other]

    cs.CL

    Rationales Are Not Silver Bullets: Measuring the Impact of Rationales on Model Performance and Reliability

    Authors: Chiwei Zhu, Benfeng Xu, An Yang, Junyang Lin, Quan Wang, Chang Zhou, Zhendong Mao

    Abstract: Training language models with rationales augmentation has been shown to be beneficial in many existing works. In this paper, we identify that such a prevailing view does not hold consistently. We conduct comprehensive investigations to thoroughly inspect the impact of rationales on model performance as well as a novel perspective of model reliability. The results lead to several key findings that…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: To be published in ACL 2025 Findings. (Work originally done in Jan 2024)

  18. arXiv:2505.13008  [pdf, ps, other]

    cs.SE

    Adversarial Reasoning for Repair Based on Inferred Program Intent

    Authors: He Ye, Aidan Z. H. Yang, Chang Hu, Yanlin Wang, Tao Zhang, Claire Le Goues

    Abstract: Automated program repair (APR) has shown promising results, particularly with the use of neural networks. Currently, most APR tools focus on code transformations specified by test suites, rather than reasoning about the program intent and the high-level bug specification. Without a proper understanding of program intent, these tools tend to generate patches that overfit incomplete test suites and…

    Submitted 20 June, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  19. arXiv:2505.10527  [pdf, other]

    cs.CL

    WorldPM: Scaling Human Preference Modeling

    Authors: Binghai Wang, Runji Lin, Keming Lu, Le Yu, Zhenru Zhang, Fei Huang, Chujie Zheng, Kai Dang, Yang Fan, Xingzhang Ren, An Yang, Binyuan Hui, Dayiheng Liu, Tao Gui, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang, Bowen Yu, Jingren Zhou, Junyang Lin

    Abstract: Motivated by scaling laws in language modeling that demonstrate how test loss scales as a power law with model and dataset sizes, we find that similar laws exist in preference modeling. We propose World Preference Modeling (WorldPM) to emphasize this scaling potential, where World Preference embodies a unified representation of human preferences. In this paper, we collect preference data from pub…

    Submitted 18 May, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

  20. arXiv:2505.09388  [pdf, other]

    cs.CL

    Qwen3 Technical Report

    Authors: An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou , et al. (35 additional authors not shown)

    Abstract: In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert (MoE) architectures, with parameter scales ranging from 0.6 to 235 billion. A key innovation in Qwen3 is the integration…

    Submitted 14 May, 2025; originally announced May 2025.

  21. arXiv:2505.00917  [pdf, other]

    stat.ME cs.AI cs.LG stat.ML

    Multivariate Conformal Selection

    Authors: Tian Bai, Yue Zhao, Xiang Yu, Archer Y. Yang

    Abstract: Selecting high-quality candidates from large datasets is critical in applications such as drug discovery, precision medicine, and alignment of large language models (LLMs). While Conformal Selection (CS) provides rigorous uncertainty quantification, it is limited to univariate responses and scalar criteria. To address this issue, we propose Multivariate Conformal Selection (mCS), a generalization…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: 25 pages, 4 figures. Accepted to ICML 2025

  22. arXiv:2504.19524  [pdf, other]

    cs.CV

    LR-IAD: Mask-Free Industrial Anomaly Detection with Logical Reasoning

    Authors: Peijian Zeng, Feiyan Pang, Zhanbo Wang, Aimin Yang

    Abstract: Industrial Anomaly Detection (IAD) is critical for ensuring product quality by identifying defects. Traditional methods such as feature embedding and reconstruction-based approaches require large datasets and struggle with scalability. Existing vision-language models (VLMs) and Multimodal Large Language Models (MLLMs) address some limitations but rely on mask annotations, leading to high implement…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 10 pages

  23. OmniSage: Large Scale, Multi-Entity Heterogeneous Graph Representation Learning

    Authors: Anirudhan Badrinath, Alex Yang, Kousik Rajesh, Prabhat Agarwal, Jaewon Yang, Haoyu Chen, Jiajing Xu, Charles Rosenberg

    Abstract: Representation learning, a task of learning latent vectors to represent entities, is a key task in improving search and recommender systems in web applications. Various representation learning methods have been developed, including graph-based approaches for relationships among entities, sequence-based methods for capturing the temporal evolution of user activities, and content-based models for le…

    Submitted 11 June, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

    Comments: To appear in Proceedings of KDD 2025 Industry Track

  24. arXiv:2504.15279  [pdf, other]

    cs.CV

    VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

    Authors: Weiye Xu, Jiahao Wang, Weiyun Wang, Zhe Chen, Wengang Zhou, Aijun Yang, Lewei Lu, Houqiang Li, Xiaohua Wang, Xizhou Zhu, Wenhai Wang, Jifeng Dai, Jinguo Zhu

    Abstract: Visual reasoning is a core component of human intelligence and a critical capability for advanced multimodal models. Yet current reasoning evaluations of multimodal large language models (MLLMs) often rely on text descriptions and allow language-based reasoning shortcuts, failing to measure genuine vision-centric reasoning. To address this, we introduce VisuLogic: a benchmark of 1,000 human-verifi…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Code, data, and baselines are available at https://visulogic-benchmark.github.io/VisuLogic

  25. arXiv:2504.13857  [pdf]

    cs.HC

    Impact of Environmental Colors on Human Aggressiveness: Insights from a Minecraft-Based Behavioral Study

    Authors: Austin Deng-Yao Yang, Shih-Jen Tsai, Hsin-Jung Tsai

    Abstract: This study explores the influence of environmental colors on human behavior, specifically focusing on aggressiveness and passiveness. Color is widely regarded as an influential environmental factor shaping human behavior, yet existing studies present conflicting evidence regarding its impact on aggressiveness and passiveness. This study employed Minecraft as a controlled digital platform to invest…

    Submitted 22 March, 2025; originally announced April 2025.

    Comments: Abstract (256 words); Body (3,266 words); 2 Tables; 4 Figures

  26. arXiv:2504.08233  [pdf]

    cs.CE

    A 120 lines code for isogeometric topology optimization and its extension to 3D in MATLAB

    Authors: Xianda Xie, Zhihui Ou, Aodi Yang, Xiaobing Li, Shuting Wang

    Abstract: In this paper, a compact and efficient code implementation is presented for isogeometric topology optimization (ITO) approach. With the aid of Bézier extraction technique, a derived explicit stiffness matrix computation formula is applied to all B-spline IGA elements with rectangular shape under linear elasticity assumption. Using the aforementioned explicit formula, the stiffness matrix calculati…

    Submitted 10 April, 2025; originally announced April 2025.

  27. arXiv:2504.01444  [pdf, ps, other]

    cs.CR cs.AI

    PiCo: Jailbreaking Multimodal Large Language Models via Pictorial Code Contextualization

    Authors: Aofan Liu, Lulu Tang, Ting Pan, Yuguo Yin, Bin Wang, Ao Yang

    Abstract: Multimodal Large Language Models (MLLMs), which integrate vision and other modalities into Large Language Models (LLMs), significantly enhance AI capabilities but also introduce new security vulnerabilities. By exploiting the vulnerabilities of the visual modality and the long-tail distribution characteristic of code training data, we present PiCo, a novel jailbreaking framework designed to progre…

    Submitted 21 June, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

  28. arXiv:2504.01329  [pdf, other]

    cs.LG eess.SP

    Flexible and Explainable Graph Analysis for EEG-based Alzheimer's Disease Classification

    Authors: Jing Wang, Jun-En Ding, Feng Liu, Elisa Kallioniemi, Shuqiang Wang, Wen-Xiang Tsai, Albert C. Yang

    Abstract: Alzheimer's Disease is a progressive neurological disorder that is one of the most common forms of dementia. It leads to a decline in memory, reasoning ability, and behavior, especially in older people. The cause of Alzheimer's Disease is still under exploration and there is no all-inclusive theory that can explain the pathologies in each individual patient. Nevertheless, early intervention has be…

    Submitted 1 April, 2025; originally announced April 2025.

  29. arXiv:2504.00072  [pdf, other]

    cs.CV

    Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs

    Authors: Lucas Ventura, Antoine Yang, Cordelia Schmid, Gül Varol

    Abstract: We address the task of video chaptering, i.e., partitioning a long video timeline into semantic units and generating corresponding chapter titles. While relatively underexplored, automatic chaptering has the potential to enable efficient navigation and content retrieval in long-form videos. In this paper, we achieve strong chaptering performance on hour-long videos by efficiently addressing the pr…

    Submitted 31 March, 2025; originally announced April 2025.

    Comments: CVPR 2025 Camera ready. Project page: https://imagine.enpc.fr/~lucas.ventura/chapter-llama/

  30. arXiv:2503.21833  [pdf, other]

    cs.CL

    Refining Time Series Anomaly Detectors using Large Language Models

    Authors: Alan Yang, Yulin Chen, Sean Lee, Venus Montes

    Abstract: Time series anomaly detection (TSAD) is of widespread interest across many industries, including finance, healthcare, and manufacturing. Despite the development of numerous automatic methods for detecting anomalies, human oversight remains necessary to review and act upon detected anomalies, as well as verify their accuracy. We study the use of multimodal large language models (LLMs) to partially…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: Main content: 4 pages, 1 figure, 1 table

  31. arXiv:2503.19786  [pdf, other]

    cs.CL cs.AI

    Gemma 3 Technical Report

    Authors: Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Etienne Pot, Ivo Penchev, Gaël Liu, Francesco Visin, Kathleen Kenealy, Lucas Beyer, Xiaohai Zhai, Anton Tsitsulin , et al. (191 additional authors not shown)

    Abstract: We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achie…

    Submitted 25 March, 2025; originally announced March 2025.

  32. arXiv:2503.09852  [pdf, other]

    cs.MM

    StyleSpeaker: Audio-Enhanced Fine-Grained Style Modeling for Speech-Driven 3D Facial Animation

    Authors: An Yang, Chenyu Liu, Pengcheng Xia, Jun Du

    Abstract: Speech-driven 3D facial animation is challenging due to the diversity in speaking styles and the limited availability of 3D audio-visual data. Speech predominantly dictates the coarse motion trends of the lip region, while specific styles determine the details of lip motion and the overall facial expressions. Prior works lack fine-grained learning in style modeling and do not adequately consider s…

    Submitted 12 March, 2025; originally announced March 2025.

  33. arXiv:2503.06624  [pdf, other]

    cs.CV

    Chameleon: On the Scene Diversity and Domain Variety of AI-Generated Videos Detection

    Authors: Meiyu Zeng, Xingming Liao, Canyu Chen, Nankai Lin, Zhuowei Wang, Chong Chen, Aimin Yang

    Abstract: Artificial intelligence generated content (AIGC), known as DeepFakes, has emerged as a growing concern because it is being utilized as a tool for spreading disinformation. While much research exists on identifying AI-generated text and images, research on detecting AI-generated videos is limited. Existing datasets for AI-generated videos detection exhibit limitations in terms of diversity, complex…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: 17 pages

  34. arXiv:2502.19915

    cs.AI

    LLM-driven Effective Knowledge Tracing by Integrating Dual-channel Difficulty

    Authors: Jiahui Cen, Jianghao Lin, Weixuan Zhong, Dong Zhou, Jin Chen, Aimin Yang, Yongmei Zhou

    Abstract: Knowledge Tracing (KT) is a fundamental technology in intelligent tutoring systems used to simulate changes in students' knowledge state during learning, track personalized knowledge mastery, and predict performance. However, current KT models face three major challenges: (1) When encountering new questions, models face cold-start problems due to sparse interaction records, making precise modeling…

    Submitted 29 April, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

    Comments: During a careful review of our base-experiment results, we discovered a possible error in the way some data were recorded. To ensure the integrity and accuracy of our work, we must correct these results and revise the corresponding analysis before making the manuscript publicly available

  35. arXiv:2502.11633  [pdf, other]

    cs.CL

    CLASS: Enhancing Cross-Modal Text-Molecule Retrieval Performance and Training Efficiency

    Authors: Hongyan Wu, Peijian Zeng, Weixiong Zheng, Lianxi Wang, Nankai Lin, Shengyi Jiang, Aimin Yang

    Abstract: Cross-modal text-molecule retrieval task bridges molecule structures and natural language descriptions. Existing methods predominantly focus on aligning text modality and molecule modality, yet they overlook adaptively adjusting the learning states at different training stages and enhancing training efficiency. To tackle these challenges, this paper proposes a Curriculum Learning-bAsed croSS-modal…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: 12 pages

  36. arXiv:2502.10712  [pdf, other]

    cs.LG cs.AI

    FuncGenFoil: Airfoil Generation and Editing Model in Function Space

    Authors: Jinouwen Zhang, Junjie Ren, Aobo Yang, Yan Lu, Lu Chen, Hairun Xie, Jing Wang, Miao Zhang, Wanli Ouyang, Shixiang Tang

    Abstract: Aircraft manufacturing is the jewel in the crown of industry, in which generating high-fidelity airfoil geometries with controllable and editable representations remains a fundamental challenge. Existing deep learning methods, which typically rely on predefined parametric representations (e.g., Bézier) or discrete point sets, face an inherent trade-off between expressive power and resolution adapt…

    Submitted 23 May, 2025; v1 submitted 15 February, 2025; originally announced February 2025.

  37. arXiv:2501.15383  [pdf, other]

    cs.CL

    Qwen2.5-1M Technical Report

    Authors: An Yang, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoyan Huang, Jiandong Jiang, Jianhong Tu, Jianwei Zhang, Jingren Zhou, Junyang Lin, Kai Dang, Kexin Yang, Le Yu, Mei Li, Minmin Sun, Qin Zhu, Rui Men, Tao He, Weijia Xu, Wenbiao Yin, Wenyuan Yu, Xiafei Qiu, Xingzhang Ren, Xinlong Yang , et al. (3 additional authors not shown)

    Abstract: We introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. Compared to the previous 128K version, the Qwen2.5-1M series have significantly enhanced long-context capabilities through long-context pre-training and post-training. Key techniques such as long data synthesis, progressive pre-training, and multi-stage supervised fine-tuning are employed to effectively…

    Submitted 25 January, 2025; originally announced January 2025.

  38. arXiv:2501.15368  [pdf, other]

    cs.CL cs.SD eess.AS

    Baichuan-Omni-1.5 Technical Report

    Authors: Yadong Li, Jun Liu, Tao Zhang, Tao Zhang, Song Chen, Tianpeng Li, Zehuan Li, Lijun Liu, Lingfeng Ming, Guosheng Dong, Da Pan, Chong Li, Yuanbo Fang, Dongdong Kuang, Mingrui Wang, Chenglin Zhu, Youwei Zhang, Hongyu Guo, Fengyu Zhang, Yuran Wang, Bowen Ding, Wei Song, Xu Li, Yuqi Huo, Zheng Liang , et al. (68 additional authors not shown)

    Abstract: We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pip…

    Submitted 25 January, 2025; originally announced January 2025.

  39. arXiv:2501.12162  [pdf, other]

    cs.CL cs.AI cs.DC cs.LG

    AdaServe: Accelerating Multi-SLO LLM Serving with SLO-Customized Speculative Decoding

    Authors: Zikun Li, Zhuofu Chen, Remi Delacourt, Gabriele Oliaro, Zeyu Wang, Qinghan Chen, Shuhuai Lin, April Yang, Zhihao Zhang, Zhuoming Chen, Sean Lai, Xinhao Cheng, Xupeng Miao, Zhihao Jia

    Abstract: Modern large language model (LLM) applications exhibit diverse service-level objectives (SLOs), from low-latency requirements in interactive coding assistants to more relaxed constraints in data wrangling tasks. Existing LLM serving systems, which rely on uniform batching and scheduling strategies, often fail to meet these heterogeneous SLOs concurrently. We present AdaServe, the first LLM serving…

    Submitted 17 May, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

  40. arXiv:2501.01595  [pdf]

    cs.CV

    Adaptive Homophily Clustering: Structure Homophily Graph Learning with Adaptive Filter for Hyperspectral Image

    Authors: Yao Ding, Weijie Kang, Aitao Yang, Zhili Zhang, Junyang Zhao, Jie Feng, Danfeng Hong, Qinhe Zheng

    Abstract: Hyperspectral image (HSI) clustering has been a fundamental but challenging task with zero training labels. Currently, some deep graph clustering methods have been successfully explored for HSI due to their outstanding performance in effective spatial structural information encoding. Nevertheless, insufficient structural information utilization, poor feature presentation ability, and weak graph up…

    Submitted 7 January, 2025; v1 submitted 2 January, 2025; originally announced January 2025.

    Comments: 14 pages, 8 figures

  41. arXiv:2501.01257  [pdf, other]

    cs.CL

    CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

    Authors: Shanghaoran Quan, Jiaxi Yang, Bowen Yu, Bo Zheng, Dayiheng Liu, An Yang, Xuancheng Ren, Bofei Gao, Yibo Miao, Yunlong Feng, Zekun Wang, Jian Yang, Zeyu Cui, Yang Fan, Yichang Zhang, Binyuan Hui, Junyang Lin

    Abstract: With the increasing code reasoning capabilities of existing large language models (LLMs) and breakthroughs in reasoning models like OpenAI o1 and o3, there is a growing need to develop more challenging and comprehensive benchmarks that effectively test their sophisticated competition-level coding abilities. Existing benchmarks, like LiveCodeBench and USACO, fall short due to the unavailability of…

    Submitted 3 January, 2025; v1 submitted 2 January, 2025; originally announced January 2025.

  42. arXiv:2501.00726  [pdf, other]

    math.OC cs.LG

    Enhancing Unsupervised Feature Selection via Double Sparsity Constrained Optimization

    Authors: Xianchao Xiu, Anning Yang, Chenyi Huang, Xinrong Li, Wanquan Liu

    Abstract: Unsupervised feature selection (UFS) is widely applied in machine learning and pattern recognition. However, most of the existing methods only consider a single sparsity, which makes it difficult to select valuable and discriminative feature subsets from the original high-dimensional feature set. In this paper, we propose a new UFS method called DSCOFS via embedding double sparsity constrained opt…

    Submitted 1 January, 2025; originally announced January 2025.

  43. arXiv:2412.15115  [pdf, other]

    cs.CL

    Qwen2.5 Technical Report

    Authors: Qwen, :, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu , et al. (19 additional authors not shown)

    Abstract: In this report, we introduce Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen 2.5 has been significantly improved during both the pre-training and post-training stages. In terms of pre-training, we have scaled the high-quality pre-training datasets from the previous 7 trillion tokens to 18 trillion tokens. This pr…

    Submitted 2 January, 2025; v1 submitted 19 December, 2024; originally announced December 2024.

  44. arXiv:2412.12621  [pdf, other]

    cs.CL

    Jailbreaking? One Step Is Enough!

    Authors: Weixiong Zheng, Peijian Zeng, Yiwei Li, Hongyan Wu, Nankai Lin, Junhao Chen, Aimin Yang, Yongmei Zhou

    Abstract: Large language models (LLMs) excel in various tasks but remain vulnerable to jailbreak attacks, where adversaries manipulate prompts to generate harmful outputs. Examining jailbreak prompts helps uncover the shortcomings of LLMs. However, current jailbreak methods and the target model's defenses are engaged in an independent and adversarial process, resulting in the need for frequent attack iterat…

    Submitted 17 December, 2024; originally announced December 2024.

    Comments: 17 pages

  45. arXiv:2412.10535  [pdf, other]

    cs.CL cs.AI

    On Adversarial Robustness and Out-of-Distribution Robustness of Large Language Models

    Authors: April Yang, Jordan Tab, Parth Shah, Paul Kotchavong

    Abstract: The increasing reliance on large language models (LLMs) for diverse applications necessitates a thorough understanding of their robustness to adversarial perturbations and out-of-distribution (OOD) inputs. In this study, we investigate the correlation between adversarial robustness and OOD robustness in LLMs, addressing a critical gap in robustness evaluation. By applying methods originally design…

    Submitted 13 December, 2024; originally announced December 2024.

  46. arXiv:2412.09925  [pdf, ps, other]

    cs.LG cs.CL cs.FL

    Simulating Hard Attention Using Soft Attention

    Authors: Andy Yang, Lena Strobl, David Chiang, Dana Angluin

    Abstract: We study conditions under which transformers using soft attention can simulate hard attention, that is, effectively focus all attention on a subset of positions. First, we examine several subclasses of languages recognized by hard-attention transformers, which can be defined in variants of linear temporal logic. We demonstrate how soft-attention transformers can compute formulas of these logics us…

    Submitted 26 June, 2025; v1 submitted 13 December, 2024; originally announced December 2024.

    Comments: 19 pages

  47. arXiv:2411.13855  [pdf, other]

    eess.IV cs.CV cs.LG

    A Multimodal Approach to The Detection and Classification of Skin Diseases

    Authors: Allen Yang, Edward Yang

    Abstract: According to PBS, nearly one-third of Americans lack access to primary care services, and another forty percent delay going to avoid medical costs. As a result, many diseases are left undiagnosed and untreated, even if the disease shows many physical symptoms on the skin. With the rise of AI, self-diagnosis and improved disease recognition have become more promising than ever; in spite of that, ex…

    Submitted 21 November, 2024; originally announced November 2024.

  48. arXiv:2411.01783  [pdf, other]

    cs.DC cs.AI cs.LG

    Context Parallelism for Scalable Million-Token Inference

    Authors: Amy Yang, Jingyi Yang, Aya Ibrahim, Xinfeng Xie, Bangsheng Tang, Grigory Sizov, Jeremy Reizenstein, Jongsoo Park, Jianyu Huang

    Abstract: We present context parallelism for long-context large language model inference, which achieves near-linear scaling for long-context prefill latency with up to 128 H100 GPUs across 16 nodes. Particularly, our method achieves 1M context prefill with Llama3 405B model in 77s (93% parallelization efficiency, 63% FLOPS utilization) and 128K context prefill in 3.8s. We develop two lossless exact ring at…

    Submitted 20 April, 2025; v1 submitted 3 November, 2024; originally announced November 2024.

  49. arXiv:2411.01471  [pdf, other]

    cs.CR

    A Practical and Privacy-Preserving Framework for Real-World Large Language Model Services

    Authors: Yu Mao, Xueping Liao, Wei Liu, Anjia Yang

    Abstract: Large language models (LLMs) have demonstrated exceptional capabilities in text understanding and generation, and they are increasingly being utilized across various domains to enhance productivity. However, due to the high costs of training and maintaining these models, coupled with the fact that some LLMs are proprietary, individuals often rely on online AI as a Service (AIaaS) provided by LLM c…

    Submitted 3 November, 2024; originally announced November 2024.

  50. arXiv:2410.23933  [pdf, other]

    cs.CL

    Language Models can Self-Lengthen to Generate Long Texts

    Authors: Shanghaoran Quan, Tianyi Tang, Bowen Yu, An Yang, Dayiheng Liu, Bofei Gao, Jianhong Tu, Yichang Zhang, Jingren Zhou, Junyang Lin

    Abstract: Recent advancements in Large Language Models (LLMs) have significantly enhanced their ability to process long contexts, yet a notable gap remains in generating long, aligned outputs. This limitation stems from a training gap where pre-training lacks effective instructions for long-text generation, and post-training data primarily consists of short query-response pairs. Current approaches, such as…

    Submitted 31 October, 2024; originally announced October 2024.