
Showing 1–50 of 150 results for author: Pan, R

Searching in archive cs.
  1. arXiv:2507.08870  [pdf, ps, other]

    cs.LG cs.MA

    GUIDE: Towards Scalable Advising for Research Ideas

    Authors: Yaowenqi Liu, BingXu Meng, Rui Pan, Jerry Huang, Tong Zhang

    Abstract: The field of AI research is advancing at an unprecedented pace, enabling automated hypothesis generation and experimental design across diverse domains such as biology, mathematics, and artificial intelligence. Despite these advancements, there remains a significant gap in the availability of scalable advising systems capable of providing high-quality, well-reasoned feedback to refine proposed hyp…

    Submitted 9 July, 2025; originally announced July 2025.

  2. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3284 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 22 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  3. arXiv:2506.18945  [pdf, ps, other]

    cs.LG cs.CL

    Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models

    Authors: Zihan Wang, Rui Pan, Jiarui Yao, Robert Csordas, Linjie Li, Lu Yin, Jiajun Wu, Tong Zhang, Manling Li, Shiwei Liu

    Abstract: We propose Chain-of-Experts (CoE), a new Mixture-of-Experts (MoE) architecture that introduces sequential expert communication within each layer. Unlike traditional MoE models, where experts operate independently in parallel, CoE processes tokens iteratively across a chain of experts inside a layer. To support dynamic expert selection across iterations, CoE employs a dedicated router at each itera…

    Submitted 22 June, 2025; originally announced June 2025.
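
    The chained-routing mechanism the abstract describes can be pictured in a few lines. Below is a minimal sketch under assumed shapes and hyperparameters (layer width, expert count, iteration count, and the dense dispatch are illustrative choices, not the authors' implementation): each iteration has its own router, and one expert pass's output feeds the next.

```python
import torch
import torch.nn as nn

class ChainOfExpertsLayer(nn.Module):
    def __init__(self, d_model=64, n_experts=4, n_iters=2, top_k=1):
        super().__init__()
        self.top_k = top_k
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                           nn.Linear(4 * d_model, d_model))
             for _ in range(n_experts)])
        # One router per iteration, so expert choice can change as the
        # representation is refined within the layer.
        self.routers = nn.ModuleList(
            [nn.Linear(d_model, n_experts) for _ in range(n_iters)])

    def forward(self, x):  # x: (batch, seq, d_model)
        for router in self.routers:
            probs = router(x).softmax(dim=-1)              # (B, S, E)
            weights, idx = probs.topk(self.top_k, dim=-1)  # (B, S, K)
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):
                y = expert(x)  # dense for clarity; real MoE dispatches sparsely
                for k in range(self.top_k):
                    mask = (idx[..., k] == e).unsqueeze(-1).to(x.dtype)
                    out = out + mask * weights[..., k:k + 1] * y
            x = x + out  # residual; this iteration's output feeds the next
        return x

print(ChainOfExpertsLayer()(torch.randn(2, 8, 64)).shape)  # (2, 8, 64)
```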

  4. arXiv:2506.13888  [pdf, ps, other]

    cs.CL cs.CV

    VL-GenRM: Enhancing Vision-Language Verification via Vision Experts and Iterative Training

    Authors: Jipeng Zhang, Kehao Miao, Renjie Pi, Zhaowei Wang, Runtao Liu, Rui Pan, Tong Zhang

    Abstract: Reinforcement Fine-Tuning (RFT) with verifiable rewards has advanced large language models but remains underexplored for Vision-Language (VL) models. The Vision-Language Reward Model (VL-RM) is key to aligning VL models by providing structured feedback, yet training effective VL-RMs faces two major challenges. First, the bootstrapping dilemma arises as high-quality training data depends on already…

    Submitted 16 June, 2025; originally announced June 2025.

  5. arXiv:2506.04616  [pdf]

    cs.CL stat.AP stat.ML

    Subjective Perspectives within Learned Representations Predict High-Impact Innovation

    Authors: Likun Cao, Rui Pan, James Evans

    Abstract: Existing studies of innovation emphasize the power of social structures to shape innovation capacity. Emerging machine learning approaches, however, enable us to model innovators' personal perspectives and interpersonal innovation opportunities as a function of their prior trajectories of experience. We theorize then quantify subjective perspectives and innovation opportunities based on innovator…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 107 pages, 20 figures

  6. arXiv:2506.03190  [pdf, other]

    cs.CV cs.AI

    MINT: Memory-Infused Prompt Tuning at Test-time for CLIP

    Authors: Jiaming Yi, Ruirui Pan, Jishen Yang, Xiulong Yang

    Abstract: Improving the generalization ability of Vision-Language Pre-trained Models (VLMs) under test-time data distribution shifts remains a critical challenge. The existing Test-Time Adaptation (TTA) methods fall short in fully leveraging the model's internal knowledge, particularly in dynamically adapting to complex and hierarchical visual semantic information. In this paper, we propose Memory-Infused P…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: 14 pages, 3 figures

  7. arXiv:2506.01901  [pdf, ps, other]

    cs.AI

    Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods

    Authors: Yifan Hao, Xingyuan Pan, Hanning Zhang, Chenlu Ye, Rui Pan, Tong Zhang

    Abstract: Supervised fine-tuning (SFT) on domain-specific data is the dominant approach for adapting foundation models to specialized tasks. However, it has been observed that SFT models tend to forget knowledge acquired during pretraining. In vision models, ensembling a pretrained model with its fine-tuned counterpart has been shown to mitigate this issue. In this work, we demonstrate that the same holds f…

    Submitted 2 June, 2025; originally announced June 2025.
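
    For vision models, the ensembling the abstract references is often realized as weight-space interpolation between the pretrained and fine-tuned checkpoints (WiSE-FT style). A minimal sketch, assuming the two models share an architecture; whether this paper uses weight-space or output-space ensembling is not visible in the truncated abstract:

```python
import torch

def ensemble_state_dicts(pretrained, finetuned, alpha=0.5):
    """Interpolate checkpoints key by key.
    alpha = 1.0 recovers pure SFT, alpha = 0.0 the pretrained base."""
    return {k: (1 - alpha) * pretrained[k] + alpha * finetuned[k]
            for k in pretrained}

base = {"w": torch.tensor([1.0, 2.0])}
sft = {"w": torch.tensor([3.0, 6.0])}
print(ensemble_state_dicts(base, sft)["w"])  # tensor([2., 4.])
```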

  8. arXiv:2506.00726  [pdf]

    cs.CL

    Structured Gradient Guidance for Few-Shot Adaptation in Large Language Models

    Authors: Hongye Zheng, Yichen Wang, Ray Pan, Guiran Liu, Binrong Zhu, Hanlu Zhang

    Abstract: This paper presents a gradient-informed fine-tuning method for large language models under few-shot conditions. The goal is to enhance task adaptability and training stability when data is limited. The method builds on a base loss function and introduces two gradient-related regularization terms. The first enforces gradient direction consistency to guide parameter updates along task-relevant direc…

    Submitted 31 May, 2025; originally announced June 2025.
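
    A hedged sketch of what a gradient-direction-consistency term can look like; the exact regularizers are behind the truncation, and the reference direction `g_ref` and the cosine form below are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def loss_with_direction_consistency(base_loss, params, g_ref, lam=0.1):
    # Differentiable gradient of the base loss w.r.t. the parameters.
    grads = torch.autograd.grad(base_loss, params, create_graph=True)
    g = torch.cat([gr.flatten() for gr in grads])
    # Penalize misalignment with a reference "task direction" g_ref.
    return base_loss + lam * (1.0 - F.cosine_similarity(g, g_ref, dim=0))

w = torch.randn(4, requires_grad=True)
loss = (w ** 2).sum()
print(loss_with_direction_consistency(loss, [w], g_ref=torch.ones(4)))
```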

  9. arXiv:2505.24846  [pdf, ps, other]

    cs.AI cs.CL

    MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning

    Authors: Jingyan Shen, Jiarui Yao, Rui Yang, Yifan Sun, Feng Luo, Rui Pan, Tong Zhang, Han Zhao

    Abstract: Reward modeling is a key step in building safe foundation models when applying reinforcement learning from human feedback (RLHF) to align Large Language Models (LLMs). However, reward modeling based on the Bradley-Terry (BT) model assumes a global reward function, failing to capture the inherently diverse and heterogeneous human preferences. Hence, such oversimplification limits LLMs from supporti…

    Submitted 30 May, 2025; originally announced May 2025.

  10. arXiv:2505.20452  [pdf, ps, other]

    cs.LG

    Active Learning for Multiple Change Point Detection in Non-stationary Time Series with Deep Gaussian Processes

    Authors: Hao Zhao, Rong Pan

    Abstract: Multiple change point (MCP) detection in non-stationary time series is challenging due to the variety of underlying patterns. To address these challenges, we propose a novel algorithm that integrates Active Learning (AL) with Deep Gaussian Processes (DGPs) for robust MCP detection. Our method leverages spectral analysis to identify potential changes and employs AL to strategically select new sampl…

    Submitted 26 May, 2025; originally announced May 2025.

  11. arXiv:2505.20004  [pdf, ps, other]

    cs.SE

    Requirements Coverage-Guided Minimization for Natural Language Test Cases

    Authors: Rongqi Pan, Feifei Niu, Lionel C. Briand, Hanyang Hu

    Abstract: As software systems evolve, test suites tend to grow in size and often contain redundant test cases. Such redundancy increases testing effort, time, and cost. Test suite minimization (TSM) aims to eliminate such redundancy while preserving key properties such as requirement coverage and fault detection capability. In this paper, we propose RTM (Requirement coverage-guided Test suite Minimization),…

    Submitted 26 May, 2025; originally announced May 2025.
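
    The abstract truncates before RTM's actual method, but coverage-preserving minimization has a classic greedy skeleton worth showing; the data shapes below are hypothetical, not RTM's representation:

```python
def minimize(tests):
    """tests: {test_name: set_of_covered_requirement_ids} (hypothetical shape).
    Greedily keep the test covering the most still-uncovered requirements;
    assumes every requirement is covered by at least one test."""
    uncovered = set().union(*tests.values())
    kept = []
    while uncovered:
        best = max(tests, key=lambda t: len(tests[t] & uncovered))
        kept.append(best)
        uncovered -= tests[best]
    return kept

print(minimize({"t1": {1, 2}, "t2": {2, 3}, "t3": {3}, "t4": {1, 2, 3}}))
# -> ['t4']: one test already preserves full requirement coverage
```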

  12. arXiv:2505.17592  [pdf, ps, other]

    astro-ph.IM cs.LG

    AstroMLab 4: Benchmark-Topping Performance in Astronomy Q&A with a 70B-Parameter Domain-Specialized Reasoning Model

    Authors: Tijmen de Haan, Yuan-Sen Ting, Tirthankar Ghosal, Tuan Dung Nguyen, Alberto Accomazzi, Emily Herron, Vanessa Lama, Rui Pan, Azton Wells, Nesar Ramachandra

    Abstract: General-purpose large language models, despite their broad capabilities, often struggle with specialized domain knowledge, a limitation particularly pronounced in more accessible, lower-parameter versions. This gap hinders their deployment as effective agents in demanding fields such as astronomy. Building on our prior work with AstroSage-8B, this study introduces AstroSage-70B, a significantly la…

    Submitted 23 May, 2025; originally announced May 2025.

  13. arXiv:2504.15427  [pdf, ps, other]

    cs.SE

    TVR: Automotive System Requirement Traceability Validation and Recovery Through Retrieval-Augmented Generation

    Authors: Feifei Niu, Rongqi Pan, Lionel C. Briand, Hanyang Hu, Krishna Koravadi

    Abstract: In automotive software development, as well as other domains, traceability between stakeholder requirements and system requirements is crucial to ensure consistency, correctness, and regulatory compliance. However, erroneous or missing traceability relationships often arise due to improper propagation of requirement changes or human errors in requirement mapping, leading to inconsistencies and inc…

    Submitted 15 June, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

  14. arXiv:2504.07891  [pdf, other]

    cs.LG cs.AI

    SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning

    Authors: Rui Pan, Yinwei Dai, Zhihao Zhang, Gabriele Oliaro, Zhihao Jia, Ravi Netravali

    Abstract: Recent advances in inference-time compute have significantly improved performance on complex tasks by generating long chains of thought (CoTs) using Large Reasoning Models (LRMs). However, this improved accuracy comes at the cost of high inference latency due to the length of generated reasoning sequences and the autoregressive nature of decoding. Our key insight in tackling these overheads is tha…

    Submitted 16 May, 2025; v1 submitted 10 April, 2025; originally announced April 2025.
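
    Analogous to speculative decoding, the control flow of speculative reasoning can be sketched in a few lines; every callable below is a hypothetical stand-in, not the paper's API, and the acceptance test is a placeholder:

```python
def speculative_reason(problem, draft_step, verify, strong_step, max_steps=8):
    """A cheap model drafts each reasoning step; an expensive model only
    verifies, and generates a replacement step when a draft is rejected."""
    steps = []
    for _ in range(max_steps):
        cand = draft_step(problem, steps)            # cheap proposal
        step = cand if verify(problem, steps, cand) else strong_step(problem, steps)
        steps.append(step)
        if step == "DONE":
            break
    return steps

# Toy demo with stub callables standing in for the two models.
print(speculative_reason(
    "2+2*3",
    draft_step=lambda p, s: "DONE" if len(s) >= 2 else f"step{len(s)}",
    verify=lambda p, s, c: True,
    strong_step=lambda p, s: "strong-step"))
# -> ['step0', 'step1', 'DONE']
```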

  15. arXiv:2504.07459  [pdf, other]

    cs.CL

    Beyond LLMs: A Linguistic Approach to Causal Graph Generation from Narrative Texts

    Authors: Zehan Li, Ruhua Pan, Xinyu Pi

    Abstract: We propose a novel framework for generating causal graphs from narrative texts, bridging high-level causality and detailed event-specific relationships. Our method first extracts concise, agent-centered vertices using large language model (LLM)-based summarization. We introduce an "Expert Index," comprising seven linguistically informed features, integrated into a Situation-Task-Action-Consequence…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: published at the 7th Workshop on Narrative Understanding, NAACL 2025

  16. arXiv:2504.04220  [pdf, other]

    cs.SE

    AdaCoder: An Adaptive Planning and Multi-Agent Framework for Function-Level Code Generation

    Authors: Yueheng Zhu, Chao Liu, Xuan He, Xiaoxue Ren, Zhongxin Liu, Ruwei Pan, Hongyu Zhang

    Abstract: Recently, researchers have proposed many multi-agent frameworks for function-level code generation, which aim to improve software development productivity by automatically generating function-level source code based on task descriptions. A typical multi-agent framework consists of Large Language Model (LLM)-based agents that are responsible for task planning, code generation, testing, debugging, e…

    Submitted 5 April, 2025; originally announced April 2025.

  17. arXiv:2504.03122  [pdf, other]

    cs.LG stat.ML

    From Observation to Orientation: an Adaptive Integer Programming Approach to Intervention Design

    Authors: Abdelmonem Elrefaey, Rong Pan

    Abstract: Using both observational and experimental data, a causal discovery process can identify the causal relationships between variables. A unique adaptive intervention design paradigm is presented in this work, where causal directed acyclic graphs (DAGs) are effectively recovered under practical budgetary considerations. In order to choose treatments that optimize information gain under these consid…

    Submitted 9 May, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

  18. arXiv:2503.20762  [pdf, ps, other]

    cs.LG math.OC

    ASGO: Adaptive Structured Gradient Optimization

    Authors: Kang An, Yuxing Liu, Rui Pan, Yi Ren, Shiqian Ma, Donald Goldfarb, Tong Zhang

    Abstract: Training deep neural networks is a structured optimization problem, because the parameters are naturally represented by matrices and tensors rather than by vectors. Under this structural representation, it has been widely observed that gradients are low-rank and Hessians are approximately block-wise diagonal. These structured properties are crucial for designing efficient optimization algorithms,…

    Submitted 22 June, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

    Comments: 30 pages
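
    The low-rank gradient observation the abstract leans on is easy to verify numerically: a linear layer's weight gradient is a sum of batch-size-many outer products, so its rank is bounded by the batch size. A quick check (illustrative of the property only, not of ASGO's update rule):

```python
import torch

B, d_in, d_out = 4, 256, 128
x = torch.randn(B, d_in)
W = torch.randn(d_in, d_out, requires_grad=True)
loss = (x @ W).square().sum()
loss.backward()
# W.grad = x^T @ dL/d(xW): a sum of B outer products, hence rank <= B.
print(torch.linalg.matrix_rank(W.grad).item())  # <= 4
```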

  19. arXiv:2503.19786  [pdf, other]

    cs.CL cs.AI

    Gemma 3 Technical Report

    Authors: Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Etienne Pot, Ivo Penchev, Gaël Liu, Francesco Visin, Kathleen Kenealy, Lucas Beyer, Xiaohai Zhai, Anton Tsitsulin, et al. (191 additional authors not shown)

    Abstract: We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achie…

    Submitted 25 March, 2025; originally announced March 2025.

  20. arXiv:2503.17682  [pdf, other]

    cs.LG cs.AI

    Safe RLHF-V: Safe Reinforcement Learning from Multi-modal Human Feedback

    Authors: Jiaming Ji, Xinyu Chen, Rui Pan, Conghui Zhang, Han Zhu, Jiahao Li, Donghai Hong, Boyuan Chen, Jiayi Zhou, Kaile Wang, Juntao Dai, Chi-Min Chan, Yida Tang, Sirui Han, Yike Guo, Yaodong Yang

    Abstract: Multimodal large language models (MLLMs) are essential for building general-purpose AI assistants; however, they pose increasing safety risks. How can we ensure safety alignment of MLLMs to prevent undesired behaviors? Going further, it is critical to explore how to fine-tune MLLMs to preserve capabilities while meeting safety constraints. Fundamentally, this challenge can be formulated as a min-m…

    Submitted 22 May, 2025; v1 submitted 22 March, 2025; originally announced March 2025.

  21. arXiv:2503.12483  [pdf, other]

    cs.SE

    Modularization is Better: Effective Code Generation with Modular Prompting

    Authors: Ruwei Pan, Hongyu Zhang

    Abstract: Large Language Models are transforming software development by automatically generating code. Current prompting techniques such as Chain-of-Thought (CoT) suggest tasks step by step and the reasoning process follows a linear structure, which hampers the understanding of complex programming problems, particularly those requiring hierarchical solutions. Inspired by the principle of modularization in…

    Submitted 16 March, 2025; originally announced March 2025.

  22. arXiv:2503.12163  [pdf, other]

    cs.SE

    AgentDroid: A Multi-Agent Framework for Detecting Fraudulent Android Applications

    Authors: Ruwei Pan, Hongyu Zhang, Zhonghao Jiang, Ran Hou

    Abstract: With the increasing prevalence of fraudulent Android applications such as fake and malicious applications, it is crucial to detect them with high accuracy and adaptability. This paper introduces AgentDroid, a novel framework for Android fraudulent application detection based on multi-modal analysis and multi-agent systems. AgentDroid overcomes the limitations of traditional detection methods such…

    Submitted 15 March, 2025; originally announced March 2025.

  23. arXiv:2503.03205  [pdf, other]

    cs.CL cs.AI

    MA-LoT: Model-Collaboration Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving

    Authors: Ruida Wang, Rui Pan, Yuxin Li, Jipeng Zhang, Yizhen Jia, Shizhe Diao, Renjie Pi, Junjie Hu, Tong Zhang

    Abstract: Solving mathematical problems using computer-verifiable languages like Lean has significantly impacted the mathematical and computer science communities. State-of-the-art methods utilize a single Large Language Model (LLM) to generate complete proof or perform tree search, but they fail to balance these tasks. We propose **MA-LoT**: *Model-CollAboration Lean-based Long Chain-of-Thought*, a compreh…

    Submitted 27 May, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

  24. arXiv:2502.12826  [pdf, other]

    cs.OS cs.AR

    Ariadne: A Hotness-Aware and Size-Adaptive Compressed Swap Technique for Fast Application Relaunch and Reduced CPU Usage on Mobile Devices

    Authors: Yu Liang, Aofeng Shen, Chun Jason Xue, Riwei Pan, Haiyu Mao, Nika Mansouri Ghiasi, Qingcai Jiang, Rakesh Nadig, Lei Li, Rachata Ausavarungnirun, Mohammad Sadrosadati, Onur Mutlu

    Abstract: Growing application memory demands and concurrent usage are making mobile device memory scarce. When memory pressure is high, current mobile systems use a RAM-based compressed swap scheme (called ZRAM) to compress unused execution-related data (called anonymous data in Linux) in main memory. We observe that the state-of-the-art ZRAM scheme prolongs relaunch latency and wastes CPU time because it…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: This is an extended version of a paper that will appear in HPCA 2025

  25. arXiv:2502.05368  [pdf, ps, other]

    cs.SE cs.LG

    Otter: Generating Tests from Issues to Validate SWE Patches

    Authors: Toufique Ahmed, Jatin Ganhotra, Rangeet Pan, Avraham Shinnar, Saurabh Sinha, Martin Hirzel

    Abstract: While there has been plenty of work on generating tests from existing code, there has been limited work on generating tests from issues. A correct test must validate the code patch that resolves the issue. This paper focuses on the scenario where that code patch does not yet exist. Doing so supports two major use-cases. First, it supports TDD (test-driven development), the discipline of "test firs…

    Submitted 30 May, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

    Comments: Accepted to the main technical track of the International Conference on Machine Learning (ICML), 2025

  26. arXiv:2502.03460  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training

    Authors: Rui Pan, Boyao Wang, Shizhe Diao, Xingyuan Pan, Jipeng Zhang, Renjie Pi, Tong Zhang

    Abstract: Small language models (SLMs) have attracted considerable attention from both academia and industry due to their broad range of applications in edge devices. To obtain SLMs with strong performance, conventional approaches either pre-train the models from scratch, which incurs substantial computational costs, or compress/prune existing large language models (LLMs), which results in performance drops…

    Submitted 14 June, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

  27. arXiv:2501.14249  [pdf, other]

    cs.LG cs.AI cs.CL

    Humanity's Last Exam

    Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes, et al. (1084 additional authors not shown)

    Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…

    Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 29 pages, 6 figures

  28. arXiv:2501.12948  [pdf, other]

    cs.CL cs.AI cs.LG

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    Authors: DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, et al. (175 additional authors not shown)

    Abstract: We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters…

    Submitted 22 January, 2025; originally announced January 2025.

  29. arXiv:2501.07811  [pdf, other]

    cs.SE

    CodeCoR: An LLM-Based Self-Reflective Multi-Agent Framework for Code Generation

    Authors: Ruwei Pan, Hongyu Zhang, Chao Liu

    Abstract: Code generation aims to produce code that fulfills requirements written in natural languages automatically. Large language Models (LLMs) like ChatGPT have demonstrated promising effectiveness in this area. Nonetheless, these LLMs often fail to ensure the syntactic and semantic correctness of the generated code. Recently, researchers proposed multi-agent frameworks that guide LLMs with different pr…

    Submitted 13 January, 2025; originally announced January 2025.

  30. arXiv:2412.20340  [pdf, other]

    cs.SE cs.AI

    Distilling Desired Comments for Enhanced Code Review with Large Language Models

    Authors: Yongda Yu, Lei Zhang, Guoping Rong, Haifeng Shen, Jiahao Zhang, Haoxiang Yan, Guohao Shi, Dong Shao, Ruiqi Pan, Yuan Li, Qiushi Wang, Zhao Tian

    Abstract: There has been a growing interest in using Large Language Models (LLMs) for code review thanks to their proven proficiency in code comprehension. The primary objective of most review scenarios is to generate desired review comments (DRCs) that explicitly identify issues to trigger code fixes. However, existing LLM-based solutions are not so effective in generating DRCs for various reasons such as…

    Submitted 5 January, 2025; v1 submitted 28 December, 2024; originally announced December 2024.

    Comments: 12 pages, 9 figures

    ACM Class: D.2.3; I.2.7

  31. arXiv:2412.19437  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V3 Technical Report

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, et al. (175 additional authors not shown)

    Abstract: We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for loa…

    Submitted 18 February, 2025; v1 submitted 26 December, 2024; originally announced December 2024.

  32. arXiv:2412.15838  [pdf, other]

    cs.AI cs.CL

    Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback

    Authors: Jiaming Ji, Jiayi Zhou, Hantao Lou, Boyuan Chen, Donghai Hong, Xuyao Wang, Wenqi Chen, Kaile Wang, Rui Pan, Jiahao Li, Mohan Wang, Josef Dai, Tianyi Qiu, Hua Xu, Dong Li, Weipeng Chen, Jun Song, Bo Zheng, Yaodong Yang

    Abstract: Reinforcement learning from human feedback (RLHF) has proven effective in enhancing the instruction-following capabilities of large language models; however, it remains underexplored in the cross-modality domain. As the number of modalities increases, aligning all-modality models with human intentions -- such as instruction following -- becomes a pressing challenge. In this work, we make the first…

    Submitted 30 December, 2024; v1 submitted 20 December, 2024; originally announced December 2024.

  33. arXiv:2412.11006  [pdf, other]

    cs.LG cs.CL

    Entropy-Regularized Process Reward Model

    Authors: Hanning Zhang, Pengcheng Wang, Shizhe Diao, Yong Lin, Rui Pan, Hanze Dong, Dylan Zhang, Pavlo Molchanov, Tong Zhang

    Abstract: Large language models (LLMs) have shown promise in performing complex multi-step reasoning, yet they continue to struggle with mathematical reasoning, often making systematic errors. A promising solution is reinforcement learning (RL) guided by reward models, particularly those focusing on process rewards, which score each intermediate step rather than solely evaluating the final outcome. This app…

    Submitted 14 December, 2024; originally announced December 2024.

    Comments: Preprint
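
    The process-reward interface itself (independent of the entropy regularization, whose details are truncated above) fits in two lines; the scorer and the aggregation choice below are illustrative, not the paper's:

```python
def trajectory_value(step_scores, aggregate=min):
    """A PRM scores each intermediate step; the trajectory value aggregates
    them. `min` treats a chain as only as strong as its weakest step;
    `sum` and last-step-only are common alternatives."""
    return aggregate(step_scores)

print(trajectory_value([0.9, 0.8, 0.2]))       # 0.2 -> one bad step sinks it
print(trajectory_value([0.9, 0.8, 0.2], sum))  # 1.9
```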

  34. arXiv:2412.10543  [pdf, ps, other]

    cs.LG cs.CL cs.IR

    METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation

    Authors: Siddhant Ray, Rui Pan, Zhuohan Gu, Kuntai Du, Shaoting Feng, Ganesh Ananthanarayanan, Ravi Netravali, Junchen Jiang

    Abstract: RAG (Retrieval Augmented Generation) allows LLMs (large language models) to generate better responses with external knowledge, but using more external knowledge often improves generation quality at the expense of response delay. Prior work either reduces the response delay (through better scheduling of RAG queries) or strives to maximize quality (which involves tuning the RAG workflow), but they f…

    Submitted 15 July, 2025; v1 submitted 13 December, 2024; originally announced December 2024.

    Comments: 17 pages, 18 figures

  35. arXiv:2412.10488  [pdf, other]

    cs.CV cs.AI cs.GR

    SVGBuilder: Component-Based Colored SVG Generation with Text-Guided Autoregressive Transformers

    Authors: Zehao Chen, Rong Pan

    Abstract: Scalable Vector Graphics (SVG) are essential XML-based formats for versatile graphics, offering resolution independence and scalability. Unlike raster images, SVGs use geometric shapes and support interactivity, animation, and manipulation via CSS and JavaScript. Current SVG generation methods face challenges related to high computational costs and complexity. In contrast, human designers use comp…

    Submitted 12 March, 2025; v1 submitted 13 December, 2024; originally announced December 2024.

    Comments: Project: https://svgbuilder.github.io

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 2025, 39(3), 2358-2366

  36. Residual Channel Boosts Contrastive Learning for Radio Frequency Fingerprint Identification

    Authors: Rui Pan, Hui Chen, Guanxiong Shen, Hongyang Chen

    Abstract: In order to address the issue of limited data samples for the deployment of pre-trained models in unseen environments, this paper proposes a residual channel-based data augmentation strategy for Radio Frequency Fingerprint Identification (RFFI), coupled with a lightweight SimSiam contrastive learning framework. By applying least square (LS) and minimum mean square error (MMSE) channel estimations…

    Submitted 11 December, 2024; originally announced December 2024.

    Comments: 5 pages, 4 figures
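
    A sketch of the LS and MMSE pilot estimates this augmentation builds on, under an assumed single-tap model y = h·x + n with known unit-power pilots; the paper's exact channel model and residual construction may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
N, snr = 64, 10.0
x = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], size=N) / np.sqrt(2)  # QPSK pilots
h = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)    # true channel
noise = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) * np.sqrt(1 / (2 * snr))
y = h * x + noise

h_ls = np.mean(y / x)                # least-squares estimate from the pilots
h_mmse = h_ls / (1 + 1 / (N * snr))  # MMSE shrinkage, assuming unit-power h
residual = y - h_ls * x              # residual-channel view of the pilots
print(abs(h - h_ls), abs(h - h_mmse))
```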

  37. arXiv:2412.02883  [pdf, other]

    cs.SE cs.CL cs.LG

    TDD-Bench Verified: Can LLMs Generate Tests for Issues Before They Get Resolved?

    Authors: Toufique Ahmed, Martin Hirzel, Rangeet Pan, Avraham Shinnar, Saurabh Sinha

    Abstract: Test-driven development (TDD) is the practice of writing tests first and coding later, and the proponents of TDD expound its numerous benefits. For instance, given an issue on a source code repository, tests can clarify the desired behavior among stake-holders before anyone writes code for the agreed-upon fix. Although there has been a lot of work on automated test generation for the practice "wri…

    Submitted 3 December, 2024; originally announced December 2024.

  38. arXiv:2412.01674  [pdf, other]

    cs.LG cs.AI stat.ML

    Causal Discovery by Interventions via Integer Programming

    Authors: Abdelmonem Elrefaey, Rong Pan

    Abstract: Causal discovery is essential across various scientific fields to uncover causal structures within data. Traditional methods relying on observational data have limitations due to confounding variables. This paper presents an optimization-based approach using integer programming (IP) to design minimal intervention sets that ensure causal structure identifiability. Our method provides exact and modu…

    Submitted 2 December, 2024; originally announced December 2024.
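
    A toy version of this flavor of formulation (not the paper's actual model): intervening on either endpoint of an undirected edge orients that edge, so a minimum set of single-node interventions covering every edge is a vertex cover, solvable as a small IP with the PuLP solver:

```python
import pulp

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
nodes = sorted({v for e in edges for v in e})
x = {v: pulp.LpVariable(f"x_{v}", cat="Binary") for v in nodes}  # intervene on v?

prob = pulp.LpProblem("min_interventions", pulp.LpMinimize)
prob += pulp.lpSum(x.values())          # minimize number of interventions
for i, j in edges:
    prob += x[i] + x[j] >= 1            # some intervention touches this edge
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([v for v in nodes if x[v].value() == 1])  # e.g. [0, 2]
```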

  39. arXiv:2412.00608  [pdf]

    cs.AI

    Leveraging LLM for Automated Ontology Extraction and Knowledge Graph Generation

    Authors: Mohammad Sadeq Abolhasani, Rong Pan

    Abstract: Extracting relevant and structured knowledge from large, complex technical documents within the Reliability and Maintainability (RAM) domain is labor-intensive and prone to errors. Our work addresses this challenge by presenting OntoKGen, a genuine pipeline for ontology extraction and Knowledge Graph (KG) generation. OntoKGen leverages Large Language Models (LLMs) through an interactive user inter…

    Submitted 9 December, 2024; v1 submitted 30 November, 2024; originally announced December 2024.

  40. arXiv:2411.19379  [pdf, other]

    cs.DC cs.AI cs.LG

    Marconi: Prefix Caching for the Era of Hybrid LLMs

    Authors: Rui Pan, Zhuang Wang, Zhen Jia, Can Karakus, Luca Zancato, Tri Dao, Yida Wang, Ravi Netravali

    Abstract: Hybrid models that combine the language modeling capabilities of Attention layers with the efficiency of Recurrent layers (e.g., State Space Models) have gained traction in practically supporting long contexts in Large Language Model serving. Yet, the unique properties of these models complicate the usage of complementary efficiency optimizations such as prefix caching that skip redundant computat…

    Submitted 10 April, 2025; v1 submitted 28 November, 2024; originally announced November 2024.

    Comments: MLSys 2025 camera-ready version
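
    The "unique properties" at issue: an attention KV-cache can be sliced back to any shorter prefix, but a recurrent layer's state is one fixed-size summary, so only exact prefix matches can reuse it. A toy cache illustrating that constraint (illustrative only, not Marconi's design):

```python
class RecurrentPrefixCache:
    """Toy cache: recurrent/SSM states are reusable only on exact prefix matches."""
    def __init__(self):
        self.states = {}                    # token-prefix tuple -> saved state

    def lookup(self, tokens):
        # Longest cached prefix that exactly matches the new request.
        for end in range(len(tokens), 0, -1):
            key = tuple(tokens[:end])
            if key in self.states:
                return key, self.states[key]
        return (), None

    def insert(self, tokens, state):
        self.states[tuple(tokens)] = state

cache = RecurrentPrefixCache()
cache.insert([1, 2, 3], state="s123")
print(cache.lookup([1, 2, 3, 4]))  # ((1, 2, 3), 's123')
print(cache.lookup([1, 9]))        # ((), None) -- no partial-state reuse
```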

  41. arXiv:2411.05281  [pdf, other]

    cs.CL cs.AI cs.LG

    Fox-1: Open Small Language Model for Cloud and Edge

    Authors: Zijian Hu, Jipeng Zhang, Rui Pan, Zhaozhuo Xu, Shanshan Han, Han Jin, Alay Dilipbhai Shah, Dimitris Stripelis, Yuhang Yao, Salman Avestimehr, Tong Zhang, Chaoyang He

    Abstract: We present Fox-1, a series of small language models (SLMs) consisting of Fox-1-1.6B and Fox-1-1.6B-Instruct-v0.1. These models are pre-trained on 3 trillion tokens of web-scraped document data and fine-tuned with 5 billion tokens of instruction-following and multi-turn conversation data. Aiming to improve the pre-training efficiency, Fox-1-1.6B model introduces a novel 3-stage data curriculum acro…

    Submitted 7 April, 2025; v1 submitted 7 November, 2024; originally announced November 2024.

    Comments: Base model is available at https://huggingface.co/tensoropera/Fox-1-1.6B and the instruction-tuned version is available at https://huggingface.co/tensoropera/Fox-1-1.6B-Instruct-v0.1

  42. arXiv:2410.24117  [pdf, ps, other]

    cs.SE cs.LG

    AlphaTrans: A Neuro-Symbolic Compositional Approach for Repository-Level Code Translation and Validation

    Authors: Ali Reza Ibrahimzada, Kaiyao Ke, Mrigank Pawagi, Muhammad Salman Abid, Rangeet Pan, Saurabh Sinha, Reyhaneh Jabbarvand

    Abstract: Code translation transforms programs from one programming language (PL) to another. Several rule-based transpilers have been designed to automate code translation between different pairs of PLs. However, the rules can become obsolete as the PLs evolve and cannot generalize to other PLs. Recent studies have explored the automation of code translation using Large Language Models (LLMs). One key obse…

    Submitted 19 June, 2025; v1 submitted 31 October, 2024; originally announced October 2024.

    Comments: Published in FSE 2025

  43. arXiv:2410.22594  [pdf, other]

    cs.LG

    Gaussian Derivative Change-point Detection for Early Warnings of Industrial System Failures

    Authors: Hao Zhao, Rong Pan

    Abstract: An early warning of future system failure is essential for conducting predictive maintenance and enhancing system availability. This paper introduces a three-step framework for assessing system health to predict imminent system breakdowns. First, the Gaussian Derivative Change-Point Detection (GDCPD) algorithm is proposed for detecting changes in the high-dimensional feature space. GDCPD conducts…

    Submitted 24 November, 2024; v1 submitted 29 October, 2024; originally announced October 2024.

  44. arXiv:2410.18957  [pdf, other]

    cs.CL

    Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code

    Authors: Jipeng Zhang, Jianshu Zhang, Yuanzhe Li, Renjie Pi, Rui Pan, Runtao Liu, Ziqiang Zheng, Tong Zhang

    Abstract: Large Language Models (LLMs) demonstrate strong proficiency in generating code for high-resource programming languages (HRPLs) like Python but struggle significantly with low-resource programming languages (LRPLs) such as Racket or D. This performance gap deepens the digital divide, preventing developers using LRPLs from benefiting equally from LLM advancements and reinforcing disparities in innov…

    Submitted 24 October, 2024; originally announced October 2024.

    Comments: 15 pages, 3 figures

  45. arXiv:2410.18147  [pdf, other]

    cs.LG cs.AI stat.ML

    MEC-IP: Efficient Discovery of Markov Equivalent Classes via Integer Programming

    Authors: Abdelmonem Elrefaey, Rong Pan

    Abstract: This paper presents a novel Integer Programming (IP) approach for discovering the Markov Equivalent Class (MEC) of Bayesian Networks (BNs) through observational data. The MEC-IP algorithm utilizes a unique clique-focusing strategy and Extended Maximal Spanning Graphs (EMSG) to streamline the search for MEC, thus overcoming the computational limitations inherent in other existing algorithms. Our nu…

    Submitted 22 October, 2024; originally announced October 2024.

  46. arXiv:2410.17043  [pdf, other]

    cs.LG cs.NI

    Optimizing Mixture-of-Experts Inference Time Combining Model Deployment and Communication Scheduling

    Authors: Jialong Li, Shreyansh Tripathi, Lakshay Rastogi, Yiming Lei, Rui Pan, Yiting Xia

    Abstract: As machine learning models scale in size and complexity, their computational requirements become a significant barrier. Mixture-of-Experts (MoE) models alleviate this issue by selectively activating relevant experts. Despite this, MoE models are hindered by high communication overhead from all-to-all operations, low GPU utilization due to the synchronous communication constraint, and complications…

    Submitted 22 October, 2024; originally announced October 2024.

  47. arXiv:2410.14602  [pdf, other]

    cs.LG cs.AI

    How Does Data Diversity Shape the Weight Landscape of Neural Networks?

    Authors: Yang Ba, Michelle V. Mancenido, Rong Pan

    Abstract: To enhance the generalization of machine learning models to unseen data, techniques such as dropout, weight decay ($L_2$ regularization), and noise augmentation are commonly employed. While regularization methods (i.e., dropout and weight decay) are geared toward adjusting model parameters to prevent overfitting, data augmentation increases the diversity of the input training set, a method purport…

    Submitted 18 October, 2024; originally announced October 2024.

  48. arXiv:2410.13007  [pdf, other]

    cs.SE

    Codellm-Devkit: A Framework for Contextualizing Code LLMs with Program Analysis Insights

    Authors: Rahul Krishna, Rangeet Pan, Raju Pavuluri, Srikanth Tamilselvam, Maja Vukovic, Saurabh Sinha

    Abstract: Large Language Models for Code (or code LLMs) are increasingly gaining popularity and capabilities, offering a wide array of functionalities such as code completion, code generation, code summarization, test generation, code translation, and more. To leverage code LLMs to their full potential, developers must provide code-specific contextual information to the models. These are typically derived a…

    Submitted 16 October, 2024; originally announced October 2024.

  49. arXiv:2410.10864  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Fill In The Gaps: Model Calibration and Generalization with Synthetic Data

    Authors: Yang Ba, Michelle V. Mancenido, Rong Pan

    Abstract: As machine learning models continue to swiftly advance, calibrating their performance has become a major concern prior to practical and widespread implementation. Most existing calibration methods often negatively impact model accuracy due to the lack of diversity of validation data, resulting in reduced generalizability. To address this, we propose a calibration method that incorporates synthetic…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: Accepted to EMNLP 2024 Main Conference (Long paper)

  50. arXiv:2410.07824  [pdf, ps, other]

    cs.CV

    Exploring Foundation Models in Remote Sensing Image Change Detection: A Comprehensive Survey

    Authors: Zihan Yu, Tianxiao Li, Yuxin Zhu, Rongze Pan

    Abstract: Change detection, as an important and widely applied technique in the field of remote sensing, aims to analyze changes in surface areas over time and has broad applications in areas such as environmental monitoring, urban development, and land use analysis. In recent years, deep learning, especially the development of foundation models, has provided more powerful solutions for feature extraction an…

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: 14 pages