Showing 1–50 of 1,636 results for author: Chen, G

Searching in archive cs.
  1. arXiv:2511.15704  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    In-N-On: Scaling Egocentric Manipulation with in-the-wild and on-task Data

    Authors: Xiongyi Cai, Ri-Zhao Qiu, Geng Chen, Lai Wei, Isabella Liu, Tianshu Huang, Xuxin Cheng, Xiaolong Wang

    Abstract: Egocentric videos are a valuable and scalable data source to learn manipulation policies. However, due to significant data heterogeneity, most existing approaches utilize human data for simple pre-training, which does not unlock its full potential. This paper first provides a scalable recipe for collecting and using egocentric data by categorizing human data into two categories: in-the-wild and on…

    Submitted 19 November, 2025; originally announced November 2025.

    Comments: Project webpage: https://xiongyicai.github.io/In-N-On/

  2. arXiv:2511.15606  [pdf, ps, other]

    cs.GT math.OC

    A Scenario Approach to the Robustness of Nonconvex-Nonconcave Minimax Problems

    Authors: Huan Peng, Guanpu Chen, Karl Henrik Johansson

    Abstract: This paper investigates probabilistic robustness of nonconvex-nonconcave minimax problems via the scenario approach. Inspired by recent advances in scenario optimization (Garatti and Campi, 2025), we obtain robustness results for key equilibria with nonconvex-nonconcave payoffs, overcoming the dependence on the non-degeneracy assumption. Specifically, under convex strategy sets for all players, we…

    Submitted 19 November, 2025; originally announced November 2025.

  3. arXiv:2511.14342  [pdf, ps, other]

    cs.CL

    ConInstruct: Evaluating Large Language Models on Conflict Detection and Resolution in Instructions

    Authors: Xingwei He, Qianru Zhang, Pengfei Chen, Guanhua Chen, Linlin Yu, Yuan Yuan, Siu-Ming Yiu

    Abstract: Instruction-following is a critical capability of Large Language Models (LLMs). While existing works primarily focus on assessing how well LLMs adhere to user instructions, they often overlook scenarios where instructions contain conflicting constraints, a common occurrence in complex prompts. The behavior of LLMs under such conditions remains under-explored. To bridge this gap, we introduce ConIns…

    Submitted 19 November, 2025; v1 submitted 18 November, 2025; originally announced November 2025.

    Comments: Accepted to AAAI 2026

  4. arXiv:2511.13924  [pdf, ps, other]

    cs.CV

    Start Small, Think Big: Curriculum-based Relative Policy Optimization for Visual Grounding

    Authors: Qingyang Yan, Guangyao Chen, Yixiong Zou

    Abstract: Chain-of-Thought (CoT) prompting has recently shown significant promise across various NLP and computer vision tasks by explicitly generating intermediate reasoning steps. However, we find that reinforcement learning (RL)-based fine-tuned CoT reasoning can paradoxically degrade performance in Visual Grounding tasks, particularly as CoT outputs become lengthy or complex. Additionally, our analysis…

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: AAAI 2026 (Oral)

  5. arXiv:2511.13285  [pdf, ps, other]

    cs.CV

    SkyReels-Text: Fine-grained Font-Controllable Text Editing for Poster Design

    Authors: Yunjie Yu, Jingchen Wu, Junchen Zhu, Chunze Lin, Guibin Chen

    Abstract: Artistic design such as poster design often demands rapid yet precise modification of textual content while preserving visual harmony and typographic intent, especially across diverse font styles. Although modern image editing models have grown increasingly powerful, they still fall short in fine-grained, font-aware text manipulation, limiting their utility in professional design workflows such as…

    Submitted 17 November, 2025; originally announced November 2025.

  6. arXiv:2511.13121  [pdf, ps, other]

    cs.CV

    CloseUpShot: Close-up Novel View Synthesis from Sparse-views via Point-conditioned Diffusion Model

    Authors: Yuqi Zhang, Guanying Chen, Jiaxing Chen, Chuanyu Fu, Chuan Huang, Shuguang Cui

    Abstract: Reconstructing 3D scenes and synthesizing novel views from sparse input views is a highly challenging task. Recent advances in video diffusion models have demonstrated strong temporal reasoning capabilities, making them a promising tool for enhancing reconstruction quality under sparse-view settings. However, existing approaches are primarily designed for modest viewpoint variations, which struggl…

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: Project Link: https://zyqz97.github.io/CloseUpShot/

  7. arXiv:2511.12912  [pdf, ps, other]

    cs.RO

    DiffuDepGrasp: Diffusion-based Depth Noise Modeling Empowers Sim2Real Robotic Grasping

    Authors: Yingting Zhou, Wenbo Cui, Weiheng Liu, Guixing Chen, Haoran Li, Dongbin Zhao

    Abstract: Transferring the depth-based end-to-end policy trained in simulation to physical robots can yield an efficient and robust grasping policy, yet sensor artifacts in real depth maps like voids and noise establish a significant sim2real gap that critically impedes policy transfer. Training-time strategies like procedural noise injection or learned mappings suffer from data inefficiency due to unrealis…

    Submitted 16 November, 2025; originally announced November 2025.

  8. arXiv:2511.12181  [pdf, ps, other]

    cs.CV cs.LG

    MixAR: Mixture Autoregressive Image Generation

    Authors: Jinyuan Hu, Jiayou Zhang, Shaobo Cui, Kun Zhang, Guangyi Chen

    Abstract: Autoregressive (AR) approaches, which represent images as sequences of discrete tokens from a finite codebook, have achieved remarkable success in image generation. However, the quantization process and the limited codebook size inevitably discard fine-grained information, placing bottlenecks on fidelity. Motivated by this limitation, recent studies have explored autoregressive modeling in continu…

    Submitted 15 November, 2025; originally announced November 2025.

  9. arXiv:2511.11793  [pdf, ps, other]

    cs.CL

    MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

    Authors: MiroMind Team, Song Bai, Lidong Bing, Carson Chen, Guanzheng Chen, Yuntao Chen, Zhe Chen, Ziyi Chen, Jifeng Dai, Xuan Dong, Wenhan Dou, Yue Deng, Yunjie Fu, Junqi Ge, Chenxia Han, Tammy Huang, Zhenhang Huang, Jerry Jiao, Shilei Jiang, Tianyu Jiao, Xiaoqi Jian, Lei Lei, Ruilin Li, Ryan Luo, Tiantong Li, et al. (30 additional authors not shown)

    Abstract: We present MiroThinker v1.0, an open-source research agent designed to advance tool-augmented reasoning and information-seeking capabilities. Unlike previous agents that only scale up model size or context length, MiroThinker explores interaction scaling at the model level, systematically training the model to handle deeper and more frequent agent-environment interactions as a third dimension of p…

    Submitted 18 November, 2025; v1 submitted 14 November, 2025; originally announced November 2025.

    Comments: Technical Report

  10. arXiv:2511.11733  [pdf, ps, other]

    cs.DC cs.AI

    Speculative Decoding in Decentralized LLM Inference: Turning Communication Latency into Computation Throughput

    Authors: Jingwei Song, Wanyi Chen, Xinyuan Song, Max, Chris Tong, Gufeng Chen, Tianyi Zhao, Eric Yang, Bill Shi, Lynn Ai

    Abstract: Speculative decoding accelerates large language model (LLM) inference by using a lightweight draft model to propose tokens that are later verified by a stronger target model. While effective in centralized systems, its behavior in decentralized settings, where network latency often dominates compute, remains under-characterized. We present Decentralized Speculative Decoding (DSD), a plug-and-play…

    Submitted 13 November, 2025; originally announced November 2025.

    Comments: 6 pages, 2 figures, 2 tables. Uses ICML 2025 style

  11. arXiv:2511.11025  [pdf, ps, other]

    cs.CV cs.AI

    AirCopBench: A Benchmark for Multi-drone Collaborative Embodied Perception and Reasoning

    Authors: Jirong Zha, Yuxuan Fan, Tianyu Zhang, Geng Chen, Yingfeng Chen, Chen Gao, Xinlei Chen

    Abstract: Multimodal Large Language Models (MLLMs) have shown promise in single-agent vision tasks, yet benchmarks for evaluating multi-agent collaborative perception remain scarce. This gap is critical, as multi-drone systems provide enhanced coverage, robustness, and collaboration compared to single-sensor setups. Existing multi-image benchmarks mainly target basic perception tasks using high-quality sing…

    Submitted 14 November, 2025; originally announced November 2025.

  12. arXiv:2511.10913  [pdf, ps, other]

    cs.SD cs.AI cs.CR cs.MM eess.AS

    Synthetic Voices, Real Threats: Evaluating Large Text-to-Speech Models in Generating Harmful Audio

    Authors: Guangke Chen, Yuhui Wang, Shouling Ji, Xiapu Luo, Ting Wang

    Abstract: Modern text-to-speech (TTS) systems, particularly those built on Large Audio-Language Models (LALMs), generate high-fidelity speech that faithfully reproduces input text and mimics specified speaker identities. While prior misuse studies have focused on speaker impersonation, this work explores a distinct content-centric threat: exploiting TTS systems to produce speech containing harmful content.…

    Submitted 13 November, 2025; originally announced November 2025.

  13. arXiv:2511.09900  [pdf, ps, other]

    cs.AI cs.CE

    Boosting In-Silicon Directed Evolution with Fine-Tuned Protein Language Model and Tree Search

    Authors: Yaodong Yang, Yang Wang, Jinpeng Li, Pei Guo, Da Han, Guangyong Chen, Pheng-Ann Heng

    Abstract: Protein evolution through amino acid sequence mutations is a cornerstone of life sciences. While current in-silicon directed evolution algorithms largely focus on designing heuristic search strategies, they overlook how to integrate the transformative protein language models, which encode rich evolutionary patterns, with reinforcement learning to learn to directly evolve proteins. To bridge this g…

    Submitted 19 November, 2025; v1 submitted 12 November, 2025; originally announced November 2025.

    Comments: working in progress, 26 pages, 6 figures, 16 tables, updated with more baselines and related works

  14. arXiv:2511.09593  [pdf, ps, other]

    cs.LG

    DynamicRTL: RTL Representation Learning for Dynamic Circuit Behavior

    Authors: Ruiyang Ma, Yunhao Zhou, Yipeng Wang, Yi Liu, Zhengyuan Shi, Ziyang Zheng, Kexin Chen, Zhiqiang He, Lingwei Yan, Gang Chen, Qiang Xu, Guojie Luo

    Abstract: There is a growing body of work on using Graph Neural Networks (GNNs) to learn representations of circuits, focusing primarily on their static characteristics. However, these models fail to capture circuit runtime behavior, which is crucial for tasks like circuit verification and optimization. To address this limitation, we introduce DR-GNN (DynamicRTL-GNN), a novel approach that learns RTL circui…

    Submitted 12 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI'2026

  15. arXiv:2511.08937  [pdf, ps, other]

    cs.CV cs.LG

    Boosting Adversarial Transferability via Ensemble Non-Attention

    Authors: Yipeng Zou, Qin Liu, Jie Wu, Yu Peng, Guo Chen, Hui Zhou, Guanghui Ye

    Abstract: Ensemble attacks integrate the outputs of surrogate models with diverse architectures, which can be combined with various gradient-based attacks to improve adversarial transferability. However, previous work shows unsatisfactory attack performance when transferring across heterogeneous model architectures. The main reason is that the gradient update directions of heterogeneous surrogate models dif…

    Submitted 13 November, 2025; v1 submitted 11 November, 2025; originally announced November 2025.

    Comments: 16 pages, 11 figures, accepted by AAAI 2026

  16. arXiv:2511.07327  [pdf, ps, other]

    cs.AI cs.CL

    IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction

    Authors: Guoxin Chen, Zile Qiao, Xuanzhong Chen, Donglei Yu, Haotian Xu, Wayne Xin Zhao, Ruihua Song, Wenbiao Yin, Huifeng Yin, Liwen Zhang, Kuan Li, Minpeng Liao, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou

    Abstract: Recent advances in deep-research agents have shown promise for autonomous knowledge construction through dynamic reasoning over external sources. However, existing approaches rely on a mono-contextual paradigm that accumulates all information in a single, expanding context window, leading to context suffocation and noise contamination that limit their effectiveness on long-horizon tasks. We introd…

    Submitted 10 November, 2025; originally announced November 2025.

    Comments: https://github.com/Alibaba-NLP/DeepResearch

  17. arXiv:2511.06307  [pdf, ps, other]

    cs.LG

    DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation

    Authors: Speed Zhu, Jianwei Cai, Guang Chen, Lulu Wu, Saiyong Yang, Wiggin Zhou

    Abstract: Recent reasoning-first models (e.g., OpenAI o1, DeepSeek R1) have spurred a resurgence of interest in RLVR. Nevertheless, advances are dominated by mathematics (e.g., AIME), with competitive-programming code generation underexplored and data curation receiving less attention than RL algorithm design. We investigate how to construct RLVR datasets (i.e., RL prompts) and present practical training te…

    Submitted 9 November, 2025; originally announced November 2025.

    Comments: 15 pages, 8 figures

  18. arXiv:2511.05625  [pdf]

    cs.CY cs.AI

    Report from Workshop on Dialogue alongside Artificial Intelligence

    Authors: Thomas J McKenna, Ingvill Rasmussen, Sten Ludvigsen, Avivit Arvatz, Christa Asterhan, Gaowei Chen, Julie Cohen, Michele Flammia, Dongkeun Han, Emma Hayward, Heather Hill, Yifat Kolikant, Helen Lehndorf, Kexin Li, Lindsay Clare Matsumura, Henrik Tjønn, Pengjin Wang, Rupert Wegerif

    Abstract: Educational dialogue -- the collaborative exchange of ideas through talk -- is widely recognized as a catalyst for deeper learning and critical thinking in and across contexts. At the same time, artificial intelligence (AI) has rapidly emerged as a powerful force in education, with the potential to address major challenges, personalize learning, and innovate teaching practices. However, these adva…

    Submitted 10 November, 2025; v1 submitted 6 November, 2025; originally announced November 2025.

    Comments: Report from the Workshop on Dialogue alongside Artificial Intelligence (2025)

  19. arXiv:2511.03929  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    NVIDIA Nemotron Nano V2 VL

    Authors: NVIDIA, Amala Sanjay Deshmukh, Kateryna Chumachenko, Tuomas Rintamaki, Matthieu Le, Tyler Poon, Danial Mohseni Taheri, Ilia Karmanov, Guilin Liu, Jarno Seppanen, Guo Chen, Karan Sapra, Zhiding Yu, Adi Renduchintala, Charles Wang, Peter Jin, Arushi Goel, Mike Ranzinger, Lukas Voegtle, Philipp Fischer, Timo Roman, Wei Ping, Boxin Wang, Zhuolin Yang, et al. (99 additional authors not shown)

    Abstract: We introduce Nemotron Nano V2 VL, the latest model of the Nemotron vision-language series designed for strong real-world document understanding, long video comprehension, and reasoning tasks. Nemotron Nano V2 VL delivers significant improvements over our previous model, Llama-3.1-Nemotron-Nano-VL-8B, across all vision and text domains through major enhancements in model architecture, datasets, and…

    Submitted 6 November, 2025; v1 submitted 5 November, 2025; originally announced November 2025.

  20. arXiv:2511.02845  [pdf, ps, other]

    eess.SP cs.AI physics.ins-det

    AI-Enhanced Wi-Fi Sensing Through Single Transceiver Pair

    Authors: Yuxuan Liu, Chiya Zhang, Yifeng Yuan, Chunlong He, Weizheng Zhang, Gaojie Chen

    Abstract: The advancement of next-generation Wi-Fi technology heavily relies on sensing capabilities, which play a pivotal role in enabling sophisticated applications. In response to the growing demand for large-scale deployments, contemporary Wi-Fi sensing systems strive to achieve high-precision perception while maintaining minimal bandwidth consumption and antenna count requirements. Remarkably, various…

    Submitted 21 October, 2025; originally announced November 2025.

    Comments: 12 pages, 11 figures

  21. arXiv:2511.01881  [pdf, ps, other]

    cs.DC

    HGraphScale: Hierarchical Graph Learning for Autoscaling Microservice Applications in Container-based Cloud Computing

    Authors: Zhengxin Fang, Hui Ma, Gang Chen, Rajkumar Buyya

    Abstract: Microservice architecture has become a dominant paradigm in application development due to its advantages of being lightweight, flexible, and resilient. Deploying microservice applications in the container-based cloud enables fine-grained elastic resource allocation. Autoscaling is an effective approach to dynamically adjust the resource provisioned to containers. However, the intricate microservi…

    Submitted 23 October, 2025; originally announced November 2025.

  22. arXiv:2511.01670  [pdf, ps, other]

    cs.CL cs.AI

    SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia

    Authors: Chaoqun Liu, Mahani Aljunied, Guizhen Chen, Hou Pong Chan, Weiwen Xu, Yu Rong, Wenxuan Zhang

    Abstract: We introduce SeaLLMs-Audio, the first large audio-language model (LALM) tailored for multiple Southeast Asian (SEA) languages, namely Indonesian (id), Thai (th), and Vietnamese (vi), alongside English (en) and Chinese (zh). Trained on a large-scale audio corpus, SeaLLMs-Audio exhibits strong performance across diverse audio-centric tasks, spanning fine-grained audio understanding and voice-based interactio…

    Submitted 3 November, 2025; originally announced November 2025.

    Comments: 10 pages

  23. arXiv:2511.01633  [pdf, ps, other]

    cs.LG cs.AI

    Scaling Graph Chain-of-Thought Reasoning: A Multi-Agent Framework with Efficient LLM Serving

    Authors: Chengying Huan, Ziheng Meng, Yongchao Liu, Zhengyi Yang, Yun Zhu, Yue Yun, Shipeng Li, Rong Gu, Xiabao Wu, Haitao Zhang, Chuntao Hong, Shaonan Ma, Guihai Chen, Chen Tian

    Abstract: Graph Chain-of-Thought (Graph-CoT) enables large language models (LLMs) to perform step-by-step reasoning over graph-structured knowledge, but existing pipelines suffer from low accuracy, excessive token usage, high latency, and low throughput due to single-agent monolithic prompts, repeated context re-encoding, and inefficient serving execution. We present GLM, the first multi-agent Graph-CoT sys…

    Submitted 3 November, 2025; originally announced November 2025.

  24. arXiv:2511.01234  [pdf, ps, other]

    cs.LG stat.ML

    A Saddle Point Remedy: Power of Variable Elimination in Non-convex Optimization

    Authors: Min Gan, Guang-Yong Chen, Yang Yi, Lin Yang

    Abstract: The proliferation of saddle points, rather than poor local minima, is increasingly understood to be a primary obstacle in large-scale non-convex optimization for machine learning. Variable elimination algorithms, like Variable Projection (VarPro), have long been observed to exhibit superior convergence and robustness in practice, yet a principled understanding of why they so effectively navigate t…

    Submitted 3 November, 2025; originally announced November 2025.

  25. arXiv:2511.00509  [pdf, ps, other]

    cs.AI cs.CR

    Reimagining Safety Alignment with An Image

    Authors: Yifan Xia, Guorui Chen, Wenqian Yu, Zhijiang Li, Philip Torr, Jindong Gu

    Abstract: Large language models (LLMs) excel in diverse applications but face dual challenges: generating harmful content under jailbreak attacks and over-refusal of benign queries due to rigid safety mechanisms. These issues are further complicated by the need to accommodate different value systems and precisely align with given safety preferences. Moreover, traditional methods like SFT and RLHF lack this…

    Submitted 1 November, 2025; originally announced November 2025.

  26. arXiv:2511.00432  [pdf, ps, other]

    cs.CL

    G2: Guided Generation for Enhanced Output Diversity in LLMs

    Authors: Zhiwen Ruan, Yixia Li, Yefeng Liu, Yun Chen, Weihua Luo, Peng Li, Yang Liu, Guanhua Chen

    Abstract: Large Language Models (LLMs) have demonstrated exceptional performance across diverse natural language processing tasks. However, these models exhibit a critical limitation in output diversity, often generating highly similar content across multiple attempts. This limitation significantly affects tasks requiring diverse outputs, from creative writing to reasoning. Existing solutions, like temperat…

    Submitted 1 November, 2025; originally announced November 2025.

    Comments: EMNLP 2025

  27. arXiv:2510.27339  [pdf, ps, other]

    cs.SI

    Meritocracy versus Matthew-effect: Two underlying network formation mechanisms of online social platforms

    Authors: Yuchen Xu, Wenjun Mei, Ge Chen, Linyuan Lü

    Abstract: With the rapid development of the internet industry, online social networks have come to play an increasingly significant role in everyday life. In recent years, content-based emerging platforms such as TikTok, Instagram, and Bilibili have diverged fundamentally in their underlying logic from traditional connection-based social platforms like Facebook and LinkedIn. Empirical data on follower count…

    Submitted 31 October, 2025; originally announced October 2025.

    Comments: 11 pages, 4 figures

  28. arXiv:2510.26692  [pdf, ps, other]

    cs.CL cs.LG

    Kimi Linear: An Expressive, Efficient Attention Architecture

    Authors: Kimi Team, Yu Zhang, Zongyu Lin, Xingcheng Yao, Jiaxi Hu, Fanqing Meng, Chengyin Liu, Xin Men, Songlin Yang, Zhiyuan Li, Wentao Li, Enzhe Lu, Weizhou Liu, Yanru Chen, Weixin Xu, Longhui Yu, Yejie Wang, Yu Fan, Longguang Zhong, Enming Yuan, Dehao Zhang, Yizhi Zhang, T. Y. Liu, Haiming Wang, Shengjun Fang, et al. (35 additional authors not shown)

    Abstract: We introduce Kimi Linear, a hybrid linear attention architecture that, for the first time, outperforms full attention under fair comparisons across various scenarios -- including short-context, long-context, and reinforcement learning (RL) scaling regimes. At its core lies Kimi Delta Attention (KDA), an expressive linear attention module that extends Gated DeltaNet with a finer-grained gating mech…

    Submitted 1 November, 2025; v1 submitted 30 October, 2025; originally announced October 2025.

    Comments: Kimi Linear tech report

  29. arXiv:2510.25086  [pdf, ps, other]

    cs.RO

    Mean-Shift Theory and Its Applications in Swarm Robotics: A New Way to Enhance the Efficiency of Multi-Robot Collaboration

    Authors: Guibin Sun, Jinhu Lü, Kexin Liu, Zhenqian Wang, Guanrong Chen

    Abstract: Swarms evolving from collective behaviors among multiple individuals are commonly seen in nature, which enables biological systems to exhibit more efficient and robust collaboration. Creating similar swarm intelligence in engineered robots poses challenges to the design of collaborative algorithms that can be programmed at large scales. The assignment-based method has played an eminent role for a…

    Submitted 7 November, 2025; v1 submitted 28 October, 2025; originally announced October 2025.

  30. arXiv:2510.24701  [pdf, ps, other]

    cs.CL cs.AI cs.IR cs.LG cs.MA

    Tongyi DeepResearch Technical Report

    Authors: Tongyi DeepResearch Team, Baixuan Li, Bo Zhang, Dingchu Zhang, Fei Huang, Guangyu Li, Guoxin Chen, Huifeng Yin, Jialong Wu, Jingren Zhou, Kuan Li, Liangcai Su, Litu Ou, Liwen Zhang, Pengjun Xie, Rui Ye, Wenbiao Yin, Xinmiao Yu, Xinyu Wang, Xixi Wu, Xuanzhong Chen, Yida Zhao, Zhen Zhang, Zhengwei Tao, Zhongwang Zhang, et al. (32 additional authors not shown)

    Abstract: We present Tongyi DeepResearch, an agentic large language model, which is specifically designed for long-horizon, deep information-seeking research tasks. To incentivize autonomous deep research agency, Tongyi DeepResearch is developed through an end-to-end training framework that combines agentic mid-training and agentic post-training, enabling scalable reasoning and information seeking across co…

    Submitted 4 November, 2025; v1 submitted 28 October, 2025; originally announced October 2025.

    Comments: https://tongyi-agent.github.io/blog

  31. arXiv:2510.24695  [pdf, ps, other]

    cs.CL

    AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis

    Authors: Xuanzhong Chen, Zile Qiao, Guoxin Chen, Liangcai Su, Zhen Zhang, Xinyu Wang, Pengjun Xie, Fei Huang, Jingren Zhou, Yong Jiang

    Abstract: Training large language model agents on tasks at the frontier of their capabilities is key to unlocking advanced reasoning. We introduce a data synthesis approach inspired by the educational theory of the Zone of Proximal Development (ZPD), which defines this frontier as tasks an LLM cannot solve alone but can master with guidance. To operationalize this, we present the AgentFrontier Engine, an au…

    Submitted 28 October, 2025; originally announced October 2025.

    Comments: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research/

  32. arXiv:2510.24592  [pdf, ps, other]

    cs.CL

    ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization

    Authors: Guoxin Chen, Jing Wu, Xinjie Chen, Wayne Xin Zhao, Ruihua Song, Chengxi Li, Kai Fan, Dayiheng Liu, Minpeng Liao

    Abstract: Autoformalization, which translates natural language mathematics into machine-verifiable formal statements, is critical for using formal mathematical reasoning to solve math problems stated in natural language. While Large Language Models can generate syntactically correct formal statements, they often fail to preserve the original problem's semantic intent. This limitation arises from the LLM app…

    Submitted 30 October, 2025; v1 submitted 28 October, 2025; originally announced October 2025.

    Comments: https://github.com/Chen-GX/ReForm

  33. arXiv:2510.24551  [pdf, ps, other]

    cs.AI

    Generative AI for Healthcare: Fundamentals, Challenges, and Perspectives

    Authors: Gang Chen, Changshuo Liu, Gene Anne Ooi, Marcus Tan, Zhongle Xie, Jianwei Yin, James Wei Luen Yip, Wenqiao Zhang, Jiaqi Zhu, Beng Chin Ooi

    Abstract: Generative Artificial Intelligence (GenAI) is taking the world by storm. It promises transformative opportunities for advancing and disrupting existing practices, including healthcare. From large language models (LLMs) for clinical note synthesis and conversational assistance to multimodal systems that integrate medical imaging, electronic health records, and genomic data for decision support, Gen…

    Submitted 28 October, 2025; originally announced October 2025.

  34. arXiv:2510.23569  [pdf, ps, other]

    cs.CV

    EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT

    Authors: Baoqi Pei, Yifei Huang, Jilan Xu, Yuping He, Guo Chen, Fei Wu, Yu Qiao, Jiangmiao Pang

    Abstract: Egocentric video reasoning centers on an unobservable agent behind the camera who dynamically shapes the environment, requiring inference of hidden intentions and recognition of fine-grained interactions. This core challenge limits current multimodal large language models (MLLMs), which excel at visible event reasoning but lack embodied, first-person understanding. To bridge this gap, we introduce E…

    Submitted 27 October, 2025; originally announced October 2025.

    Comments: Accepted at NeurIPS 2025

  35. arXiv:2510.23407  [pdf, ps, other]

    cs.NE

    Multi-Task Surrogate-Assisted Search with Bayesian Competitive Knowledge Transfer for Expensive Optimization

    Authors: Yi Lu, Xiaoming Xue, Kai Zhang, Liming Zhang, Guodong Chen, Chenming Cao, Piyang Liu, Kay Chen Tan

    Abstract: Expensive optimization problems (EOPs) present significant challenges for traditional evolutionary optimization due to their limited evaluation calls. Although surrogate-assisted search (SAS) has become a popular paradigm for addressing EOPs, it still suffers from the cold-start issue. In response to this challenge, knowledge transfer has been gaining popularity for its ability to leverage search…

    Submitted 27 October, 2025; originally announced October 2025.

  36. arXiv:2510.21712  [pdf, ps, other]

    cs.IR cs.AI cs.CL

    DecoupleSearch: Decouple Planning and Search via Hierarchical Reward Modeling

    Authors: Hao Sun, Zile Qiao, Bo Wang, Guoxin Chen, Yingyan Hou, Yong Jiang, Pengjun Xie, Fei Huang, Yan Zhang

    Abstract: Retrieval-Augmented Generation (RAG) systems have emerged as a pivotal methodology for enhancing Large Language Models (LLMs) through the dynamic integration of external knowledge. To further improve RAG's flexibility, Agentic RAG introduces autonomous agents into the workflow. However, Agentic RAG faces several challenges: (1) the success of each step depends on both high-quality planning and acc…

    Submitted 7 September, 2025; originally announced October 2025.

    Comments: EMNLP 2025 Main Conference

  37. arXiv:2510.21668  [pdf, ps, other]

    cs.GT cs.IT

    Privacy Guarantee for Nash Equilibrium Computation of Aggregative Games Based on Pointwise Maximal Leakage

    Authors: Zhaoyang Cheng, Guanpu Chen, Tobias J. Oechtering, Mikael Skoglund

    Abstract: Privacy preservation has served as a key metric in designing Nash equilibrium (NE) computation algorithms. Although differential privacy (DP) has been widely employed for privacy guarantees, it does not exploit prior distributional knowledge of datasets and is ineffective in assessing information leakage for correlated datasets. To address these concerns, we establish a pointwise maximal leakage (…

    Submitted 24 October, 2025; originally announced October 2025.

  38. arXiv:2510.21228  [pdf, ps, other]

    cs.CL cs.HC

    DispatchMAS: Fusing taxonomy and artificial intelligence agents for emergency medical services

    Authors: Xiang Li, Huizi Yu, Wenkong Wang, Yiran Wu, Jiayan Zhou, Wenyue Hua, Xinxin Lin, Wenjia Tan, Lexuan Zhu, Bingyi Chen, Guang Chen, Ming-Li Chen, Yang Zhou, Zhao Li, Themistocles L. Assimes, Yongfeng Zhang, Qingyun Wu, Xin Ma, Lingyao Li, Lizhou Fan

    Abstract: Objective: Emergency medical dispatch (EMD) is a high-stakes process challenged by caller distress, ambiguity, and cognitive load. Large Language Models (LLMs) and Multi-Agent Systems (MAS) offer opportunities to augment dispatchers. This study aimed to develop and evaluate a taxonomy-grounded, LLM-powered multi-agent system for simulating realistic EMD scenarios. Methods: We constructed a clinica…

    Submitted 24 October, 2025; originally announced October 2025.

    Comments: 27 pages, 7 figures, 3 tables

    MSC Class: 68T07; 92C50

    ACM Class: I.2.7; J.3

  39. arXiv:2510.20677  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    R2-SVC: Towards Real-World Robust and Expressive Zero-shot Singing Voice Conversion

    Authors: Junjie Zheng, Gongyu Chen, Chaofan Ding, Zihao Chen

    Abstract: In real-world singing voice conversion (SVC) applications, environmental noise and the demand for expressive output pose significant challenges. Conventional methods, however, are typically designed without accounting for real deployment scenarios, as both training and inference usually rely on clean data. This mismatch hinders practical use, given the inevitable presence of diverse noise sources…

    Submitted 23 October, 2025; originally announced October 2025.

    Comments: 5 pages, 2 figures

  40. arXiv:2510.19517  [pdf, ps, other]

    cs.LG

    Bi-Level Decision-Focused Causal Learning for Large-Scale Marketing Optimization: Bridging Observational and Experimental Data

    Authors: Shuli Zhang, Hao Zhou, Jiaqi Zheng, Guibin Jiang, Bing Cheng, Wei Lin, Guihai Chen

    Abstract: Online Internet platforms require sophisticated marketing strategies to optimize user retention and platform revenue -- a classical resource allocation problem. Traditional solutions adopt a two-stage pipeline: machine learning (ML) for predicting individual treatment effects to marketing actions, followed by operations research (OR) optimization for decision-making. This paradigm presents two fun…

    Submitted 22 October, 2025; originally announced October 2025.

    Comments: Accepted by NeurIPS 2025

  41. arXiv:2510.19314  [pdf, ps, other]

    cs.AI

    Continual Knowledge Adaptation for Reinforcement Learning

    Authors: Jinwu Hu, Zihao Lian, Zhiquan Wen, Chenghao Li, Guohao Chen, Xutao Wen, Bin Xiao, Mingkui Tan

    Abstract: Reinforcement Learning enables agents to learn optimal behaviors through interactions with environments. However, real-world environments are typically non-stationary, requiring agents to continuously adapt to new tasks and changing conditions. Although Continual Reinforcement Learning facilitates learning across multiple tasks, existing methods often suffer from catastrophic forgetting and ineffi…

    Submitted 22 October, 2025; originally announced October 2025.

    Comments: NeurIPS 2025

  42. arXiv:2510.19195  [pdf, ps, other]

    cs.CV cs.AI

    Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks

    Authors: Kai Zeng, Zhanqian Wu, Kaixin Xiong, Xiaobao Wei, Xiangyu Guo, Zhenxin Zhu, Kalok Ho, Lijun Zhou, Bohan Zeng, Ming Lu, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Wentao Zhang

    Abstract: Recent advancements in driving world models enable controllable generation of high-quality RGB videos or multimodal videos. Existing methods primarily focus on metrics related to generation quality and controllability. However, they often overlook the evaluation of downstream perception tasks, which are $\mathbf{really\ crucial}$ for the performance of autonomous driving. Existing methods usually…

    Submitted 24 October, 2025; v1 submitted 21 October, 2025; originally announced October 2025.

  43. arXiv:2510.18341  [pdf, ps, other]

    cs.CV

    ViSE: A Systematic Approach to Vision-Only Street-View Extrapolation

    Authors: Kaiyuan Tan, Yingying Shen, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye

    Abstract: Realistic view extrapolation is critical for closed-loop simulation in autonomous driving, yet it remains a significant challenge for current Novel View Synthesis (NVS) methods, which often produce distorted and inconsistent images beyond the original trajectory. This report presents our winning solution, which took first place in the RealADSim Workshop NVS track at ICCV 2025. To address the core…

    Submitted 21 October, 2025; originally announced October 2025.

  44. arXiv:2510.18310  [pdf, ps, other]

    cs.LG stat.ME

    Towards Identifiability of Hierarchical Temporal Causal Representation Learning

    Authors: Zijian Li, Minghao Fu, Junxian Huang, Yifan Shen, Ruichu Cai, Yuewen Sun, Guangyi Chen, Kun Zhang

    Abstract: Modeling hierarchical latent dynamics behind time series data is critical for capturing temporal dependencies across multiple levels of abstraction in real-world tasks. However, existing temporal causal representation learning methods fail to capture such dynamics, as they fail to recover the joint distribution of hierarchical latent variables from \textit{single-timestep observed variables}. Inte…

    Submitted 21 October, 2025; originally announced October 2025.

  45. arXiv:2510.18281  [pdf, ps, other]

    cs.LG

    Online Time Series Forecasting with Theoretical Guarantees

    Authors: Zijian Li, Changze Zhou, Minghao Fu, Sanjay Manjunath, Fan Feng, Guangyi Chen, Yingyao Hu, Ruichu Cai, Kun Zhang

    Abstract: This paper is concerned with online time series forecasting, where unknown distribution shifts occur over time, i.e., latent variables influence the mapping from historical to future observations. To develop an automated way of online time series forecasting, we propose a Theoretical framework for Online Time-series forecasting (TOT in short) with theoretical guarantees. Specifically, we prove tha…

    Submitted 21 October, 2025; originally announced October 2025.

  46. arXiv:2510.17568  [pdf, ps, other]

    cs.CV

    PAGE-4D: Disentangled Pose and Geometry Estimation for 4D Perception

    Authors: Kaichen Zhou, Yuhan Wang, Grace Chen, Xinhai Chang, Gaspard Beaudouin, Fangneng Zhan, Paul Pu Liang, Mengyu Wang

    Abstract: Recent 3D feed-forward models, such as the Visual Geometry Grounded Transformer (VGGT), have shown strong capability in inferring 3D attributes of static scenes. However, since they are typically trained on static datasets, these models often struggle in real-world scenarios involving complex dynamic elements, such as moving humans or deformable objects like umbrellas. To address this limitation,…

    Submitted 21 October, 2025; v1 submitted 20 October, 2025; originally announced October 2025.

  47. arXiv:2510.15455  [pdf, ps, other]

    cs.CL

    CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMs

    Authors: Gucongcong Fan, Chaoyue Niu, Chengfei Lyu, Fan Wu, Guihai Chen

    Abstract: Mobile agents rely on Large Language Models (LLMs) to plan and execute tasks on smartphone user interfaces (UIs). While cloud-based LLMs achieve high task accuracy, they require uploading the full UI state at every step, exposing unnecessary and often irrelevant information. In contrast, local LLMs avoid UI uploads but suffer from limited capacity, resulting in lower task success rates. We propose…

    Submitted 17 October, 2025; originally announced October 2025.

  48. arXiv:2510.15295  [pdf, ps, other]

    cs.IT

    Rotatable Antenna Meets UAV: Towards Dual-Level Channel Reconfiguration Paradigm for ISAC

    Authors: Shiying Chen, Guangji Chen, Long Shi, Qingqing Wu, Kang Wei

    Abstract: Integrated sensing and communication (ISAC) is viewed as a key enabler for future wireless networks by sharing the hardware and wireless resources between the functionalities of sensing and communication (S&C). Due to the shared wireless resources for both S&C, it is challenging to achieve a critical trade-off between these two integrated functionalities. To address this issue, this paper proposes…

    Submitted 17 October, 2025; originally announced October 2025.

    Comments: 5 pages

  49. arXiv:2510.15227  [pdf, ps, other]

    eess.AS cs.SD

    LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models

    Authors: Xiaohan Zhao, Hongyu Xiang, Shengze Ye, Song Li, Zhengkun Tian, Guanyu Chen, Ke Ding, Guanglu Wan

    Abstract: This paper presents LongCat-Audio-Codec, an audio tokenizer and detokenizer solution designed for industrial grade end-to-end speech large language models. By leveraging a decoupled model architecture and a multistage training strategy, LongCat-Audio-Codec exhibits robust semantic modeling capabilities, flexible acoustic feature extraction capabilities, and low-latency streaming synthesis capabili…

    Submitted 16 October, 2025; originally announced October 2025.

  50. arXiv:2510.15217  [pdf, ps, other]

    cs.LG

    Reflections from Research Roundtables at the Conference on Health, Inference, and Learning (CHIL) 2025

    Authors: Emily Alsentzer, Marie-Laure Charpignon, Bill Chen, Niharika D'Souza, Jason Fries, Yixing Jiang, Aparajita Kashyap, Chanwoo Kim, Simon Lee, Aishwarya Mandyam, Ashery Mbilinyi, Nikita Mehandru, Nitish Nagesh, Brighton Nuwagira, Emma Pierson, Arvind Pillai, Akane Sano, Tanveer Syeda-Mahmood, Shashank Yadav, Elias Adhanom, Muhammad Umar Afza, Amelia Archer, Suhana Bedi, Vasiliki Bikia, Trenton Chang, et al. (68 additional authors not shown)

    Abstract: The 6th Annual Conference on Health, Inference, and Learning (CHIL 2025), hosted by the Association for Health Learning and Inference (AHLI), was held in person on June 25-27, 2025, at the University of California, Berkeley, in Berkeley, California, USA. As part of this year's program, we hosted Research Roundtables to catalyze collaborative, small-group dialogue around critical, timely topics at…

    Submitted 3 November, 2025; v1 submitted 16 October, 2025; originally announced October 2025.