-
Online Submission and Evaluation System Design for Competition Operations
Authors:
Zhe Chen,
Daniel Harabor,
Ryan Hechnenberger,
Nathan R. Sturtevant
Abstract:
Research communities have developed benchmark datasets across domains to compare the performance of algorithms and techniques. However, tracking the progress in these research areas is not easy, as publications appear in different venues at the same time, and many of them claim to represent the state-of-the-art. To address this, research communities often organise periodic competitions to evaluate the performance of various algorithms and techniques, thereby tracking advancements in the field. However, these competitions pose a significant operational burden. The organisers must manage and evaluate a large volume of submissions. Furthermore, participants typically develop their solutions in diverse environments, leading to compatibility issues during the evaluation of their submissions. This paper presents an online competition system that automates the submission and evaluation process for a competition. The competition system allows organisers to manage large numbers of submissions efficiently, utilising isolated environments to evaluate submissions. This system has already been used successfully for several competitions, including the Grid-Based Pathfinding Competition and the League of Robot Runners competition.
Submitted 23 July, 2025;
originally announced July 2025.
-
Megrez2 Technical Report
Authors:
Boxun Li,
Yadong Li,
Zhiyuan Li,
Congyi Liu,
Weilin Liu,
Guowei Niu,
Zheyue Tan,
Haiyang Xu,
Zhuyu Yao,
Tao Yuan,
Dong Zhou,
Yueqing Zhuang,
Bo Zhao,
Guohao Dai,
Yu Wang
Abstract:
We present Megrez2, a novel lightweight and high-performance language model architecture optimized for device native deployment. Megrez2 introduces a novel cross-layer expert sharing mechanism, which significantly reduces total parameter count by reusing expert modules across adjacent transformer layers while maintaining most of the model's capacity. It also incorporates pre-gated routing, enabling memory-efficient expert loading and faster inference. As the first instantiation of the Megrez2 architecture, we introduce the Megrez2-Preview model, which is pre-trained on a 5-trillion-token corpus and further enhanced through supervised fine-tuning and reinforcement learning with verifiable rewards. With only 3B activated and 7.5B stored parameters, Megrez2-Preview demonstrates competitive or superior performance compared to larger models on a wide range of tasks, including language understanding, instruction following, mathematical reasoning, and code generation. These results highlight the effectiveness of the Megrez2 architecture to achieve a balance between accuracy, efficiency, and deployability, making it a strong candidate for real-world, resource-constrained applications.
Submitted 23 July, 2025;
originally announced July 2025.
-
Application of new conformal cooling layouts to the green injection molding of complex slender polymeric parts with high dimensional specifications
Authors:
Abelardo Torres Alba,
Jorge Manuel Mercado Colmenero,
Juan de Dios Caballero Garcia,
Cristina Martin Donate
Abstract:
Eliminating warpage in injection molded polymeric parts is one of the most important problems in the injection molding industry today. This situation is critical in geometries that are particularly susceptible to warping due to their geometric features, and this occurs with topologies of great length and slenderness with high changes in thickness. These features are, in these special geometries, impossible to manufacture with traditional technologies to meet the dimensional and sustainable requirements of the industry. This paper presents an innovative green conformal cooling system that is specifically designed for parts with slender geometric shapes that are highly susceptible to warping. Additionally, the work presented by the authors investigates the importance of using highly conductive inserts made of steel alloys in combination with the use of additively manufactured conformal channels for reducing influential parameters, such as warpage, cooling time, and residual stresses in the complex manufacturing of long and slender parts. The results of this real industrial case study indicated that the use of conformal cooling layouts decreased the cycle time by 175.1 s (66% below the current cooling time); the temperature gradient by 78.5% (specifically, 18.16 °C); the residual stress by 39.78 MPa, or 81.88%; and the warpage by 6.9 mm, or 90.5%. In this way, it was possible to achieve a final warping in the complex geometry studied of 0.72 mm, which was under the maximum value required at the industrial level of 1 mm. The resulting values obtained by the researchers present a turning point from which the manufacturing and sustainability in the injection molding of said plastic geometries is possible, and they take into account that the geometric manufacturing features analyzed will present a great demand in the coming years in the auto parts manufacturing industry.
Submitted 23 July, 2025;
originally announced July 2025.
-
AI Telephone Surveying: Automating Quantitative Data Collection with an AI Interviewer
Authors:
Danny D. Leybzon,
Shreyas Tirumala,
Nishant Jain,
Summer Gillen,
Michael Jackson,
Cameron McPhee,
Jennifer Schmidt
Abstract:
With the rise of voice-enabled artificial intelligence (AI) systems, quantitative survey researchers have access to a new data-collection mode: AI telephone surveying. By using AI to conduct phone interviews, researchers can scale quantitative studies while balancing the dual goals of human-like interactivity and methodological rigor. Unlike earlier efforts that used interactive voice response (IVR) technology to automate these surveys, voice AI enables a more natural and adaptive respondent experience as it is more robust to interruptions, corrections, and other idiosyncrasies of human speech.
We built and tested an AI system to conduct quantitative surveys based on large language models (LLM), automatic speech recognition (ASR), and speech synthesis technologies. The system was specifically designed for quantitative research, and strictly adhered to research best practices like question order randomization, answer order randomization, and exact wording.
To validate the system's effectiveness, we deployed it to conduct two pilot surveys with the SSRS Opinion Panel and followed up with a separate human-administered survey to assess respondent experiences. We measured three key metrics: the survey completion rates, break-off rates, and respondent satisfaction scores. Our results suggest that shorter instruments and more responsive AI interviewers may contribute to improvements across all three metrics studied.
Submitted 23 July, 2025;
originally announced July 2025.
-
From Feedback to Checklists: Grounded Evaluation of AI-Generated Clinical Notes
Authors:
Karen Zhou,
John Giorgi,
Pranav Mani,
Peng Xu,
Davis Liang,
Chenhao Tan
Abstract:
AI-generated clinical notes are increasingly used in healthcare, but evaluating their quality remains a challenge due to high subjectivity and limited scalability of expert review. Existing automated metrics often fail to align with real-world physician preferences. To address this, we propose a pipeline that systematically distills real user feedback into structured checklists for note evaluation. These checklists are designed to be interpretable, grounded in human feedback, and enforceable by LLM-based evaluators. Using deidentified data from over 21,000 clinical encounters, prepared in accordance with the HIPAA safe harbor standard, from a deployed AI medical scribe system, we show that our feedback-derived checklist outperforms baseline approaches in our offline evaluations in coverage, diversity, and predictive power for human ratings. Extensive experiments confirm the checklist's robustness to quality-degrading perturbations, significant alignment with clinician preferences, and practical value as an evaluation methodology. In offline research settings, the checklist can help identify notes likely to fall below our chosen quality thresholds.
Submitted 23 July, 2025;
originally announced July 2025.
-
Joint Asymmetric Loss for Learning with Noisy Labels
Authors:
Jialiang Wang,
Xianming Liu,
Xiong Zhou,
Gangfeng Hu,
Deming Zhai,
Junjun Jiang,
Xiangyang Ji
Abstract:
Learning with noisy labels is a crucial task for training accurate deep neural networks. To mitigate label noise, prior studies have proposed various robust loss functions, particularly symmetric losses. Nevertheless, symmetric losses usually suffer from the underfitting issue due to the overly strict constraint. To address this problem, the Active Passive Loss (APL) jointly optimizes an active and a passive loss to mutually enhance the overall fitting ability. Within APL, symmetric losses have been successfully extended, yielding advanced robust loss functions. Despite these advancements, emerging theoretical analyses indicate that asymmetric losses, a new class of robust loss functions, possess superior properties compared to symmetric losses. However, existing asymmetric losses are not compatible with advanced optimization frameworks such as APL, limiting their potential and applicability. Motivated by this theoretical gap and the prospect of asymmetric losses, we extend the asymmetric loss to the more complex passive loss scenario and propose the Asymmetric Mean Square Error (AMSE), a novel asymmetric loss. We rigorously establish the necessary and sufficient condition under which AMSE satisfies the asymmetric condition. By substituting the traditional symmetric passive loss in APL with our proposed AMSE, we introduce a novel robust loss framework termed Joint Asymmetric Loss (JAL). Extensive experiments demonstrate the effectiveness of our method in mitigating label noise. Code available at: https://github.com/cswjl/joint-asymmetric-loss
Submitted 23 July, 2025;
originally announced July 2025.
-
CASCADE: LLM-Powered JavaScript Deobfuscator at Google
Authors:
Shan Jiang,
Pranoy Kovuri,
David Tao,
Zhixun Tan
Abstract:
Software obfuscation, particularly prevalent in JavaScript, hinders code comprehension and analysis, posing significant challenges to software testing, static analysis, and malware detection. This paper introduces CASCADE, a novel hybrid approach that integrates the advanced coding capabilities of Gemini with the deterministic transformation capabilities of a compiler Intermediate Representation (IR), specifically JavaScript IR (JSIR). By employing Gemini to identify critical prelude functions, the foundational components underlying the most prevalent obfuscation techniques, and leveraging JSIR for subsequent code transformations, CASCADE effectively recovers semantic elements like original strings and API names, and reveals original program behaviors. This method overcomes limitations of existing static and dynamic deobfuscation techniques, eliminating hundreds to thousands of hardcoded rules while achieving reliability and flexibility. CASCADE is already deployed in Google's production environment, demonstrating substantial improvements in JavaScript deobfuscation efficiency and reducing reverse engineering efforts.
Submitted 23 July, 2025;
originally announced July 2025.
-
Mindfulness Meditation and Respiration: Accelerometer-Based Respiration Rate and Mindfulness Progress Estimation to Enhance App Engagement and Mindfulness Skills
Authors:
Mohammad Nur Hossain Khan,
David Creswell,
Jordan Albert,
Patrick O'Connell,
Shawn Fallon,
Mathew Polowitz,
Xuhai "Orson" Xu,
Bashima Islam
Abstract:
Mindfulness training is widely recognized for its benefits in reducing depression, anxiety, and loneliness. With the rise of smartphone-based mindfulness apps, digital meditation has become more accessible, but sustaining long-term user engagement remains a challenge. This paper explores whether respiration biosignal feedback and mindfulness skill estimation enhance system usability and skill development. We develop a respiration tracking algorithm based on the smartphone's accelerometer, eliminating the need for additional wearables. Unlike existing methods, our approach accurately captures slow breathing patterns typical of mindfulness meditation. Additionally, we introduce the first quantitative framework to estimate mindfulness skills (concentration, sensory clarity, and equanimity) based on accelerometer-derived respiration data. We develop and test our algorithms on 261 mindfulness sessions in both controlled and real-world settings. A user study comparing an experimental group receiving biosignal feedback with a control group using a standard app shows that respiration feedback enhances system usability. Our respiration tracking model achieves a mean absolute error (MAE) of 1.6 breaths per minute, closely aligning with ground truth data, while our mindfulness skill estimation attains F1 scores of 80-84% in tracking skill progression. By integrating respiration tracking and mindfulness estimation into a commercial app, we demonstrate the potential of smartphone sensors to enhance digital mindfulness training.
Submitted 23 July, 2025;
originally announced July 2025.
-
Audio-Vision Contrastive Learning for Phonological Class Recognition
Authors:
Daiqi Liu,
Tomás Arias-Vergara,
Jana Hutter,
Andreas Maier,
Paula Andrea Pérez-Toro
Abstract:
Accurate classification of articulatory-phonological features plays a vital role in understanding human speech production and developing robust speech technologies, particularly in clinical contexts where targeted phonemic analysis and therapy can improve disease diagnosis accuracy and personalized rehabilitation. In this work, we propose a multimodal deep learning framework that combines real-time magnetic resonance imaging (rtMRI) and speech signals to classify three key articulatory dimensions: manner of articulation, place of articulation, and voicing. We perform classification on 15 phonological classes derived from the aforementioned articulatory dimensions and evaluate the system with four audio/vision configurations: unimodal rtMRI, unimodal audio signals, multimodal middle fusion, and contrastive learning-based audio-vision fusion. Experimental results on the USC-TIMIT dataset show that our contrastive learning-based approach achieves state-of-the-art performance, with an average F1-score of 0.81, representing an absolute increase of 0.23 over the unimodal baseline. The results confirm the effectiveness of contrastive representation learning for multimodal articulatory analysis. Our code and processed dataset will be made publicly available at https://github.com/DaE-plz/AC_Contrastive_Phonology to support future research.
Submitted 23 July, 2025;
originally announced July 2025.
-
MCM: Mamba-based Cardiac Motion Tracking using Sequential Images in MRI
Authors:
Jiahui Yin,
Xinxing Cheng,
Jinming Duan,
Yan Pang,
Declan O'Regan,
Hadrien Reynaud,
Qingjie Meng
Abstract:
Myocardial motion tracking is important for assessing cardiac function and diagnosing cardiovascular diseases, for which cine cardiac magnetic resonance (CMR) has been established as the gold standard imaging modality. Many existing methods learn motion from single image pairs consisting of a reference frame and a randomly selected target frame from the cardiac cycle. However, these methods overlook the continuous nature of cardiac motion and often yield inconsistent and non-smooth motion estimations. In this work, we propose a novel Mamba-based cardiac motion tracking network (MCM) that explicitly incorporates the target image sequence from the cardiac cycle to achieve smooth and temporally consistent motion tracking. By developing a bi-directional Mamba block equipped with a bi-directional scanning mechanism, our method facilitates the estimation of plausible deformation fields. With our proposed motion decoder that integrates motion information from frames adjacent to the target frame, our method further enhances temporal coherence. Moreover, by taking advantage of Mamba's structured state-space formulation, the proposed method learns the continuous dynamics of the myocardium from sequential images without increasing computational complexity. We evaluate the proposed method on two public datasets. The experimental results demonstrate that the proposed method quantitatively and qualitatively outperforms both conventional and state-of-the-art learning-based cardiac motion tracking methods. The code is available at https://github.com/yjh-0104/MCM.
Submitted 23 July, 2025;
originally announced July 2025.
-
How Should We Meta-Learn Reinforcement Learning Algorithms?
Authors:
Alexander David Goldie,
Zilin Wang,
Jakob Nicolaus Foerster,
Shimon Whiteson
Abstract:
The process of meta-learning algorithms from data, instead of relying on manual design, is growing in popularity as a paradigm for improving the performance of machine learning systems. Meta-learning shows particular promise for reinforcement learning (RL), where algorithms are often adapted from supervised or unsupervised learning despite their suboptimality for RL. However, until now there has been a severe lack of comparison between different meta-learning algorithms, such as using evolution to optimise over black-box functions or LLMs to propose code. In this paper, we carry out this empirical comparison of the different approaches when applied to a range of meta-learned algorithms which target different parts of the RL pipeline. In addition to meta-train and meta-test performance, we also investigate factors including the interpretability, sample cost and train time for each meta-learning algorithm. Based on these findings, we propose several guidelines for meta-learning new RL algorithms which will help ensure that future learned algorithms are as performant as possible.
Submitted 23 July, 2025;
originally announced July 2025.
-
Perspective-Invariant 3D Object Detection
Authors:
Ao Liang,
Lingdong Kong,
Dongyue Lu,
Youquan Liu,
Jian Fang,
Huaici Zhao,
Wei Tsang Ooi
Abstract:
With the rise of robotics, LiDAR-based 3D object detection has garnered significant attention in both academia and industry. However, existing datasets and methods predominantly focus on vehicle-mounted platforms, leaving other autonomous platforms underexplored. To bridge this gap, we introduce Pi3DET, the first benchmark featuring LiDAR data and 3D bounding box annotations collected from multiple platforms: vehicle, quadruped, and drone, thereby facilitating research in 3D object detection for non-vehicle platforms as well as cross-platform 3D detection. Based on Pi3DET, we propose a novel cross-platform adaptation framework that transfers knowledge from the well-studied vehicle platform to other platforms. This framework achieves perspective-invariant 3D detection through robust alignment at both geometric and feature levels. Additionally, we establish a benchmark to evaluate the resilience and robustness of current 3D detectors in cross-platform scenarios, providing valuable insights for developing adaptive 3D perception systems. Extensive experiments validate the effectiveness of our approach on challenging cross-platform tasks, demonstrating substantial gains over existing adaptation methods. We hope this work paves the way for generalizable and unified 3D perception systems across diverse and complex environments. Our Pi3DET dataset, cross-platform benchmark suite, and annotation toolkit have been made publicly available.
Submitted 23 July, 2025;
originally announced July 2025.
-
Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras
Authors:
Lingdong Kong,
Dongyue Lu,
Ao Liang,
Rong Li,
Yuhao Dong,
Tianshuai Hu,
Lai Xing Ng,
Wei Tsang Ooi,
Benoit R. Cottereau
Abstract:
Event cameras offer microsecond-level latency and robustness to motion blur, making them ideal for understanding dynamic environments. Yet, connecting these asynchronous streams to human language remains an open challenge. We introduce Talk2Event, the first large-scale benchmark for language-driven object grounding in event-based perception. Built from real-world driving data, we provide over 30,000 validated referring expressions, each enriched with four grounding attributes -- appearance, status, relation to viewer, and relation to other objects -- bridging spatial, temporal, and relational reasoning. To fully exploit these cues, we propose EventRefer, an attribute-aware grounding framework that dynamically fuses multi-attribute representations through a Mixture of Event-Attribute Experts (MoEE). Our method adapts to different modalities and scene dynamics, achieving consistent gains over state-of-the-art baselines in event-only, frame-only, and event-frame fusion settings. We hope our dataset and approach will establish a foundation for advancing multimodal, temporally-aware, and language-driven perception in real-world robotics and autonomy.
Submitted 23 July, 2025;
originally announced July 2025.
-
Event Detection for Active Lower Limb Prosthesis
Authors:
J. D. Clark,
P. Ellison
Abstract:
Accurate event detection is key to the successful design of semi-passive and powered prosthetics. Kinematically, the natural knee is complex, with translation and rotation components that have a substantial impact on gait characteristics. When simplified to a pin joint, some of this behaviour is lost. This study investigates the role of cruciate ligament stretch in event detection. A bicondylar knee design was used, constrained by analogues of the anterior and posterior cruciate ligaments. This offers the ability to characterize knee kinematics by the stretch of the ligaments. The ligament stretch was recorded using LVDTs parallel to the ligaments of the Russell knee on a bent knee crutch, which was used to capture data on a treadmill at 3 speeds. This study finds speed dependence within the stretch of the cruciate ligaments, prominently around 5% and 80% of the gait cycle for the posterior and anterior. The cycle profile remains consistent with speed; therefore, other static events such as the turning point feature at around 90% and 95% of the cycle, for the posterior and anterior, respectively, could be used as a predictive precursor for initial contact. Likewise, at 90% and 95%, there is another pair of turning points that in this case could be used to predict foot flat. This study concludes that the use of a bicondylar knee design could improve the detection of events during the gait cycle, and therefore could increase the accuracy of subsequent controllers for powered prosthetics.
Submitted 23 July, 2025;
originally announced July 2025.
-
SHINE: A Scalable HNSW Index in Disaggregated Memory
Authors:
Manuel Widmoser,
Daniel Kocher,
Nikolaus Augsten
Abstract:
Approximate nearest neighbor (ANN) search is a fundamental problem in computer science for which in-memory graph-based methods, such as Hierarchical Navigable Small World (HNSW), perform exceptionally well. To scale beyond billions of high-dimensional vectors, the index must be distributed. The disaggregated memory architecture physically separates compute and memory into two distinct hardware units and has become popular in modern data centers. Both units are connected via RDMA networks that allow compute nodes to directly access remote memory and perform all the computations, posing unique challenges for disaggregated indexes.
In this work, we propose a scalable HNSW index for ANN search in disaggregated memory. In contrast to existing distributed approaches, which partition the graph at the cost of accuracy, our method builds a graph-preserving index that reaches the same accuracy as a single-machine HNSW. Continuously fetching high-dimensional vector data from remote memory leads to severe network bandwidth limitations, which we overcome by employing an efficient caching mechanism. Since answering a single query involves processing numerous unique graph nodes, caching alone is not sufficient to achieve high scalability. We logically combine the caches of the compute nodes to increase the overall cache effectiveness and confirm the efficiency and scalability of our method in our evaluation.
Submitted 23 July, 2025;
originally announced July 2025.
-
Comparing performance of variational quantum algorithm simulations on HPC systems
Authors:
Marco De Pascale,
Tobias Valentin Bauer,
Yaknan John Gambo,
Mario Hernández Vera,
Stefan Huber,
Burak Mete,
Amit Jamadagni,
Amine Bentellis,
Marita Oliv,
Luigi Iapichino,
Jeanette Miriam Lorenz
Abstract:
Variational quantum algorithms are of special importance in the research on quantum computing applications because of their applicability to current Noisy Intermediate-Scale Quantum (NISQ) devices. The main building blocks of these algorithms (among them, the definition of the Hamiltonian and of the ansatz, the optimizer) define a relatively large parameter space, making the comparison of results and performance between different approaches and software simulators cumbersome and prone to errors. In this paper, we employ a generic description of the problem, in terms of both Hamiltonian and ansatz, to port a problem definition consistently among different simulators. Three use cases of relevance for current quantum hardware (ground state calculation for the Hydrogen molecule, MaxCut, Travelling Salesman Problem) have been run on a set of HPC systems and software simulators to study the dependence of performance on the runtime environment, the scalability of the simulation codes and the mutual agreement of the physical results, respectively. The results show that our toolchain can successfully translate a problem definition between different simulators. On the other hand, variational algorithms are limited in their scaling by the long runtimes with respect to their memory footprint, so they expose limited parallelism to computation. This shortcoming is partially mitigated by using techniques like job arrays. The potential of the parser tool for exploring HPC performance and comparisons of results of variational algorithm simulations is highlighted.
Submitted 23 July, 2025;
originally announced July 2025.
-
InvRGB+L: Inverse Rendering of Complex Scenes with Unified Color and LiDAR Reflectance Modeling
Authors:
Xiaoxue Chen,
Bhargav Chandaka,
Chih-Hao Lin,
Ya-Qin Zhang,
David Forsyth,
Hao Zhao,
Shenlong Wang
Abstract:
We present InvRGB+L, a novel inverse rendering model that reconstructs large, relightable, and dynamic scenes from a single RGB+LiDAR sequence. Conventional inverse graphics methods rely primarily on RGB observations and use LiDAR mainly for geometric information, often resulting in suboptimal material estimates due to visible light interference. We find that LiDAR's intensity values, captured with active illumination in a different spectral range, offer complementary cues for robust material estimation under variable lighting. Inspired by this, InvRGB+L leverages LiDAR intensity cues to overcome challenges inherent in RGB-centric inverse graphics through two key innovations: (1) a novel physics-based LiDAR shading model and (2) RGB-LiDAR material consistency losses. The model produces novel-view RGB and LiDAR renderings of urban and indoor scenes and supports relighting, night simulations, and dynamic object insertions, achieving results that surpass current state-of-the-art methods in both scene-level urban inverse rendering and LiDAR simulation.
Submitted 23 July, 2025;
originally announced July 2025.
-
Dual-branch Prompting for Multimodal Machine Translation
Authors:
Jie Wang,
Zhendong Yang,
Liansong Zong,
Xiaobo Zhang,
Dexian Wang,
Ji Zhang
Abstract:
Multimodal Machine Translation (MMT) typically enhances text-only translation by incorporating aligned visual features. Despite remarkable progress, state-of-the-art MMT approaches often rely on paired image-text inputs at inference and are sensitive to irrelevant visual noise, which limits their robustness and practical applicability. To address these issues, we propose D2P-MMT, a diffusion-based dual-branch prompting framework for robust vision-guided translation. Specifically, D2P-MMT requires only the source text and a reconstructed image generated by a pre-trained diffusion model, which naturally filters out distracting visual details while preserving semantic cues. During training, the model jointly learns from both authentic and reconstructed images using a dual-branch prompting strategy, encouraging rich cross-modal interactions. To bridge the modality gap and mitigate training-inference discrepancies, we introduce a distributional alignment loss that enforces consistency between the output distributions of the two branches. Extensive experiments on the Multi30K dataset demonstrate that D2P-MMT achieves superior translation performance compared to existing state-of-the-art approaches.
Submitted 23 July, 2025;
originally announced July 2025.
-
From Scan to Action: Leveraging Realistic Scans for Embodied Scene Understanding
Authors:
Anna-Maria Halacheva,
Jan-Nico Zaech,
Sombit Dey,
Luc Van Gool,
Danda Pani Paudel
Abstract:
Real-world 3D scene-level scans offer realism and can enable better real-world generalizability for downstream applications. However, challenges such as data volume, diverse annotation formats, and tool compatibility limit their use. This paper demonstrates a methodology to effectively leverage these scans and their annotations. We propose a unified annotation integration using USD, with application-specific USD flavors. We identify challenges in utilizing holistic real-world scan datasets and present mitigation strategies. The efficacy of our approach is demonstrated through two downstream applications: LLM-based scene editing, enabling effective LLM understanding and adaptation of the data (80% success), and robotic simulation, achieving an 87% success rate in policy learning.
Submitted 23 July, 2025;
originally announced July 2025.
-
Robot-mediated physical Human-Human Interaction in Neurorehabilitation: a position paper
Authors:
Lorenzo Vianello,
Matthew Short,
Julia Manczurowsky,
Emek Barış Küçüktabak,
Francesco Di Tommaso,
Alessia Noccaro,
Laura Bandini,
Shoshana Clark,
Alaina Fiorenza,
Francesca Lunardini,
Alberto Canton,
Marta Gandolla,
Alessandra L. G. Pedrocchi,
Emilia Ambrosini,
Manuel Murie-Fernandez,
Carmen B. Roman,
Jesus Tornero,
Natacha Leon,
Andrew Sawers,
Jim Patton,
Domenico Formica,
Nevio Luigi Tagliamonte,
Georg Rauter,
Kilian Baur,
Fabian Just
, et al. (3 additional authors not shown)
Abstract:
Neurorehabilitation conventionally relies on the interaction between a patient and a physical therapist. Robotic systems can improve and enrich the physical feedback provided to patients after neurological injury, but they under-utilize the adaptability and clinical expertise of trained therapists. In this position paper, we advocate for a novel approach that integrates the therapist's clinical expertise and nuanced decision-making with the strength, accuracy, and repeatability of robotics: Robot-mediated physical Human-Human Interaction. This framework, which enables two individuals to physically interact through robotic devices, has been studied across diverse research groups and has recently emerged as a promising link between conventional manual therapy and rehabilitation robotics, harmonizing the strengths of both approaches. This paper presents the rationale of a multidisciplinary team (including engineers, doctors, and physical therapists) for conducting research that utilizes: a unified taxonomy to describe robot-mediated rehabilitation, a framework of interaction based on social psychology, and a technological approach that makes robotic systems seamless facilitators of natural human-human interaction.
Submitted 23 July, 2025;
originally announced July 2025.
-
Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning
Authors:
Xinyao Liu,
Diping Song
Abstract:
Multimodal large language models (MLLMs) demonstrate significant potential in the field of medical diagnosis. However, they face critical challenges in specialized domains such as ophthalmology, particularly the fragmentation of annotation granularity and inconsistencies in clinical reasoning logic, which hinder precise cross-modal understanding. This paper introduces FundusExpert, an ophthalmology-specific MLLM with integrated positioning-diagnosis reasoning capabilities, along with FundusGen, a dataset constructed through the intelligent Fundus-Engine system. Fundus-Engine automates localization and leverages MLLM-based semantic expansion to integrate global disease classification, local object detection, and fine-grained feature analysis within a single fundus image. Additionally, by constructing a clinically aligned cognitive chain, it guides the model to generate interpretable reasoning paths. FundusExpert, fine-tuned with instruction data from FundusGen, achieves the best performance in ophthalmic question-answering tasks, surpassing the average accuracy of the 40B MedRegA by 26.6%. It also excels in zero-shot report generation tasks, achieving a clinical consistency of 77.0%, significantly outperforming GPT-4o's 47.6%. Furthermore, we reveal a scaling law between data quality and model capability ($L \propto N^{0.068}$), demonstrating that the cognitive alignment annotations in FundusGen enhance data utilization efficiency. By integrating region-level localization with diagnostic reasoning chains, our work develops a scalable, clinically-aligned MLLM and explores a pathway toward bridging the visual-language gap in specific MLLMs. Our project can be found at https://github.com/MeteorElf/FundusExpert.
Submitted 23 July, 2025;
originally announced July 2025.
-
Enabling Cyber Security Education through Digital Twins and Generative AI
Authors:
Vita Santa Barletta,
Vito Bavaro,
Miriana Calvano,
Antonio Curci,
Antonio Piccinno,
Davide Pio Posa
Abstract:
Digital Twins (DTs) are gaining prominence in cybersecurity for their ability to replicate complex IT (Information Technology), OT (Operational Technology), and IoT (Internet of Things) infrastructures, allowing for real-time monitoring, threat analysis, and system simulation. This study investigates how integrating DTs with penetration testing tools and Large Language Models (LLMs) can enhance cybersecurity education and operational readiness. By simulating realistic cyber environments, this approach offers a practical, interactive framework for exploring vulnerabilities and defensive strategies. At the core of this research is the Red Team Knife (RTK), a custom penetration testing toolkit aligned with the Cyber Kill Chain model. RTK is designed to guide learners through key phases of cyberattacks, including reconnaissance, exploitation, and response, within a DT-powered ecosystem. The incorporation of LLMs further enriches the experience by providing intelligent, real-time feedback, natural language threat explanations, and adaptive learning support during training exercises. This combined DT-LLM framework is currently being piloted in academic settings to develop hands-on skills in vulnerability assessment, threat detection, and security operations. Initial findings suggest that the integration significantly improves the effectiveness and relevance of cybersecurity training, bridging the gap between theoretical knowledge and real-world application. Ultimately, the research demonstrates how DTs and LLMs together can transform cybersecurity education to meet evolving industry demands.
Submitted 23 July, 2025;
originally announced July 2025.
-
HOTA: Hamiltonian framework for Optimal Transport Advection
Authors:
Nazar Buzun,
Daniil Shlenskii,
Maxim Bobrin,
Dmitry V. Dylov
Abstract:
Optimal transport (OT) has become a natural framework for guiding probability flows. Yet, the majority of recent generative models assume trivial geometry (e.g., Euclidean) and rely on strong density-estimation assumptions, yielding trajectories that do not respect the true principles of optimality in the underlying manifold. We present Hamiltonian Optimal Transport Advection (HOTA), a Hamilton-Jacobi-Bellman based method that tackles the dual dynamical OT problem explicitly through Kantorovich potentials, enabling efficient and scalable trajectory optimization. Our approach avoids the need for explicit density modeling and performs well even when the cost functionals are non-smooth. Empirically, HOTA outperforms all baselines in standard benchmarks, as well as in custom datasets with non-differentiable costs, both in terms of feasibility and optimality.
Submitted 23 July, 2025;
originally announced July 2025.
-
Graph Neural Network Approach to Predicting Magnetization in Quasi-One-Dimensional Ising Systems
Authors:
V. Slavin,
O. Kryvchikov,
D. Laptev
Abstract:
We present a graph-based deep learning framework for predicting the magnetic properties of quasi-one-dimensional Ising spin systems. The lattice geometry is encoded as a graph and processed by a graph neural network (GNN) followed by fully connected layers. The model is trained on Monte Carlo simulation data and accurately reproduces key features of the magnetization curve, including plateaus, critical transition points, and the effects of geometric frustration. It captures both local motifs and global symmetries, demonstrating that GNNs can infer magnetic behavior directly from structural connectivity. The proposed approach enables efficient prediction of magnetization without the need for additional Monte Carlo simulations.
Submitted 23 July, 2025;
originally announced July 2025.
-
To Trust or Not to Trust: On Calibration in ML-based Resource Allocation for Wireless Networks
Authors:
Rashika Raina,
Nidhi Simmons,
David E. Simmons,
Michel Daoud Yacoub,
Trung Q. Duong
Abstract:
In next-generation communications and networks, machine learning (ML) models are expected to deliver not only accurate predictions but also well-calibrated confidence scores that reflect the true likelihood of correct decisions. This paper studies the calibration performance of an ML-based outage predictor within a single-user, multi-resource allocation framework. We first establish key theoretical properties of this system's outage probability (OP) under perfect calibration. Importantly, we show that as the number of resources grows, the OP of a perfectly calibrated predictor approaches the expected output conditioned on it being below the classification threshold. In contrast, when only one resource is available, the system's OP equals the model's overall expected output. We then derive the OP conditions for a perfectly calibrated predictor. These findings guide the choice of the classification threshold to achieve a desired OP, helping system designers meet specific reliability requirements. We also demonstrate that post-processing calibration cannot improve the system's minimum achievable OP, as it does not introduce new information about future channel states. Additionally, we show that well-calibrated models are part of a broader class of predictors that necessarily improve OP. In particular, we establish a monotonicity condition that the accuracy-confidence function must satisfy for such improvement to occur. To demonstrate these theoretical properties, we conduct a rigorous simulation-based analysis using post-processing calibration techniques: Platt scaling and isotonic regression. As part of this framework, the predictor is trained using an outage loss function specifically designed for this system. Furthermore, this analysis is performed on Rayleigh fading channels with temporal correlation captured by Clarke's 2D model, which accounts for receiver mobility.
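Isotonic regression, one of the two post-processing calibration techniques named in this abstract, can be illustrated with the textbook pool-adjacent-violators algorithm. The sketch below is a generic version and not the authors' implementation; the function name `isotonic_fit` and the toy labels are assumptions made for illustration only.

```python
def isotonic_fit(labels):
    """Pool-adjacent-violators: least-squares non-decreasing fit.

    `labels` are 0/1 outcomes already sorted by the model's raw confidence
    (lowest confidence first). Returns one calibrated value per position.
    """
    blocks = []  # each block stores [sum_of_values, count]
    for y in labels:
        blocks.append([float(y), 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        # Means are compared by cross-multiplication to avoid division.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Toy outcomes (hypothetical), ordered by increasing raw confidence.
calibrated = isotonic_fit([0, 1, 0, 1, 1])
```

The fitted values are non-decreasing in the raw confidence, so they only re-map scores the model already produced. This is consistent with the abstract's point that post-processing adds no new information about future channel states.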
Submitted 23 July, 2025;
originally announced July 2025.
-
MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs
Authors:
Alexander R. Fabbri,
Diego Mares,
Jorge Flores,
Meher Mankikar,
Ernesto Hernandez,
Dean Lee,
Bing Liu,
Chen Xing
Abstract:
Although recent Large Language Models (LLMs) have shown rapid improvement on reasoning benchmarks in English, the evaluation of such LLMs' multilingual reasoning capability across diverse languages and cultural contexts remains limited. Existing multilingual reasoning benchmarks are typically constructed by translating existing English reasoning benchmarks, biasing these benchmarks towards reasoning problems with context in English language and culture. In this work, we introduce the Multilingual Native Reasoning Challenge (MultiNRC), a benchmark designed to assess LLMs on more than 1,000 native, linguistically and culturally grounded reasoning questions written by native speakers in French, Spanish, and Chinese. MultiNRC covers four core reasoning categories: language-specific linguistic reasoning, wordplay & riddles, cultural/tradition reasoning, and math reasoning with cultural relevance. For cultural/tradition reasoning and math reasoning with cultural relevance, we also provide English equivalent translations of the multilingual questions, manually translated by native speakers fluent in English. This set of English equivalents can provide a direct comparison of LLM reasoning capacity in other languages vs. English on the same reasoning questions. We systematically evaluate 14 current leading LLMs covering most LLM families on MultiNRC and its English equivalent set. The results show that (1) current LLMs are still not good at native multilingual reasoning, with none scoring above 50% on MultiNRC; (2) LLMs exhibit distinct strengths and weaknesses in handling linguistic, cultural, and logical reasoning tasks; (3) most models perform substantially better in math reasoning in English than in the original languages (+10%), indicating persistent challenges with culturally grounded knowledge.
Submitted 23 July, 2025;
originally announced July 2025.
-
Demonstration of Efficient Predictive Surrogates for Large-scale Quantum Processors
Authors:
Wei-You Liao,
Yuxuan Du,
Xinbiao Wang,
Tian-Ci Tian,
Yong Luo,
Bo Du,
Dacheng Tao,
He-Liang Huang
Abstract:
The ongoing development of quantum processors is driving breakthroughs in scientific discovery. Despite this progress, the formidable cost of fabricating large-scale quantum processors means they will remain rare for the foreseeable future, limiting their widespread application. To address this bottleneck, we introduce the concept of predictive surrogates, which are classical learning models designed to emulate the mean-value behavior of a given quantum processor with provable computational efficiency. In particular, we propose two predictive surrogates that can substantially reduce the need for quantum processor access in diverse practical scenarios. To demonstrate their potential in advancing digital quantum simulation, we use these surrogates to emulate a quantum processor with up to 20 programmable superconducting qubits, enabling efficient pre-training of variational quantum eigensolvers for families of transverse-field Ising models and identification of non-equilibrium Floquet symmetry-protected topological phases. Experimental results reveal that the predictive surrogates not only reduce measurement overhead by orders of magnitude, but can also surpass the performance of conventional, quantum-resource-intensive approaches. Collectively, these findings establish predictive surrogates as a practical pathway to broadening the impact of advanced quantum processors.
Submitted 23 July, 2025;
originally announced July 2025.
-
VLM-Guided Visual Place Recognition for Planet-Scale Geo-Localization
Authors:
Sania Waheed,
Na Min An,
Michael Milford,
Sarvapali D. Ramchurn,
Shoaib Ehsan
Abstract:
Geo-localization from a single image at planet scale (essentially an advanced or extreme version of the kidnapped robot problem) is a fundamental and challenging task in applications such as navigation, autonomous driving and disaster response due to the vast diversity of locations, environmental conditions, and scene variations. Traditional retrieval-based methods for geo-localization struggle with scalability and perceptual aliasing, while classification-based approaches lack generalization and require extensive training data. Recent advances in vision-language models (VLMs) offer a promising alternative by leveraging contextual understanding and reasoning. However, while VLMs achieve high accuracy, they are often prone to hallucinations and lack interpretability, making them unreliable as standalone solutions. In this work, we propose a novel hybrid geo-localization framework that combines the strengths of VLMs with retrieval-based visual place recognition (VPR) methods. Our approach first leverages a VLM to generate a prior, effectively guiding and constraining the retrieval search space. We then employ a retrieval step, followed by a re-ranking mechanism that selects the most geographically plausible matches based on feature similarity and proximity to the initially estimated coordinates. We evaluate our approach on multiple geo-localization benchmarks and show that it consistently outperforms prior state-of-the-art methods, particularly at street (up to 4.51%) and city level (up to 13.52%). Our results demonstrate that VLM-generated geographic priors in combination with VPR lead to scalable, robust, and accurate geo-localization systems.
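The re-ranking step described in this abstract, which combines feature similarity with proximity to the VLM's estimated coordinates, can be sketched roughly as follows. This is a minimal illustration and not the authors' scoring rule: the linear weighting `alpha`, the exponential distance decay `scale_km`, and all candidate names and coordinates are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    radius = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def rerank(candidates, prior, alpha=0.5, scale_km=100.0):
    """Sort retrieved candidates by a mix of similarity and closeness to the prior.

    candidates: list of (name, similarity in [0, 1], lat, lon) tuples.
    prior: (lat, lon) estimate, standing in for the VLM-generated prior.
    alpha and scale_km are illustrative knobs, not values from the paper.
    """
    def score(candidate):
        _, similarity, lat, lon = candidate
        distance = haversine_km(lat, lon, prior[0], prior[1])
        proximity = math.exp(-distance / scale_km)  # 1 at the prior, decays with distance
        return alpha * similarity + (1 - alpha) * proximity

    return sorted(candidates, key=score, reverse=True)


# Hypothetical example: a visually similar but distant look-alike is demoted.
prior = (48.8566, 2.3522)
retrieved = [
    ("lookalike_far", 0.85, 40.71, -74.00),
    ("street_near_prior", 0.80, 48.86, 2.35),
]
ranked = rerank(retrieved, prior)
```

In this toy case the nearby candidate wins despite its slightly lower similarity, which is the kind of perceptual-aliasing failure a geographic prior is meant to suppress.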
Submitted 23 July, 2025;
originally announced July 2025.
-
Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees
Authors:
Guanqin Zhang,
Kota Fukuda,
Zhenya Zhang,
H. M. N. Dilum Bandara,
Shiping Chen,
Jianjun Zhao,
Yulei Sui
Abstract:
The vulnerability of neural networks to adversarial perturbations has necessitated formal verification techniques that can rigorously certify the quality of neural networks. As the state-of-the-art, branch and bound (BaB) is a "divide-and-conquer" strategy that applies off-the-shelf verifiers to sub-problems for which they perform better. While BaB can identify the sub-problems that need to be split, it explores the space of these sub-problems in a naive "first-come-first-served" manner, which makes it inefficient at reaching a verification conclusion. To bridge this gap, we introduce an order over the different sub-problems produced by BaB, according to their different likelihoods of containing counterexamples. Based on this order, we propose a novel verification framework, Oliva, that explores the sub-problem space by prioritizing those sub-problems that are more likely to contain counterexamples, in order to efficiently reach the conclusion of the verification. Even if no counterexample can be found in any sub-problem, it only changes the order of visiting different sub-problems and so will not lead to a performance degradation. Specifically, Oliva has two variants, including $Oliva^{GR}$, a greedy strategy that always prioritizes the sub-problems that are more likely to contain counterexamples, and $Oliva^{SA}$, a balanced strategy inspired by simulated annealing that gradually shifts from exploration to exploitation to locate the globally optimal sub-problems. We experimentally evaluate the performance of Oliva on 690 verification problems spanning 5 models with datasets MNIST and CIFAR10. Compared to the state-of-the-art approaches, we demonstrate the speedup of Oliva for up to 25X in MNIST, and up to 80X in CIFAR10.
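The contrast between first-come-first-served exploration and likelihood-ordered exploration can be illustrated on a toy tree. The sketch below is only in the spirit of the greedy idea; it is not Oliva, and the tree, the `likelihood` score, and the counterexample location are all made-up assumptions.

```python
import heapq
from collections import deque


def explore_fifo(root, children, has_counterexample):
    """Visit sub-problems in arrival order; return (hit, nodes_visited)."""
    queue, visited = deque([root]), 0
    while queue:
        node = queue.popleft()
        visited += 1
        if has_counterexample(node):
            return node, visited
        queue.extend(children(node))
    return None, visited


def explore_by_priority(root, children, has_counterexample, likelihood):
    """Visit the sub-problem with the highest likelihood score first."""
    heap, visited, tie = [(-likelihood(root), 0, root)], 0, 1
    while heap:
        _, _, node = heapq.heappop(heap)
        visited += 1
        if has_counterexample(node):
            return node, visited
        for child in children(node):
            # `tie` keeps insertion order among equally scored sub-problems.
            heapq.heappush(heap, (-likelihood(child), tie, child))
            tie += 1
    return None, visited


# Toy binary tree of depth 3; nodes are bit strings (hypothetical setup).
def children(node):
    return [node + "0", node + "1"] if len(node) < 3 else []


def likelihood(node):
    return node.count("1")  # stand-in score, not a real verifier signal


fifo_hit, fifo_visits = explore_fifo("", children, lambda n: n == "111")
prio_hit, prio_visits = explore_by_priority("", children, lambda n: n == "111", likelihood)
```

In this toy case the ordered search reaches the counterexample after 4 visits instead of 15. When no counterexample exists, both searches visit the same 15 nodes, which mirrors the abstract's remark that reordering alone does not degrade performance.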
Submitted 23 July, 2025;
originally announced July 2025.
-
Physics-based Human Pose Estimation from a Single Moving RGB Camera
Authors:
Ayce Idil Aytekin,
Chuqiao Li,
Diogo Luvizon,
Rishabh Dabral,
Martin Oswald,
Marc Habermann,
Christian Theobalt
Abstract:
Most monocular and physics-based human pose tracking methods, while achieving state-of-the-art results, suffer from artifacts when the scene does not have a strictly flat ground plane or when the camera is moving. Moreover, these methods are often evaluated on in-the-wild real-world videos without ground-truth data or on synthetic datasets, which fail to model real-world light transport, camera motion, and pose-induced appearance and geometry changes. To tackle these two problems, we introduce MoviCam, the first non-synthetic dataset containing ground-truth camera trajectories of a dynamically moving monocular RGB camera, scene geometry, and 3D human motion with human-scene contact labels. Additionally, we propose PhysDynPose, a physics-based method that incorporates scene geometry and physical constraints for more accurate human motion tracking under camera motion and in non-flat scenes. More precisely, we use a state-of-the-art kinematics estimator to obtain the human pose and a robust SLAM method to capture the dynamic camera trajectory, enabling the recovery of the human pose in the world frame. We then refine the kinematic pose estimate using our scene-aware physics optimizer. On our new benchmark, we find that even state-of-the-art methods struggle with this inherently challenging setting, i.e., a moving camera and non-planar environments, while our method robustly estimates both human and camera poses in world coordinates.
Submitted 23 July, 2025;
originally announced July 2025.
-
The Wilhelm Tell Dataset of Affordance Demonstrations
Authors:
Rachel Ringe,
Mihai Pomarlan,
Nikolaos Tsiogkas,
Stefano De Giorgis,
Maria Hedblom,
Rainer Malaka
Abstract:
Affordances, i.e., the possibilities for action that an environment or the objects in it provide, are important for robots operating in human environments to perceive. Existing approaches train such capabilities on annotated static images or shapes. This work presents a novel dataset for affordance learning of common household tasks. Unlike previous approaches, our dataset consists of video sequences demonstrating the tasks from first- and third-person perspectives, along with metadata about the affordances that are manifested in the task, and is aimed at training perception systems to recognize affordance manifestations. The demonstrations were collected from several participants and record about seven hours of human activity in total. The variety of task performances also allows studying preparatory maneuvers that people may perform for a task, such as how they arrange their task space, which is also relevant for collaborative service robots.
Submitted 23 July, 2025;
originally announced July 2025.
-
Residual Prophet Inequalities
Authors:
Jose Correa,
Sebastian Perez-Salazar,
Dana Pizarro,
Bruno Ziliotto
Abstract:
We introduce a variant of the classic prophet inequality, called \emph{residual prophet inequality} (RPI). In the RPI problem, we consider a finite sequence of $n$ nonnegative independent random values with known distributions, and a known integer $0\leq k\leq n-1$. Before the gambler observes the sequence, the top $k$ values are removed, whereas the remaining $n-k$ values are streamed sequentially to the gambler. For example, one can assume that the top $k$ values have already been allocated to a higher-priority agent. Upon observing a value, the gambler must decide irrevocably whether to accept or reject it, without the possibility of revisiting past values.
We study two variants of RPI, according to whether the gambler learns online of the identity of the variable that he sees (FI model) or not (NI model). Our main result is a randomized algorithm in the FI model with \emph{competitive ratio} of at least $1/(k+2)$, which we show is tight. Our algorithm is data-driven and requires access only to the $k+1$ largest values of a single sample from the $n$ input distributions. In the NI model, we provide a similar algorithm that guarantees a competitive ratio of $1/(2k+2)$. We further analyze independent and identically distributed instances when $k=1$. We build a single-threshold algorithm with a competitive ratio of at least 0.4901, and show that no single-threshold strategy can get a competitive ratio greater than 0.5464.
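The RPI setting itself can be simulated in a few lines, which may help make the definitions concrete. The sketch below is not the paper's algorithm: it uses a fixed single threshold on i.i.d. Uniform(0,1) values, and it takes the largest remaining value as the prophet's benchmark, which is our reading of the setting. All parameter values are arbitrary.

```python
import random


def residual_stream(values, k):
    """Remove the k largest values and keep the rest in arrival order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    removed = set(order[:k])
    return [v for i, v in enumerate(values) if i not in removed]


def single_threshold_gambler(stream, threshold):
    """Accept the first value that reaches the threshold; 0.0 if none does."""
    for value in stream:
        if value >= threshold:
            return value
    return 0.0


def estimate_ratio(n, k, threshold, trials, seed=0):
    """Monte Carlo ratio of the gambler's mean reward to the prophet's.

    Assumes i.i.d. Uniform(0,1) inputs and 0 <= k <= n - 1, so the
    residual stream is never empty.
    """
    rng = random.Random(seed)
    gambler_total = prophet_total = 0.0
    for _ in range(trials):
        values = [rng.random() for _ in range(n)]
        stream = residual_stream(values, k)
        gambler_total += single_threshold_gambler(stream, threshold)
        prophet_total += max(stream)  # largest value left after the removal
    return gambler_total / prophet_total


# Arbitrary illustrative parameters for the k = 1 case.
ratio = estimate_ratio(n=10, k=1, threshold=0.6, trials=2000)
```

Sweeping `threshold` in such a simulation gives a quick empirical feel for how a single-threshold rule behaves in the i.i.d. case with $k=1$, the regime for which the abstract reports its 0.4901 and 0.5464 bounds.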
Submitted 23 July, 2025;
originally announced July 2025.
-
Investigating Training Data Detection in AI Coders
Authors:
Tianlin Li,
Yunxiang Wei,
Zhiming Li,
Aishan Liu,
Qing Guo,
Xianglong Liu,
Dongning Sun,
Yang Liu
Abstract:
Recent advances in code large language models (CodeLLMs) have made them indispensable tools in modern software engineering. However, these models occasionally produce outputs that contain proprietary or sensitive code snippets, raising concerns about potential non-compliant use of training data, and posing risks to privacy and intellectual property. To ensure responsible and compliant deployment of CodeLLMs, training data detection (TDD) has become a critical task. While recent TDD methods have shown promise in natural language settings, their effectiveness on code data remains largely underexplored. This gap is particularly important given code's structured syntax and distinct similarity criteria compared to natural language. To address this, we conduct a comprehensive empirical study of seven state-of-the-art TDD methods on source code data, evaluating their performance across eight CodeLLMs. To support this evaluation, we introduce CodeSnitch, a function-level benchmark dataset comprising 9,000 code samples in three programming languages, each explicitly labeled as either included or excluded from CodeLLM training. Beyond evaluation on the original CodeSnitch, we design targeted mutation strategies to test the robustness of TDD methods under three distinct settings. These mutation strategies are grounded in the well-established Type-1 to Type-4 code clone detection taxonomy. Our study provides a systematic assessment of current TDD techniques for code and offers insights to guide the development of more effective and robust detection methods in the future.
Submitted 23 July, 2025;
originally announced July 2025.
-
Language-Conditioned Open-Vocabulary Mobile Manipulation with Pretrained Models
Authors:
Shen Tan,
Dong Zhou,
Xiangyu Shao,
Junqiao Wang,
Guanghui Sun
Abstract:
Open-vocabulary mobile manipulation (OVMM) that involves the handling of novel and unseen objects across different workspaces remains a significant challenge for real-world robotic applications. In this paper, we propose a novel Language-conditioned Open-Vocabulary Mobile Manipulation framework, named LOVMM, incorporating the large language model (LLM) and vision-language model (VLM) to tackle various mobile manipulation tasks in household environments. Our approach is capable of solving various OVMM tasks with free-form natural language instructions (e.g. "toss the food boxes on the office room desk to the trash bin in the corner", and "pack the bottles from the bed to the box in the guestroom"). Extensive experiments simulated in complex household environments show strong zero-shot generalization and multi-task learning abilities of LOVMM. Moreover, our approach can also generalize to multiple tabletop manipulation tasks and achieve better success rates compared to other state-of-the-art methods.
Submitted 23 July, 2025;
originally announced July 2025.
-
Realisability and Complementability of Multiparty Session Types
Authors:
Cinzia Di Giusto,
Etienne Lozes,
Pascal Urso
Abstract:
Multiparty session types (MPST) are a type-based approach for specifying message-passing distributed systems. They rely on the notion of global type specifying the global behaviour and local types, which are the projections of the global behaviour onto each local participant. An essential property of global types is realisability, i.e., whether the composition of the local behaviours conforms to those specified by the global type. We explore how realisability of MPST relates to their complementability, i.e., whether there exists a global type that describes the complementary behaviour of the original global type. First, we show that if a global type is realisable with p2p communications, then it is realisable with synchronous communications. Second, we show that if a global type is realisable in the synchronous model, then it is complementable, in the sense that there exists a global type that describes the complementary behaviour of the original global type. Third, we give an algorithm to decide whether a complementable global type, given with an explicit complement, is realisable in p2p. The algorithm is PSPACE in the size of the global type and its complement. As a side contribution, we propose a complementation construction for global types with sender-driven choice with a linear blowup in the size of the global type.
Submitted 23 July, 2025;
originally announced July 2025.
-
TOC-UCO: a comprehensive repository of tabular ordinal classification datasets
Authors:
Rafael Ayllón-Gavilán,
David Guijo-Rubio,
Antonio Manuel Gómez-Orellana,
Francisco Bérchez-Moreno,
Víctor Manuel Vargas-Yun,
Pedro A. Gutiérrez
Abstract:
An ordinal classification (OC) problem corresponds to a special type of classification characterised by the presence of a natural order relationship among the classes. This type of problem can be found in a number of real-world applications, motivating the design and development of many ordinal methodologies in recent years. However, the development of the OC field suffers from one main disadvantage: the lack of a comprehensive set of datasets on which novel approaches can be benchmarked. To address this gap, this manuscript, from University of Córdoba (UCO) researchers with previous experience in the OC field, provides a publicly available repository of tabular data for a robust validation of novel OC approaches, namely TOC-UCO (Tabular Ordinal Classification repository of the UCO). Specifically, this repository includes a set of $46$ tabular ordinal datasets, preprocessed under a common framework and ensured to have a reasonable number of patterns and an appropriate class distribution. We also provide the sources and preprocessing steps of each dataset, along with details on how to benchmark a novel approach using the TOC-UCO repository. For this, indices for $30$ different randomised train-test partitions are provided to facilitate the reproducibility of the experiments.
Submitted 23 July, 2025;
originally announced July 2025.
-
DeCo-SGD: Joint Optimization of Delay Staleness and Gradient Compression Ratio for Distributed SGD
Authors:
Rongwei Lu,
Jingyan Jiang,
Chunyang Li,
Haotian Dong,
Xingguang Wei,
Delin Cai,
Zhi Wang
Abstract:
Distributed machine learning in high end-to-end latency and low, varying bandwidth network environments undergoes severe throughput degradation. Due to its low communication requirements, distributed SGD (D-SGD) remains the mainstream optimizer in such challenging networks, but it still suffers from significant throughput reduction. To mitigate these limitations, existing approaches typically employ gradient compression and delayed aggregation to alleviate low bandwidth and high latency, respectively. To address both challenges simultaneously, these strategies are often combined, introducing a complex three-way trade-off among compression ratio, staleness (delayed synchronization steps), and model convergence rate. To achieve the balance under varying bandwidth conditions, an adaptive policy is required to dynamically adjust these parameters. Unfortunately, existing works rely on static heuristic strategies due to the lack of theoretical guidance, which prevents them from achieving this goal. This study fills this theoretical gap by introducing a new theoretical tool, decomposing the joint optimization problem into a traditional convergence rate analysis with multiple analyzable noise terms. We are the first to reveal that staleness exponentially amplifies the negative impact of gradient compression on training performance, filling a critical gap in understanding how compressed and delayed gradients affect training. Furthermore, by integrating the convergence rate with a network-aware time minimization condition, we propose DeCo-SGD, which dynamically adjusts the compression ratio and staleness based on the real-time network condition and training task. DeCo-SGD achieves up to 5.07$\times$ and 1.37$\times$ speed-ups over D-SGD and static strategy in high-latency and low, varying bandwidth networks, respectively.
Submitted 23 July, 2025;
originally announced July 2025.
-
Mobile Manipulation with Active Inference for Long-Horizon Rearrangement Tasks
Authors:
Corrado Pezzato,
Ozan Çatal,
Toon Van de Maele,
Riddhi J. Pitliya,
Tim Verbelen
Abstract:
Despite growing interest in active inference for robotic control, its application to complex, long-horizon tasks remains untested. We address this gap by introducing a fully hierarchical active inference architecture for goal-directed behavior in realistic robotic settings. Our model combines a high-level active inference model that selects among discrete skills with a whole-body active inference controller that realizes them. This unified approach enables flexible skill composition, online adaptability, and recovery from task failures without requiring offline training. Evaluated on the Habitat Benchmark for mobile manipulation, our method outperforms state-of-the-art baselines across the three long-horizon tasks, demonstrating for the first time that active inference can scale to the complexity of modern robotics benchmarks.
Submitted 23 July, 2025;
originally announced July 2025.
-
PARTE: Part-Guided Texturing for 3D Human Reconstruction from a Single Image
Authors:
Hyeongjin Nam,
Donghwan Kim,
Gyeongsik Moon,
Kyoung Mu Lee
Abstract:
The misaligned human texture across different human parts is one of the main limitations of existing 3D human reconstruction methods. Each human part, such as a jacket or pants, should maintain a distinct texture without blending into others. The structural coherence of human parts serves as a crucial cue to infer human textures in the invisible regions of a single image. However, most existing 3D human reconstruction methods do not explicitly exploit such part segmentation priors, leading to misaligned textures in their reconstructions. In this regard, we present PARTE, which utilizes 3D human part information as a key guide to reconstruct 3D human textures. Our framework comprises two core components. First, to infer 3D human part information from a single image, we propose a 3D part segmentation module (PartSegmenter) that initially reconstructs a textureless human surface and predicts human part labels based on the textureless surface. Second, to incorporate part information into texture reconstruction, we introduce a part-guided texturing module (PartTexturer), which acquires prior knowledge from a pre-trained image generation network on texture alignment of human parts. Extensive experiments demonstrate that our framework achieves state-of-the-art quality in 3D human reconstruction. The project page is available at https://hygenie1228.github.io/PARTE/.
Submitted 23 July, 2025;
originally announced July 2025.
-
Application of Whisper in Clinical Practice: the Post-Stroke Speech Assessment during a Naming Task
Authors:
Milena Davudova,
Ziyuan Cai,
Valentina Giunchiglia,
Dragos C. Gruia,
Giulia Sanguedolce,
Adam Hampshire,
Fatemeh Geranmayeh
Abstract:
Detailed assessment of language impairment following stroke remains a cognitively complex and clinician-intensive task, limiting timely and scalable diagnosis. Automatic Speech Recognition (ASR) foundation models offer a promising pathway to augment human evaluation through intelligent systems, but their effectiveness in the context of speech and language impairment remains uncertain. In this study, we evaluate whether Whisper, a state-of-the-art ASR foundation model, can be applied to transcribe and analyze speech from patients with stroke during a commonly used picture-naming task. We assess both verbatim transcription accuracy and the model's ability to support downstream prediction of language function, which has major implications for outcomes after stroke. Our results show that the baseline Whisper model performs poorly on single-word speech utterances. Nevertheless, fine-tuning Whisper significantly improves transcription accuracy (reducing Word Error Rate by 87.72% in healthy speech and 71.22% in speech from patients). Further, learned representations from the model enable accurate prediction of speech quality (average F1 Macro of 0.74 for healthy, 0.75 for patients). However, evaluations on an unseen (TORGO) dataset reveal limited generalizability, highlighting the inability of Whisper to perform zero-shot transcription of single-word utterances on out-of-domain clinical speech and emphasizing the need to adapt models to specific clinical populations. While challenges remain in cross-domain generalization, these findings highlight the potential of foundation models, when appropriately fine-tuned, to advance automated speech and language assessment and rehabilitation for stroke-related impairments.
Submitted 23 July, 2025;
originally announced July 2025.
-
Nearly Minimax Discrete Distribution Estimation in Kullback-Leibler Divergence with High Probability
Authors:
Dirk van der Hoeven,
Julia Olkhovskaia,
Tim van Erven
Abstract:
We consider the problem of estimating a discrete distribution $p$ with support of size $K$ and provide both upper and lower bounds with high probability in KL divergence. We prove that in the worst case, for any estimator $\widehat{p}$, with probability at least $δ$, $\text{KL}(p \| \widehat{p}) \geq C\max\{K,\ln(K)\ln(1/δ) \}/n $, where $n$ is the sample size and $C > 0$ is a constant. We introduce a computationally efficient estimator $p^{\text{OTB}}$, based on Online to Batch conversion and suffix averaging, and show that with probability at least $1 - δ$ $\text{KL}(p \| \widehat{p}) \leq C(K\log(\log(K)) + \ln(K)\ln(1/δ)) /n$.
Furthermore, we also show that with sufficiently many observations relative to $\log(1/δ)$, the maximum likelihood estimator $\bar{p}$ guarantees that with probability at least $1-δ$
$$ 1/6 χ^2(\bar{p}\|p) \leq 1/4 χ^2(p\|\bar{p}) \leq \text{KL}(p\|\bar{p}) \leq C(K + \log(1/δ))/n\,, $$
where $χ^2$ denotes the $χ^2$-divergence.
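The divergences in these bounds can be computed directly for small examples. The sketch below assumes the common convention $χ^2(p\|q) = \sum_i (p_i - q_i)^2 / q_i$, and it uses a plain add-constant estimator as a stand-in: this is not the $p^{\text{OTB}}$ estimator from the abstract, and all function names are hypothetical. It also shows why sample size matters for the maximum likelihood estimator: a category with positive probability but zero observed count makes the KL divergence infinite.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) = sum_i p_i * ln(p_i / q_i), with the 0 * ln(0) = 0 convention."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf  # p puts mass where q puts none
        total += pi * math.log(pi / qi)
    return total


def chi2_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """chi^2(p || q) = sum_i (p_i - q_i)^2 / q_i (assumed convention)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if qi == 0.0:
            if pi != 0.0:
                return math.inf
            continue
        total += (pi - qi) ** 2 / qi
    return total


def mle(counts: Sequence[int]) -> List[float]:
    """Maximum likelihood estimate: empirical frequencies."""
    n = sum(counts)
    return [c / n for c in counts]


def add_constant(counts: Sequence[int], alpha: float = 1.0) -> List[float]:
    """Add-alpha smoothing (a stand-in here, not the paper's estimator)."""
    n = sum(counts) + alpha * len(counts)
    return [(c + alpha) / n for c in counts]


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    counts = [3, 2, 0]  # the third category was never observed
    print(kl_divergence(p, mle(counts)))           # infinite
    print(kl_divergence(p, add_constant(counts)))  # finite
```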
Submitted 23 July, 2025;
originally announced July 2025.
-
Efficient Column-Wise N:M Pruning on RISC-V CPU
Authors:
Chi-Wei Chu,
Ding-Yong Hong,
Jan-Jan Wu
Abstract:
In deep learning frameworks, weight pruning is a widely used technique for improving computational efficiency by reducing the size of large models. This is especially critical for convolutional operators, which often act as performance bottlenecks in convolutional neural networks (CNNs). However, the effectiveness of pruning heavily depends on how it is implemented, as different methods can significantly impact both computational performance and memory footprint. In this work, we propose a column-wise N:M pruning strategy applied at the tile level and modify XNNPACK to enable efficient execution of pruned models on the RISC-V vector architecture. Additionally, we propose fusing the operations of im2col and data packing to minimize redundant memory accesses and memory overhead. To further optimize performance, we incorporate AITemplate's profiling technique to identify the optimal implementation for each convolutional operator. Our proposed approach effectively increases ResNet inference throughput by as much as 4.0x, and preserves ImageNet top-1 accuracy within 2.1\% of the dense baseline.
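As a rough illustration of N:M sparsity applied to columns within a tile, the sketch below keeps, in every group of M consecutive columns, the N columns with the largest L1 norm and zeroes the others. This is one plausible reading of "column-wise N:M pruning at the tile level", stated as an assumption; the paper's actual selection criterion, tile shape, and XNNPACK integration may differ, and the function name is hypothetical.

```python
from typing import List


def prune_tile_columnwise(tile: List[List[float]], n: int, m: int) -> List[List[float]]:
    """Within each group of m consecutive columns, keep the n columns with the
    largest L1 norm over the tile's rows and zero out the remaining columns."""
    rows, cols = len(tile), len(tile[0])
    if cols % m != 0 or not 0 < n <= m:
        raise ValueError("need cols divisible by m and 0 < n <= m")
    pruned = [row[:] for row in tile]  # leave the input untouched
    for start in range(0, cols, m):
        group = list(range(start, start + m))
        norms = {c: sum(abs(tile[r][c]) for r in range(rows)) for c in group}
        # Assumed tie-break: the lower column index wins.
        keep = set(sorted(group, key=lambda c: (-norms[c], c))[:n])
        for c in group:
            if c not in keep:
                for r in range(rows):
                    pruned[r][c] = 0.0
    return pruned


if __name__ == "__main__":
    tile = [[1.0, -4.0, 0.5, 2.0],
            [2.0, 3.0, 0.1, -1.0]]
    print(prune_tile_columnwise(tile, n=2, m=4))
```

Because every row of the tile shares the same column mask, the surviving columns can be packed contiguously, which is the kind of regularity that tends to suit vector instructions.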
Submitted 23 July, 2025;
originally announced July 2025.
-
Integrating Belief Domains into Probabilistic Logic Programs
Authors:
Damiano Azzolini,
Fabrizio Riguzzi,
Theresa Swift
Abstract:
Probabilistic Logic Programming (PLP) under the Distribution Semantics is a leading approach to practical reasoning under uncertainty. An advantage of the Distribution Semantics is its suitability for implementation as a Prolog or Python library, available through two well-maintained implementations, namely ProbLog and cplint/PITA. However, current formulations of the Distribution Semantics use point-probabilities, making it difficult to express epistemic uncertainty, such as arises from, for example, hierarchical classifications from computer vision models. Belief functions generalize probability measures as non-additive capacities, and address epistemic uncertainty via interval probabilities. This paper introduces interval-based Capacity Logic Programs based on an extension of the Distribution Semantics to include belief functions, and describes properties of the new framework that make it amenable to practical applications.
Submitted 23 July, 2025;
originally announced July 2025.
-
Compliance Brain Assistant: Conversational Agentic AI for Assisting Compliance Tasks in Enterprise Environments
Authors:
Shitong Zhu,
Chenhao Fang,
Derek Larson,
Neel Reddy Pochareddy,
Rajeev Rao,
Sophie Zeng,
Yanqing Peng,
Wendy Summer,
Alex Goncalves,
Arya Pudota,
Herve Robert
Abstract:
This paper presents Compliance Brain Assistant (CBA), a conversational, agentic AI assistant designed to boost the efficiency of daily compliance tasks for personnel in enterprise environments. To strike a good balance between response quality and latency, we design a user query router that can intelligently choose between (i) FastTrack mode: to handle simple requests that only need additional relevant context retrieved from knowledge corpora; and (ii) FullAgentic mode: to handle complicated requests that need composite actions and tool invocations to proactively discover context across various compliance artifacts, and/or involving other APIs/models for accommodating requests. A typical example would be to start with a user query, use its description to find a specific entity and then use the entity's information to query other APIs for curating and enriching the final AI response.
Our experimental evaluations compared CBA against an out-of-the-box LLM on various real-world privacy/compliance-related queries targeting various personas. We found that CBA substantially improved upon the vanilla LLM's performance on metrics such as average keyword match rate (83.7% vs. 41.7%) and LLM-judge pass rate (82.0% vs. 20.0%). We also compared metrics for the full routing-based design against the `fast-track only` and `full-agentic` modes and found that it had a better average match-rate and pass-rate while keeping the run-time approximately the same. This finding validated our hypothesis that the routing mechanism leads to a good trade-off between the two worlds.
Submitted 23 July, 2025;
originally announced July 2025.
-
Fully Automated SAM for Single-source Domain Generalization in Medical Image Segmentation
Authors:
Huanli Zhuo,
Leilei Ma,
Haifeng Zhao,
Shiwei Zhou,
Dengdi Sun,
Yanping Fu
Abstract:
Although SAM-based single-source domain generalization models for medical image segmentation can mitigate the impact of domain shift on the model in cross-domain scenarios, these models still face two major challenges. First, the segmentation of SAM is highly dependent on domain-specific expert-annotated prompts, which prevents SAM from achieving fully automated medical image segmentation and therefore limits its application in clinical settings. Second, providing poor prompts (such as bounding boxes that are too small or too large) to the SAM prompt encoder can mislead SAM into generating incorrect mask results. Therefore, we propose the FA-SAM, a single-source domain generalization framework for medical image segmentation that achieves fully automated SAM. FA-SAM introduces two key innovations: an Auto-prompted Generation Model (AGM) branch equipped with a Shallow Feature Uncertainty Modeling (SUFM) module, and an Image-Prompt Embedding Fusion (IPEF) module integrated into the SAM mask decoder. Specifically, AGM models the uncertainty distribution of shallow features through the SUFM module to generate bounding box prompts for the target domain, enabling fully automated segmentation with SAM. The IPEF module integrates multiscale information from SAM image embeddings and prompt embeddings to capture global and local details of the target object, enabling SAM to mitigate the impact of poor prompts. Extensive experiments on publicly available prostate and fundus vessel datasets validate the effectiveness of FA-SAM and highlight its potential to address the above challenges.
Submitted 23 July, 2025;
originally announced July 2025.
-
Understanding Prompt Programming Tasks and Questions
Authors:
Jenny T. Liang,
Chenyang Yang,
Agnia Sergeyuk,
Travis D. Breaux,
Brad A. Myers
Abstract:
Prompting foundation models (FMs) like large language models (LLMs) has enabled new AI-powered software features (e.g., text summarization) that previously were only possible by fine-tuning FMs. Now, developers are embedding prompts in software, known as prompt programs. The process of prompt programming requires the developer to make many changes to their prompt. Yet, the questions developers ask to update their prompt are unknown, despite the answers to these questions affecting how developers plan their changes. With the growing number of research and commercial prompt programming tools, it is unclear whether prompt programmers' needs are being adequately addressed. We address these challenges by developing a taxonomy of 25 tasks prompt programmers do and 51 questions they ask, measuring the importance of each task and question. We interview 16 prompt programmers, observe 8 developers make prompt changes, and survey 50 developers. We then compare the taxonomy with 48 research and commercial tools. We find that prompt programming is not well-supported: all tasks are done manually, and 16 of the 51 questions, including a majority of the most important ones, remain unanswered. Based on this, we outline important opportunities for prompt programming tools.
Submitted 23 July, 2025;
originally announced July 2025.
-
Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs
Authors:
Eyal German,
Sagiv Antebi,
Daniel Samira,
Asaf Shabtai,
Yuval Elovici
Abstract:
Large language models (LLMs) are increasingly trained on tabular data, which, unlike unstructured text, often contains personally identifiable information (PII) in a highly structured and explicit format. As a result, privacy risks arise, since sensitive records can be inadvertently retained by the model and exposed through data extraction or membership inference attacks (MIAs). While existing MIA methods primarily target textual content, their efficacy and threat implications may differ when applied to structured data, due to its limited content, diverse data types, unique value distributions, and column-level semantics. In this paper, we present Tab-MIA, a benchmark dataset for evaluating MIAs on tabular data in LLMs and demonstrate how it can be used. Tab-MIA comprises five data collections, each represented in six different encoding formats. Using our Tab-MIA benchmark, we conduct the first evaluation of state-of-the-art MIA methods on LLMs finetuned with tabular data across multiple encoding formats. In the evaluation, we analyze the memorization behavior of pretrained LLMs on structured data derived from Wikipedia tables. Our findings show that LLMs memorize tabular data in ways that vary across encoding formats, making them susceptible to extraction via MIAs. Even when fine-tuned for as few as three epochs, models exhibit high vulnerability, with AUROC scores approaching 90% in most cases. Tab-MIA enables systematic evaluation of these risks and provides a foundation for developing privacy-preserving methods for tabular data in LLMs.
Submitted 23 July, 2025;
originally announced July 2025.
-
Reality Proxy: Fluid Interactions with Real-World Objects in MR via Abstract Representations
Authors:
Xiaoan Liu,
Difan Jia,
Xianhao Carton Liu,
Mar Gonzalez-Franco,
Chen Zhu-Tian
Abstract:
Interacting with real-world objects in Mixed Reality (MR) often proves difficult when they are crowded, distant, or partially occluded, hindering straightforward selection and manipulation. We observe that these difficulties stem from performing interaction directly on physical objects, where input is tightly coupled to their physical constraints. Our key insight is to decouple interaction from these constraints by introducing proxies: abstract representations of real-world objects. We embody this concept in Reality Proxy, a system that seamlessly shifts interaction targets from physical objects to their proxies during selection. Beyond facilitating basic selection, Reality Proxy uses AI to enrich proxies with semantic attributes and hierarchical spatial relationships of their corresponding physical objects, enabling novel and previously cumbersome interactions in MR (such as skimming, attribute-based filtering, navigating nested groups, and complex multi-object selections), all without requiring new gestures or menu systems. We demonstrate Reality Proxy's versatility across diverse scenarios, including office information retrieval, large-scale spatial navigation, and multi-drone control. An expert evaluation supports the system's utility and usability, suggesting that proxy-based abstractions offer a powerful and generalizable interaction paradigm for future MR systems.
Submitted 23 July, 2025;
originally announced July 2025.
-
DistrAttention: An Efficient and Flexible Self-Attention Mechanism on Modern GPUs
Authors:
Haolin Jin,
Mengbai Xiao,
Yuan Yuan,
Xiao Zhang,
Dongxiao Yu,
Guanghui Zhang,
Haoliang Wang
Abstract:
The Transformer architecture has revolutionized deep learning, delivering state-of-the-art performance in areas such as natural language processing, computer vision, and time series prediction. However, its core component, self-attention, has quadratic time complexity relative to input sequence length, which hinders the scalability of Transformers. The existing approaches to optimizing self-attention either discard full-contextual information or lack flexibility. In this work, we design DistrAttention, an efficient and flexible self-attention mechanism with the full context. DistrAttention achieves this by grouping data on the embedding dimensionality, usually referred to as $d$. We realize DistrAttention with a lightweight sampling and fusion method that exploits locality-sensitive hashing to group similar data. A block-wise grouping framework is further designed to limit the errors introduced by locality-sensitive hashing. By optimizing the selection of block sizes, DistrAttention can be easily integrated with FlashAttention-2, achieving high performance on modern GPUs. We evaluate DistrAttention with extensive experiments. The results show that our method is 37% faster than FlashAttention-2 at computing self-attention. In ViT inference, DistrAttention is the fastest and the most accurate among approximate self-attention mechanisms. In Llama3-1B, DistrAttention still achieves the lowest inference time with only 1% accuracy loss.
Submitted 23 July, 2025;
originally announced July 2025.
-
On the Feasibility of Quantum Unit Testing
Authors:
Andriy Miranskyy,
José Campos,
Anila Mjeda,
Lei Zhang,
Ignacio García Rodríguez de Guzmán
Abstract:
The increasing complexity of quantum software presents significant challenges for software verification and validation, particularly in the context of unit testing. This work presents a comprehensive study of quantum-centric unit tests, comparing traditional statistical approaches with tests specifically designed for quantum circuits. These include tests that run only on a classical computer, such as the Statevector test, as well as those executable on quantum hardware, such as the Swap test and the novel Inverse test. Through an empirical study and detailed analysis of 1,796,880 mutated quantum circuits, we investigate (a) each test's ability to detect subtle discrepancies between the expected and actual states of a quantum circuit, and (b) the number of measurements required to achieve high reliability. The results demonstrate that quantum-centric tests, particularly the Statevector test and the Inverse test, provide clear advantages in terms of precision and efficiency, reducing both false positives and false negatives compared to statistical tests. This work contributes to the development of more robust and scalable strategies for testing quantum software, supporting the future adoption of fault-tolerant quantum computers and promoting more reliable practices in quantum software engineering.
Submitted 23 July, 2025;
originally announced July 2025.