Search | arXiv e-print repository

Towards Autonomous Sustainability Assessment via Multimodal AI Agents

Authors: Zhihan Zhang, Alexander Metzger, Yuxuan Mei, Felix Hähnlein, Zachary Englhardt, Tingyu Cheng, Gregory D. Abowd, Shwetak Patel, Adriana Schulz, Vikram Iyer

Abstract: Interest in sustainability information has surged in recent years. However, the data required for a life cycle assessment (LCA) that maps the materials and processes from product manufacturing to disposal into environmental impacts (EI) are often unavailable. Here we reimagine conventional LCA by introducing multimodal AI agents that emulate interactions between LCA experts and stakeholders like p… ▽ More Interest in sustainability information has surged in recent years. However, the data required for a life cycle assessment (LCA) that maps the materials and processes from product manufacturing to disposal into environmental impacts (EI) are often unavailable. Here we reimagine conventional LCA by introducing multimodal AI agents that emulate interactions between LCA experts and stakeholders like product managers and engineers to calculate the cradle-to-gate (production) carbon emissions of electronic devices. The AI agents iteratively generate a detailed life-cycle inventory leveraging a custom data abstraction and software tools that extract information from online text and images from repair communities and government certifications. This approach reduces weeks or months of expert time to under one minute and closes data availability gaps while yielding carbon footprint estimates within 19% of expert LCAs with zero proprietary data. Additionally, we develop a method to directly estimate EI by comparing an input to a cluster of products with similar descriptions and known carbon footprints. This runs in 3 ms on a laptop with a MAPE of 12.28% on electronic products. Further, we develop a data-driven method to generate emission factors. We use the properties of an unknown material to represent it as a weighted sum of emission factors for similar materials. Compared to human experts picking the closest LCA database entry, this improves MAPE by 120.26%. We analyze the data and compute scaling of this approach and discuss its implications for future LCA workflows. △ Less

Submitted 22 July, 2025; originally announced July 2025.

arXiv:2507.11889 [pdf, ps, other]

NemeSys: An Online Underwater Explorer with Goal-Driven Adaptive Autonomy

Authors: Adnan Abdullah, Alankrit Gupta, Vaishnav Ramesh, Shivali Patel, Md Jahidul Islam

Abstract: Adaptive mission control and dynamic parameter reconfiguration are essential for autonomous underwater vehicles (AUVs) operating in GPS-denied, communication-limited marine environments. However, most current AUV platforms execute static, pre-programmed missions or rely on tethered connections and high-latency acoustic channels for mid-mission updates, significantly limiting their adaptability and… ▽ More Adaptive mission control and dynamic parameter reconfiguration are essential for autonomous underwater vehicles (AUVs) operating in GPS-denied, communication-limited marine environments. However, most current AUV platforms execute static, pre-programmed missions or rely on tethered connections and high-latency acoustic channels for mid-mission updates, significantly limiting their adaptability and responsiveness. In this paper, we introduce NemeSys, a novel AUV system designed to support real-time mission reconfiguration through compact optical and magnetoelectric (OME) signaling facilitated by floating buoys. We present the full system design, control architecture, and a semantic mission encoding framework that enables interactive exploration and task adaptation via low-bandwidth communication. The proposed system is validated through analytical modeling, controlled experimental evaluations, and open-water trials. Results confirm the feasibility of online mission adaptation and semantic task updates, highlighting NemeSys as an online AUV platform for goal-driven adaptive autonomy in dynamic and uncertain underwater environments. △ Less

Submitted 16 July, 2025; originally announced July 2025.

Comments: 10 pages, V1

arXiv:2507.00990 [pdf, ps, other]

Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations

Authors: Shivansh Patel, Shraddhaa Mohan, Hanlin Mai, Unnat Jain, Svetlana Lazebnik, Yunzhu Li

Abstract: This work introduces Robots Imitating Generated Videos (RIGVid), a system that enables robots to perform complex manipulation tasks--such as pouring, wiping, and mixing--purely by imitating AI-generated videos, without requiring any physical demonstrations or robot-specific training. Given a language command and an initial scene image, a video diffusion model generates potential demonstration vide… ▽ More This work introduces Robots Imitating Generated Videos (RIGVid), a system that enables robots to perform complex manipulation tasks--such as pouring, wiping, and mixing--purely by imitating AI-generated videos, without requiring any physical demonstrations or robot-specific training. Given a language command and an initial scene image, a video diffusion model generates potential demonstration videos, and a vision-language model (VLM) automatically filters out results that do not follow the command. A 6D pose tracker then extracts object trajectories from the video, and the trajectories are retargeted to the robot in an embodiment-agnostic fashion. Through extensive real-world evaluations, we show that filtered generated videos are as effective as real demonstrations, and that performance improves with generation quality. We also show that relying on generated videos outperforms more compact alternatives such as keypoint prediction using VLMs, and that strong 6D pose tracking outperforms other ways to extract trajectories, such as dense feature point tracking. These findings suggest that videos produced by a state-of-the-art off-the-shelf model can offer an effective source of supervision for robotic manipulation. △ Less

Submitted 4 July, 2025; v1 submitted 1 July, 2025; originally announced July 2025.

Comments: Project Page: https://rigvid-robot.github.io/

arXiv:2506.23414 [pdf, ps, other]

A High-Throughput Platform to Bench Test Smartphone-Based Heart Rate Measurements Derived From Video

Authors: Ming-Zher Poh, Jonathan Wang, Jonathan Hsu, Lawrence Cai, Eric Teasley, James A. Taylor, Jameson K. Rogers, Anupam Pathak, Shwetak Patel

Abstract: Smartphone-based heart rate (HR) monitoring apps using finger-over-camera photoplethysmography (PPG) face significant challenges in performance evaluation and device compatibility due to device variability and fragmentation. Manual testing is impractical, and standardized methods are lacking. This paper presents a novel, high-throughput bench-testing platform to address this critical need. We desi… ▽ More Smartphone-based heart rate (HR) monitoring apps using finger-over-camera photoplethysmography (PPG) face significant challenges in performance evaluation and device compatibility due to device variability and fragmentation. Manual testing is impractical, and standardized methods are lacking. This paper presents a novel, high-throughput bench-testing platform to address this critical need. We designed a system comprising a test rig capable of holding 12 smartphones for parallel testing, a method for generating synthetic PPG test videos with controllable HR and signal quality, and a host machine for coordinating video playback and data logging. The system achieved a mean absolute percentage error (MAPE) of 0.11% +/- 0.001% between input and measured HR, and a correlation coefficient of 0.92 +/- 0.008 between input and measured PPG signals using a clinically-validated smartphone-based HR app. Bench-testing results of 20 different smartphone models correctly classified all the devices as meeting the ANSI/CTA accuracy standards for HR monitors (MAPE <10%) when compared to a prospective clinical study with 80 participants, demonstrating high positive predictive value. This platform offers a scalable solution for pre-deployment testing of smartphone HR apps to improve app performance, ensure device compatibility, and advance the field of mobile health. △ Less

Submitted 29 June, 2025; originally announced June 2025.

arXiv:2506.17484 [pdf, ps, other]

doi 10.1145/XXXXXX.XXXXXX

From Unstructured Communication to Intelligent RAG: Multi-Agent Automation for Supply Chain Knowledge Bases

Authors: Yao Zhang, Zaixi Shang, Silpan Patel, Mikel Zuniga

Abstract: Supply chain operations generate vast amounts of operational data; however, critical knowledge such as system usage practices, troubleshooting workflows, and resolution techniques often remains buried within unstructured communications like support tickets, emails, and chat logs. While RAG systems aim to leverage such communications as a knowledge base, their effectiveness is limited by raw data c… ▽ More Supply chain operations generate vast amounts of operational data; however, critical knowledge such as system usage practices, troubleshooting workflows, and resolution techniques often remains buried within unstructured communications like support tickets, emails, and chat logs. While RAG systems aim to leverage such communications as a knowledge base, their effectiveness is limited by raw data challenges: support tickets are typically noisy, inconsistent, and incomplete, making direct retrieval suboptimal. Unlike existing RAG approaches that focus on runtime optimization, we introduce a novel offline-first methodology that transforms these communications into a structured knowledge base. Our key innovation is a LLMs-based multi-agent system orchestrating three specialized agents: Category Discovery for taxonomy creation, Categorization for ticket grouping, and Knowledge Synthesis for article generation. Applying our methodology to real-world support tickets with resolution notes and comments, our system creates a compact knowledge base - reducing total volume to just 3.4% of original ticket data while improving quality. Experiments demonstrate that our prebuilt knowledge base in RAG systems significantly outperforms traditional RAG implementations (48.74% vs. 38.60% helpful answers) and achieves a 77.4% reduction in unhelpful responses. By automating institutional knowledge capture that typically remains siloed in experts' heads, our solution translates to substantial operational efficiency: reducing support workload, accelerating resolution times, and creating self-improving systems that automatically resolve approximately 50% of future supply chain tickets. Our approach addresses a key gap in knowledge management by transforming transient communications into structured, reusable knowledge through intelligent offline processing rather than latency-inducing runtime architectures. △ Less

Submitted 20 June, 2025; originally announced June 2025.

Comments: Accepted In Proceedings of the 1st Workshop on AI for Supply Chain: Today and Future @ 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD 25), August 3, 2025, Toronto, ON, Canada. ACM, New York, NY, USA, 14 pages, 2 figures

arXiv:2506.09108 [pdf, ps, other]

SensorLM: Learning the Language of Wearable Sensors

Authors: Yuwei Zhang, Kumar Ayush, Siyuan Qiao, A. Ali Heydari, Girish Narayanswamy, Maxwell A. Xu, Ahmed A. Metwally, Shawn Xu, Jake Garrison, Xuhai Xu, Tim Althoff, Yun Liu, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Cecilia Mascolo, Xin Liu, Daniel McDuff, Yuzhe Yang

Abstract: We present SensorLM, a family of sensor-language foundation models that enable wearable sensor data understanding with natural language. Despite its pervasive nature, aligning and interpreting sensor data with language remains challenging due to the lack of paired, richly annotated sensor-text descriptions in uncurated, real-world wearable data. We introduce a hierarchical caption generation pipel… ▽ More We present SensorLM, a family of sensor-language foundation models that enable wearable sensor data understanding with natural language. Despite its pervasive nature, aligning and interpreting sensor data with language remains challenging due to the lack of paired, richly annotated sensor-text descriptions in uncurated, real-world wearable data. We introduce a hierarchical caption generation pipeline designed to capture statistical, structural, and semantic information from sensor data. This approach enabled the curation of the largest sensor-language dataset to date, comprising over 59.7 million hours of data from more than 103,000 people. Furthermore, SensorLM extends prominent multimodal pretraining architectures (e.g., CLIP, CoCa) and recovers them as specific variants within a generic architecture. Extensive experiments on real-world tasks in human activity analysis and healthcare verify the superior performance of SensorLM over state-of-the-art in zero-shot recognition, few-shot learning, and cross-modal retrieval. SensorLM also demonstrates intriguing capabilities including scaling behaviors, label efficiency, sensor captioning, and zero-shot generalization to unseen tasks. △ Less

Submitted 10 June, 2025; originally announced June 2025.

arXiv:2506.08249 [pdf, other]

RADAR: Benchmarking Language Models on Imperfect Tabular Data

Authors: Ken Gu, Zhihan Zhang, Kate Lin, Yuwei Zhang, Akshay Paruchuri, Hong Yu, Mehran Kazemi, Kumar Ayush, A. Ali Heydari, Maxwell A. Xu, Girish Narayanswamy, Yun Liu, Ming-Zher Poh, Yuzhe Yang, Mark Malhotra, Shwetak Patel, Hamid Palangi, Xuhai Xu, Daniel McDuff, Tim Althoff, Xin Liu

Abstract: Language models (LMs) are increasingly being deployed to perform autonomous data analyses. However, their data awareness -- the ability to recognize, reason over, and appropriately handle data artifacts such as missing values, outliers, and logical inconsistencies -- remains underexplored. These artifacts are especially common in real-world tabular data and, if mishandled, can significantly compro… ▽ More Language models (LMs) are increasingly being deployed to perform autonomous data analyses. However, their data awareness -- the ability to recognize, reason over, and appropriately handle data artifacts such as missing values, outliers, and logical inconsistencies -- remains underexplored. These artifacts are especially common in real-world tabular data and, if mishandled, can significantly compromise the validity of analytical conclusions. To address this gap, we present RADAR, a benchmark for systematically evaluating data-aware reasoning on tabular data. We develop a framework to simulate data artifacts via programmatic perturbations to enable targeted evaluation of model behavior. RADAR comprises 2980 table query pairs, grounded in real-world data spanning 9 domains and 5 data artifact types. In addition to evaluating artifact handling, RADAR systematically varies table size to study how reasoning performance holds when increasing table size. Our evaluation reveals that, despite decent performance on tables without data artifacts, frontier models degrade significantly when data artifacts are introduced, exposing critical gaps in their capacity for robust, data-aware analysis. Designed to be flexible and extensible, RADAR supports diverse perturbation types and controllable table sizes, offering a valuable resource for advancing tabular reasoning. △ Less

Submitted 9 June, 2025; originally announced June 2025.

arXiv:2506.07621 [pdf, ps, other]

LoRMA: Low-Rank Multiplicative Adaptation for LLMs

Authors: Harsh Bihany, Shubham Patel, Ashutosh Modi

Abstract: Large Language Models have shown remarkable capabilities in the NLP domain. Their effectiveness can mainly be attributed to their ability to adapt to an array of downstream tasks. However, generally, full fine-tuning is a computationally expensive job. To mitigate this, many techniques have been developed that prime efficiency, a prominent one being Low-Rank Adaptation (LoRA). However, LoRA and it… ▽ More Large Language Models have shown remarkable capabilities in the NLP domain. Their effectiveness can mainly be attributed to their ability to adapt to an array of downstream tasks. However, generally, full fine-tuning is a computationally expensive job. To mitigate this, many techniques have been developed that prime efficiency, a prominent one being Low-Rank Adaptation (LoRA). However, LoRA and its variants employ re-parametrized additive updates. In this paper, we propose Low-Rank Multiplicative Adaptation (LoRMA), which shifts the paradigm of additive updates to a richer space of matrix multiplicative transformations. We tackle challenges such as computational complexity and rank bottleneck of matrix multiplication by effectively re-ordering operations and introducing rank inflation strategies. We conduct extensive experiments to demonstrate the effectiveness of our approach in terms of various evaluation metrics. △ Less

Submitted 9 June, 2025; originally announced June 2025.

Comments: Accepted at ACL Findings 2025; 21 pages (9 main paper + 5 pages references + 7 pages appendix)

arXiv:2506.05321 [pdf, other]

LSM-2: Learning from Incomplete Wearable Sensor Data

Authors: Maxwell A. Xu, Girish Narayanswamy, Kumar Ayush, Dimitris Spathis, Shun Liao, Shyam A. Tailor, Ahmed Metwally, A. Ali Heydari, Yuwei Zhang, Jake Garrison, Samy Abdel-Ghaffar, Xuhai Xu, Ken Gu, Jacob Sunshine, Ming-Zher Poh, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Mark Malhotra, Shwetak Patel, Yuzhe Yang, James M. Rehg, Xin Liu, Daniel McDuff

Abstract: Foundation models, a cornerstone of recent advancements in machine learning, have predominantly thrived on complete and well-structured data. Wearable sensor data frequently suffers from significant missingness, posing a substantial challenge for self-supervised learning (SSL) models that typically assume complete data inputs. This paper introduces the second generation of Large Sensor Model (LSM-… ▽ More Foundation models, a cornerstone of recent advancements in machine learning, have predominantly thrived on complete and well-structured data. Wearable sensor data frequently suffers from significant missingness, posing a substantial challenge for self-supervised learning (SSL) models that typically assume complete data inputs. This paper introduces the second generation of Large Sensor Model (LSM-2) with Adaptive and Inherited Masking (AIM), a novel SSL approach that learns robust representations directly from incomplete data without requiring explicit imputation. AIM's core novelty lies in its use of learnable mask tokens to model both existing ("inherited") and artificially introduced missingness, enabling it to robustly handle fragmented real-world data during inference. Pre-trained on an extensive dataset of 40M hours of day-long multimodal sensor data, our LSM-2 with AIM achieves the best performance across a diverse range of tasks, including classification, regression and generative modeling. Furthermore, LSM-2 with AIM exhibits superior scaling performance, and critically, maintains high performance even under targeted missingness scenarios, reflecting clinically coherent patterns, such as the diagnostic value of nighttime biosignals for hypertension prediction. This makes AIM a more reliable choice for real-world wearable data applications. △ Less

Submitted 5 June, 2025; originally announced June 2025.

Comments: Xu and Narayanswamy are co-first authors. McDuff and Liu are co-last authors

arXiv:2506.04515 [pdf, ps, other]

The Latent Space Hypothesis: Toward Universal Medical Representation Learning

Authors: Salil Patel

Abstract: Medical data range from genomic sequences and retinal photographs to structured laboratory results and unstructured clinical narratives. Although these modalities appear disparate, many encode convergent information about a single underlying physiological state. The Latent Space Hypothesis frames each observation as a projection of a unified, hierarchically organized manifold -- much like shadows… ▽ More Medical data range from genomic sequences and retinal photographs to structured laboratory results and unstructured clinical narratives. Although these modalities appear disparate, many encode convergent information about a single underlying physiological state. The Latent Space Hypothesis frames each observation as a projection of a unified, hierarchically organized manifold -- much like shadows cast by the same three-dimensional object. Within this learned geometric representation, an individual's health status occupies a point, disease progression traces a trajectory, and therapeutic intervention corresponds to a directed vector. Interpreting heterogeneous evidence in a shared space provides a principled way to re-examine eponymous conditions -- such as Parkinson's or Crohn's -- that often mask multiple pathophysiological entities and involve broader anatomical domains than once believed. By revealing sub-trajectories and patient-specific directions of change, the framework supplies a quantitative rationale for personalised diagnosis, longitudinal monitoring, and tailored treatment, moving clinical practice away from grouping by potentially misleading labels toward navigation of each person's unique trajectory. Challenges remain -- bias amplification, data scarcity for rare disorders, privacy, and the correlation-causation divide -- but scale-aware encoders, continual learning on longitudinal data streams, and perturbation-based validation offer plausible paths forward. △ Less

Submitted 4 June, 2025; originally announced June 2025.

Comments: 51 pages, 12 figures. A position paper examining the latent space hypothesis - the proposition that diverse medical data can be represented in shared latent spaces reflecting fundamental biological processes. The paper discusses theoretical foundations, reviews supporting evidence, and considers potential implications for medical AI and representation learning

arXiv:2506.04467 [pdf]

Diffusion Transformer-based Universal Dose Denoising for Pencil Beam Scanning Proton Therapy

Authors: Yuzhen Ding, Jason Holmes, Hongying Feng, Martin Bues, Lisa A. McGee, Jean-Claude M. Rwigema, Nathan Y. Yu, Terence S. Sio, Sameer R. Keole, William W. Wong, Steven E. Schild, Jonathan B. Ashman, Sujay A. Vora, Daniel J. Ma, Samir H. Patel, Wei Liu

Abstract: Purpose: Intensity-modulated proton therapy (IMPT) offers precise tumor coverage while sparing organs at risk (OARs) in head and neck (H&N) cancer. However, its sensitivity to anatomical changes requires frequent adaptation through online adaptive radiation therapy (oART), which depends on fast, accurate dose calculation via Monte Carlo (MC) simulations. Reducing particle count accelerates MC but… ▽ More Purpose: Intensity-modulated proton therapy (IMPT) offers precise tumor coverage while sparing organs at risk (OARs) in head and neck (H&N) cancer. However, its sensitivity to anatomical changes requires frequent adaptation through online adaptive radiation therapy (oART), which depends on fast, accurate dose calculation via Monte Carlo (MC) simulations. Reducing particle count accelerates MC but degrades accuracy. To address this, denoising low-statistics MC dose maps is proposed to enable fast, high-quality dose generation. Methods: We developed a diffusion transformer-based denoising framework. IMPT plans and 3D CT images from 80 H&N patients were used to generate noisy and high-statistics dose maps using MCsquare (1 min and 10 min per plan, respectively). Data were standardized into uniform chunks with zero-padding, normalized, and transformed into quasi-Gaussian distributions. Testing was done on 10 H&N, 10 lung, 10 breast, and 10 prostate cancer cases, preprocessed identically. The model was trained with noisy dose maps and CT images as input and high-statistics dose maps as ground truth, using a combined loss of mean square error (MSE), residual loss, and regional MAE (focusing on top/bottom 10% dose voxels). Performance was assessed via MAE, 3D Gamma passing rate, and DVH indices. Results: The model achieved MAEs of 0.195 (H&N), 0.120 (lung), 0.172 (breast), and 0.376 Gy[RBE] (prostate). 3D Gamma passing rates exceeded 92% (3%/2mm) across all sites. DVH indices for clinical target volumes (CTVs) and OARs closely matched the ground truth. Conclusion: A diffusion transformer-based denoising framework was developed and, though trained only on H&N data, generalizes well across multiple disease sites. △ Less

Submitted 4 June, 2025; originally announced June 2025.

arXiv:2505.18839 [pdf, ps, other]

DNF Learning via Locally Mixing Random Walks

Authors: Josh Alman, Shivam Nadimpalli, Shyamal Patel, Rocco A. Servedio

Abstract: We give two results on PAC learning DNF formulas using membership queries in the challenging "distribution-free" learning framework, where learning algorithms must succeed for an arbitrary and unknown distribution over $\{0,1\}^n$. (1) We first give a quasi-polynomial time "list-decoding" algorithm for learning a single term of an unknown DNF formula. More precisely, for any target $s$-term DNF… ▽ More We give two results on PAC learning DNF formulas using membership queries in the challenging "distribution-free" learning framework, where learning algorithms must succeed for an arbitrary and unknown distribution over $\{0,1\}^n$. (1) We first give a quasi-polynomial time "list-decoding" algorithm for learning a single term of an unknown DNF formula. More precisely, for any target $s$-term DNF formula $f = T_1 \vee \cdots \vee T_s$ over $\{0,1\}^n$ and any unknown distribution $D$ over $\{0,1\}^n$, our algorithm, which uses membership queries and random examples from $D$, runs in $\textsf{quasipoly}(n,s)$ time and outputs a list $L$ of candidate terms such that with high probability some term $T_i$ of $f$ belongs to $L$. (2) We then use result (1) to give a $\textsf{quasipoly}(n,s)$-time algorithm, in the distribution-free PAC learning model with membership queries, for learning the class of size-$s$ DNFs in which all terms have the same size. Our algorithm learns using a DNF hypothesis. The key tool used to establish result (1) is a new result on "locally mixing random walks," which, roughly speaking, shows that a random walk on a graph that is covered by a small number of expanders has a non-negligible probability of mixing quickly in a subset of these expanders. △ Less

Submitted 24 May, 2025; originally announced May 2025.

arXiv:2505.06201 [pdf, other]

Decoding Algorithms for Two-dimensional Constacyclic Codes over $\mathbb{F}_q$

Authors: Vidya Sagar, Shikha Patel, Shayan Srinivasa Garani

Abstract: We derive the spectral domain properties of two-dimensional (2-D) $(λ_1, λ_2)$-constacyclic codes over $\mathbb{F}_q$ using the 2-D finite field Fourier transform (FFFT). Based on the spectral nulls of 2-D $(λ_1, λ_2)$-constacyclic codes, we characterize the structure of 2-D constacyclic coded arrays. The proposed 2-D construction has flexible code rates and works for any code areas, be it odd or… ▽ More We derive the spectral domain properties of two-dimensional (2-D) $(λ_1, λ_2)$-constacyclic codes over $\mathbb{F}_q$ using the 2-D finite field Fourier transform (FFFT). Based on the spectral nulls of 2-D $(λ_1, λ_2)$-constacyclic codes, we characterize the structure of 2-D constacyclic coded arrays. The proposed 2-D construction has flexible code rates and works for any code areas, be it odd or even area. We present an algorithm to detect the location of 2-D errors. Further, we also propose decoding algorithms for extracting the error values using both time and frequency domain properties by exploiting the sparsity that arises due to duality in the time and frequency domains. Through several illustrative examples, we demonstrate the working of the proposed decoding algorithms. △ Less

Submitted 9 May, 2025; originally announced May 2025.

Comments: 26 pages, 1 figure

arXiv:2505.06046 [pdf, other]

Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information

Authors: Joshua Harris, Fan Grayson, Felix Feldman, Timothy Laurence, Toby Nonnenmacher, Oliver Higgins, Leo Loman, Selina Patel, Thomas Finnie, Samuel Collins, Michael Borowitz

Abstract: As Large Language Models (LLMs) become widely accessible, a detailed understanding of their knowledge within specific domains becomes necessary for successful real world use. This is particularly critical in public health, where failure to retrieve relevant, accurate, and current information could significantly impact UK residents. However, currently little is known about LLM knowledge of UK Gover… ▽ More As Large Language Models (LLMs) become widely accessible, a detailed understanding of their knowledge within specific domains becomes necessary for successful real world use. This is particularly critical in public health, where failure to retrieve relevant, accurate, and current information could significantly impact UK residents. However, currently little is known about LLM knowledge of UK Government public health information. To address this issue, this paper introduces a new benchmark, PubHealthBench, with over 8000 questions for evaluating LLMs' Multiple Choice Question Answering (MCQA) and free form responses to public health queries. To create PubHealthBench we extract free text from 687 current UK government guidance documents and implement an automated pipeline for generating MCQA samples. Assessing 24 LLMs on PubHealthBench we find the latest private LLMs (GPT-4.5, GPT-4.1 and o1) have a high degree of knowledge, achieving >90% accuracy in the MCQA setup, and outperform humans with cursory search engine use. However, in the free form setup we see lower performance with no model scoring >75%. Importantly we find in both setups LLMs have higher accuracy on guidance intended for the general public. Therefore, there are promising signs that state of the art (SOTA) LLMs are an increasingly accurate source of public health information, but additional safeguards or tools may still be needed when providing free form responses on public health topics. △ Less

Submitted 15 May, 2025; v1 submitted 9 May, 2025; originally announced May 2025.

Comments: 24 pages, 10 pages main text

MSC Class: 68T50

arXiv:2505.03784 [pdf, ps, other]

Insulin Resistance Prediction From Wearables and Routine Blood Biomarkers

Authors: Ahmed A. Metwally, A. Ali Heydari, Daniel McDuff, Alexandru Solot, Zeinab Esmaeilpour, Anthony Z Faranesh, Menglian Zhou, David B. Savage, Conor Heneghan, Shwetak Patel, Cathy Speed, Javier L. Prieto

Abstract: Insulin resistance, a precursor to type 2 diabetes, is characterized by impaired insulin action in tissues. Current methods for measuring insulin resistance, while effective, are expensive, inaccessible, not widely available and hinder opportunities for early intervention. In this study, we remotely recruited the largest dataset to date across the US to study insulin resistance (N=1,165 participan… ▽ More Insulin resistance, a precursor to type 2 diabetes, is characterized by impaired insulin action in tissues. Current methods for measuring insulin resistance, while effective, are expensive, inaccessible, not widely available and hinder opportunities for early intervention. In this study, we remotely recruited the largest dataset to date across the US to study insulin resistance (N=1,165 participants, with median BMI=28 kg/m2, age=45 years, HbA1c=5.4%), incorporating wearable device time series data and blood biomarkers, including the ground-truth measure of insulin resistance, homeostatic model assessment for insulin resistance (HOMA-IR). We developed deep neural network models to predict insulin resistance based on readily available digital and blood biomarkers. Our results show that our models can predict insulin resistance by combining both wearable data and readily available blood biomarkers better than either of the two data sources separately (R2=0.5, auROC=0.80, Sensitivity=76%, and specificity 84%). The model showed 93% sensitivity and 95% adjusted specificity in obese and sedentary participants, a subpopulation most vulnerable to developing type 2 diabetes and who could benefit most from early intervention. Rigorous evaluation of model performance, including interpretability, and robustness, facilitates generalizability across larger cohorts, which is demonstrated by reproducing the prediction performance on an independent validation cohort (N=72 participants). Additionally, we demonstrated how the predicted insulin resistance can be integrated into a large language model agent to help understand and contextualize HOMA-IR values, facilitating interpretation and safe personalized recommendations. This work offers the potential for early detection of people at risk of type 2 diabetes and thereby facilitate earlier implementation of preventative strategies. △ Less

Submitted 30 April, 2025; originally announced May 2025.

arXiv:2504.20086 [pdf, ps, other]

doi 10.1145/3715275.3732168

Understanding and Mitigating Risks of Generative AI in Financial Services

Authors: Sebastian Gehrmann, Claire Huang, Xian Teng, Sergei Yurovski, Iyanuoluwa Shode, Chirag S. Patel, Arjun Bhorkar, Naveen Thomas, John Doucette, David Rosenberg, Mark Dredze, David Rabinowitz

Abstract: To responsibly develop Generative AI (GenAI) products, it is critical to define the scope of acceptable inputs and outputs. What constitutes a "safe" response is an actively debated question. Academic work puts an outsized focus on evaluating models by themselves for general purpose aspects such as toxicity, bias, and fairness, especially in conversational applications being used by a broad audien… ▽ More To responsibly develop Generative AI (GenAI) products, it is critical to define the scope of acceptable inputs and outputs. What constitutes a "safe" response is an actively debated question. Academic work puts an outsized focus on evaluating models by themselves for general purpose aspects such as toxicity, bias, and fairness, especially in conversational applications being used by a broad audience. In contrast, less focus is put on considering sociotechnical systems in specialized domains. Yet, those specialized systems can be subject to extensive and well-understood legal and regulatory scrutiny. These product-specific considerations need to be set in industry-specific laws, regulations, and corporate governance requirements. In this paper, we aim to highlight AI content safety considerations specific to the financial services domain and outline an associated AI content risk taxonomy. We compare this taxonomy to existing work in this space and discuss implications of risk category violations on various stakeholders. We evaluate how existing open-source technical guardrail solutions cover this taxonomy by assessing them on data collected via red-teaming activities. Our results demonstrate that these guardrails fail to detect most of the content risks we discuss. △ Less

Submitted 25 April, 2025; originally announced April 2025.

Comments: Accepted to FAccT 2025

arXiv:2504.16791 [pdf, other]

Radiometer Calibration using Machine Learning

Authors: S. A. K. Leeney, H. T. J. Bevins, E. de Lera Acedo, W. J. Handley, C. Kirkham, R. S. Patel, J. Zhu, D. Molnar, J. Cumner, D. Anstey, K. Artuc, G. Bernardi, M. Bucher, S. Carey, J. Cavillot, R. Chiello, W. Croukamp, D. I. L. de Villiers, J. A. Ely, A. Fialkov, T. Gessey-Jones, G. Kulkarni, A. Magro, P. D. Meerburg, S. Mittal , et al. (13 additional authors not shown)

Abstract: Radiometers are crucial instruments in radio astronomy, forming the primary component of nearly all radio telescopes. They measure the intensity of electromagnetic radiation, converting this radiation into electrical signals. A radiometer's primary components are an antenna and a Low Noise Amplifier (LNA), which is the core of the ``receiver'' chain. Instrumental effects introduced by the receiver… ▽ More Radiometers are crucial instruments in radio astronomy, forming the primary component of nearly all radio telescopes. They measure the intensity of electromagnetic radiation, converting this radiation into electrical signals. A radiometer's primary components are an antenna and a Low Noise Amplifier (LNA), which is the core of the ``receiver'' chain. Instrumental effects introduced by the receiver are typically corrected or removed during calibration. However, impedance mismatches between the antenna and receiver can introduce unwanted signal reflections and distortions. Traditional calibration methods, such as Dicke switching, alternate the receiver input between the antenna and a well-characterised reference source to mitigate errors by comparison. Recent advances in Machine Learning (ML) offer promising alternatives. Neural networks, which are trained using known signal sources, provide a powerful means to model and calibrate complex systems where traditional analytical approaches struggle. These methods are especially relevant for detecting the faint sky-averaged 21-cm signal from atomic hydrogen at high redshifts. This is one of the main challenges in observational Cosmology today. Here, for the first time, we introduce and test a machine learning-based calibration framework capable of achieving the precision required for radiometric experiments aiming to detect the 21-cm line. △ Less

Submitted 23 April, 2025; originally announced April 2025.

Comments: Under peer review for publication in Nature Scientific Reports as part of the Radio Astronomy collection

arXiv:2504.16065 [pdf, ps, other]

A Mysterious Connection Between Tolerant Junta Testing and Agnostically Learning Conjunctions

Authors: Xi Chen, Shyamal Patel, Rocco A. Servedio

Abstract: The main conceptual contribution of this paper is identifying a previously unnoticed connection between two central problems in computational learning theory and property testing: agnostically learning conjunctions and tolerantly testing juntas. Inspired by this connection, the main technical contribution is a pair of improved algorithms for these two problems. In more detail, - We give a dist… ▽ More The main conceptual contribution of this paper is identifying a previously unnoticed connection between two central problems in computational learning theory and property testing: agnostically learning conjunctions and tolerantly testing juntas. Inspired by this connection, the main technical contribution is a pair of improved algorithms for these two problems. In more detail, - We give a distribution-free algorithm for agnostically PAC learning conjunctions over $\{\pm 1\}^n$ that runs in time $2^{\widetilde{O}(n^{1/3})}$, for constant excess error $\varepsilon$. This improves on the fastest previously published algorithm, which runs in time $2^{\widetilde{O}(n^{1/2})}$ [KKMS08]. - Building on the ideas in our agnostic conjunction learner and using significant additional technical ingredients, we give an adaptive tolerant testing algorithm for $k$-juntas that makes $2^{\widetilde{O}(k^{1/3})}$ queries, for constant "gap parameter" $\varepsilon$ between the "near" and "far" cases. This improves on the best previous results, due to [ITW21, NP24], which make $2^{\widetilde{O}(\sqrt{k})}$ queries. Since there is a known $2^{\widetildeΩ(\sqrt{k})}$ lower bound for non-adaptive tolerant junta testers, our result shows that adaptive tolerant junta testing algorithms provably outperform non-adaptive ones. △ Less

Submitted 22 April, 2025; originally announced April 2025.

arXiv:2503.23339 [pdf, other]

A Scalable Framework for Evaluating Health Language Models

Authors: Neil Mallinar, A. Ali Heydari, Xin Liu, Anthony Z. Faranesh, Brent Winslow, Nova Hammerquist, Benjamin Graef, Cathy Speed, Mark Malhotra, Shwetak Patel, Javier L. Prieto, Daniel McDuff, Ahmed A. Metwally

Abstract: Large language models (LLMs) have emerged as powerful tools for analyzing complex datasets. Recent studies demonstrate their potential to generate useful, personalized responses when provided with patient-specific health information that encompasses lifestyle, biomarkers, and context. As LLM-driven health applications are increasingly adopted, rigorous and efficient one-sided evaluation methodolog… ▽ More Large language models (LLMs) have emerged as powerful tools for analyzing complex datasets. Recent studies demonstrate their potential to generate useful, personalized responses when provided with patient-specific health information that encompasses lifestyle, biomarkers, and context. As LLM-driven health applications are increasingly adopted, rigorous and efficient one-sided evaluation methodologies are crucial to ensure response quality across multiple dimensions, including accuracy, personalization and safety. Current evaluation practices for open-ended text responses heavily rely on human experts. This approach introduces human factors and is often cost-prohibitive, labor-intensive, and hinders scalability, especially in complex domains like healthcare where response assessment necessitates domain expertise and considers multifaceted patient data. In this work, we introduce Adaptive Precise Boolean rubrics: an evaluation framework that streamlines human and automated evaluation of open-ended questions by identifying gaps in model responses using a minimal set of targeted rubrics questions. Our approach is based on recent work in more general evaluation settings that contrasts a smaller set of complex evaluation targets with a larger set of more precise, granular targets answerable with simple boolean responses. We validate this approach in metabolic health, a domain encompassing diabetes, cardiovascular disease, and obesity. Our results demonstrate that Adaptive Precise Boolean rubrics yield higher inter-rater agreement among expert and non-expert human evaluators, and in automated assessments, compared to traditional Likert scales, while requiring approximately half the evaluation time of Likert-based methods. This enhanced efficiency, particularly in automated evaluation and non-expert contributions, paves the way for more extensive and cost-effective evaluation of LLMs in health. △ Less

Submitted 1 April, 2025; v1 submitted 30 March, 2025; originally announced March 2025.

arXiv:2503.19328 [pdf, ps, other]

Substance over Style: Evaluating Proactive Conversational Coaching Agents

Authors: Vidya Srinivas, Xuhai Xu, Xin Liu, Kumar Ayush, Isaac Galatzer-Levy, Shwetak Patel, Daniel McDuff, Tim Althoff

Abstract: While NLP research has made strides in conversational tasks, many approaches focus on single-turn responses with well-defined objectives or evaluation criteria. In contrast, coaching presents unique challenges with initially undefined goals that evolve through multi-turn interactions, subjective evaluation criteria, mixed-initiative dialogue. In this work, we describe and implement five multi-turn… ▽ More While NLP research has made strides in conversational tasks, many approaches focus on single-turn responses with well-defined objectives or evaluation criteria. In contrast, coaching presents unique challenges with initially undefined goals that evolve through multi-turn interactions, subjective evaluation criteria, mixed-initiative dialogue. In this work, we describe and implement five multi-turn coaching agents that exhibit distinct conversational styles, and evaluate them through a user study, collecting first-person feedback on 155 conversations. We find that users highly value core functionality, and that stylistic components in absence of core components are viewed negatively. By comparing user feedback with third-person evaluations from health experts and an LM, we reveal significant misalignment across evaluation approaches. Our findings provide insights into design and evaluation of conversational coaching agents and contribute toward improving human-centered NLP applications. △ Less

Submitted 8 July, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

Comments: Accepted to ACL 2025

arXiv:2503.14893 [pdf, other]

Incorporating Sustainability in Electronics Design: Obstacles and Opportunities

Authors: Zachary Englhardt, Felix Hähnlein, Yuxuan Mei, Tong Lin, Connor Masahiro Sun, Zhihan Zhang, Adriana Schulz, Shwetak Patel, Vikram Iyer

Abstract: Life cycle assessment (LCA) is a methodology for holistically measuring the environmental impact of a product from initial manufacturing to end-of-life disposal. However, the extent to which LCA informs the design of computing devices remains unclear. To understand how this information is collected and applied, we interviewed 17 industry professionals with experience in LCA or electronics design,… ▽ More Life cycle assessment (LCA) is a methodology for holistically measuring the environmental impact of a product from initial manufacturing to end-of-life disposal. However, the extent to which LCA informs the design of computing devices remains unclear. To understand how this information is collected and applied, we interviewed 17 industry professionals with experience in LCA or electronics design, systematically coded the interviews, and investigated common themes. These themes highlight the challenge of LCA data collection and reveal distributed decision-making processes where responsibility for sustainable design choices, and their associated costs, is often ambiguous. Our analysis identifies opportunities for HCI technologies to support LCA computation and its integration into the design process to facilitate sustainability-oriented decision-making. While this work provides a nuanced discussion about sustainable design in the information and communication technologies (ICT) hardware industry, we hope our insights will also be valuable to other sectors. △ Less

Submitted 19 March, 2025; originally announced March 2025.

arXiv:2503.03783 [pdf, other]

Passive Heart Rate Monitoring During Smartphone Use in Everyday Life

Authors: Shun Liao, Paolo Di Achille, Jiang Wu, Silviu Borac, Jonathan Wang, Xin Liu, Eric Teasley, Lawrence Cai, Yuzhe Yang, Yun Liu, Daniel McDuff, Hao-Wei Su, Brent Winslow, Anupam Pathak, Shwetak Patel, James A. Taylor, Jameson K. Rogers, Ming-Zher Poh

Abstract: Resting heart rate (RHR) is an important biomarker of cardiovascular health and mortality, but tracking it longitudinally generally requires a wearable device, limiting its availability. We present PHRM, a deep learning system for passive heart rate (HR) and RHR measurements during everyday smartphone use, using facial video-based photoplethysmography. Our system was developed using 225,773 videos… ▽ More Resting heart rate (RHR) is an important biomarker of cardiovascular health and mortality, but tracking it longitudinally generally requires a wearable device, limiting its availability. We present PHRM, a deep learning system for passive heart rate (HR) and RHR measurements during everyday smartphone use, using facial video-based photoplethysmography. Our system was developed using 225,773 videos from 495 participants and validated on 185,970 videos from 205 participants in laboratory and free-living conditions, representing the largest validation study of its kind. Compared to reference electrocardiogram, PHRM achieved a mean absolute percentage error (MAPE) < 10% for HR measurements across three skin tone groups of light, medium and dark pigmentation; MAPE for each skin tone group was non-inferior versus the others. Daily RHR measured by PHRM had a mean absolute error < 5 bpm compared to a wearable HR tracker, and was associated with known risk factors. These results highlight the potential of smartphones to enable passive and equitable heart health monitoring. △ Less

Submitted 21 March, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

Comments: Updated author list

arXiv:2503.00890 [pdf, other]

Estimating Blood Pressure with a Camera: An Exploratory Study of Ambulatory Patients with Cardiovascular Disease

Authors: Theodore Curran, Chengqian Ma, Xin Liu, Daniel McDuff, Girish Narayanswamy, George Stergiou, Shwetak Patel, Eugene Yang

Abstract: Hypertension is a leading cause of morbidity and mortality worldwide. The ability to diagnose and treat hypertension in the ambulatory population is hindered by limited access and poor adherence to current methods of monitoring blood pressure (BP), specifically, cuff-based devices. Remote photoplethysmography (rPPG) evaluates an individual's pulse waveform through a standard camera without physica… ▽ More Hypertension is a leading cause of morbidity and mortality worldwide. The ability to diagnose and treat hypertension in the ambulatory population is hindered by limited access and poor adherence to current methods of monitoring blood pressure (BP), specifically, cuff-based devices. Remote photoplethysmography (rPPG) evaluates an individual's pulse waveform through a standard camera without physical contact. Cameras are readily available to the majority of the global population via embedded technologies such as smartphones, thus rPPG is a scalable and promising non-invasive method of BP monitoring. The few studies investigating rPPG for BP measurement have excluded high-risk populations, including those with cardiovascular disease (CVD) or its risk factors, as well as subjects in active cardiac arrhythmia. The impact of arrhythmia, like atrial fibrillation, on the prediction of BP using rPPG is currently uncertain. We performed a study to better understand the relationship between rPPG and BP in a real-world sample of ambulatory patients from a cardiology clinic with established CVD or risk factors for CVD. We collected simultaneous rPPG, PPG, BP, ECG, and other vital signs data from 143 subjects while at rest, and used this data plus demographics to train a deep learning model to predict BP. We report that facial rPPG yields a signal that is comparable to finger PPG. Pulse wave analysis (PWA)-based BP estimates on this cohort performed comparably to studies on healthier subjects, and notably, the accuracy of BP prediction in subjects with atrial fibrillation was not inferior to subjects with normal sinus rhythm. In a binary classification task, the rPPG model identified subjects with systolic BP $\geq$ 130 mm Hg with a positive predictive value of 71% (baseline prevalence 48.3%), highlighting the potential of rPPG for hypertension monitoring. △ Less

Submitted 2 March, 2025; originally announced March 2025.

arXiv:2502.08643 [pdf, other]

A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards

Authors: Shivansh Patel, Xinchen Yin, Wenlong Huang, Shubham Garg, Hooshang Nayyeri, Li Fei-Fei, Svetlana Lazebnik, Yunzhu Li

Abstract: Task specification for robotic manipulation in open-world environments is challenging, requiring flexible and adaptive objectives that align with human intentions and can evolve through iterative feedback. We introduce Iterative Keypoint Reward (IKER), a visually grounded, Python-based reward function that serves as a dynamic task specification. Our framework leverages VLMs to generate and refine… ▽ More Task specification for robotic manipulation in open-world environments is challenging, requiring flexible and adaptive objectives that align with human intentions and can evolve through iterative feedback. We introduce Iterative Keypoint Reward (IKER), a visually grounded, Python-based reward function that serves as a dynamic task specification. Our framework leverages VLMs to generate and refine these reward functions for multi-step manipulation tasks. Given RGB-D observations and free-form language instructions, we sample keypoints in the scene and generate a reward function conditioned on these keypoints. IKER operates on the spatial relationships between keypoints, leveraging commonsense priors about the desired behaviors, and enabling precise SE(3) control. We reconstruct real-world scenes in simulation and use the generated rewards to train reinforcement learning (RL) policies, which are then deployed into the real world-forming a real-to-sim-to-real loop. Our approach demonstrates notable capabilities across diverse scenarios, including both prehensile and non-prehensile tasks, showcasing multi-step task execution, spontaneous error recovery, and on-the-fly strategy adjustments. The results highlight IKER's effectiveness in enabling robots to perform multi-step tasks in dynamic environments through iterative reward shaping. △ Less

Submitted 18 February, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

Comments: ICRA 2025, Project Page: https://iker-robot.github.io/

arXiv:2502.03540 [pdf, other]

Path Planning for Masked Diffusion Model Sampling

Authors: Fred Zhangzhi Peng, Zachary Bezemek, Sawan Patel, Jarrid Rector-Brooks, Sherwood Yao, Avishek Joey Bose, Alexander Tong, Pranam Chatterjee

Abstract: Any order generation of discrete data using masked diffusion models (MDMs) offers a compelling alternative to traditional autoregressive models, especially in domains that lack a natural causal ordering of data. However, current popular MDMs depart from their successful continuous diffusion model counterparts with simplified masked inference wherein unmasked tokens cannot be iteratively refined --… ▽ More Any order generation of discrete data using masked diffusion models (MDMs) offers a compelling alternative to traditional autoregressive models, especially in domains that lack a natural causal ordering of data. However, current popular MDMs depart from their successful continuous diffusion model counterparts with simplified masked inference wherein unmasked tokens cannot be iteratively refined -- even if there is a mistake. In this paper, we extract the full power of MDMs by introducing a novel inference sampling strategy termed Path Planning (P2) that decomposes each generation step into two sub-stages: planning and denoising. Under P2, the planner at every step selects appropriate tokens that are marked to be updated, which can then be sampled using the denoiser. We demonstrate that P2 generalizes all existing sampling strategies for MDMs and critically enhances generative quality through the new capability of refining and updating existing unmasked tokens. We theoretically prove that P2 establishes a (new) expanded evidence lower bound (ELBO) on the log marginal likelihood of data. We instantiate P2 with a family of planners including: 1.) Self-Planning, 2.) BERT-Planning, and 3.) Trained-Planning with a learned planner leading to SOTA generative performance for MDMs on a suite of domains. Specifically, solely using P2 inference, we observe relative improvements of 22% in protein sequence foldability, 8% in RNA sequence pLDDT, 4% in math reasoning, 68% in story generation (ROUGE score), and 33% in code generation for the challenging pass@1 metric. △ Less

Submitted 27 May, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

arXiv:2501.17286 [pdf]

Fine-Tuning Open-Source Large Language Models to Improve Their Performance on Radiation Oncology Tasks: A Feasibility Study to Investigate Their Potential Clinical Applications in Radiation Oncology

Authors: Peilong Wang, Zhengliang Liu, Yiwei Li, Jason Holmes, Peng Shu, Lian Zhang, Xiang Li, Quanzheng Li, Brady S. Laughlin, Diego Santos Toesca, Sujay A. Vora, Samir H. Patel, Terence T. Sio, Tianming Liu, Wei Liu

Abstract: Background: The radiation oncology clinical practice involves many steps relying on the dynamic interplay of abundant text data. Large language models have displayed remarkable capabilities in processing complex text information. But their direct applications in specific fields like radiation oncology remain underexplored. Purpose: This study aims to investigate whether fine-tuning LLMs with dom… ▽ More Background: The radiation oncology clinical practice involves many steps relying on the dynamic interplay of abundant text data. Large language models have displayed remarkable capabilities in processing complex text information. But their direct applications in specific fields like radiation oncology remain underexplored. Purpose: This study aims to investigate whether fine-tuning LLMs with domain knowledge can improve the performance on Task (1) treatment regimen generation, Task (2) treatment modality selection (photon, proton, electron, or brachytherapy), and Task (3) ICD-10 code prediction in radiation oncology. Methods: Data for 15,724 patient cases were extracted. Cases where patients had a single diagnostic record, and a clearly identifiable primary treatment plan were selected for preprocessing and manual annotation to have 7,903 cases of the patient diagnosis, treatment plan, treatment modality, and ICD-10 code. Each case was used to construct a pair consisting of patient diagnostics details and an answer (treatment regimen, treatment modality, or ICD-10 code respectively) for the supervised fine-tuning of these three tasks. Open source LLaMA2-7B and Mistral-7B models were utilized for the fine-tuning with the Low-Rank Approximations method. Accuracy and ROUGE-1 score were reported for the fine-tuned models and original models. Clinical evaluation was performed on Task (1) by radiation oncologists, while precision, recall, and F-1 score were evaluated for Task (2) and (3). One-sided Wilcoxon signed-rank tests were used to statistically analyze the results. Results: Fine-tuned LLMs outperformed original LLMs across all tasks with p-value <= 0.001. Clinical evaluation demonstrated that over 60% of the fine-tuned LLMs-generated treatment regimens were clinically acceptable. Precision, recall, and F1-score showed improved performance of fine-tuned LLMs. △ Less

Submitted 28 January, 2025; originally announced January 2025.

arXiv:2501.16680 [pdf, ps, other]

Differentially Private Set Representations

Authors: Sarvar Patel, Giuseppe Persiano, Joon Young Seo, Kevin Yeo

Abstract: We study the problem of differentially private (DP) mechanisms for representing sets of size $k$ from a large universe. Our first construction creates $(ε,δ)$-DP representations with error probability of $1/(e^ε+ 1)$ using space at most $1.05 k ε\cdot \log(e)$ bits where the time to construct a representation is $O(k \log(1/δ))$ while decoding time is $O(\log(1/δ))$. We also present a second algor… ▽ More We study the problem of differentially private (DP) mechanisms for representing sets of size $k$ from a large universe. Our first construction creates $(ε,δ)$-DP representations with error probability of $1/(e^ε+ 1)$ using space at most $1.05 k ε\cdot \log(e)$ bits where the time to construct a representation is $O(k \log(1/δ))$ while decoding time is $O(\log(1/δ))$. We also present a second algorithm for pure $ε$-DP representations with the same error using space at most $k ε\cdot \log(e)$ bits, but requiring large decoding times. Our algorithms match our lower bounds on privacy-utility trade-offs (including constants but ignoring $δ$ factors) and we also present a new space lower bound matching our constructions up to small constant factors. To obtain our results, we design a new approach embedding sets into random linear systems deviating from most prior approaches that inject noise into non-private solutions. △ Less

Submitted 21 July, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

Comments: Appears at NeurIPS 2024

arXiv:2501.16309 [pdf, other]

Evaluating The Performance of Using Large Language Models to Automate Summarization of CT Simulation Orders in Radiation Oncology

Authors: Meiyun Cao, Shaw Hu, Jason Sharp, Edward Clouser, Jason Holmes, Linda L. Lam, Xiaoning Ding, Diego Santos Toesca, Wendy S. Lindholm, Samir H. Patel, Sujay A. Vora, Peilong Wang, Wei Liu

Abstract: Purpose: This study aims to use a large language model (LLM) to automate the generation of summaries from the CT simulation orders and evaluate its performance. Materials and Methods: A total of 607 CT simulation orders for patients were collected from the Aria database at our institution. A locally hosted Llama 3.1 405B model, accessed via the Application Programming Interface (API) service, wa… ▽ More Purpose: This study aims to use a large language model (LLM) to automate the generation of summaries from the CT simulation orders and evaluate its performance. Materials and Methods: A total of 607 CT simulation orders for patients were collected from the Aria database at our institution. A locally hosted Llama 3.1 405B model, accessed via the Application Programming Interface (API) service, was used to extract keywords from the CT simulation orders and generate summaries. The downloaded CT simulation orders were categorized into seven groups based on treatment modalities and disease sites. For each group, a customized instruction prompt was developed collaboratively with therapists to guide the Llama 3.1 405B model in generating summaries. The ground truth for the corresponding summaries was manually derived by carefully reviewing each CT simulation order and subsequently verified by therapists. The accuracy of the LLM-generated summaries was evaluated by therapists using the verified ground truth as a reference. Results: About 98% of the LLM-generated summaries aligned with the manually generated ground truth in terms of accuracy. Our evaluations showed an improved consistency in format and enhanced readability of the LLM-generated summaries compared to the corresponding therapists-generated summaries. This automated approach demonstrated a consistent performance across all groups, regardless of modality or disease site. Conclusions: This study demonstrated the high precision and consistency of the Llama 3.1 405B model in extracting keywords and summarizing CT simulation orders, suggesting that LLMs have great potential to help with this task, reduce the workload of therapists and improve workflow efficiency. △ Less

Submitted 27 January, 2025; originally announced January 2025.

arXiv:2501.14249 [pdf, other]

Humanity's Last Exam

Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes , et al. (1084 additional authors not shown)

Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of… ▽ More Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai. △ Less

Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

Comments: 29 pages, 6 figures

arXiv:2501.03277 [pdf, other]

HonkaiChat: Companions from Anime that feel alive!

Authors: Yueze Liu, Yichi Zhang, Shaan Om Patel, Zhaoyang Zhu, Shilong Guo

Abstract: Modern conversational agents, including anime-themed chatbots, are frequently reactive and personality-driven but fail to capture the dynamic nature of human interactions. We propose an event-driven dialogue framework to address these limitations by embedding dynamic events in conversation prompts and fine-tuning models on character-specific data. Evaluations on GPT-4 and comparisons with industry… ▽ More Modern conversational agents, including anime-themed chatbots, are frequently reactive and personality-driven but fail to capture the dynamic nature of human interactions. We propose an event-driven dialogue framework to address these limitations by embedding dynamic events in conversation prompts and fine-tuning models on character-specific data. Evaluations on GPT-4 and comparisons with industry-leading baselines demonstrate that event-driven prompts significantly improve conversational engagement and naturalness while reducing hallucinations. This paper explores the application of this approach in creating lifelike chatbot interactions within the context of Honkai: Star Rail, showcasing the potential for dynamic event-based systems to transform role-playing and interactive dialogue. △ Less

Submitted 5 January, 2025; originally announced January 2025.

Comments: 5 pages, 4 figures. This is a preprint. Not yet submitted to a journal or conference. More iterated versions to be updated

arXiv:2412.11964 [pdf, other]

BetaExplainer: A Probabilistic Method to Explain Graph Neural Networks

Authors: Whitney Sloneker, Shalin Patel, Michael Wang, Lorin Crawford, Ritambhara Singh

Abstract: Graph neural networks (GNNs) are powerful tools for conducting inference on graph data but are often seen as "black boxes" due to difficulty in extracting meaningful subnetworks driving predictive performance. Many interpretable GNN methods exist, but they cannot quantify uncertainty in edge weights and suffer in predictive accuracy when applied to challenging graph structures. In this work, we pr… ▽ More Graph neural networks (GNNs) are powerful tools for conducting inference on graph data but are often seen as "black boxes" due to difficulty in extracting meaningful subnetworks driving predictive performance. Many interpretable GNN methods exist, but they cannot quantify uncertainty in edge weights and suffer in predictive accuracy when applied to challenging graph structures. In this work, we proposed BetaExplainer which addresses these issues by using a sparsity-inducing prior to mask unimportant edges during model training. To evaluate our approach, we examine various simulated data sets with diverse real-world characteristics. Not only does this implementation provide a notion of edge importance uncertainty, it also improves upon evaluation metrics for challenging datasets compared to state-of-the art explainer methods. △ Less

Submitted 16 December, 2024; originally announced December 2024.

arXiv:2412.09915 [pdf, other]

Two-dimensional Constacyclic Codes over $\mathbb{F}_q$

Authors: Vidya Sagar, Shikha Patel, Shayan Srinivasa Garani

Abstract: We consider two-dimensional $(λ_1, λ_2)$-constacyclic codes over $\mathbb{F}_{q}$ of area $M N$, where $q$ is some power of prime $p$ with $\gcd(M,p)=1$ and $\gcd(N,p)=1$. With the help of common zero (CZ) set, we characterize 2-D constacyclic codes. Further, we provide an algorithm to construct an ideal basis of these codes by using their essential common zero (ECZ) sets. We describe the dual of… ▽ More We consider two-dimensional $(λ_1, λ_2)$-constacyclic codes over $\mathbb{F}_{q}$ of area $M N$, where $q$ is some power of prime $p$ with $\gcd(M,p)=1$ and $\gcd(N,p)=1$. With the help of common zero (CZ) set, we characterize 2-D constacyclic codes. Further, we provide an algorithm to construct an ideal basis of these codes by using their essential common zero (ECZ) sets. We describe the dual of 2-D constacyclic codes. Finally, we provide an encoding scheme for generating 2-D constacyclic codes. We present an example to illustrate that 2-D constacyclic codes can have better minimum distance compared to their cyclic counterparts with the same code area and code rate. △ Less

Submitted 1 May, 2025; v1 submitted 13 December, 2024; originally announced December 2024.

Comments: 25 pages, 1 figure

arXiv:2412.08984 [pdf, other]

Predicting Emergency Department Visits for Patients with Type II Diabetes

Authors: Javad M Alizadeh, Jay S Patel, Gabriel Tajeu, Yuzhou Chen, Ilene L Hollin, Mukesh K Patel, Junchao Fei, Huanmei Wu

Abstract: Over 30 million Americans are affected by Type II diabetes (T2D), a treatable condition with significant health risks. This study aims to develop and validate predictive models using machine learning (ML) techniques to estimate emergency department (ED) visits among patients with T2D. Data for these patients was obtained from the HealthShare Exchange (HSX), focusing on demographic details, diagnos… ▽ More Over 30 million Americans are affected by Type II diabetes (T2D), a treatable condition with significant health risks. This study aims to develop and validate predictive models using machine learning (ML) techniques to estimate emergency department (ED) visits among patients with T2D. Data for these patients was obtained from the HealthShare Exchange (HSX), focusing on demographic details, diagnoses, and vital signs. Our sample contained 34,151 patients diagnosed with T2D which resulted in 703,065 visits overall between 2017 and 2021. A workflow integrated EMR data with SDoH for ML predictions. A total of 87 out of 2,555 features were selected for model construction. Various machine learning algorithms, including CatBoost, Ensemble Learning, K-nearest Neighbors (KNN), Support Vector Classification (SVC), Random Forest, and Extreme Gradient Boosting (XGBoost), were employed with tenfold cross-validation to predict whether a patient is at risk of an ED visit. The ROC curves for Random Forest, XGBoost, Ensemble Learning, CatBoost, KNN, and SVC, were 0.82, 0.82, 0.82, 0.81, 0.72, 0.68, respectively. Ensemble Learning and Random Forest models demonstrated superior predictive performance in terms of discrimination, calibration, and clinical applicability. These models are reliable tools for predicting risk of ED visits among patients with T2D. They can estimate future ED demand and assist clinicians in identifying critical factors associated with ED utilization, enabling early interventions to reduce such visits. The top five important features were age, the difference between visitation gaps, visitation gaps, R10 or abdominal and pelvic pain, and the Index of Concentration at the Extremes (ICE) for income. △ Less

Submitted 12 December, 2024; originally announced December 2024.

Comments: This manuscript has been accepted and presented at AI-PHSS 2024: The 2024 International Workshop on AI Applications in Public Health and Social Services in conjunction with the 22nd International Conference of Artificial Intelligence in Medicine (AIME 2024)

arXiv:2412.06936 [pdf, other]

Creating a Cooperative AI Policymaking Platform through Open Source Collaboration

Authors: Aiden Lewington, Alekhya Vittalam, Anshumaan Singh, Anuja Uppuluri, Arjun Ashok, Ashrith Mandayam Athmaram, Austin Milt, Benjamin Smith, Charlie Weinberger, Chatanya Sarin, Christoph Bergmeir, Cliff Chang, Daivik Patel, Daniel Li, David Bell, Defu Cao, Donghwa Shin, Edward Kang, Edwin Zhang, Enhui Li, Felix Chen, Gabe Smithline, Haipeng Chen, Henry Gasztowtt, Hoon Shin , et al. (26 additional authors not shown)

Abstract: Advances in artificial intelligence (AI) present significant risks and opportunities, requiring improved governance to mitigate societal harms and promote equitable benefits. Current incentive structures and regulatory delays may hinder responsible AI development and deployment, particularly in light of the transformative potential of large language models (LLMs). To address these challenges, we p… ▽ More Advances in artificial intelligence (AI) present significant risks and opportunities, requiring improved governance to mitigate societal harms and promote equitable benefits. Current incentive structures and regulatory delays may hinder responsible AI development and deployment, particularly in light of the transformative potential of large language models (LLMs). To address these challenges, we propose developing the following three contributions: (1) a large multimodal text and economic-timeseries foundation model that integrates economic and natural language policy data for enhanced forecasting and decision-making, (2) algorithmic mechanisms for eliciting diverse and representative perspectives, enabling the creation of data-driven public policy recommendations, and (3) an AI-driven web platform for supporting transparent, inclusive, and data-driven policymaking. △ Less

Submitted 9 December, 2024; originally announced December 2024.

arXiv:2411.18855 [pdf, other]

Improving Accuracy and Generalization for Efficient Visual Tracking

Authors: Ram Zaveri, Shivang Patel, Yu Gu, Gianfranco Doretto

Abstract: Efficient visual trackers overfit to their training distributions and lack generalization abilities, resulting in them performing well on their respective in-distribution (ID) test sets and not as well on out-of-distribution (OOD) sequences, imposing limitations to their deployment in-the-wild under constrained resources. We introduce SiamABC, a highly efficient Siamese tracker that significantly… ▽ More Efficient visual trackers overfit to their training distributions and lack generalization abilities, resulting in them performing well on their respective in-distribution (ID) test sets and not as well on out-of-distribution (OOD) sequences, imposing limitations to their deployment in-the-wild under constrained resources. We introduce SiamABC, a highly efficient Siamese tracker that significantly improves tracking performance, even on OOD sequences. SiamABC takes advantage of new architectural designs in the way it bridges the dynamic variability of the target, and of new losses for training. Also, it directly addresses OOD tracking generalization by including a fast backward-free dynamic test-time adaptation method that continuously adapts the model according to the dynamic visual changes of the target. Our extensive experiments suggest that SiamABC shows remarkable performance gains in OOD sets while maintaining accurate performance on the ID benchmarks. SiamABC outperforms MixFormerV2-S by 7.6\% on the OOD AVisT benchmark while being 3x faster (100 FPS) on a CPU. Our code and models are available at https://wvuvl.github.io/SiamABC/. △ Less

Submitted 6 February, 2025; v1 submitted 27 November, 2024; originally announced November 2024.

Comments: WACV 2025

arXiv:2411.07553 [pdf, ps, other]

A Simple Algorithm for Dynamic Carpooling with Recourse

Authors: Yuval Efron, Shyamal Patel, Cliff Stein

Abstract: We give an algorithm for the fully-dynamic carpooling problem with recourse: Edges arrive and depart online from a graph $G$ with $n$ nodes according to an adaptive adversary. Our goal is to maintain an orientation $H$ of $G$ that keeps the discrepancy, defined as $\max_{v \in V} |\text{deg}_H^+(v) - \text{deg}_H^-(v)|$, small at all times. We present a simple algorithm and analysis for this probl… ▽ More We give an algorithm for the fully-dynamic carpooling problem with recourse: Edges arrive and depart online from a graph $G$ with $n$ nodes according to an adaptive adversary. Our goal is to maintain an orientation $H$ of $G$ that keeps the discrepancy, defined as $\max_{v \in V} |\text{deg}_H^+(v) - \text{deg}_H^-(v)|$, small at all times. We present a simple algorithm and analysis for this problem with recourse based on cycles that simplifies and improves on a result of Gupta et al. [SODA '22]. △ Less

Submitted 22 November, 2024; v1 submitted 12 November, 2024; originally announced November 2024.

Comments: To appear in SOSA 2025

arXiv:2411.04303 [pdf]

Analysis of Droughts and Their Intensities in California from 2000 to 2020

Authors: Ujjwal, Shikha C. Patel, Bansari K. Shah, Nicholas Ogbonna, Huthaifa I Ashqar

Abstract: Drought has been perceived as a persistent threat globally and the complex mechanism of various factors contributing to its emergence makes it more troublesome to understand. Droughts and their severity trends have been a point of concern in the USA as well, since the economic impact of droughts has been substantial, especially in parts that contribute majorly to US agriculture. California is the… ▽ More Drought has been perceived as a persistent threat globally and the complex mechanism of various factors contributing to its emergence makes it more troublesome to understand. Droughts and their severity trends have been a point of concern in the USA as well, since the economic impact of droughts has been substantial, especially in parts that contribute majorly to US agriculture. California is the biggest agricultural contributor to the United States with its share amounting up to 12% approximately for all of US agricultural produce. Although, according to a 20-year average, California ranks fifth on the list of the highest average percentage of drought-hit regions. Therefore, drought analysis and drought prediction are of crucial importance for California in order to mitigate the associated risks. However, the design of a consistent drought prediction model based on the dynamic relationship of the drought index remains a challenging task. In the present study, we trained a Voting Ensemble classifier utilizing a soft voting system and three different Random Forest models, to predict the presence of drought and also its intensity. In this paper, initially, we have discussed the trends of droughts and their intensities in various California counties reviewed the correlation of meteorological indicators with drought intensities and used these meteorological indicators for drought prediction so as to evaluate their effectiveness as well as significance. △ Less

Submitted 6 November, 2024; originally announced November 2024.

arXiv:2410.13638 [pdf, other]

Scaling Wearable Foundation Models

Authors: Girish Narayanswamy, Xin Liu, Kumar Ayush, Yuzhe Yang, Xuhai Xu, Shun Liao, Jake Garrison, Shyam Tailor, Jake Sunshine, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Samy Abdel-Ghaffar, Daniel McDuff

Abstract: Wearable sensors have become ubiquitous thanks to a variety of health tracking features. The resulting continuous and longitudinal measurements from everyday life generate large volumes of data; however, making sense of these observations for scientific and actionable insights is non-trivial. Inspired by the empirical success of generative modeling, where large neural networks learn powerful repre… ▽ More Wearable sensors have become ubiquitous thanks to a variety of health tracking features. The resulting continuous and longitudinal measurements from everyday life generate large volumes of data; however, making sense of these observations for scientific and actionable insights is non-trivial. Inspired by the empirical success of generative modeling, where large neural networks learn powerful representations from vast amounts of text, image, video, or audio data, we investigate the scaling properties of sensor foundation models across compute, data, and model size. Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM, a multimodal foundation model built on the largest wearable-signals dataset with the most extensive range of sensor modalities to date. Our results establish the scaling laws of LSM for tasks such as imputation, interpolation and extrapolation, both across time and sensor modalities. Moreover, we highlight how LSM enables sample-efficient downstream learning for tasks like exercise and activity recognition. △ Less

Submitted 17 October, 2024; originally announced October 2024.

arXiv:2410.12935 [pdf, other]

Quantum Boltzmann machine learning of ground-state energies

Authors: Dhrumil Patel, Daniel Koch, Saahil Patel, Mark M. Wilde

Abstract: Estimating the ground-state energy of Hamiltonians is a fundamental task for which it is believed that quantum computers can be helpful. Several approaches have been proposed toward this goal, including algorithms based on quantum phase estimation and hybrid quantum-classical optimizers involving parameterized quantum circuits, the latter falling under the umbrella of the variational quantum eigen… ▽ More Estimating the ground-state energy of Hamiltonians is a fundamental task for which it is believed that quantum computers can be helpful. Several approaches have been proposed toward this goal, including algorithms based on quantum phase estimation and hybrid quantum-classical optimizers involving parameterized quantum circuits, the latter falling under the umbrella of the variational quantum eigensolver. Here, we analyze the performance of quantum Boltzmann machines for this task, which is a less explored ansatz based on parameterized thermal states and which is not known to suffer from the barren-plateau problem. We delineate a hybrid quantum-classical algorithm for this task and rigorously prove that it converges to an $\varepsilon$-approximate stationary point of the energy function optimized over parameter space, while using a number of parameterized-thermal-state samples that is polynomial in $\varepsilon^{-1}$, the number of parameters, and the norm of the Hamiltonian being optimized. Our algorithm estimates the gradient of the energy function efficiently by means of a novel quantum circuit construction that combines classical sampling, Hamiltonian simulation, and the Hadamard test, thus overcoming a key obstacle to quantum Boltzmann machine learning that has been left open since [Amin et al., Phys. Rev. X 8, 021050 (2018)]. Additionally supporting our main claims are calculations of the gradient and Hessian of the energy function, as well as an upper bound on the matrix elements of the latter that is used in the convergence analysis. △ Less

Submitted 30 October, 2024; v1 submitted 16 October, 2024; originally announced October 2024.

Comments: v2: 7 pages of main text, 29 pages of supplementary material, 5 figures

Report number: AFRL-2024-0949

arXiv:2410.11176 [pdf]

Improving Bias in Facial Attribute Classification: A Combined Impact of KL Divergence induced Loss Function and Dual Attention

Authors: Shweta Patel, Dakshina Ranjan Kisku

Abstract: Ensuring that AI-based facial recognition systems produce fair predictions and work equally well across all demographic groups is crucial. Earlier systems often exhibited demographic bias, particularly in gender and racial classification, with lower accuracy for women and individuals with darker skin tones. To tackle this issue and promote fairness in facial recognition, researchers have introduce… ▽ More Ensuring that AI-based facial recognition systems produce fair predictions and work equally well across all demographic groups is crucial. Earlier systems often exhibited demographic bias, particularly in gender and racial classification, with lower accuracy for women and individuals with darker skin tones. To tackle this issue and promote fairness in facial recognition, researchers have introduced several bias-mitigation techniques for gender classification and related algorithms. However, many challenges remain, such as data diversity, balancing fairness with accuracy, disparity, and bias measurement. This paper presents a method using a dual attention mechanism with a pre-trained Inception-ResNet V1 model, enhanced by KL-divergence regularization and a cross-entropy loss function. This approach reduces bias while improving accuracy and computational efficiency through transfer learning. The experimental results show significant improvements in both fairness and classification accuracy, providing promising advances in addressing bias and enhancing the reliability of facial recognition systems. △ Less

Submitted 14 October, 2024; originally announced October 2024.

Comments: 15 pages, 9 figures, 5 tables

MSC Class: 68T06 ACM Class: I.2.10

arXiv:2410.01657 [pdf, other]

Scalable and Consistent Graph Neural Networks for Distributed Mesh-based Data-driven Modeling

Authors: Shivam Barwey, Riccardo Balin, Bethany Lusch, Saumil Patel, Ramesh Balakrishnan, Pinaki Pal, Romit Maulik, Venkatram Vishwanath

Abstract: This work develops a distributed graph neural network (GNN) methodology for mesh-based modeling applications using a consistent neural message passing layer. As the name implies, the focus is on enabling scalable operations that satisfy physical consistency via halo nodes at sub-graph boundaries. Here, consistency refers to the fact that a GNN trained and evaluated on one rank (one large graph) is… ▽ More This work develops a distributed graph neural network (GNN) methodology for mesh-based modeling applications using a consistent neural message passing layer. As the name implies, the focus is on enabling scalable operations that satisfy physical consistency via halo nodes at sub-graph boundaries. Here, consistency refers to the fact that a GNN trained and evaluated on one rank (one large graph) is arithmetically equivalent to evaluations on multiple ranks (a partitioned graph). This concept is demonstrated by interfacing GNNs with NekRS, a GPU-capable exascale CFD solver developed at Argonne National Laboratory. It is shown how the NekRS mesh partitioning can be linked to the distributed GNN training and inference routines, resulting in a scalable mesh-based data-driven modeling workflow. We study the impact of consistency on the scalability of mesh-based GNNs, demonstrating efficient scaling in consistent GNNs for up to O(1B) graph nodes on the Frontier exascale supercomputer. △ Less

Submitted 2 October, 2024; originally announced October 2024.

arXiv:2410.00016 [pdf, other]

A Dataset of the Operating Station Heat Rate for 806 Indian Coal Plant Units using Machine Learning

Authors: Yifu Ding, Jansen Wong, Serena Patel, Dharik Mallapragada, Guiyan Zang, Robert Stoner

Abstract: India aims to achieve net-zero emissions by 2070 and has set an ambitious target of 500 GW of renewable power generation capacity by 2030. Coal plants currently contribute to more than 60\% of India's electricity generation in 2022. Upgrading and decarbonizing high-emission coal plants became a pressing energy issue. A key technical parameter for coal plants is the operating station heat rate (SHR… ▽ More India aims to achieve net-zero emissions by 2070 and has set an ambitious target of 500 GW of renewable power generation capacity by 2030. Coal plants currently contribute to more than 60\% of India's electricity generation in 2022. Upgrading and decarbonizing high-emission coal plants became a pressing energy issue. A key technical parameter for coal plants is the operating station heat rate (SHR), which represents the thermal efficiency of a coal plant. Yet, the operating SHR of Indian coal plants varies and is not comprehensively documented. This study extends from several existing databases and creates an SHR dataset for 806 Indian coal plant units using machine learning (ML), presenting the most comprehensive coverage to date. Additionally, it incorporates environmental factors such as water stress risk and coal prices as prediction features to improve accuracy. This dataset, easily downloadable from our visualization platform, could inform energy and environmental policies for India's coal power generation as the country transitions towards its renewable energy targets. △ Less

Submitted 14 September, 2024; originally announced October 2024.

arXiv:2409.18642 [pdf]

Enhanced Convolution Neural Network with Optimized Pooling and Hyperparameter Tuning for Network Intrusion Detection

Authors: Ayush Kumar Sharma, Sourav Patel, Supriya Bharat Wakchaure, Abirami S

Abstract: Network Intrusion Detection Systems (NIDS) are essential for protecting computer networks from malicious activities, including Denial of Service (DoS), Probing, User-to-Root (U2R), and Remote-to-Local (R2L) attacks. Without effective NIDS, networks are vulnerable to significant security breaches and data loss. Machine learning techniques provide a promising approach to enhance NIDS by automating t… ▽ More Network Intrusion Detection Systems (NIDS) are essential for protecting computer networks from malicious activities, including Denial of Service (DoS), Probing, User-to-Root (U2R), and Remote-to-Local (R2L) attacks. Without effective NIDS, networks are vulnerable to significant security breaches and data loss. Machine learning techniques provide a promising approach to enhance NIDS by automating threat detection and improving accuracy. In this research, we propose an Enhanced Convolutional Neural Network (EnCNN) for NIDS and evaluate its performance using the KDDCUP'99 dataset. Our methodology includes comprehensive data preprocessing, exploratory data analysis (EDA), and feature engineering. We compare EnCNN with various machine learning algorithms, including Logistic Regression, Decision Trees, Support Vector Machines (SVM), and ensemble methods like Random Forest, AdaBoost, and Voting Ensemble. The results show that EnCNN significantly improves detection accuracy, with a notable 10% increase over state-of-art approaches. This demonstrates the effectiveness of EnCNN in real-time network intrusion detection, offering a robust solution for identifying and mitigating security threats, and enhancing overall network resilience. △ Less

Submitted 27 September, 2024; originally announced September 2024.

Comments: 7 Pages , 2 figures , 4 Tables , Conference paper

arXiv:2409.18301 [pdf, other]

Wavelet-Driven Generalizable Framework for Deepfake Face Forgery Detection

Authors: Lalith Bharadwaj Baru, Rohit Boddeda, Shilhora Akshay Patel, Sai Mohan Gajapaka

Abstract: The evolution of digital image manipulation, particularly with the advancement of deep generative models, significantly challenges existing deepfake detection methods, especially when the origin of the deepfake is obscure. To tackle the increasing complexity of these forgeries, we propose \textbf{Wavelet-CLIP}, a deepfake detection framework that integrates wavelet transforms with features derived… ▽ More The evolution of digital image manipulation, particularly with the advancement of deep generative models, significantly challenges existing deepfake detection methods, especially when the origin of the deepfake is obscure. To tackle the increasing complexity of these forgeries, we propose \textbf{Wavelet-CLIP}, a deepfake detection framework that integrates wavelet transforms with features derived from the ViT-L/14 architecture, pre-trained in the CLIP fashion. Wavelet-CLIP utilizes Wavelet Transforms to deeply analyze both spatial and frequency features from images, thus enhancing the model's capability to detect sophisticated deepfakes. To verify the effectiveness of our approach, we conducted extensive evaluations against existing state-of-the-art methods for cross-dataset generalization and detection of unseen images generated by standard diffusion models. Our method showcases outstanding performance, achieving an average AUC of 0.749 for cross-data generalization and 0.893 for robustness against unseen deepfakes, outperforming all compared methods. The code can be reproduced from the repo: \url{https://github.com/lalithbharadwajbaru/Wavelet-CLIP} △ Less

Submitted 7 January, 2025; v1 submitted 26 September, 2024; originally announced September 2024.

Comments: 9 Pages, 2 Figures, 3 Tables

arXiv:2409.18290 [pdf, other]

Retrospective Comparative Analysis of Prostate Cancer In-Basket Messages: Responses from Closed-Domain LLM vs. Clinical Teams

Authors: Yuexing Hao, Jason M. Holmes, Jared Hobson, Alexandra Bennett, Daniel K. Ebner, David M. Routman, Satomi Shiraishi, Samir H. Patel, Nathan Y. Yu, Chris L. Hallemeier, Brooke E. Ball, Mark R. Waddle, Wei Liu

Abstract: In-basket message interactions play a crucial role in physician-patient communication, occurring during all phases (pre-, during, and post) of a patient's care journey. However, responding to these patients' inquiries has become a significant burden on healthcare workflows, consuming considerable time for clinical care teams. To address this, we introduce RadOnc-GPT, a specialized Large Language M… ▽ More In-basket message interactions play a crucial role in physician-patient communication, occurring during all phases (pre-, during, and post) of a patient's care journey. However, responding to these patients' inquiries has become a significant burden on healthcare workflows, consuming considerable time for clinical care teams. To address this, we introduce RadOnc-GPT, a specialized Large Language Model (LLM) powered by GPT-4 that has been designed with a focus on radiotherapeutic treatment of prostate cancer with advanced prompt engineering, and specifically designed to assist in generating responses. We integrated RadOnc-GPT with patient electronic health records (EHR) from both the hospital-wide EHR database and an internal, radiation-oncology-specific database. RadOnc-GPT was evaluated on 158 previously recorded in-basket message interactions. Quantitative natural language processing (NLP) analysis and two grading studies with clinicians and nurses were used to assess RadOnc-GPT's responses. Our findings indicate that RadOnc-GPT slightly outperformed the clinical care team in "Clarity" and "Empathy," while achieving comparable scores in "Completeness" and "Correctness." RadOnc-GPT is estimated to save 5.2 minutes per message for nurses and 2.4 minutes for clinicians, from reading the inquiry to sending the response. Employing RadOnc-GPT for in-basket message draft generation has the potential to alleviate the workload of clinical care teams and reduce healthcare costs by producing high-quality, timely responses. △ Less

Submitted 26 September, 2024; originally announced September 2024.

arXiv:2409.13571 [pdf, other]

Scalable Multi-agent Reinforcement Learning for Factory-wide Dynamic Scheduling

Authors: Jaeyeon Jang, Diego Klabjan, Han Liu, Nital S. Patel, Xiuqi Li, Balakrishnan Ananthanarayanan, Husam Dauod, Tzung-Han Juang

Abstract: Real-time dynamic scheduling is a crucial but notoriously challenging task in modern manufacturing processes due to its high decision complexity. Recently, reinforcement learning (RL) has been gaining attention as an impactful technique to handle this challenge. However, classical RL methods typically rely on human-made dispatching rules, which are not suitable for large-scale factory-wide schedul… ▽ More Real-time dynamic scheduling is a crucial but notoriously challenging task in modern manufacturing processes due to its high decision complexity. Recently, reinforcement learning (RL) has been gaining attention as an impactful technique to handle this challenge. However, classical RL methods typically rely on human-made dispatching rules, which are not suitable for large-scale factory-wide scheduling. To bridge this gap, this paper applies a leader-follower multi-agent RL (MARL) concept to obtain desired coordination after decomposing the scheduling problem into a set of sub-problems that are handled by each individual agent for scalability. We further strengthen the procedure by proposing a rule-based conversion algorithm to prevent catastrophic loss of production capacity due to an agent's error. Our experimental results demonstrate that the proposed model outperforms the state-of-the-art deep RL-based scheduling models in various aspects. Additionally, the proposed model provides the most robust scheduling performance to demand changes. Overall, the proposed MARL-based scheduling model presents a promising solution to the real-time scheduling problem, with potential applications in various manufacturing industries. △ Less

Submitted 20 September, 2024; originally announced September 2024.

arXiv:2409.10325 [pdf, other]

PASS: An Asynchronous Probabilistic Processor for Next Generation Intelligence

Authors: Saavan Patel, Philip Canoza, Adhiraj Datar, Steven Lu, Chirag Garg, Sayeef Salahuddin

Abstract: New computing paradigms are required to solve the most challenging computational problems where no exact polynomial time solution exists.Probabilistic Ising Accelerators has gained promise on these problems with the ability to model complex probability distributions and find ground states of intractable problems. In this context, we have demonstrated the Parallel Asynchronous Stochastic Sampler (P… ▽ More New computing paradigms are required to solve the most challenging computational problems where no exact polynomial time solution exists.Probabilistic Ising Accelerators has gained promise on these problems with the ability to model complex probability distributions and find ground states of intractable problems. In this context, we have demonstrated the Parallel Asynchronous Stochastic Sampler (PASS), the first fully on-chip integrated, asynchronous, probabilistic accelerator that takes advantage of the intrinsic fine-grained parallelism of the Ising Model and built in state of the art 14nm CMOS FinFET technology. We have demonstrated broad applicability of this accelerator on problems ranging from Combinatorial Optimization, Neural Simulation, to Machine Learning along with up to $23,000$x energy to solution improvement compared to CPUs on probabilistic problems. △ Less

Submitted 16 September, 2024; originally announced September 2024.

Comments: 13 page main text, 5 main figures, 21 pages supplementary and methods, 7 supplementary figures, 2 supplementary tables

arXiv:2409.07769 [pdf, other]

Mesh-based Super-Resolution of Fluid Flows with Multiscale Graph Neural Networks

Authors: Shivam Barwey, Pinaki Pal, Saumil Patel, Riccardo Balin, Bethany Lusch, Venkatram Vishwanath, Romit Maulik, Ramesh Balakrishnan

Abstract: A graph neural network (GNN) approach is introduced in this work which enables mesh-based three-dimensional super-resolution of fluid flows. In this framework, the GNN is designed to operate not on the full mesh-based field at once, but on localized meshes of elements (or cells) directly. To facilitate mesh-based GNN representations in a manner similar to spectral (or finite) element discretizatio… ▽ More A graph neural network (GNN) approach is introduced in this work which enables mesh-based three-dimensional super-resolution of fluid flows. In this framework, the GNN is designed to operate not on the full mesh-based field at once, but on localized meshes of elements (or cells) directly. To facilitate mesh-based GNN representations in a manner similar to spectral (or finite) element discretizations, a baseline GNN layer (termed a message passing layer, which updates local node properties) is modified to account for synchronization of coincident graph nodes, rendering compatibility with commonly used element-based mesh connectivities. The architecture is multiscale in nature, and is comprised of a combination of coarse-scale and fine-scale message passing layer sequences (termed processors) separated by a graph unpooling layer. The coarse-scale processor embeds a query element (alongside a set number of neighboring coarse elements) into a single latent graph representation using coarse-scale synchronized message passing over the element neighborhood, and the fine-scale processor leverages additional message passing operations on this latent graph to correct for interpolation errors. Demonstration studies are performed using hexahedral mesh-based data from Taylor-Green Vortex and backward-facing step flow simulations at Reynolds numbers of 1600 and 3200. Through analysis of both global and local errors, the results ultimately show how the GNN is able to produce accurate super-resolved fields compared to targets in both coarse-scale and multiscale model configurations. Reconstruction errors for fixed architectures were found to increase in proportion to the Reynolds number. Geometry extrapolation studies on a separate cavity flow configuration show promising cross-mesh capabilities of the super-resolution strategy. △ Less

Submitted 1 May, 2025; v1 submitted 12 September, 2024; originally announced September 2024.

arXiv:2408.05692 [pdf, other]

A Novel Momentum-Based Deep Learning Techniques for Medical Image Classification and Segmentation

Authors: Koushik Biswas, Ridal Pal, Shaswat Patel, Debesh Jha, Meghana Karri, Amit Reza, Gorkem Durak, Alpay Medetalibeyoglu, Matthew Antalek, Yury Velichko, Daniela Ladner, Amir Borhani, Ulas Bagci

Abstract: Accurately segmenting different organs from medical images is a critical prerequisite for computer-assisted diagnosis and intervention planning. This study proposes a deep learning-based approach for segmenting various organs from CT and MRI scans and classifying diseases. Our study introduces a novel technique integrating momentum within residual blocks for enhanced training dynamics in medical i… ▽ More Accurately segmenting different organs from medical images is a critical prerequisite for computer-assisted diagnosis and intervention planning. This study proposes a deep learning-based approach for segmenting various organs from CT and MRI scans and classifying diseases. Our study introduces a novel technique integrating momentum within residual blocks for enhanced training dynamics in medical image analysis. We applied our method in two distinct tasks: segmenting liver, lung, & colon data and classifying abdominal pelvic CT and MRI scans. The proposed approach has shown promising results, outperforming state-of-the-art methods on publicly available benchmarking datasets. For instance, in the lung segmentation dataset, our approach yielded significant enhancements over the TransNetR model, including a 5.72% increase in dice score, a 5.04% improvement in mean Intersection over Union (mIoU), an 8.02% improvement in recall, and a 4.42% improvement in precision. Hence, incorporating momentum led to state-of-the-art performance in both segmentation and classification tasks, representing a significant advancement in the field of medical imaging. △ Less

Submitted 11 August, 2024; originally announced August 2024.

Comments: 8 pages

arXiv:2408.04948 [pdf, other]

HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction

Authors: Bhaskarjit Sarmah, Benika Hall, Rohan Rao, Sunil Patel, Stefano Pasquali, Dhagash Mehta

Abstract: Extraction and interpretation of intricate information from unstructured text data arising in financial applications, such as earnings call transcripts, present substantial challenges to large language models (LLMs) even using the current best practices to use Retrieval Augmented Generation (RAG) (referred to as VectorRAG techniques which utilize vector databases for information retrieval) due to… ▽ More Extraction and interpretation of intricate information from unstructured text data arising in financial applications, such as earnings call transcripts, present substantial challenges to large language models (LLMs) even using the current best practices to use Retrieval Augmented Generation (RAG) (referred to as VectorRAG techniques which utilize vector databases for information retrieval) due to challenges such as domain specific terminology and complex formats of the documents. We introduce a novel approach based on a combination, called HybridRAG, of the Knowledge Graphs (KGs) based RAG techniques (called GraphRAG) and VectorRAG techniques to enhance question-answer (Q&A) systems for information extraction from financial documents that is shown to be capable of generating accurate and contextually relevant answers. Using experiments on a set of financial earning call transcripts documents which come in the form of Q&A format, and hence provide a natural set of pairs of ground-truth Q&As, we show that HybridRAG which retrieves context from both vector database and KG outperforms both traditional VectorRAG and GraphRAG individually when evaluated at both the retrieval and generation stages in terms of retrieval accuracy and answer generation. The proposed technique has applications beyond the financial domain △ Less

Submitted 9 August, 2024; originally announced August 2024.

Comments: 9 pages, 2 figures, 5 tables

Showing 1–50 of 231 results for author: Patel, S