Showing 1–50 of 155 results for author: Thomas, S

Searching in archive cs.
  1. arXiv:2510.24452  [pdf, ps, other]

    cs.DC cs.LG

    ARIMA_PLUS: Large-scale, Accurate, Automatic and Interpretable In-Database Time Series Forecasting and Anomaly Detection in Google BigQuery

    Authors: Xi Cheng, Weijie Shen, Haoming Chen, Chaoyi Shen, Jean Ortega, Jiashang Liu, Steve Thomas, Honglin Zheng, Haoyun Wu, Yuxiang Li, Casey Lichtendahl, Jenny Ortiz, Gang Liu, Haiyang Qi, Omid Fatemieh, Chris Fry, Jing Jing Long

    Abstract: Time series forecasting and anomaly detection are common tasks for practitioners in industries such as retail, manufacturing, advertising and energy. Two unique challenges stand out: (1) efficiently and accurately forecasting time series or detecting anomalies in large volumes automatically; and (2) ensuring interpretability of results to effectively incorporate business insights. We present ARIMA…

    Submitted 28 October, 2025; originally announced October 2025.

  2. arXiv:2510.21017  [pdf, ps, other]

    cs.LG

    Fair Representation Learning with Controllable High Confidence Guarantees via Adversarial Inference

    Authors: Yuhong Luo, Austin Hoag, Xintong Wang, Philip S. Thomas, Przemyslaw A. Grabowicz

    Abstract: Representation learning is increasingly applied to generate representations that generalize well across multiple downstream tasks. Ensuring fairness guarantees in representation learning is crucial to prevent unfairness toward specific demographic groups in downstream tasks. In this work, we formally introduce the task of learning representations that achieve high-confidence fairness. We aim to gu…

    Submitted 23 October, 2025; originally announced October 2025.

    Comments: Accepted by NeurIPS 2025

  3. arXiv:2510.16513  [pdf, ps, other]

    cs.LG stat.ML

    eDCF: Estimating Intrinsic Dimension using Local Connectivity

    Authors: Dhruv Gupta, Aditya Nagarsekar, Vraj Shah, Sujith Thomas

    Abstract: Modern datasets often contain high-dimensional features exhibiting complex dependencies. To effectively analyze such data, dimensionality reduction methods rely on estimating the dataset's intrinsic dimension (id) as a measure of its underlying complexity. However, estimating id is challenging due to its dependence on scale: at very fine scales, noise inflates id estimates, while at coarser scales…

    Submitted 18 October, 2025; originally announced October 2025.

    Comments: 58 pages (35 (main) + 23 (appendix)), 54 figures (27 (main) + 27 (appendix))

  4. arXiv:2510.00144  [pdf, ps, other]

    cs.LG cs.AI

    Which Rewards Matter? Reward Selection for Reinforcement Learning under Limited Feedback

    Authors: Shreyas Chaudhari, Renhao Zhang, Philip S. Thomas, Bruno Castro da Silva

    Abstract: The ability of reinforcement learning algorithms to learn effective policies is determined by the rewards available during training. However, for practical problems, obtaining large quantities of reward labels is often infeasible due to computational or financial constraints, particularly when relying on human feedback. When reinforcement learning must proceed with limited feedback -- only a fract…

    Submitted 30 September, 2025; originally announced October 2025.

  5. arXiv:2509.25149  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Pretraining Large Language Models with NVFP4

    Authors: NVIDIA, Felix Abecassis, Anjulie Agrusa, Dong Ahn, Jonah Alben, Stefania Alborghetti, Michael Andersch, Sivakumar Arayandi, Alexis Bjorlin, Aaron Blakeman, Evan Briones, Ian Buck, Bryan Catanzaro, Jinhang Choi, Mike Chrzanowski, Eric Chung, Victor Cui, Steve Dai, Bita Darvish Rouhani, Carlo del Mundo, Deena Donia, Burc Eryilmaz, Henry Estela, Abhinav Goel, Oleg Goncharov, et al. (64 additional authors not shown)

    Abstract: Large Language Models (LLMs) today are powerful problem solvers across many domains, and they continue to get stronger as they scale in model size, training set size, and training set quality, as shown by extensive research and experimentation across the industry. Training a frontier model today requires on the order of tens to hundreds of yottaflops, which is a massive investment of time, compute…

    Submitted 29 September, 2025; originally announced September 2025.

  6. arXiv:2509.04969  [pdf, ps, other]

    cs.CL cs.LG

    Classification of kinetic-related injury in hospital triage data using NLP

    Authors: Midhun Shyam, Jim Basilakis, Kieran Luken, Steven Thomas, John Crozier, Paul M. Middleton, X. Rosalind Wang

    Abstract: Triage notes, created at the start of a patient's hospital visit, contain a wealth of information that can help medical staff and researchers understand Emergency Department patient epidemiology and the degree of time-dependent illness or injury. Unfortunately, applying modern Natural Language Processing and Machine Learning techniques to analyse triage data faces some challenges: Firstly, hospita…

    Submitted 5 September, 2025; originally announced September 2025.

    Comments: Accepted as a short paper for publishing at ADMA 2025 (https://adma2025.github.io), with Supplementary Material available at https://github.com/CRMDS/Kinetic-Injury-Triage

  7. arXiv:2509.02661  [pdf, ps, other]

    cs.AI astro-ph.IM cond-mat.mtrl-sci cs.LG physics.data-an stat.ML

    The Future of Artificial Intelligence and the Mathematical and Physical Sciences (AI+MPS)

    Authors: Andrew Ferguson, Marisa LaFleur, Lars Ruthotto, Jesse Thaler, Yuan-Sen Ting, Pratyush Tiwary, Soledad Villar, E. Paulo Alves, Jeremy Avigad, Simon Billinge, Camille Bilodeau, Keith Brown, Emmanuel Candes, Arghya Chattopadhyay, Bingqing Cheng, Jonathan Clausen, Connor Coley, Andrew Connolly, Fred Daum, Sijia Dong, Chrisy Xiyu Du, Cora Dvorkin, Cristiano Fanelli, Eric B. Ford, Luis Manuel Frutos, et al. (75 additional authors not shown)

    Abstract: This community paper developed out of the NSF Workshop on the Future of Artificial Intelligence (AI) and the Mathematical and Physical Sciences (MPS), which was held in March 2025 with the goal of understanding how the MPS domains (Astronomy, Chemistry, Materials Research, Mathematical Sciences, and Physics) can best capitalize on, and contribute to, the future of AI. We present here a summary and…

    Submitted 2 October, 2025; v1 submitted 2 September, 2025; originally announced September 2025.

    Comments: Community Paper from the NSF Future of AI+MPS Workshop, Cambridge, Massachusetts, March 24-26, 2025, supported by NSF Award Number 2512945; v2: minor clarifications

  8. arXiv:2508.14444  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

    Authors: NVIDIA, Aarti Basant, Abhijit Khairnar, Abhijit Paithankar, Abhinav Khattar, Adithya Renduchintala, Aditya Malte, Akhiad Bercovich, Akshay Hazare, Alejandra Rico, Aleksander Ficek, Alex Kondratenko, Alex Shaposhnikov, Alexander Bukharin, Ali Taghibakhshi, Amelia Barton, Ameya Sunil Mahabaleshwarkar, Amy Shen, Andrew Tao, Ann Guan, Anna Shors, Anubhav Mandarwal, Arham Mehta, Arun Venkatesan, et al. (192 additional authors not shown)

    Abstract: We introduce Nemotron-Nano-9B-v2, a hybrid Mamba-Transformer language model designed to increase throughput for reasoning workloads while achieving state-of-the-art accuracy compared to similarly-sized models. Nemotron-Nano-9B-v2 builds on the Nemotron-H architecture, in which the majority of the self-attention layers in the common Transformer architecture are replaced with Mamba-2 layers, to achi…

    Submitted 2 September, 2025; v1 submitted 20 August, 2025; originally announced August 2025.

  9. arXiv:2508.13077  [pdf, ps, other]

    eess.IV cs.AI

    From Transthoracic to Transesophageal: Cross-Modality Generation using LoRA Diffusion

    Authors: Emmanuel Oladokun, Yuxuan Ou, Anna Novikova, Daria Kulikova, Sarina Thomas, Jurica Šprem, Vicente Grau

    Abstract: Deep diffusion models excel at realistic image synthesis but demand large training sets, an obstacle in data-scarce domains like transesophageal echocardiography (TEE). While synthetic augmentation has boosted performance in transthoracic echo (TTE), TEE remains critically underrepresented, limiting the reach of deep learning in this high-impact modality. We address this gap by adapting a TTE-tra…

    Submitted 18 August, 2025; originally announced August 2025.

    Comments: MICCAI 2025; ASMUS

  10. WiseLVAM: A Novel Framework For Left Ventricle Automatic Measurements

    Authors: Durgesh Kumar Singh, Qing Cao, Sarina Thomas, Ahcène Boubekki, Robert Jenssen, Michael Kampffmeyer

    Abstract: Clinical guidelines recommend performing left ventricular (LV) linear measurements in B-mode echocardiographic images at the basal level -- typically at the mitral valve leaflet tips -- and aligned perpendicular to the LV long axis along a virtual scanline (SL). However, most automated methods estimate landmarks directly from B-mode images for the measurement task, where even small shifts in predi…

    Submitted 15 September, 2025; v1 submitted 16 August, 2025; originally announced August 2025.

  11. arXiv:2508.04179  [pdf, ps, other]

    cs.CL cs.LG cs.SD eess.AS

    The State Of TTS: A Case Study with Human Fooling Rates

    Authors: Praveen Srinivasa Varadhan, Sherry Thomas, Sai Teja M. S., Suvrat Bhooshan, Mitesh M. Khapra

    Abstract: While subjective evaluations in recent years indicate rapid progress in TTS, can current TTS systems truly pass a human deception test in a Turing-like evaluation? We introduce Human Fooling Rate (HFR), a metric that directly measures how often machine-generated speech is mistaken for human. Our large-scale evaluation of open-source and commercial TTS models reveals critical insights: (i) CMOS-bas…

    Submitted 6 August, 2025; originally announced August 2025.

    Comments: Accepted at InterSpeech 2025

  12. arXiv:2507.03802  [pdf, ps, other]

    cs.AI

    Generating Novelty in Open-World Multi-Agent Strategic Board Games

    Authors: Mayank Kejriwal, Shilpa Thomas

    Abstract: We describe GNOME (Generating Novelty in Open-world Multi-agent Environments), an experimental platform that is designed to test the effectiveness of multi-agent AI systems when faced with novelty. GNOME separates the development of AI gameplaying agents from the simulator, allowing unanticipated novelty (in essence, novelty that is not subject to model-selection bias). Using a Web G…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: 16 pages, shorter version demonstrated in NeurIPS 2020

  13. arXiv:2506.08266  [pdf, ps, other]

    cs.LG cs.AI cs.CL stat.AP

    Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints

    Authors: Yaswanth Chittepu, Blossom Metevier, Will Schwarzer, Austin Hoag, Scott Niekum, Philip S. Thomas

    Abstract: Existing approaches to language model alignment often treat safety as a tradeoff against helpfulness, which can lead to unacceptable responses in sensitive domains. To ensure reliable performance in such settings, we propose High-Confidence Safe Reinforcement Learning from Human Feedback (HC-RLHF), a method that provides high-confidence safety guarantees while maximizing helpfulness. Similar to pr…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 20 pages, 6 figures, 4 tables, Second Reinforcement Learning Conference (RLC 2025)

  14. arXiv:2505.18893  [pdf, ps, other]

    cs.CY cs.AI

    Reality Check: A New Evaluation Ecosystem Is Necessary to Understand AI's Real World Effects

    Authors: Reva Schwartz, Rumman Chowdhury, Akash Kundu, Heather Frase, Marzieh Fadaee, Tom David, Gabriella Waters, Afaf Taik, Morgan Briggs, Patrick Hall, Shomik Jain, Kyra Yee, Spencer Thomas, Sundeep Bhandari, Paul Duncan, Andrew Thompson, Maya Carlyle, Qinghua Lu, Matthew Holmes, Theodora Skeadas

    Abstract: Conventional AI evaluation approaches concentrated within the AI stack exhibit systemic limitations for exploring, navigating and resolving the human and societal factors that play out in real world deployment such as in education, finance, healthcare, and employment sectors. AI capability evaluations can capture detail about first-order effects, such as whether immediate system outputs are accura…

    Submitted 30 May, 2025; v1 submitted 24 May, 2025; originally announced May 2025.

    Comments: 9 pages

  15. arXiv:2505.18609  [pdf, other]

    cs.CL

    RASMALAI: Resources for Adaptive Speech Modeling in Indian Languages with Accents and Intonations

    Authors: Ashwin Sankar, Yoach Lacombe, Sherry Thomas, Praveen Srinivasa Varadhan, Sanchit Gandhi, Mitesh M Khapra

    Abstract: We introduce RASMALAI, a large-scale speech dataset with rich text descriptions, designed to advance controllable and expressive text-to-speech (TTS) synthesis for 23 Indian languages and English. It comprises 13,000 hours of speech and 24 million text-description annotations with fine-grained attributes like speaker identity, accent, emotion, style, and background conditions. Using RASMALAI, we d…

    Submitted 27 May, 2025; v1 submitted 24 May, 2025; originally announced May 2025.

    Comments: Accepted at Interspeech 2025

  16. arXiv:2505.10779  [pdf, ps, other]

    cs.AI

    Qualia Optimization

    Authors: Philip S. Thomas

    Abstract: This report explores the speculative question: what if current or future AI systems have qualia, such as pain or pleasure? It does so by assuming that AI systems might someday possess qualia -- and that the quality of these subjective experiences should be considered alongside performance metrics. Concrete mathematical problem settings, inspired by reinforcement learning formulations and theories…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: Technical Report, College of Information and Computer Science, University of Massachusetts

    Report number: UM-CICS-2025-001

  17. arXiv:2505.09439  [pdf, ps, other]

    eess.AS cs.SD

    Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?

    Authors: Andrew Rouditchenko, Saurabhchand Bhati, Edson Araujo, Samuel Thomas, Hilde Kuehne, Rogerio Feris, James Glass

    Abstract: We propose Omni-R1 which fine-tunes a recent multi-modal LLM, Qwen2.5-Omni, on an audio question answering dataset with the reinforcement learning method GRPO. This leads to new State-of-the-Art performance on the recent MMAU and MMAR benchmarks. Omni-R1 achieves the highest accuracies on the sounds, music, speech, and overall average categories, both on the Test-mini and Test-full splits. To unde…

    Submitted 2 June, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

  18. arXiv:2505.07046  [pdf, ps, other]

    math.NA cs.LG

    Streaming Krylov-Accelerated Stochastic Gradient Descent

    Authors: Stephen Thomas

    Abstract: We present SKA-SGD (Streaming Krylov-Accelerated Stochastic Gradient Descent), a novel optimization approach that accelerates convergence for ill-conditioned problems by projecting stochastic gradients onto a low-dimensional Krylov subspace. Directly inspired by recent advances in s-step Conjugate Gradient methods with streaming Gauss-Seidel Gram solvers \cite{dambra2025sstep}, our method extends…

    Submitted 11 May, 2025; originally announced May 2025.

  19. arXiv:2505.01237  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment

    Authors: Edson Araujo, Andrew Rouditchenko, Yuan Gong, Saurabhchand Bhati, Samuel Thomas, Brian Kingsbury, Leonid Karlinsky, Rogerio Feris, James R. Glass, Hilde Kuehne

    Abstract: Recent advances in audio-visual learning have shown promising results in learning representations across modalities. However, most approaches rely on global audio representations that fail to capture fine-grained temporal correspondences with visual frames. Additionally, existing methods often struggle with conflicting optimization objectives when trying to jointly learn reconstruction and cross-m…

    Submitted 21 May, 2025; v1 submitted 2 May, 2025; originally announced May 2025.

    Comments: To be published at CVPR 2025, code available at https://github.com/edsonroteia/cav-mae-sync

  20. arXiv:2504.03624  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

    Authors: NVIDIA, Aaron Blakeman, Aarti Basant, Abhinav Khattar, Adithya Renduchintala, Akhiad Bercovich, Aleksander Ficek, Alexis Bjorlin, Ali Taghibakhshi, Amala Sanjay Deshmukh, Ameya Sunil Mahabaleshwarkar, Andrew Tao, Anna Shors, Ashwath Aithal, Ashwin Poojary, Ayush Dattagupta, Balaram Buddharaju, Bobby Chen, Boris Ginsburg, Boxin Wang, Brandon Norick, Brian Butterfield, Bryan Catanzaro, Carlo del Mundo, et al. (176 additional authors not shown)

    Abstract: As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly becoming important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transf…

    Submitted 5 September, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

  21. arXiv:2504.03623  [pdf, other]

    cs.CV

    Quantifying the uncertainty of model-based synthetic image quality metrics

    Authors: Ciaran Bench, Spencer A. Thomas

    Abstract: The quality of synthetically generated images (e.g. those produced by diffusion models) is often evaluated using information about image contents encoded by pretrained auxiliary models. For example, the Fréchet Inception Distance (FID) uses embeddings from an InceptionV3 model pretrained to classify ImageNet. The effectiveness of this feature embedding model has considerable impact on the trustwo…

    Submitted 4 April, 2025; originally announced April 2025.

  22. arXiv:2503.11627  [pdf, other]

    cs.SD cs.LG eess.AS

    Are Deep Speech Denoising Models Robust to Adversarial Noise?

    Authors: Will Schwarzer, Philip S. Thomas, Andrea Fanelli, Xiaoyu Liu

    Abstract: Deep noise suppression (DNS) models enjoy widespread use throughout a variety of high-stakes speech applications. However, in this paper, we show that four recent DNS models can each be reduced to outputting unintelligible gibberish through the addition of imperceptible adversarial noise. Furthermore, our results show the near-term plausibility of targeted attacks, which could induce models to out…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 13 pages, 5 figures

  23. arXiv:2502.18447  [pdf, other]

    cs.LG

    Supervised Reward Inference

    Authors: Will Schwarzer, Jordan Schneider, Philip S. Thomas, Scott Niekum

    Abstract: Existing approaches to reward inference from behavior typically assume that humans provide demonstrations according to specific models of behavior. However, humans often indicate their goals through a wide range of behaviors, from actions that are suboptimal due to poor planning or execution to behaviors which are intended to communicate goals rather than achieve them. We propose that supervised l…

    Submitted 25 February, 2025; originally announced February 2025.

    Comments: 16 pages, 4 figures

  24. Aspect-Oriented Summarization for Psychiatric Short-Term Readmission Prediction

    Authors: WonJin Yoon, Boyu Ren, Spencer Thomas, Chanhwi Kim, Guergana Savova, Mei-Hua Hall, Timothy Miller

    Abstract: Recent progress in large language models (LLMs) has enabled the automated processing of lengthy documents even without supervised training on a task-specific dataset. Yet, their zero-shot performance in complex tasks as opposed to straightforward information extraction tasks remains suboptimal. One feasible approach for tasks with lengthy, complex input is to first summarize the document and then…

    Submitted 10 November, 2025; v1 submitted 14 February, 2025; originally announced February 2025.

    Comments: Published in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025, Main Track)

    Journal ref: In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 28025-28042, Suzhou, China. Association for Computational Linguistics

  25. arXiv:2502.02475  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Style transfer as data augmentation: evaluating unpaired image-to-image translation models in mammography

    Authors: Emir Ahmed, Spencer A. Thomas, Ciaran Bench

    Abstract: Several studies indicate that deep learning models can learn to detect breast cancer from mammograms (X-ray images of the breasts). However, challenges with overfitting and poor generalisability prevent their routine use in the clinic. Models trained on data from one patient population may not perform well on another due to differences in their data domains, emerging due to variations in scanning…

    Submitted 4 February, 2025; originally announced February 2025.

  26. arXiv:2502.01547  [pdf, other]

    eess.AS cs.CV cs.SD

    mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition

    Authors: Andrew Rouditchenko, Samuel Thomas, Hilde Kuehne, Rogerio Feris, James Glass

    Abstract: Audio-Visual Speech Recognition (AVSR) combines lip-based video with audio and can improve performance in noise, but most methods are trained only on English data. One limitation is the lack of large-scale multilingual video data, which makes it hard to train models from scratch. In this work, we propose mWhisper-Flamingo for multilingual AVSR which combines the strengths of a pre-trained audio mo…

    Submitted 7 May, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

    Comments: Accepted in Signal Processing Letters. Code at https://github.com/roudimit/whisper-flamingo

  27. arXiv:2501.17570  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Trustworthy image-to-image translation: evaluating uncertainty calibration in unpaired training scenarios

    Authors: Ciaran Bench, Emir Ahmed, Spencer A. Thomas

    Abstract: Mammographic screening is an effective method for detecting breast cancer, facilitating early diagnosis. However, the current need to manually inspect images places a heavy burden on healthcare systems, spurring a desire for automated diagnostic protocols. Techniques based on deep neural networks have been shown effective in some studies, but their tendency to overfit leaves considerable risk for…

    Submitted 29 January, 2025; originally announced January 2025.

  28. arXiv:2501.09819  [pdf, other]

    cs.RO

    Torque Responsive Metamaterials Enable High Payload Soft Robot Arms

    Authors: Ian Good, Srivatsan Balaji, David Oh, Sawyer Thomas, Jeffrey I. Lipton

    Abstract: Soft robots have struggled to support large forces and moments while also supporting their own weight against gravity. This limits their ability to reach certain configurations necessary for tasks such as inspection and pushing objects up. We have overcome this limitation by creating an electrically driven metamaterial soft arm using handed shearing auxetics (HSA) and bendable extendable torque re…

    Submitted 16 January, 2025; originally announced January 2025.

    Comments: 9 pages, 8 figures, currently under review

  29. arXiv:2501.09104  [pdf, other]

    cs.SD cs.AI eess.AS

    A Non-autoregressive Model for Joint STT and TTS

    Authors: Vishal Sunder, Brian Kingsbury, George Saon, Samuel Thomas, Slava Shechtman, Hagai Aronowitz, Eric Fosler-Lussier, Luis Lastras

    Abstract: In this paper, we take a step towards jointly modeling automatic speech recognition (STT) and speech synthesis (TTS) in a fully non-autoregressive way. We develop a novel multimodal framework capable of handling the speech and text modalities as input either individually or together. The proposed model can also be trained with unpaired speech or text data owing to its multimodal nature. We further…

    Submitted 20 January, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

    Comments: 5 pages, 3 figures, 3 tables

  30. arXiv:2410.23744  [pdf, other]

    cs.CV

    EchoNarrator: Generating natural text explanations for ejection fraction predictions

    Authors: Sarina Thomas, Qing Cao, Anna Novikova, Daria Kulikova, Guy Ben-Yosef

    Abstract: Ejection fraction (EF) of the left ventricle (LV) is considered one of the most important measurements for diagnosing acute heart failure and can be estimated during cardiac ultrasound acquisition. While recent deep learning research successfully estimates EF values, the proposed models often lack an explanation for the prediction. However, providing clear and intuitive explanations…

    Submitted 31 October, 2024; originally announced October 2024.

    Comments: accepted for MICCAI 2024

  31. arXiv:2410.02172  [pdf, other]

    cs.LG cs.AI stat.ML

    Abstract Reward Processes: Leveraging State Abstraction for Consistent Off-Policy Evaluation

    Authors: Shreyas Chaudhari, Ameet Deshpande, Bruno Castro da Silva, Philip S. Thomas

    Abstract: Evaluating policies using off-policy data is crucial for applying reinforcement learning to real-world problems such as healthcare and autonomous driving. Previous methods for off-policy evaluation (OPE) generally suffer from high variance or irreducible bias, leading to unacceptably high prediction errors. In this work, we introduce STAR, a framework for OPE that encompasses a broad range of esti…

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: Accepted at the Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024)

  32. arXiv:2409.05356  [pdf, other]

    cs.CL cs.LG cs.SD eess.SP

    IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS

    Authors: Ashwin Sankar, Srija Anand, Praveen Srinivasa Varadhan, Sherry Thomas, Mehak Singal, Shridhar Kumar, Deovrat Mehendale, Aditi Krishana, Giri Raju, Mitesh Khapra

    Abstract: Recent advancements in text-to-speech (TTS) synthesis show that large-scale models trained with extensive web data produce highly natural-sounding output. However, such data is scarce for Indian languages due to the lack of high-quality, manually subtitled data on platforms like LibriVox or YouTube. To address this gap, we enhance existing large-scale ASR datasets containing natural conversations…

    Submitted 7 October, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

    Comments: Accepted to NeurIPS 2024 Datasets and Benchmarks track

  33. arXiv:2409.00779  [pdf, other]

    cs.CV

    Unbalanced Fingerprint Classification for Hybrid Fingerprint Orientation Maps

    Authors: Ravi Prakash, Sinnu Susan Thomas

    Abstract: This paper introduces a novel fingerprint classification technique based on a multi-layered fuzzy logic classifier. We target the cause of missed detection by identifying the fingerprints at an early stage among dry, standard, and wet. Scanned images are classified based on clarity correlated with the proposed feature points. We also propose a novel adaptive algorithm based on eigenvector space fo…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 10 pages, 18 figures, 4 tables. The work mainly focuses on fingerprint classification and hybrid fingerprint orientation map (HFOM) generation. It highlights the security use cases of HFOM, e.g. data encryption

  34. arXiv:2408.06469  [pdf, ps, other]

    quant-ph cs.ET

    Design and architecture of the IBM Quantum Engine Compiler

    Authors: Michael B. Healy, Reza Jokar, Soolu Thomas, Vincent R. Pascuzzi, Kit Barton, Thomas A. Alexander, Roy Elkabetz, Brian C. Donovan, Hiroshi Horii, Marius Hillenbrand

    Abstract: In this work, we describe the design and architecture of the open-source Quantum Engine Compiler (qe-compiler) currently used in production for IBM Quantum systems. The qe-compiler is built using LLVM's Multi-Level Intermediate Representation (MLIR) framework and includes definitions for several dialects to represent parameterized quantum computation at multiple levels of abstraction. The compiler…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: To be published in the proceedings of the IEEE International Conference on Quantum Computing and Engineering 2024 (QCE24)

  35. arXiv:2408.03933  [pdf, ps, other]

    cs.DS

    Lower Bounds for Approximate (& Exact) k-Disjoint-Shortest-Paths

    Authors: Rajesh Chitnis, Samuel Thomas, Anthony Wirth

    Abstract: Given a graph $G=(V,E)$ and a set $T=\{ (s_i, t_i) : 1\leq i\leq k \}\subseteq V\times V$ of $k$ pairs, the $k$-vertex-disjoint-paths (resp. $k$-edge-disjoint-paths) problem asks to determine whether there exist $k$ pairwise vertex-disjoint (resp. edge-disjoint) paths $P_1, P_2, ..., P_k$ in $G$ such that, for each $1\leq i\leq k$, $P_i$ connects $s_i$ to $t_i$. Both the edge-disjoint and vertex-d…

    Submitted 7 August, 2024; originally announced August 2024.

  36. Influence of Personality Traits on Plagiarism Through Collusion in Programming Assignments

    Authors: Parthasarathy PD, Ishaan Kapoor, Swaroop Joshi, Sujith Thomas

    Abstract: Educating students about academic integrity expectations has been suggested as one of the ways to reduce malpractice in take-home programming assignments. We test this hypothesis using data collected from an artificial intelligence course with 105 participants (N=105) at a university in India. The AI course had two programming assignments. Plagiarism through collusion was quantified using the Meas…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures, To be published in ACM International Conference on Computing Education Research (ICER) 2024

  37. arXiv:2406.16241  [pdf, other]

    cs.LG stat.ME

    Position: Benchmarking is Limited in Reinforcement Learning Research

    Authors: Scott M. Jordan, Adam White, Bruno Castro da Silva, Martha White, Philip S. Thomas

    Abstract: Novel reinforcement learning algorithms, or improvements on existing ones, are commonly justified by evaluating their performance on benchmark environments and are compared to an ever-changing set of standard algorithms. However, despite numerous calls for improvements, experimental practices continue to produce misleading or unsupported claims. One reason for the ongoing substandard practices is…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 19 pages, 13 figures, The Forty-first International Conference on Machine Learning (ICML 2024)

  38. arXiv:2406.10082  [pdf, other]

    eess.AS cs.CV cs.SD

    Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

    Authors: Andrew Rouditchenko, Yuan Gong, Samuel Thomas, Leonid Karlinsky, Hilde Kuehne, Rogerio Feris, James Glass

    Abstract: Audio-Visual Speech Recognition (AVSR) uses lip-based video to improve performance in noise. Since videos are harder to obtain than audio, the video training data of AVSR models is usually limited to a few thousand hours. In contrast, speech models such as Whisper are trained with hundreds of thousands of hours of data, and thus learn a better speech-to-text decoder. The huge training data differe…

    Submitted 19 November, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024. V3: Added results on LRS2. Code at https://github.com/roudimit/whisper-flamingo

  39. arXiv:2406.05646  [pdf, other]

    cs.LG

    ICU-Sepsis: A Benchmark MDP Built from Real Medical Data

    Authors: Kartik Choudhary, Dhawal Gupta, Philip S. Thomas

    Abstract: We present ICU-Sepsis, an environment that can be used in benchmarks for evaluating reinforcement learning (RL) algorithms. Sepsis management is a complex task that has been an important topic in applied RL research in recent years. Therefore, MDPs that model sepsis management can serve as part of a benchmark to evaluate RL algorithms on a challenging real-world problem. However, creating usable M…

    Submitted 14 October, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Reinforcement Learning Conference 2024

  40. arXiv:2403.10652  [pdf, other]

    cs.LG q-fin.RM

    Improving Fairness in Credit Lending Models using Subgroup Threshold Optimization

    Authors: Cecilia Ying, Stephen Thomas

    Abstract: In an effort to improve the accuracy of credit lending decisions, many financial institutions are now using predictions from machine learning models. While such predictions enjoy many advantages, recent research has shown that the predictions have the potential to be biased and unfair towards certain subgroups of the population. To combat this, several techniques have been introduced to help remove…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Neural Information Processing Systems (NeurIPS) Workshop in Strategic ML

  41. arXiv:2402.19450  [pdf, other]

    cs.AI cs.CL

    Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap

    Authors: Saurabh Srivastava, Annarose M B, Anto P V, Shashank Menon, Ajay Sukumar, Adwaith Samod T, Alan Philipose, Stevin Prince, Sooraj Thomas

    Abstract: We propose a framework for robust evaluation of reasoning capabilities of language models, using functional variants of benchmarks. Models that solve a reasoning test should exhibit no difference in performance over the static version of a problem compared to a snapshot of the functional variant. We have rewritten the relevant fragment of the MATH benchmark into its functional variant MATH(), with…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: 37 pages, 10 figures

  42. arXiv:2402.19062  [pdf, other]

    eess.IV cs.CV cs.LG

    Graph Convolutional Neural Networks for Automated Echocardiography View Recognition: A Holistic Approach

    Authors: Sarina Thomas, Cristiana Tiago, Børge Solli Andreassen, Svein Arne Aase, Jurica Šprem, Erik Steen, Anne Solberg, Guy Ben-Yosef

    Abstract: To facilitate diagnosis on cardiac ultrasound (US), clinical practice has established several standard views of the heart, which serve as reference points for diagnostic measurements and define viewports from which images are acquired. Automatic view recognition involves grouping those images into classes of standard views. Although deep learning techniques have been successful in achieving this,…

    Submitted 1 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Presented at ASMUS - MICCAI conference 2023, Vancouver

  43. arXiv:2402.12008  [pdf, other]

    cs.LG cs.AI stat.ML

    Cluster Metric Sensitivity to Irrelevant Features

    Authors: Miles McCrory, Spencer A. Thomas

    Abstract: Clustering algorithms are used extensively in data analysis for data exploration and discovery. Technological advancements lead to continual growth of data in terms of volume, dimensionality and complexity. This provides great opportunities in data analytics, as the data can be interrogated for many different purposes. This, however, leads to challenges, such as identification of relevant features for…

    Submitted 19 February, 2024; originally announced February 2024.

  44. arXiv:2402.09390  [pdf, other]

    cs.AI cs.CL

    HGOT: Hierarchical Graph of Thoughts for Retrieval-Augmented In-Context Learning in Factuality Evaluation

    Authors: Yihao Fang, Stephen W. Thomas, Xiaodan Zhu

    Abstract: With the widespread adoption of large language models (LLMs) in numerous applications, the challenge of factuality and the propensity for hallucinations has emerged as a significant concern. To address this issue, particularly in retrieval-augmented in-context learning, we introduce the hierarchical graph of thoughts (HGOT), a structured, multi-layered graph approach designed to enhance the retrie…

    Submitted 2 July, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  45. arXiv:2401.02843  [pdf, ps, other]

    cs.CY cs.AI cs.LG

    Thousands of AI Authors on the Future of AI

    Authors: Katja Grace, Harlan Stewart, Julia Fabienne Sandkühler, Stephen Thomas, Ben Weinstein-Raun, Jan Brauner, Richard C. Korzekwa

    Abstract: In the largest survey of its kind, 2,778 researchers who had published in top-tier artificial intelligence (AI) venues gave predictions on the pace of AI progress and the nature and impacts of advanced AI systems. The aggregate forecasts give at least a 50% chance of AI systems achieving several milestones by 2028, including autonomously constructing a payment processing site from scratch, creating…

    Submitted 8 October, 2025; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: The asterisk indicates the corresponding author. The dagger indicates equal contribution

    Journal ref: Journal of Artificial Intelligence Research 84:9 (2025)

  46. arXiv:2312.12972  [pdf, other]

    cs.LG

    From Past to Future: Rethinking Eligibility Traces

    Authors: Dhawal Gupta, Scott M. Jordan, Shreyas Chaudhari, Bo Liu, Philip S. Thomas, Bruno Castro da Silva

    Abstract: In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation. First, we delve into the nuances of eligibility traces and explore instances where their updates may result in unexpected credit assignment to preceding states. From this investigation emerges the concept of a novel value function, which we refer to as the \emph{bidirectional value functio…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted in The 38th Annual AAAI Conference on Artificial Intelligence

  47. arXiv:2310.19007  [pdf, other]

    cs.LG

    Behavior Alignment via Reward Function Optimization

    Authors: Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno Castro da Silva

    Abstract: Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that avoid inadvertently inducing undesirable behaviors. Naively modifying the reward structure to offer denser and more frequent feedback can lead to unintended outco…

    Submitted 31 October, 2023; v1 submitted 29 October, 2023; originally announced October 2023.

    Comments: (Spotlight) Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)

  48. arXiv:2310.15358  [pdf, other]

    cs.LG cs.CY stat.ML

    Learning Fair Representations with High-Confidence Guarantees

    Authors: Yuhong Luo, Austin Hoag, Philip S. Thomas

    Abstract: Representation learning is increasingly employed to generate representations that are predictive across multiple downstream tasks. The development of representation learning algorithms that provide strong fairness guarantees is thus important because it can prevent unfairness towards disadvantaged groups for all downstream prediction tasks. To prevent unfairness towards disadvantaged groups in all…

    Submitted 23 October, 2023; originally announced October 2023.

  49. arXiv:2310.01210  [pdf, other]

    eess.IV cs.CV cs.LG

    Towards Robust Cardiac Segmentation using Graph Convolutional Networks

    Authors: Gilles Van De Vyver, Sarina Thomas, Guy Ben-Yosef, Sindre Hellum Olaisen, Håvard Dalen, Lasse Løvstakken, Erik Smistad

    Abstract: Fully automatic cardiac segmentation can be a fast and reproducible method to extract clinical measurements from an echocardiography examination. The U-Net architecture is the current state-of-the-art deep learning architecture for medical segmentation and can segment cardiac structures in real-time with average errors comparable to inter-observer variability. However, this architecture still gene…

    Submitted 2 July, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: This work has been submitted to the IEEE for possible publication

  50. arXiv:2308.13517  [pdf, other]

    cs.CL cs.AI

    ChatGPT as Data Augmentation for Compositional Generalization: A Case Study in Open Intent Detection

    Authors: Yihao Fang, Xianzhi Li, Stephen W. Thomas, Xiaodan Zhu

    Abstract: Open intent detection, a crucial aspect of natural language understanding, involves the identification of previously unseen intents in user-generated text. Despite the progress made in this field, challenges persist in handling new combinations of language components, which is essential for compositional generalization. In this paper, we present a case study exploring the use of ChatGPT as a data…

    Submitted 25 August, 2023; originally announced August 2023.

    Journal ref: Proceedings of the Joint Workshop of the 5th Financial Technology and Natural Language Processing (FinNLP) and 2nd Multimodal AI For Financial Forecasting (Muffin), Macao, August 20, 2023