
Showing 1–50 of 206 results for author: Goel, A

Searching in archive cs.
  1. arXiv:2507.13363 [pdf, ps, other]

    cs.CV cs.AI

    Just Add Geometry: Gradient-Free Open-Vocabulary 3D Detection Without Human-in-the-Loop

    Authors: Atharv Goel, Mehar Khurana

    Abstract: Modern 3D object detection datasets are constrained by narrow class taxonomies and costly manual annotations, limiting their ability to scale to open-world settings. In contrast, 2D vision-language models trained on web-scale image-text pairs exhibit rich semantic understanding and support open-vocabulary detection via natural language prompts. In this work, we leverage the maturity and category d…

    Submitted 6 July, 2025; originally announced July 2025.

  2. arXiv:2507.08128 [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models

    Authors: Arushi Goel, Sreyan Ghosh, Jaehyeon Kim, Sonal Kumar, Zhifeng Kong, Sang-gil Lee, Chao-Han Huck Yang, Ramani Duraiswami, Dinesh Manocha, Rafael Valle, Bryan Catanzaro

    Abstract: We present Audio Flamingo 3 (AF3), a fully open state-of-the-art (SOTA) large audio-language model that advances reasoning and understanding across speech, sound, and music. AF3 introduces: (i) AF-Whisper, a unified audio encoder trained using a novel strategy for joint representation learning across all 3 modalities of speech, sound, and music; (ii) flexible, on-demand thinking, allowing the mode…

    Submitted 10 July, 2025; originally announced July 2025.

    Comments: Code, Datasets and Models: https://research.nvidia.com/labs/adlr/AF3/

  3. arXiv:2507.06574 [pdf, ps, other]

    cs.RO

    AI Space Cortex: An Experimental System for Future Era Space Exploration

    Authors: Thomas Touma, Ersin Daş, Erica Tevere, Martin Feather, Ksenia Kolcio, Maurice Prather, Alberto Candela, Ashish Goel, Erik Kramer, Hari Nayar, Lorraine Fesq, Joel W. Burdick

    Abstract: Our Robust, Explainable Autonomy for Scientific Icy Moon Operations (REASIMO) effort contributes to NASA's Concepts for Ocean worlds Life Detection Technology (COLDTech) program, which explores science platform technologies for ocean worlds such as Europa and Enceladus. Ocean world missions pose significant operational challenges. These include long communication lags, limited power, and lifetime…

    Submitted 21 July, 2025; v1 submitted 9 July, 2025; originally announced July 2025.

  4. arXiv:2507.06261 [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3284 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 22 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  5. arXiv:2505.11228 [pdf, ps, other]

    cs.SI cs.LG

    Learning hidden cascades via classification

    Authors: Derrick Gilchrist Edward Manoharan, Anubha Goel, Alexandros Iosifidis, Henri Hansen, Juho Kanniainen

    Abstract: The spreading dynamics in social networks are often studied under the assumption that individuals' statuses, whether informed or infected, are fully observable. However, in many real-world situations, such statuses remain unobservable, which is crucial for determining an individual's potential to further spread the infection. While this final status is hidden, intermediate indicators such as sympt…

    Submitted 12 June, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

  6. arXiv:2505.08084 [pdf, other]

    cs.CV

    Visually Interpretable Subtask Reasoning for Visual Question Answering

    Authors: Yu Cheng, Arushi Goel, Hakan Bilen

    Abstract: Answering complex visual questions like 'Which red furniture can be used for sitting?' requires multi-step reasoning, including object recognition, attribute filtering, and relational understanding. Recent work improves interpretability in multimodal large language models (MLLMs) by decomposing tasks into sub-task programs, but these methods are computationally expensive and less accurate due to p…

    Submitted 12 May, 2025; originally announced May 2025.

  7. arXiv:2505.06314 [pdf, ps, other]

    cs.CY cs.AI

    A4L: An Architecture for AI-Augmented Learning

    Authors: Ashok Goel, Ploy Thajchayapong, Vrinda Nandan, Harshvardhan Sikka, Spencer Rugaber

    Abstract: AI promises personalized learning and scalable education. As AI agents increasingly permeate education in support of teaching and learning, there is a critical and urgent need for data architectures for collecting and analyzing data on learning, and feeding the results back to teachers, learners, and the AI agents for personalization of learning at scale. At the National AI Institute for Adult Lea…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 14 pages, 7 figures

  8. arXiv:2505.03770 [pdf, other]

    cs.AI

    Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind

    Authors: Mouad Abrini, Omri Abend, Dina Acklin, Henny Admoni, Gregor Aichinger, Nitay Alon, Zahra Ashktorab, Ashish Atreja, Moises Auron, Alexander Aufreiter, Raghav Awasthi, Soumya Banerjee, Joe M. Barnby, Rhea Basappa, Severin Bergsmann, Djallel Bouneffouf, Patrick Callaghan, Marc Cavazza, Thierry Chaminade, Sonia Chernova, Mohamed Chetouan, Moumita Choudhury, Axel Cleeremans, Jacek B. Cywinski, Fabio Cuzzolin, et al. (83 additional authors not shown)

    Abstract: This volume includes a selection of papers presented at the Workshop on Advancing Artificial Intelligence through Theory of Mind held at AAAI 2025 in Philadelphia US on 3rd March 2025. The purpose of this volume is to provide an open access and curated anthology for the ToM and AI research community.

    Submitted 28 April, 2025; originally announced May 2025.

    Comments: workshop proceedings

  9. arXiv:2505.03165 [pdf, other]

    cs.LG cs.SE

    Improving the Reproducibility of Deep Learning Software: An Initial Investigation through a Case Study Analysis

    Authors: Nikita Ravi, Abhinav Goel, James C. Davis, George K. Thiruvathukal

    Abstract: The field of deep learning has witnessed significant breakthroughs, spanning various applications, and fundamentally transforming current software capabilities. However, alongside these advancements, there have been increasing concerns about reproducing the results of these deep learning methods. This is significant because reproducibility is the foundation of reliability and validity in software…

    Submitted 6 May, 2025; originally announced May 2025.

  10. arXiv:2504.13884 [pdf, other]

    cs.HC cs.AI cs.CV

    Towards a Multimodal Document-grounded Conversational AI System for Education

    Authors: Karan Taneja, Anjali Singh, Ashok K. Goel

    Abstract: Multimedia learning using text and images has been shown to improve learning outcomes compared to text-only instruction. But conversational AI systems in education predominantly rely on text-based interactions while multimodal conversations for multimedia learning remain unexplored. Moreover, deploying conversational AI in learning contexts requires grounding in reliable sources and verifiability…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 15 pages, 4 figures, AIED 2025

  11. arXiv:2504.07463 [pdf, ps, other]

    cs.AI

    Enhanced Question-Answering for Skill-based learning using Knowledge-based AI and Generative AI

    Authors: Rahul K. Dass, Rochan H. Madhusudhana, Erin C. Deye, Shashank Verma, Timothy A. Bydlon, Grace Brazil, Ashok K. Goel

    Abstract: Supporting learners' understanding of taught skills in online settings is a longstanding challenge. While exercises and chat-based agents can evaluate understanding in limited contexts, this challenge is magnified when learners seek explanations that delve into procedural knowledge (how things are done) and reasoning (why things happen). We hypothesize that an intelligent agent's ability to unders…

    Submitted 10 April, 2025; originally announced April 2025.

  12. arXiv:2504.07403 [pdf, other]

    cs.LG

    Multi-Selection for Recommendation Systems

    Authors: Sahasrajit Sarmasarkar, Zhihao Jiang, Ashish Goel, Aleksandra Korolova, Kamesh Munagala

    Abstract: We present the construction of a multi-selection model to answer differentially private queries in the context of recommendation systems. The server sends back multiple recommendations and a "local model" to the user, which the user can run locally on its device to select the item that best fits its private features. We study a setup where the server uses a deep neural network (trained on the Mo…

    Submitted 9 April, 2025; originally announced April 2025.

  13. arXiv:2504.06500 [pdf, other]

    eess.SY cs.LG cs.RO

    Data-driven Fuzzy Control for Time-Optimal Aggressive Trajectory Following

    Authors: August Phelps, Juan Augusto Paredes Salazar, Ankit Goel

    Abstract: Optimal trajectories that minimize a user-defined cost function in dynamic systems require the solution of a two-point boundary value problem. The optimization process yields an optimal control sequence that depends on the initial conditions and system parameters. However, the optimal sequence may result in undesirable behavior if the system's initial conditions and parameters are erroneous. This…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: 6 pages, 10 figures, submitted to MECC 2025

  14. arXiv:2502.18504 [pdf, ps, other]

    cs.CR cs.AI cs.CL cs.LG

    TurboFuzzLLM: Turbocharging Mutation-based Fuzzing for Effectively Jailbreaking Large Language Models in Practice

    Authors: Aman Goel, Xian Carrie Wu, Zhe Wang, Dmitriy Bespalov, Yanjun Qi

    Abstract: Jailbreaking large-language models (LLMs) involves testing their robustness against adversarial prompts and evaluating their ability to withstand prompt attacks that could elicit unauthorized or malicious responses. In this paper, we present TurboFuzzLLM, a mutation-based fuzzing technique for efficiently finding a collection of effective jailbreaking templates that, when combined with harmful que…

    Submitted 4 June, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

    Comments: Oral presentation at NAACL 2025 industry track

  15. arXiv:2502.09843 [pdf, other]

    cs.AI cs.HC cs.MM

    MuDoC: An Interactive Multimodal Document-grounded Conversational AI System

    Authors: Karan Taneja, Ashok K. Goel

    Abstract: Multimodal AI is an important step towards building effective tools to leverage multiple modalities in human-AI communication. Building a multimodal document-grounded AI system to interact with long documents remains a challenge. Our work aims to fill the research gap of directly leveraging grounded visuals from documents alongside textual content in documents for response generation. We present a…

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: 5 pages, 3 figures, AAAI-MAKE 2025

  16. arXiv:2502.01380 [pdf, other]

    cs.GT

    Metric Distortion of Small-group Deliberation

    Authors: Ashish Goel, Mohak Goyal, Kamesh Munagala

    Abstract: We consider models for social choice where voters rank a set of choices (or alternatives) by deliberating in small groups of size at most $k$, and these outcomes are aggregated by a social choice rule to find the winning alternative. We ground these models in the metric distortion framework, where the voters and alternatives are embedded in a latent metric space, with closer alternative being more…

    Submitted 20 March, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

    Comments: To appear in ACM STOC 2025

  17. arXiv:2501.18532 [pdf, other]

    cs.CL cs.LG

    Differentially Private Steering for Large Language Model Alignment

    Authors: Anmol Goel, Yaxi Hu, Iryna Gurevych, Amartya Sanyal

    Abstract: Aligning Large Language Models (LLMs) with human values and away from undesirable behaviors (such as hallucination) has become increasingly important. Recently, steering LLMs towards a desired behavior via activation editing has emerged as an effective method to mitigate harmful generations at inference-time. Activation editing modifies LLM representations by preserving information from positive d…

    Submitted 20 March, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

    Comments: ICLR 2025 Camera Ready; Code: https://github.com/UKPLab/iclr2025-psa

  18. arXiv:2501.15464 [pdf, other]

    cs.CV cs.AI

    TractoGPT: A GPT architecture for White Matter Segmentation

    Authors: Anoushkrit Goel, Simroop Singh, Ankita Joshi, Ranjeet Ranjan Jha, Chirag Ahuja, Aditya Nigam, Arnav Bhavsar

    Abstract: White matter bundle segmentation is crucial for studying brain structural connectivity, neurosurgical planning, and neurological disorders. White Matter Segmentation remains challenging due to structural similarity in streamlines, subject variability, symmetry in 2 hemispheres, etc. To address these challenges, we propose TractoGPT, a GPT-based architecture trained on streamline, cluster, and fusi…

    Submitted 21 February, 2025; v1 submitted 26 January, 2025; originally announced January 2025.

    Comments: Accepted as a conference paper at 23rd IEEE International Symposium on Biomedical Imaging 2025. IEEE holds the copyright for this publication

  19. Self-Explanation in Social AI Agents

    Authors: Rhea Basappa, Mustafa Tekman, Hong Lu, Benjamin Faught, Sandeep Kakar, Ashok K. Goel

    Abstract: Social AI agents interact with members of a community, thereby changing the behavior of the community. For example, in online learning, an AI social assistant may connect learners and thereby enhance social interaction. These social AI assistants too need to explain themselves in order to enhance transparency and trust with the learners. We present a method of self-explanation that uses introspect…

    Submitted 18 January, 2025; originally announced January 2025.

    Comments: Extended version of the paper published in International Conference on Intelligent Tutoring Systems, pages 351-360, 2024, Springer. Images corrected, and live deployment, ablation, and precision study results added

  20. arXiv:2501.03618 [pdf, other]

    cs.HC

    The Textbook of Tomorrow: Rethinking Course Material Interfacing in the Era of GPT

    Authors: Audrey Olson, Pratyusha Maiti, Ashok Goel

    Abstract: Online Learning Management Systems (LMSs), such as Blackboard and Canvas, have existed for decades. Yet, course readings, when provided at all, consistently exist as simple digital twins to their real-life counterparts. While online tools and resources exist to help students process digital texts more efficiently or in ways better suited to their learning styles, knowledge about such resources is…

    Submitted 7 January, 2025; originally announced January 2025.

    Comments: 5 pages, 2 figures

  21. arXiv:2412.20760 [pdf, other]

    cs.CL cs.AI

    Attributing Culture-Conditioned Generations to Pretraining Corpora

    Authors: Huihan Li, Arnav Goel, Keyu He, Xiang Ren

    Abstract: In open-ended generative tasks like narrative writing or dialogue, large language models often exhibit cultural biases, showing limited knowledge and generating templated outputs for less prevalent cultures. Recent works show that these biases may stem from uneven cultural representation in pretraining corpora. This work investigates how pretraining leads to biased culture-conditioned generations…

    Submitted 19 March, 2025; v1 submitted 30 December, 2024; originally announced December 2024.

  22. arXiv:2412.19351 [pdf, ps, other]

    cs.SD cs.CL cs.LG eess.AS

    ETTA: Elucidating the Design Space of Text-to-Audio Models

    Authors: Sang-gil Lee, Zhifeng Kong, Arushi Goel, Sungwon Kim, Rafael Valle, Bryan Catanzaro

    Abstract: Recent years have seen significant progress in Text-To-Audio (TTA) synthesis, enabling users to enrich their creative workflows with synthetic audio generated from natural language prompts. Despite this progress, the effects of data, model architecture, training objective functions, and sampling strategies on target benchmarks are not well understood. With the purpose of providing a holistic under…

    Submitted 30 June, 2025; v1 submitted 26 December, 2024; originally announced December 2024.

    Comments: ICML 2025. Demo: https://research.nvidia.com/labs/adlr/ETTA/ Code: https://github.com/NVIDIA/elucidated-text-to-audio

  23. arXiv:2411.15128 [pdf, other]

    cs.LG cs.AI cs.CV cs.MM eess.IV

    Health AI Developer Foundations

    Authors: Atilla P. Kiraly, Sebastien Baur, Kenneth Philbrick, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Nick George, Fayaz Jamil, Jing Tang, Kai Bailey, Faruk Ahmed, Akshay Goel, Abbi Ward, Lin Yang, Andrew Sellergren, Yossi Matias, Avinatan Hassidim, Shravya Shetty, Daniel Golden, Shekoofeh Azizi, David F. Steiner, Yun Liu, Tim Thelin, Rory Pilgrim, et al. (1 additional author not shown)

    Abstract: Robust medical Machine Learning (ML) models have the potential to revolutionize healthcare by accelerating clinical research, improving workflows and outcomes, and producing novel insights or capabilities. Developing such ML models from scratch is cost prohibitive and requires substantial compute, data, and time (e.g., expert labeling). To address these challenges, we introduce Health AI Developer…

    Submitted 26 November, 2024; v1 submitted 22 November, 2024; originally announced November 2024.

    Comments: 16 pages, 8 figures

  24. TractoEmbed: Modular Multi-level Embedding framework for white matter tract segmentation

    Authors: Anoushkrit Goel, Bipanjit Singh, Ankita Joshi, Ranjeet Ranjan Jha, Chirag Ahuja, Aditya Nigam, Arnav Bhavsar

    Abstract: White matter tract segmentation is crucial for studying brain structural connectivity and neurosurgical planning. However, segmentation remains challenging due to issues like class imbalance between major and minor tracts, structural similarity, subject variability, symmetric streamlines between hemispheres etc. To address these challenges, we propose TractoEmbed, a modular multi-level embedding f…

    Submitted 12 November, 2024; originally announced November 2024.

    Comments: Accepted at 27th International Conference on Pattern Recognition (ICPR), 2024. 15 pages, 2 figures

  25. arXiv:2411.06643 [pdf, other]

    cs.RO

    Flight Demonstration and Model Validation of a Prototype Variable-Altitude Venus Aerobot

    Authors: Jacob S. Izraelevitz, Siddharth Krishnamoorthy, Ashish Goel, Caleb Turner, Carolina Aiazzi, Michael Pauken, Kevin Carlson, Gerald Walsh, Carl Leake, Carlos Quintana, Christopher Lim, Abhi Jain, Leonard Dorsky, Kevin Baines, James Cutts, Paul K. Byrne, Tim Lachenmeier, Jeffery L. Hall

    Abstract: This paper details a significant milestone towards maturing a buoyant aerial robotic platform, or aerobot, for flight in the Venus clouds. We describe two flights of our subscale altitude-controlled aerobot, fabricated from the materials necessary to survive Venus conditions. During these flights over the Nevada Black Rock desert, the prototype flew at the identical atmospheric densities as 54 to…

    Submitted 10 November, 2024; originally announced November 2024.

    Comments: Preprint submitted to AIAA Journal of Aircraft

  26. arXiv:2411.05757 [pdf, other]

    cs.LG

    Tract-RLFormer: A Tract-Specific RL policy based Decoder-only Transformer Network

    Authors: Ankita Joshi, Ashutosh Sharma, Anoushkrit Goel, Ranjeet Ranjan Jha, Chirag Ahuja, Arnav Bhavsar, Aditya Nigam

    Abstract: Fiber tractography is a cornerstone of neuroimaging, enabling the detailed mapping of the brain's white matter pathways through diffusion MRI. This is crucial for understanding brain connectivity and function, making it a valuable tool in neurological applications. Despite its importance, tractography faces challenges due to its complexity and susceptibility to false positives, misrepresenting vit…

    Submitted 14 November, 2024; v1 submitted 8 November, 2024; originally announced November 2024.

    Comments: Accepted at 27th International Conference on Pattern Recognition (ICPR), 2024

  27. arXiv:2410.17161 [pdf, ps, other]

    cs.CL cs.LG cs.LO

    Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence

    Authors: İlker Işık, Ramazan Gokberk Cinbis, Ebru Aydin Gol

    Abstract: Language models lack the notion of interchangeable tokens: symbols that are semantically equivalent yet distinct, such as bound variables in formal logic. This limitation prevents generalization to larger vocabularies and hinders the model's ability to recognize alpha-equivalence, where renaming bound variables preserves meaning. We formalize this machine learning problem and introduce alpha-covar…

    Submitted 18 June, 2025; v1 submitted 22 October, 2024; originally announced October 2024.

    Comments: ICML 2025 Poster Paper, Camera Ready Version

  28. arXiv:2410.12109 [pdf, other]

    cs.CL cs.CV

    OMCAT: Omni Context Aware Transformer

    Authors: Arushi Goel, Karan Sapra, Matthieu Le, Rafael Valle, Andrew Tao, Bryan Catanzaro

    Abstract: Large Language Models (LLMs) have made significant strides in text generation and comprehension, with recent advancements extending into multimodal LLMs that integrate visual and audio inputs. However, these models continue to struggle with fine-grained, cross-modal temporal understanding, particularly when correlating events across audio and video streams. We address these challenges with two key…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: Demo page: https://om-cat.github.io

  29. arXiv:2410.11773 [pdf, other]

    q-fin.RM cs.AI

    Time-Series Foundation AI Model for Value-at-Risk Forecasting

    Authors: Anubha Goel, Puneet Pasricha, Juho Kanniainen

    Abstract: This study is the first to analyze the performance of a time-series foundation AI model for Value-at-Risk (VaR), which essentially forecasts the left-tail quantiles of returns. Foundation models, pre-trained on diverse datasets, can be applied in a zero-shot setting with minimal data or further improved through finetuning. We compare Google's TimesFM model to conventional parametric and non-parame…

    Submitted 12 May, 2025; v1 submitted 15 October, 2024; originally announced October 2024.

  30. arXiv:2409.13654 [pdf, ps, other]

    cs.LG math.DS

    A Novel Neural Filter to Improve Accuracy of Neural Network Models of Dynamic Systems

    Authors: Parham Oveissi, Turibius Rozario, Ankit Goel

    Abstract: The application of neural networks in modeling dynamic systems has become prominent due to their ability to estimate complex nonlinear functions. Despite their effectiveness, neural networks face challenges in long-term predictions, where the prediction error diverges over time, thus degrading their accuracy. This paper presents a neural filter to enhance the accuracy of long-term state prediction…

    Submitted 7 June, 2025; v1 submitted 20 September, 2024; originally announced September 2024.

  31. arXiv:2408.11936 [pdf, other]

    cs.AI cs.HC

    Estimating Contribution Quality in Online Deliberations Using a Large Language Model

    Authors: Lodewijk Gelauff, Mohak Goyal, Bhargav Dindukurthi, Ashish Goel, Alice Siu

    Abstract: Deliberation involves participants exchanging knowledge, arguments, and perspectives and has been shown to be effective at addressing polarization. The Stanford Online Deliberation Platform facilitates large-scale deliberations. It enables video-based online discussions on a structured agenda for small groups without requiring human moderators. This paper's data comes from various deliberation eve…

    Submitted 21 August, 2024; originally announced August 2024.

    ACM Class: I.2.1; J.5; H.5.3

  32. arXiv:2408.02949 [pdf, other]

    cs.RO cs.AI eess.SY

    Few-shot Scooping Under Domain Shift via Simulated Maximal Deployment Gaps

    Authors: Yifan Zhu, Pranay Thangeda, Erica L Tevere, Ashish Goel, Erik Kramer, Hari D Nayar, Melkior Ornik, Kris Hauser

    Abstract: Autonomous lander missions on extraterrestrial bodies need to sample granular materials while coping with domain shifts, even when sampling strategies are extensively tuned on Earth. To tackle this challenge, this paper studies the few-shot scooping problem and proposes a vision-based adaptive scooping strategy that uses the deep kernel Gaussian process method trained with a novel meta-training st…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.02893

  33. arXiv:2407.19393 [pdf, other]

    cs.AI

    Integrating Cognitive AI with Generative Models for Enhanced Question Answering in Skill-based Learning

    Authors: Rochan H. Madhusudhana, Rahul K. Dass, Jeanette Luu, Ashok K. Goel

    Abstract: In online learning, the ability to provide quick and accurate feedback to learners is crucial. In skill-based learning, learners need to understand the underlying concepts and mechanisms of a skill to be able to apply it effectively. While videos are a common tool in online learning, they cannot comprehend or assess the skills being taught. Additionally, while Generative AI methods are effective i…

    Submitted 2 August, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

    Comments: 9 pages, 6 figures, 1 table

  34. arXiv:2407.18335 [pdf, other]

    cs.AI

    Combining Cognitive and Generative AI for Self-explanation in Interactive AI Agents

    Authors: Shalini Sushri, Rahul Dass, Rhea Basappa, Hong Lu, Ashok Goel

    Abstract: The Virtual Experimental Research Assistant (VERA) is an inquiry-based learning environment that empowers a learner to build conceptual models of complex ecological systems and experiment with agent-based simulations of the models. This study investigates the convergence of cognitive AI and generative AI for self-explanation in interactive AI agents such as VERA. From a cognitive AI viewpoint, we…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 10 pages, 2 figures, 2 tables, 1 appendix, HEXED Workshop @EDM July 2024

  35. arXiv:2407.17429 [pdf, other]

    cs.CY cs.AI

    How Do Students Interact with an LLM-powered Virtual Teaching Assistant in Different Educational Settings?

    Authors: Pratyusha Maiti, Ashok K. Goel

    Abstract: Jill Watson, a virtual teaching assistant powered by LLMs, answers student questions and engages them in extended conversations on courseware provided by the instructors. In this paper, we analyze student interactions with Jill across multiple courses and colleges, focusing on the types and complexity of student questions based on Bloom's Revised Taxonomy and tool usage patterns. We find that, by…

    Submitted 25 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

    Comments: Accepted in the Seventeenth International Conference on Educational Data Mining (EDM) Workshop: Leveraging LLMs for Next Generation Educational Technologies, July 2024

  36. arXiv:2407.14641 [pdf, other]

    cs.DS cs.CR

    Differential Privacy with Multiple Selections

    Authors: Ashish Goel, Zhihao Jiang, Aleksandra Korolova, Kamesh Munagala, Sahasrajit Sarmasarkar

    Abstract: We consider the setting where a user with sensitive features wishes to obtain a recommendation from a server in a differentially private fashion. We propose a "multi-selection" architecture where the server can send back multiple recommendations and the user chooses one from these that matches best with their private features. When the user feature is one-dimensional -- on an infinite line -- an…

    Submitted 19 July, 2024; originally announced July 2024.

  37. arXiv:2407.01757 [pdf, other]

    astro-ph.EP astro-ph.IM cs.MA physics.ao-ph physics.geo-ph

    Distributed Instruments for Planetary Surface Science: Scientific Opportunities and Technology Feasibility

    Authors: Federico Rossi, Robert C. Anderson, Saptarshi Bandyopadhyay, Erik Brandon, Ashish Goel, Joshua Vander Hook, Michael Mischna, Michaela Villarreal, Mark Wronkiewicz

    Abstract: In this paper, we assess the scientific promise and technology feasibility of distributed instruments for planetary science. A distributed instrument is an instrument designed to collect spatially and temporally correlated data from multiple networked, geographically distributed point sensors. Distributed instruments are ubiquitous in Earth science, where they are routinely employed for weather an…

    Submitted 1 July, 2024; originally announced July 2024.

  38. arXiv:2406.08931 [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning

    Authors: Arnav Goel, Medha Hira, Anubha Gupta

    Abstract: Advent of modern deep learning techniques has given rise to advancements in the field of Speech Emotion Recognition (SER). However, most systems prevalent in the field fail to generalize to speakers not seen during training. This study focuses on handling challenges of multilingual SER, specifically on unseen speakers. We introduce CAMuLeNet, a novel architecture leveraging co-attention based fusi…

    Submitted 19 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 5 pages, Accepted to INTERSPEECH 2024. The first two authors contributed equally

  39. arXiv:2406.00022 [pdf, other]

    cs.CL cs.SD eess.AS

    Multilingual Prosody Transfer: Comparing Supervised & Transfer Learning

    Authors: Arnav Goel, Medha Hira, Anubha Gupta

    Abstract: The field of prosody transfer in speech synthesis systems is rapidly advancing. This research is focused on evaluating learning methods for adapting pre-trained monolingual text-to-speech (TTS) models to multilingual conditions, i.e., Supervised Fine-Tuning (SFT) and Transfer Learning (TL). This comparison utilizes three distinct metrics: Mean Opinion Score (MOS), Recognition Accuracy (RA), and Me…

    Submitted 18 June, 2024; v1 submitted 23 May, 2024; originally announced June 2024.

    Comments: 7 pages, Accepted to ICLR 2024 - Tiny Track

  40. arXiv:2406.00021 [pdf, other]

    cs.CL cs.SD eess.AS

    CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning

    Authors: Medha Hira, Arnav Goel, Anubha Gupta

    Abstract: This paper presents CrossVoice, a novel cascade-based Speech-to-Speech Translation (S2ST) system employing advanced ASR, MT, and TTS technologies with cross-lingual prosody preservation through transfer learning. We conducted comprehensive experiments comparing CrossVoice with direct-S2ST systems, showing improved BLEU scores on tasks such as Fisher Es-En, VoxPopuli Fr-En and prosody preservation…

    Submitted 18 June, 2024; v1 submitted 23 May, 2024; originally announced June 2024.

    Comments: 8 pages, Accepted at ICLR 2024 - Tiny Track

  41. arXiv:2405.20917  [pdf, other

    cs.CL cs.LG cs.LO

    Learning to Estimate System Specifications in Linear Temporal Logic using Transformers and Mamba

    Authors: İlker Işık, Ebru Aydin Gol, Ramazan Gokberk Cinbis

    Abstract: Temporal logic is a framework for representing and reasoning about propositions that evolve over time. It is commonly used for specifying requirements in various domains, including hardware and software systems, as well as robotics. Specification mining or formula generation involves extracting temporal logic formulae from system traces and has numerous applications, such as detecting bugs and imp…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 20 pages, 15 figures

  42. arXiv:2405.19631  [pdf, other

    cs.AI

    Leveraging Open-Source Large Language Models for encoding Social Determinants of Health using an Intelligent Router

    Authors: Akul Goel, Surya Narayanan Hari, Belinda Waltman, Matt Thomson

    Abstract: Social Determinants of Health (SDOH) play a significant role in patient health outcomes. The Centers for Disease Control and Prevention (CDC) introduced a subset of ICD-10 codes called Z-codes in an attempt to officially recognize and measure SDOH in the health care system. However, these codes are rarely annotated in a patient's Electronic Health Record (EHR), and instead, in many cases, need to be inferred from…

    Submitted 29 May, 2024; originally announced May 2024.

  43. arXiv:2405.16355  [pdf, other

    cs.HC cs.AI

    Navigating AI Fallibility: Examining People's Reactions and Perceptions of AI after Encountering Personality Misrepresentations

    Authors: Qiaosi Wang, Chidimma L. Anyi, Vedant Das Swain, Ashok K. Goel

    Abstract: Many hyper-personalized AI systems profile people's characteristics (e.g., personality traits) to provide personalized recommendations. These systems are increasingly used to facilitate interactions among people, such as providing teammate recommendations. Despite improved accuracy, such systems are not immune to errors when making inferences about people's most personal traits. These errors manif…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 37 pages, 11 figures

    ACM Class: I.2.0

  44. arXiv:2405.11775  [pdf, other

    cs.CL cs.LG

    Exploring Ordinality in Text Classification: A Comparative Study of Explicit and Implicit Techniques

    Authors: Siva Rajesh Kasa, Aniket Goel, Karan Gupta, Sumegh Roychowdhury, Anish Bhanushali, Nikhil Pattisapu, Prasanna Srinivasa Murthy

    Abstract: Ordinal Classification (OC) is a widely encountered challenge in Natural Language Processing (NLP), with applications in various domains such as sentiment analysis, rating prediction, and more. Previous approaches to tackle OC have primarily focused on modifying existing or creating novel loss functions that explicitly account for the ordinal nature of labels. However, with the advent of…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Findings of ACL 2024

  45. arXiv:2405.11070  [pdf, other

    cs.AI cs.CL cs.LG

    Jill Watson: A Virtual Teaching Assistant powered by ChatGPT

    Authors: Karan Taneja, Pratyusha Maiti, Sandeep Kakar, Pranav Guruprasad, Sanjeev Rao, Ashok K. Goel

    Abstract: Conversational AI agents often require extensive datasets for training that are not publicly released, are limited to social chit-chat or handling a specific domain, and may not be easily extended to accommodate the latest advances in AI technologies. This paper introduces Jill Watson, a conversational Virtual Teaching Assistant (VTA) leveraging the capabilities of ChatGPT. Jill Watson, based on Ch…

    Submitted 17 May, 2024; originally announced May 2024.

  46. arXiv:2405.05572  [pdf, other

    cs.CL cs.AI

    From Human Judgements to Predictive Models: Unravelling Acceptability in Code-Mixed Sentences

    Authors: Prashant Kodali, Anmol Goel, Likhith Asapu, Vamshi Krishna Bonagiri, Anirudh Govil, Monojit Choudhury, Ponnurangam Kumaraguru, Manish Shrivastava

    Abstract: Current computational approaches for analysing or generating code-mixed sentences do not explicitly model "naturalness" or "acceptability" of code-mixed sentences, but rely on training corpora to reflect the distribution of acceptable code-mixed sentences. Modelling human judgement for the acceptability of code-mixed text can help in distinguishing natural code-mixed text and enable quality-contro…

    Submitted 5 May, 2025; v1 submitted 9 May, 2024; originally announced May 2024.

  47. arXiv:2405.03162  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Advancing Multimodal Medical Capabilities of Gemini

    Authors: Lin Yang, Shawn Xu, Andrew Sellergren, Timo Kohlberger, Yuchen Zhou, Ira Ktena, Atilla Kiraly, Faruk Ahmed, Farhad Hormozdiari, Tiam Jaroensri, Eric Wang, Ellery Wulczyn, Fayaz Jamil, Theo Guidroz, Chuck Lau, Siyuan Qiao, Yun Liu, Akshay Goel, Kendall Park, Arnav Agharwal, Nick George, Yang Wang, Ryutaro Tanno, David G. T. Barrett, Wei-Hung Weng, et al. (22 additional authors not shown)

    Abstract: Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histop…

    Submitted 6 May, 2024; originally announced May 2024.

  48. arXiv:2404.07616  [pdf, other

    cs.CL cs.SD eess.AS

    Audio Dialogues: Dialogues dataset for audio and music understanding

    Authors: Arushi Goel, Zhifeng Kong, Rafael Valle, Bryan Catanzaro

    Abstract: Existing datasets for audio understanding primarily focus on single-turn interactions (e.g., audio captioning, audio question answering) for describing audio in natural language, thus limiting the understanding of audio through interactive dialogue. To address this gap, we introduce Audio Dialogues: a multi-turn dialogue dataset containing 163.8k samples for general audio sounds and music. In addition to dial…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Demo website: https://audiodialogues.github.io/

  49. arXiv:2403.03029  [pdf, other

    cs.CL

    Socratic Reasoning Improves Positive Text Rewriting

    Authors: Anmol Goel, Nico Daheim, Christian Montag, Iryna Gurevych

    Abstract: Reframing a negative into a positive thought is at the crux of several cognitive approaches to mental health and psychotherapy that could be made more accessible by large language model-based solutions. Such reframing is typically non-trivial and requires multiple rationalization steps to uncover the underlying issue of a negative thought and transform it to be more positive. However, this rationa…

    Submitted 20 March, 2025; v1 submitted 5 March, 2024; originally announced March 2024.

  50. arXiv:2403.00826  [pdf, other

    cs.CL cs.CR cs.LG

    LLMGuard: Guarding Against Unsafe LLM Behavior

    Authors: Shubh Goyal, Medha Hira, Shubham Mishra, Sukriti Goyal, Arnav Goel, Niharika Dadu, Kirushikesh DB, Sameep Mehta, Nishtha Madaan

    Abstract: Although the rise of Large Language Models (LLMs) in enterprise settings brings new opportunities and capabilities, it also brings challenges, such as the risk of generating inappropriate, biased, or misleading content that violates regulations and raises legal concerns. To alleviate this, we present "LLMGuard", a tool that monitors user interactions with an LLM application and flags content aga…

    Submitted 27 February, 2024; originally announced March 2024.

    Comments: Accepted in the demonstration track of AAAI-24