
Showing 1–48 of 48 results for author: Gulati, A

Searching in archive cs.
  1. arXiv:2511.01854 [pdf, ps, other]

    cs.CL

    Tool-to-Agent Retrieval: Bridging Tools and Agents for Scalable LLM Multi-Agent Systems

    Authors: Elias Lumer, Faheem Nizar, Anmol Gulati, Pradeep Honaganahalli Basavaraju, Vamse Kumar Subbiah

    Abstract: Recent advances in LLM Multi-Agent Systems enable scalable orchestration of sub-agents, each coordinating hundreds or thousands of tools or Model Context Protocol (MCP) servers. However, existing retrieval methods typically match queries against coarse agent-level descriptions before routing, which obscures fine-grained tool functionality and often results in suboptimal agent selection. We introdu…

    Submitted 4 November, 2025; v1 submitted 3 November, 2025; originally announced November 2025.

  2. arXiv:2510.15955 [pdf, ps, other]

    cs.LG cs.AI

    How Good Are LLMs at Processing Tool Outputs?

    Authors: Kiran Kate, Yara Rizk, Poulami Ghosh, Ashu Gulati, Tathagata Chakraborti, Zidane Wright, Mayank Agarwal

    Abstract: Most realistic task automation problems require large language models (LLMs) to call tools, which often return complex JSON responses. These responses must be further processed to derive the information necessary for task completion. The ability of LLMs to do so is under-studied. In this paper, we study the tool response processing task and LLMs' abilities to process structured (JSON) responses. W…

    Submitted 10 October, 2025; originally announced October 2025.

  3. Gluing Random Unitaries with Inverses and Applications to Strong Pseudorandom Unitaries

    Authors: Prabhanjan Ananth, John Bostanci, Aditya Gulati, Yao-Ting Lin

    Abstract: The gluing theorem for random unitaries [Schuster, Haferkamp, Huang, QIP 2025] has found numerous applications, including designing low depth random unitaries [Schuster, Haferkamp, Huang, QIP 2025], random unitaries in ${\sf QAC0}$ [Foxman, Parham, Vasconcelos, Yuen'25] and generically shortening the key length of pseudorandom unitaries [Ananth, Bostanci, Gulati, Lin EUROCRYPT'25]. We present an alte…

    Submitted 5 October, 2025; originally announced October 2025.

    Comments: 55 pages. A preliminary version, merging this paper and arXiv:2509.24432, appears in the proceedings of the 45th Annual International Cryptology Conference (CRYPTO 2025) under the title "Pseudorandom Unitaries in the Haar Random Oracle Model". This is Part II of the full version

    Journal ref: Advances in Cryptology, CRYPTO 2025 Proceedings, Part II, Lecture Notes in Computer Science, volume 16001, pages 301-333

  4. arXiv:2509.24484 [pdf, ps, other]

    quant-ph cs.CR

    On the Limitations of Pseudorandom Unitaries

    Authors: Prabhanjan Ananth, Aditya Gulati, Yao-Ting Lin

    Abstract: Pseudorandom unitaries (PRUs), one of the key quantum pseudorandom notions, are efficiently computable unitaries that are computationally indistinguishable from Haar random unitaries. While there is evidence to believe that PRUs are weaker than one-way functions, so far their relationship with other quantum cryptographic primitives (that are plausibly weaker than one-way functions) has not been full…

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: 48 pages, 4 figures. To appear in the proceedings of the 23rd Theory of Cryptography Conference (TCC 2025)

  5. Pseudorandom Unitaries in the Haar Random Oracle Model

    Authors: Prabhanjan Ananth, John Bostanci, Aditya Gulati, Yao-Ting Lin

    Abstract: The quantum Haar random oracle model is an idealized model where every party has access to a single Haar random unitary and its inverse. We construct strong pseudorandom unitaries in the quantum Haar random oracle model. This strictly improves upon prior works, which either only prove the existence of pseudorandom unitaries in the inverseless quantum Haar random oracle model [Ananth, Bostanci, Gulati…

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: 66 pages

    Journal ref: Advances in Cryptology, CRYPTO 2025 Proceedings, Part II, Lecture Notes in Computer Science, volume 16001, pages 301-333

  6. arXiv:2509.23579 [pdf, ps, other]

    cs.CL

    Jackal: A Real-World Execution-Based Benchmark Evaluating Large Language Models on Text-to-JQL Tasks

    Authors: Kevin Frank, Anmol Gulati, Elias Lumer, Sindy Campagna, Vamse Kumar Subbiah

    Abstract: Enterprise teams rely on the Jira Query Language (JQL) to retrieve and filter issues from Jira. Yet, to our knowledge, there is no open, real-world, execution-based benchmark for mapping natural language queries to JQL. We introduce Jackal, a novel, large-scale text-to-JQL benchmark comprising 100,000 natural language (NL) requests paired with validated JQL queries and execution-based results on a…

    Submitted 27 September, 2025; originally announced September 2025.

    Comments: 17 pages

  7. arXiv:2508.08292 [pdf, ps, other]

    cs.CL cs.AI cs.LG cs.LO cs.NE

    Putnam-AXIOM: A Functional and Static Benchmark for Measuring Higher Level Mathematical Reasoning in LLMs

    Authors: Aryan Gulati, Brando Miranda, Eric Chen, Emily Xia, Kai Fronsdal, Bruno Dumont, Elyas Obbad, Sanmi Koyejo

    Abstract: Current mathematical reasoning benchmarks for large language models (LLMs) are approaching saturation, with some achieving > 90% accuracy, and are increasingly compromised by training-set contamination. We introduce Putnam-AXIOM, a benchmark of 522 university-level competition problems drawn from the prestigious William Lowell Putnam Mathematical Competition, and Putnam-AXIOM Variation, an unseen…

    Submitted 26 August, 2025; v1 submitted 5 August, 2025; originally announced August 2025.

    Comments: 27 pages total (10-page main paper + 17-page appendix), 12 figures, 6 tables. Submitted to ICML 2025 (under review)

    MSC Class: 68T20; 68T05; 68Q32 ACM Class: F.2.2; I.2.3; I.2.6; I.2.8

    Journal ref: ICML 2025

  8. arXiv:2507.21428 [pdf, ps, other]

    cs.CL

    MemTool: Optimizing Short-Term Memory Management for Dynamic Tool Calling in LLM Agent Multi-Turn Conversations

    Authors: Elias Lumer, Anmol Gulati, Vamse Kumar Subbiah, Pradeep Honaganahalli Basavaraju, James A. Burke

    Abstract: Large Language Model (LLM) agents have shown significant autonomous capabilities in dynamically searching and incorporating relevant tools or Model Context Protocol (MCP) servers for individual queries. However, fixed context windows limit effectiveness in multi-turn interactions requiring repeated, independent tool usage. We introduce MemTool, a short-term memory framework enabling LLM agents to…

    Submitted 28 July, 2025; originally announced July 2025.

    Comments: 23 pages, 20 figures

  9. arXiv:2507.16860 [pdf, ps, other]

    cs.SI cs.CV cs.CY

    Weak Links in LinkedIn: Enhancing Fake Profile Detection in the Age of LLMs

    Authors: Apoorva Gulati, Rajesh Kumar, Vinti Agarwal, Aditya Sharma

    Abstract: Large Language Models (LLMs) have made it easier to create realistic fake profiles on platforms like LinkedIn. This poses a significant risk for text-based fake profile detectors. In this study, we evaluate the robustness of existing detectors against LLM-generated profiles. While highly effective in detecting manually created fake profiles (False Accept Rate: 6-7%), the existing detectors fail to…

    Submitted 21 July, 2025; originally announced July 2025.

    Comments: 10 pages, 3 figures, 1 table, accepted for publication at ASONAM 2025. https://sites.google.com/view/weaklinksinlinkedin/home

  10. arXiv:2507.06261 [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu , et al. (3410 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 16 October, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  11. arXiv:2506.11025 [pdf, ps, other]

    cs.LG cs.AI cs.CV

    When Algorithms Play Favorites: Lookism in the Generation and Perception of Faces

    Authors: Miriam Doh, Aditya Gulati, Matei Mancas, Nuria Oliver

    Abstract: This paper examines how synthetically generated faces and machine learning-based gender classification algorithms are affected by algorithmic lookism, the preferential treatment based on appearance. In experiments with 13,200 synthetically generated faces, we find that: (1) text-to-image (T2I) systems tend to associate facial attractiveness with unrelated positive traits like intelligence and trustw…

    Submitted 20 May, 2025; originally announced June 2025.

    Comments: Accepted as an extended abstract at the Fourth European Workshop on Algorithmic Fairness (EWAF) (URL: https://2025.ewaf.org/home)

    Journal ref: Proceedings of Machine Learning Research 294 (2025) 474-480

  12. arXiv:2505.06416 [pdf, ps, other]

    cs.CL

    ScaleMCP: Dynamic and Auto-Synchronizing Model Context Protocol Tools for LLM Agents

    Authors: Elias Lumer, Anmol Gulati, Vamse Kumar Subbiah, Pradeep Honaganahalli Basavaraju, James A. Burke

    Abstract: Recent advancements in Large Language Models (LLMs) and the introduction of the Model Context Protocol (MCP) have significantly expanded LLM agents' capability to interact dynamically with external tools and APIs. However, existing tool selection frameworks do not integrate MCP servers, instead relying heavily on error-prone manual updates to monolithic local tool repositories, leading to duplicat…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 17 pages

  13. arXiv:2505.02009 [pdf, ps, other]

    cs.CL cs.LG

    Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs

    Authors: Sai Krishna Mendu, Harish Yenala, Aditi Gulati, Shanu Kumar, Parag Agrawal

    Abstract: Large language models (LLMs) have become integral to various real-world applications, leveraging massive, web-sourced datasets like Common Crawl, C4, and FineWeb for pretraining. While these datasets provide linguistic data essential for high-quality natural language generation, they often contain harmful content, such as hate speech, misinformation, and biased narratives. Training LLMs on such un…

    Submitted 12 August, 2025; v1 submitted 4 May, 2025; originally announced May 2025.

    Comments: 10 pages, 5 figures. Accepted at the International Joint Conferences on Artificial Intelligence IJCAI 2025 (main track)

  14. arXiv:2504.16104 [pdf, ps, other]

    cs.CY

    Beauty and the Bias: Exploring the Impact of Attractiveness on Multimodal Large Language Models

    Authors: Aditya Gulati, Moreno D'Incà, Nicu Sebe, Bruno Lepri, Nuria Oliver

    Abstract: Physical attractiveness matters. It has been shown to influence human perception and decision-making, often leading to biased judgments that favor those deemed attractive in what is referred to as the "attractiveness halo effect". While extensively studied in human judgments in a broad set of domains, including hiring, judicial sentencing or credit granting, the role that attractiveness plays in t…

    Submitted 11 August, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

    Comments: 39 pages, 4 figures, 33 tables; Accepted for publication at the Eighth AAAI/ACM Conference on AI, Ethics and Society (AIES 2025) (https://www.aies-conference.com/2025/)

  15. arXiv:2501.12862 [pdf, other]

    cs.SE cs.AI cs.LG

    Mutation-Guided LLM-based Test Generation at Meta

    Authors: Christopher Foster, Abhishek Gulati, Mark Harman, Inna Harper, Ke Mao, Jillian Ritchey, Hervé Robert, Shubho Sengupta

    Abstract: This paper describes Meta's ACH system for mutation-guided LLM-based test generation. ACH generates relatively few mutants (aka simulated faults), compared to traditional mutation testing. Instead, it focuses on generating currently undetected faults that are specific to an issue of concern. From these currently uncaught faults, ACH generates tests that can catch them, thereby `killing' the mutant…

    Submitted 22 January, 2025; originally announced January 2025.

    Comments: Submitted to FSE 2025 Industry Track

  16. arXiv:2411.12719 [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Rethinking MUSHRA: Addressing Modern Challenges in Text-to-Speech Evaluation

    Authors: Praveen Srinivasa Varadhan, Amogh Gulati, Ashwin Sankar, Srija Anand, Anirudh Gupta, Anirudh Mukherjee, Shiva Kumar Marepally, Ankur Bhatia, Saloni Jaju, Suvrat Bhooshan, Mitesh M. Khapra

    Abstract: Despite rapid advancements in TTS models, a consistent and robust human evaluation framework is still lacking. For example, MOS tests fail to differentiate between similar models, and CMOS's pairwise comparisons are time-intensive. The MUSHRA test is a promising alternative for evaluating multiple TTS systems simultaneously, but in this work we show that its reliance on matching human reference sp…

    Submitted 26 May, 2025; v1 submitted 19 November, 2024; originally announced November 2024.

    Comments: Accepted in TMLR

  17. arXiv:2411.04512 [pdf, other]

    cs.LG

    Normalized Space Alignment: A Versatile Metric for Representation Analysis

    Authors: Danish Ebadulla, Aditya Gulati, Ambuj Singh

    Abstract: We introduce a manifold analysis technique for neural network representations. Normalized Space Alignment (NSA) compares pairwise distances between two point clouds derived from the same source and having the same size, while potentially possessing differing dimensionalities. NSA can act as both an analytical tool and a differentiable loss function, providing a robust means of comparing and aligni…

    Submitted 7 November, 2024; originally announced November 2024.

    Comments: Under Review

  18. arXiv:2410.19320 [pdf, ps, other]

    quant-ph cs.CC cs.CR

    Pseudorandomness in the (Inverseless) Haar Random Oracle Model

    Authors: Prabhanjan Ananth, John Bostanci, Aditya Gulati, Yao-Ting Lin

    Abstract: We study the (in)feasibility of quantum pseudorandom notions in a quantum analog of the random oracle model, where all the parties, including the adversary, have oracle access to the same Haar random unitary. In this model, we show the following:

    - (Unbounded-query secure) pseudorandom unitaries (PRU) exist. Moreover, the PRU construction makes two calls to the Haar oracle.

    - We consider const…

    Submitted 25 October, 2024; originally announced October 2024.

    Comments: 51 pages; 4 figures

  19. arXiv:2410.05495 [pdf, other]

    cs.CL

    Self-rationalization improves LLM as a fine-grained judge

    Authors: Prapti Trivedi, Aditya Gulati, Oliver Molenschot, Meghana Arakkal Rajeev, Rajkumar Ramamurthy, Keith Stevens, Tanveesh Singh Chaudhery, Jahnavi Jambholkar, James Zou, Nazneen Rajani

    Abstract: LLM-as-a-judge models have been used for evaluating both human and AI generated content, specifically by providing scores and rationales. Rationales, in addition to increasing transparency, help models learn to calibrate their judgments. Enhancing a model's rationale can therefore improve its calibration abilities and ultimately the ability to score content. We introduce Self-Rationalization, an ite…

    Submitted 7 October, 2024; originally announced October 2024.

  20. arXiv:2408.11448 [pdf, other]

    cs.CV cs.AI cs.CY

    Lookism: The overlooked bias in computer vision

    Authors: Aditya Gulati, Bruno Lepri, Nuria Oliver

    Abstract: In recent years, there have been significant advancements in computer vision which have led to the widespread deployment of image recognition and generation systems in socially relevant applications, from hiring to security screening. However, the prevalence of biases within these systems has raised significant ethical and social concerns. The most extensively studied biases in this context are re…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Paper accepted at the ECCV 2024 workshop named "Fairness and ethics towards transparent AI: facing the chalLEnge through model Debiasing (FAILED)", https://failed-workshop-eccv-2024.github.io/

    ACM Class: I.2.0; I.4.0; K.4.2

  21. arXiv:2407.13141 [pdf, other]

    cs.LG

    Out-of-Distribution Detection through Soft Clustering with Non-Negative Kernel Regression

    Authors: Aryan Gulati, Xingjian Dong, Carlos Hurtado, Sarath Shekkizhar, Swabha Swayamdipta, Antonio Ortega

    Abstract: As language models become more general purpose, increased attention needs to be paid to detecting out-of-distribution (OOD) instances, i.e., those not belonging to any of the distributions seen during training. Existing methods for detecting OOD data are computationally complex and storage-intensive. We propose a novel soft clustering approach for OOD detection based on non-negative kernel regress…

    Submitted 17 July, 2024; originally announced July 2024.

  22. What is Beautiful is Still Good: The Attractiveness Halo Effect in the era of Beauty Filters

    Authors: Aditya Gulati, Marina Martinez-Garcia, Daniel Fernandez, Miguel Angel Lozano, Bruno Lepri, Nuria Oliver

    Abstract: The impact of cognitive biases on decision-making in the digital world remains under-explored despite its well-documented effects in physical contexts. This study addresses this gap by investigating the attractiveness halo effect using AI-based beauty filters. We conduct a large-scale online user study involving 2,748 participants who rated facial images from a diverse set of 462 distinct individu…

    Submitted 28 November, 2024; v1 submitted 29 May, 2024; originally announced July 2024.

    Comments: 40 pages, 15 figures, 13 tables; Version 2 incorporates feedback from the reviews and the format has been updated to match the requirements of the Royal Society Open Science

    Journal ref: R. Soc. Open Sci. 11: 240882 (2024)

  23. arXiv:2407.07908 [pdf, ps, other]

    quant-ph cs.CR

    Cryptography in the Common Haar State Model: Feasibility Results and Separations

    Authors: Prabhanjan Ananth, Aditya Gulati, Yao-Ting Lin

    Abstract: The common random string model is a popular model in classical cryptography. We study a quantum analogue of this model called the common Haar state (CHS) model. In this model, every party participating in the cryptographic system receives many copies of one or more i.i.d. Haar random states. We study the feasibility and limitations of cryptographic primitives in this model and its variants:

    - We present…

    Submitted 30 June, 2024; originally announced July 2024.

  24. arXiv:2406.06555 [pdf, other]

    cs.LG cs.AI cs.CL cs.PL

    An Evaluation Benchmark for Autoformalization in Lean4

    Authors: Aryan Gulati, Devanshu Ladsaria, Shubhra Mishra, Jasdeep Sidhu, Brando Miranda

    Abstract: Large Language Models (LLMs) hold the potential to revolutionize autoformalization. The introduction of Lean4, a mathematical programming language, presents an unprecedented opportunity to rigorously assess the autoformalization capabilities of LLMs. This paper introduces a novel evaluation benchmark designed for Lean4, applying it to test the abilities of state-of-the-art LLMs, including GPT-3.5,…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: To appear at ICLR 2024 as part of the Tiny Papers track

  25. arXiv:2404.05227 [pdf, ps, other]

    quant-ph cs.CR

    A Note on the Common Haar State Model

    Authors: Prabhanjan Ananth, Aditya Gulati, Yao-Ting Lin

    Abstract: The common random string model is a popular model in classical cryptography with many constructions proposed in this model. We study a quantum analogue of this model called the common Haar state model, which was also studied in an independent work by Chen, Coladangelo and Sattath (arXiv 2024). In this model, every party in the cryptographic system receives many copies of one or more i.i.d. Haar states.…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 17 pages, 1 figure. arXiv admin note: text overlap with arXiv:2311.18566 by other authors

  26. arXiv:2403.05530 [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1112 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 16 December, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  27. arXiv:2402.18032 [pdf, other]

    cs.CV

    Human Shape and Clothing Estimation

    Authors: Aayush Gupta, Aditya Gulati, Himanshu, Lakshya LNU

    Abstract: Human shape and clothing estimation has gained significant prominence in various domains, including online shopping, fashion retail, augmented reality (AR), virtual reality (VR), and gaming. The visual representation of human shape and clothing has become a focal point for computer vision researchers in recent years. This paper presents a comprehensive survey of the major works in the field, focus…

    Submitted 27 February, 2024; originally announced February 2024.

  28. arXiv:2312.11805 [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1326 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 9 May, 2025; v1 submitted 18 December, 2023; originally announced December 2023.

  29. arXiv:2311.02901 [pdf, other]

    quant-ph cs.CC cs.CR

    Pseudorandom Isometries

    Authors: Prabhanjan Ananth, Aditya Gulati, Fatih Kaleoglu, Yao-Ting Lin

    Abstract: We introduce a new notion called ${\cal Q}$-secure pseudorandom isometries (PRI). A pseudorandom isometry is an efficient quantum circuit that maps an $n$-qubit state to an $(n+m)$-qubit state in an isometric manner. In terms of security, we require that the output of a $q$-fold PRI on $ρ$, for $ρ\in {\cal Q}$, for any polynomial $q$, should be computationally indistinguishable from the output of…

    Submitted 10 November, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

  30. arXiv:2304.06277 [pdf]

    cs.LG cs.AI cs.CV

    Optimizing Multi-Domain Performance with Active Learning-based Improvement Strategies

    Authors: Anand Gokul Mahalingam, Aayush Shah, Akshay Gulati, Royston Mascarenhas, Rakshitha Panduranga

    Abstract: Improving performance in multiple domains is a challenging task, and often requires significant amounts of data to train and test models. Active learning techniques provide a promising solution by enabling models to select the most informative samples for labeling, thus reducing the amount of labeled data required to achieve high performance. In this paper, we present an active learning-based fram…

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: 13 pages, 20 figures, draft work previously published as a medium story

  31. arXiv:2304.00171 [pdf, other]

    cs.CL cs.SD eess.AS

    Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR

    Authors: Rami Botros, Anmol Gulati, Tara N. Sainath, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu

    Abstract: Conformer models maintain a large number of internal states, the vast majority of which are associated with self-attention layers. With limited memory bandwidth, reading these from memory at each inference step can slow down inference. In this paper, we design an optimized conformer that is small enough to meet on-device restrictions and has fast inference on TPUs. We explore various ideas to impr…

    Submitted 31 March, 2023; originally announced April 2023.

  32. arXiv:2211.01444 [pdf, other]

    quant-ph cs.CC cs.CR

    Pseudorandom (Function-Like) Quantum State Generators: New Definitions and Applications

    Authors: Prabhanjan Ananth, Aditya Gulati, Luowen Qian, Henry Yuen

    Abstract: Pseudorandom quantum states (PRS) are efficiently constructible states that are computationally indistinguishable from being Haar-random, and have recently found cryptographic applications. We explore new definitions, new properties and applications of pseudorandom states, and present the following contributions:

    1. New Definitions: We study variants of pseudorandom function-like state (PRFS) ge…

    Submitted 9 June, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

  33. arXiv:2210.01122 [pdf, other]

    cs.HC cs.AI

    BIASeD: Bringing Irrationality into Automated System Design

    Authors: Aditya Gulati, Miguel Angel Lozano, Bruno Lepri, Nuria Oliver

    Abstract: Human perception, memory and decision-making are impacted by tens of cognitive biases and heuristics that influence our actions and decisions. Despite the pervasiveness of such biases, they are generally not leveraged by today's Artificial Intelligence (AI) systems that model human behavior and interact with humans. In this theoretical paper, we claim that the future of human-machine collaboration…

    Submitted 1 December, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

    Comments: 14 pages, 1 figure; Accepted for presentation at the AAAI Fall Symposium 2022 on Thinking Fast and Slow and Other Cognitive Theories in AI. Corrected typos; v3: Updated figure 1, added table 4

  34. arXiv:2207.02107 [pdf, other]

    cs.MA

    EasyABM: a lightweight and easy to use heterogeneous agent-based modelling tool written in Julia

    Authors: Renu Solanki, Monisha Khanna, Shailly Anand, Anita Gulati, Prateek Kumar, Munendra Kumar, Dushyant Kumar

    Abstract: Agent based modelling is a computational approach that aims to understand the behaviour of complex systems through simplified interactions of programmable objects in computer memory called agents. Agent based models (ABMs) are predominantly used in fields of biology, ecology, social sciences and economics where the systems of interest often consist of several interacting entities. In this work, we…

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: 18 pages, 7 figures

  35. arXiv:2110.10329 [pdf, other]

    cs.CL cs.LG

    SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training

    Authors: Ankur Bapna, Yu-an Chung, Nan Wu, Anmol Gulati, Ye Jia, Jonathan H. Clark, Melvin Johnson, Jason Riesa, Alexis Conneau, Yu Zhang

    Abstract: Unsupervised pre-training is now the predominant approach for both text and speech understanding. Self-attention models pre-trained on large amounts of unannotated data have been hugely successful when fine-tuned on downstream tasks from a variety of domains and languages. This paper takes the universality of unsupervised language pre-training one step further, by unifying speech and text pre-trai…

    Submitted 19 October, 2021; originally announced October 2021.

  36. arXiv:2109.13226 [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

    Authors: Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, et al. (1 additional author not shown)

    Abstract: We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled da…

    Submitted 21 July, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: 14 pages, 7 figures, 13 tables; v2: minor corrections, reference baselines and bibliography updated; v3: corrections based on reviewer feedback, bibliography updated

  37. arXiv:2104.14830 [pdf, other]

    cs.CL cs.SD eess.AS

    Scaling End-to-End Models for Large-Scale Multilingual ASR

    Authors: Bo Li, Ruoming Pang, Tara N. Sainath, Anmol Gulati, Yu Zhang, James Qin, Parisa Haghani, W. Ronny Huang, Min Ma, Junwen Bai

    Abstract: Building ASR models across many languages is a challenging multi-task learning problem due to large variations and heavily unbalanced data. Existing work has shown positive transfer from high resource to low resource languages. However, degradations on high resource languages are commonly observed due to interference from the heterogeneous multilingual data and reduction in per-language capacity.…

    Submitted 11 September, 2021; v1 submitted 30 April, 2021; originally announced April 2021.

    Comments: ASRU 2021

  38. arXiv:2101.06914 [pdf, other]

    cs.CY cs.SI

    Capitol (Pat)riots: A comparative study of Twitter and Parler

    Authors: Hitkul, Avinash Prabhu, Dipanwita Guhathakurta, Jivitesh Jain, Mallika Subramanian, Manvith Reddy, Shradha Sehgal, Tanvi Karandikar, Amogh Gulati, Udit Arora, Rajiv Ratn Shah, Ponnurangam Kumaraguru

    Abstract: On 6 January 2021, a mob of right-wing conservatives stormed the US Capitol, interrupting the session of Congress certifying the 2020 Presidential election results. Immediately after the start of the event, posts related to the riots started to trend on social media. One platform that stood out was Parler, a free-speech-endorsing social media platform; it has been claimed as the platfo…

    Submitted 18 January, 2021; originally announced January 2021.

  39. arXiv:2011.10978  [pdf, ps, other]

    math.NT cs.CC

    On algorithms to find p-ordering

    Authors: Aditya Gulati, Sayak Chakrabarti, Rajat Mittal

    Abstract: The concept of p-ordering for a prime p was introduced by Manjul Bhargava (in his PhD thesis) to develop a generalized factorial function over an arbitrary subset of integers. This notion of p-ordering provides a representation of polynomials modulo prime powers, and has been used to prove properties of root sets modulo prime powers. We focus on the complexity of finding a p-ordering given a prim…

    Submitted 22 November, 2020; originally announced November 2020.

    Comments: 26 pages
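    A p-ordering is, by definition, constructible greedily: fix any a_0 in S, then repeatedly pick the remaining element minimizing the p-adic valuation of the product of its differences with the elements chosen so far. A small Python sketch of that greedy scheme (function names are illustrative, not from the paper):

    ```python
    def vp(n, p):
        """p-adic valuation of a nonzero integer n."""
        v = 0
        while n % p == 0:
            n //= p
            v += 1
        return v

    def p_ordering(S, p, length=None):
        """Greedy p-ordering of a finite subset S of the integers.

        a_0 is arbitrary; each subsequent a_k minimizes
        v_p(prod_{i<k}(a_k - a_i)) over the remaining elements.
        """
        S = list(S)
        order = [S.pop(0)]
        while S and (length is None or len(order) < length):
            best = min(S, key=lambda a: sum(vp(a - b, p) for b in order))
            S.remove(best)
            order.append(best)
        return order
    ```

    For example, the natural numbers in increasing order form a p-ordering for every p, and the greedy sketch recovers that: `p_ordering(range(6), 2)` returns `[0, 1, 2, 3, 4, 5]`.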

  40. arXiv:2011.10798  [pdf, other]

    eess.AS cs.SD

    A Better and Faster End-to-End Model for Streaming ASR

    Authors: Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han, Qiao Liang, Yu Zhang, Trevor Strohman, Yonghui Wu

    Abstract: End-to-end (E2E) models have been shown to outperform state-of-the-art conventional models for streaming speech recognition [1] across many dimensions, including quality (as measured by word error rate (WER)) and endpointer latency [2]. However, the model still tends to delay the predictions towards the end and thus has much higher partial latency compared to a conventional ASR model. To address this i…

    Submitted 11 February, 2021; v1 submitted 21 November, 2020; originally announced November 2020.

    Comments: Accepted in ICASSP 2021

  41. Interleaving Fast and Slow Decision Making

    Authors: Aditya Gulati, Sarthak Soni, Shrisha Rao

    Abstract: The "Thinking, Fast and Slow" paradigm of Kahneman proposes that we use two different styles of thinking -- a fast and intuitive System 1 for certain tasks, along with a slower but more analytical System 2 for others. While the idea of using this two-system style of thinking is gaining popularity in AI and robotics, our work considers how to interleave the two styles of decision-making, i.e., how…

    Submitted 26 March, 2021; v1 submitted 30 October, 2020; originally announced October 2020.

    Comments: 7 pages, 11 figures; typos corrected, references added

  42. arXiv:2010.11148  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization

    Authors: Jiahui Yu, Chung-Cheng Chiu, Bo Li, Shuo-yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han, Anmol Gulati, Yonghui Wu, Ruoming Pang

    Abstract: Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible. However, emitting fast without degrading quality, as measured by word error rate (WER), is highly challenging. Existing approaches including Early and Late Penalties and Constrained Alignments penalize emission delay by manipulating per-token or per-frame probability prediction i…

    Submitted 3 February, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: Accepted in ICASSP 2021

  43. arXiv:2010.06030  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling

    Authors: Jiahui Yu, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara N. Sainath, Yonghui Wu, Ruoming Pang

    Abstract: Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible, while full-context ASR waits for the completion of a full speech utterance before emitting completed hypotheses. In this work, we propose a unified framework, Dual-mode ASR, to train a single end-to-end ASR model with shared weights for both streaming and full-context speech reco…

    Submitted 27 January, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: Accepted in ICLR 2021
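    The core idea, one set of weights serving both modes, can be illustrated with a toy numpy self-attention layer in which only the attention mask differs between streaming (causal) and full-context operation. The sizes and weight layout below are illustrative, not the paper's configuration:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    T, d = 4, 6                       # frames, model dimension (toy sizes)
    w = rng.normal(size=(3, d, d))    # one shared set of Q/K/V projections

    def attend(x, streaming):
        # The same weights serve both modes; only the mask changes.
        q, k, v = (x @ wi for wi in w)
        scores = q @ k.T / np.sqrt(d)
        if streaming:
            # Causal mask: frame t may attend only to frames <= t.
            scores = np.where(np.tril(np.ones((T, T), bool)), scores, -np.inf)
        a = np.exp(scores - scores.max(-1, keepdims=True))
        a /= a.sum(-1, keepdims=True)
        return a @ v

    x = rng.normal(size=(T, d))
    full = attend(x, streaming=False)
    stream = attend(x, streaming=True)
    ```

    In streaming mode the first frame attends only to itself, so its output is exactly its own value projection; every parameter, however, is shared between the two modes.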

  44. arXiv:2009.05566  [pdf, other]

    cs.CR cs.LG

    Accelerating 2PC-based ML with Limited Trusted Hardware

    Authors: Muqsit Nawaz, Aditya Gulati, Kunlong Liu, Vishwajeet Agrawal, Prabhanjan Ananth, Trinabh Gupta

    Abstract: This paper describes the design, implementation, and evaluation of Otak, a system that allows two non-colluding cloud providers to run machine learning (ML) inference without knowing the inputs to inference. Prior work for this problem mostly relies on advanced cryptography such as two-party secure computation (2PC) protocols that provide rigorous guarantees but suffer from high resource overhead.…

    Submitted 11 September, 2020; originally announced September 2020.

    Comments: 19 pages

  45. arXiv:2005.10627  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Dynamic Sparsity Neural Networks for Automatic Speech Recognition

    Authors: Zhaofeng Wu, Ding Zhao, Qiao Liang, Jiahui Yu, Anmol Gulati, Ruoming Pang

    Abstract: In automatic speech recognition (ASR), model pruning is a widely adopted technique that reduces model size and latency to deploy neural network models on edge devices with resource constraints. However, multiple models with different sparsity levels usually need to be separately trained and deployed to heterogeneous target hardware with different resource specifications and for applications that h…

    Submitted 8 February, 2021; v1 submitted 16 May, 2020; originally announced May 2020.

    Comments: ICASSP 2021. (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
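    A minimal numpy sketch of the underlying mechanism: magnitude-based pruning masks derived from one shared weight tensor at several sparsity levels, so one dense model can serve multiple resource targets. The function name and the quantile-based thresholding are illustrative, not the paper's exact recipe:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def sparsity_mask(w, sparsity):
        """Binary mask that zeros out the `sparsity` fraction of
        smallest-magnitude weights, keeping the rest.

        Recomputing (or caching) this mask per target sparsity lets a
        single dense weight tensor serve several deployment budgets,
        instead of training one model per device.
        """
        thresh = np.quantile(np.abs(w), sparsity)
        return np.abs(w) >= thresh

    w = rng.normal(size=(10, 10))
    for s in (0.0, 0.5, 0.9):
        pruned = w * sparsity_mask(w, s)   # same weights, different budget
    ```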

  46. arXiv:2005.08100  [pdf, other]

    eess.AS cs.LG cs.SD

    Conformer: Convolution-augmented Transformer for Speech Recognition

    Authors: Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang

    Abstract: Recently, Transformer- and convolutional neural network (CNN)-based models have shown promising results in automatic speech recognition (ASR), outperforming recurrent neural networks (RNNs). Transformer models are good at capturing content-based global interactions, while CNNs exploit local features effectively. In this work, we achieve the best of both worlds by studying how to combine convolution ne…

    Submitted 16 May, 2020; originally announced May 2020.

    Comments: Submitted to Interspeech 2020
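    The block structure the abstract alludes to can be sketched at toy scale in numpy: two half-step ("macaron") feed-forward modules sandwiching self-attention and a depthwise convolution, with residual connections throughout. This is a shape-level illustration only, omitting relative positional encoding, multiple heads, pointwise convolutions, and batch norm:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    T, d, d_ff, k = 5, 8, 16, 3   # frames, model dim, FFN dim, conv kernel

    def layer_norm(x, eps=1e-5):
        mu = x.mean(-1, keepdims=True)
        return (x - mu) / np.sqrt(x.var(-1, keepdims=True) + eps)

    def ffn(x, w1, w2):
        h = x @ w1
        return (h / (1 + np.exp(-h))) @ w2          # Swish activation

    def self_attn(x, wq, wk, wv):
        q, k_, v = x @ wq, x @ wk, x @ wv
        s = q @ k_.T / np.sqrt(x.shape[-1])
        a = np.exp(s - s.max(-1, keepdims=True))
        return (a / a.sum(-1, keepdims=True)) @ v

    def depthwise_conv(x, kernel):
        # Per-channel 1-D convolution over time with 'same' padding.
        pad = kernel.shape[0] // 2
        xp = np.pad(x, ((pad, pad), (0, 0)))
        return np.stack([(xp[t:t + kernel.shape[0]] * kernel).sum(0)
                         for t in range(x.shape[0])])

    def conformer_block(x, p):
        # Macaron: half-step FFN, self-attention, conv, half-step FFN.
        x = x + 0.5 * ffn(layer_norm(x), *p["ffn1"])
        x = x + self_attn(layer_norm(x), *p["attn"])
        x = x + depthwise_conv(layer_norm(x), p["conv"])
        x = x + 0.5 * ffn(layer_norm(x), *p["ffn2"])
        return layer_norm(x)

    params = {
        "ffn1": (rng.normal(size=(d, d_ff)), rng.normal(size=(d_ff, d))),
        "attn": tuple(rng.normal(size=(d, d)) for _ in range(3)),
        "conv": rng.normal(size=(k, d)),
        "ffn2": (rng.normal(size=(d, d_ff)), rng.normal(size=(d_ff, d))),
    }
    out = conformer_block(rng.normal(size=(T, d)), params)
    ```

    The final layer norm makes each output frame zero-mean, which is an easy sanity check on the sketch.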

  47. arXiv:2005.05513  [pdf, other]

    cs.CL cs.CY cs.SI

    Psychometric Analysis and Coupling of Emotions Between State Bulletins and Twitter in India during COVID-19 Infodemic

    Authors: Baani Leen Kaur Jolly, Palash Aggrawal, Amogh Gulati, Amarjit Singh Sethi, Ponnurangam Kumaraguru, Tavpritesh Sethi

    Abstract: The COVID-19 infodemic has been spreading faster than the pandemic itself. The misinformation riding the infodemic wave poses a major threat to people's health and to governance systems. Since social media is the largest source of information, managing the infodemic requires not only mitigating misinformation but also an early understanding of the psychological patterns resulting from it. During the…

    Submitted 13 May, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

  48. arXiv:2005.03191  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

    Authors: Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu

    Abstract: Convolutional neural networks (CNNs) have shown promising results for end-to-end speech recognition, albeit still behind other state-of-the-art methods in performance. In this paper, we study how to bridge this gap and go beyond with a novel CNN-RNN-transducer architecture, which we call ContextNet. ContextNet features a fully convolutional encoder that incorporates global context information into…

    Submitted 15 May, 2020; v1 submitted 6 May, 2020; originally announced May 2020.

    Comments: Submitted to Interspeech 2020
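    ContextNet's global context comes from squeeze-and-excitation (SE) layers inside the convolutional encoder: features are averaged over all frames, squeezed through a bottleneck, and turned into a per-channel sigmoid gate applied to every frame. A toy numpy sketch of the mechanism (sizes and names are illustrative):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    T, d, d_b = 12, 8, 4   # frames, channels, bottleneck size (toy)

    def squeeze_excite(x, w1, w2):
        """Squeeze-and-excitation: one global gate per channel."""
        s = x.mean(axis=0)                     # squeeze: average over frames
        h = np.maximum(0.0, s @ w1)            # bottleneck + ReLU
        g = 1.0 / (1.0 + np.exp(-(h @ w2)))    # sigmoid gate in (0, 1)
        return x * g                           # same gate for every frame

    x = rng.normal(size=(T, d))
    y = squeeze_excite(x, rng.normal(size=(d, d_b)),
                          rng.normal(size=(d_b, d)))
    ```

    Because the gate depends on the mean over all frames, every local convolution downstream sees information about the whole utterance; the ratio y/x is the same gate vector on every frame.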