
Showing 1–50 of 158 results for author: Agarwal, R

Searching in archive cs.
  1. arXiv:2507.16217  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Towards Compute-Optimal Many-Shot In-Context Learning

    Authors: Shahriar Golchin, Yanfei Chen, Rujun Han, Manan Gandhi, Tianli Yu, Swaroop Mishra, Mihai Surdeanu, Rishabh Agarwal, Chen-Yu Lee, Tomas Pfister

    Abstract: Long-context large language models (LLMs) are able to process inputs containing up to several million tokens. In the scope of in-context learning (ICL), this translates into using hundreds/thousands of demonstrations in the input prompt, enabling many-shot ICL. In practice, a fixed set of demonstrations is often selected at random in many-shot settings due to (1) high inference costs, (2) the bene…

    Submitted 22 July, 2025; originally announced July 2025.

    Comments: Final version; accepted at COLM 2025

  2. arXiv:2507.07229  [pdf, ps, other]

    cs.CL

    SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains

    Authors: Krithika Ramesh, Daniel Smolyak, Zihao Zhao, Nupoor Gandhi, Ritu Agarwal, Margrét Bjarnadóttir, Anjalie Field

    Abstract: We present SynthTextEval, a toolkit for conducting comprehensive evaluations of synthetic text. The fluency of large language model (LLM) outputs has made synthetic text potentially viable for numerous applications, such as reducing the risks of privacy violations in the development and deployment of AI systems in high-stakes domains. Realizing this potential, however, requires principled consiste…

    Submitted 9 July, 2025; originally announced July 2025.

  3. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu , et al. (3284 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 22 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  4. arXiv:2506.20804  [pdf, ps, other]

    cs.RO

    Online Planning for Cooperative Air-Ground Robot Systems with Unknown Fuel Requirements

    Authors: Ritvik Agarwal, Behnoushsadat Hatami, Alvika Gautam, Parikshit Maini

    Abstract: We consider an online variant of the fuel-constrained UAV routing problem with a ground-based mobile refueling station (FCURP-MRS), where targets incur unknown fuel costs. We develop a two-phase solution: an offline heuristic-based planner computes initial UAV and UGV paths, and a novel online planning algorithm that dynamically adjusts rendezvous points based on real-time fuel consumption during…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Submitted to RSS (MRS Workshop)

  5. FEWSim: A Visual Analytic Framework for Exploring the Nexus of Food-Energy-Water Simulations

    Authors: Fan Lei, David A. Sampson, Jiayi Hong, Yuxin Ma, Giuseppe Mascaro, Dave White, Rimjhim Agarwal, Ross Maciejewski

    Abstract: The interdependencies of food, energy, and water (FEW) systems create a nexus opportunity to explore the strengths and vulnerabilities of individual and cross-sector interactions within FEW systems. However, the variables quantifying nexus interactions are hard to observe, which hinders the cross-sector analysis. To overcome such challenges, we present FEWSim, a visual analytics framework designed…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Accepted by IEEE Computer Graphics and Applications (CG&A)

  6. arXiv:2506.06798  [pdf, ps, other]

    cs.RO

    SARAL-Bot: Autonomous Robot for Strawberry Plant Care

    Authors: Arif Ahmed, Ritvik Agarwal, Gaurav Srikar, Nathaniel Rose, Parikshit Maini

    Abstract: Strawberry farming demands intensive labor for monitoring and maintaining plant health. To address this, Team SARAL develops an autonomous robot for the 2024 ASABE Student Robotics Challenge, capable of navigation, unhealthy leaf detection, and removal. The system addresses labor shortages, reduces costs, and supports sustainable farming through vision-based plant assessment. This work demonstrate…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: Awarded Best Written Report @ Robotics Design Challenge (Advanced), ASABE 2024

  7. arXiv:2506.02887  [pdf, ps, other]

    cs.LG cs.DC

    Overcoming Challenges of Partial Client Participation in Federated Learning: A Comprehensive Review

    Authors: Mrinmay Sen, Shruti Aparna, Rohit Agarwal, Chalavadi Krishna Mohan

    Abstract: Federated Learning (FL) is a learning mechanism that falls under the distributed training umbrella, which collaboratively trains a shared global model without disclosing the raw data from different clients. This paper presents an extensive survey on the impact of partial client participation in federated learning. While much of the existing research focuses on addressing issues such as generalizat…

    Submitted 6 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

    Comments: 15 pages, 6 tables, comprehensive survey of federated learning with partial client participation

  8. arXiv:2505.23231  [pdf, ps, other]

    cs.CY

    REDDIX-NET: A Novel Dataset and Benchmark for Moderating Online Explicit Services

    Authors: MSVPJ Sathvik, Manan Roy Choudhury, Rishita Agarwal, Sathwik Narkedimilli, Vivek Gupta

    Abstract: The rise of online platforms has enabled covert illicit activities, including online prostitution, to pose challenges for detection and regulation. In this study, we introduce REDDIX-NET, a novel benchmark dataset specifically designed for moderating online sexual services and going beyond traditional NSFW filters. The dataset is derived from thousands of web-scraped NSFW posts on Reddit and categ…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 29 pages, 15 figures

  9. arXiv:2505.09024  [pdf]

    cs.AI cs.CL cs.LG

    Automated Meta Prompt Engineering for Alignment with the Theory of Mind

    Authors: Aaron Baughman, Rahul Agarwal, Eduardo Morales, Gozde Akay

    Abstract: We introduce a method of meta-prompting that jointly produces fluent text for complex tasks while optimizing the similarity of neural states between a human's mental expectation and a Large Language Model's (LLM) neural processing. A technique of agentic reinforcement learning is applied, in which an LLM as a Judge (LLMaaJ) teaches another LLM, through in-context learning, how to produce content b…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 9 pages, 6 figures, 3 tables

  10. arXiv:2505.04842  [pdf, other]

    cs.LG cs.AI

    Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

    Authors: Kusha Sareen, Morgane M Moss, Alessandro Sordoni, Rishabh Agarwal, Arian Hosseini

    Abstract: Prevalent reinforcement learning (RL) methods for fine-tuning LLM reasoners, such as GRPO or Leave-one-out PPO, abandon the learned value function in favor of empirically estimated returns. This hinders test-time compute scaling that relies on using the value-function for verification. In this work, we propose RL$^V$ that augments any "value-free" RL method by jointly training the LLM as both a…

    Submitted 7 May, 2025; originally announced May 2025.

  11. arXiv:2505.00035  [pdf, other]

    cs.CL cs.AI

    Linguistic Complexity and Socio-cultural Patterns in Hip-Hop Lyrics

    Authors: Aayam Bansal, Raghav Agarwal, Kaashvi Jain

    Abstract: This paper presents a comprehensive computational framework for analyzing linguistic complexity and socio-cultural trends in hip-hop lyrics. Using a dataset of 3,814 songs from 146 influential artists spanning four decades (1980-2020), we employ natural language processing techniques to quantify multiple dimensions of lyrical complexity. Our analysis reveals a 23.7% increase in vocabulary diversit…

    Submitted 29 April, 2025; originally announced May 2025.

    Comments: 12 pages

  12. arXiv:2504.16828  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Process Reward Models That Think

    Authors: Muhammad Khalifa, Rishabh Agarwal, Lajanugen Logeswaran, Jaekyeom Kim, Hao Peng, Moontae Lee, Honglak Lee, Lu Wang

    Abstract: Step-by-step verifiers -- also known as process reward models (PRMs) -- are a key ingredient for test-time scaling. PRMs require step-level supervision, making them expensive to train. This work aims to build data-efficient PRMs as verbalized step-wise reward models that verify every step in the solution by generating a verification chain-of-thought (CoT). We propose ThinkPRM, a long CoT verifier…

    Submitted 23 June, 2025; v1 submitted 23 April, 2025; originally announced April 2025.

  13. arXiv:2504.02912  [pdf, other]

    cs.CV cs.AI cs.ET cs.LG

    Haphazard Inputs as Images in Online Learning

    Authors: Rohit Agarwal, Aryan Dessai, Arif Ahmed Sekh, Krishna Agarwal, Alexander Horsch, Dilip K. Prasad

    Abstract: The field of varying feature space in online learning settings, also known as haphazard inputs, is very prominent nowadays due to its applicability in various fields. However, the current solutions to haphazard inputs are model-dependent and cannot benefit from the existing advanced deep-learning methods, which necessitate inputs of fixed dimensions. Therefore, we propose to transform the varying…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: Accepted at IJCNN 2025

  14. Finding Interest Needle in Popularity Haystack: Improving Retrieval by Modeling Item Exposure

    Authors: Rahul Agarwal, Amit Jaspal, Saurabh Gupta, Omkar Vichare

    Abstract: Recommender systems operate in closed feedback loops, where user interactions reinforce popularity bias, leading to over-recommendation of already popular items while under-exposing niche or novel content. Existing bias mitigation methods, such as Inverse Propensity Scoring (IPS) and Off-Policy Correction (OPC), primarily operate at the ranking stage or during training, lacking explicit real-time…

    Submitted 8 June, 2025; v1 submitted 30 March, 2025; originally announced March 2025.

    Comments: 2 pages. UMAP '25: 33rd ACM Conference on User Modeling, Adaptation and Personalization, New York City, USA, June 2025

  15. arXiv:2503.19786  [pdf, other]

    cs.CL cs.AI

    Gemma 3 Technical Report

    Authors: Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Etienne Pot, Ivo Penchev, Gaël Liu, Francesco Visin, Kathleen Kenealy, Lucas Beyer, Xiaohai Zhai, Anton Tsitsulin , et al. (191 additional authors not shown)

    Abstract: We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achie…

    Submitted 25 March, 2025; originally announced March 2025.

  16. arXiv:2502.05740  [pdf, other]

    cs.HC cs.AI

    RECOVER: Designing a Large Language Model-based Remote Patient Monitoring System for Postoperative Gastrointestinal Cancer Care

    Authors: Ziqi Yang, Yuxuan Lu, Jennifer Bagdasarian, Vedant Das Swain, Ritu Agarwal, Collin Campbell, Waddah Al-Refaire, Jehan El-Bayoumi, Guodong Gao, Dakuo Wang, Bingsheng Yao, Nawar Shara

    Abstract: Cancer surgery is a key treatment for gastrointestinal (GI) cancers, a group of cancers that account for more than 35% of cancer-related deaths worldwide, but postoperative complications are unpredictable and can be life-threatening. In this paper, we investigate how recent advancements in large language models (LLMs) can benefit remote patient monitoring (RPM) systems through clinical integration…

    Submitted 8 February, 2025; originally announced February 2025.

  17. arXiv:2501.18837  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

    Authors: Mrinank Sharma, Meg Tong, Jesse Mu, Jerry Wei, Jorrit Kruthoff, Scott Goodfriend, Euan Ong, Alwin Peng, Raj Agarwal, Cem Anil, Amanda Askell, Nathan Bailey, Joe Benton, Emma Bluemke, Samuel R. Bowman, Eric Christiansen, Hoagy Cunningham, Andy Dau, Anjali Gopal, Rob Gilson, Logan Graham, Logan Howard, Nimit Kalra, Taesung Lee, Kevin Lin , et al. (18 additional authors not shown)

    Abstract: Large language models (LLMs) are vulnerable to universal jailbreaks -- prompting strategies that systematically bypass model safeguards and enable users to carry out harmful processes that require many model interactions, like manufacturing illegal substances at scale. To defend against these attacks, we introduce Constitutional Classifiers: safeguards trained on synthetic data, generated by promptin…

    Submitted 30 January, 2025; originally announced January 2025.

  18. arXiv:2412.16656  [pdf, other]

    cs.CV cs.AI

    Generalizable Articulated Object Perception with Superpoints

    Authors: Qiaojun Yu, Ce Hao, Xibin Yuan, Li Zhang, Liu Liu, Yukang Huo, Rohit Agarwal, Cewu Lu

    Abstract: Manipulating articulated objects with robotic arms is challenging due to the complex kinematic structure, which requires precise part segmentation for efficient manipulation. In this work, we introduce a novel superpoint-based perception method designed to improve part segmentation in 3D point clouds of articulated objects. We propose a learnable, part-aware superpoint generation technique that ef…

    Submitted 21 December, 2024; originally announced December 2024.

  19. arXiv:2412.16335  [pdf, other]

    cs.LG cs.CY

    Improving Equity in Health Modeling with GPT4-Turbo Generated Synthetic Data: A Comparative Study

    Authors: Daniel Smolyak, Arshana Welivita, Margrét V. Bjarnadóttir, Ritu Agarwal

    Abstract: Objective. Demographic groups are often represented at different rates in medical datasets. These differences can create bias in machine learning algorithms, with higher levels of performance for better-represented groups. One promising solution to this problem is to generate synthetic data to mitigate potential adverse effects of non-representative data sets. Methods. We build on recent advance…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: 26 pages, 4 figures

  20. arXiv:2412.15287  [pdf, other]

    cs.CL cs.AI cs.LG

    Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

    Authors: Yinlam Chow, Guy Tennenholtz, Izzeddin Gur, Vincent Zhuang, Bo Dai, Sridhar Thiagarajan, Craig Boutilier, Rishabh Agarwal, Aviral Kumar, Aleksandra Faust

    Abstract: Recent studies have indicated that effectively utilizing inference-time compute is crucial for attaining better performance from large language models (LLMs). In this work, we propose a novel inference-aware fine-tuning paradigm, in which the model is fine-tuned in a manner that directly optimizes the performance of the inference-time strategy. We study this paradigm using the simple yet effective…

    Submitted 18 December, 2024; originally announced December 2024.

  21. arXiv:2412.09727  [pdf]

    q-bio.QM cs.AI cs.LG

    Let Curves Speak: A Continuous Glucose Monitor based Large Sensor Foundation Model for Diabetes Management

    Authors: Junjie Luo, Abhimanyu Kumbara, Mansur Shomali, Rui Han, Anand Iyer, Ritu Agarwal, Gordon Gao

    Abstract: While previous studies of AI in diabetes management focus on long-term risk, research on near-future glucose prediction remains limited but important as it enables timely diabetes self-management. Integrating AI with continuous glucose monitoring (CGM) holds promise for near-future glucose prediction. However, existing models have limitations in capturing patterns of blood glucose fluctuations and…

    Submitted 17 December, 2024; v1 submitted 12 December, 2024; originally announced December 2024.

  22. arXiv:2411.16096  [pdf, other]

    cs.CV cs.AI cs.MM

    ENCLIP: Ensembling and Clustering-Based Contrastive Language-Image Pretraining for Fashion Multimodal Search with Limited Data and Low-Quality Images

    Authors: Prithviraj Purushottam Naik, Rohit Agarwal

    Abstract: Multimodal search has revolutionized the fashion industry, providing a seamless and intuitive way for users to discover and explore fashion items. Based on their preferences, style, or specific attributes, users can search for products by combining text and image information. Text-to-image searches enable users to find visually similar items or describe products using natural language. This paper…

    Submitted 25 November, 2024; originally announced November 2024.

  23. arXiv:2411.09228  [pdf, ps, other]

    cs.CR

    Injection Attacks Against End-to-End Encrypted Applications

    Authors: Andrés Fábrega, Carolina Ortega Pérez, Armin Namavari, Ben Nassi, Rachit Agarwal, Thomas Ristenpart

    Abstract: We explore an emerging threat model for end-to-end (E2E) encrypted applications: an adversary sends chosen messages to a target client, thereby "injecting" adversarial content into the application state. Such state is subsequently encrypted and synchronized to an adversarially-visible storage. By observing the lengths of the resulting cloud-stored ciphertexts, the attacker backs out confidential i…

    Submitted 14 November, 2024; originally announced November 2024.

    Comments: Published in IEEE Security and Privacy 2024

  24. arXiv:2411.00062  [pdf, other]

    cs.CL cs.AI physics.data-an stat.ML

    Scalable Reinforcement Post-Training Beyond Static Human Prompts: Evolving Alignment via Asymmetric Self-Play

    Authors: Ziyu Ye, Rishabh Agarwal, Tianqi Liu, Rishabh Joshi, Sarmishta Velury, Quoc V. Le, Qijun Tan, Yuan Liu

    Abstract: Current reinforcement learning (RL) frameworks for large language models (LLM) post-training typically assume a fixed prompt distribution, which is sub-optimal and bottlenecks scalability. Prior works have explored prompt evolving, but are often limited to the supervised fine-tuning stage, and prompts are sampled and evolved uniformly without signals. This empirical work presents a paradigm shift:…

    Submitted 9 April, 2025; v1 submitted 31 October, 2024; originally announced November 2024.

    Comments: Spotlight @ NeurIPS Language Gamification Workshop; updated the problem description and added new online RL experiments in this version

  25. arXiv:2410.18252  [pdf, other]

    cs.LG cs.AI cs.CL

    Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

    Authors: Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux, Arian Hosseini, Rishabh Agarwal, Aaron Courville

    Abstract: The dominant paradigm for RLHF is online and on-policy RL: synchronously generating from the large language model (LLM) policy, labelling with a reward model, and learning using feedback on the LLM's own outputs. While performant, this paradigm is computationally inefficient. Inspired by classical deep RL literature, we propose separating generation and learning in RLHF. This enables asynchronous…

    Submitted 26 April, 2025; v1 submitted 23 October, 2024; originally announced October 2024.

    Comments: accepted at ICLR 2025, code at https://github.com/mnoukhov/async_rlhf, integrated into the open-instruct library https://github.com/allenai/open-instruct

  26. arXiv:2410.17394  [pdf, other]

    cs.LG cs.AI

    packetLSTM: Dynamic LSTM Framework for Streaming Data with Varying Feature Space

    Authors: Rohit Agarwal, Karaka Prasanth Naidu, Alexander Horsch, Krishna Agarwal, Dilip K. Prasad

    Abstract: We study the online learning problem characterized by the varying input feature space of streaming data. Although LSTMs have been employed to effectively capture the temporal nature of streaming data, they cannot handle the dimension-varying streams in an online learning setting. Therefore, we propose a dynamic LSTM-based novel method, called packetLSTM, to model the dimension-varying streams. The…

    Submitted 22 October, 2024; originally announced October 2024.

  27. arXiv:2410.11325  [pdf, other]

    cs.CL cs.AI

    Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

    Authors: Wenda Xu, Rujun Han, Zifeng Wang, Long T. Le, Dhruv Madeka, Lei Li, William Yang Wang, Rishabh Agarwal, Chen-Yu Lee, Tomas Pfister

    Abstract: Recent advances in knowledge distillation (KD) have enabled smaller student models to approach the performance of larger teacher models. However, popular methods such as supervised KD and on-policy KD, are adversely impacted by the knowledge gaps between teacher-student in practical scenarios. Supervised KD suffers from a distribution mismatch between training with a static dataset and inference o…

    Submitted 27 April, 2025; v1 submitted 15 October, 2024; originally announced October 2024.

    Comments: ICLR2025

  28. arXiv:2410.08146  [pdf, other]

    cs.LG cs.CL

    Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning

    Authors: Amrith Setlur, Chirag Nagpal, Adam Fisch, Xinyang Geng, Jacob Eisenstein, Rishabh Agarwal, Alekh Agarwal, Jonathan Berant, Aviral Kumar

    Abstract: A promising approach for improving reasoning in large language models is to use process reward models (PRMs). PRMs provide feedback at each step of a multi-step reasoning trace, potentially improving credit assignment over outcome reward models (ORMs) that only provide feedback at the final step. However, collecting dense, per-step human labels is not scalable, and training PRMs from automatically…

    Submitted 10 October, 2024; originally announced October 2024.

  29. arXiv:2410.01748  [pdf, other]

    cs.LG

    Not All LLM Reasoners Are Created Equal

    Authors: Arian Hosseini, Alessandro Sordoni, Daniel Toyama, Aaron Courville, Rishabh Agarwal

    Abstract: We study the depth of grade-school math (GSM) problem-solving capabilities of LLMs. To this end, we evaluate their performance on pairs of existing math word problems together so that the answer to the second problem depends on correctly answering the first problem. Our findings reveal a significant reasoning gap in most LLMs, that is performance difference between solving the compositional pairs…

    Submitted 2 October, 2024; originally announced October 2024.

  30. arXiv:2409.16291  [pdf, other]

    cs.HC cs.AI

    Beyond Following: Mixing Active Initiative into Computational Creativity

    Authors: Zhiyu Lin, Upol Ehsan, Rohan Agarwal, Samihan Dani, Vidushi Vashishth, Mark Riedl

    Abstract: Generative Artificial Intelligence (AI) encounters limitations in efficiency and fairness within the realm of Procedural Content Generation (PCG) when human creators solely drive and bear responsibility for the generative process. Alternative setups, such as Mixed-Initiative Co-Creative (MI-CC) systems, exhibited their promise. Still, the potential of an active mixed initiative, where AI takes a r…

    Submitted 6 September, 2024; originally announced September 2024.

    Comments: 11 pages, 4 figures

  31. arXiv:2409.12917  [pdf, other]

    cs.LG

    Training Language Models to Self-Correct via Reinforcement Learning

    Authors: Aviral Kumar, Vincent Zhuang, Rishabh Agarwal, Yi Su, John D Co-Reyes, Avi Singh, Kate Baumli, Shariq Iqbal, Colton Bishop, Rebecca Roelofs, Lei M Zhang, Kay McKinney, Disha Shrivastava, Cosmin Paduraru, George Tucker, Doina Precup, Feryal Behbahani, Aleksandra Faust

    Abstract: Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Current methods for training self-correction typically depend on either multiple models, a more advanced model, or additional forms of supervision. To address these shortcomings, we develop a multi-turn online reinforcement learning (RL) app…

    Submitted 4 October, 2024; v1 submitted 19 September, 2024; originally announced September 2024.

  32. arXiv:2409.10242  [pdf, other]

    cs.LG cs.AI

    Hedging Is Not All You Need: A Simple Baseline for Online Learning Under Haphazard Inputs

    Authors: Himanshu Buckchash, Momojit Biswas, Rohit Agarwal, Dilip K. Prasad

    Abstract: Handling haphazard streaming data, such as data from edge devices, presents a challenging problem. Over time, the incoming data becomes inconsistent, with missing, faulty, or new inputs reappearing. Therefore, it requires models that are reliable. Recent methods to solve this problem depend on a hedging-based solution and require specialized elements like auxiliary dropouts, forked architectures,…

    Submitted 30 December, 2024; v1 submitted 16 September, 2024; originally announced September 2024.

  33. arXiv:2408.16737  [pdf, other]

    cs.CL cs.AI

    Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

    Authors: Hritik Bansal, Arian Hosseini, Rishabh Agarwal, Vinh Q. Tran, Mehran Kazemi

    Abstract: Training on high-quality synthetic data from strong language models (LMs) is a common strategy to improve the reasoning performance of LMs. In this work, we revisit whether this strategy is compute-optimal under a fixed inference budget (e.g., FLOPs). To do so, we investigate the trade-offs between generating synthetic data using a stronger but more expensive (SE) model versus a weaker but cheaper…

    Submitted 7 October, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

  34. arXiv:2408.15575  [pdf, other]

    cs.IR

    Lyrically Speaking: Exploring the Link Between Lyrical Emotions, Themes and Depression Risk

    Authors: Pavani Chowdary, Bhavyajeet Singh, Rajat Agarwal, Vinoo Alluri

    Abstract: Lyrics play a crucial role in affecting and reinforcing emotional states by providing meaning and emotional connotations that interact with the acoustic properties of the music. Specific lyrical themes and emotions may intensify existing negative states in listeners and may lead to undesirable outcomes, especially in listeners with mood disorders such as depression. Hence, it is important for such…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted at the 25th International Society for Music Information Retrieval Conference (ISMIR) 2024, San Francisco, United States

  35. arXiv:2408.15240  [pdf, other]

    cs.LG

    Generative Verifiers: Reward Modeling as Next-Token Prediction

    Authors: Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, Rishabh Agarwal

    Abstract: Verifiers or reward models are often used to enhance the reasoning performance of large language models (LLMs). A common approach is the Best-of-N method, where N candidate solutions generated by the LLM are ranked by a verifier, and the best one is selected. While LLM-based verifiers are typically trained as discriminative classifiers to score solutions, they do not utilize the text generation ca…

    Submitted 22 February, 2025; v1 submitted 27 August, 2024; originally announced August 2024.

    Comments: ICLR 2025
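
    The Best-of-N procedure summarized in entry 35 is easy to sketch. The snippet below is a minimal illustration only, not the paper's implementation; generate_candidates and verifier_score are hypothetical stand-ins for an LLM sampler and a trained verifier.

        # Minimal Best-of-N selection (illustrative sketch; not the paper's code).
        # generate_candidates and verifier_score are hypothetical stand-ins for
        # an LLM sampler and a verifier/reward model.
        def best_of_n(problem, n, generate_candidates, verifier_score):
            candidates = generate_candidates(problem, n)  # sample N candidate solutions
            # Rank candidates by verifier score and return the highest-scoring one.
            return max(candidates, key=lambda sol: verifier_score(problem, sol))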

  36. arXiv:2408.14927  [pdf, other]

    eess.IV cs.CV

    Automatic Detection of COVID-19 from Chest X-ray Images Using Deep Learning Model

    Authors: Alloy Das, Rohit Agarwal, Rituparna Singh, Arindam Chowdhury, Debashis Nandi

    Abstract: The infectious disease caused by novel corona virus (2019-nCoV) has been widely spreading since last year and has shaken the entire world. It has caused an unprecedented effect on daily life, global economy and public health. Hence this disease detection has life-saving importance for both patients as well as doctors. Due to limited test kits, it is also a daunting task to test every patient with…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted in AIP Conference Proceedings (Vol. 2424, No. 1)

  37. arXiv:2408.07054  [pdf, other]

    cs.CR

    Exploiting Leakage in Password Managers via Injection Attacks

    Authors: Andrés Fábrega, Armin Namavari, Rachit Agarwal, Ben Nassi, Thomas Ristenpart

    Abstract: This work explores injection attacks against password managers. In this setting, the adversary (only) controls their own application client, which they use to "inject" chosen payloads to a victim's client via, for example, sharing credentials with them. The injections are interleaved with adversarial observations of some form of protected state (such as encrypted vault exports or the network traff…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Full version of the paper published in USENIX Security 2024

  38. arXiv:2408.04114  [pdf, ps, other]

    cs.CL cs.LG

    Zero-shot Factual Consistency Evaluation Across Domains

    Authors: Raunak Agarwal

    Abstract: This work addresses the challenge of factual consistency in text generation systems. We unify the tasks of Natural Language Inference, Summarization Evaluation, Factuality Verification and Factual Consistency Evaluation to train models capable of evaluating the factual consistency of source-target pairs across diverse domains. We rigorously evaluate these against eight baselines on a comprehensive…

    Submitted 7 August, 2024; originally announced August 2024.

  39. arXiv:2408.00118  [pdf, other]

    cs.CL cs.AI

    Gemma 2: Improving Open Language Models at a Practical Size

    Authors: Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, Nikola Momchev, Matt Hoffman , et al. (173 additional authors not shown)

    Abstract: In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We al…

    Submitted 2 October, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

  40. arXiv:2407.10456  [pdf, other]

    cs.CL

    Don't Throw Away Data: Better Sequence Knowledge Distillation

    Authors: Jun Wang, Eleftheria Briakou, Hamid Dadkhahi, Rishabh Agarwal, Colin Cherry, Trevor Cohn

    Abstract: A critical component in knowledge distillation is the means of coupling the teacher and student. The predominant sequence knowledge distillation method involves supervised learning of the student against teacher-decoded outputs, and is exemplified by the current state of the art, which incorporates minimum Bayes risk (MBR) decoding. In this paper we seek to integrate MBR more tightly in distillati…

    Submitted 15 July, 2024; originally announced July 2024.

  41. arXiv:2407.04622  [pdf, other]

    cs.LG

    On scalable oversight with weak LLMs judging strong LLMs

    Authors: Zachary Kenton, Noah Y. Siegel, János Kramár, Jonah Brown-Cohen, Samuel Albanie, Jannis Bulian, Rishabh Agarwal, David Lindner, Yunhao Tang, Noah D. Goodman, Rohin Shah

    Abstract: Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AI's compete to convince a judge; consultancy, where a single AI tries to convince a judge that asks questions; and compare to a baseline of direct question-answering, where the judge just answers outright without the AI. We use large language models (LLMs) as both AI a…

    Submitted 12 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: 15 pages (53 including appendices). V2: minor correction to Figure 3; add Figure A.9 comparing open vs assigned consultancy; add a reference

  42. arXiv:2406.18537  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale

    Authors: Keenon Werling, Janelle Kaneda, Alan Tan, Rishi Agarwal, Six Skov, Tom Van Wouwe, Scott Uhlrich, Nicholas Bianco, Carmichael Ong, Antoine Falisse, Shardul Sapkota, Aidan Chandra, Joshua Carter, Ezio Preatoni, Benjamin Fregly, Jennifer Hicks, Scott Delp, C. Karen Liu

    Abstract: While reconstructing human poses in 3D from inexpensive sensors has advanced significantly in recent years, quantifying the dynamics of human motion, including the muscle-generated joint torques and external forces, remains a challenge. Prior attempts to estimate physics from reconstructed human poses have been hampered by a lack of datasets with high-quality pose and force data for a variety of m…

    Submitted 16 May, 2024; originally announced June 2024.

    Comments: 15 pages, 6 figures, 4 tables

  43. arXiv:2406.15025  [pdf, other]

    cs.LG

    SiT: Symmetry-Invariant Transformers for Generalisation in Reinforcement Learning

    Authors: Matthias Weissenbacher, Rishabh Agarwal, Yoshinobu Kawahara

    Abstract: An open challenge in reinforcement learning (RL) is the effective deployment of a trained policy to new or slightly different situations as well as semantically-similar environments. We introduce Symmetry-Invariant Transformer (SiT), a scalable vision transformer (ViT) that leverages both local and global data patterns in a self-supervised manner to improve generalisation. Central to our approach…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 9 main pages, accepted to ICML2024

  44. arXiv:2404.14448  [pdf]

    cs.SE

    Object-Oriented Architecture: A Software Engineering-Inspired Shape Grammar for Durands Plates

    Authors: Rohan Agarwal

    Abstract: Addressing the challenge of modular architectural design, this study presents a novel approach through the implementation of a shape grammar system using functional and object-oriented programming principles from computer science. The focus lies on the modular generation of plates in the style of French Neoclassical architect Jean-Nicolas-Louis Durand, known for his modular rule-based method to ar…

    Submitted 20 April, 2024; originally announced April 2024.

  45. arXiv:2404.11018  [pdf, other]

    cs.LG cs.AI cs.CL

    Many-Shot In-Context Learning

    Authors: Rishabh Agarwal, Avi Singh, Lei M. Zhang, Bernd Bohnet, Luis Rosias, Stephanie Chan, Biao Zhang, Ankesh Anand, Zaheer Abbas, Azade Nova, John D. Co-Reyes, Eric Chu, Feryal Behbahani, Aleksandra Faust, Hugo Larochelle

    Abstract: Large language models (LLMs) excel at few-shot in-context learning (ICL) -- learning from a few examples provided in context at inference, without any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples -- the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative…

    Submitted 17 October, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: NeurIPS (Spotlight)
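
    As a rough illustration of the many-shot regime described in entry 45, the sketch below packs demonstrations into a single prompt. The Q/A format and the demos variable are assumptions made for illustration, not the paper's exact protocol.

        # Illustrative many-shot prompt construction (assumed format, not the
        # paper's protocol). Each demo is an (input, output) pair; with a
        # long-context model, demos may contain hundreds or thousands of pairs.
        def build_many_shot_prompt(demos, query):
            shots = "\n\n".join(f"Q: {x}\nA: {y}" for x, y in demos)
            return f"{shots}\n\nQ: {query}\nA:"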

  46. arXiv:2404.04903  [pdf, other]

    cs.LG cs.AI

    Online Learning under Haphazard Input Conditions: A Comprehensive Review and Analysis

    Authors: Rohit Agarwal, Arijit Das, Alexander Horsch, Krishna Agarwal, Dilip K. Prasad

    Abstract: The domain of online learning has experienced multifaceted expansion owing to its prevalence in real-life applications. Nonetheless, this progression operates under the assumption that the input feature space of the streaming data remains constant. In this survey paper, we address the topic of online learning in the context of haphazard inputs, explicitly foregoing such an assumption. We discuss,…

    Submitted 7 April, 2024; originally announced April 2024.

  47. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1112 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 16 December, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  48. arXiv:2403.03950  [pdf, other]

    cs.LG cs.AI stat.ML

    Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

    Authors: Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taïga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, Rishabh Agarwal

    Abstract: Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast…

    Submitted 6 March, 2024; originally announced March 2024.

  49. arXiv:2402.15514  [pdf]

    cs.CL cs.AI

    Large Scale Generative AI Text Applied to Sports and Music

    Authors: Aaron Baughman, Stephen Hammer, Rahul Agarwal, Gozde Akay, Eduardo Morales, Tony Johnson, Leonid Karlinsky, Rogerio Feris

    Abstract: We address the problem of scaling up the production of media content, including commentary and personalized news stories, for large-scale sports and music events worldwide. Our approach relies on generative AI models to transform a large volume of multimodal data (e.g., videos, articles, real-time scoring feeds, statistics, and fact sheets) into coherent and fluent text. Based on this approach, we…

    Submitted 27 February, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: 9 pages, 8 figures, 5 tables

  50. arXiv:2402.09371  [pdf, other]

    cs.LG cs.AI cs.CL

    Transformers Can Achieve Length Generalization But Not Robustly

    Authors: Yongchao Zhou, Uri Alon, Xinyun Chen, Xuezhi Wang, Rishabh Agarwal, Denny Zhou

    Abstract: Length generalization, defined as the ability to extrapolate from shorter training sequences to longer test ones, is a significant challenge for language models. This issue persists even with large-scale Transformers handling relatively straightforward tasks. In this paper, we test the Transformer's ability of length generalization using the task of addition of two integers. We show that the succe…

    Submitted 14 February, 2024; originally announced February 2024.
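
    The evaluation setup in entry 50 can be mimicked with a toy data generator: train on additions up to some digit length and test on strictly longer ones. The digit ranges below are illustrative choices, not the paper's configuration.

        # Toy length-generalization probe for two-integer addition (illustrative
        # digit ranges; not the paper's configuration).
        import random

        def addition_example(n_digits):
            # Sample two uniformly random n_digits-long integers.
            a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
            b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
            return f"{a}+{b}=", str(a + b)

        train = [addition_example(random.randint(1, 10)) for _ in range(1000)]
        test = [addition_example(random.randint(11, 20)) for _ in range(100)]  # longer than any training example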