Showing 1–50 of 451 results for author: Garg, S

Searching in archive cs.
  1. arXiv:2511.15153  [pdf, ps, other]

    cs.CV

    SceneEdited: A City-Scale Benchmark for 3D HD Map Updating via Image-Guided Change Detection

    Authors: Chun-Jung Lin, Tat-Jun Chin, Sourav Garg, Feras Dayoub

    Abstract: Accurate, up-to-date High-Definition (HD) maps are critical for urban planning, infrastructure monitoring, and autonomous navigation. However, these maps quickly become outdated as environments evolve, creating a need for robust methods that not only detect changes but also incorporate them into updated 3D representations. While change detection techniques have advanced significantly, there remain…

    Submitted 19 November, 2025; originally announced November 2025.

    Comments: accepted by WACV 2026

  2. arXiv:2511.13685  [pdf, ps, other]

    cs.LG cs.AI

    Protein Secondary Structure Prediction Using 3D Graphs and Relation-Aware Message Passing Transformers

    Authors: Disha Varshney, Samarth Garg, Sarthak Tyagi, Deeksha Varshney, Nayan Deep, Asif Ekbal

    Abstract: In this study, we tackle the challenging task of predicting secondary structures from protein primary sequences, a pivotal initial stride towards predicting tertiary structures, while yielding crucial insights into protein activity, relationships, and functions. Existing methods often utilize extensive sets of unlabeled amino acid sequences. However, these approaches neither explicitly capture nor…

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: 40 pages

  3. arXiv:2511.12642  [pdf, ps, other]

    gr-qc astro-ph.IM cs.LG

    Auto-encoder model for faster generation of effective one-body gravitational waveform approximations

    Authors: Suyog Garg, Feng-Li Lin, Kipp Cannon

    Abstract: Upgrades to current gravitational wave detectors for the next observation run and upcoming third-generation observatories, like the Einstein telescope, are expected to have enormous improvements in detection sensitivities and compact object merger event rates. Estimation of source parameters for a wider parameter space that these detectable signals will lie in, will be a computational challenge. T…

    Submitted 16 November, 2025; originally announced November 2025.

    Comments: Submitting to PRD

  4. arXiv:2511.06083  [pdf, ps, other]

    cs.LG

    Event-driven physics-informed operator learning for reliability analysis

    Authors: Shailesh Garg, Souvik Chakraborty

    Abstract: Reliability analysis of engineering systems under uncertainty poses significant computational challenges, particularly for problems involving high-dimensional stochastic inputs, nonlinear system responses, and multiphysics couplings. Traditional surrogate modeling approaches often incur high energy consumption, which severely limits their scalability and deployability in resource-constrained envir…

    Submitted 8 November, 2025; originally announced November 2025.

  5. arXiv:2510.20726  [pdf, ps, other]

    cs.CV

    AutoScape: Geometry-Consistent Long-Horizon Scene Generation

    Authors: Jiacheng Chen, Ziyu Jiang, Mingfu Liang, Bingbing Zhuang, Jong-Chyi Su, Sparsh Garg, Ying Wu, Manmohan Chandraker

    Abstract: This paper proposes AutoScape, a long-horizon driving scene generation framework. At its core is a novel RGB-D diffusion model that iteratively generates sparse, geometrically consistent keyframes, serving as reliable anchors for the scene's appearance and geometry. To maintain long-range geometric consistency, the model 1) jointly handles image and depth in a shared latent space, 2) explicitly co…

    Submitted 23 October, 2025; originally announced October 2025.

    Comments: ICCV 2025. Project page: https://auto-scape.github.io

  6. arXiv:2510.16340  [pdf, ps, other]

    cs.CL cs.AI

    Thinking About Thinking: Evaluating Reasoning in Post-Trained Language Models

    Authors: Pratham Singla, Shivank Garg, Ayush Singh, Ishan Garg, Ketan Suhaas Saichandran

    Abstract: Recent advances in post-training techniques have endowed Large Language Models (LLMs) with enhanced capabilities for tackling complex, logic-intensive tasks through the generation of supplementary planning tokens. This development raises a fundamental question: Are these models aware of what they "learn" and "think"? To address this, we define three core competencies: (1) awareness of learned late…

    Submitted 17 October, 2025; originally announced October 2025.

  7. arXiv:2510.08996  [pdf, ps, other]

    cs.SE cs.AI

    Saving SWE-Bench: A Benchmark Mutation Approach for Realistic Agent Evaluation

    Authors: Spandan Garg, Benjamin Steenhoek, Yufan Huang

    Abstract: Current benchmarks for evaluating software engineering agents, such as SWE-Bench Verified, are predominantly derived from GitHub issues and fail to accurately reflect how developers interact with chat-based coding assistants in integrated development environments (IDEs). We posit that this mismatch leads to a systematic overestimation of agent's capabilities in real-world scenarios, especially bug…

    Submitted 14 October, 2025; v1 submitted 10 October, 2025; originally announced October 2025.

  8. arXiv:2510.05051  [pdf, ps, other]

    cs.CV

    SegMASt3R: Geometry Grounded Segment Matching

    Authors: Rohit Jayanti, Swayam Agrawal, Vansh Garg, Siddharth Tourani, Muhammad Haris Khan, Sourav Garg, Madhava Krishna

    Abstract: Segment matching is an important intermediate task in computer vision that establishes correspondences between semantically or geometrically coherent regions across images. Unlike keypoint matching, which focuses on localized features, segment matching captures structured regions, offering greater robustness to occlusions, lighting variations, and viewpoint changes. In this paper, we leverage the…

    Submitted 24 October, 2025; v1 submitted 6 October, 2025; originally announced October 2025.

    Comments: Accepted to The 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025) as a Spotlight (top 3.5%)

  9. arXiv:2510.03840  [pdf, ps, other]

    cs.CV

    Mirage: Unveiling Hidden Artifacts in Synthetic Images with Large Vision-Language Models

    Authors: Pranav Sharma, Shivank Garg, Durga Toshniwal

    Abstract: Recent advances in image generation models have led to models that produce synthetic images that are increasingly difficult for standard AI detectors to identify, even though they often remain distinguishable by humans. To identify this discrepancy, we introduce Mirage, a curated dataset comprising a diverse range of AI-generated images exhibiting visible artifacts, where current state-of…

    Submitted 4 October, 2025; originally announced October 2025.

    Comments: ACM MM'25, MALLM Workshop

  10. arXiv:2510.01529  [pdf, ps, other]

    cs.LG cs.CR

    Bypassing Prompt Guards in Production with Controlled-Release Prompting

    Authors: Jaiden Fairoze, Sanjam Garg, Keewoo Lee, Mingyuan Wang

    Abstract: As large language models (LLMs) advance, ensuring AI safety and alignment is paramount. One popular approach is prompt guards, lightweight mechanisms designed to filter malicious queries while being easy to implement and update. In this work, we introduce a new attack that circumvents such prompt guards, highlighting their limitations. Our method consistently jailbreaks production models while mai…

    Submitted 7 October, 2025; v1 submitted 1 October, 2025; originally announced October 2025.

  11. arXiv:2509.24091  [pdf, ps, other]

    cs.SE cs.AI cs.PF

    PerfBench: Can Agents Resolve Real-World Performance Bugs?

    Authors: Spandan Garg, Roshanak Zilouchian Moghaddam, Neel Sundaresan

    Abstract: Performance bugs are inefficiencies in software that waste computational resources without causing functional failures, making them particularly challenging to detect and fix. While recent advances in Software Engineering agents have shown promise in automated bug fixing, existing benchmarks primarily focus on functional correctness and fail to evaluate agents' abilities to identify and resolve no…

    Submitted 16 October, 2025; v1 submitted 28 September, 2025; originally announced September 2025.

  12. arXiv:2509.20148  [pdf, ps, other]

    cs.CV

    Smaller is Better: Enhancing Transparency in Vehicle AI Systems via Pruning

    Authors: Sanish Suwal, Shaurya Garg, Dipkamal Bhusal, Michael Clifford, Nidhi Rastogi

    Abstract: Connected and autonomous vehicles continue to heavily rely on AI systems, where transparency and security are critical for trust and operational safety. Post-hoc explanations provide transparency to these black-box like AI models but the quality and reliability of these explanations is often questioned due to inconsistencies and lack of faithfulness in representing model decisions. This paper syst…

    Submitted 5 October, 2025; v1 submitted 24 September, 2025; originally announced September 2025.

    Comments: 17 pages

  13. arXiv:2509.19552  [pdf, ps, other]

    cs.CV

    iFinder: Structured Zero-Shot Vision-Based LLM Grounding for Dash-Cam Video Reasoning

    Authors: Manyi Yao, Bingbing Zhuang, Sparsh Garg, Amit Roy-Chowdhury, Christian Shelton, Manmohan Chandraker, Abhishek Aich

    Abstract: Grounding large language models (LLMs) in domain-specific tasks like post-hoc dash-cam driving video analysis is challenging due to their general-purpose training and lack of structured inductive biases. As vision is often the sole modality available for such analysis (i.e., no LiDAR, GPS, etc.), existing video-based vision-language models (V-VLMs) struggle with spatial reasoning, causal inference…

    Submitted 1 October, 2025; v1 submitted 23 September, 2025; originally announced September 2025.

    Comments: Accepted at NeurIPS 2025

  14. arXiv:2509.18436  [pdf, ps, other]

    cs.AI cs.CL cs.DB

    Memory-QA: Answering Recall Questions Based on Multimodal Memories

    Authors: Hongda Jiang, Xinyuan Zhang, Siddhant Garg, Rishab Arora, Shiun-Zu Kuo, Jiayang Xu, Ankur Bansal, Christopher Brossman, Yue Liu, Aaron Colak, Ahmed Aly, Anuj Kumar, Xin Luna Dong

    Abstract: We introduce Memory-QA, a novel real-world task that involves answering recall questions about visual content from previously stored multimodal memories. This task poses unique challenges, including the creation of task-oriented memories, the effective utilization of temporal and location information within memories, and the ability to draw upon multiple memories to answer a recall question. To ad…

    Submitted 26 September, 2025; v1 submitted 22 September, 2025; originally announced September 2025.

  15. arXiv:2509.09594  [pdf, ps, other]

    cs.RO cs.AI cs.CV cs.LG eess.SY

    ObjectReact: Learning Object-Relative Control for Visual Navigation

    Authors: Sourav Garg, Dustin Craggs, Vineeth Bhat, Lachlan Mares, Stefan Podgorski, Madhava Krishna, Feras Dayoub, Ian Reid

    Abstract: Visual navigation using only a single camera and a topological map has recently become an appealing alternative to methods that require additional sensors and 3D maps. This is typically achieved through an "image-relative" approach to estimating control from a given pair of current observation and subgoal image. However, image-level representations of the world have limitations because images are…

    Submitted 11 September, 2025; originally announced September 2025.

    Comments: CoRL 2025; 23 pages including appendix

  16. arXiv:2509.08699  [pdf, ps, other]

    cs.RO cs.AI cs.CV cs.LG eess.SY

    TANGO: Traversability-Aware Navigation with Local Metric Control for Topological Goals

    Authors: Stefan Podgorski, Sourav Garg, Mehdi Hosseinzadeh, Lachlan Mares, Feras Dayoub, Ian Reid

    Abstract: Visual navigation in robotics traditionally relies on globally-consistent 3D maps or learned controllers, which can be computationally expensive and difficult to generalize across diverse environments. In this work, we present a novel RGB-only, object-level topometric navigation pipeline that enables zero-shot, long-horizon robot navigation without requiring 3D maps or pre-trained controllers. Our…

    Submitted 10 September, 2025; originally announced September 2025.

    Comments: 9 pages, 5 figures, ICRA 2025

  17. arXiv:2508.20030  [pdf, ps, other]

    eess.SY cs.AI cs.AR cs.LG

    Large Language Models (LLMs) for Electronic Design Automation (EDA)

    Authors: Kangwei Xu, Denis Schwachhofer, Jason Blocklove, Ilia Polian, Peter Domanski, Dirk Pflüger, Siddharth Garg, Ramesh Karri, Ozgur Sinanoglu, Johann Knechtel, Zhuorui Zhao, Ulf Schlichtmann, Bing Li

    Abstract: With the growing complexity of modern integrated circuits, hardware engineers are required to devote more effort to the full design-to-manufacturing workflow. This workflow involves numerous iterations, making it both labor-intensive and error-prone. Therefore, there is an urgent demand for more efficient Electronic Design Automation (EDA) solutions to accelerate hardware development. Recently, la…

    Submitted 27 August, 2025; originally announced August 2025.

    Comments: Accepted by IEEE International System-on-Chip Conference

  18. arXiv:2508.16738  [pdf, ps, other]

    cs.AR cs.CR

    zkPHIRE: A Programmable Accelerator for ZKPs over HIgh-degRee, Expressive Gates

    Authors: Alhad Daftardar, Jianqiao Mo, Joey Ah-kiow, Benedikt Bünz, Siddharth Garg, Brandon Reagen

    Abstract: Zero-Knowledge Proofs (ZKPs) have emerged as powerful tools for secure and privacy-preserving computation. ZKPs enable one party to convince another of a statement's validity without revealing anything else. This capability has profound implications in many domains, including: machine learning, blockchain, image authentication, and electronic voting. Despite their potential, ZKPs have seen limited…

    Submitted 22 August, 2025; originally announced August 2025.

    Comments: 14 pages, 14 figures

  19. arXiv:2508.09726  [pdf, ps, other]

    cs.CL cs.LG

    Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning

    Authors: Vaishnavi Shrivastava, Ahmed Awadallah, Vidhisha Balachandran, Shivam Garg, Harkirat Behl, Dimitris Papailiopoulos

    Abstract: Large language models trained with reinforcement learning with verifiable rewards tend to trade accuracy for length--inflating response lengths to achieve gains in accuracy. While longer answers may be warranted for harder problems, many tokens are merely "filler": repetitive, verbose text that makes no real progress. We introduce GFPO (Group Filtered Policy Optimization), which curbs this length…

    Submitted 13 August, 2025; originally announced August 2025.

  20. arXiv:2508.06671  [pdf, ps, other]

    cs.CL cs.AI

    Do Biased Models Have Biased Thoughts?

    Authors: Swati Rajwal, Shivank Garg, Reem Abdel-Salam, Abdelrahman Zayed

    Abstract: The impressive performance of language models is undeniable. However, the presence of biases based on gender, race, socio-economic status, physical appearance, and sexual orientation makes the deployment of language models challenging. This paper studies the effect of chain-of-thought prompting, a recent approach that studies the steps followed by the model before it responds, on fairness. More sp…

    Submitted 11 August, 2025; v1 submitted 8 August, 2025; originally announced August 2025.

    Comments: Accepted at main track of the Second Conference on Language Modeling (COLM 2025)

    ACM Class: I.2.7

  21. arXiv:2508.06047  [pdf, ps, other]

    cs.AR

    ArchXBench: A Complex Digital Systems Benchmark Suite for LLM Driven RTL Synthesis

    Authors: Suresh Purini, Siddhant Garg, Mudit Gaur, Sankalp Bhat, Sohan Mupparapu, Arun Ravindran

    Abstract: Modern SoC datapaths include deeply pipelined, domain-specific accelerators, but their RTL implementation and verification are still mostly done by hand. While large language models (LLMs) exhibit advanced code-generation abilities for programming languages like Python, their application to Verilog-like RTL remains in its nascent stage. This is reflected in the simple arithmetic and control circui…

    Submitted 8 August, 2025; originally announced August 2025.

    Comments: Published in 7th ACM/IEEE International Symposium on Machine Learning for CAD

  22. arXiv:2508.02047  [pdf, ps, other]

    cs.CV

    Mapillary Vistas Validation for Fine-Grained Traffic Signs: A Benchmark Revealing Vision-Language Model Limitations

    Authors: Sparsh Garg, Abhishek Aich

    Abstract: Obtaining high-quality fine-grained annotations for traffic signs is critical for accurate and safe decision-making in autonomous driving. Widely used datasets, such as Mapillary, often provide only coarse-grained labels - without distinguishing semantically important types such as stop signs or speed limit signs. To this end, we present a new validation set for traffic signs derived from the Mapi…

    Submitted 4 August, 2025; originally announced August 2025.

    Comments: Accepted to ICCV 2025 Workshop (4th DataCV Workshop and Challenge)

  23. arXiv:2507.20346  [pdf]

    cs.CY

    EyeAI: AI-Assisted Ocular Disease Detection for Equitable Healthcare Access

    Authors: Shiv Garg, Ginny Berkemeier

    Abstract: Ocular disease affects billions of individuals unevenly worldwide. It continues to increase in prevalence with trends of growing populations of diabetic people, increasing life expectancies, decreasing ophthalmologist availability, and rising costs of care. We present EyeAI, a system designed to provide artificial intelligence-assisted detection of ocular diseases, thereby enhancing global health.…

    Submitted 27 July, 2025; originally announced July 2025.

  24.

    MTU: The Multifunction Tree Unit for Accelerating Zero-Knowledge Proofs

    Authors: Jianqiao Mo, Alhad Daftardar, Joey Ah-Kiow, Kaiyue Guo, Benedikt Bünz, Siddharth Garg, Brandon Reagen

    Abstract: Zero-Knowledge Proofs (ZKPs) are critical for privacy-preserving techniques and verifiable computation. Many ZKP protocols rely on key kernels such as the SumCheck protocol and Merkle Tree commitments to enable their key security properties. These kernels exhibit balanced binary tree computational patterns, which enable efficient hardware acceleration. Although prior work has investigated accelera…

    Submitted 19 October, 2025; v1 submitted 22 July, 2025; originally announced July 2025.

    Comments: (Best Paper Nominee) Accepted to HASP'25 at MICRO 2025

    Journal ref: Proceedings of the International Workshop on Hardware and Architectural Support for Security and Privacy 2025

  25. arXiv:2507.13264  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Voxtral

    Authors: Alexander H. Liu, Andy Ehrenberg, Andy Lo, Clément Denoix, Corentin Barreau, Guillaume Lample, Jean-Malo Delignon, Khyathi Raghavi Chandu, Patrick von Platen, Pavankumar Reddy Muddireddy, Sanchit Gandhi, Soham Ghosh, Srijan Mishra, Thomas Foubert, Abhinav Rastogi, Adam Yang, Albert Q. Jiang, Alexandre Sablayrolles, Amélie Héliou, Amélie Martin, Anmol Agarwal, Antoine Roux, Arthur Darcet, Arthur Mensch, Baptiste Bout, et al. (81 additional authors not shown)

    Abstract: We present Voxtral Mini and Voxtral Small, two multimodal audio chat models. Voxtral is trained to comprehend both spoken audio and text documents, achieving state-of-the-art performance across a diverse range of audio benchmarks, while preserving strong text capabilities. Voxtral Small outperforms a number of closed-source models, while being small enough to run locally. A 32K context window enab…

    Submitted 17 July, 2025; originally announced July 2025.

    Comments: 17 pages

  26. arXiv:2507.11574  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Distribution-Free Uncertainty-Aware Virtual Sensing via Conformalized Neural Operators

    Authors: Kazuma Kobayashi, Shailesh Garg, Farid Ahmed, Souvik Chakraborty, Syed Bahauddin Alam

    Abstract: Robust uncertainty quantification (UQ) remains a critical barrier to the safe deployment of deep learning in real-time virtual sensing, particularly in high-stakes domains where sparse, noisy, or non-collocated sensor data are the norm. We introduce the Conformalized Monte Carlo Operator (CMCO), a framework that transforms neural operator-based virtual sensing with calibrated, distribution-free pr…

    Submitted 15 July, 2025; originally announced July 2025.

  27. arXiv:2507.10646  [pdf, ps, other]

    cs.SE cs.AI

    CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance

    Authors: Myeongsoo Kim, Shweta Garg, Baishakhi Ray, Varun Kumar, Anoop Deoras

    Abstract: Programming assistants powered by large language models have transformed software development, yet most benchmarks focus narrowly on code generation tasks. Recent efforts like InfiBench and StackEval attempt to address this gap using Stack Overflow data but remain limited to single-turn interactions in isolated contexts, require significant manual curation, and fail to represent complete project e…

    Submitted 16 July, 2025; v1 submitted 14 July, 2025; originally announced July 2025.

  28. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3410 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 16 October, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  29. arXiv:2507.02972  [pdf, ps, other]

    cs.CV cs.LG

    Farm-Level, In-Season Crop Identification for India

    Authors: Ishan Deshpande, Amandeep Kaur Reehal, Chandan Nath, Renu Singh, Aayush Patel, Aishwarya Jayagopal, Gaurav Singh, Gaurav Aggarwal, Amit Agarwal, Prathmesh Bele, Sridhar Reddy, Tanya Warrier, Kinjal Singh, Ashish Tendulkar, Luis Pazos Outon, Nikita Saxena, Agata Dondzik, Dinesh Tewari, Shruti Garg, Avneet Singh, Harsh Dhand, Vaibhav Rajan, Alok Talekar

    Abstract: Accurate, timely, and farm-level crop type information is paramount for national food security, agricultural policy formulation, and economic planning, particularly in agriculturally significant nations like India. While remote sensing and machine learning have become vital tools for crop monitoring, existing approaches often grapple with challenges such as limited geographical scalability, restri…

    Submitted 30 June, 2025; originally announced July 2025.

  30. arXiv:2506.22557  [pdf, ps, other]

    cs.CR cs.LG

    MetaCipher: A Time-Persistent and Universal Multi-Agent Framework for Cipher-Based Jailbreak Attacks for LLMs

    Authors: Boyuan Chen, Minghao Shao, Abdul Basit, Siddharth Garg, Muhammad Shafique

    Abstract: As large language models (LLMs) grow more capable, they face growing vulnerability to sophisticated jailbreak attacks. While developers invest heavily in alignment finetuning and safety guardrails, researchers continue publishing novel attacks, driving progress through adversarial iteration. This dynamic mirrors a strategic game of continual evolution. However, two major challenges hinder jailbrea…

    Submitted 13 August, 2025; v1 submitted 27 June, 2025; originally announced June 2025.

  31. arXiv:2506.21569  [pdf, ps, other]

    cs.CL cs.AI

    Hybrid-NL2SVA: Integrating RAG and Finetuning for LLM-based NL2SVA

    Authors: Weihua Xiao, Derek Ekberg, Siddharth Garg, Ramesh Karri

    Abstract: SystemVerilog Assertions (SVAs) are critical for verifying the correctness of hardware designs, but manually writing them from natural language property descriptions, i.e., NL2SVA, remains a labor-intensive and error-prone task. Recent advances in large language models (LLMs) offer opportunities to automate this translation. However, existing models still struggle with understanding domain-specifi…

    Submitted 12 June, 2025; originally announced June 2025.

  32. arXiv:2506.20100  [pdf, ps, other]

    cs.LG cs.AI cs.CL cs.CV

    MIRAGE: A Benchmark for Multimodal Information-Seeking and Reasoning in Agricultural Expert-Guided Conversations

    Authors: Vardhan Dongre, Chi Gui, Shubham Garg, Hooshang Nayyeri, Gokhan Tur, Dilek Hakkani-Tür, Vikram S. Adve

    Abstract: We introduce MIRAGE, a new benchmark for multimodal expert-level reasoning and decision-making in consultative interaction settings. Designed for the agriculture domain, MIRAGE captures the full complexity of expert consultations by combining natural user queries, expert-authored responses, and image-based context, offering a high-fidelity benchmark for evaluating models on grounded reasoning, cla…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: 66 pages, 32 figures, 23 tables

  33. arXiv:2506.17556  [pdf, ps, other]

    cs.DS cs.LG

    Faster Low-Rank Approximation and Kernel Ridge Regression via the Block-Nyström Method

    Authors: Sachin Garg, Michał Dereziński

    Abstract: The Nyström method is a popular low-rank approximation technique for large matrices that arise in kernel methods and convex optimization. Yet, when the data exhibits heavy-tailed spectral decay, the effective dimension of the problem often becomes so large that even the Nyström method may be outside of our computational budget. To address this, we propose Block-Nyström, an algorithm that injects a…

    Submitted 18 July, 2025; v1 submitted 20 June, 2025; originally announced June 2025.

  34. arXiv:2506.13048  [pdf, ps, other]

    cs.LG

    The Space Complexity of Learning-Unlearning Algorithms

    Authors: Yeshwanth Cherapanamjeri, Sumegha Garg, Nived Rajaraman, Ayush Sekhari, Abhishek Shetty

    Abstract: We study the memory complexity of machine unlearning algorithms that provide strong data deletion guarantees to the users. Formally, consider an algorithm for a particular learning task that initially receives a training dataset. Then, after learning, it receives data deletion requests from a subset of users (of arbitrary size), and the goal of unlearning is to perform the task as if the learner n…

    Submitted 15 June, 2025; originally announced June 2025.

  35. arXiv:2506.12286  [pdf, ps, other]

    cs.AI cs.SE

    The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason

    Authors: Shanchao Liang, Spandan Garg, Roshanak Zilouchian Moghaddam

    Abstract: As large language models (LLMs) become increasingly capable and widely adopted, benchmarks play a central role in assessing their practical utility. For example, SWE-Bench Verified has emerged as a critical benchmark for evaluating LLMs' software engineering abilities, particularly their aptitude for resolving real-world GitHub issues. Recent LLMs show impressive performance on SWE-Bench, leading…

    Submitted 6 August, 2025; v1 submitted 13 June, 2025; originally announced June 2025.

  36. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  37. arXiv:2506.09450  [pdf, ps, other]

    cs.CL cs.AI

    UniToMBench: Integrating Perspective-Taking to Improve Theory of Mind in LLMs

    Authors: Prameshwar Thiyagarajan, Vaishnavi Parimi, Shamant Sai, Soumil Garg, Zhangir Meirbek, Nitin Yarlagadda, Kevin Zhu, Chris Kim

    Abstract: Theory of Mind (ToM), the ability to understand the mental states of oneself and others, remains a challenging area for large language models (LLMs), which often fail to predict human mental states accurately. In this paper, we introduce UniToMBench, a unified benchmark that integrates the strengths of SimToM and TOMBENCH to systematically improve and assess ToM capabilities in LLMs by integrating…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Accepted at Conference of the North American Chapter of the Association for Computational Linguistics, Student Research Workshop 2025 (NAACL SRW 2025)

  38. arXiv:2506.07239  [pdf, ps, other]

    cs.AR cs.AI

    VeriLoC: Line-of-Code Level Prediction of Hardware Design Quality from Verilog Code

    Authors: Raghu Vamshi Hemadri, Jitendra Bhandari, Andre Nakkab, Johann Knechtel, Badri P Gopalan, Ramesh Narayanaswamy, Ramesh Karri, Siddharth Garg

    Abstract: Modern chip design is complex, and there is a crucial need for early-stage prediction of key design-quality metrics like timing and routing congestion directly from Verilog code (a commonly used programming language for hardware design). It is especially important yet complex to predict individual lines of code that cause timing violations or downstream routing congestion. Prior works have tried a…

    Submitted 28 June, 2025; v1 submitted 8 June, 2025; originally announced June 2025.

  39. arXiv:2506.01854  [pdf, ps, other]

    cs.CR cs.CC

    Black-Box Crypto is Useless for Pseudorandom Codes

    Authors: Sanjam Garg, Sam Gunn, Mingyuan Wang

    Abstract: A pseudorandom code is a keyed error-correction scheme with the property that any polynomial number of encodings appear random to any computationally bounded adversary. We show that the pseudorandomness of any code tolerating a constant rate of random errors cannot be based on black-box reductions to almost any generic cryptographic primitive: for instance, anything that can be built from random o…

    Submitted 29 September, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

  40. arXiv:2506.00318  [pdf, ps, other]

    cs.CV

    Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning

    Authors: Sara Ghazanfari, Francesco Croce, Nicolas Flammarion, Prashanth Krishnamurthy, Farshad Khorrami, Siddharth Garg

    Abstract: Recent work has shown that eliciting Large Language Models (LLMs) to generate reasoning traces in natural language before answering the user's request can significantly improve their performance across tasks. This approach has been extended to multimodal LLMs, where the models can produce chain-of-thoughts (CoT) about the content of input images and videos. In this work, we propose to obtain video…

    Submitted 30 May, 2025; originally announced June 2025.

  41. arXiv:2505.20302  [pdf, ps, other]

    cs.PL cs.AI cs.LO

    VeriThoughts: Enabling Automated Verilog Code Generation using Reasoning and Formal Verification

    Authors: Patrick Yubeaton, Andre Nakkab, Weihua Xiao, Luca Collini, Ramesh Karri, Chinmay Hegde, Siddharth Garg

    Abstract: This paper introduces VeriThoughts, a novel dataset designed for reasoning-based Verilog code generation. We establish a new benchmark framework grounded in formal verification methods to evaluate the quality and correctness of generated hardware descriptions. Additionally, we present a suite of specialized small-scale models optimized specifically for Verilog generation. Our work addresses the gr…

    Submitted 18 November, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

  42. arXiv:2505.13723  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Turbocharging Gaussian Process Inference with Approximate Sketch-and-Project

    Authors: Pratik Rathore, Zachary Frangella, Sachin Garg, Shaghayegh Fazliani, Michał Dereziński, Madeleine Udell

    Abstract: Gaussian processes (GPs) play an essential role in biostatistics, scientific machine learning, and Bayesian optimization for their ability to provide probabilistic predictions and model uncertainty. However, GP inference struggles to scale to large datasets (which are common in modern applications), since it requires the solution of a linear system whose size scales quadratically with the number o…

    Submitted 27 October, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: NeurIPS 2025; 31 pages, 7 figures, 2 tables

  43. arXiv:2504.21536  [pdf, other]

    cs.DC

    Scientific Workflow Scheduling in Cloud Considering Cold Start and Variable Pricing Model

    Authors: Suvarthi Sarkar, Sparsh Mittal, Shivam Garg, Aryabartta Sahu

    Abstract: Cloud computing has become a pivotal platform for executing scientific workflows due to its scalable and cost-effective infrastructure. Scientific Cloud Service Providers (SCSPs) act as intermediaries that rent virtual machines (VMs) from Infrastructure-as-a-Service (IaaS) providers to meet users' workflow execution demands. The SCSP earns profit from the execution of scientific workflows if it co…

    Submitted 30 April, 2025; originally announced April 2025.

  44. Need for zkSpeed: Accelerating HyperPlonk for Zero-Knowledge Proofs

    Authors: Alhad Daftardar, Jianqiao Mo, Joey Ah-kiow, Benedikt Bünz, Ramesh Karri, Siddharth Garg, Brandon Reagen

    Abstract: Zero-Knowledge Proofs (ZKPs) are rapidly gaining importance in privacy-preserving and verifiable computing. ZKPs enable a proving party to prove the truth of a statement to a verifying party without revealing anything else. ZKPs have applications in blockchain technologies, verifiable machine learning, and electronic voting, but have yet to see widespread adoption due to the computational complexi…

    Submitted 5 August, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

    Comments: 16 pages, 14 figures, presented at the 52nd International Symposium on Computer Architecture (ISCA), 2025

  45. arXiv:2504.00294  [pdf, other]

    cs.LG cs.AI cs.CL

    Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead

    Authors: Vidhisha Balachandran, Jingya Chen, Lingjiao Chen, Shivam Garg, Neel Joshi, Yash Lara, John Langford, Besmira Nushi, Vibhav Vineet, Yue Wu, Safoora Yousefi

    Abstract: Inference-time scaling can enhance the reasoning capabilities of large language models (LLMs) on complex problems that benefit from step-by-step problem solving. Although lengthening generated scratchpads has proven effective for mathematical tasks, the broader impact of this approach on other tasks remains less clear. In this work, we investigate the benefits and limitations of scaling methods ac…

    Submitted 31 March, 2025; originally announced April 2025.

    ACM Class: I.2

  46. arXiv:2503.12721  [pdf, other]

    cs.AI

    Can Reasoning Models Reason about Hardware? An Agentic HLS Perspective

    Authors: Luca Collini, Andrew Hennessee, Ramesh Karri, Siddharth Garg

    Abstract: Recent Large Language Models (LLMs) such as OpenAI o3-mini and DeepSeek-R1 use enhanced reasoning through Chain-of-Thought (CoT). Their potential in hardware design, which relies on expert-driven iterative optimization, remains unexplored. This paper investigates whether reasoning LLMs can address challenges in High-Level Synthesis (HLS) design space exploration and optimization. During HLS, engin…

    Submitted 13 April, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

    Comments: 7 pages, submitted for peer review

  47. arXiv:2503.07832  [pdf, other]

    cs.AI cs.CL cs.LG cs.SE

    RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code

    Authors: Dhruv Gautam, Spandan Garg, Jinu Jang, Neel Sundaresan, Roshanak Zilouchian Moghaddam

    Abstract: Recent advances in language model (LM) agents and function calling have enabled autonomous, feedback-driven systems to solve problems across various digital domains. To better understand the unique limitations of LM agents, we introduce RefactorBench, a benchmark consisting of 100 large handcrafted multi-file refactoring tasks in popular open-source repositories. Solving tasks within RefactorBench…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: ICLR 2025 Camera Ready

    ACM Class: I.2.5

  48. arXiv:2503.00737  [pdf, ps, other]

    cs.CV

    Multi-Cali Anything: Dense Feature Multi-Frame Structure-from-Motion for Large-Scale Camera Array Calibration

    Authors: Jinjiang You, Hewei Wang, Yijie Li, Mingxiao Huo, Long Van Tran Ha, Mingyuan Ma, Jinfeng Xu, Jiayi Zhang, Puzhen Wu, Shubham Garg, Wei Pu

    Abstract: Calibrating large-scale camera arrays, such as those in dome-based setups, is time-intensive and typically requires dedicated captures of known patterns. While extrinsics in such arrays are fixed due to the physical setup, intrinsics often vary across sessions due to factors like lens adjustments or temperature changes. In this paper, we propose a dense-feature-driven multi-frame calibration metho…

    Submitted 31 July, 2025; v1 submitted 2 March, 2025; originally announced March 2025.

    Comments: Accepted to IROS 2025. Final camera-ready version. 8 pages

  49. arXiv:2502.16182  [pdf, other]

    cs.CL

    IPO: Your Language Model is Secretly a Preference Classifier

    Authors: Shivank Garg, Ayush Singh, Shweta Singh, Paras Chopra

    Abstract: Reinforcement learning from human feedback (RLHF) has emerged as the primary method for aligning large language models (LLMs) with human preferences. While it enables LLMs to achieve human-level alignment, it often incurs significant computational and financial costs due to its reliance on training external reward models or human-labeled preferences. In this work, we propose Implicit Preference Op…

    Submitted 20 March, 2025; v1 submitted 22 February, 2025; originally announced February 2025.

  50. arXiv:2502.08643  [pdf, other]

    cs.RO cs.AI cs.CV

    A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards

    Authors: Shivansh Patel, Xinchen Yin, Wenlong Huang, Shubham Garg, Hooshang Nayyeri, Li Fei-Fei, Svetlana Lazebnik, Yunzhu Li

    Abstract: Task specification for robotic manipulation in open-world environments is challenging, requiring flexible and adaptive objectives that align with human intentions and can evolve through iterative feedback. We introduce Iterative Keypoint Reward (IKER), a visually grounded, Python-based reward function that serves as a dynamic task specification. Our framework leverages VLMs to generate and refine…

    Submitted 18 February, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

    Comments: ICRA 2025, Project Page: https://iker-robot.github.io/