
Showing 1–50 of 106 results for author: Garg, R

Searching in archive cs.
  1. arXiv:2510.14179  [pdf, ps, other]

    cs.CV cs.AI

    Virtually Being: Customizing Camera-Controllable Video Diffusion Models with Multi-View Performance Captures

    Authors: Yuancheng Xu, Wenqi Xian, Li Ma, Julien Philip, Ahmet Levent Taşel, Yiwei Zhao, Ryan Burgert, Mingming He, Oliver Hermann, Oliver Pilarski, Rahul Garg, Paul Debevec, Ning Yu

    Abstract: We introduce a framework that enables both multi-view character consistency and 3D camera control in video diffusion models through a novel customization data pipeline. We train the character consistency component with recorded volumetric capture performances re-rendered with diverse camera trajectories via 4D Gaussian Splatting (4DGS), lighting variability obtained with a video relighting model.…

    Submitted 15 October, 2025; originally announced October 2025.

    Comments: Accepted to SIGGRAPH Asia 2025

  2. arXiv:2510.09872  [pdf, ps, other]

    cs.LG cs.AI

    WARC-Bench: Web Archive Based Benchmark for GUI Subtask Executions

    Authors: Sanjari Srivastava, Gang Li, Cheng Chang, Rishu Garg, Manpreet Kaur, Charlene Y. Lee, Yuezhang Li, Yining Mao, Ignacio Cases, Yanan Xie, Peng Qi

    Abstract: Training web agents to navigate complex, real-world websites requires them to master subtasks - short-horizon interactions on multiple UI components (e.g., choosing the correct date in a date picker, or scrolling in a container to extract information). We introduce WARC-Bench (Web Archive Benchmark), a novel web navigation benchmark featuring 438 tasks designed to evaluate multimodal AI…

    Submitted 10 October, 2025; originally announced October 2025.

  3. arXiv:2509.24166  [pdf, ps, other]

    cs.LG cs.AI

    Stable Forgetting: Bounded Parameter-Efficient Unlearning in LLMs

    Authors: Arpit Garg, Hemanth Saratchandran, Ravi Garg, Simon Lucey

    Abstract: Machine unlearning in large language models (LLMs) is essential for privacy and safety; however, existing approaches remain unstable and unreliable. A widely used strategy, the gradient difference method, applies gradient descent on retained data while performing gradient ascent on forget data, the data whose influence should be removed. However, when combined with cross-entropy loss, this procedu…

    Submitted 28 September, 2025; originally announced September 2025.

    Comments: In Submission

  4. arXiv:2509.07929  [pdf, ps, other]

    cs.GT cs.IR cs.LG

    Smart Fast Finish: Preventing Overdelivery via Daily Budget Pacing at DoorDash

    Authors: Rohan Garg, Yongjin Xiao, Jason, Yang, Mandar Rahurkar

    Abstract: We present a budget pacing feature called Smart Fast Finish (SFF). SFF builds upon the industry standard Fast Finish (FF) feature in budget pacing systems that depletes remaining advertising budget as quickly as possible towards the end of some fixed time period. SFF dynamically updates system parameters such as start time and throttle rate depending on historical ad-campaign data. SFF is currentl…

    Submitted 9 September, 2025; originally announced September 2025.

  5. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3410 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 16 October, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  6. arXiv:2505.09436  [pdf, ps, other]

    cs.LG cs.AI cs.CL cs.IR

    CXMArena: Unified Dataset to benchmark performance in realistic CXM Scenarios

    Authors: Raghav Garg, Kapil Sharma, Karan Gupta

    Abstract: Large Language Models (LLMs) hold immense potential for revolutionizing Customer Experience Management (CXM), particularly in contact center operations. However, evaluating their practical utility in complex operational environments is hindered by data scarcity (due to privacy concerns) and the limitations of current benchmarks. Existing benchmarks often lack realism, failing to incorporate deep k…

    Submitted 19 May, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

  7. arXiv:2502.13113  [pdf, other]

    cs.DC cs.AR

    HARP: A Taxonomy for Heterogeneous and Hierarchical Processors for Mixed-reuse Workloads

    Authors: Raveesh Garg, Michael Pellauer, Tushar Krishna

    Abstract: Artificial intelligence (AI) application domains consist of a mix of tensor operations with high and low arithmetic intensities (aka reuse). Hierarchical (i.e. compute along multiple levels of memory hierarchy) and heterogeneous (multiple different sub-accelerators) accelerators are emerging as a popular way to process mixed reuse workloads, and workloads which consist of tensor operators with div…

    Submitted 18 February, 2025; originally announced February 2025.

  8. arXiv:2502.06807  [pdf, other]

    cs.LG cs.AI cs.CL

    Competitive Programming with Large Reasoning Models

    Authors: OpenAI, Ahmed El-Kishky, Alexander Wei, Andre Saraiva, Borys Minaiev, Daniel Selsam, David Dohan, Francis Song, Hunter Lightman, Ignasi Clavera, Jakub Pachocki, Jerry Tworek, Lorenz Kuhn, Lukasz Kaiser, Mark Chen, Max Schwarzer, Mostafa Rohaninejad, Nat McAleese, o3 contributors, Oleg Mürk, Rhythm Garg, Rui Shu, Szymon Sidor, Vineet Kosaraju, et al. (1 additional author not shown)

    Abstract: We show that reinforcement learning applied to large language models (LLMs) significantly boosts performance on complex coding and reasoning tasks. Additionally, we compare two general-purpose reasoning models - OpenAI o1 and an early checkpoint of o3 - with a domain-specific system, o1-ioi, which uses hand-engineered inference strategies designed for competing in the 2024 International Olympiad i…

    Submitted 18 February, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

  9. arXiv:2412.16720  [pdf, other]

    cs.AI

    OpenAI o1 System Card

    Authors: OpenAI, Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, Alex Iftimie, Alex Karpenko, Alex Tachard Passos, Alexander Neitz, Alexander Prokofiev, Alexander Wei, Allison Tam, Ally Bennett, Ananya Kumar, Andre Saraiva, Andrea Vallone, Andrew Duberstein, Andrew Kondrich, et al. (238 additional authors not shown)

    Abstract: The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment. This leads to state-of-the-ar…

    Submitted 21 December, 2024; originally announced December 2024.

  10. arXiv:2412.11420  [pdf]

    cs.CV

    Category Level 6D Object Pose Estimation from a Single RGB Image using Diffusion

    Authors: Adam Bethell, Ravi Garg, Ian Reid

    Abstract: Estimating the 6D pose and 3D size of an object from an image is a fundamental task in computer vision. Most current approaches are restricted to specific instances with known models or require ground truth depth information or point cloud captures from LIDAR. We tackle the harder problem of pose estimation for category-level objects from a single RGB image. We propose a novel solution that elimin…

    Submitted 15 December, 2024; originally announced December 2024.

  11. arXiv:2411.12174  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Just KIDDIN: Knowledge Infusion and Distillation for Detection of INdecent Memes

    Authors: Rahul Garg, Trilok Padhi, Hemang Jain, Ugur Kursuncu, Ponnurangam Kumaraguru

    Abstract: Toxicity identification in online multimodal environments remains a challenging task due to the complexity of contextual connections across modalities (e.g., textual and visual). In this paper, we propose a novel framework that integrates Knowledge Distillation (KD) from Large Visual Language Models (LVLMs) and knowledge infusion to enhance the performance of toxicity detection in hateful memes. O…

    Submitted 24 February, 2025; v1 submitted 18 November, 2024; originally announced November 2024.

  12. arXiv:2409.17650  [pdf, other]

    cs.AI cs.CL

    Digital Twin Ecosystem for Oncology Clinical Operations

    Authors: Himanshu Pandey, Akhil Amod, Shivang, Kshitij Jaggi, Ruchi Garg, Abheet Jain, Vinayak Tantia

    Abstract: Artificial Intelligence (AI) and Large Language Models (LLMs) hold significant promise in revolutionizing healthcare, especially in clinical applications. Simultaneously, Digital Twin technology, which models and simulates complex systems, has gained traction in enhancing patient care. However, despite the advances in experimental clinical settings, the potential of AI and digital twins to streaml…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: Pre Print

  13. arXiv:2408.11748  [pdf, other]

    cs.CV

    Understanding Depth and Height Perception in Large Visual-Language Models

    Authors: Shehreen Azad, Yash Jain, Rishit Garg, Yogesh S Rawat, Vibhav Vineet

    Abstract: Geometric understanding - including depth and height perception - is fundamental to intelligence and crucial for navigating our environment. Despite the impressive capabilities of large Vision Language Models (VLMs), it remains unclear how well they possess the geometric understanding required for practical applications in visual perception. In this work, we focus on evaluating the geometric under…

    Submitted 25 April, 2025; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: Accepted in CVPRW 2025. Project page: https://sacrcv.github.io/GeoMeter-website/

  14. arXiv:2408.07009  [pdf, other]

    cs.CV

    Imagen 3

    Authors: Imagen-Team-Google, Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Lluis Castrejon, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Christos Kaplanis, et al. (237 additional authors not shown)

    Abstract: We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

    Submitted 21 December, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

  15. arXiv:2407.12354  [pdf, other]

    cs.CV

    Invertible Neural Warp for NeRF

    Authors: Shin-Fang Chng, Ravi Garg, Hemanth Saratchandran, Simon Lucey

    Abstract: This paper tackles the simultaneous optimization of pose and Neural Radiance Fields (NeRF). Departing from the conventional practice of using explicit global representations for camera pose, we propose a novel overparameterized representation that models camera poses as learnable rigid warp functions. We establish that modeling the rigid warps must be tightly coupled with constraints and regulariz…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. Project page: https://sfchng.github.io/ineurowarping-github.io/

  16. arXiv:2406.18954  [pdf, other]

    cs.LG cs.AI

    Alignment For Performance Improvement in Conversation Bots

    Authors: Raghav Garg, Kapil Sharma, Shrey Singla

    Abstract: This paper shows that alignment methods can achieve superior adherence to guardrails compared to instruction fine-tuning alone in conversational agents, also known as bots, within predefined guidelines or 'guardrails'. It examines traditional training approaches such as instruction fine-tuning and the recent advancements in direct alignment methods like Identity Preference Optimization (IPO), and…

    Submitted 27 June, 2024; originally announced June 2024.

  17. A PCA based Keypoint Tracking Approach to Automated Facial Expressions Encoding

    Authors: Shivansh Chandra Tripathi, Rahul Garg

    Abstract: The Facial Action Coding System (FACS) for studying facial expressions is manual and requires significant effort and expertise. This paper explores the use of automated techniques to generate Action Units (AUs) for studying facial expressions. We propose an unsupervised approach based on Principal Component Analysis (PCA) and facial keypoint tracking to generate data-driven AUs called PCA AUs usin…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this contribution is published in [LNCS,volume 14301], and is available online at https://doi.org/10.1007/978-3-031-45170-6_85

  18. arXiv:2406.05434  [pdf, other]

    cs.CV cs.HC

    Unsupervised learning of Data-driven Facial Expression Coding System (DFECS) using keypoint tracking

    Authors: Shivansh Chandra Tripathi, Rahul Garg

    Abstract: The development of existing facial coding systems, such as the Facial Action Coding System (FACS), relied on manual examination of facial expression videos for defining Action Units (AUs). To overcome the labor-intensive nature of this process, we propose the unsupervised learning of an automated facial coding system by leveraging computer-vision-based facial keypoint tracking. In this novel facia…

    Submitted 8 June, 2024; originally announced June 2024.

  19. arXiv:2405.16759  [pdf, other]

    cs.CV cs.LG

    Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models

    Authors: Cristina N. Vasconcelos, Abdullah Rashwan, Austin Waters, Trevor Walker, Keyang Xu, Jimmy Yan, Rui Qian, Shixin Luo, Zarana Parekh, Andrew Bunner, Hongliang Fei, Roopal Garg, Mandy Guo, Ivana Kajic, Yeqing Li, Henna Nandwani, Jordi Pont-Tuset, Yasumasa Onoe, Sarah Rosston, Su Wang, Wenlei Zhou, Kevin Swersky, David J. Fleet, Jason M. Baldridge, Oliver Wang

    Abstract: We address the long-standing problem of how to learn effective pixel-based image diffusion models at scale, introducing a remarkably simple greedy growing method for stable training of large-scale, high-resolution models. without the needs for cascaded super-resolution components. The key insight stems from careful pre-training of core components, namely, those responsible for text-to-image alignm…

    Submitted 26 May, 2024; originally announced May 2024.

  20. arXiv:2405.02793  [pdf, other]

    cs.CV cs.CL

    ImageInWords: Unlocking Hyper-Detailed Image Descriptions

    Authors: Roopal Garg, Andrea Burns, Burcu Karagol Ayan, Yonatan Bitton, Ceslee Montgomery, Yasumasa Onoe, Andrew Bunner, Ranjay Krishna, Jason Baldridge, Radu Soricut

    Abstract: Despite the longstanding adage "an image is worth a thousand words," generating accurate hyper-detailed image descriptions remains unsolved. Trained on short web-scraped image text, vision-language models often generate incomplete descriptions with visual inconsistencies. We address this via a novel data-centric approach with ImageInWords (IIW), a carefully designed human-in-the-loop framework for…

    Submitted 28 October, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

    Comments: Webpage (https://google.github.io/imageinwords), GitHub (https://github.com/google/imageinwords), HuggingFace (https://huggingface.co/datasets/google/imageinwords)

  21. arXiv:2405.01736  [pdf, other]

    cs.AR

    PipeOrgan: Efficient Inter-operation Pipelining with Flexible Spatial Organization and Interconnects

    Authors: Raveesh Garg, Hyoukjun Kwon, Eric Qin, Yu-Hsin Chen, Tushar Krishna, Liangzhen Lai

    Abstract: Because of the recent trends in Deep Neural Networks (DNN) models being memory-bound, inter-operator pipelining for DNN accelerators is emerging as a promising optimization. Inter-operator pipelining reduces costly on-chip global memory and off-chip memory accesses by forwarding the output of a layer as the input of the next layer within the compute array, which is proven to be an effective optimi…

    Submitted 2 May, 2024; originally announced May 2024.

  22. arXiv:2404.19753  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    DOCCI: Descriptions of Connected and Contrasting Images

    Authors: Yasumasa Onoe, Sunayana Rane, Zachary Berger, Yonatan Bitton, Jaemin Cho, Roopal Garg, Alexander Ku, Zarana Parekh, Jordi Pont-Tuset, Garrett Tanzer, Su Wang, Jason Baldridge

    Abstract: Vision-language datasets are vital for both text-to-image (T2I) and image-to-text (I2T) research. However, current datasets lack descriptions with fine-grained detail that would allow for richer associations to be learned by models. To fill the gap, we introduce Descriptions of Connected and Contrasting Images (DOCCI), a dataset with long, human-annotated English descriptions for 15k images that w…

    Submitted 30 April, 2024; originally announced April 2024.

  23. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1112 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 16 December, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  24. arXiv:2312.03766  [pdf, other]

    cs.CL cs.CV

    Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment

    Authors: Brian Gordon, Yonatan Bitton, Yonatan Shafir, Roopal Garg, Xi Chen, Dani Lischinski, Daniel Cohen-Or, Idan Szpektor

    Abstract: While existing image-text alignment models reach high quality binary assessments, they fall short of pinpointing the exact source of misalignment. In this paper, we present a method to provide detailed textual and visual explanation of detected misalignments between text-image pairs. We leverage large language models and visual grounding models to automatically construct a training set that holds…

    Submitted 17 July, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Journal ref: ECCV 2024

  25. arXiv:2310.18235  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation

    Authors: Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang

    Abstract: Evaluating text-to-image models is notoriously difficult. A strong recent approach for assessing text-image faithfulness is based on QG/A (question generation and answering), which uses pre-trained foundational models to automatically generate a set of questions and answers from the prompt, and output images are scored based on whether these answers extracted with a visual question answering model…

    Submitted 13 March, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

    Comments: ICLR 2024; Project website: https://google.github.io/dsg

  26. arXiv:2309.08685  [pdf, other]

    cs.GT cs.DC

    Fairly Allocating Goods in Parallel

    Authors: Rohan Garg, Alexandros Psomas

    Abstract: We initiate the study of parallel algorithms for fairly allocating indivisible goods among agents with additive preferences. We give fast parallel algorithms for various fundamental problems, such as finding a Pareto Optimal and EF1 allocation under restricted additive valuations, finding an EF1 allocation for up to three agents, and finding an envy-free allocation with subsidies. On the flip side…

    Submitted 15 September, 2023; originally announced September 2023.

  27. arXiv:2306.10392  [pdf, other]

    cs.CR cs.LG

    GlyphNet: Homoglyph domains dataset and detection using attention-based Convolutional Neural Networks

    Authors: Akshat Gupta, Laxman Singh Tomar, Ridhima Garg

    Abstract: Cyber attacks deceive machines into believing something that does not exist in the first place. However, there are some to which even humans fall prey. One such famous attack that attackers have used over the years to exploit the vulnerability of vision is known to be a Homoglyph attack. It employs a primary yet effective mechanism to create illegitimate domains that are hard to differentiate from…

    Submitted 17 June, 2023; originally announced June 2023.

    Journal ref: AAAI AICS Conference 2023

  28. CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation

    Authors: Rahul Madhavan, Rishabh Garg, Kahini Wadhawan, Sameep Mehta

    Abstract: We propose a method to control the attributes of Language Models (LMs) for the text generation task using Causal Average Treatment Effect (ATE) scores and counterfactual augmentation. We explore this method, in the context of LM detoxification, and propose the Causally Fair Language (CFL) architecture for detoxifying pre-trained LMs in a plug-and-play manner. Our architecture is based on a Structu…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 19 pages, 10 figures. Findings of ACL 2023

    Journal ref: Findings of the Association for Computational Linguistics: ACL 2023

  29. arXiv:2303.18135  [pdf]

    cs.CR

    Towards A Sustainable and Ethical Supply Chain Management: The Potential of IoT Solutions

    Authors: Hardik Sharma, Rajat Garg, Harshini Sewani, Rasha Kashef

    Abstract: Globalization has introduced many new challenges making Supply chain management (SCM) complex and huge, for which improvement is needed in many industries. The Internet of Things (IoT) has solved many problems by providing security and traceability with a promising solution for supply chain management. SCM is segregated into different processes, each requiring different types of solutions. IoT dev…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: 9 pages

  30. arXiv:2303.13504  [pdf, other]

    cs.CV

    ReBotNet: Fast Real-time Video Enhancement

    Authors: Jeya Maria Jose Valanarasu, Rahul Garg, Andeep Toor, Xin Tong, Weijuan Xi, Andreas Lugmayr, Vishal M. Patel, Anne Menini

    Abstract: Most video restoration networks are slow, have high computational load, and can't be used for real-time video enhancement. In this work, we design an efficient and fast framework to perform real-time video enhancement for practical use-cases like live video calls and video streams. Our proposed method, called Recurrent Bottleneck Mixer Network (ReBotNet), employs a dual-branch framework. The first…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: Project Website: https://jeya-maria-jose.github.io/rebotnet-web/

  31. arXiv:2303.11499  [pdf, other]

    cs.DC cs.AR

    CELLO: Co-designing Schedule and Hybrid Implicit/Explicit Buffer for Complex Tensor Reuse

    Authors: Raveesh Garg, Michael Pellauer, Sivasankaran Rajamanickam, Tushar Krishna

    Abstract: Tensor algebra accelerators have been gaining popularity for running high-performance computing (HPC) workloads. Identifying optimal schedules for individual tensor operations and designing hardware to run these schedules is an active area of research. Unfortunately, operators in HPC workloads such as Conjugate Gradient often have operators with skewed shapes, fundamentally limiting the reuse any…

    Submitted 4 March, 2025; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: Accepted for publication at the 39th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2025)

  32. mlpack 4: a fast, header-only C++ machine learning library

    Authors: Ryan R. Curtin, Marcus Edel, Omar Shrit, Shubham Agrawal, Suryoday Basak, James J. Balamuta, Ryan Birmingham, Kartik Dutt, Dirk Eddelbuettel, Rishabh Garg, Shikhar Jaiswal, Aakash Kaushik, Sangyeon Kim, Anjishnu Mukherjee, Nanubala Gnana Sai, Nippun Sharma, Yashwant Singh Parihar, Roshan Swain, Conrad Sanderson

    Abstract: For over 15 years, the mlpack machine learning library has served as a "swiss army knife" for C++-based machine learning. Its efficient implementations of common and cutting-edge machine learning algorithms have been used in a wide variety of scientific and industrial applications. This paper overviews mlpack 4, a significant upgrade over its predecessor. The library has been significantly refacto…

    Submitted 1 February, 2023; originally announced February 2023.

    Journal ref: Journal of Open Source Software, Vol. 8, No. 82, 2023

  33. arXiv:2301.10852  [pdf, other]

    cs.AR

    Flexagon: A Multi-Dataflow Sparse-Sparse Matrix Multiplication Accelerator for Efficient DNN Processing

    Authors: Francisco Muñoz-Martínez, Raveesh Garg, José L. Abellán, Michael Pellauer, Manuel E. Acacio, Tushar Krishna

    Abstract: Sparsity is a growing trend in modern DNN models. Existing Sparse-Sparse Matrix Multiplication (SpMSpM) accelerators are tailored to a particular SpMSpM dataflow (i.e., Inner Product, Outer Product or Gustavsons), that determines their overall efficiency. We demonstrate that this static decision inherently results in a suboptimal dynamic solution. This is because different SpMSpM kernels show vary…

    Submitted 25 January, 2023; originally announced January 2023.

    Comments: To appear on ASPLOS 2023

  34. arXiv:2211.14387  [pdf]

    cs.LG cs.AI econ.EM

    Machine Learning Algorithms for Time Series Analysis and Forecasting

    Authors: Rameshwar Garg, Shriya Barpanda, Girish Rao Salanke N S, Ramya S

    Abstract: Time series data is being used everywhere, from sales records to patients' health evolution metrics. The ability to deal with this data has become a necessity, and time series analysis and forecasting are used for the same. Every Machine Learning enthusiast would consider these as very important tools, as they deepen the understanding of the characteristics of data. Forecasting is used to predict…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: 9 Pages, 4 Figures, 9 Formulae, 1 Table, 6th International Conference on Microelectronics, Computing & Communication Systems (MCCS-2021), Paper ID: MCCS21084, Presented at MCCS-2021, Accepted, In Press

  35. arXiv:2206.13577  [pdf, other]

    cs.CV cs.AI cs.LG

    A View Independent Classification Framework for Yoga Postures

    Authors: Mustafa Chasmai, Nirjhar Das, Aman Bhardwaj, Rahul Garg

    Abstract: Yoga is a globally acclaimed and widely recommended practice for a healthy living. Maintaining correct posture while performing a Yogasana is of utmost importance. In this work, we employ transfer learning from Human Pose Estimation models for extracting 136 key-points spread all over the body to train a Random Forest classifier which is used for estimation of the Yogasanas. The results are evalua…

    Submitted 14 August, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

  36. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  37. arXiv:2202.11233  [pdf, other]

    cs.CV

    Retrieval Augmented Classification for Long-Tail Visual Recognition

    Authors: Alexander Long, Wei Yin, Thalaiyasingam Ajanthan, Vu Nguyen, Pulak Purkait, Ravi Garg, Alan Blair, Chunhua Shen, Anton van den Hengel

    Abstract: We introduce Retrieval Augmented Classification (RAC), a generic approach to augmenting standard image classification pipelines with an explicit retrieval module. RAC consists of a standard base image encoder fused with a parallel retrieval branch that queries a non-parametric external memory of pre-encoded images and associated text snippets. We apply RAC to the problem of long-tail classificatio…

    Submitted 22 February, 2022; originally announced February 2022.

  38. arXiv:2201.08916  [pdf, other]

    cs.AR

    Enabling Flexibility for Sparse Tensor Acceleration via Heterogeneity

    Authors: Eric Qin, Raveesh Garg, Abhimanyu Bambhaniya, Michael Pellauer, Angshuman Parashar, Sivasankaran Rajamanickam, Cong Hao, Tushar Krishna

    Abstract: Recently, numerous sparse hardware accelerators for Deep Neural Networks (DNNs), Graph Neural Networks (GNNs), and scientific computing applications have been proposed. A common characteristic among all of these accelerators is that they target tensor algebra (typically matrix multiplications); yet dozens of new accelerators are proposed for every new application. The motivation is that the size a…

    Submitted 21 January, 2022; originally announced January 2022.

  39. arXiv:2112.14406  [pdf, other]

    cs.CV cs.LG

    Overcoming Mode Collapse with Adaptive Multi Adversarial Training

    Authors: Karttikeya Mangalam, Rohin Garg

    Abstract: Generative Adversarial Networks (GANs) are a class of generative models used for various applications, but they have been known to suffer from the mode collapse problem, in which some modes of the target distribution are ignored by the generator. An investigative study using a new data generation procedure indicates that the mode collapse of the generator is driven by the discriminator's inability to…

    Submitted 29 December, 2021; originally announced December 2021.

    Comments: BMVC 2021 Poster

  40. arXiv:2112.05858  [pdf, other]

    cs.DC

    MANA-2.0: A Future-Proof Design for Transparent Checkpointing of MPI at Scale

    Authors: Yao Xu, Zhengji Zhao, Rohan Garg, Harsh Khetawat, Rebecca Hartman-Baker, Gene Cooperman

    Abstract: MANA-2.0 is a scalable, future-proof design for transparent checkpointing of MPI-based computations. Its network transparency ("network-agnostic") feature ensures that MANA-2.0 will provide a viable, efficient mechanism for transparently checkpointing MPI applications on current and future supercomputers. MANA-2.0 is an enhancement of previous work, the original MANA, which interposes MPI calls, a…

    Submitted 10 December, 2021; originally announced December 2021.

  41. arXiv:2111.10882  [pdf, other]

    cs.CV cs.SD eess.AS

    Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video

    Authors: Rishabh Garg, Ruohan Gao, Kristen Grauman

    Abstract: Binaural audio provides human listeners with an immersive spatial sound experience, but most existing videos lack binaural audio recordings. We propose an audio spatialization method that draws on visual information in videos to convert their monaural (single-channel) audio to binaural audio. Whereas existing approaches leverage visual features extracted directly from video frames, our approach ex…

    Submitted 21 November, 2021; originally announced November 2021.

    Comments: Published in BMVC 2021, project page: http://vision.cs.utexas.edu/projects/geometry-aware-binaural/

  42. arXiv:2110.12012  [pdf]

    cs.DC cs.DB cs.LG

    RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework (Extended Version)

    Authors: Pankaj Singh, Sudhakar Singh, P K Mishra, Rakhi Garg

    Abstract: Frequent itemset mining (FIM) is a highly computational and data-intensive algorithm. Therefore, parallel and distributed FIM algorithms have been designed to process large volumes of data in reduced time. Recently, a number of FIM algorithms have been designed on Hadoop MapReduce, a distributed big data processing framework. However, due to heavy disk I/O, MapReduce is found to be inefficient for th…

    Submitted 22 October, 2021; originally announced October 2021.

    Comments: This version is not published or communicated anywhere. arXiv admin note: substantial text overlap with arXiv:1912.06415

  43. arXiv:2110.05655  [pdf, other]

    cs.CV

    Defocus Map Estimation and Deblurring from a Single Dual-Pixel Image

    Authors: Shumian Xin, Neal Wadhwa, Tianfan Xue, Jonathan T. Barron, Pratul P. Srinivasan, Jiawen Chen, Ioannis Gkioulekas, Rahul Garg

    Abstract: We present a method that takes as input a single dual-pixel image, and simultaneously estimates the image's defocus map (the amount of defocus blur at each pixel) and recovers an all-in-focus image. Our method is inspired by recent works that leverage the dual-pixel sensors available in many consumer cameras to assist with autofocus, and use them for recovery of defocus maps or all-in-focus…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: ICCV 2021 (Oral)

  44. arXiv:2110.03781  [pdf, ps, other]

    cs.LG cs.NI

    5G Traffic Prediction with Time Series Analysis

    Authors: Nikhil Nayak, Rujula Singh R, Rameshwar Garg, Varun Danda, Chandana Kiran, Kaustuv Saha

    Abstract: In today's day and age, a mobile phone has become a basic requirement for anyone to thrive. With cellular traffic demand increasing so dramatically, it is now necessary to accurately predict user traffic in cellular networks in order to improve performance in terms of resource allocation and utilisation. Since traffic learning and prediction is a classical and appealing field, whi…

    Submitted 20 July, 2025; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: 5 pages, 5 figures

    MSC Class: 68T50 ACM Class: I.2.0; I.2.4; I.2.6

  45. i-Pulse: A NLP based novel approach for employee engagement in logistics organization

    Authors: Rachit Garg, Arvind W Kiwelekar, Laxman D Netak, Akshay Ghodake

    Abstract: Most logistics and freight forwarding organizations, in one way or another, claim to have core values. The engagement of employees is a vast structure that affects almost every part of the company's core environmental values. There is little theoretical knowledge about the relationship between firms and the engagement of employees. Based on research literature, this paper aims to provide…

    Submitted 24 May, 2021; originally announced June 2021.

    Comments: 11 pages, 7 figures. International Journal of Information Management Data Insights (Elsevier) 2021

  46. arXiv:2105.04419  [pdf, other]

    cs.RO

    VDB-EDT: An Efficient Euclidean Distance Transform Algorithm Based on VDB Data Structure

    Authors: Delong Zhu, Chaoqun Wang, Wenshan Wang, Rohit Garg, Sebastian Scherer, Max Q. -H. Meng

    Abstract: This paper presents a fundamental algorithm, called VDB-EDT, for Euclidean distance transform (EDT) based on the VDB data structure. The algorithm executes on grid maps and generates the corresponding distance field for recording distance information against obstacles, which forms the basis of numerous motion planning algorithms. The contributions of this work are mainly threefold. Firstly, w…

    Submitted 10 May, 2021; originally announced May 2021.

  47. arXiv:2104.01272  [pdf, other]

    cs.RO eess.SY

    Visual Servoing Approach for Autonomous UAV Landing on a Moving Vehicle

    Authors: Azarakhsh Keipour, Guilherme A. S. Pereira, Rogerio Bonatti, Rohit Garg, Puru Rastogi, Geetesh Dubey, Sebastian Scherer

    Abstract: Many aerial robotic applications require the ability to land on moving platforms, such as delivery trucks and marine research boats. We present a method to autonomously land an Unmanned Aerial Vehicle on a moving vehicle. A visual servoing controller approaches the ground vehicle using velocity commands calculated directly in image space. The control laws generate velocity commands in all three di…

    Submitted 26 December, 2022; v1 submitted 2 April, 2021; originally announced April 2021.

    Comments: 18 pages. Published in Sensors Journal

    Journal ref: Sensors 2022, 22(17), 6549

  48. arXiv:2103.08546  [pdf, other]

    cs.DC

    Improving scalability and reliability of MPI-agnostic transparent checkpointing for production workloads at NERSC

    Authors: Prashant Singh Chouhan, Harsh Khetawat, Neil Resnik, Twinkle Jain, Rohan Garg, Gene Cooperman, Rebecca Hartman-Baker, Zhengji Zhao

    Abstract: Checkpoint/restart (C/R) provides fault-tolerant computing capability, enables long-running applications, and provides scheduling flexibility for computing centers to support diverse workloads with different priorities. It is therefore vital to get transparent C/R capability working at NERSC. MANA, by Garg et al., is a transparent checkpointing tool that has been selected due to its MPI-agnostic an…

    Submitted 16 March, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

  49. arXiv:2103.07977  [pdf, other]

    cs.DC cs.AR

    Understanding the Design-Space of Sparse/Dense Multiphase GNN dataflows on Spatial Accelerators

    Authors: Raveesh Garg, Eric Qin, Francisco Muñoz-Martínez, Robert Guirado, Akshay Jain, Sergi Abadal, José L. Abellán, Manuel E. Acacio, Eduard Alarcón, Sivasankaran Rajamanickam, Tushar Krishna

    Abstract: Graph Neural Networks (GNNs) have garnered a lot of recent interest because of their success in learning representations from graph-structured data across several critical applications in cloud and HPC. Owing to their unique compute and memory characteristics that come from an interplay between dense and sparse phases of computations, the emergence of reconfigurable dataflow (aka spatial) accelera…

    Submitted 6 March, 2022; v1 submitted 14 March, 2021; originally announced March 2021.

    Comments: Accepted for publication at the 36th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2022)

  50. arXiv:2103.00933  [pdf, other]

    cs.CV

    DF-VO: What Should Be Learnt for Visual Odometry?

    Authors: Huangying Zhan, Chamara Saroj Weerasekera, Jia-Wang Bian, Ravi Garg, Ian Reid

    Abstract: Multi-view geometry-based methods have dominated monocular Visual Odometry over the last few decades owing to their superior performance, but they are vulnerable to dynamic and low-texture scenes. More importantly, monocular methods suffer from the scale-drift issue, i.e., errors accumulate over time. Recent studies show that deep neural networks can learn scene depths and relative camera in a self-supervi…

    Submitted 1 March, 2021; originally announced March 2021.

    Comments: extended version of ICRA-2020 paper (Visual Odometry Revisited: What Should Be Learnt?)