Showing 1–50 of 99 results for author: Bhatnagar, S

Searching in archive cs.
  1. arXiv:2507.13729  [pdf, ps, other]

    cs.RO cs.AI

    AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework

    Authors: Yu Yao, Salil Bhatnagar, Markus Mazzola, Vasileios Belagiannis, Igor Gilitschenski, Luigi Palmieri, Simon Razniewski, Marcel Hallgarten

    Abstract: Rare, yet critical, scenarios pose a significant challenge in testing and evaluating autonomous driving planners. Relying solely on real-world driving scenes requires collecting massive datasets to capture these scenarios. While automatic generation of traffic scenarios appears promising, data-driven models require extensive training data and often lack fine-grained control over the output. Moreov…

    Submitted 18 July, 2025; originally announced July 2025.

  2. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3284 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 22 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  3. arXiv:2506.03667  [pdf, ps, other]

    cs.CV cs.AI

    Accelerating SfM-based Pose Estimation with Dominating Set

    Authors: Joji Joseph, Bharadwaj Amrutur, Shalabh Bhatnagar

    Abstract: This paper introduces a preprocessing technique to speed up Structure-from-Motion (SfM) based pose estimation, which is critical for real-time applications like augmented reality (AR), virtual reality (VR), and robotics. Our method leverages the concept of a dominating set from graph theory to preprocess SfM models, significantly enhancing the speed of the pose estimation process without losing si…

    Submitted 4 June, 2025; originally announced June 2025.

  4. arXiv:2504.21845  [pdf, ps, other]

    quant-ph cs.IT

    On the Efficacy of the Peeling Decoder for the Quantum Expander Code

    Authors: Jefrin Sharmitha Prabhu, Abhinav Vaishya, Shobhit Bhatnagar, Aryaman Manish Kolhe, V. Lalitha, P. Vijay Kumar

    Abstract: The problem of recovering from qubit erasures has recently gained attention as erasures occur in many physical systems such as photonic systems, trapped ions, superconducting qubits and circuit quantum electrodynamics. While several linear-time decoders for error correction are known, their error-correcting capability is limited to half the minimum distance of the code, whereas erasure correction…

    Submitted 29 May, 2025; v1 submitted 30 April, 2025; originally announced April 2025.

  5. arXiv:2503.19786  [pdf, other]

    cs.CL cs.AI

    Gemma 3 Technical Report

    Authors: Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Etienne Pot, Ivo Penchev, Gaël Liu, Francesco Visin, Kathleen Kenealy, Lucas Beyer, Xiaohai Zhai, Anton Tsitsulin, et al. (191 additional authors not shown)

    Abstract: We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achie…

    Submitted 25 March, 2025; originally announced March 2025.

  6. arXiv:2502.11604  [pdf, other]

    cs.LG stat.ML

    An Actor-Critic Algorithm with Function Approximation for Risk Sensitive Cost Markov Decision Processes

    Authors: Soumyajit Guin, Vivek S. Borkar, Shalabh Bhatnagar

    Abstract: In this paper, we consider the risk-sensitive cost criterion with exponentiated costs for Markov decision processes and develop a model-free policy gradient algorithm in this setting. Unlike additive cost criteria such as average or discounted cost, the risk-sensitive cost criterion is less studied due to the complexity resulting from the multiplicative structure of the resulting Bellman equation.…

    Submitted 17 February, 2025; originally announced February 2025.

  7. A Layered Swarm Optimization Method for Fitting Battery Thermal Runaway Models to Accelerating Rate Calorimetry Data

    Authors: Saakaar Bhatnagar, Andrew Comerford, Zelu Xu, Simone Reitano, Luigi Scrimieri, Luca Giuliano, Araz Banaeizadeh

    Abstract: Thermal runaway in lithium-ion batteries is a critical safety concern for the battery industry due to its potential to cause uncontrolled temperature rises and subsequent fires that can engulf the battery pack and its surroundings. Modeling and simulation offer cost-effective tools for designing strategies to mitigate thermal runaway. Accurately simulating the chemical kinetics of thermal runaway,…

    Submitted 1 April, 2025; v1 submitted 20 December, 2024; originally announced December 2024.

  8. arXiv:2411.15193  [pdf, other]

    cs.CV cs.AI

    Gradient-Weighted Feature Back-Projection: A Fast Alternative to Feature Distillation in 3D Gaussian Splatting

    Authors: Joji Joseph, Bharadwaj Amrutur, Shalabh Bhatnagar

    Abstract: We introduce a training-free method for feature field rendering in Gaussian splatting. Our approach back-projects 2D features into pre-trained 3D Gaussians, using a weighted sum based on each Gaussian's influence in the final rendering. While most training-based feature field rendering methods excel at 2D segmentation but perform poorly at 3D segmentation without post-processing, our method achiev…

    Submitted 19 November, 2024; originally announced November 2024.

  9. arXiv:2409.11681  [pdf, other]

    cs.CV cs.RO

    Gradient-Driven 3D Segmentation and Affordance Transfer in Gaussian Splatting Using 2D Masks

    Authors: Joji Joseph, Bharadwaj Amrutur, Shalabh Bhatnagar

    Abstract: 3D Gaussian Splatting has emerged as a powerful 3D scene representation technique, capturing fine details with high efficiency. In this paper, we introduce a novel voting-based method that extends 2D segmentation models to 3D Gaussian splats. Our approach leverages masked gradients, where gradients are filtered by input 2D masks, and these gradients are used as votes to achieve accurate segmentati…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: Preprint, Under review for ICRA 2025

  10. arXiv:2409.08381  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Rethinking Prompting Strategies for Multi-Label Recognition with Partial Annotations

    Authors: Samyak Rawlekar, Shubhang Bhatnagar, Narendra Ahuja

    Abstract: Vision-language models (VLMs) like CLIP have been adapted for Multi-Label Recognition (MLR) with partial annotations by leveraging prompt-learning, where positive and negative prompts are learned for each class to associate their embeddings with class presence or absence in the shared vision-text feature space. While this approach improves MLR performance by relying on VLM priors, we hypothesize t…

    Submitted 12 September, 2024; originally announced September 2024.

  11. Chemical Reaction Neural Networks for Fitting Accelerating Rate Calorimetry Data

    Authors: Saakaar Bhatnagar, Andrew Comerford, Zelu Xu, Davide Berti Polato, Araz Banaeizadeh, Alessandro Ferraris

    Abstract: As the demand for lithium-ion batteries rapidly increases there is a need to design these cells in a safe manner to mitigate thermal runaway. Thermal runaway in batteries leads to an uncontrollable temperature rise and potentially fires, which is a major safety concern. Typically, when modelling the chemical kinetics of thermal runaway calorimetry data (e.g. Accelerating Rate Calorimetry (ARC)) i…

    Submitted 3 September, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  12. arXiv:2408.07272  [pdf, other]

    cs.AI cs.HC

    Abstract Operations Research Modeling Using Natural Language Inputs

    Authors: Junxuan Li, Ryan Wickman, Sahil Bhatnagar, Raj Kumar Maity, Arko Mukherjee

    Abstract: Operations research (OR) uses mathematical models to enhance decision-making, but developing these models requires expert knowledge and can be time-consuming. Automated mathematical programming (AMP) has emerged to simplify this process, but existing systems have limitations. This paper introduces a novel methodology that uses recent advances in Large Language Model (LLM) to create and edit OR sol…

    Submitted 28 January, 2025; v1 submitted 13 August, 2024; originally announced August 2024.

  13. arXiv:2405.18560  [pdf, other]

    cs.CV cs.AI cs.IR cs.LG eess.IV

    Potential Field Based Deep Metric Learning

    Authors: Shubhang Bhatnagar, Narendra Ahuja

    Abstract: Deep metric learning (DML) involves training a network to learn a semantically meaningful representation space. Many current approaches mine n-tuples of examples and model interactions within each tuplets. We present a novel, compositional DML model that instead of in tuples, represents the influence of each example (embedding) by a continuous potential field, and superposes the fields to obtain t…

    Submitted 19 April, 2025; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2025

  14. arXiv:2405.12167  [pdf, other]

    cs.CY

    Open-Source Assessments of AI Capabilities: The Proliferation of AI Analysis Tools, Replicating Competitor Models, and the Zhousidun Dataset

    Authors: Ritwik Gupta, Leah Walker, Eli Glickman, Raine Koizumi, Sarthak Bhatnagar, Andrew W. Reddie

    Abstract: The integration of artificial intelligence (AI) into military capabilities has become a norm for major military power across the globe. Understanding how these AI models operate is essential for maintaining strategic advantages and ensuring security. This paper demonstrates an open-source methodology for analyzing military AI models through a detailed examination of the Zhousidun dataset, a Chines…

    Submitted 24 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  15. arXiv:2405.06621  [pdf, other]

    cs.IT

    On Streaming Codes for Simultaneously Correcting Burst and Random Erasures

    Authors: Shobhit Bhatnagar, Biswadip Chakraborty, P. Vijay Kumar

    Abstract: Streaming codes are packet-level codes that recover dropped packets within a strict decoding-delay constraint. We study streaming codes over a sliding-window (SW) channel model which admits only those erasure patterns which allow either a single burst erasure of $\le b$ packets along with $\le e$ random packet erasures, or else, $\le a$ random packet erasures, in any sliding-window of $w$ time slo…

    Submitted 10 May, 2024; originally announced May 2024.

  16. arXiv:2405.06606  [pdf, other]

    cs.IT

    On Streaming Codes for Burst and Random Errors

    Authors: Shobhit Bhatnagar, P. Vijay Kumar

    Abstract: Streaming codes (SCs) are packet-level codes that recover erased packets within a strict decoding-delay deadline. Streaming codes for various packet erasure channel models such as sliding-window (SW) channel models that admit random or burst erasures in any SW of a fixed length have been studied in the literature, and the optimal rate as well as rate-optimal code constructions of SCs over such cha…

    Submitted 10 May, 2024; originally announced May 2024.

  17. arXiv:2404.16193  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM eess.IV

    Improving Multi-label Recognition using Class Co-Occurrence Probabilities

    Authors: Samyak Rawlekar, Shubhang Bhatnagar, Vishnuvardhan Pogunulu Srinivasulu, Narendra Ahuja

    Abstract: Multi-label Recognition (MLR) involves the identification of multiple objects within an image. To address the additional complexity of this problem, recent works have leveraged information from vision-language models (VLMs) trained on large text-images datasets for the task. These methods learn an independent classifier for each object (class), overlooking correlations in their occurrences. Such c…

    Submitted 19 September, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted to ICPR 2024, CVPR workshops 2024

  18. arXiv:2403.14977  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Piecewise-Linear Manifolds for Deep Metric Learning

    Authors: Shubhang Bhatnagar, Narendra Ahuja

    Abstract: Unsupervised deep metric learning (UDML) focuses on learning a semantic representation space using only unlabeled data. This challenging problem requires accurately estimating the similarity between data points, which is used to supervise a deep network. For this purpose, we propose to model the high-dimensional data manifold using a piecewise-linear approximation, with each low-dimensional linear…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Accepted at CPAL 2024 (Oral)

  19. arXiv:2402.01371  [pdf, other]

    cs.LG

    Two-Timescale Critic-Actor for Average Reward MDPs with Function Approximation

    Authors: Prashansa Panda, Shalabh Bhatnagar

    Abstract: Several recent works have focused on carrying out non-asymptotic convergence analyses for AC algorithms. Recently, a two-timescale critic-actor algorithm has been presented for the discounted cost setting in the look-up table case where the timescales of the actor and the critic are reversed and only asymptotic convergence shown. In our work, we present the first two-timescale critic-actor algorit…

    Submitted 16 December, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  20. Investigating the Surrogate Modeling Capabilities of Continuous Time Echo State Networks

    Authors: Saakaar Bhatnagar

    Abstract: Continuous Time Echo State Networks (CTESNs) are a promising yet under-explored surrogate modeling technique for dynamical systems, particularly those governed by stiff Ordinary Differential Equations (ODEs). A key determinant of the generalization accuracy of a CTESN surrogate is the method of projecting the reservoir state to the output. This paper shows that of the two common projection methods…

    Submitted 5 January, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

  21. arXiv:2311.11789  [pdf, other]

    cs.LG cs.MA math.OC

    Approximate Linear Programming for Decentralized Policy Iteration in Cooperative Multi-agent Markov Decision Processes

    Authors: Lakshmi Mandal, Chandrashekar Lakshminarayanan, Shalabh Bhatnagar

    Abstract: In this work, we consider a cooperative multi-agent Markov decision process (MDP) involving m agents. At each decision epoch, all the m agents independently select actions in order to maximize a common long-term objective. In the policy iteration process of multi-agent setup, the number of actions grows exponentially with the number of agents, incurring huge computational costs. Thus, recent works…

    Submitted 29 April, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

  22. arXiv:2310.16363  [pdf, other]

    cs.LG

    Finite-Time Analysis of Three-Timescale Constrained Actor-Critic and Constrained Natural Actor-Critic Algorithms

    Authors: Prashansa Panda, Shalabh Bhatnagar

    Abstract: Actor Critic methods have found immense applications on a wide range of Reinforcement Learning tasks especially when the state-action space is large. In this paper, we consider actor critic and natural actor critic algorithms with function approximation for constrained Markov decision processes (C-MDP) involving inequality constraints and carry out a non-asymptotic analysis for both of these algor…

    Submitted 29 May, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

  23. arXiv:2310.05000  [pdf, ps, other]

    cs.LG cs.AI eess.SY math.OC

    The Reinforce Policy Gradient Algorithm Revisited

    Authors: Shalabh Bhatnagar

    Abstract: We revisit the Reinforce policy gradient algorithm from the literature. Note that this algorithm typically works with cost returns obtained over random length episodes obtained from either termination upon reaching a goal state (as with episodic tasks) or from instants of visit to a prescribed recurrent state (in the case of continuing tasks). We propose a major enhancement to the basic algorithm.…

    Submitted 8 October, 2023; originally announced October 2023.

  24. Physics Informed Neural Networks for Modeling of 3D Flow-Thermal Problems with Sparse Domain Data

    Authors: Saakaar Bhatnagar, Andrew Comerford, Araz Banaeizadeh

    Abstract: Successfully training Physics Informed Neural Networks (PINNs) for highly nonlinear PDEs on complex 3D domains remains a challenging task. In this paper, PINNs are employed to solve the 3D incompressible Navier-Stokes (NS) equations at moderate to high Reynolds numbers for complex geometries. The presented method utilizes very sparsely distributed solution data in the domain. A detailed investigat…

    Submitted 3 November, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

  25. arXiv:2308.04643  [pdf, other]

    cs.CV cs.HC cs.RO eess.IV

    Long-Distance Gesture Recognition using Dynamic Neural Networks

    Authors: Shubhang Bhatnagar, Sharath Gopal, Narendra Ahuja, Liu Ren

    Abstract: Gestures form an important medium of communication between humans and machines. An overwhelming majority of existing gesture recognition methods are tailored to a scenario where humans and machines are located very close to each other. This short-distance assumption does not hold true for several types of interactions, for example gesture-based interactions with a floor cleaning robot or with a dr…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023)

    Journal ref: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 2023, pp. 1307-1312

  26. arXiv:2305.12239  [pdf, other]

    cs.LG cs.AI

    Off-Policy Average Reward Actor-Critic with Deterministic Policy Search

    Authors: Naman Saxena, Subhojyoti Khastigir, Shishir Kolathaya, Shalabh Bhatnagar

    Abstract: The average reward criterion is relatively less studied as most existing works in the Reinforcement Learning literature consider the discounted reward criterion. There are few recent works that present on-policy average reward actor-critic algorithms, but average reward off-policy actor-critic is relatively less explored. In this work, we present both on-policy and off-policy deterministic policy…

    Submitted 19 July, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

    Comments: Accepted at ICML 2023

  27. arXiv:2305.12125  [pdf, other]

    cs.LG cs.AI

    A Framework for Provably Stable and Consistent Training of Deep Feedforward Networks

    Authors: Arunselvan Ramaswamy, Shalabh Bhatnagar, Naman Saxena

    Abstract: We present a novel algorithm for training deep neural networks in supervised (classification and regression) and unsupervised (reinforcement learning) scenarios. This algorithm combines the standard stochastic gradient descent and the gradient clipping method. The output layer is updated using clipped gradients, the rest of the neural network is updated using standard gradients. Updating the outpu…

    Submitted 20 May, 2023; originally announced May 2023.

    Comments: 30 pages, 12 figures

    MSC Class: 90B05; 90C40; 90C90

  28. arXiv:2304.10951  [pdf, ps, other]

    cs.LG math.OC stat.ML

    A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning

    Authors: Mizhaan Prajit Maniyar, Akash Mondal, Prashanth L. A., Shalabh Bhatnagar

    Abstract: We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a stationary point of the value function. In this paper, we propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the…

    Submitted 21 April, 2023; originally announced April 2023.

  29. arXiv:2303.07068  [pdf, other]

    cs.LG

    n-Step Temporal Difference Learning with Optimal n

    Authors: Lakshmi Mandal, Shalabh Bhatnagar

    Abstract: We consider the problem of finding the optimal value of n in the n-step temporal difference (TD) learning algorithm. Our objective function for the optimization problem is the average root mean squared error (RMSE). We find the optimal n by resorting to a model-free optimization technique involving a one-simulation simultaneous perturbation stochastic approximation (SPSA) based procedure. Whereas…

    Submitted 17 July, 2024; v1 submitted 13 March, 2023; originally announced March 2023.

  30. Case-Base Neural Networks: survival analysis with time-varying, higher-order interactions

    Authors: Jesse Islam, Maxime Turgeon, Robert Sladek, Sahir Bhatnagar

    Abstract: In the context of survival analysis, data-driven neural network-based methods have been developed to model complex covariate effects. While these methods may provide better predictive performance than regression-based approaches, not all can model time-varying interactions and complex baseline hazards. To address this, we propose Case-Base Neural Networks (CBNNs) as a new approach that combines th…

    Submitted 9 January, 2024; v1 submitted 16 January, 2023; originally announced January 2023.

  31. arXiv:2212.10477  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Generalized Simultaneous Perturbation-based Gradient Search with Reduced Estimator Bias

    Authors: Soumen Pachal, Shalabh Bhatnagar, L. A. Prashanth

    Abstract: We present in this paper a family of generalized simultaneous perturbation-based gradient search (GSPGS) estimators that use noisy function measurements. The number of function measurements required by each estimator is guided by the desired level of accuracy. We first present in detail unbalanced generalized simultaneous perturbation stochastic approximation (GSPSA) estimators and later present t…

    Submitted 12 November, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: The material in this paper was presented in part at the Conference on Information Sciences and Systems (CISS) in March 2023

  32. arXiv:2211.09174  [pdf, other]

    cs.LG cs.AI

    CASPR: Customer Activity Sequence-based Prediction and Representation

    Authors: Pin-Jung Chen, Sahil Bhatnagar, Sagar Goyal, Damian Konrad Kowalczyk, Mayank Shrivastava

    Abstract: Tasks critical to enterprise profitability, such as customer churn prediction, fraudulent account detection or customer lifetime value estimation, are often tackled by models trained on features engineered from customer data in tabular format. Application-specific feature engineering adds development, operationalization and maintenance costs over time. Recent advances in representation learning pr…

    Submitted 28 November, 2022; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Presented at the Table Representation Learning Workshop, NeurIPS 2022, New Orleans. Authors listed in random order

  33. arXiv:2210.07573  [pdf, other]

    cs.LG

    Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm

    Authors: Ashish Kumar Jayant, Shalabh Bhatnagar

    Abstract: During initial iterations of training in most Reinforcement Learning (RL) algorithms, agents perform a significant number of random exploratory steps. In the real world, this can limit the practicality of these algorithms as it can lead to potentially dangerous behavior. Hence safe exploration is a critical issue in applying RL algorithms in the real world. This problem has been recently well stud…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: Proceedings of NeurIPS 2022

  34. A policy gradient approach for Finite Horizon Constrained Markov Decision Processes

    Authors: Soumyajit Guin, Shalabh Bhatnagar

    Abstract: The infinite horizon setting is widely adopted for problems of reinforcement learning (RL). These invariably result in stationary policies that are optimal. In many situations, finite horizon control problems are of interest and for such problems, the optimal policies are time-varying in general. Another setting that has become popular in recent times is of Constrained Reinforcement Learning, wher…

    Submitted 20 March, 2025; v1 submitted 10 October, 2022; originally announced October 2022.

  35. Actor-Critic or Critic-Actor? A Tale of Two Time Scales

    Authors: Shalabh Bhatnagar, Vivek S. Borkar, Soumyajit Guin

    Abstract: We revisit the standard formulation of tabular actor-critic algorithm as a two time-scale stochastic approximation with value function computed on a faster time-scale and policy computed on a slower time-scale. This emulates policy iteration. We observe that reversal of the time scales will in fact emulate value iteration and is a legitimate algorithm. We provide a proof of convergence and compare…

    Submitted 13 June, 2024; v1 submitted 10 October, 2022; originally announced October 2022.

  36. An Agent-Based Fleet Management Model for First- and Last-Mile Services

    Authors: Saumya Bhatnagar, Tarun Rambha, Gitakrishnan Ramadurai

    Abstract: With the growth of cars and car-sharing applications, commuters in many cities, particularly developing countries, are shifting away from public transport. These shifts have affected two key stakeholders: transit operators and first- and last-mile (FLM) services. Although most cities continue to invest heavily in bus and metro projects to make public transit attractive, ridership in these systems…

    Submitted 4 December, 2022; v1 submitted 9 August, 2022; originally announced August 2022.

  37. arXiv:2208.00290  [pdf, ps, other]

    math.OC cs.LG

    A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization

    Authors: Akash Mondal, Prashanth L. A., Shalabh Bhatnagar

    Abstract: In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, and only the latter are observed for any given parameter. Our algorithm employs a gradient estimation scheme with random perturbations, which are formed using the truncated Cauchy distribution from the delta sphere. We analyze the bias and variance of…

    Submitted 30 June, 2023; v1 submitted 30 July, 2022; originally announced August 2022.

  38. arXiv:2201.00286  [pdf, ps, other]

    cs.LG cs.AI eess.SY

    Reinforcement Learning for Task Specifications with Action-Constraints

    Authors: Arun Raman, Keerthan Shagrithaya, Shalabh Bhatnagar

    Abstract: In this paper, we use concepts from supervisory control theory of discrete event systems to propose a method to learn optimal control policies for a finite-state Markov Decision Process (MDP) in which (only) certain sequences of actions are deemed unsafe (respectively safe). We assume that the set of action sequences that are deemed unsafe and/or safe are given in terms of a finite-state automaton…

    Submitted 1 January, 2022; originally announced January 2022.

  39. arXiv:2112.02999  [pdf, other]

    cs.RO

    Dynamic Mirror Descent based Model Predictive Control for Accelerating Robot Learning

    Authors: Utkarsh A. Mishra, Soumya R. Samineni, Prakhar Goel, Chandravaran Kunjeti, Himanshu Lodha, Aman Singh, Aditya Sagi, Shalabh Bhatnagar, Shishir Kolathaya

    Abstract: Recent works in Reinforcement Learning (RL) combine model-free (Mf)-RL algorithms with model-based (Mb)-RL approaches to get the best from both: asymptotic performance of Mf-RL and high sample-efficiency of Mb-RL. Inspired by these works, we propose a hierarchical framework that integrates online learning for the Mb-trajectory optimization with off-policy methods for the Mf-RL. In particular, two…

    Submitted 4 November, 2021; originally announced December 2021.

    Comments: 8 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2110.12239

  40. arXiv:2111.11768  [pdf, other]

    cs.LG

    Schedule Based Temporal Difference Algorithms

    Authors: Rohan Deb, Meet Gandhi, Shalabh Bhatnagar

    Abstract: Learning the value function of a given policy from data samples is an important problem in Reinforcement Learning. TD($λ$) is a popular class of algorithms to solve this problem. However, the weights assigned to different $n$-step returns in TD($λ$), controlled by the parameter $λ$, decrease exponentially with increasing $n$. In this paper, we present a $λ$-schedule procedure that generalizes the…

    Submitted 23 November, 2021; originally announced November 2021.

  41. arXiv:2111.11004  [pdf, other]

    cs.LG

    Gradient Temporal Difference with Momentum: Stability and Convergence

    Authors: Rohan Deb, Shalabh Bhatnagar

    Abstract: Gradient temporal difference (Gradient TD) algorithms are a popular class of stochastic approximation (SA) algorithms used for policy evaluation in reinforcement learning. Here, we consider Gradient TD algorithms with an additional heavy ball momentum term and provide choice of step size and momentum parameter that ensures almost sure convergence of these algorithms asymptotically. In doing so, we…

    Submitted 22 November, 2021; originally announced November 2021.

  42. arXiv:2110.15093  [pdf, other]

    cs.LG cs.AI

    Finite Horizon Q-learning: Stability, Convergence, Simulations and an application on Smart Grids

    Authors: Vivek VP, Dr. Shalabh Bhatnagar

    Abstract: Q-learning is a popular reinforcement learning algorithm. This algorithm has however been studied and analysed mainly in the infinite horizon setting. There are several important applications which can be modeled in the framework of finite horizon Markov decision processes. We develop a version of Q-learning algorithm for finite horizon Markov decision processes (MDP) and provide a full proof of i…

    Submitted 6 August, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

  43. arXiv:2110.10969  [pdf, other]

    cs.LG cs.CV cs.NE

    Memory Efficient Adaptive Attention For Multiple Domain Learning

    Authors: Himanshu Pradeep Aswani, Abhiraj Sunil Kanse, Shubhang Bhatnagar, Amit Sethi

    Abstract: Training CNNs from scratch on new domains typically demands large numbers of labeled images and computations, which is not suitable for low-power hardware. One way to reduce these requirements is to modularize the CNN architecture and freeze the weights of the heavier modules, that is, the lower layers after pre-training. Recent studies have proposed alternative modular architectures and schemes t…

    Submitted 21 October, 2021; originally announced October 2021.

    Comments: 13 pages, 3 figures, 4 graphs, 3 tables

  44. arXiv:2110.10017  [pdf, other]

    cs.LG cs.AI

    Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm

    Authors: Raghuram Bharadwaj Diddigi, Prateek Jain, Prabuchandran K. J., Shalabh Bhatnagar

    Abstract: Learning optimal behavior from existing data is one of the most important problems in Reinforcement Learning (RL). This is known as "off-policy control" in RL where an agent's objective is to compute an optimal policy based on the data obtained from the given policy (known as the behavior policy). As the optimal policy can be very different from the behavior policy, learning optimal behavior is ve…

    Submitted 15 June, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: This paper has been accepted for presentation at the IJCNN at IEEE WCCI 2022 and for publication in the conference proceedings published by IEEE

  45. arXiv:2102.10165  [pdf, other]

    cs.IT

    Analyzing Cross Validation In Compressed Sensing With Mixed Gaussian And Impulse Measurement Noise With L1 Errors

    Authors: Chinmay Gurjarpadhye, Shubhang Bhatnagar, Ajit Rajwade

    Abstract: Compressed sensing (CS) involves sampling signals at rates less than their Nyquist rates and attempting to reconstruct them after sample acquisition. Most such algorithms have parameters, for example the regularization parameter in LASSO, which need to be chosen carefully for optimal performance. These parameters can be chosen based on assumptions on the noise level or signal sparsity, but this kn…

    Submitted 19 February, 2021; originally announced February 2021.

  46. arXiv:2101.02349  [pdf, other]

    cs.AI cs.MA

    Attention Actor-Critic algorithm for Multi-Agent Constrained Co-operative Reinforcement Learning

    Authors: P. Parnika, Raghuram Bharadwaj Diddigi, Sai Koti Reddy Danda, Shalabh Bhatnagar

    Abstract: In this work, we consider the problem of computing optimal actions for Reinforcement Learning (RL) agents in a co-operative setting, where the objective is to optimize a common goal. However, in many real-life applications, in addition to optimizing the goal, the agents are required to satisfy certain constraints specified on their actions. Under this setting, the objective of the agents is to not…

    Submitted 6 January, 2021; originally announced January 2021.

  47. arXiv:2010.16342  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Robust Quadrupedal Locomotion on Sloped Terrains: A Linear Policy Approach

    Authors: Kartik Paigwar, Lokesh Krishna, Sashank Tirumala, Naman Khetan, Aditya Sagi, Ashish Joglekar, Shalabh Bhatnagar, Ashitava Ghosal, Bharadwaj Amrutur, Shishir Kolathaya

    Abstract: In this paper, with a view toward fast deployment of locomotion gaits in low-cost hardware, we use a linear policy for realizing end-foot trajectories in the quadruped robot, Stoch $2$. In particular, the parameters of the end-foot trajectories are shaped via a linear feedback policy that takes the torso orientation and the terrain slope as inputs. The corresponding desired joint angles are obtain…

    Submitted 10 November, 2020; v1 submitted 30 October, 2020; originally announced October 2020.

    Comments: Accepted in 4th Conference on Robot Learning 2020, MIT, USA

  48. arXiv:2010.15947  [pdf, other]

    cs.CV cs.LG

    PAL : Pretext-based Active Learning

    Authors: Shubhang Bhatnagar, Sachin Goyal, Darshan Tank, Amit Sethi

    Abstract: The goal of pool-based active learning is to judiciously select a fixed-sized subset of unlabeled samples from a pool to query an oracle for their labels, in order to maximize the accuracy of a supervised learner. However, the unsaid requirement that the oracle should always assign correct labels is unreasonable for most situations. We propose an active learning technique for deep neural networks…

    Submitted 28 March, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

  49. arXiv:2010.06142  [pdf, other]

    cs.LG

    Hindsight Experience Replay with Kronecker Product Approximate Curvature

    Authors: Dhuruva Priyan G M, Abhik Singla, Shalabh Bhatnagar

    Abstract: Hindsight Experience Replay (HER) is one of the efficient algorithms for solving Reinforcement Learning tasks in sparse-reward environments. But due to its reduced sample efficiency and slower convergence, HER fails to perform effectively. Natural gradients solve these challenges by converging the model parameters better. They avoid taking bad actions that collapse the training performance. Ho…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: arXiv admin note: text overlap with arXiv:1708.05144 by other authors

  50. arXiv:2009.00821  [pdf, other]

    eess.SY cs.AI

    A reinforcement learning approach to hybrid control design

    Authors: Meet Gandhi, Atreyee Kundu, Shalabh Bhatnagar

    Abstract: In this paper we design hybrid control policies for hybrid systems whose mathematical models are unknown. Our contributions are threefold. First, we propose a framework for modelling the hybrid control design problem as a single Markov Decision Process (MDP). This result facilitates the application of off-the-shelf algorithms from Reinforcement Learning (RL) literature towards designing optimal co…

    Submitted 2 September, 2020; originally announced September 2020.

    Comments: 9 pages