Critic-Actor for Average Reward MDPs with Function Approximation: A Finite-Time Analysis

Panda, Prashansa; Bhatnagar, Shalabh

arXiv:2402.01371v1 (cs)

[Submitted on 2 Feb 2024 (this version), latest version 16 Dec 2024 (v3)]

Abstract: In recent years, considerable research has focused on asymptotic and non-asymptotic convergence analyses of two-timescale actor-critic algorithms, in which the actor updates are performed on a slower timescale than the critic updates. A recent work presented the critic-actor algorithm, in which the timescales of the actor and the critic are reversed, for the infinite-horizon discounted-cost setting in the look-up table case, together with an asymptotic convergence analysis. In this work, we present the first critic-actor algorithm with function approximation in the long-run average reward setting, along with the first finite-time (non-asymptotic) analysis of such a scheme. We obtain optimal learning rates and prove that our algorithm achieves a sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-2.08})$ for the mean squared error of the critic to be upper bounded by $\epsilon$, which is better than the one obtained for actor-critic in a similar setting. We also present numerical experiments on three benchmark settings and observe that the critic-actor algorithm competes well with the actor-critic algorithm.
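The reversal of timescales described in the abstract can be illustrated with a minimal Python sketch. It assumes polynomially decaying step sizes of the form c / (1 + t)^exponent, a common choice in two-timescale schemes. The function names and the exponent values 0.5 and 0.6 are hypothetical and are not taken from the paper. In a two-timescale scheme, the recursion with the larger step size (the smaller exponent) runs on the faster timescale, so the critic-to-actor step-size ratio should grow for actor-critic and vanish for critic-actor.

```python
def step_sizes(t, actor_exp, critic_exp, actor_c=1.0, critic_c=1.0):
    """Return (actor_step, critic_step) at iteration t.

    Both schedules decay polynomially as c / (1 + t) ** exponent.
    All constants and exponents here are illustrative assumptions.
    """
    actor_step = actor_c / (1 + t) ** actor_exp
    critic_step = critic_c / (1 + t) ** critic_exp
    return actor_step, critic_step


def critic_to_actor_ratio(t, actor_exp, critic_exp):
    """Ratio critic_step / actor_step at iteration t."""
    actor_step, critic_step = step_sizes(t, actor_exp, critic_exp)
    return critic_step / actor_step


# Hypothetical exponents, chosen only to show the ordering of timescales.
# Actor-critic: the critic is faster, so its step size decays more slowly.
ACTOR_CRITIC = {"actor_exp": 0.6, "critic_exp": 0.5}
# Critic-actor: the roles are reversed, so the actor is the faster recursion.
CRITIC_ACTOR = {"actor_exp": 0.5, "critic_exp": 0.6}

if __name__ == "__main__":
    for t in (10, 1_000, 100_000):
        ac = critic_to_actor_ratio(t, **ACTOR_CRITIC)
        ca = critic_to_actor_ratio(t, **CRITIC_ACTOR)
        print(f"t={t}: actor-critic ratio={ac:.3f}, critic-actor ratio={ca:.3f}")
```

Under these assumed schedules, the ratio increases with t in the actor-critic configuration and decreases toward zero in the critic-actor configuration, which is the sense in which the timescales are reversed.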

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2402.01371 [cs.LG]
	(or arXiv:2402.01371v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.01371

Submission history

From: Prashansa Panda
[v1] Fri, 2 Feb 2024 12:48:49 UTC (233 KB)
[v2] Fri, 24 May 2024 06:57:17 UTC (413 KB)
[v3] Mon, 16 Dec 2024 16:17:46 UTC (2,987 KB)
