SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning

Sikchi, Harshit; Chitnis, Rohan; Touati, Ahmed; Geramifard, Alborz; Zhang, Amy; Niekum, Scott

arXiv:2311.02013 (cs)

[Submitted on 3 Nov 2023 (v1), last revised 29 Feb 2024 (this version, v2)]

Abstract: Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve multiple goals in an environment purely from offline datasets using sparse reward functions. Offline GCRL is pivotal for developing generalist agents capable of leveraging pre-existing datasets to learn diverse and reusable skills without hand-engineering reward functions. However, contemporary approaches to GCRL based on supervised learning and contrastive learning are often suboptimal in the offline setting. An alternative perspective on GCRL optimizes for occupancy matching, but necessitates learning a discriminator, which subsequently serves as a pseudo-reward for downstream RL. Inaccuracies in the learned discriminator can cascade, negatively influencing the resulting policy. We present a novel approach to GCRL under a new lens of mixture-distribution matching, leading to our discriminator-free method: SMORe. The key insight is combining the occupancy matching perspective of GCRL with a convex dual formulation to derive a learning objective that can better leverage suboptimal offline data. SMORe learns scores, or unnormalized densities, representing the importance of taking an action at a state for reaching a particular goal. SMORe is principled, and our extensive experiments on the fully offline GCRL benchmark, composed of robot manipulation and locomotion tasks and including high-dimensional observations, show that SMORe can outperform state-of-the-art baselines by a significant margin.
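
To make the problem setting in the abstract concrete, the following is a minimal sketch of how a sparse goal-conditioned reward might be attached to a fixed offline dataset. It illustrates the offline GCRL setup only and is not the SMORe objective. The transition format, the function names `sparse_goal_reward` and `label_dataset`, the Euclidean distance check, and the tolerance `tol` are all assumptions made for illustration; none of them comes from the paper.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical transition format (an assumption): (state, action, next_state).
Transition = Tuple[Sequence[float], Sequence[float], Sequence[float]]
LabeledTransition = Tuple[Sequence[float], Sequence[float], Sequence[float], float]


def sparse_goal_reward(next_state: Sequence[float],
                       goal: Sequence[float],
                       tol: float = 0.05) -> float:
    """Return 1.0 if next_state lies within tol (Euclidean) of goal, else 0.0."""
    return 1.0 if math.dist(next_state, goal) <= tol else 0.0


def label_dataset(dataset: List[Transition],
                  goal: Sequence[float],
                  tol: float = 0.05) -> List[LabeledTransition]:
    """Attach the sparse reward for one goal to every transition.

    The dataset is fixed: no new environment interaction takes place,
    which is what makes the setting offline.
    """
    return [(s, a, s2, sparse_goal_reward(s2, goal, tol)) for (s, a, s2) in dataset]


# Toy 2-D dataset: the agent moves along the x-axis toward the goal (1.0, 0.0).
dataset: List[Transition] = [
    ((0.0, 0.0), (1.0, 0.0), (0.5, 0.0)),
    ((0.5, 0.0), (1.0, 0.0), (1.0, 0.0)),
]
labeled = label_dataset(dataset, goal=(1.0, 0.0))
rewards = [r for (_, _, _, r) in labeled]
print(rewards)
```

Only the transition that lands on the goal receives a nonzero reward, which is why most transitions in a suboptimal offline dataset carry no direct learning signal under a sparse reward.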

Comments:	Published at the International Conference on Learning Representations (ICLR) 2024. 26 pages
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Cite as:	arXiv:2311.02013 [cs.LG]
	(or arXiv:2311.02013v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2311.02013

Submission history

From: Harshit Sikchi
[v1] Fri, 3 Nov 2023 16:19:33 UTC (1,442 KB)
[v2] Thu, 29 Feb 2024 03:47:12 UTC (3,410 KB)
