Policy Feedback in Deep Reinforcement Learning to Exploit Expert Knowledge

Espositi, Federico; Bonarini, Andrea

doi:10.1007/978-3-030-64583-0_25

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12565)

Included in the following conference series:

International Conference on Machine Learning, Optimization, and Data Science

Abstract

In Deep Reinforcement Learning (DRL), agents learn by sampling transitions from a batch of stored data called Experience Replay. In most DRL algorithms, the Experience Replay is filled by experiences gathered by the learning agent itself. However, agents that are trained completely Off-Policy, based on experiences gathered by behaviors that are completely decoupled from their own, cannot learn to improve their own policies. In general, the more algorithms train agents Off-Policy, the more they become prone to divergence. The main contribution of this research is the proposal of a novel learning framework called Policy Feedback, used both as a tool to leverage offline-collected expert experiences, and also as a general framework to improve the understanding of the issues behind Off-Policy Learning.
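
To make the Experience Replay mechanism mentioned in the abstract concrete, here is a minimal sketch of a replay buffer in Python. It is not the authors' Policy Feedback framework, whose details are not given in this preview. It only illustrates storing transitions and sampling training batches from them. The names `Transition` and `ReplayBuffer`, the FIFO eviction rule, the uniform sampling, and the synthetic transitions are all assumptions made for illustration.

```python
import random
from collections import deque
from typing import NamedTuple


class Transition(NamedTuple):
    """One stored experience (hypothetical layout, chosen for illustration)."""
    state: tuple
    action: int
    reward: float
    next_state: tuple
    done: bool


class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random sampling."""

    def __init__(self, capacity: int) -> None:
        # deque with maxlen silently evicts the oldest item when full (FIFO).
        self._storage = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self._storage.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> list:
        # Uniform sampling without replacement from the stored transitions.
        return rng.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)


# Fill the buffer with synthetic transitions. In a fully off-policy setting
# these would come from a separate behavior (e.g. an expert), not from the
# learning agent itself.
rng = random.Random(0)
buffer = ReplayBuffer(capacity=100)
for step in range(150):
    buffer.add(Transition((step,), step % 2, 1.0, (step + 1,), False))

batch = buffer.sample(8, rng)
```

In this sketch the learner never influences what enters the buffer, which is the decoupled setting the abstract describes. An agent that fills the buffer with its own experience would instead call `add` from inside its own interaction loop.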


Author information

Authors and Affiliations

AI and Robotics Lab, Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, 20133, Milano, MI, Italy
Federico Espositi & Andrea Bonarini

Corresponding author

Correspondence to Federico Espositi.

Editor information

Editors and Affiliations

University of Catania, Catania, Italy
Giuseppe Nicosia
University of Reading, Reading, UK
Varun Ojha
University of Oxford, Oxford, UK
Emanuele La Malfa
University of Cambridge, Cambridge, UK
Giorgio Jansen
Almawave, Rome, Italy
Vincenzo Sciacca
University of Florida, Gainesville, FL, USA
Panos Pardalos
University of Catania, Catania, Italy
Giovanni Giuffrida
Harvard University, Cambridge, MA, USA
Renato Umeton

Cite this paper

Espositi, F., Bonarini, A. (2020). Policy Feedback in Deep Reinforcement Learning to Exploit Expert Knowledge. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2020. Lecture Notes in Computer Science, vol 12565. Springer, Cham. https://doi.org/10.1007/978-3-030-64583-0_25

DOI: https://doi.org/10.1007/978-3-030-64583-0_25
Published: 08 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64582-3
Online ISBN: 978-3-030-64583-0
eBook Packages: Computer Science; Computer Science (R0)
