Abstract
In Deep Reinforcement Learning (DRL), agents learn by sampling transitions from a store of past experiences called the Experience Replay. In most DRL algorithms, the Experience Replay is filled with experiences gathered by the learning agent itself. However, agents that are trained completely Off-Policy, on experiences gathered by behaviors entirely decoupled from their own, cannot learn to improve their own policies. In general, the more Off-Policy an algorithm's training is, the more prone it becomes to divergence. The main contribution of this research is a novel learning framework called Policy Feedback, which serves both as a tool to leverage offline-collected expert experiences and as a general framework for better understanding the issues behind Off-Policy Learning.
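To make the notion of an Experience Replay concrete, the following is a minimal sketch of a replay buffer with uniform sampling. It is not the Policy Feedback framework itself, whose details are not given in the abstract. The names `Transition`, `ReplayBuffer`, and `fill_from_expert` are hypothetical and chosen only for illustration. Uniform sampling and first-in-first-out eviction are assumptions.

```python
import random
from collections import deque, namedtuple

# One stored experience. The field names are an assumption for illustration.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # A bounded deque evicts the oldest transition once capacity is reached.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition):
        self._storage.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


def fill_from_expert(buffer, expert_transitions):
    """Hypothetical helper: load transitions gathered offline by another policy."""
    for transition in expert_transitions:
        buffer.add(transition)
```

In the usual setting the agent calls `add` with its own transitions as it acts. Filling the buffer only through something like `fill_from_expert` corresponds to the completely Off-Policy case described in the abstract, where the stored behavior is decoupled from the learner's policy.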
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Espositi, F., Bonarini, A. (2020). Policy Feedback in Deep Reinforcement Learning to Exploit Expert Knowledge. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2020. Lecture Notes in Computer Science, vol 12565. Springer, Cham. https://doi.org/10.1007/978-3-030-64583-0_25
DOI: https://doi.org/10.1007/978-3-030-64583-0_25
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64582-3
Online ISBN: 978-3-030-64583-0
eBook Packages: Computer Science, Computer Science (R0)