Policy Feedback in Deep Reinforcement Learning to Exploit Expert Knowledge

  • Conference paper
Machine Learning, Optimization, and Data Science (LOD 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12565)

Abstract

In Deep Reinforcement Learning (DRL), agents learn by sampling transitions from a batch of stored data called Experience Replay. In most DRL algorithms, the Experience Replay is filled by experiences gathered by the learning agent itself. However, agents that are trained completely Off-Policy, based on experiences gathered by behaviors that are completely decoupled from their own, cannot learn to improve their own policies. In general, the more algorithms train agents Off-Policy, the more they become prone to divergence. The main contribution of this research is the proposal of a novel learning framework called Policy Feedback, used both as a tool to leverage offline-collected expert experiences, and also as a general framework to improve the understanding of the issues behind Off-Policy Learning.
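
As a rough illustration of the Experience Replay idea mentioned in the abstract, the sketch below shows a generic fixed-capacity buffer that stores transitions and returns uniformly sampled batches. It is not the Policy Feedback framework and is not taken from the paper: the `ReplayBuffer` class, the `Transition` fields, and the toy data are all assumptions made for illustration only.

```python
import random
from collections import deque, namedtuple

# Hypothetical transition layout; the abstract does not specify one.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen evicts the oldest transition once it is full.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self._storage.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement within a single batch.
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


# Fully off-policy setting: the buffer holds only transitions recorded from
# some other behavior. The values here are made-up toy data.
expert_buffer = ReplayBuffer(capacity=1000, seed=0)
for t in range(5):
    expert_buffer.add(state=t, action=1, reward=0.0, next_state=t + 1, done=(t == 4))

batch = expert_buffer.sample(batch_size=3)
```

In the usual setup the learning agent would call `add` with its own experiences during training; filling the buffer once from external data, as above, corresponds to the completely Off-Policy case the abstract describes.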

Author information

Corresponding author

Correspondence to Federico Espositi.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Espositi, F., Bonarini, A. (2020). Policy Feedback in Deep Reinforcement Learning to Exploit Expert Knowledge. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2020. Lecture Notes in Computer Science, vol 12565. Springer, Cham. https://doi.org/10.1007/978-3-030-64583-0_25

  • DOI: https://doi.org/10.1007/978-3-030-64583-0_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64582-3

  • Online ISBN: 978-3-030-64583-0

  • eBook Packages: Computer Science (R0)
