Skill enhancement learning with knowledge distillation

  • Research Paper
  • Published:
Science China Information Sciences

Abstract

Skill learning through reinforcement learning has progressed significantly in recent years. However, it often struggles to find optimal or near-optimal policies efficiently because of the trial-and-error exploration inherent in reinforcement learning. Although algorithms have been proposed to improve the efficacy of skill learning, there is still considerable room for improvement in both learning performance and training stability. In this paper, we propose skill enhancement learning with knowledge distillation (SELKD), an algorithm that integrates multiple actors and multiple critics for skill learning. SELKD employs knowledge distillation to establish a mutual learning mechanism among the actors. To mitigate the overestimation bias of the critics, we introduce a novel method for computing target values. We also provide a theoretical analysis that ensures the convergence of SELKD. Finally, experiments on several continuous control tasks illustrate the effectiveness of the proposed algorithm.
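
Because the full paper sits behind the paywall, the abstract is the only description of SELKD available here. The sketch below is therefore not the paper's algorithm; it is a minimal, hypothetical PyTorch illustration of the ingredients the abstract names: several actors that distill knowledge into one another, an ensemble of critics, and a conservative target-value rule to curb overestimation (here a clipped minimum over target critics, in the spirit of TD3/REDQ, whereas SELKD introduces its own target calculation). The network sizes, coefficients, deterministic tanh-squashed actors, and the `update` interface are all assumptions made purely for illustration.

```python
# Hypothetical sketch only -- NOT the SELKD algorithm from the paper.
# It illustrates three ingredients named in the abstract under assumed details:
#   (1) multiple actors with a mutual-distillation term,
#   (2) an ensemble of critics,
#   (3) a conservative target value (clipped min over target critics).
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, ACT_DIM, GAMMA, TAU = 8, 2, 0.99, 0.005   # toy sizes; assumptions

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

actors  = [mlp(OBS_DIM, ACT_DIM) for _ in range(2)]        # two actors (assumed)
critics = [mlp(OBS_DIM + ACT_DIM, 1) for _ in range(4)]    # critic ensemble (assumed size)
targets = [mlp(OBS_DIM + ACT_DIM, 1) for _ in range(4)]
for c, t in zip(critics, targets):
    t.load_state_dict(c.state_dict())

opt_a = [torch.optim.Adam(a.parameters(), lr=3e-4) for a in actors]
opt_c = [torch.optim.Adam(c.parameters(), lr=3e-4) for c in critics]

def update(batch, distill_coef=0.1):
    obs, act, rew, next_obs, done = batch
    # Conservative target: minimum over the target-critic ensemble.
    with torch.no_grad():
        next_act = actors[0](next_obs).tanh()
        q_next = torch.cat([t(torch.cat([next_obs, next_act], -1)) for t in targets], -1)
        y = rew + GAMMA * (1.0 - done) * q_next.min(dim=-1, keepdim=True).values
    # Regress every critic independently onto the shared conservative target.
    for critic, opt in zip(critics, opt_c):
        loss = F.mse_loss(critic(torch.cat([obs, act], -1)), y)
        opt.zero_grad(); loss.backward(); opt.step()
    # Each actor maximizes a critic value and distills toward its (frozen) peer.
    for i, (actor, opt) in enumerate(zip(actors, opt_a)):
        a_i = actor(obs).tanh()
        q_val = critics[0](torch.cat([obs, a_i], -1)).mean()
        peer = actors[1 - i](obs).tanh().detach()
        loss = -q_val + distill_coef * F.mse_loss(a_i, peer)
        opt.zero_grad(); loss.backward(); opt.step()
    # Polyak averaging of the target critics.
    with torch.no_grad():
        for c, t in zip(critics, targets):
            for p, tp in zip(c.parameters(), t.parameters()):
                tp.mul_(1.0 - TAU).add_(TAU * p)

# Smoke test on a synthetic batch.
batch = (torch.randn(32, OBS_DIM), torch.randn(32, ACT_DIM).tanh(),
         torch.randn(32, 1), torch.randn(32, OBS_DIM), torch.zeros(32, 1))
update(batch)
```

The distillation term here simply pulls each actor's actions toward its peer's; the paper's actual mutual-learning mechanism, its target-value calculation, and its convergence analysis should be taken from the article itself.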

Acknowledgements

This work was supported by the “New Generation Artificial Intelligence” Key Field Research and Development Plan of Guangdong Province (Grant No. 2021B0101410002), the National Science and Technology Major Project of the Ministry of Science and Technology of China (Grant No. 2018AAA0102900), and the National Natural Science Foundation of China (Grant Nos. U22A2057, 62133013).

Author information

Corresponding author

Correspondence to Fuchun Sun.

About this article

Cite this article

Liu, N., Sun, F., Fang, B. et al. Skill enhancement learning with knowledge distillation. Sci. China Inf. Sci. 67, 182203 (2024). https://doi.org/10.1007/s11432-023-4016-0
