Abstract
Skill learning through reinforcement learning has progressed significantly in recent years. However, it often struggles to find optimal or near-optimal policies efficiently because of the trial-and-error nature of exploration in reinforcement learning. Although various algorithms have been proposed to improve skill learning, there is still considerable room for improvement in learning performance and training stability. In this paper, we propose skill enhancement learning with knowledge distillation (SELKD), an algorithm that integrates multiple actors and multiple critics for skill learning. SELKD employs knowledge distillation to establish a mutual learning mechanism among the actors. To mitigate the overestimation bias of the critics, we introduce a novel target value calculation method. We also provide a theoretical analysis of the convergence of SELKD. Finally, experiments on several continuous control tasks demonstrate the effectiveness of the proposed algorithm.
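The two ingredients named above, a multi-critic target value intended to temper overestimation and a knowledge-distillation term that lets actors learn from one another, can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the critic and actor interfaces, the replay batch, and the specific ensemble target rule (here, the mean of the two smallest target-critic estimates) are assumptions made for exposition.

```python
# Minimal sketch (assumed interfaces, not SELKD's actual update rules) of
# (i) an ensemble TD target meant to reduce critic overestimation and
# (ii) a mutual-distillation loss between two actors.
import torch


def ensemble_target(target_critics, next_obs, next_act, reward, done,
                    gamma=0.99, alpha=0.2, next_logp=None):
    """TD target from an ensemble of target critics.

    Averaging the two smallest ensemble estimates is one plausible way to
    temper overestimation; the paper's actual target rule may differ.
    Each critic is assumed to map (obs, act) -> Q-values of shape [B, 1].
    """
    with torch.no_grad():
        qs = torch.stack([qc(next_obs, next_act) for qc in target_critics])  # [K, B, 1]
        two_smallest, _ = torch.topk(qs, k=2, dim=0, largest=False)
        q_next = two_smallest.mean(dim=0)                                    # [B, 1]
        if next_logp is not None:
            # Optional maximum-entropy (soft) correction, as in SAC-style methods.
            q_next = q_next - alpha * next_logp
        return reward + gamma * (1.0 - done) * q_next


def mutual_distillation_loss(actor, peer_actor, obs):
    """KL(actor || peer) on replay states: one way actors could teach each other.

    Assumes each actor exposes a hypothetical .dist(obs) method returning a
    torch.distributions object (e.g., a Normal over actions).
    """
    dist = actor.dist(obs)
    with torch.no_grad():
        peer_dist = peer_actor.dist(obs)
    return torch.distributions.kl_divergence(dist, peer_dist).mean()
```

In such a setup, each actor's policy loss would combine its usual objective with a small weight on the distillation term toward its peer; the precise losses and target calculation used by SELKD are those defined in the paper.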
Acknowledgements
This work was supported by the "New Generation Artificial Intelligence" Key Field Research and Development Plan of Guangdong Province (Grant No. 2021B0101410002), the National Science and Technology Major Project of the Ministry of Science and Technology of China (Grant No. 2018AAA0102900), and the National Natural Science Foundation of China (Grant Nos. U22A2057, 62133013).
Cite this article
Liu, N., Sun, F., Fang, B. et al. Skill enhancement learning with knowledge distillation. Sci. China Inf. Sci. 67, 182203 (2024). https://doi.org/10.1007/s11432-023-4016-0