Experimental quantum speed-up in reinforcement learning agents

Saggio, V.; Asenbeck, B. E.; Hamann, A.; Strömberg, T.; Schiansky, P.; Dunjko, V.; Friis, N.; Harris, N. C.; Hochberg, M.; Englund, D.; Wölk, S.; Briegel, H. J.; Walther, P.

doi:10.1038/s41586-021-03242-7

Article
Published: 10 March 2021

Nature volume 591, pages 229–233 (2021)

Abstract

As the field of artificial intelligence advances, the demand for algorithms that can learn quickly and efficiently increases. An important paradigm within artificial intelligence is reinforcement learning¹, where decision-making entities called agents interact with environments and learn by updating their behaviour on the basis of the obtained feedback. The crucial question for practical applications is how fast agents learn². Although various studies have made use of quantum mechanics to speed up the agent’s decision-making process³,⁴, a reduction in learning time has not yet been demonstrated. Here we present a reinforcement learning experiment in which the learning process of an agent is sped up by using a quantum communication channel with the environment. We further show that combining this scenario with classical communication enables the evaluation of this improvement and allows optimal control of the learning progress. We implement this learning protocol on a compact and fully tunable integrated nanophotonic processor. The device interfaces with telecommunication-wavelength photons and features a fast active-feedback mechanism, demonstrating the agent’s systematic quantum advantage in a setup that could readily be integrated within future large-scale quantum communication networks.
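
The abstract does not spell out how a quantum channel shortens learning. As a rough illustration only, the sketch below assumes the speed-up comes from Grover-type amplitude amplification (the reference list cites Grover search and the tight bounds on quantum searching). It compares the chance of hitting a rewarded action in one classical query with the chance after a number of amplification rounds. The names `classical_success`, `grover_success` and `eps` are hypothetical and do not come from the paper.

```python
import math


def classical_success(eps: float) -> float:
    """Chance that one classical query returns a rewarded action.

    `eps` is the assumed total probability the agent's current policy
    assigns to rewarded actions.
    """
    return eps


def grover_success(eps: float, iterations: int = 1) -> float:
    """Chance of a rewarded action after amplitude amplification.

    Textbook result: writing sin^2(theta) = eps, k amplification rounds
    give a success probability of sin^2((2k + 1) * theta).
    """
    theta = math.asin(math.sqrt(eps))
    return math.sin((2 * iterations + 1) * theta) ** 2


if __name__ == "__main__":
    # Compare both strategies for a few illustrative (made-up) values of eps.
    for eps in (0.01, 0.05, 0.10, 0.25):
        print(f"eps={eps:.2f}  classical={classical_success(eps):.4f}  "
              f"one Grover round={grover_success(eps):.4f}")
```

For small `eps`, one round raises the success probability by a factor close to nine, while for larger `eps` further rounds can overshoot and lower it again. This overshoot is one plausible reason a protocol would mix quantum and classical rounds, as the abstract describes, but the sketch makes no claim about the exact strategy used in the experiment.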

**Fig. 1: Schematic of a learning agent.**

**Fig. 4: Behaviour of the average reward η for different learning strategies.**

Data availability

All the datasets used in the current work are available on Zenodo at https://doi.org/10.5281/zenodo.4327211.

References

Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 1998).
Dunjko, V., Taylor, J. M. & Briegel, H. J. Quantum-enhanced machine learning. Phys. Rev. Lett. 117, 130501 (2016).
Paparo, G. D., Dunjko, V., Makmal, A., Martin-Delgado, M. A. & Briegel, H. J. Quantum speedup for active learning agents. Phys. Rev. X 4, 031002 (2014).
Sriarunothai, T. et al. Speeding-up the decision making of a learning agent using an ion trap quantum processor. Quantum Sci. Technol. 4, 015014 (2019).
Johannink, T. et al. Residual reinforcement learning for robot control. In 2019 International Conference on Robotics and Automation (ICRA) 6023–6029 (IEEE, 2019).
Tjandra, A., Sakti, S. & Nakamura, S. Sequence-to-sequence ASR optimization via reinforcement learning. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 5829–5833 (IEEE, 2018).
Komorowski, M., Celi, L. A., Badawi, O., Gordon, A. C. & Faisal, A. A. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nat. Med. 24, 1716–1720 (2018).
Thakur, C. S. et al. Large-scale neuromorphic spiking array processors: a quest to mimic the brain. Front. Neurosci. 12, 891 (2018).
Steinbrecher, G. R., Olson, J. P., Englund, D. & Carolan, J. Quantum optical neural networks. npj Quantum Inf. 5, 60 (2019).
Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).
Arute, F. et al. Quantum supremacy using a programmable superconducting processor. Nature 574, 505–510 (2019).
Dong, D., Chen, C., Li, H. & Tarn, T.-J. Quantum reinforcement learning. IEEE Trans. Syst. Man Cybern. B 38, 1207–1220 (2008).
Dunjko, V. & Briegel, H. J. Machine learning & artificial intelligence in the quantum domain: a review of recent progress. Rep. Prog. Phys. 81, 074001 (2018).
Baireuther, P., O’Brien, T. E., Tarasinski, B. & Beenakker, C. W. J. Machine-learning-assisted correction of correlated qubit errors in a topological code. Quantum 2, 48 (2018).
Breuckmann, N. P. & Ni, X. Scalable neural network decoders for higher dimensional quantum codes. Quantum 2, 68–92 (2018).
Chamberland, C. & Ronagh, P. Deep neural decoders for near term fault-tolerant experiments. Quantum Sci. Technol. 3, 044002 (2018).
Fösel, T., Tighineanu, P., Weiss, T. & Marquardt, F. Reinforcement learning with neural networks for quantum feedback. Phys. Rev. X 8, 031084 (2018).
Poulsen Nautrup, H., Delfosse, N., Dunjko, V., Briegel, H. J. & Friis, N. Optimizing quantum error correction codes with reinforcement learning. Quantum 3, 215 (2019).
Yu, S. et al. Reconstruction of a photonic qubit state with reinforcement learning. Adv. Quantum Technol. 2, 1800074 (2019).
Krenn, M., Malik, M., Fickler, R., Lapkiewicz, R. & Zeilinger, A. Automated search for new quantum experiments. Phys. Rev. Lett. 116, 090405 (2016).
Melnikov, A. A. et al. Active learning machine learns to create new quantum experiments. Proc. Natl Acad. Sci. USA 115, 1221–1226 (2018).
Dunjko, V., Friis, N. & Briegel, H. J. Quantum-enhanced deliberation of learning agents using trapped ions. New J. Phys. 17, 023006 (2015).
Jerbi, S., Poulsen Nautrup, H., Trenkwalder, L. M., Briegel, H. J. & Dunjko, V. A framework for deep energy-based reinforcement learning with quantum speed-up. Preprint at https://arxiv.org/abs/1910.12760 (2019).
Kimble, H. J. The quantum internet. Nature 453, 1023–1030 (2008).
Cacciapuoti, A. S. et al. Quantum internet: networking challenges in distributed quantum computing. IEEE Netw. 34, 137–143 (2020).
Briegel, H. J. & De las Cuevas, G. Projective simulation for artificial intelligence. Sci. Rep. 2, 400 (2012).
Grover, L. K. Quantum mechanics helps in searching for a needle in a haystack. Phys. Rev. Lett. 79, 325–328 (1997).
Nielsen, M. A. & Chuang, I. L. Quantum Computation and Quantum Information (Cambridge Univ. Press, 2000).
Flamini, F. et al. Photonic architecture for reinforcement learning. New J. Phys. 22, 045002 (2020).
Harris, N. C. et al. Quantum transport simulations in a programmable nanophotonic processor. Nat. Photon. 11, 447–452 (2017).
Boyer, M., Brassard, G., Høyer, P. & Tapp, A. Tight bounds on quantum searching. Fortschr. Phys. 46, 493–505 (1998).
Senellart, P., Solomon, G. & White, A. High-performance semiconductor quantum-dot single-photon sources. Nat. Nanotechnol. 12, 1026–1039 (2017).
Wan, N. H. et al. Large-scale integration of artificial atoms in hybrid photonic circuits. Nature 583, 226–231 (2020).
Northup, T. E. & Blatt, R. Quantum information transfer using photons. Nat. Photon. 8, 356–363 (2014).
Denil, M. et al. Learning to perform physics experiments via deep reinforcement learning. Proc. Int. Conf. on Learning Representations (2017).
Bukov, M. et al. Reinforcement learning in different phases of quantum control. Phys. Rev. X 8, 031086 (2018).
Poulsen Nautrup, H. et al. Operationally meaningful representations of physical systems in neural networks. Preprint at https://arxiv.org/abs/2001.00593 (2020).
Yoder, T. J., Low, G. H. & Chuang, I. L. Fixed-point quantum search with an optimal number of queries. Phys. Rev. Lett. 113, 210501 (2014).
Kim, T., Fiorentino, M. & Wong, F. N. C. Phase-stable source of polarization-entangled photons using a polarization Sagnac interferometer. Phys. Rev. A 73, 012316 (2006).
Saggio, V. et al. Experimental few-copy multipartite entanglement detection. Nat. Phys. 15, 935–940 (2019).
Marsili, F. et al. Detecting single infrared photons with 93% system efficiency. Nat. Photon. 7, 210–214 (2013).

Acknowledgements

We thank L. A. Rozema, I. Alonso Calafell and P. Jenke for help with the detectors. A.H. acknowledges support from the Austrian Science Fund (FWF) through the project P 30937-N27. V.D. acknowledges support from the Dutch Research Council (NWO/OCW), as part of the Quantum Software Consortium programme (project number 024.003.037). N.F. acknowledges support from the Austrian Science Fund (FWF) through the project P 31339-N27. H.J.B. acknowledges support from the Austrian Science Fund (FWF) through SFB BeyondC F7102, the Ministerium für Wissenschaft, Forschung, und Kunst Baden-Württemberg (Az. 33-7533-30-10/41/1) and the Volkswagen Foundation (Az. 97721). P.W. acknowledges support from the research platform TURIS, the European Commission through ErBeStA (no. 800942), HiPhoP (no. 731473), UNIQORN (no. 820474), EPIQUS (no. 899368), and AppQInfo (no. 956071), from the Austrian Science Fund (FWF) through CoQuS (W1210-N25), BeyondC (F 7113) and Research Group (FG 5), and Red Bull GmbH. The MIT portion of the work was supported in part by AFOSR award FA9550-16-1-0391 and NTT Research.

Author information

Authors and Affiliations

University of Vienna, Faculty of Physics, Vienna Center for Quantum Science and Technology (VCQ), Vienna, Austria
V. Saggio, B. E. Asenbeck, T. Strömberg, P. Schiansky & P. Walther
Institut für Theoretische Physik, Universität Innsbruck, Innsbruck, Austria
A. Hamann, S. Wölk & H. J. Briegel
LIACS, Leiden University, Leiden, The Netherlands
V. Dunjko
Institute for Quantum Optics and Quantum Information - IQOQI Vienna, Austrian Academy of Sciences, Vienna, Austria
N. Friis
Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, USA
N. C. Harris & D. Englund
Nokia Corporation, New York, NY, USA
M. Hochberg
Deutsches Zentrum für Luft- und Raumfahrt e.V. (DLR), Institut für Quantentechnologien, Ulm, Germany
S. Wölk
Fachbereich Philosophie, Universität Konstanz, Konstanz, Germany
H. J. Briegel
Christian Doppler Laboratory for Photonic Quantum Computer, Faculty of Physics, University of Vienna, Vienna, Austria
P. Walther

Contributions

V.S. and B.E.A. implemented the experiment and performed data analysis. A.H., V.D., N.F., S.W. and H.J.B. developed the theoretical idea. T.S. and P.S. provided help with the experimental implementation. N.C.H., M.H. and D.E. designed the nanophotonic processor. V.S., S.W. and P.W. supervised the project. All the authors contributed to writing the paper.

Corresponding authors

Correspondence to V. Saggio or P. Walther.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Vojtěch Havlíček, Lucas Lamata and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Saggio, V., Asenbeck, B.E., Hamann, A. et al. Experimental quantum speed-up in reinforcement learning agents. Nature 591, 229–233 (2021). https://doi.org/10.1038/s41586-021-03242-7

Received: 12 August 2020
Accepted: 15 January 2021
Published: 10 March 2021
Version of record: 10 March 2021
Issue date: 11 March 2021
DOI: https://doi.org/10.1038/s41586-021-03242-7
