Abstract
As the field of artificial intelligence advances, the demand for algorithms that can learn quickly and efficiently increases. An important paradigm within artificial intelligence is reinforcement learning [1], where decision-making entities called agents interact with environments and learn by updating their behaviour on the basis of the obtained feedback. The crucial question for practical applications is how fast agents learn [2]. Although various studies have made use of quantum mechanics to speed up the agent’s decision-making process [3,4], a reduction in learning time has not yet been demonstrated. Here we present a reinforcement learning experiment in which the learning process of an agent is sped up by using a quantum communication channel with the environment. We further show that combining this scenario with classical communication enables the evaluation of this improvement and allows optimal control of the learning progress. We implement this learning protocol on a compact and fully tunable integrated nanophotonic processor. The device interfaces with telecommunication-wavelength photons and features a fast active-feedback mechanism, demonstrating the agent’s systematic quantum advantage in a setup that could readily be integrated within future large-scale quantum communication networks.
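As a rough illustration of why a quantum exchange with the environment could shorten learning, the sketch below compares the chance of finding a rewarded action in one classical trial with the chance after a Grover-type amplitude amplification step. This is an assumption made for illustration, based on the textbook result for Grover's search cited in the reference list, and is not a description of the protocol used in the experiment. The function names and the value of `eps` are hypothetical.

```python
import math


def amplified_probability(eps: float, iterations: int = 1) -> float:
    """Success probability after `iterations` Grover-type amplification steps.

    Textbook result: if a rewarded action is found with probability eps,
    then after k steps the probability is sin^2((2k + 1) * theta),
    where theta = arcsin(sqrt(eps)).
    """
    theta = math.asin(math.sqrt(eps))
    return math.sin((2 * iterations + 1) * theta) ** 2


def expected_trials_to_first_reward(p: float) -> float:
    """Mean number of independent trials until the first reward (geometric law)."""
    return 1.0 / p


# Hypothetical initial probability of choosing a rewarded action.
eps = 0.01
p_classical = eps
p_quantum = amplified_probability(eps, iterations=1)

print(f"classical: p = {p_classical:.4f}, "
      f"expected trials = {expected_trials_to_first_reward(p_classical):.1f}")
print(f"amplified: p = {p_quantum:.4f}, "
      f"expected trials = {expected_trials_to_first_reward(p_quantum):.1f}")
```

For small `eps`, a single amplification step raises the success probability by a factor of roughly nine, so an agent that updates its behaviour only after receiving a reward would, under these assumptions, reach its first update in fewer trials.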
Data availability
All the datasets used in the current work are available on Zenodo at https://doi.org/10.5281/zenodo.4327211.
References
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 1998).
Dunjko, V., Taylor, J. M. & Briegel, H. J. Quantum-enhanced machine learning. Phys. Rev. Lett. 117, 130501 (2016).
Paparo, G. D., Dunjko, V., Makmal, A., Martin-Delgado, M. A. & Briegel, H. J. Quantum speedup for active learning agents. Phys. Rev. X 4, 031002 (2014).
Sriarunothai, T. et al. Speeding-up the decision making of a learning agent using an ion trap quantum processor. Quantum Sci. Technol. 4, 015014 (2019).
Johannink, T. et al. Residual reinforcement learning for robot control. In 2019 International Conference on Robotics and Automation (ICRA) 6023–6029 (IEEE, 2019).
Tjandra, A., Sakti, S. & Nakamura, S. Sequence-to-sequence ASR optimization via reinforcement learning. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 5829–5833 (IEEE, 2018).
Komorowski, M., Celi, L. A., Badawi, O., Gordon, A. C. & Faisal, A. A. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nat. Med. 24, 1716–1720 (2018).
Thakur, C. S. et al. Large-scale neuromorphic spiking array processors: a quest to mimic the brain. Front. Neurosci. 12, 891 (2018).
Steinbrecher, G. R., Olson, J. P., Englund, D. & Carolan, J. Quantum optical neural networks. npj Quantum Inf. 5, 60 (2019).
Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).
Arute, F. et al. Quantum supremacy using a programmable superconducting processor. Nature 574, 505–510 (2019).
Dong, D., Chen, C., Li, H. & Tarn, T.-J. Quantum reinforcement learning. IEEE Trans. Syst. Man Cybern. B 38, 1207–1220 (2008).
Dunjko, V. & Briegel, H. J. Machine learning & artificial intelligence in the quantum domain: a review of recent progress. Rep. Prog. Phys. 81, 074001 (2018).
Baireuther, P., O’Brien, T. E., Tarasinski, B. & Beenakker, C. W. J. Machine-learning-assisted correction of correlated qubit errors in a topological code. Quantum 2, 48 (2018).
Breuckmann, N. P. & Ni, X. Scalable neural network decoders for higher dimensional quantum codes. Quantum 2, 68–92 (2018).
Chamberland, C. & Ronagh, P. Deep neural decoders for near term fault-tolerant experiments. Quantum Sci. Technol. 3, 044002 (2018).
Fösel, T., Tighineanu, P., Weiss, T. & Marquardt, F. Reinforcement learning with neural networks for quantum feedback. Phys. Rev. X 8, 031084 (2018).
Poulsen Nautrup, H., Delfosse, N., Dunjko, V., Briegel, H. J. & Friis, N. Optimizing quantum error correction codes with reinforcement learning. Quantum 3, 215 (2019).
Yu, S. et al. Reconstruction of a photonic qubit state with reinforcement learning. Adv. Quantum Technol. 2, 1800074 (2019).
Krenn, M., Malik, M., Fickler, R., Lapkiewicz, R. & Zeilinger, A. Automated search for new quantum experiments. Phys. Rev. Lett. 116, 090405 (2016).
Melnikov, A. A. et al. Active learning machine learns to create new quantum experiments. Proc. Natl Acad. Sci. USA 115, 1221–1226 (2018).
Dunjko, V., Friis, N. & Briegel, H. J. Quantum-enhanced deliberation of learning agents using trapped ions. New J. Phys. 17, 023006 (2015).
Jerbi, S., Poulsen Nautrup, H., Trenkwalder, L. M., Briegel, H. J. & Dunjko, V. A framework for deep energy-based reinforcement learning with quantum speed-up. Preprint at https://arxiv.org/abs/1910.12760 (2019).
Kimble, H. J. The quantum internet. Nature 453, 1023–1030 (2008).
Cacciapuoti, A. S. et al. Quantum internet: networking challenges in distributed quantum computing. IEEE Netw. 34, 137–143 (2020).
Briegel, H. J. & De las Cuevas, G. Projective simulation for artificial intelligence. Sci. Rep. 2, 400 (2012).
Grover, L. K. Quantum mechanics helps in searching for a needle in a haystack. Phys. Rev. Lett. 79, 325–328 (1997).
Nielsen, M. A. & Chuang, I. L. Quantum Computation and Quantum Information (Cambridge Univ. Press, 2000).
Flamini, F. et al. Photonic architecture for reinforcement learning. New J. Phys. 22, 045002 (2020).
Harris, N. C. et al. Quantum transport simulations in a programmable nanophotonic processor. Nat. Photon. 11, 447–452 (2017).
Boyer, M., Brassard, G., Høyer, P. & Tapp, A. Tight bounds on quantum searching. Fortschr. Phys. 46, 493–505 (1998).
Senellart, P., Solomon, G. & White, A. High-performance semiconductor quantum-dot single-photon sources. Nat. Nanotechnol. 12, 1026–1039 (2017).
Wan, N. H. et al. Large-scale integration of artificial atoms in hybrid photonic circuits. Nature 583, 226–231 (2020).
Northup, T. E. & Blatt, R. Quantum information transfer using photons. Nat. Photon. 8, 356–363 (2014).
Denil, M. et al. Learning to perform physics experiments via deep reinforcement learning. Proc. Int. Conf. on Learning Representations (2017).
Bukov, M. et al. Reinforcement learning in different phases of quantum control. Phys. Rev. X 8, 031086 (2018).
Poulsen Nautrup, H. et al. Operationally meaningful representations of physical systems in neural networks. Preprint at https://arxiv.org/abs/2001.00593 (2020).
Yoder, T. J., Low, G. H. & Chuang, I. L. Fixed-point quantum search with an optimal number of queries. Phys. Rev. Lett. 113, 210501 (2014).
Kim, T., Fiorentino, M. & Wong, F. N. C. Phase-stable source of polarization-entangled photons using a polarization Sagnac interferometer. Phys. Rev. A 73, 012316 (2006).
Saggio, V. et al. Experimental few-copy multipartite entanglement detection. Nat. Phys. 15, 935–940 (2019).
Marsili, F. et al. Detecting single infrared photons with 93% system efficiency. Nat. Photon. 7, 210–214 (2013).
Acknowledgements
We thank L. A. Rozema, I. Alonso Calafell and P. Jenke for help with the detectors. A.H. acknowledges support from the Austrian Science Fund (FWF) through the project P 30937-N27. V.D. acknowledges support from the Dutch Research Council (NWO/OCW), as part of the Quantum Software Consortium programme (project number 024.003.037). N.F. acknowledges support from the Austrian Science Fund (FWF) through the project P 31339-N27. H.J.B. acknowledges support from the Austrian Science Fund (FWF) through SFB BeyondC F7102, the Ministerium für Wissenschaft, Forschung, und Kunst Baden-Württemberg (Az. 33-7533-30-10/41/1) and the Volkswagen Foundation (Az. 97721). P.W. acknowledges support from the research platform TURIS, the European Commission through ErBeStA (no. 800942), HiPhoP (no. 731473), UNIQORN (no. 820474), EPIQUS (no. 899368), and AppQInfo (no. 956071), from the Austrian Science Fund (FWF) through CoQuS (W1210-N25), BeyondC (F 7113) and Research Group (FG 5), and Red Bull GmbH. The MIT portion of the work was supported in part by AFOSR award FA9550-16-1-0391 and NTT Research.
Author information
Contributions
V.S. and B.E.A. implemented the experiment and performed data analysis. A.H., V.D., N.F., S.W. and H.J.B. developed the theoretical idea. T.S. and P.S. provided help with the experimental implementation. N.C.H., M.H. and D.E. designed the nanophotonic processor. V.S., S.W. and P.W. supervised the project. All the authors contributed to writing the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature thanks Vojtěch Havlíček, Lucas Lamata and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Saggio, V., Asenbeck, B.E., Hamann, A. et al. Experimental quantum speed-up in reinforcement learning agents. Nature 591, 229–233 (2021). https://doi.org/10.1038/s41586-021-03242-7
DOI: https://doi.org/10.1038/s41586-021-03242-7