Introduction

Solving the Schrödinger equation for quantum many-body systems is at the core of many computational algorithms in fields such as condensed matter physics, quantum chemistry, and high-energy physics. A quantum advantage for this task would have far-reaching consequences for natural sciences. Among approaches to using quantum computers for eigenstate calculations, two have been the primary objects of discussion to date: quantum phase estimation (QPE)1,2, including its recent advancements (e.g., refs. 3,4,5), and the variational quantum eigensolver (VQE)6. Experimental implementations on pre-fault-tolerant devices have focused on VQE, which has been demonstrated on various experimental platforms for a wide range of problems (e.g., refs. 6,7,8,9). However, the bottleneck of parametric optimization has so far prevented its scaling beyond small instances. QPE, on the other hand, possesses theoretical precision guarantees, but quantum error correction will be necessary to reach the circuit depths required for problems of value, although small examples have been implemented10,11,12.

These results leave a gap in methods for eigenstate estimation between the small demonstrations that have been executed so far and large-scale, high-accuracy simulations using QPE or related methods on fault-tolerant quantum computers. In this work, we demonstrate that Krylov quantum diagonalization (KQD)13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28, a type of quantum subspace diagonalization13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42, can fill the gap for more general problems.

The main idea in KQD is to use the quantum computer to approximate the projection of the Hamiltonian into a Krylov space spanned by various time evolutions of an initial reference state. The resulting low-dimensional matrix is then classically diagonalized to obtain approximate low-lying energy eigenstates13. This method shares the property of variationality with VQE (up to effects of noise), but does not require an iterative parameter optimization, instead relying on a single round of circuit executions followed by classical post-processing. Furthermore, as long as the noise can be quantified, the accuracy of the method can be bounded theoretically27,28, as in QPE, meaning that KQD can continue to be valuable through the transition into the fault-tolerant era. In the near term, time evolutions for simulations with less stringent accuracy requirements are not prohibitive for existing quantum computers. While it may be challenging to quantify the impacts of all sources of error in this context, KQD still functions as a heuristic with an underlying exponential convergence towards a noisy estimate of the ground state energy.

In this work, we use KQD to estimate the ground-state energy of the Heisenberg model on a heavy-hexagonal lattice. We show that although noise poses a significant obstacle to high accuracy, even with advanced error mitigation43,44, we can obtain convergence to the ground-state energy on up to 56 qubits.

Results

Theory of Krylov quantum diagonalization

KQD consists of two main steps. The first is a quantum subroutine to construct the matrices

$${\tilde{H}}_{jk}=\langle {\psi }_{j}| H| {\psi }_{k}\rangle,\qquad {\tilde{S}}_{jk}=\langle {\psi }_{j}| {\psi }_{k}\rangle,$$
(1)

which corresponds to the projection of the Hamiltonian into and the overlap (Gram) matrix of a subspace \({\mathcal{K}}={\rm{Span}}\{\left\vert {\psi }_{0}\right\rangle,\ldots,\left\vert {\psi }_{D-1}\right\rangle \}\). The second step is to classically solve the time-independent Schrödinger equation projected into the subspace, which is given by

$$\tilde{H}c=E\tilde{S}c,$$
(2)

where c is a coordinate vector in the Krylov space. The approximate ground-state energy, within the entire Hilbert space or a symmetry sector, is obtained as the lowest eigenvalue of (2). Two distinct components affect the accuracy of the approximation27,28: the intrinsic error of projecting the full eigenvalue problem down into the subspace, which is related to the overlap of sufficiently low-energy states with the subspace, and any additional algorithmic, statistical, and hardware errors.

Subspace diagonalization methods differ primarily in the choice of subspace. In classical computing, one of the common approaches is to construct the subspace by generating correlation via local operators such as the hopping terms for fermions, as in multi-reference configuration interaction45. Alternatively, one can use global operators. For instance, the classical Lanczos method employs the power series of the Hamiltonian to construct the subspace as \({{\mathcal{K}}}_{P}={\rm{Span}}\{{H}^{\,j}\left\vert {\psi }_{0}\right\rangle \}\), which is also referred to as the power or polynomial Krylov space. The main advantage of such a construction is that the accuracy of the solution improves exponentially with the subspace size D46,47,48. The limiting factor in classical Lanczos and related methods is that they inevitably suffer from memory consumption that grows exponentially with the system size, owing to the need to represent entangled quantum states.

While various adaptations of this scheme to quantum computers have been proposed13,14,15,16,17,18,21,22,23,24,25,26,27,28,29,30,31,32,35,36,37,38,39,40,41,42, the most appropriate for near-term quantum computers is to use real-time evolutions as the global operators to generate the Krylov space:

$${{\mathcal{K}}}_{U}={\rm{Span}}\{{U}^{\, j}\left\vert {\psi }_{0}\right\rangle \},\quad j=0,1,\ldots,D-1,$$
(3)

where U = eiHdt is the time evolution operator for some timestep dt13,14,15,16,17,18,21,22,23,24,25,26,27,28. The advantage of this is two-fold: first, time evolutions can be approximated by circuits of short enough depth to be implemented on existing quantum devices. Second, one can show that even in the presence of noise, the error due to projection into this unitary Krylov space converges exponentially quickly with the Krylov dimension, just as in classical Krylov algorithms. The noise simply contributes an additional error term as long as it is not so large that it completely overwhelms the signal27,28. This means that it is possible to reach convergence of the approximate ground-state energy with a Krylov space of limited dimension.

While evaluation of the Krylov matrices on the quantum computer resolves the issue of memory, which is the main obstacle to scaling on the classical side, the main obstacle on the quantum side is noise. Two major contributions are statistical noise due to finite shot sampling and hardware noise in the device. Algorithmic error from the approximation of time evolutions also enters, but below we show numerically that its effects are below the level of the hardware errors. On the other hand, suppressing and mitigating those hardware errors proves to be crucial in order to scale the size of the simulation: we apply experimental techniques for this purpose (see Supplementary Note 4 for details) as well as keeping the quantum circuit as shallow as possible while maintaining global coupling structure of the Krylov space.

To simplify our circuits, we exploit the U(1) symmetry possessed by many condensed matter models, including the Heisenberg model we focus on. As a qubit operator, U(1) symmetry can be expressed as conservation of Hamming weight; in terms of spin-1/2 operators, it corresponds to conservation of the z component of total spin. Equivalently, we can think of the symmetry subspaces as k-particle subspaces, treating () spins as absence (presence) of a particle.

Figure 1 shows a sequence of circuits that could, in principle, be used to calculate the matrix elements (1). Panel (a) shows the standard Hadamard test, which would be the default tool for such a calculation. Panel (b) illustrates how we use spin conservation to avoid implementing the controlled time evolutions present in the conventional Hadamard test: instead, we implement controlled initializations of the reference state \(\left\vert {\psi }_{0}\right\rangle\), and then rely on the fact that the time evolutions preserve the “vacuum state” \(\left\vert 00\ldots 0\right\rangle\) up to a classically calculable phase.

Fig. 1: Schematic of Krylov quantum diagonalization.
figure 1

a Hadamard circuit for computing matrix elements of the form 〈ψiPψj〉, which relies on controlled unitary implementation of Krylov basis states. b Simplification of the circuit by exploiting a symmetry such as particle-number conservation. c The construction employed in this work. Only one time evolution circuit is required, and the second controlled preparation circuit is absorbed into the basis of the measurement. d Classical postprocessing to construct matrices \(\tilde{H}\) and \(\tilde{S}\), which yield a generalized eigenvalue problem. The matrices are Hermitian for the circuits shown in a, b, and Toeplitz Hermitian for c. Note that the diagonal elements, enclosed by black lines, can be computed classically.

As a second simplification, we note that for the exact time evolutions, \(\langle {\psi }_{0}| {U}_{j}^{\dagger }H{U}_{k}| {\psi }_{0}\rangle=\langle {\psi }_{0}| H{U}_{k-j}| {\psi }_{0}\rangle\), which gives us two formally equivalent ways to measure the same matrix element, with the second yielding a simpler circuit since it only involves one time evolution. However, once the time evolutions are approximated by Trotterization, these two expressions are no longer equal. In Fig. 1c, we show the circuit corresponding to the latter version.

It is not a priori clear whether one should prefer the circuits shown in panels b or c in Fig. 1, purely from a Trotter error perspective. One advantage of Fig. 1b is that it still corresponds to variational optimization in a subspace, since each matrix element still has the form (1). However, even this ceases to be true in the presence of finite sample and device noise28. Figure 1c, the version in which Toeplitz structure is explicitly enforced, is preferable from the perspective of circuit depth for two reasons: it only requires one time evolution, and as a result, the second controlled initialization can be applied as a Clifford transformation to the Pauli observables in the Hamiltonian rather than explicitly implemented in the circuit. In practice, we do not see dramatic violations of variationality with this method, thanks to the regularization technique used to avoid ill-conditioning of the eigenvalue problem (2) (see Supplementary Note 5 for details). As an example, we compare exact classical simulation results for circuits in Fig. 1b, c, which are to be compiled in experiments as shown in Fig. 2 (see the next section for details). Energy curves for a 20-qubit system shown in Fig. 3 indicate that the variationality restores quickly with the number of Trotter steps. These findings motivated using the version of the circuits shown in Fig. 1c.

Fig. 2: Quantum circuits for Krylov quantum diagonalization.
figure 2

a Each circuit performs the controlled preparation of an initial state within the target particle sector, followed by a Trotterized time evolution. b The controlled preparation prepares a computational basis state in which the Hamming weight corresponds to the number of particles for the given experiment, controlled on the auxiliary qubit. Since the heavy-hex lattice can be edgewise three-colored (colors given in the figure by red, green, and blue), both the controlled preparation and the Trotterized time evolution can be implemented using sequences of three unique two-qubit gate layers interleaved with single-qubit rotations. See the main text for details. c Each layer of two-qubit gates is Pauli twirled in order to tailor the noise to a sparse Pauli-Lindblad noise model Λ43,44, preceded by its amplification ΛG for PEA. Note that adjacent layers of single-qubit gates, originating from either the source circuit, the twirling, or the noise amplification layer, are always combined in a single layer; they are left unmerged in the figure for clarity. d (12 + 1)-qubit example of the CZ layers.

Fig. 3: Numerical investigations of algorithmic errors.
figure 3

a (20 + 1)-qubit layout of the Heisenberg model used for numerical simulations, with the green and red circles indicating the control and excited qubits. b Energy versus Krylov space dimension. The dotted and solid lines indicate the results from the circuits in Fig. 1b, c, respectively. c Heat map of the ground-state energy error ΔE for k = 5-particle sector with various dt and D, using 4 second-order Trotter steps. The white arrow indicates the value of π/||H||.

Large-scale experimental demonstrations

For our experiments, we studied the spin-1/2 antiferromagnetic Heisenberg model, which is defined for a set of edges E as

$$H=\sum _{(i,j)\in E}{J}_{ij}({X}_{i}{X}_{j}+{Y}_{i}{Y}_{j}+{Z}_{i}{Z}_{j})$$
(4)

with uniform couplings Jij = 1, where XiYiZi denote the Pauli matrices on the ith site. The set of interactions E is a subset of the heavy-hex lattice (see Fig. 4). Note that, while the heavy-hex lattice is bipartite and hence the ground state in the entire Hilbert space can be simulated efficiently using the path-integral Monte Carlo method49, the sign problem is present for excited states in general. Among the excited states, we focus on the lowest-energy eigenstates within several k-particle subspaces. The dimension of the k-particle subspace scales as O(Nk). Note that the circuit construction relies on the U(1) symmetry but not on SU(2) symmetry, and hence our method is directly applicable to XXZ model as well.

Fig. 4: Experimental diagonalization of many-body Hamiltonians.
figure 4

a The energy per site of Heisenberg model for particle numbers k = 1, 3, and 5 in system sizes of N = 56, 44, and 42, respectively. The error bars indicate standard deviations estimated by bootstrapping. The dashed curves indicate the energies from noiseless classical simulations, and solid black lines show the exact lowest energy in the given k-particle subspace. bd Qubit layout graphs. The green and red circles indicate the control and initial locations of particles, respectively. eg Energy curves for individual particle numbers k = 1, 3, and 5. hj Error matrices \(\Delta \tilde{H}/N :=| {\tilde{H}}_{\exp }-{\tilde{H}}_{{\rm{num}}}| /N\) and \(\Delta \tilde{S} :=| {\tilde{S}}_{\exp }-{\tilde{S}}_{{\rm{num}}}|\), where the subscripts “exp" and “num" denote data from experiments and numerical calculations, respectively.

We ran experiments in three different k-particle sectors: k = 1, 3, 5. The initial states in all three cases were computational basis states with numbers of \(\left\vert 1\right\rangle\) s given by k: for example, in the single-particle case, \(\left\vert {\psi }_{0}\right\rangle=\left\vert 00\ldots 1\ldots 0\right\rangle\). The circuit implementations for the different values of k therefore differ in the controlled preparation (see Figs. 1 and 2). The k = 1 case corresponds to generating only one particle in the initial state, which can easily be implemented with a CX gate between the control qubit (the ancilla in the Hadamard test) and an adjacent qubit. For k > 1, we chose locations for the particles that were distributed approximately uniformly over the qubit graph.

The heavy-hex lattice permits a three-coloring of its edges, in which each color corresponds to a layer of two-qubit gates that can be implemented simultaneously (see Fig. 2). Since each distinct two-qubit layer requires its own noise learning for probabilistic error amplification (PEA—see below), it is advantageous to minimize the number of distinct layers in the circuits. The controlled preparation circuits can be implemented using a set of two-qubit layers corresponding to the three-coloring of edges in the heavy-hex, with only a constant overhead compared to arbitrary layers (see Supplementary Note 2 for details). For our Trotterized time evolutions, we partitioned the Hamiltonian terms into the same set of layers. Therefore, we only had to learn the noise models of three unique layers in total for each experiment.

The depth of the controlled-initialization part of the circuit is proportional to the distance between the two furthest apart initial particles in the qubit graph. We used two second-order Trotter steps to approximate the time evolutions in all of our experiments. r second-order Trotter steps with three commuting groups of Hamiltonian terms require 4r +1 two-qubit layers (see panel b in Fig. 2), yielding 9 layers in our case for the time evolution part of the circuit.

To measure the observables corresponding to real or imaginary parts of the matrix elements in \(\tilde{H}\) and \(\tilde{S}\), we partitioned the observables into as few locally-commuting sets (measurement bases) as possible, since such sets are co-measurable7. The shortened circuits, as in the third row of Fig. 1, require conjugating the Hamiltonian terms (4) by the second controlled-initialization circuit, since it is not physically implemented. This yields the same number of Pauli observables since the controlled-initialization is a Clifford circuit, and one can prove that these observables can be partitioned into 2(k + 2) measurement bases; see Supplementary Note 2.

We performed experiments on the Heron R1 processor IBM_montecarlo. This is a 133-qubit device with fixed-frequency transmon qubits connected to each other via tunable couplers. Heron processors have faster two-qubit gates (similar in duration to the single-qubit gates) and lower cross-talk compared to the fixed-coupling devices of earlier generations. To further improve the measured observables (see Fig. 1), we used probabilistic error amplification (PEA)44 and twirled readout error extinction (TREX)50, which mitigates SPAM errors, to approximate noise-free expectation values. We additionally employed error suppression, in particular Pauli twirling and dynamical decoupling. Details of the error mitigation and suppression are given in Supplementary Note 4.

In our experiments, for each measurement basis, a certain number of twirled instances were generated, and each instance was then repeatedly measured for different values of the noise amplification factor. For the single-particle (k = 1) experiment, we used 300 twirled instances with 500 shots each, at noise amplification factors of 1, 1.5, 3. For k = 3, 5, we used 100 twirled instances with 500 shots each, at noise factors 1, 1.3, 1.6. The reduction in twirled instances for the larger experiments was introduced in order to reduce the total runtime, since the number of measurement bases as well as the circuit sizes increase with k. The adjustment of the noise amplification factors was due to the increased noise rates in the deeper circuits. The controlled-initialization part of the circuit involves creating a maximally entangled state of the control qubit and the initial particle locations. With an increase in the number of particles, this translates to a larger maximally entangled state prepared at the beginning of the circuit, which in turn makes the results more susceptible to noise.

The size of the Krylov space was fixed to D = 10 across all experiments in order to achieve a total runtime of the algorithm within the timescale of device recalibration processes (24 h). For a fixed value of k the experiment was run on a specific qubit subset, chosen according to the current status of the device by using a heuristic routine for optimal qubit mapping51. The k = 1 experiment was executed on a 57-qubit subset, the k = 3 experiment on a 45-qubit subset, and the k = 5 experiment on a 43-qubit subset (the layouts are shown in Fig. 4). The latter two were partially chosen by hand in order to have five complete heavy hexes in each case.

Although the time step dt theoretically has an optimal value of π/||H||27,28, the restriction to low-particle-number subspaces alters this. Consequently, we chose the time steps heuristically, with values 0.5, 0.022, and 0.1 for k = 1, 3, and 5, respectively.

Results are shown in Fig. 4. Panel a summarizes the results on a normalized energy scale, while e, f, and g show the convergence curves for each separate experiment. The corresponding qubit graphs are shown in panels b, c, and d, respectively. These convergence curves are a useful diagnostic tool for assessing the results of noisy KQD experiments. We know from the theoretical analysis that if error rates are low enough to resolve the signal, i.e., to distinguish the lowest energy state in the Krylov space from pure noise, then we should see an exponential decay of the energy towards a value offset from the true ground-state energy by a constant depending on the error rate27,28. Our results show this behavior up to some fluctuations, which is expected since the theoretical results only provide for an exponentially-decaying upper bound. If the noise had completely dominated the signal, however, the rate of convergence with subspace dimension would have been exponentially slow with respect to system size, and we would not have seen the initial fast convergence in Fig. 4. See Supplementary Notes 4 and 5 for further details.

In our experimental results, noise and algorithmic error (due to the Trotter approximation as well as the limited Krylov dimension) are still significant limiting factors, as evidenced by the differences between the most accurate estimated energies (at D = 10) and the true values. We estimated standard deviations for our experimental energies using bootstrapping, since the post-processing of solving the regularized, generalized eigenvalue problem (2) makes direct error propagation difficult. This yielded the error bars in Fig. 4; for further details, see Supplementary Note 5. Figure 4 also shows the energy convergence curves for ideal classical simulations of our circuits, which are tractable by representing vectors and operators only in the restricted particle-number subspaces. While the error bars are large due to the noisy experimental results, our estimated energies for the two larger values of k are consistent with the ideal simulation curves up to these standard deviations at nearly all points.

In the k = 1 experiment, the results deviate below the true lowest energy, indicating that noise has created an effective leakage out of the k = 1 subspace. This illustrates a risk of relying on symmetry conservation to remain in a particular subspace, although studying the global ground state would not be subject to this concern.

Exact diagonalization can also be carried out in the sectors of the Hilbert space studied in the present experiments, though not in the full Hilbert space. However, the experiments did not depend on those particular particle number sectors in any way except for the reduced circuit depth of the controlled initialization, so there are not qualitative or structural obstacles to scaling, only effects of noise. In the specific case we focused on—the ground states of the Heisenberg model on a 2D heavy-hexagonal lattice—it is also still possible to compute precise approximations using classical techniques such as tensor networks.

One may ask why KQD was employed rather than one of the various other algorithms that have been recently developed for ground-state energy estimation in a near-term or early-fault-tolerant setting, e.g.,3,4,5,52. One primary reason, in addition to KQD’s relatively well-understood noise tolerance21,25,27,28, those alternative methods all extract eigenenergies from the time evolution rather than directly from a projection of the Hamiltonian itself. This is a problem in a setting such as ours where the Trotter circuit is held fixed as the number of timesteps is increased (which was necessary to minimize circuit depth), because the spectra of the Trotter circuits diverges from the spectra of the ideal time evolutions, indeed becoming periodic with a period depending on the timestep and the fixed Trotter circuit. Hence, with this constraint algorithms depending only on the evolution will at some point cease to converge as the number of timesteps is increased.

Discussion

The KQD approach presented here enriches the landscape of quantum algorithms for ground state estimation on pre-fault-tolerant quantum processors, filling the gap between VQE and QPE. A complementary subspace algorithm called sample-based quantum diagonalization (SQD), based on sampling and sophisticated classical post-processing using a quantum-centric supercomputing (QCSC) architecture53, was recently used to demonstrate quantum simulations of chemistry beyond brute-force solutions. This QCSC method yields classically verifiable energies and does not require approximating time evolution, which makes it tractable in the near term for Hamiltonians containing large numbers of terms, such as molecular Hamiltonians. For condensed matter applications, KQD has provable convergence guarantees given an initial reference state with inverse polynomial overlap, and its circuits are feasible on pre-fault-tolerant processors as demonstrated in this work.

To compare the scale of this work to prior experimental quantum simulations of ground state energies, one can either compare based on qubit count or based on used Hilbert space dimension. Both are relevant because, for example, even though the subspace for our single-particle experiment was only 56-dimensional, it accrued the errors of a 57-qubit quantum circuit. Our largest subspace dimension was for the 5-particle experiment, whose subspace dimension was 850668, between the dimensions of the full Hilbert spaces of 19 and 20 qubits. We show the largest (to our knowledge) prior experimental demonstrations of end-to-end quantum algorithms for ground state energy simulation in Table 1, evaluated by both of the above metrics. Note that we only include experiments that implemented an entire algorithm, rather than, for example, optimizing parameters of a VQE classically and then only estimating a single energy on a quantum computer.

Table 1 Large-scale experimental demonstrations of quantum algorithms for ground state energy estimation, to date

As Table 1 shows, our experiment exceeds prior works by more than a factor of two in qubits and more than two orders of magnitude in Hilbert space dimension, with the exception of ref. 53, which has a strictly different domain of applicability as discussed in the previous paragraph. One may also note that the accuracies achieved by these experiments vary widely, but our focus was on achieving convergence at scale as discussed above, and the increase in scale places our work in a different regime from small-scale experiments (including some not shown in Table 1) that have achieved higher accuracy. In other words, this work represents a significant advance in the state of the art in quantum simulation of ground-state energies.