
Large-scale compressive microscopy via diffractive multiplexing across a sensor array

Kevin C. Zhou^{1,6,*}, Chaoying Gu^1, Muneki Ikeda^2, Tina M. Hayward^3, Nicholas Antipa^4, Rajesh Menon^3, Roarke Horstmeyer^5, Saul Kato^2, and Laura Waller^1

^1 Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA
^2 Department of Neurology, University of California, San Francisco, CA
^3 Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT
^4 Department of Electrical and Computer Engineering, University of California, San Diego, CA
^5 Department of Biomedical Engineering, Duke University, Durham, NC
^6 Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI
* kczhou@umich.edu, waller@berkeley.edu

Journal: Optica. Article type: Research Article.

Abstract

Microscopes face a trade-off between spatial resolution, field-of-view, and frame rate: improving one of these properties typically requires sacrificing the others, due to the limited spatiotemporal throughput of the sensor. To overcome this, we propose a new microscope that achieves snapshot gigapixel-scale imaging with a sensor array and a diffractive optical element (DOE). We improve the spatiotemporal throughput in two ways. First, we capture data with an array of 48 sensors, resulting in 48× more pixels than a single sensor. Second, we use point spread function (PSF) engineering and compressive sensing algorithms to fill in the missing information from the gaps surrounding the individual sensors in the array, further increasing the spatiotemporal throughput of the system by an additional >5.4×. The array of sensors is modeled as a single large-format "super-sensor," with erasures corresponding to the gaps between the individual sensors. The array is placed at the output of a (nearly) 4f imaging system, and we design a DOE for the Fourier plane that generates a distributed PSF that encodes information from the entire super-sensor area, including the gaps. We then computationally recover the large-scale image, assuming the object is sparse in some domain. Our calibration-free microscope can achieve ~3 µm resolution over >5.2 cm² FOVs at up to 120 fps, culminating in a total spatiotemporal throughput of 25.2 billion pixels per second. We demonstrate the versatility of our microscope in two different modes: structural imaging via darkfield contrast, and functional fluorescence imaging of calcium dynamics across dozens of freely moving C. elegans simultaneously.

1 Introduction

Optical engineers face tradeoffs that limit the spatiotemporal throughput of the microscopes they design. Strategies that expand the space-bandwidth product (SBP), the number of spatially resolvable points [1], typically sacrifice frame rate. For example, physically scanning the sample [2] or using scanning mirrors [3, 4] to build up a larger composite field-of-view (FOV) is time-consuming. Other techniques keep a constant FOV and improve resolution by capturing multiple images and computationally combining them; examples include Fourier ptychography [5, 6, 7], single-molecule localization microscopy [8], and structured illumination microscopy (SIM) [9]. These methods can increase SBP and can achieve faster acquisition than sample scanning since they do not require any moving parts. However, sequential measurement requires that the sample remain static during the acquisition, limiting their use for imaging dynamic samples.

To achieve faster imaging, multiplexing and compressive sensing strategies can reduce the number of required measurements, for example, by illuminating the sample with multiple LEDs simultaneously in Fourier ptychography [10, 11] or by overlapping multiple FOVs [12, 13, 14, 15], which are demixed computationally. More recently, computational approaches can model and correct for sample motion during acquisition [16, 17], circumventing the need for static samples. However, for truly single-shot, large-spatiotemporal-throughput imaging, multi-camera array architectures have proven most useful, enabling high-speed, large-SBP video acquisition [18, 19, 20].

Here, we present a new computational microscope that starts with a multi-sensor array and further improves SBP by spatial multiplexing to achieve single-shot gigapixel-scale imaging (Fig. 1). In contrast to previous multi-camera systems, ours treats the sensor array as a single large-format sensor, without associating each individual sensor with its own lens. This keeps the optical design simple. To fill in the inter-sensor gaps and solve for the full-scale image using only the data captured at the sensors (comprising only ~22% of the total area), we optimized and fabricated a diffractive optical element (DOE) to create a custom PSF that generates 15 shifted copies of the image, such that points that would otherwise image onto the gaps are detected by at least one sensor in the array. Including both inter-sensor interpolation and extra-sensor extrapolation, our method increases the effective FOV by >5.4×. Conveniently, our design does not require a physical PSF calibration step and is robust to system perturbations. We demonstrate our method experimentally on structural and functional (calcium dynamics) imaging of dozens of freely moving C. elegans across multi-square-centimeter FOVs at micrometer resolution at 120 fps.

Figure 1: Overview of our large-scale compressive microscope. (a) The imaging system uses an engineered diffractive optical element (DOE) in the pupil plane to generate a distributed multi-spot point spread function (PSF) at the image plane. Images are captured by an array of 48 sensors arranged in a grid with gaps between individual sensors. (b) The system forward model is a masked convolution with the PSF, which compressively encodes the parts of the image that would normally fall on the gaps between the sensors. A computational inverse problem then recovers the full-scale image, including areas outside the sensor coverage.

2 High-throughput microscopy with a diffractive pupil

2.1 Imaging system design

Fundamentally, our microscope is a pupil-coded, nearly 4f imaging system, consisting of an objective and tube lens with a DOE placed in the pupil plane and a 2D array of sensors acting as a "super sensor" in the image plane (Fig. 1). The purpose of the DOE is to create a sparse but extended PSF (see Sec. 2.2 for design considerations) that enables not only filling of the gaps within the sensor array, but also extrapolation beyond the outermost sensors (Sec. S2). The "super sensor" consists of a 6×8 array of 48 8-bit monochrome sensors (3136×4096 pixels, 1.1-µm pixel pitch, 9-mm center-to-center spacing) for a total of 0.617 gigapixels (GP) per snapshot, whose rectangular hull covers an area of 4.95 cm × 6.64 cm. The sensor array frame rate has two bottlenecks: the data transfer bandwidth of 6.17 GB/sec and the sensor readout rate. At full 0.617-GP sampling, the frame rate is bandwidth-limited to 10 fps, which in practice is too slow for imaging freely moving C. elegans at high resolution. To achieve higher frame rates, we use 4×4 pixel binning to achieve 120 fps (readout-limited). Since our method requires demixing overlapped images as part of an underdetermined system (we solve for more pixels than we measure), our samples should be relatively sparse (see Supplementary Fig. S5).

We report on two different imaging configurations: fluorescence and darkfield, both of which share the same tube lens (Jupiter-36B, f = 250 mm, f/3.5) and DOE. In the darkfield imaging mode, we use an FJW Industries lens (f = 90 mm, f/1.0) as the objective (for a total magnification of ~2.78×), four pseudo-collimated (Thorlabs ACL2520U-A, f = 20 mm) off-axis green LEDs (Thorlabs M530L4) as the illumination source, and a 514/2 nm bandpass filter (Alluxa) in the pupil plane. The purpose of the narrowband filter is to reduce the radial spectral blurring of the nonzero diffraction orders, which would otherwise degrade spatial resolution (Supplementary Sec. S4). Since darkfield imaging has in principle zero background, it promotes sample sparsity. We also introduce a 40-mm-diameter aperture near the DOE, defining the pupil position and effective system resolution.

In fluorescence imaging mode, we replace the objective with a Rodenstock lens (f = 42 mm, nominal NA = 0.7, design wavelength = 532 nm) for a total magnification of ~6×, and use a high-power blue LED as the widefield Köhler illumination source (Thorlabs SOLIS-470C), a 480/40 nm excitation filter (Chroma), and a 530/55 nm fluorescence emission filter (Edmund) directly above the DOE. Compared to the darkfield mode, the emission filter bandwidth is significantly broadened to collect as much fluorescence emission as possible. Here, we take advantage of the fact that the objective lens was designed for one particular wavelength, resulting in extreme axial chromatic aberrations (i.e., wavelength-dependent focal shift). Thus, the PSF induced by the DOE still has a sharp peak, albeit with heavy tails that contribute to out-of-focus background. However, we empirically found that the loss in SNR due to the increased background was offset by the gain in SNR from the significantly broader emission filter bandwidth, given the weakness of fluorescence signals (Supplementary Sec. S4).

2.2 Diffractive optical element design

As the pupil phase-modulating element, the DOE directly determines the imaging system's PSF, which we optimized for filling in the inter-sensor gaps to enable a 2D, contiguous, full-field reconstruction. To this end, there were five chief criteria we optimized for when designing the PSF: 1) at every point in the sample FOV, the PSF results in light falling on at least one sensor; 2) the PSF has minimal translational ambiguity; 3) the PSF is as narrow in extent as possible to minimize spectral blurring (due to the wavelength dependence of diffraction); 4) the PSF is as sparse as possible to minimize the impact of read noise; and 5) the PSF is robust to scaling errors, which may arise in practice due to deviations in the lens focal lengths. The second criterion ensures maximization of the rank of the linear forward model, which can be efficiently evaluated using a triple correlation (Eq. S2) between the PSF, itself, and an indicator function, $\mathit{Mask}(x,y)$, which is 1 where there is a pixel and 0 in the gaps of the sensor array plane. This triple correlation is a generalization of the typical PSF autocorrelation procedure for evaluating the performance of standard imaging systems with gap-free sensors. See Supplementary Sec. S1 for details on PSF optimization.

Optimized based on these design criteria, our PSF consists of an asymmetric distribution of 15 diffraction-limited spots (Sec. S1.3, Fig. S1, Table S1). This design enables a simple (windowed) convolution as our forward model, concentrates light into a sparse set of spots for mitigating loss in SNR, and achieves a spatial extent (6.8 mm × 6 mm) that is wider than the largest gap in the sensor array (5.55 mm × 4.49 mm) to enable gap filling. Another benefit of the extended PSF is that we can extrapolate beyond the rectangular hull of the sensor array (4.95 cm × 6.64 cm) by the width of the PSF (i.e., to 5.63 cm × 7.24 cm; see Sec. S2).

Designing our DOE consisted of computing the pupil phase pattern, $\phi(x,y)$, that generates this multi-impulse PSF, within the constraints of our fabrication method, grayscale lithography. We used the Gerchberg-Saxton algorithm [21], modified to discretize the phase values to 64 levels at each iteration. We then converted $\phi(x,y)$ to height values, $h(x,y)$, based on the refractive index of the photoresist (S1813) at a design wavelength of $\lambda$ = 515 nm ($n$ = 1.6546):

\[ h(x,y) = \frac{\lambda\,\phi(x,y)}{2\pi(n-1)}. \tag{1} \]

We fabricated a custom multi-level (64-level) DOE based on $h(x,y)$, with a maximum height variation of 0.787 µm (corresponding to $2\pi$ at $\lambda$ = 515 nm), a pixel size of 6 µm, and a diameter of 39.8 mm (Sec. 5.2).
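For concreteness, the sketch below shows a minimal Gerchberg-Saxton loop with per-iteration quantization to 64 phase levels, followed by the phase-to-height conversion of Eq. 1. It is illustrative only: the function name, target-array conventions, padding, and FFT normalization are assumptions, not the actual design code.

```python
import numpy as np

def design_doe_phase(target_psf_amp, n_iters=200, n_levels=64,
                     wavelength=515e-9, n_index=1.6546):
    """Minimal Gerchberg-Saxton sketch: find a pupil phase whose far-field
    amplitude matches target_psf_amp, quantizing the phase to n_levels at
    every iteration, then convert phase to photoresist height via Eq. (1)."""
    rng = np.random.default_rng(0)
    phase = 2 * np.pi * rng.random(target_psf_amp.shape)
    step = 2 * np.pi / n_levels
    for _ in range(n_iters):
        far_field = np.fft.fft2(np.exp(1j * phase))                     # pupil -> PSF plane
        far_field = target_psf_amp * np.exp(1j * np.angle(far_field))   # impose target amplitude
        phase = np.angle(np.fft.ifft2(far_field))                       # back to pupil, keep phase
        phase = (np.round(np.mod(phase, 2 * np.pi) / step) % n_levels) * step  # 64-level quantization
    height = wavelength * phase / (2 * np.pi * (n_index - 1))           # Eq. (1), in meters
    return phase, height
```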


Figure 2: Single-frame results using the (a-c) darkfield configuration and (d-f) fluorescence configuration with a green LED, whose spectrum roughly matches that of green fluorescent protein (GFP). (a,d) Raw captured data of a photolithography mask. (b,e) Reconstructions of the full-scale image, with a zoom-in on the area denoted by the red box. (c,f) Low-resolution reference images, taken with a 1×/NA = 0.04 objective, with a zoom-in on the same area.

2.3 Computational forward modeling and reconstruction

In principle, the forward model describing our imaging system is simply a masked 2D convolution:

\[ P(\mathbf{r}) = \big(\mathit{psf}(\mathbf{r}) \otimes \mathit{Obj}(\mathbf{r})\big) \odot \mathit{Mask}(\mathbf{r}), \tag{2} \]

where $\mathbf{r}=(x,y)$ is the 2D position, $P(\mathbf{r})$ is the forward prediction of the data, $\odot$ denotes elementwise multiplication, $\mathit{psf}(\mathbf{r})$ is the designed multi-impulse PSF, $\otimes$ denotes 2D convolution, $\mathit{Obj}(\mathbf{r})$ is the 2D object, and $\mathit{Mask}(\mathbf{r})$ is an indicator function defining whether there is a camera pixel at a given position $\mathbf{r}$. An incoherent model is sufficient, with all quantities being real-valued. This is certainly true for fluorescence imaging, and approximately true for our darkfield imaging configuration, given the small coherence area of our LED illumination. Note that since the PSF is a collection of near-delta functions, Eq. 2 can also be interpreted as an incoherent superposition of multiple shifted copies of the object via the sifting property:

\[ \begin{split} P(\mathbf{r}) &= \Big(\sum_i A_i\,\delta(\mathbf{r}-\mathbf{\Delta r}_i) \otimes \mathit{Obj}(\mathbf{r})\Big) \odot \mathit{Mask}(\mathbf{r}) \\ &= \Big(\sum_i A_i\,\mathit{Obj}(\mathbf{r}-\mathbf{\Delta r}_i)\Big) \odot \mathit{Mask}(\mathbf{r}), \end{split} \tag{3} \]

where $A_i$ and $\mathbf{\Delta r}_i = (\Delta x_i, \Delta y_i)$ are the amplitude and relative position of the $i^{\mathit{th}}$ impulse of the PSF, respectively.
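As a minimal illustration of Eq. 3 (a sketch, assuming integer-pixel impulse shifts; note that np.roll wraps at the borders, which the physical system does not, so a padded object would be used in a real implementation):

```python
import numpy as np

def forward_predict(obj, amps, shifts, mask):
    """Masked superposition of shifted object copies, per Eq. (3).
    `amps` holds the impulse amplitudes A_i; `shifts` holds integer
    (dy, dx) pixel offsets of the PSF impulses; `mask` is 1 on sensor
    pixels and 0 in the gaps."""
    pred = np.zeros_like(obj, dtype=float)
    for amp, (dy, dx) in zip(amps, shifts):
        pred += amp * np.roll(obj, shift=(dy, dx), axis=(0, 1))
    return pred * mask
```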

In practice, the forward model is not shift-invariant due to distortions and aberrations in the objective and tube lenses. A common way to model distortion is with a radial distortion model, whereby the imaging magnification, $M(\mathbf{r})$, varies radially (e.g., barrel and pincushion distortion):

\[ M(\mathbf{r}) = 1 + \sum_{i=1}^{m} a_i \left|\mathbf{r}-\mathbf{r}_0\right|_2^{2i}, \tag{4} \]

which for simplicity is normalized by the nominal magnification, $M_0$, to 1 (i.e., so that the actual magnification is $M_0 M(\mathbf{r})$); $\{a_i\}_{i=1}^m$ are the $m$ distortion coefficients, $\mathbf{r}_0 = (x_0, y_0)$ is the center of distortion, and $|\cdot|_2$ is the L2 norm. This model assumes a rotationally symmetric imaging system; however, our microscope is not rotationally symmetric, even if all lens elements are perfectly aligned, since the DOE has an asymmetric diffraction pattern. Thus, we modeled distortions via two separable radial distortion models, one for the objective and one for the tube lens:

\[ \begin{split} M_{\mathit{obj}}(\mathbf{r}) &= 1 + \sum_{i=1}^{m} a_i^{\mathit{obj}} \left|\mathbf{r}-\mathbf{r}_0^{\mathit{obj}}\right|_2^{2i}, \\ M_{\mathit{tube}}(\mathbf{r}) &= 1 + \sum_{i=1}^{m} a_i^{\mathit{tube}} \left|\mathbf{r}-\mathbf{r}_0^{\mathit{tube}}\right|_2^{2i}, \end{split} \tag{5} \]

so that each diffracted copy undergoes a unique distortion operation (Fig. S6). For example, given a point in the object plane at position $\mathbf{r}_p$ and a 2D spatial shift $\Delta\mathbf{r}_i$ corresponding to the $i^{\mathit{th}}$ impulse of the PSF, the point experiences a magnification of

\[ M(\mathbf{r}_p, \Delta\mathbf{r}_i) = M_0\, M_{\mathit{obj}}(\mathbf{r}_p)\, M_{\mathit{tube}}(-\mathbf{r}_p - \Delta\mathbf{r}_i). \tag{6} \]
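Eqs. 5 and 6 transcribe directly to code; the sketch below uses hypothetical function names, and the distortion coefficients and centers stand in for the values estimated during preprocessing (Sec. 5.3), which are not reproduced here.

```python
import numpy as np

def radial_mag(r, r0, coeffs):
    """Normalized radial distortion, Eq. (5): 1 + sum_i a_i * |r - r0|^(2i)."""
    rho2 = float(np.sum((np.asarray(r, float) - np.asarray(r0, float)) ** 2))
    return 1.0 + sum(a * rho2 ** (i + 1) for i, a in enumerate(coeffs))

def point_mag(r_p, dr_i, M0, a_obj, r0_obj, a_tube, r0_tube):
    """Total magnification of Eq. (6) for an object point r_p imaged
    through the i-th PSF impulse with shift dr_i."""
    r_p = np.asarray(r_p, float)
    dr_i = np.asarray(dr_i, float)
    return (M0 * radial_mag(r_p, r0_obj, a_obj)
               * radial_mag(-r_p - dr_i, r0_tube, a_tube))
```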

We can thus use these equations (Eqs. 5-6) to generate the PSF at any desired position in the FOV for a fully shift-variant model. As such, we used direct convolutions rather than FFT-based convolutions, which granted us a few additional computational efficiencies. First, although the multi-impulse PSF has a large spatial extent, it has only 15 nonzero points contributing to the convolution. Second, we only needed to compute the convolution at spatial positions where a sensor pixel exists (rather than over the full FOV, followed by masking). Thus, this shift-variant forward model can be written within the loss function as

\[ \begin{gathered} \mathit{Loss} = \frac{1}{|S|}\sum_{\mathbf{r}\in S}\left(\left(\sum_i A_i\,\mathit{Obj}\!\left(M(\mathbf{r}_p,\Delta\mathbf{r}_i)\,(\mathbf{r}-\mathbf{\Delta r}_i)\right)\right) - I(\mathbf{r})\right)^{\!2} + \lambda_1\,\mathit{TV}(\mathit{Obj}) + \lambda_2\,|\mathit{Obj}|_1, \\ S = \{\mathbf{r} \mid \mathit{Mask}(\mathbf{r}) = 1\}, \end{gathered} \tag{7} \]

where the first term is the mean square error (MSE), with the sum defined over positions in the image plane, $I(\mathbf{r})$, that contain detection pixels. We found that this fully shift-variant model was better than a patch-based shift-invariant model in terms of memory and accuracy, even when using FFT-based convolutions (Supplementary Sec. S3). The second term is an isotropic total variation (TV) regularizer, and the third term is an L1 regularizer. We minimized Eq. 7 using (non-stochastic) gradient descent for each time point, applying a non-negativity constraint at every iteration. Multi-sensor image data were acquired at 4× downsampling, and the associated gradient update operations fit entirely on a 24-GB GPU, taking <5 seconds to reconstruct each frame (i.e., 50 iterations at >10 iter/sec).
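To make Eq. 7 concrete, the sketch below evaluates a simplified version of the loss, with the shift-invariant model of Eq. 3 standing in for the fully shift-variant warping (which would additionally rescale each copy's coordinates by the per-impulse magnification of Eq. 6). In practice such a loss would be minimized under an automatic differentiation framework on the GPU, clipping negative values after each gradient step; all names and regularizer weights here are illustrative.

```python
import numpy as np

def tv_iso(x, eps=1e-8):
    """Isotropic total variation of a 2D array (forward differences)."""
    gy = np.diff(x, axis=0, append=x[-1:, :])
    gx = np.diff(x, axis=1, append=x[:, -1:])
    return np.sum(np.sqrt(gy ** 2 + gx ** 2 + eps))

def loss(obj, measured, amps, shifts, mask, lam_tv=1e-3, lam_l1=1e-4):
    """Data fidelity (MSE over sensor pixels only) + TV + L1, per Eq. (7)."""
    pred = np.zeros_like(obj, dtype=float)
    for amp, (dy, dx) in zip(amps, shifts):
        pred += amp * np.roll(obj, shift=(dy, dx), axis=(0, 1))
    resid = (pred - measured)[mask.astype(bool)]  # restrict the sum to S = {Mask = 1}
    return np.mean(resid ** 2) + lam_tv * tv_iso(obj) + lam_l1 * np.sum(np.abs(obj))
```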

3 Results


Figure 3: High-speed (120 fps) darkfield imaging of wild-type C. elegans at high resolution across a 20.0 × 26.4 mm² FOV. (a) Raw data at a single time frame, displayed with each sensor in its correct relative position and with static background subtracted. The blue outlines indicate the sensor boundaries. (b) Reconstruction of the full-scale image from the data captured in (a). The circled and numbered worms were tracked in the full video. The inset shows a zoom-in of worm number 1. (c) The full video reconstruction, color-coded by time. (d) Select frames showing the motion of a few of the worms. The frames are separated by 3.7 sec. The red arrowheads mark the posterior end of the gut, which can be tracked throughout the video. See Videos 1 and 2 for the full raw video and the full video reconstruction, respectively.

3.1 System characterization

We characterized the spatially varying resolution, field curvature, and depth of field (DOF) of both imaging modes by translating a negative USAF target across the entire lateral FOV and acquiring z-stacks. From these image stacks, we computed an image sharpness metric based on the mean square image gradient to identify the most in-focus image across the extended FOV (Fig. S7); the results are plotted in Figs. S8-S11. The darkfield imaging mode can resolve down to group 8 element 4 (2.76 µm full-pitch resolution) across most of the extended FOV, and at some positions, group 9. Thus, when we operate the sensors at 4× downsampling, the optical resolution and Nyquist sampling resolution (1.58 µm pixel size at the sample) are nearly matched across the reconstructed ~20.0 × 26.4 mm² FOV, for a total per-frame pixel count of 12621 × 16655 ≈ 210 MP and a total imaging throughput of up to ~25.2 billion pixels per sec. The fluorescence imaging mode, on the other hand, can resolve down to group 9 in the central cameras but suffers significant aberrations in the periphery; it can still resolve down to group 7 element 5 (4.38 µm full-pitch resolution) across most of the 9.3 × 9.3 mm² FOV. Thus, our fluorescence imaging resolution is aberration-limited, for a total imaging throughput of up to ~2.17 GP/sec.

Plotting the best focus position and subtracting the plane of best fit (to account for tilt in the translational scan) reveals minimal field curvature across the FOV in both imaging modes (Fig. S7a,e). Finally, we computed the DOF across the extended FOV, defined as the axial full width at half maximum (FWHM) of the sharpness metric (Fig. S7b,f). The DOF is larger farther from the center of the FOV, due to aberrations. Similarly, the fluorescence imaging mode has a larger DOF across the entire FOV, due to axial chromatic aberrations. Sample sharpness curves along all three spatial dimensions are shown in Fig. S7c,d,g,h.

3.2 Imaging of static samples

As a proof of principle, we first imaged static photolithography masks. The raw, multiplexed data and the corresponding reconstructions are shown in Fig. 2 for both imaging configurations. It is apparent from the raw data that our multiplexed imaging design superimposes multiple copies of the sample onto the sensor array. The global features of our reconstructions closely match those of the low-resolution reference images, captured using a reference Nikon microscope with a low-magnification objective (1×/NA = 0.04), covering a FOV of 13.3 × 13.3 mm² with a pixel size of 6.5 µm. However, the zoom-ins show that our reconstructions have higher resolution (Fig. 2b,c,e,f).

3.3 Imaging of several dozen freely-moving C. elegans

To demonstrate the utility of our method for imaging highly dynamic samples, we imaged dozens of freely moving C. elegans at high speed (120 fps) for 15 seconds using both imaging modes (1800 frames total). Using the darkfield mode, we performed structural imaging of wild-type C. elegans freely moving across a 2 cm × 2 cm microfluidic device containing a hexagonal grid of cylindrical pillars, which mimic their natural soil environment (Fig. 3) [22, 23]. Prior to video reconstruction, we estimated and subtracted the static background, removing darkfield signals from the pillars and any other static structures and leaving only the signal from the sparse distribution of dynamic C. elegans. The full background-subtracted multi-sensor video data are shown in Video 1, a single frame of which is visualized in Fig. 3a. The multiplexed copies of the worms are computationally demixed, leading to a contiguous, gap-free video reconstruction of the worms and their trajectories across the 15-sec video (Video 2). A single frame and a time-encoded image of the whole video are shown in Fig. 3b and c, respectively. The contiguity of the reconstruction across space and time enabled tracking of the worms, shown in the zoom-ins in Video 2 and Fig. 3d. Furthermore, the high resolution of our reconstruction enables tracking of structures within the worm throughout the 15-sec duration, such as the posterior end of the gut, circled in red in Video 2 and indicated by the red arrowheads in Fig. 3d.

Finally, using the fluorescence imaging mode, we imaged freely moving C. elegans expressing GCaMP8f in the pharynx [24]. Here, the GCaMP8f indicator responds to the influx of intracellular calcium ions during the worms' rapid pharyngeal pumping events (up to 4-5 Hz), part of their feeding behavior. The worms were loaded in a smaller, 8 mm × 8 mm microfluidic chip with the same hexagonal array of pillars [25]. Video 3 shows the raw, background-subtracted, multi-sensor video data, a single frame of which is shown in Fig. 4a. The corresponding reconstructions are visualized in Video 4 and Fig. 4b,c. Note that both the raw data and the reconstructions appear sparser than their darkfield counterparts (Fig. 3) because only the pharynx is fluorescently labeled. From the video reconstruction, we were able to track 12 individual worms throughout nearly the entire video and extract their pumping behavior from the fluorescence traces (Fig. 4e). A few sample pumping events are illustrated in Fig. 4d.


Figure 4: High-speed (120 fps) fluorescence imaging of C. elegans expressing GCaMP8f in the pharynx. (a) Raw data (with static background subtraction) of a single frame, displayed with each sensor image in its correct relative position. The blue outlines indicate the sensor boundaries. See Video 3 for the full video. (b) Reconstruction of the full-scale image from the time point in (a). The circled and numbered worms are analyzed in (d) and (e) as well as in Video 4. (c) The full 15-sec reconstructed video, color-coded by time. (d) Multiple frames (separated by 25 ms) showing the motion of 5 selected worms. (e) Fluorescence traces for the worms indicated in (b). The green highlighted time windows correspond to the image sequences plotted in (d). See Videos 3 and 4 for the full raw video and the full video reconstruction with tracking of all 12 highlighted worms, respectively.

4 Discussion

We present a very high-throughput computational microscope that employs a threefold strategy to expand FOV without sacrificing spatial resolution or frame rate: a discontiguous array of 48 independent camera sensors, diffractive pupil coding that further expands the FOV by >5.4×, and a computational image reconstruction algorithm. Specifically, we designed and fabricated a DOE that, when placed in the pupil plane of our microscope, creates a multi-impulse PSF that encodes and compresses information in the gaps between the 48 sensors, enabling minimally ambiguous computational reconstruction. Our computational microscope, capable of micrometer resolution over multi-square-centimeter FOVs at 120 Hz frame rates, stands out among the highest-throughput microscopes at 25.2 GP/sec. With these capabilities, we demonstrate high-resolution structural and functional imaging of dozens of C. elegans in parallel.

There are several avenues for future investigation that could further improve our microscope. We could extend our high-throughput 2D imaging system to 3D by modifying the DOE design to include depth encoding [26, 27]. An important limitation of our design is that it requires lenses with relatively high SBPs, capable of achieving high resolution over a large FOV, which is practically difficult. As a result, our microscope objective and tube lens contain aberrations that prevent diffraction-limited resolution. In particular, given our pupil/DOE diameter of 40 mm and focal lengths of 90 mm (darkfield mode) and 42 mm (fluorescence mode), diffraction-limited performance would be 1.45 µm (NA ≈ 0.22) and 0.73 µm (NA ≈ 0.43), respectively. Moreover, fabricating an even larger DOE would further improve the theoretical diffraction-limited performance, though in practice it would also increase aberrations. Thus, calibrating the spatially varying PSF and incorporating it into the forward model (Sec. 2) could at least partially bridge the resolution gap. This could be particularly useful for the fluorescence imaging mode, whose PSF contains both geometric and chromatic aberrations due to the DOE and objective (Sec. S4). A hybrid diffractive-refractive design could also be employed to reduce these DOE-induced chromatic aberrations [28]. Careful PSF calibration with lensless approaches [29, 30] could also help alleviate some of these aberration challenges, albeit at the cost of SNR.

Despite these challenges, we have experimentally demonstrated the high-speed, high-resolution, wide-FOV imaging capabilities conferred by our microscope's highly parallelized and multiplexed design. With these new capabilities, our microscope can open up new applications not only in functional and behavioral imaging of large populations of freely moving model organisms, but also in diagnostic imaging for cell counting or detection of rare species, and in semiconductor inspection.

5 Methods

5.1 C. elegans preparation

C. elegans animals were maintained under standard conditions and fed OP50 bacteria [31]. Wild-type worms were strain N2 (Bristol). The other strain used in functional imaging was INF418 nonEx106[myo-2p::GCaMP8f::unc-54 3’UTR] [24].

Structural and functional imaging were performed using monolayer microfluidic devices [22, 25], featuring arena areas of 2 cm × 2 cm × 70 µm or 8 mm × 8 mm × 70 µm, respectively, in which 200-µm-diameter cylindrical pillars were arrayed hexagonally with a 300-µm center-to-center spacing. In this structured environment, worms were induced to crawl rather than swim [32, 33]. Before each experiment, ~150 young adult hermaphrodites were collected from cultivation plates using 2 ml of S-basal buffer (100 mM NaCl and 50 mM potassium phosphate; pH 6.0), washed twice in fresh 2-ml volumes of S-basal buffer to remove excess bacteria, drawn into loading tubing using a 1-ml syringe, and gently introduced into the microfluidic devices. These steps were completed within 10 min. Then, gravity-fed S-basal was allowed to flow through the arena for an additional 10 minutes, and the loaded worms were left to crawl freely for 1 hour before video recording began, allowing them to enter a starved state.

5.2 DOE fabrication

The diffractive optical element (DOE) was fabricated on a 50.8-mm-diameter, 500-µm-thick glass substrate using grayscale lithography. A layer of positive-tone photoresist (MICROPOSIT S1813 G2) was spin-coated at 1000 rpm for 60 seconds and baked on a hotplate at 110°C for 2 minutes. The design was patterned onto the sample with a laser pattern generator (DWL66+, Heidelberg Instruments GmbH). Post-exposure, the sample was heated at 50°C for one minute and developed in AZ Developer (1:1) for 1 minute and 20 seconds. A calibration sample, prepared and developed in the same way, was used to map photoresist depths to laser intensities from the pattern generator before the final write. The DOE pattern consisted of a 6638 × 6638 matrix of heights, with each pixel measuring 6 µm × 6 µm. The maximum depth of the DOE pattern was 0.787 µm. A confocal microscope (Olympus LEXT OLS5000) was used to measure the heights of several outermost pixels at various locations on the sample. The average height difference between the fabricated and ideal designs was approximately 20 nm, with a standard deviation of 30 nm.

5.3 Data preprocessing

Prior to computational reconstruction (Sec. 2.3), we performed two preprocessing steps. First, we estimated the background due to unrejected fluorescence excitation, darkfield signals from the pillars in the chips containing the C. elegans, and any other signals that remained static throughout the video recording. To do this, for every pixel in the raw recordings, we computed the nth percentile across time; we then subtracted this estimated background from each frame of the video. Second, we estimated the imaging system distortion and DOE orientation (Sec. 2.3) by jointly optimizing them with the reconstruction of a single frame from the video (or a superposition of multiple frames).
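A minimal sketch of the first step follows; the percentile value is illustrative, since the exact n used is not stated above.

```python
import numpy as np

def subtract_static_background(video, pct=10):
    """Per-pixel temporal-percentile background estimation and subtraction.
    `video` has shape (T, H, W); `pct` is a placeholder percentile."""
    background = np.percentile(video, pct, axis=0)           # (H, W) static background
    return np.clip(video - background[None, ...], 0, None)   # keep values nonnegative
```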

5.4 Tracking

Tracking of the C. elegans in Fig. 3 and Video 2 was performed almost entirely automatically using Trackpy. Automatic tracking of the GCaMP8f-expressing C. elegans, however, was much more challenging, given the fluorescence signal fluctuations during pharyngeal pumping. Thus, we first performed automatic tracking using Trackpy, followed by manual tracking of the 12 worms highlighted in Fig. 4 and Video 4.
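For reference, a typical Trackpy workflow on the reconstructed frames might look like the following sketch; the stand-in video and all parameter values are placeholders, not the settings used for our data.

```python
import numpy as np
import trackpy as tp

# Stand-in for the real reconstructed video: (T, H, W) stack of frames.
reconstructed_frames = np.random.rand(100, 512, 512)

features = tp.batch(reconstructed_frames, diameter=15, minmass=10)  # detect bright blobs per frame
tracks = tp.link(features, search_range=30, memory=5)               # link detections across time
tracks = tp.filter_stubs(tracks, threshold=50)                      # discard short-lived tracks
```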

Data availability

The raw data underlying the results are currently not publicly available, but are available upon request.

Code availability

The image reconstruction code will be made available on GitHub.

Acknowledgments

We thank Monika Scholz for providing the GCaMP8f-expressing C. elegans strain (INF418) and Leyla Kabuli for her constructive comments on the manuscript. This work was supported by the Office of Naval Research (N00014-22-1-2014), the Air Force Office of Scientific Research (AFOSR FA9550-22-1-0521), the National Institutes of Health (1RF1NS128772-01), and the Japan Society for Promotion of Science (JSPS KAKENHI 23KJ0349). This work was also supported in part by the Schmidt Science Fellows, in partnership with the Rhodes Trust. Laura Waller is a Chan Zuckerberg Biohub SF investigator.

Contributions

Conceptualization: KCZ, LW; Methodology: KCZ, CG, MI, NA, RM, RH, SK, LW; Software: KCZ, CG; Formal Analysis: KCZ, CG, MI; Investigation: KCZ, CG, MI, TH; Data Curation: KCZ, CG; Writing: KCZ, CG, MI, LW; Visualization: KCZ, CG, MI; Supervision: RM, SK, LW; Funding Acquisition: RM, SK, LW.

Disclosures

KCZ was a consultant for Ramona Optics Inc. RH is a cofounder of Ramona Optics Inc., which is commercializing multi-camera array microscopes. RM is a co-founder of Oblate Optics, Inc., which is commercializing DOEs. The remaining authors declare no conflicts of interest.

Supplementary information

S1 PSF design details

In this section, we explain in more detail the five criteria mentioned in the main text that we optimized for, and then describe the exact optimization procedure.

S1.1 Criteria for a good PSF

Pan-visibility. First and foremost, the PSF should always be visible across the entire FOV on at least one sensor, including when the PSF is centered within a gap between sensors. This criterion ensures that we obtain a continuous, gap-free reconstruction. To fulfill this criterion, the PSF needs to be at least as large as the largest gap in the sensor array, which is 5.5504 mm by 4.4944 mm in the image plane. For example, a 3×3 grid of dots with these dimensions satisfies this criterion (Fig. S1a); however, it has ambiguities (Fig. S1b) that render reconstruction challenging, which leads us to the second criterion.

Minimal translational ambiguity. Specifically, no two PSF positions in the FOV should give rise to the same measurement, and different positions should produce measurements as dissimilar as possible (low autocorrelation). This argument is analogous to that for a standard, linear shift-invariant (LSI) imaging system (with no gaps), as minimizing PSF autocorrelation maximizes the frequency content of the MTF. Here, due to the discontinuous FOV, this criterion is harder to satisfy. For example, the aforementioned 3×3 PSF fails this test (Fig. S1b), but would not fail if there were no gaps (although this PSF still has high autocorrelation).

These first two criteria have intuitive interpretations, but could be understood more rigorously by examining the imaging problem more abstractly as a simple linear equation. To be concrete, let

\[ \underset{(m\times 1)}{\mathbf{b}} = \underset{(m\times n)}{\mathbf{A}}\; \underset{(n\times 1)}{\mathbf{x}} \tag{S1} \]

describe a simplified forward model of our imaging system, ignoring distortion and aberrations, where $\mathbf{b}$ is a vector of length $m$, corresponding to the number of pixels across the sensor array, $\mathbf{A}$ is a Toeplitz matrix whose rows are shifted copies of the PSF, and $\mathbf{x}$ is a vector of length $n$, corresponding to the number of pixels in the object reconstruction. Since there are gaps in the sensor array, $n > m$, so the problem is underdetermined. Further, since we are modeling an incoherent imaging system, the PSF and therefore the entries of $\mathbf{A}$ are strictly nonnegative. The best we can do for a general imaging system is maximize the rank of $\mathbf{A}$ (i.e., $m$); however, not all PSFs that maximize the rank of $\mathbf{A}$ are desirable. For example, a delta-function PSF is the "best" PSF in terms of having zero translational ambiguity: its corresponding $\mathbf{A}$ not only has maximal rank $m$, but also has orthogonal rows. However, such an $\mathbf{A}$ matrix is effectively an $m \times m$ identity matrix, as the remaining columns are all zeros; in other words, certain parts of $\mathbf{x}$ would never be sampled, resulting in a gappy reconstruction.

Thus, if we want $\mathbf{A}$ to be a nonnegative $m \times n$ matrix without any all-zero columns, we must accept a PSF that has nonzero autocorrelation (i.e., the rows of $\mathbf{A}$ cannot be mutually orthogonal). Nevertheless, we want the rows and columns of $\mathbf{A}$ to be as close to orthogonal as possible, which minimizes the condition number of $\mathbf{A}$. In compressive sensing terms, this amounts to minimizing the mutual coherence of $\mathbf{A}$ (the maximum normalized correlation between any two columns of $\mathbf{A}$), which ensures that no two positions in the FOV give rise to the same measurement on the sensor array. This minimization is also analogous to minimizing the autocorrelation of the PSF in a standard, gap-free LSI imaging system to improve its MTF, which we can generalize to our system by modulating the autocorrelation with the sensor array mask, $\mathit{Mask}(\mathbf{r})$. In particular, when testing candidate PSFs, rather than working with the very large $\mathbf{A}$ matrix, we can evaluate a normalized triple correlation:

\[ C(\mathbf{\Delta r}, \mathbf{\Delta r}') = \frac{\sum_{\mathbf{r}} \mathit{psf}(\mathbf{r})\, \mathit{psf}(\mathbf{r}+\mathbf{\Delta r})\, \mathit{Mask}(\mathbf{r}+\mathbf{\Delta r}')}{\sum_{\mathbf{r}} \mathit{psf}(\mathbf{r})^2\, \mathit{Mask}(\mathbf{r}+\mathbf{\Delta r}')}, \tag{S2} \]

where we would like $C(\mathbf{\Delta r}, \mathbf{\Delta r}')$ to be close to 0 when $\mathbf{\Delta r} \neq (0,0)$. The reason $\mathit{Mask}$ also needs to be cross-correlated is that the PSF should have low autocorrelation regardless of whether it is centered in a gap or on a sensor. In practice, it is sufficient to evaluate Eq. S2 for a $\mathit{Mask}$ corresponding to 3×3 sensors, as our imaging system is "periodically LSI".

For a gap-free imaging system with a PSF containing $k$ impulses, the lowest possible maximum autocorrelation for $\mathbf{\Delta r} \neq (0,0)$ would be $1/k$ (at most one impulse overlapping), which is only achievable if the PSF does not have inversion symmetry (proof: PSF points $\mathbf{\Delta r}_i$ and $\mathbf{\Delta r}_j$ will simultaneously overlap with $-\mathbf{\Delta r}_j$ and $-\mathbf{\Delta r}_i$, respectively). Thus, our PSF design should not have inversion symmetry. However, for a gappy sensor, the triple correlation can be higher than $1/k$, because not all $k$ impulses may be visible at a given position in the sensor plane (this inflation is accounted for by the presence of $\mathit{Mask}$ in the denominator of Eq. S2).
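For small test arrays, Eq. S2 can also be evaluated directly as a sanity check on candidate PSFs; the sketch below is illustrative (the actual search uses the faster point-based shortcut of Sec. S1.2).

```python
import numpy as np
from scipy.signal import correlate2d

def triple_corr_map(psf, mask, dr_prime):
    """Direct evaluation of Eq. (S2) for one sensor-plane offset
    dr_prime = (dy, dx). With mode='same', the zero-lag term sits at
    the center of the output array."""
    shifted_mask = np.roll(mask, shift=dr_prime, axis=(0, 1))
    windowed = psf * shifted_mask                  # psf(r) * Mask(r + dr')
    num = correlate2d(psf, windowed, mode='same')  # sum_r psf(r + dr) psf(r) Mask(r + dr')
    den = np.sum(psf ** 2 * shifted_mask)          # normalization in Eq. (S2)
    return num / den
```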

Narrowest possible spatial extent. While the PSF needs to be larger than the largest inter-sensor gap, it should be no larger than necessary, due to the lateral chromatic aberrations of the DOE. In particular, since the diffraction pattern off the DOE scales with wavelength, an off-axis spot experiences radial smearing ($\delta r$) proportional to its distance from the zero diffraction order,

\[ \delta r \propto \frac{\Delta R\, \Delta\lambda}{\lambda}, \tag{S3} \]

where $\Delta R$ is the radial distance from the zero order, $\Delta\lambda$ is the bandwidth of the detected light, and $\lambda$ is the center wavelength (see Sec. S4 for further discussion). Practically, this equation places a limit on the bandwidth of the detected light for a given spot size on the sensor. For the darkfield imaging mode, we use a 2-nm-bandwidth bandpass filter as a compromise between resolution and light throughput. For the fluorescence imaging mode, however, we take advantage of the wavelength-dependent focusing properties of the objective (i.e., axial chromatic aberration) to mitigate the radial blurring effects. In practice, we specify a bounding box within which to propose the random points that make up the PSF.
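For a rough sense of scale (treating the proportionality constant in Eq. S3 as order unity, an assumption made here purely for illustration), the fractional radial smear $\delta r / \Delta R \approx \Delta\lambda/\lambda$ is about $2/515 \approx 0.4\%$ with the 2-nm darkfield filter, versus about $55/530 \approx 10\%$ over the full 55-nm fluorescence emission band: roughly 25× more smearing at the same offset from the zero order, which is why the fluorescence mode instead relies on the objective's chromatic focal shift.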

PSF sparsity. The fewer pixels into which our imaging system maps each point in the sample, the less sensor read noise we incur. This criterion also suggests that a good PSF design would be a collection of points. In practice, we have to balance this criterion with the minimal-translational-ambiguity criterion, which we found required a minimum number of points to satisfy. Furthermore, the more points, $k$, in the PSF, the smaller the maximum possible autocorrelation ($1/k$); however, we prioritized PSF sparsity due to the low light levels of our imaging applications.

Robustness to scaling errors. Due to experimental imperfections, the actual PSF size may not match the design PSF size. For example, since the PSF size is proportional to the tube lens focal length, any error therein would lead to an artifactually large or small PSF. Thus, we searched for a PSF that would satisfy all the other criteria over a broad range of magnification errors.

Figure S1: (a) The PSF should be visible on all sensors (gray rectangles), regardless of its position. While a 5-point PSF arranged in a "+" is not visible everywhere, a 3×3 grid PSF slightly larger than the largest inter-sensor gap is visible everywhere. (b) The PSF should be minimally ambiguous with translation. While the 3×3 PSF is visible everywhere, it produces translational ambiguities. The optimized PSF has no strict ambiguities. (c) The three steps in proposing a candidate PSF in the PSF search algorithm.

S1.2 PSF optimization procedure

The goal was to optimize a PSF parameterized by the positions of a distribution of points. Gradient-based methods are unsuitable because the gradients of the visibility and minimal-ambiguity criteria with respect to the PSF point positions are zero almost everywhere. We thus opted for a gradient-free random search strategy, which involved systematically proposing random point distributions as PSF candidates and checking adherence to the aforementioned criteria. In principle, there are an infinite number of suitable PSFs, so in practice we ranked the suitable PSFs we obtained based on their robustness to scaling errors. We caution that the PSF produced by the following procedure is not optimal by any metric, but is sufficient in practice for our imaging system.

Step 1: Propose a candidate PSF. We specified the number of points, $k$, the PSF can have, and a box bounding the PSF points, which we set to 1.3 times larger than the largest rectangular inter-sensor gap. We found that uniform random proposals overwhelmingly generated PSFs that failed the pan-visibility and minimal-ambiguity criteria. Instead, we employed two strategies that increased the probability of generating a good PSF: 1) using a fixed five-point template just barely larger than the largest rectangular gap (Fig. S1c), and 2) proposing PSFs with inversion symmetry (we break the symmetry later; see Step 5 below). Due to this symmetry plus the zero-order spot, $k$ is always odd.

Step 2: Check adherence to the pan-visibility criterion. This step was fast and screened out many candidates before computing the triple correlation in the next step, which was much slower.

Step 3: Compute the triple correlation efficiently and identify shift ambiguities. Since triple-correlating three 2D arrays (6D total) would be very slow, especially given the sizes of the arrays, we instead exploited a shortcut made possible by the multi-impulse parameterization of the PSF. Specifically, we considered Eq. S2 as a modulated cross-correlation between $\mathit{psf}(\mathbf{r})$ and a windowed version, $\mathit{psf}(\mathbf{r})\mathit{Mask}(\mathbf{r}+\mathbf{\Delta r}')$, where the former is a collection of $k$ points, $\mathbf{\Delta r}_i$, and the latter is a collection of $k' \leq k$ points, $\mathbf{\Delta r}_j$. Rather than creating two 2D arrays to cross-correlate, we computed the pairwise 2D difference vectors, $\mathbf{\Delta\Delta r}_{ij} = \mathbf{\Delta r}_i - \mathbf{\Delta r}_j$, and counted the number of unique $\mathbf{\Delta\Delta r}_{ij}$ vectors. Note, then, that the computational complexity of the 2D cross-correlation scales like $k'k$ (where $k = 15$ for our final PSF), rather than with the number of pixels in the extended PSF.

After counting the unique $\mathbf{\Delta\Delta r}_{ij}$'s, we identified the maximum number of occurrences of any one $\mathbf{\Delta\Delta r}_{ij}$ (in other words, the number of points overlapping when the PSF and its modulated copy are offset by a shift of $\mathbf{\Delta\Delta r}_{ij}$). If this number was equal to $k'$ and there were at least two different $\mathbf{\Delta\Delta r}_{ij}$'s that occurred $k'$ times (one of them being $\mathbf{\Delta\Delta r}_{ij} = (0,0)$), then each additional $\mathbf{\Delta\Delta r}_{ij}$ besides $(0,0)$ was a potential 2D shift that produced an ambiguity in the proposed PSF. Put another way, this condition meant that all $k'$ points of the modulated PSF fully overlapped with the $k$-point PSF at at least two unique relative shifts (one of them being zero shift). We then tested for the only situation in which this was not an ambiguity, which was if more than $k'$ points of the full PSF shifted by $\mathbf{\Delta\Delta r}_{ij}$ were seen by the sensor array: the additional points break the ambiguity.

We must still repeat Step 3 for all $\mathbf{\Delta r}'$, i.e., shifting the PSF to different parts of the sensor array (which incidentally may also change $k'$). A code sketch of this counting logic follows.
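The counting logic of Step 3 reduces to a few lines on the impulse coordinates; the sketch below uses hypothetical names and omits the follow-up ambiguity-breaking test on the extra visible points described above.

```python
import numpy as np
from collections import Counter

def candidate_ambiguous_shifts(psf_points, visible_idx, tol=1e-6):
    """Form all pairwise difference vectors between the full k-point PSF
    and its k'-point masked subset, then flag nonzero shifts at which all
    k' subset points land on PSF points (potential ambiguities)."""
    pts = np.asarray(psf_points, dtype=float)   # (k, 2) impulse positions
    sub = pts[list(visible_idx)]                # (k', 2) points seen through Mask
    diffs = (pts[None, :, :] - sub[:, None, :]).reshape(-1, 2)
    # Quantize so nearly-equal difference vectors hash to the same key
    keys = [tuple(np.round(d / tol).astype(np.int64)) for d in diffs]
    counts = Counter(keys)
    k_prime = len(sub)
    return [key for key, c in counts.items() if c == k_prime and key != (0, 0)]
```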

Step 4: Save the PSF and compute its scale robustness. Many candidate PSFs based on our sampling heuristic make it this far. We differentiate these candidate PSFs based on how robust they are to scaling errors. In particular, we isotropically scale the PSF by factors between $s =$ 0.8 and 1.2 and rerun Steps 2 and 3. Supposing that scaling the PSF by any $s$ value within the interval $[s_{\mathit{min}}, s_{\mathit{max}}]$ passes Steps 2 and 3, we report $s_{\mathit{max}}/s_{\mathit{min}}$ as the PSF's scale robustness factor. After running Steps 1-3 many times and producing many candidate PSFs, we picked the one with the highest scale robustness factor.

Step 5: Break PSF symmetry. We previously enforced inversion symmetry in the proposed PSFs to decrease the rejection rate. However, symmetric PSFs lead to larger autocorrelations, as discussed in Sec. S1.1. Essentially, if one picks any two non-opposite PSF points, $\mathbf{\Delta r}_i$ and $\mathbf{\Delta r}_j$, they will simultaneously overlap with $-\mathbf{\Delta r}_j$ and $-\mathbf{\Delta r}_i$, respectively, leading to autocorrelation peaks with amplitude $2/k$. There are $\binom{k}{2} - (k-1)/2$ such autocorrelation peaks (the second term arises because picking opposite PSF points, $\mathbf{\Delta r}_i$ and $-\mathbf{\Delta r}_i$, does not lead to overlapping).

Thus, in this step, we perturb the PSF point coordinates to break the symmetry, allowing the maximum autocorrelation of the PSF to be $1/k$. In other words, we want to split each $2/k$-amplitude peak into a pair of $1/k$-amplitude peaks, with the split distance as long as possible. If we pose this problem as maximizing the minimum split distance among all the splits, it becomes equivalent to a circle packing problem (for $(k-1)/2$ circles). Since there is no guarantee that the perturbed PSF preserves the scale robustness factor, and since there is no inherent ordering of the $(k-1)/2$ circles, we randomly permuted the 2D perturbation vectors and picked the order that maximized the scale robustness factor.

Step 6: Repeat the above procedure for other values of $k$. We wanted $k$ to be as small as possible to minimize the number of pixels into which the light is distributed, but large enough that the PSF satisfies the pan-visibility and minimal-ambiguity conditions with a large scale robustness factor. We empirically found that $k=13$ was sufficient but yielded low scale robustness factors (<1.05), so we opted for $k=15$ to obtain larger robustness factors (~1.15).

Figure S2: Optimized PSF, positioned such that the zero order is at the center of the gap of a 2×2 sensor subarray. The individual PSF impulses are enlarged so that they are visible.

S1.3 Final PSF design

The final PSF design is shown in Fig. S2 and its impulse coordinates are listed in Table S1.

x (mm) 3.3 3.0 3.1 1.5 -0.2 3.1 2.9 -3.5 -3.0 -2.9 -1.7 0.2 -2.9 -3.1 0
y (mm) 3.0 -2.5 -0.5 1.7 -2.5 -2.3 2.3 -3.0 2.7 0.7 -1.9 2.3 2.1 -2.1 0
Table S1: The optimized PSF coordinates, relative to the zero order (the last coordinate).

S2 Field of view extrapolation

The large spatial extent of the PSF on the sensor (6 mm × 6.8 mm) enables extrapolation beyond the native rectangular hull of the sensor array (4.95 cm × 6.64 cm). If we require half of the width or height of the PSF to be within the native rectangular hull, then the extrapolated FOV is simply the sum of the corresponding dimensions: (4.95 + 0.68) cm × (6.64 + 0.60) cm = 5.63 cm × 7.24 cm. Here, we validate that we can obtain good reconstruction quality in this extrapolation region by imaging a 2D target translated well beyond the edges of the native rectangular hull (Fig. S3).


Figure S3: Our method enables extrapolation beyond the rectangular hull of the sensor array (i.e., the smallest box bounding all 48 sensors, denoted by blue boxes). This figure shows the same target as in Fig. 2b translated and reconstructed beyond the left (a), right (b), top (c), and bottom (d) edges.

S3 Comparison of shift-invariant and shift-variant models

In Sec. 2.3, we introduced a convolutional forward model, interpreting it as an incoherent superposition of multiple shifted copies of the object without explicitly performing a convolution. This interpretation also allows full modeling of the distortion-induced shift variance of the PSF. In this section, we justify this “direct” approach by demonstrating that it not only leads to more accurate reconstructions, but also uses the least memory and does not significantly sacrifice speed compared to FFT-based approaches. In particular, we compare three algorithms – full-FFT, patch-FFT, and direct – described below and sketched in code thereafter:

Full-FFT: a shift-invariant model utilizing FFT-based convolution, as described in Eq. 2. To achieve effective interpolation of the PSF's FFT, we pad the PSF to match the reconstruction size and then perform pointwise multiplication in the frequency domain.

Patch-FFT: a patch-wise shift-variant model using FFT-based convolution. The FOV is divided into patches corresponding to the sensor array (e.g., 6×8 patches). For each sensor patch, we determine a support region encompassing all potential object-space points contributing to that patch; the size of this region is the sum of the sensor size and the PSF size. In the forward model, we employ a sliding window to extract patches based on this support, apply FFT-based convolution with each patch's corresponding PSF, and crop the resulting sensor region to predict the measurement.

Direct: a fully shift-variant model utilizing direct superposition, as formulated in Eq. 7. This approach implements a direct convolution in which the PSF is approximated as a set of delta functions, with all zero-multiplication operations eliminated for computational efficiency.
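For concreteness, the following is a minimal sketch (ours, not the released implementation) of the full-FFT and direct forward models in JAX; `psf_points` holds hypothetical (amplitude, dy, dx) triples of the delta-function PSF approximation, and the patch-FFT variant simply applies `forward_full_fft` per sensor patch with that patch's own PSF:

```python
import jax.numpy as jnp

def forward_full_fft(obj, psf):
    """Shift-invariant model (cf. Eq. 2): FFT-based convolution with a
    single PSF, here assumed already zero-padded to obj.shape. The
    frequency-domain product is pointwise; boundaries wrap circularly."""
    return jnp.fft.irfft2(jnp.fft.rfft2(obj) *
                          jnp.fft.rfft2(jnp.fft.ifftshift(psf)),
                          s=obj.shape)

def forward_direct(obj, psf_points):
    """Fully shift-variant "direct" model (cf. Eq. 7): an incoherent sum
    of shifted, weighted copies of the object. Only the k ~ 15 impulses
    are looped over, so no zero multiplications are materialized. The
    real model also makes (dy, dx) position-dependent via the distortion
    parameters and crops to the sensor extents instead of wrapping."""
    meas = jnp.zeros_like(obj)
    for amp, dy, dx in psf_points:
        meas = meas + amp * jnp.roll(obj, (dy, dx), axis=(0, 1))
    return meas
```

Because the loop runs over only $k$ impulses, the direct model never materializes a full-size PSF array, which is why it uses the least memory of the three approaches.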


Figure S4: Performance comparison across the three algorithms: full-FFT, patch-FFT, and direct. Despite being slightly slower than patch-FFT, the direct method achieves the best reconstruction quality, particularly in the peripheral regions. Additionally, the direct method consumes the least memory, making it suitable for full-resolution reconstruction without downsampling.

In Fig. S4, we use a lithography target as the sample and compare the algorithms' reconstructions against the ground truth captured by a regular microscope with 1× magnification. The algorithms' time and memory performance metrics are evaluated using the JAX framework with just-in-time (JIT) compilation, executed on a single NVIDIA RTX 3090 GPU. We apply 8× downsampling to the measurement for all methods, resulting in a reconstruction with dimensions of 6.3k × 8.3k (52 MP). The analysis includes two key performance indicators: the total computation time for 100 optimization iterations and the peak memory usage. These metrics are displayed in the bottom right corner of the figure, where the top row indicates the time consumption and the bottom row shows the peak memory usage as reported by the memory_stats field of the JAX device.
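For reference, such peak-memory numbers can be queried as sketched below (the available keys are backend-dependent; `peak_bytes_in_use` is reported on CUDA GPUs):

```python
import jax

device = jax.devices()[0]                   # e.g., the RTX 3090
stats = device.memory_stats()               # dict of allocator statistics
peak_gb = stats["peak_bytes_in_use"] / 1e9  # peak memory usage in GB
```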

The direct approach yields the best reconstruction quality, followed by the patch-FFT and full-FFT approaches. When the PSF shift variance is not properly accounted for (as in the full-FFT and patch-FFT methods), the reconstruction suffers from either spurious artifacts or missing objects. This comparison underscores the importance of accurately modeling the shift-variant PSF distortion.

We notice that the improvement in quality from patch-FFT over full-FFT is relatively minor. This is possibly because the patches are divided uniformly across the FOV, while the shift variance of a lens system is not uniformly distributed: lenses are optimized to have relatively small distortions in the central region, whereas the peripheral region exhibits more drastic PSF variations. We also observe that even with the direct method, the squares in the corner regions (especially in the bottom left corner) are not well reconstructed. This degradation primarily stems from lens vignetting, which naturally attenuates light intensity in peripheral regions.

To ensure a fair comparison, we generated the central and patch-wise PSFs for both FFT methods using lens parameters co-optimized according to Sec. 2.3. Specifically, we jointly optimize the 2D object $\mathit{Obj}(\mathbf{r})$ and the lens distortion parameters $\{a_i^{\mathit{tube}}\}_{i=1}^{m}$ and $\{a_i^{\mathit{obj}}\}_{i=1}^{m}$ of the tube lens and objective. The optimized lens parameters are then used to generate PSFs at the center of the full reconstruction and at each sensor's center for the full-FFT and patch-FFT methods, respectively. For the direct method evaluation, we disable lens parameter optimization, since these parameters need only be co-optimized once.

S4 Radial spectral blurring: SNR-resolution tradeoff

The wavelength dependence of DOEs can affect the spatial resolution of our computational microscope. In particular, the size of the PSF, as the far-field diffraction pattern of the DOE, scales in proportion to the wavelength:

\begin{equation}
\mathit{psf}(\mathbf{r},\lambda)=\mathit{psf}\left(\frac{\lambda}{\lambda_{0}}\mathbf{r},\lambda_{0}\right),
\tag{S4}
\end{equation}

where $\mathbf{r}=(x,y)$ is the 2D spatial coordinate in the image plane, $\lambda_{0}$ is the design wavelength, and $\lambda$ is an arbitrary wavelength. The overall broadband PSF is thus

\begin{equation}
\mathit{psf}_{P(\lambda)}(\mathbf{r})=\int_{0}^{\infty}P(\lambda)\,\mathit{psf}(\mathbf{r},\lambda)\,d\lambda,
\tag{S5}
\end{equation}

where $P(\lambda)$ is the power spectral density of the detected light. As a result, the individual impulses of the PSF experience radial blurring, $\delta r$, in proportion to the bandwidth of the source, $\Delta\lambda$, and the radial distance from the zero-order impulse,

\begin{equation}
\delta r\propto\frac{\Delta\lambda}{\lambda_{0}}|\mathbf{r}|,
\tag{S6}
\end{equation}

suggesting that the PSF should be as small as possible and that the bandwidth of the detected light should be as narrow as possible. The tolerance to bandwidth and PSF size depends on how $\delta r$ compares to the size of each PSF impulse, which in turn is determined by the tube lens NA and system aberrations. Assuming an impulse spot size of 8.8 µm (the size that is critically sampled by the 4×4 superpixel, with a pixel size of 1.1 µm), a maximum radius of 3.5709 mm (given an inter-sensor gap size of 5.5504 mm by 4.4944 mm), and a design wavelength of 515 nm, the bandwidth that produces a radial blur of 8.8 µm is $\Delta\lambda\approx$ 1.3 nm. Thus, we used a 2-nm bandpass filter for the darkfield imaging mode as a compromise between resolution and light throughput.
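This bandwidth estimate follows directly from Eq. S6 with the proportionality taken as equality; a quick numerical check (ours):

```python
blur = 8.8e-6      # tolerable radial blur: one impulse spot size (m)
r_max = 3.5709e-3  # maximum impulse radius from the zero order (m)
lam0 = 515e-9      # design wavelength (m)

dlam = blur / r_max * lam0     # invert Eq. S6: dlam = (blur / |r|) * lam0
print(f"{dlam * 1e9:.2f} nm")  # -> 1.27 nm, i.e., dlam ~ 1.3 nm
```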

For the fluorescence imaging mode, it would have been unacceptable to impose a 1-2-nm bandpass filter, given how weak fluorescence signals tend to be. Instead, we used a 55-nm bandpass filter and took advantage of the chromatic aberrations in the objective lens. Thus, when the wavelength deviates from the design wavelength, $\lambda_{0}$, we not only get scaling, but also defocus. We can modify Eq. S4 to incorporate blurring due to defocus:

\begin{equation}
\mathit{psf}(\mathbf{r},\lambda)=\mathit{psf}\left(\frac{\lambda}{\lambda_{0}}\mathbf{r},\lambda_{0}\right)\otimes\mathit{defocus}\left(\mathbf{r},\frac{df}{d\lambda}(\lambda-\lambda_{0})\right),
\tag{S7}
\end{equation}

where $\mathit{defocus}(\mathbf{r},dz)$ is a 2D blur kernel due to a defocus of $dz$, and $df/d\lambda$ describes the focal shift per wavelength shift governing the axial chromatic aberration.

Applying Eq. S5 then results in a PSF that has a sharp in-focus component, corresponding to the power at wavelengths close to the nominal wavelength, $\lambda_{0}$, and heavy tails due to the defocused components at the other wavelengths. With sufficient axial chromatic aberration, the lateral resolution is less impacted – this principle has previously been applied for extended-DOF imaging [34]. However, there is still a tradeoff between total light throughput and background signal when spectrally filtering the fluorescence emission, with SNR being the ultimate metric of interest. That is, while increasing total light throughput improves SNR, increasing the background, although it can be digitally subtracted, introduces shot noise – the total noise is the quadrature sum of the signal shot noise, the background shot noise, and the detector noise. Empirically, we found that the signal enhancement from increased light throughput outweighed the deleterious effects of the background.
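A minimal sketch of this SNR bookkeeping (all quantities in photoelectrons; the Poisson shot-noise variances of the signal $S$ and background $B$ equal their means):

```python
import numpy as np

def snr(signal, background, read_noise):
    """SNR with shot-noise-limited signal and background plus detector
    (read) noise, combined in quadrature. Digital background subtraction
    removes the mean background but not its shot noise."""
    total_noise = np.sqrt(signal + background + read_noise**2)
    return signal / total_noise
```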

We have not included in Eqs. S4-S7 the effects of the spectrally-dependent diffraction efficiency of the DOE, which increases the power in the zero order as the wavelength deviates from $\lambda_{0}$, thereby modifying the effective $P(\lambda)$ for the nonzero orders. This effect could be compensated for in future designs by decreasing the zero-order contribution at $\lambda_{0}$, based on the expected fluorescence emission bandwidth. However, in practice, we found that the spectrally-dependent diffraction efficiency over the 55-nm bandwidth had minimal impact.

S5 L1 hyperparameter sweep


Figure S5: Reconstruction MSE as a function of ground-truth object density for different L1 regularization strengths.

Fig. S5 demonstrates how reconstruction MSE varies with object density and L1 regularization strength. For dense objects, weaker sparsity constraints (lower L1 regularization) yield better results. In contrast, relatively sparse objects exhibit an optimal L1 regularization strength, with performance degrading when the regularization is either too weak or too strong.

These results were obtained from simulated measurements. We generated ground-truth samples with randomly scattered dots at specified densities and simulated the measurements through the forward model. The analysis employed the direct method detailed in Sec. S3, assuming accurately known lens parameters.
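A minimal sketch of this sweep, with `forward` and `reconstruct` as hypothetical stand-ins for the direct forward model and the L1-regularized solver of Sec. S3 (both assumed given):

```python
import numpy as np

# forward() and reconstruct() are hypothetical stand-ins (see Sec. S3)
rng = np.random.default_rng(0)
densities = [1e-4, 1e-3, 1e-2]         # fraction of nonzero pixels
l1_weights = np.logspace(-4, -1, 7)    # L1 regularization strengths

for density in densities:
    # ground truth: randomly scattered dots at the specified density
    gt = (rng.random((512, 512)) < density).astype(float)
    meas = forward(gt)                 # simulated measurement
    for tau in l1_weights:
        recon = reconstruct(meas, l1_weight=tau)
        mse = np.mean((recon - gt) ** 2)
```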

S6 Separable distortion model

Fig. S6 illustrates the separable distortion model described in Sec. 2.3.


Figure S6: We model the spatially-varying PSF of our imaging system based on a separable radial distortion model, whereby the objective and tube lens each has its own radial distortion model (i.e., radially-varying magnifications: $M_{\mathit{obj}}(\mathbf{r})$ and $M_{\mathit{tube}}(\mathbf{r})$). In an ideal imaging system with unit magnification, a position at point $\mathbf{r}_p$ in the object plane is mapped to a collection of points, $-\mathbf{r}_p-\Delta\mathbf{r}_i$, comprising the PSF. The total system magnification is thus a product of the independent contributions from the objective ($M_{\mathit{obj}}(\mathbf{r})$), the tube lens ($M_{\mathit{tube}}(\mathbf{r})$), and the nominal magnification ($M_0$, given by the ratio of the focal lengths of the objective and tube lens).

S7 Resolution, depth of field, and field curvature characterization

As described in Sec. 3.1, to characterize resolution, DOF, and field curvature of the two imaging configurations, we took z-stacks of a USAF target centered at each of the sensors (after magnification by the imaging system) and computed the sharpness based on the mean square image gradient (Fig. S7c,d,g,h). In most cases, we could identify the focal position via the argmax of this sharpness metric across z. However, recognizing that at certain field positions, our imaging system exhibits astigmatism, we identified the circle of least confusion as the focal position by finding the local minimum flanked by two local maxima, corresponding to the two astigmatic foci (e.g., see rightmost column of Fig. S7c). The USAF images at the best-focus positions across all sensors are plotted in Figs. S8-S11. We removed any tilt between the lateral scan trajectory and the focal plane of the objective by fitting and subtracting the plane of best fit, yielding Fig. S7a,e. Finally, we computed the DOFs based on the FWHM of the sharpness curves (Fig. S7b,f).
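A sketch of this focus metric (mean square image gradient per z-slice, with best focus taken as the argmax in the non-astigmatic case):

```python
import numpy as np

def sharpness(img):
    """Mean square image gradient of one z-slice."""
    gy, gx = np.gradient(img.astype(float))
    return np.mean(gx**2 + gy**2)

def best_focus_index(zstack):
    """zstack: (nz, H, W). Returns the argmax of the sharpness curve;
    at astigmatic field positions we instead take the local minimum
    flanked by the two astigmatic maxima (circle of least confusion)."""
    curve = np.array([sharpness(im) for im in zstack])
    return int(np.argmax(curve)), curve
```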


Figure S7: Field curvature and depth of field (DOF) characterization for the darkfield (a-d) and fluorescence (e-h) imaging modes. (a,e) Focal shift at the centers of all 36 or 48 sensors. (b,f) Depth of field at the centers of all 36 or 48 sensors. (c,g) Sharpness as a function of depth along a central row of the sensor array, normalized to the maximum sharpness. (d,h) Normalized sharpness as a function of depth along a central column of the sensor array.


Figure S8: Characterization of resolution across all 6×8 sensors in the darkfield imaging configuration. Groups 6 and higher are shown. Zoom-ins of groups 8 and 9 are shown in Fig. S9.


Figure S9: Characterization of resolution across all 6×8 sensors in the darkfield imaging configuration. Groups 8 and 9 are shown.


Figure S10: Characterization of resolution across all 6×6 sensors in the fluorescence imaging configuration. Groups 6 and higher are shown. Zoom-ins of groups 8 and 9 are shown in Fig. S11.


Figure S11: Characterization of resolution across all 6×6 sensors in the fluorescence imaging configuration. Groups 8 and 9 are shown.

References

  • [1] J. Park, D. J. Brady, G. Zheng, et al., “Review of bio-optical imaging systems with a high space-bandwidth product,” Advanced Photonics 3, 044001 (2021).
  • [2] M. G. Hanna, A. Parwani, and S. J. Sirintrapun, “Whole slide imaging: technology and applications,” Advances in Anatomic Pathology 27, 251–259 (2020).
  • [3] B. Potsaid, Y. Bellouard, and J. T. Wen, “Adaptive scanning optical microscope (ASOM): a multidisciplinary optical microscope design for large field of view and high resolution imaging,” Optics Express 13, 6504–6518 (2005).
  • [4] R. Shi, X. Chen, J. Deng, et al., “Random-access wide-field mesoscopy for centimetre-scale imaging of biodynamics with subcellular resolution,” Nature Photonics 18, 721–730 (2024).
  • [5] G. Zheng, R. Horstmeyer, and C. Yang, “Wide-field, high-resolution Fourier ptychographic microscopy,” Nature Photonics 7, 739–745 (2013).
  • [6] P. C. Konda, L. Loetgering, K. C. Zhou, et al., “Fourier ptychography: current applications and future promises,” Optics Express 28, 9603–9630 (2020).
  • [7] G. Zheng, C. Shen, S. Jiang, et al., “Concept, implementations and applications of Fourier ptychography,” Nature Reviews Physics 3, 207–223 (2021).
  • [8] M. Lelek, M. T. Gyparaki, G. Beliu, et al., “Single-molecule localization microscopy,” Nature Reviews Methods Primers 1, 39 (2021).
  • [9] M. G. Gustafsson, “Surpassing the lateral resolution limit by a factor of two using structured illumination microscopy,” Journal of Microscopy 198, 82–87 (2000).
  • [10] L. Tian, X. Li, K. Ramchandran, and L. Waller, “Multiplexed coded illumination for Fourier ptychography with an LED array microscope,” Biomedical Optics Express 5, 2376–2389 (2014).
  • [11] L. Tian, Z. Liu, L.-H. Yeh, et al., “Computational illumination for high-speed in vitro Fourier ptychographic microscopy,” Optica 2, 904–911 (2015).
  • [12] R. Malik and K. Khare, “Single-shot extended field of view imaging using point spread function engineering,” JOSA A 40, 1066–1075 (2023).
  • [13] R. Horisaki and J. Tanida, “Multi-channel data acquisition using multiplexed imaging with spatial encoding,” Optics Express 18, 23041–23053 (2010).
  • [14] R. H. Shepard, Y. Rachlin, V. Shah, and T. Shih, “Design architectures for optically multiplexed imaging,” Optics Express 23, 31419–31435 (2015).
  • [15] X. Yao, V. Pathak, H. Xi, et al., “Increasing a microscope’s effective field of view via overlapped imaging and machine learning,” Optics Express 30, 1745–1761 (2022).
  • [16] R. Cao, N. S. Divekar, J. K. Nuñez, et al., “Neural space–time model for dynamic multi-shot imaging,” Nature Methods pp. 1–6 (2024).
  • [17] M. Kellman, M. Chen, Z. F. Phillips, et al., “Motion-resolved quantitative phase imaging,” Biomedical Optics Express 9, 5456–5466 (2018).
  • [18] J. Fan, J. Suo, J. Wu, et al., “Video-rate imaging of biological dynamics at centimetre scale and micrometre resolution,” Nature Photonics 13, 809–816 (2019).
  • [19] K. C. Zhou, M. Harfouche, C. L. Cooke, et al., “Parallelized computational 3D video microscopy of freely moving organisms at multiple gigapixels per second,” Nature Photonics 17, 442–450 (2023).
  • [20] M. Harfouche, K. Kim, K. C. Zhou, et al., “Imaging across multiple spatial scales with the multi-camera array microscope,” Optica 10, 471–480 (2023).
  • [21] R. W. Gerchberg, “A practical algorithm for the determination of phase from image and diffraction plane pictures,” Optik 35, 237–246 (1972).
  • [22] D. R. Albrecht and C. I. Bargmann, “High-content behavioral analysis of Caenorhabditis elegans in precise spatiotemporal chemical environments,” Nature Methods 8, 599–605 (2011).
  • [23] S. R. Lockery, K. J. Lawton, J. C. Doll, et al., “Artificial dirt: microfluidic substrates for nematode neurobiology and behavior,” Journal of Neurophysiology 99, 3136–3143 (2008).
  • [24] J. Liu, E. Bonnard, and M. Scholz, “Adapting and optimizing GCaMP8f for use in Caenorhabditis elegans,” Genetics 228, iyae125 (2024).
  • [25] J. Larsch, D. Ventimiglia, C. I. Bargmann, and D. R. Albrecht, “High-throughput imaging of neuronal activity in Caenorhabditis elegans,” Proceedings of the National Academy of Sciences 110, E4266–E4273 (2013).
  • [26] S. R. P. Pavani, M. A. Thompson, J. S. Biteen, et al., “Three-dimensional, single-molecule fluorescence imaging beyond the diffraction limit by using a double-helix point spread function,” Proceedings of the National Academy of Sciences 106, 2995–2999 (2009).
  • [27] F. Linda Liu, G. Kuo, N. Antipa, et al., “Fourier DiffuserScope: single-shot 3D Fourier light field microscopy with a diffuser,” Optics Express 28, 28969–28986 (2020).
  • [28] S. Abrahamsson, J. Chen, B. Hajj, et al., “Fast multicolor 3D imaging using aberration-corrected multifocus microscopy,” Nature Methods 10, 60–63 (2013).
  • [29] N. Antipa, G. Kuo, R. Heckel, et al., “DiffuserCam: lensless single-exposure 3D imaging,” Optica 5, 1–9 (2018).
  • [30] E. Zhao, N. Deshler, K. Monakhova, and L. Waller, “Multi-sensor lensless imaging: synthetic large-format sensing with a disjoint sensor array,” in Computational Optical Sensing and Imaging (Optica Publishing Group, 2020), paper CF2C.6.
  • [31] S. Brenner, “The genetics of Caenorhabditis elegans,” Genetics 77, 71–94 (1974).
  • [32] S. R. Lockery, K. J. Lawton, J. C. Doll, et al., “Artificial dirt: microfluidic substrates for nematode neurobiology and behavior,” Journal of Neurophysiology 99, 3136–3143 (2008).
  • [33] S. Park, H. Hwang, S.-W. Nam, et al., “Enhanced Caenorhabditis elegans locomotion in a structured microfluidic environment,” PLoS ONE 3, e2550 (2008).
  • [34] O. Cossairt and S. Nayar, “Spectral focal sweep: extended depth of field from chromatic aberrations,” in 2010 IEEE International Conference on Computational Photography (ICCP) (IEEE, 2010), pp. 1–8.