Abstract
Hyperspectral and high-speed imaging are both important for scene representation and understanding, yet simultaneously capturing hyperspectral and high-speed data remains under-explored. In this work, we propose a high-speed hyperspectral imaging system that integrates compressive sensing sampling with bio-inspired neuromorphic sampling. Our system couples a coded aperture snapshot spectral imager, which captures moderate-speed hyperspectral measurement frames, with a spike camera, which captures dense high-speed grayscale spike streams. The two cameras provide complementary dual-modality data for reconstructing high-speed hyperspectral videos (HSV). To effectively synergize the two sampling mechanisms and obtain high-quality HSV, we propose a unified multi-modal reconstruction framework, consisting of a Spike Spectral Prior Network for spike-based information extraction and prior regularization, coupled with a dual-modality iterative optimization algorithm for reliable reconstruction. Finally, we build a hardware prototype to verify the effectiveness of our system and algorithm design. Experiments on both simulated and real data demonstrate the superiority of the proposed approach: to the best of our knowledge, it is the first to capture high-speed HSV with 30 spectral bands at frame rates of up to 20,000 FPS.
Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
Notes
For example, a Phantom v2640 camera (maximum 18,120 FPS at 480p) measures 28 × 19 × 18.88 cm, weighs 8.1 kg, and costs more than $100,000. During imaging, it generates nearly 37 GB of data per second.
Please refer to the supplementary material for more details on why a DVS sensor is not selected for our work.
We further provide a detailed discussion in Sect. 5.5.1 and the supplementary material.
Different from the simulation, we do not include DBR and 3S in the comparison on real data, mainly due to their prohibitive time complexity (more than 10 h to reconstruct a 10-frame HSV clip).
Except for the first convolution layer, which has \(n_k=32\) input channels and takes \(n_k\) spike planes at once.
Please refer to the supplementary material for detailed theoretical analysis.
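The data rate quoted in the first note can be sanity-checked with a short back-of-the-envelope calculation. The sensor resolution and bit depth below are illustrative assumptions chosen only to show the scale of an uncompressed high-speed readout, not official Phantom v2640 specifications:

```python
def data_rate_gib_per_s(width: int, height: int, bits_per_pixel: int, fps: int) -> float:
    """Raw, uncompressed sensor readout rate in GiB/s."""
    bytes_per_frame = width * height * bits_per_pixel / 8
    return bytes_per_frame * fps / 2**30

# Illustrative (assumed) full-resolution readout: 2048x1952 pixels,
# 12-bit depth, 6,600 frames per second.
print(f"{data_rate_gib_per_s(2048, 1952, 12, 6600):.1f} GiB/s")  # → 36.9 GiB/s
```

Under these assumed parameters the rate lands near the ~37 GB/s cited above, illustrating why sustained conventional high-speed capture is so storage- and bandwidth-intensive.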
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grants No. 62027804, 62425101, 62332002, 62088102, 62302041, and 62322204.
Additional information
Communicated by Dengxin Dai.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below are the links to the electronic supplementary material.
Supplementary file 2 (mp4 18399 KB)
Supplementary file 3 (mp4 10837 KB)
Supplementary file 4 (mp4 22259 KB)
Supplementary file 5 (mp4 26460 KB)
Supplementary file 6 (mp4 11686 KB)
Supplementary file 7 (mp4 33746 KB)
Supplementary file 8 (mp4 5927 KB)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Geng, M., Wang, L., Zhu, L. et al. Towards Ultra High-Speed Hyperspectral Imaging by Integrating Compressive and Neuromorphic Sampling. Int J Comput Vis 133, 1587–1610 (2025). https://doi.org/10.1007/s11263-024-02236-y