Abstract
Deep learning has made significant progress in event-driven applications. However, to match standard vision networks, most approaches aggregate events into grid-like representations, which obscures crucial temporal information and limits overall performance. To address this issue, we propose a novel event representation called compressed event sensing (CES) volumes. CES volumes preserve the high temporal resolution of event streams by leveraging the sparsity of events and the principles of compressed sensing theory. They effectively capture the frequency characteristics of events in low-dimensional representations, which can be accurately decoded back to the raw high-dimensional event signals. In addition, our theoretical analysis shows that, when integrated with a neural network, CES volumes demonstrate greater expressive power under the neural tangent kernel approximation. Through validation on a synthetic dense frame regression task and two downstream applications, intensity-image reconstruction and object recognition, we demonstrate the superior performance of CES volumes compared with state-of-the-art event representations.
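To make the encode/decode idea tangible, here is a minimal compressed-sensing sketch in Python (our illustration only, not the paper's CES construction: the Gaussian sensing matrix Psi, the signal sizes, and the basis-pursuit-via-linear-programming solver are all assumptions). A sparse temporal signal is compressed to a low-dimensional measurement and then recovered by \(\ell _1\) minimization:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

n, m, s = 64, 24, 3          # signal length, measurements, sparsity
x = np.zeros(n)              # s-sparse "event" signal along time
x[rng.choice(n, s, replace=False)] = rng.normal(size=s)

Psi = rng.normal(size=(n, m)) / np.sqrt(m)   # random Gaussian sensing matrix
y = Psi.T @ x                                # low-dimensional representation

# Basis pursuit: min ||x||_1  s.t.  Psi^T x = y, posed as a linear
# program with the split x = u - v, u >= 0, v >= 0.
A = Psi.T
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]

print("max reconstruction error:", np.abs(x_hat - x).max())
```

With far fewer measurements than signal samples (here 24 versus 64), the sparse signal is recovered essentially exactly, which is the property CES volumes exploit to keep temporal detail in a compact representation.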
Data Availability Statement
This work does not propose a new dataset. All the datasets we used are publicly available.
References
Alonso, I., & Murillo, A. C. (2019). Ev-segnet: Semantic segmentation for event-based cameras. In: CVPRW. https://doi.org/10.1109/cvprw.2019.00205
Arora, S., Du, S. S., Hu, W., Li, Z., Salakhutdinov, R. R., & Wang, R. (2019). On exact computation with an infinitely wide neural net. Advances in Neural Information Processing Systems, 32.
Bajwa, W. U., Haupt, J., Sayeed, A. M., & Nowak, R. (2010). Compressed channel sensing: A new approach to estimating sparse multipath channels. Proceedings of the IEEE, 98(6), 1058–1076. https://doi.org/10.1109/jproc.2010.2042415
Baldwin, R., Liu, R., Almatrafi, M. M., Asari, V. K., & Hirakawa, K. (2022). Time-ordered recent event (tore) volumes for event cameras. TPAMI.https://doi.org/10.1109/tpami.2022.3172212
Basarab, A., Liebgott, H., Bernard, O., Friboulet, D., & Kouamé, D. (2013). Medical ultrasound image reconstruction using distributed compressive sampling. In: International symposium on biomedical imaging, pp. 628–631. IEEE. https://doi.org/10.1109/isbi.2013.6556553.
Bi, Y., Chadha, A., Abbas, A., Bourtsoulatze, E., & Andreopoulos, Y. (2019). Graph-based object classification for neuromorphic vision sensing. In: ICCV, pp. 491–501. https://doi.org/10.1109/iccv.2019.00058
Bietti, A., & Mairal, J. (2019). On the inductive bias of neural tangent kernels. Advances in Neural Information Processing Systems, 32.
Candès, E. J., Romberg, J., & Tao, T. (2006). Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. Transactions on Information Theory, 52(2), 489–509. https://doi.org/10.1109/tit.2005.862083
Candes, E. J., Wakin, M. B., & Boyd, S. P. (2008). Enhancing sparsity by reweighted \(\ell _1\) minimization. Journal of Fourier Analysis and Applications, 14(5), 877–905. https://doi.org/10.1007/s00041-008-9045-x
Carvalho, L., Costa, J. L., Mourão, J., & Oliveira, G. (2024). The positivity of the neural tangent kernel. arXiv preprint arXiv:2404.12928.
Chen, L., & Xu, S. (2020). Deep neural tangent kernel and laplace kernel have the same rkhs. arXiv preprint arXiv:2009.10683.
Chen, Z., Cao, Y., Quanquan, G., & Zhang, T. (2020). A generalized neural tangent kernel analysis for two-layer neural networks. Advances in Neural Information Processing Systems, 33, 13363–13373.
Donoho, D. L. (2006). Compressed sensing. IEEE Transactions on Information Theory, 52(4), 1289–1306. https://doi.org/10.1109/TIT.2006.871582
Eldar, Y. C., & Kutyniok, G. (2012). Compressed sensing: theory and applications. Cambridge University Press.
Fei-Fei, L., Fergus, R., & Perona, P. (2006). One-shot learning of object categories. TPAMI, 28(4), 594–611. https://doi.org/10.1109/tpami.2006.79
Foucart, S., & Rauhut, H. (2013). Restricted isometry property (pp. 133–174). Springer, New York, NY.
Gallego, G., Delbrück, T., Orchard, G., Bartolozzi, C., Taba, B., Censi, A., Leutenegger, S., Davison, A. J., Conradt, J., & Daniilidis, K., et al. (2020). Event-based vision: A survey. TPAMI, 44(1), 154–180. https://doi.org/10.1109/TPAMI.2020.3008413
Gehrig, D., Gehrig, M., Hidalgo-Carrió, J., & Scaramuzza, D. (2020). Video to events: Recycling video datasets for event cameras. In: CVPR, pp. 3586–3595. https://doi.org/10.1109/cvpr42600.2020.00364
Gehrig, D., Loquercio, A., Derpanis, K. G., & Scaramuzza, D. (2019). End-to-end learning of representations for asynchronous event-based data. In: ICCV, pp. 5633–5643. https://doi.org/10.1109/iccv.2019.00573
Gehrig, M., Shrestha, S. B., Mouritzen, D., & Scaramuzza, D. (2020). Event-based angular velocity regression with spiking networks. In: ICRA, pp. 4195–4202. IEEE, https://doi.org/10.1109/icra40945.2020.9197133.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In: CVPR, pp. 770–778. https://doi.org/10.1109/cvpr.2016.90
Hendrycks, D., & Gimpel, K. (2016). Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415. https://doi.org/10.48550/arXiv.1606.08415.
Huh, D., & Sejnowski, T. J. (2018). Gradient descent for spiking neural networks. In: NeurIPS, 31. https://doi.org/10.48550/arXiv.1706.04698.
Jacot, A., Gabriel, F., & Hongler, C. (2018). Neural tangent kernel: Convergence and generalization in neural networks. NeurIPS, 31.
Jiang, Z., Zhang, Y., Zou, D., Ren, J., Lv, J., & Liu, Y. (2020) Learning event-based motion deblurring. In: CVPR, pp. 3320–3329. https://doi.org/10.1109/cvpr42600.2020.00338.
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. ICLR. https://doi.org/10.48550/arXiv.1412.6980.
Lagorce, X., Orchard, G., Galluppi, F., Shi, B. E., & Benosman, R. B. (2016). Hots: a hierarchy of event-based time-surfaces for pattern recognition. TPAMI, 39(7), 1346–1359. https://doi.org/10.1109/tpami.2016.2574707
Lee, J. H., Delbruck, T., & Pfeiffer, M. (2016). Training deep spiking neural networks using backpropagation. Frontiers in Neuroscience, 10, 508. https://doi.org/10.3389/fnins.2016.00508.
Lin, S., Zhang, J., Pan, J., Jiang, Z., Zou, D., Wang, Y., Chen, J., & Ren, J. (2020). Learning event-driven video deblurring and interpolation. In: ECCV, pp. 695–710. Springer. https://doi.org/10.1007/978-3-030-58598-3_41.
Liu, C., & Hui, L. (2023). Relu soothes the ntk condition number and accelerates optimization for wide neural networks. arXiv preprint arXiv:2305.08813.
Maqueda, A. I., Loquercio, A., Gallego, G., García, N., & Scaramuzza, D. (2018). Event-based vision meets deep learning on steering prediction for self-driving cars. In: CVPR, pp. 5419–5427. https://doi.org/10.1109/cvpr.2018.00568.
Mitrokhin, A., Fermüller, C., Parameshwara, C., & Aloimonos, Y. (2018). Event-based moving object detection and tracking. In: IROS, pp. 1–9. IEEE. https://doi.org/10.1109/iros.2018.8593805.
Mitrokhin, A., Ye, C., Fermüller, C., Aloimonos, Y., & Delbruck, T. (2019). Ev-imo: Motion segmentation dataset and learning pipeline for event cameras. In: IROS, pp. 6105–6112. IEEE. https://doi.org/10.1109/iros40897.2019.8968520.
Mohtashemi, M., Smith, H., Walburger, D., Sutton, F., & Diggans, J. (2010). Sparse sensing dna microarray-based biosensor: Is it feasible? In: IEEE sensors applications symposium, pp. 127–130. IEEE. https://doi.org/10.1109/sas.2010.5439412.
Mueggler, E., Rebecq, H., Gallego, G., Delbruck, T., & Scaramuzza, D. (2017). The event-camera dataset and simulator: Event-based data for pose estimation, visual odometry, and slam. The International Journal of Robotics Research, 36(2), 142–149. https://doi.org/10.1177/0278364917691115
Neil, D., Pfeiffer, M., & Liu, S.-C. (2016). Phased lstm: Accelerating recurrent network training for long or event-based sequences. NeurIPS, 29. https://doi.org/10.48550/arXiv.1610.09513.
Nguyen, T. L. N., & Shin, Y. (2013). Deterministic sensing matrices in compressive sensing: A survey. The Scientific World Journal, 2013. https://doi.org/10.1155/2013/192795.
Nyquist, H. (1928). Certain topics in telegraph transmission theory. Transactions of the American Institute of Electrical Engineers, 47(2), 617–644. https://doi.org/10.1109/5.989875
Orchard, G., Jayawant, A., Cohen, G. K., & Thakor, N. (2015). Converting static image datasets to spiking neuromorphic datasets using saccades. Frontiers in Neuroscience, 9, 437. https://doi.org/10.3389/fnins.2015.00437.
Orchard, G., Meyer, C., Etienne-Cummings, R., Posch, C., Thakor, N., & Benosman, R. (2015). Hfirst: A temporal approach to object recognition. TPAMI, 37(10), 2028–2040. https://doi.org/10.1109/tpami.2015.2392947
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., & Lerer, A. (2017). Automatic differentiation in pytorch. In: NIPS Autodiff Workshop.
Posch, C., Matolin, D., & Wohlgenannt, R. (2010). A qvga 143 db dynamic range frame-free pwm image sensor with lossless pixel-level video compression and time-domain cds. IEEE Journal of Solid-State Circuits, 46(1), 259–275. https://doi.org/10.1109/jssc.2010.2085952
Rebecq, H., Gehrig, D., & Scaramuzza, D. (2018). ESIM: An open event camera simulator. In: CoRL.
Rebecq, H., Horstschaefer, T., & Scaramuzza, D. (2017). Real-time visual-inertial odometry for event cameras using keyframe-based nonlinear optimization. In: BMVC. https://doi.org/10.5244/c.31.16
Rebecq, H., Ranftl, R., Koltun, V., & Scaramuzza, D. (2019). High speed and high dynamic range video with an event camera. TPAMI, 43(6), 1964–1980. https://doi.org/10.1109/tpami.2019.2963386
Saitoh, S., Sawano, Y., et al. (2016). Theory of reproducing kernels and applications. Springer.
Schaefer, S., Gehrig, D., & Scaramuzza, D. (2022). Aegnn: Asynchronous event-based graph neural networks. In: CVPR, pp. 12371–12381. https://doi.org/10.1109/cvpr52688.2022.01205.
Scheerlinck, C., Barnes, N., & Mahony, R. (2018). Continuous-time intensity estimation using event cameras. In: ACCV, pp. 308–324. Springer. https://doi.org/10.1007/978-3-030-20873-8_20.
Schölkopf, B., & Smola, A. J. (2002). Learning with kernels: Support vector machines, regularization, optimization, and beyond. MIT Press.
Seeger, M. (2004). Gaussian processes for machine learning. International Journal of Neural Systems, 14(02), 69–106.
Sekikawa, Y., Hara, K., & Saito, H. (2019). Eventnet: Asynchronous recursive event processing. In: CVPR, pp. 3887–3896. https://doi.org/10.1109/cvpr.2019.00401.
Sironi, A., Brambilla, M., Bourdis, N., Lagorce, X., & Benosman, R. (2018). Hats: Histograms of averaged time surfaces for robust event-based object classification. In: CVPR, pp. 1731–1740. https://doi.org/10.1109/cvpr.2018.00186.
Tancik, M., Srinivasan, P., Mildenhall, B., Fridovich-Keil, S., Raghavan, N., Singhal, U., Ramamoorthi, R., Barron, J., & Ng, R. (2020). Fourier features let networks learn high frequency functions in low dimensional domains. NeurIPS, 33, 7537–7547. https://doi.org/10.1109/mmul.2021.3053698
Wainwright, M. J. (2019). High-dimensional statistics: A non-asymptotic viewpoint, vol. 48. Cambridge university press.
Wang, L., Ho, Y.-S., Yoon, K.-J., et al. (2019). Event-based high dynamic range image and very high frame rate video generation using conditional generative adversarial networks. In: CVPR, pp. 10081–10090. https://doi.org/10.1109/cvpr.2019.01032.
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. TIP, 13(4), 600–612. https://doi.org/10.1109/tip.2003.819861.
Yang, J., Zhang, Q., Ni, B., Li, L., Liu, J., Zhou, M., & Tian, Q. (2019). Modeling point clouds with self-attention and gumbel subset sampling. In: CVPR, pp. 3323–3332. https://doi.org/10.1109/cvpr.2019.00344.
Zhang, H., Chen, X.-H., & Xin-Min, W. (2013). Seismic data reconstruction based on cs and fourier theory. Applied Geophysics, 10(2), 170–180. https://doi.org/10.1007/s11770-013-0375-3
Zhang, R., Isola, P., Efros, A. A., Shechtman, E., & Wang, O. (2018). The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR, pp. 586–595. https://doi.org/10.1109/cvpr.2018.00068.
Zhang, S., Zhang, Y., Jiang, Z., Zou, D., Ren, J., & Zhou, B. (2020). Learning to see in the dark with events. In: ECCV, pp. 666–682. Springer. https://doi.org/10.1007/978-3-030-58523-5_39.
Zhao, B., Ding, R., Chen, S., Linares-Barranco, B., & Tang, H. (2014). Feedforward categorization on aer motion events using cortex-like features in a spiking neural network. TNNLS, 26(9), 1963–1978. https://doi.org/10.1109/tnnls.2014.2362542
Zhu, A. Z., & Yuan, L. (2018). Ev-flownet: Self-supervised optical flow estimation for event-based cameras. In: Robotics: Science and Systems. https://doi.org/10.15607/rss.2018.xiv.062.
Zhu, A. Z., Yuan, L., Chaney, K., & Daniilidis, K. (2019). Unsupervised event-based learning of optical flow, depth, and egomotion. In: CVPR, pp. 989–997. https://doi.org/10.1109/cvpr.2019.00108
Funding
This work was supported in part by the Ministry of Education, Republic of Singapore, through its Start-Up Grant and Academic Research Fund Tier 1 (RG61/22).
Author information
Authors and Affiliations
Contributions
Songnan Lin: Conceptualization, Methodology, Software, Writing - original draft preparation; Ye Ma: Methodology, Software, Writing - review and editing; Jing Chen: Writing - review and editing; Bihan Wen: Conceptualization, Writing - review and editing, Supervision.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Code availability
The code of this work will be released after acceptance.
Additional information
Communicated by Yasuyuki Matsushita.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Proof of Theorem 1
Theorem 1
Given a non-zero distinct s-sparse dataset \({\mathfrak {X}}=\{\vec {x_i}\}_{i=1}^I\), let \({\mathfrak {H}}_a\) and \({\mathfrak {H}}_b\) be the RKHSs associated with the NTKs of the same-architecture fully-connected network with \(\Psi _a^T\vec {x}\) and \(\Psi _b^T\vec {x}\) as input, respectively, where \(\Psi _a\) holds the non-degenerate property while \(\Psi _b\) does not. Then the following strict subset inclusion holds:
$$\begin{aligned} {\mathfrak {H}}_b \subsetneq {\mathfrak {H}}_a. \end{aligned}$$
We first introduce two key ingredients of the proof:
Lemma 1
(Theorem 2.17 in Saitoh et al. (2016)) Let \(K_a, K_b: E \times E \rightarrow {\mathbb {C}}\) be two positive semi-definite kernels. Then the following two statements are equivalent:
1. The Hilbert space \({\mathfrak {H}}_{b}\) is a subset of \({\mathfrak {H}}_{a}\).
2. There exists \(\gamma > 0\) such that
$$\begin{aligned} K_b \preceq \gamma ^2 K_a. \end{aligned}$$(17)
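For intuition (an illustrative remark we add here, not part of the cited theorem): if the two kernels differ only by a positive scale, \(K_b = cK_a\) with \(c > 0\), then domination holds in both directions,
$$\begin{aligned} K_b = cK_a \preceq \gamma ^2 K_a \ \ \text {for } \gamma ^2 \ge c, \qquad K_a = \tfrac{1}{c}K_b \preceq {\gamma '}^2 K_b \ \ \text {for } {\gamma '}^2 \ge \tfrac{1}{c}, \end{aligned}$$
so \({\mathfrak {H}}_a = {\mathfrak {H}}_b\) as sets; rescaling changes the RKHS norm but not its membership. A strict inclusion therefore requires domination that holds in one direction only, which is exactly what the proof below establishes.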
Lemma 2
(Proposition 2 in Jacot et al. (2018), Theorem 6 in Carvalho et al. (2024)) For a fully-connected network with a non-polynomial Lipschitz activation function \(\sigma \) and any input dimension \(n_0\), the limiting NTK is strictly positive definite if the number of layers satisfies \(L \ge 2\).
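The lemma concerns the infinite-width limit, but the phenomenon is easy to observe numerically. The sketch below is our illustration only: the two-layer ReLU network \(f(x) = m^{-1/2}\,a^\top \sigma (Wx)\), the width, and the random unit-norm inputs are all assumptions. It computes the empirical NTK matrix on distinct inputs and checks that its eigenvalues are strictly positive:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n0, r = 4096, 8, 5                 # width, input dimension, number of inputs

X = rng.normal(size=(r, n0))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # distinct unit-norm inputs
W = rng.normal(size=(m, n0))                    # hidden weights ~ N(0, 1)
a = rng.normal(size=m)                          # output weights ~ N(0, 1)

def ntk_entry(x, xp):
    """Empirical NTK <grad_theta f(x), grad_theta f(x')> for
    f(x) = a @ relu(W @ x) / sqrt(m), summed over all parameters."""
    zx, zxp = W @ x, W @ xp
    grad_w = (x @ xp) * np.sum(a**2 * (zx > 0) * (zxp > 0)) / m  # w.r.t. W
    grad_a = np.sum(np.maximum(zx, 0) * np.maximum(zxp, 0)) / m  # w.r.t. a
    return grad_w + grad_a

K = np.array([[ntk_entry(xi, xj) for xj in X] for xi in X])
print("eigenvalues:", np.linalg.eigvalsh(K))  # all strictly positive
```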
Proof of Theorem 1
According to Lemma 1, to obtain \({\mathfrak {H}}_b \subsetneq {\mathfrak {H}}_a\), we need to show that \(\gamma ^2 K_a - K_b\) is a positive semidefinite kernel for some \(\gamma > 0\), whereas \(\gamma ^2 K_b - K_a\) is not a positive semidefinite kernel for any \(\gamma > 0\).
Consider an arbitrary non-empty subset \(\{\vec {x_i}\}_{i=1}^r \subseteq {\mathfrak {X}}\) with \(1 \le r \le I\). For the kernels \(K_a\) and \(K_b\), the corresponding \(r \times r\) NTK matrices \({\textbf{K}}_a\) and \({\textbf{K}}_b\) can be constructed with entries
$$\begin{aligned}{}[{\textbf{K}}_a]_{ij} = K_a(\vec {x_i}, \vec {x_j}), \qquad [{\textbf{K}}_b]_{ij} = K_b(\vec {x_i}, \vec {x_j}). \end{aligned}$$
As introduced in the proposed NTK model, deep learning methods first represent events using a sensing matrix \(\Psi \) and then feed the representation into a neural network g. We denote the NTK of the network g by \(K_g(\cdot , \cdot )\). Therefore, for the two sensing matrices \(\Psi _a\) and \(\Psi _b\), the NTKs of the whole networks can be represented as
$$\begin{aligned} K_a(\vec {x_i}, \vec {x_j}) = K_g(\Psi _a^T \vec {x_i}, \Psi _a^T \vec {x_j}), \qquad K_b(\vec {x_i}, \vec {x_j}) = K_g(\Psi _b^T \vec {x_i}, \Psi _b^T \vec {x_j}). \end{aligned}$$
According to Lemma 2, when we adopt the same network settings as in Jacot et al. (2018), the NTK \(K_g\) of the network g is strictly positive definite for distinct network inputs.
Since \(\Psi _a\) holds the non-degenerate property described in Equation (11), the compressed representations \(\Psi _a^T \vec {x_i}\) are distinct. Therefore, the NTK matrix \({\textbf{K}}_a\) is positive definite, with eigenvalues \(\lambda _a^i > 0\). In contrast, \(\Psi _b\) does not hold this property; that is, there may exist “degenerate” vector pairs \(\vec {x_i}\) and \(\vec {x_j}\) such that \(\Psi _b^T \vec {x_i} = \Psi _b^T \vec {x_j}\), leading to identical values in the \(i\)-th and \(j\)-th rows of the NTK matrix \({\textbf{K}}_b\). Thus, its eigenvalues only satisfy \(\lambda _b^i \ge 0\).
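To make the degenerate case concrete, the toy sketch below is entirely our illustration: the dimensions, the duplicated-row construction of \(\Psi _b\), and a Gaussian kernel standing in for the NTK are all assumptions. Two distinct 1-sparse inputs collapse to the same compressed representation, which forces duplicate rows and a zero eigenvalue in any kernel matrix built on top of them:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 4

# Degenerate sensing matrix: rows 0 and 1 coincide, so Psi_b^T annihilates
# the difference e_0 - e_1 of two sparse inputs.
Psi_b = rng.normal(size=(n, m))
Psi_b[1] = Psi_b[0]

x_i = np.zeros(n); x_i[0] = 1.0       # two distinct 1-sparse signals
x_j = np.zeros(n); x_j[1] = 1.0

print(np.allclose(Psi_b.T @ x_i, Psi_b.T @ x_j))  # True: representations collide

# The collision forces identical rows in any kernel matrix on the
# compressed inputs, e.g. a Gaussian kernel:
Z = np.stack([Psi_b.T @ x_i, Psi_b.T @ x_j])
K_b = np.exp(-((Z[:, None] - Z[None]) ** 2).sum(-1))
print(np.linalg.eigvalsh(K_b))        # contains a zero eigenvalue
```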
Therefore, for each non-empty subset \(\{\vec {x_i}\}_{i=1}^r\) of \({\mathfrak {X}}\), \(\gamma ^2 {\textbf{K}}_a - {\textbf{K}}_b\) is positive semidefinite when
$$\begin{aligned} \gamma ^2 \ge \frac{\max _i \lambda _b^i}{\min _i \lambda _a^i}. \end{aligned}$$
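One way to verify this threshold (a standard eigenvalue bound, supplied here for completeness): bounding each symmetric matrix by its extreme eigenvalues gives
$$\begin{aligned} \gamma ^2 {\textbf{K}}_a - {\textbf{K}}_b \succeq \left( \gamma ^2 \min _i \lambda _a^i - \max _i \lambda _b^i\right) {\textbf{I}} \succeq 0, \end{aligned}$$
which is well defined because \({\textbf{K}}_a\) is positive definite, so \(\min _i \lambda _a^i > 0\).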
Here, let \(\gamma _{max}\) be the maximum of such \(\gamma \) over all the non-empty subsets of \({\mathfrak {X}}\). Based on the definition of positive semidefinite kernels (see Definition 12.6 in Wainwright (2019)), \(\gamma _{max}^2 K_a - K_b\) is a positive semidefinite kernel on \({\mathfrak {X}}\). This enables us to apply Lemma 1 to obtain
$$\begin{aligned} {\mathfrak {H}}_b \subseteq {\mathfrak {H}}_a. \end{aligned}$$
Conversely, since the eigenvalues of \({\textbf{K}}_b\) contain zero in the degenerate case, \(\gamma ^2 {\textbf{K}}_b - {\textbf{K}}_a\) is not positive semidefinite for any \(\gamma > 0\): for an eigenvector \(\vec {v}\) of \({\textbf{K}}_b\) with zero eigenvalue, \(\vec {v}^T(\gamma ^2 {\textbf{K}}_b - {\textbf{K}}_a)\vec {v} = -\vec {v}^T {\textbf{K}}_a \vec {v} < 0\). Thus, the kernel function \(\gamma ^2 K_b - K_a\) is not positive semidefinite for any \(\gamma > 0\), and by Lemma 1, \({\mathfrak {H}}_a\) is not a subset of \({\mathfrak {H}}_b\). Combined with \({\mathfrak {H}}_b \subseteq {\mathfrak {H}}_a\), this yields \({\mathfrak {H}}_b \subsetneq {\mathfrak {H}}_a\), thereby concluding the proof. \(\square \)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lin, S., Ma, Y., Chen, J. et al. Compressed Event Sensing (CES) Volumes for Event Cameras. Int J Comput Vis 133, 435–455 (2025). https://doi.org/10.1007/s11263-024-02197-2