Abstract
CEDFlow introduced a latent contour enhancement method for dark optical flow estimation and achieved strong performance. However, it addresses motion boundaries largely in a local manner and falls short when facing significant variations or large-scale degraded scenes. This paper introduces CEDFlow++, which features three new modules that address these key limitations of CEDFlow. First, we introduce a decomposition-based feature encoder (DBFE) that captures both fine-grained and large-scale features through a local encoder and a specially designed sparse attention-based global encoder, which suppresses the noise and interference that arise only in the dark. Second, for reliable motion analysis, we propose a customized dual cost-volume reasoning (DCVR) module that integrates the important high-contrast feature correlations of the global cost volume into the local cost volume, effectively capturing salient yet holistic motion information while mitigating the motion ambiguity caused by darkness. Third, we present a contour-guided attention (CGA) module that enables context-adaptive extraction of contour features by modifying the sign properties of the Sobel kernel parameters in latent space, specifically targeting the large-scale contours suited to motion boundaries. Experimental results on the FCDN and VBOF datasets show that CEDFlow++ outperforms state-of-the-art methods in terms of EPE and produces more accurate and robust optical flow.
Data Availability
The data that support the findings of this study (the FCDN and VBOF datasets) are openly available at https://github.com/mf-zhang/Optical-Flow-in-the-Dark.
References
Bayramli, B., Hur, J., & Lu, H. (2023). Raft-msf: Self-supervised monocular scene flow using recurrent optimizer. IJCV, 131(11), 2757–2769.
Butler, D.J., Wulff, J., Stanley, G.B., & Black, M.J. (2012). A naturalistic open source movie for optical flow evaluation. In: ECCV, pp. 611–625. Springer
Cai, Y., Bian, H., Lin, J., Wang, H., Timofte, R., & Zhang, Y. (2023). Retinexformer: One-stage retinex-based transformer for low-light image enhancement. In: ICCV, pp. 12504–12513
Cai, B., Xu, X., Guo, K., Jia, K., Hu, B., & Tao, D. (2017). A joint intrinsic-extrinsic prior model for retinex. In: ICCV, pp. 4000–4009
Cao, B., Sun, Y., Zhu, P., & Hu, Q. (2023). Multi-modal gated mixture of local-to-global experts for dynamic image fusion. In: ICCV, pp. 23555–23564
Chi, C., Hao, T., Wang, Q., Guo, P., & Yang, X. (2022). Subspace-pnp: A geometric constraint loss for mutual assistance of depth and optical flow estimation. IJCV, 130(12), 3054–3069.
Chobola, T., Liu, Y., Zhang, H., Schnabel, J.A., & Peng, T. (2024). Fast context-based low-light image enhancement via neural implicit representations. In: ECCV, pp. 413–430 . Springer
Conde, M. V., Vazquez-Corral, J., Brown, M. S., & Timofte, R. (2024). Nilut: Conditional neural implicit 3d lookup tables for image enhancement. AAAI, 38, 1371–1379.
Dong, Q., Cao, C., & Fu, Y. (2023). Rethinking optical flow from geometric matching consistent perspective. In: CVPR, pp. 1337–1347
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., Van Der Smagt, P., Cremers, D., & Brox, T. (2015). Flownet: Learning optical flow with convolutional networks. In: ICCV, pp. 2758–2766
Geiger, A., Lenz, P., & Urtasun, R. (2012). Are we ready for autonomous driving? the kitti vision benchmark suite. In: CVPR, pp. 3354–3361. IEEE
Guo, X., Li, Y., & Ling, H. (2016). Lime: Low-light image enhancement via illumination map estimation. IEEE TIP, 26(2), 982–993.
Huang, Z., Shi, X., Zhang, C., Wang, Q., Cheung, K.C., Qin, H., Dai, J., & Li, H. (2022). Flowformer: A transformer architecture for optical flow. In: ECCV, pp. 668–685 . Springer
Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., & Brox, T. (2017). Flownet 2.0: Evolution of optical flow estimation with deep networks. In: CVPR, pp. 2462–2470
Jiang, S., Campbell, D., Lu, Y., Li, H., & Hartley, R. (2021). Learning to estimate hidden motions with global motion aggregation. In: ICCV, pp. 9772–9781
Jiang, S., Lu, Y., Li, H., & Hartley, R. (2021). Learning optical flow from a few matches. In: CVPR, pp. 16592–16600
Li, H., Luo, K., & Liu, S. (2021). Gyroflow: Gyroscope-guided unsupervised optical flow learning. In: ICCV, pp. 12869–12878
Li, H., Luo, K., Zeng, B., & Liu, S. (2024). Gyroflow+: Gyroscope-guided unsupervised deep homography and optical flow learning. IJCV, 132(6), 2331–2349.
Lin, Z., Liang, T., Xiao, T., Wang, Y., & Yang, M.-H. (2024). Flownas: neural architecture search for optical flow estimation. IJCV, 132(4), 1055–1074.
Loshchilov, I., & Hutter, F. (2017). Fixing weight decay regularization in Adam. arXiv preprint arXiv:1711.05101
Luo, A., Li, X., Yang, F., Liu, J., Fan, H., & Liu, S. (2024). Flowdiffuser: Advancing optical flow estimation with diffusion models. In: CVPR, pp. 19167–19176
Luo, K., Wang, C., Liu, S., Fan, H., Wang, J., & Sun, J. (2021). Upflow: Upsampling pyramid for unsupervised optical flow learning. In: CVPR, pp. 1045–1054
Luo, A., Yang, F., Li, X., & Liu, S. (2022). Learning optical flow with kernel patch attention. In: CVPR, pp. 8906–8915
Luo, A., Yang, F., Li, X., Nie, L., Lin, C., Fan, H., & Liu, S. (2023). Gaflow: Incorporating gaussian attention into optical flow. In: ICCV, pp. 9642–9651
Luo, A., Yang, F., Luo, K., Li, X., Fan, H., & Liu, S. (2022). Learning optical flow with adaptive graph reasoning. AAAI, 36, 1890–1898.
Menze, M., & Geiger, A. (2015). Object scene flow for autonomous vehicles. In: CVPR, pp. 3061–3070
Menze, M., Heipke, C., & Geiger, A. (2015). Joint 3d estimation of vehicles and scene flow. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2, 427–434.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., & Lerer, A. (2017). Automatic differentiation in PyTorch.
Ren, S., Zhou, D., He, S., Feng, J., & Wang, X. (2022). Shunted self-attention via multi-scale token aggregation. In: CVPR, pp. 10853–10862
Ren, Z., Luo, W., Yan, J., Liao, W., Yang, X., Yuille, A., & Zha, H. (2020). Stflow: Self-taught optical flow estimation using pseudo labels. IEEE TIP, 29, 9113–9124.
Shi, X., Huang, Z., Li, D., Zhang, M., Cheung, K.C., See, S., Qin, H., Dai, J., & Li, H. (2023). Flowformer++: Masked cost volume autoencoding for pretraining optical flow estimation. In: CVPR, pp. 1599–1610
Shi, Y., Liu, D., Zhang, L., Tian, Y., Xia, X., & Fu, X. (2024). Zero-ig: zero-shot illumination-guided joint denoising and adaptive enhancement for low-light images. In: CVPR, pp. 3015–3024
Smith, L.N., & Topin, N. (2019). Super-convergence: Very fast training of neural networks using large learning rates. In: Artificial Intelligence and Machine Learning for Multi-domain Operations Applications, vol. 11006, pp. 369–386. SPIE
Sui, X., Li, S., Geng, X., Wu, Y., Xu, X., Liu, Y., Goh, R., & Zhu, H. (2022). Craft: Cross-attentional flow transformer for robust optical flow. In: CVPR, pp. 17602–17611
Sun, D., Yang, X., Liu, M.-Y., & Kautz, J. (2018). Pwc-net: Cnns for optical flow using pyramid, warping, and cost volume. In: CVPR, pp. 8934–8943
Sun, S., Chen, Y., Zhu, Y., Guo, G., & Li, G. (2022). Skflow: Learning optical flow with super kernels. NeurIPS, 35, 11313–11326.
Teed, Z., & Deng, J. (2020). Raft: Recurrent all-pairs field transforms for optical flow. In: ECCV, pp. 402–419. Springer
Wang, R., Xu, X., Fu, C.-W., Lu, J., Yu, B., & Jia, J. (2021). Seeing dynamic scene in the dark: A high-quality video dataset with mechatronic alignment. In: ICCV, pp. 9700–9709
Wang, W., Yang, H., Fu, J., & Liu, J. (2024). Zero-reference low-light enhancement via physical quadruple priors. In: CVPR, pp. 26057–26066
Wang, Y., Yu, Y., Yang, W., Guo, L., Chau, L.-P., Kot, A.C., & Wen, B. (2023). Exposurediffusion: Learning to expose for low-light image enhancement. In: ICCV, pp. 12438–12448
Wang, W., Wang, X., Yang, W., & Liu, J. (2022). Unsupervised face detection in the dark. IEEE TPAMI, 45(1), 1250–1266.
Wang, Y., Wan, R., Yang, W., Li, H., Chau, L.-P., & Kot, A. (2022). Low-light image enhancement with normalizing flow. AAAI,36, 2604–2612.
Wang, B., Zhang, Y., Li, J., Yu, Y., Sun, Z., Liu, L., & Hu, D. (2024). Splatflow: Learning multi-frame optical flow via splatting. IJCV, 132(8), 3023–3045.
Wei, C., Wang, W., Yang, W., & Liu, J. (2018). Deep retinex decomposition for low-light enhancement. arXiv preprint arXiv:1808.04560
Xu, X., Wang, R., & Lu, J. (2023). Low-light image enhancement via structure modeling and guidance. In: CVPR, pp. 9893–9903
Xu, X., Wang, R., Fu, C.-W., & Jia, J. (2022). Snr-aware low-light image enhancement. In: CVPR, pp. 17714–17724
Xu, H., Yang, J., Cai, J., Zhang, J., & Tong, X. (2021). High-resolution optical flow from 1d attention and correlation. In: ICCV, pp. 10498–10507
Xu, H., Zhang, J., Cai, J., Rezatofighi, H., & Tao, D. (2022). Gmflow: Learning optical flow via global matching. In: CVPR, pp. 8121–8130
Young, S. I., Naman, A. T., & Taubman, D. (2019). Graph laplacian regularization for robust optical flow estimation. IEEE TIP, 29, 3970–3983.
Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., & Yang, M.-H. (2022). Restormer: Efficient transformer for high-resolution image restoration. In: CVPR, pp. 5728–5739
Zhang, F., Li, Y., You, S., & Fu, Y. (2021). Learning temporal consistency for low light video enhancement from single images. In: CVPR, pp. 4967–4976
Zhao, S., Zhao, L., Zhang, Z., Zhou, E., & Metaxas, D. (2022). Global matching with overlapping attention for optical flow estimation. In: CVPR, pp. 17592–17601
Zheng, Y., Lu, F., & Zhang, M. (2022). Optical flow in the dark. IEEE TPAMI, 44(12), 9464–9476.
Zheng, Y., Lu, F., & Zhang, M. (2020). Optical flow in the dark. In: CVPR, pp. 6749–6757
Zhou, H., Chang, Y., Liu, H., Yan, W., Duan, Y., Shi, Z., & Yan, L. (2024). Exploring the common appearance-boundary adaptation for nighttime optical flow. arXiv preprint arXiv:2401.17642
Zuo, F., Xiao, Z., Jin, H., & Su, H. (2024). Cedflow: Latent contour enhancement for dark optical flow estimation. AAAI, 38, 7909–7916.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (62272383, 62371389, 62031023) and the Doctoral Dissertation Innovation Fund of Xi'an University of Technology (252072206).
Additional information
Communicated by Ming-Hsuan Yang.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Basic and Context Backbones. We follow a design similar to the well-known RAFT (Teed & Deng, 2020) and incorporate a motion backbone and a context backbone into our PsFE framework. The motion backbone maps the input \(\textbf{x} \in \mathbb {R}^{3 \times H \times W} \) to features at 1/8 resolution in \(\mathbb {R}^{C \times H/8 \times W/8} \), where we set C = 256. It consists of 6 residual blocks: 2 at 1/2 resolution, 2 at 1/4 resolution, and 2 at 1/8 resolution. The structure of the context backbone is identical to that of the feature extraction network, except that BatchNorm regularization is used in the context branch and InstanceNorm is used in the basic branch.
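For concreteness, a minimal PyTorch sketch of such a RAFT-style backbone is given below. The intermediate channel widths, the 7×7 stem, and the module names are our own assumptions; only the block counts per resolution, the 256-dim output at 1/8 resolution, and the BatchNorm/InstanceNorm split between the context and basic branches follow the description above.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block with a configurable normalization layer (illustrative)."""
    def __init__(self, in_ch, out_ch, stride=1, norm=nn.InstanceNorm2d):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, 1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, 1)
        self.norm1, self.norm2 = norm(out_ch), norm(out_ch)
        self.relu = nn.ReLU(inplace=True)
        self.down = (nn.Conv2d(in_ch, out_ch, 1, stride)
                     if stride != 1 or in_ch != out_ch else nn.Identity())

    def forward(self, x):
        y = self.relu(self.norm1(self.conv1(x)))
        y = self.norm2(self.conv2(y))
        return self.relu(y + self.down(x))

class Backbone(nn.Module):
    """3 -> 256 channels at 1/8 resolution: two residual blocks each at 1/2, 1/4, 1/8."""
    def __init__(self, out_dim=256, norm=nn.InstanceNorm2d):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 64, 7, 2, 3), norm(64), nn.ReLU(True))         # 1/2
        self.layer1 = nn.Sequential(ResBlock(64, 64, 1, norm), ResBlock(64, 64, 1, norm))      # 1/2
        self.layer2 = nn.Sequential(ResBlock(64, 128, 2, norm), ResBlock(128, 128, 1, norm))   # 1/4
        self.layer3 = nn.Sequential(ResBlock(128, 192, 2, norm), ResBlock(192, 192, 1, norm))  # 1/8
        self.head = nn.Conv2d(192, out_dim, 1)

    def forward(self, x):  # x: (B, 3, H, W)
        return self.head(self.layer3(self.layer2(self.layer1(self.stem(x)))))  # (B, 256, H/8, W/8)

motion_net = Backbone(norm=nn.InstanceNorm2d)   # basic (motion) branch
context_net = Backbone(norm=nn.BatchNorm2d)     # context branch
```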
The Local Feature Encoder. We retain the local encoder of CEDFlow. In detail, the local encoder employs three 2D residual convolutional blocks (with a small receptive field) to encode the local properties of each point, each followed by a ReLU activation to ensure robust propagation of the fine-grained local features.
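A minimal sketch of such a local encoder is shown below; the channel width and the exact internals of each residual block are assumptions, as the text only specifies three small-receptive-field 2D residual convolutional blocks followed by ReLU.

```python
import torch.nn as nn

class LocalEncoder(nn.Module):
    """Three 3x3 residual conv blocks with ReLU, preserving fine-grained local detail."""
    def __init__(self, dim=128):  # `dim` is an illustrative choice
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(dim, dim, 3, 1, 1), nn.ReLU(inplace=True),
                          nn.Conv2d(dim, dim, 3, 1, 1))
            for _ in range(3)
        ])
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        for blk in self.blocks:
            x = self.relu(x + blk(x))  # residual connection keeps the local properties of each point
        return x
```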
The Global Feature Encoder. Our global encoder consists of a sparse attention module and an interaction layer. As shown in Fig. 19, the interaction layer first predicts a set of scale weights from the local feature \(\hat{f^L}\), and then enhances the appearance information in the coarse \(f^H\) while retaining its high-contrast features. The process is as follows,
where ‘\(\cdot \)’ denotes element-wise multiplication and \(\mathcal{E}\mathcal{A}(\cdot)\) is a spatial attention module, consisting of a convolutional block and a sigmoid function, that extracts an importance weight for each point, thereby improving the spatial expressiveness of the features.
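A rough PyTorch sketch of this interaction layer follows. The residual form \(f^H + \mathcal{E}\mathcal{A}(\hat{f^L}) \cdot f^H\), the two-layer convolutional block inside \(\mathcal{E}\mathcal{A}(\cdot)\), and the matching spatial sizes of the two features are assumptions rather than the exact formulation; only the structure of predicting scale weights from the local feature with a conv block and sigmoid, then applying them to the coarse global feature by element-wise product, is taken from the description above.

```python
import torch.nn as nn

class InteractionLayer(nn.Module):
    """Predicts per-pixel scale weights from the local feature and rescales the
    coarse global feature with them (residual combination is an assumption)."""
    def __init__(self, dim):
        super().__init__()
        # EA(): convolutional block + sigmoid -> importance weight for each point
        self.ea = nn.Sequential(
            nn.Conv2d(dim, dim, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, 1, 1),
            nn.Sigmoid(),
        )

    def forward(self, f_local, f_global):
        w = self.ea(f_local)            # (B, 1, H, W) spatial importance weights
        return f_global + w * f_global  # enhance appearance while keeping high-contrast cues
```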
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zuo, F., Jin, H., Xiao, Z. et al. CEDFlow++: Latent Contour Enhancement for Dark Optical Flow Estimation. Int J Comput Vis 133, 7222–7241 (2025). https://doi.org/10.1007/s11263-025-02528-x
DOI: https://doi.org/10.1007/s11263-025-02528-x