Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement

International Journal of Computer Vision

Abstract

When unmanned aerial vehicle (UAV) object detection networks are deployed in complex, real-world scenes, their generalization ability often degrades because of domain shift. While most existing domain-generalized object detection methods disentangle domain-invariant features spatially, our exploratory experiments revealed a key insight for UAV object detection (UAV-OD): because UAV-OD detects smaller objects, frequency domain contributions to generalization exhibit more pronounced disparities than in generic object detection, which involves larger objects. Frequency domain disentanglement therefore stands out as a more direct and effective approach for UAV-OD. This paper proposes a novel frequency domain disentanglement method to improve UAV-OD generalization. Specifically, our framework leverages two learnable filters that extract domain-invariant and domain-specific spectra. Additionally, we design two contrastive losses, an image-level loss and an instance-level loss, to guide training. These losses enable the filters to focus on extracting domain-invariant and domain-specific spectra, achieving better disentanglement. Extensive experiments across multiple datasets, including UAVDT and VisDrone2019-DET, using Faster R-CNN and YOLOv5, show that our approach consistently and significantly outperforms both the baseline and state-of-the-art domain generalization methods. Our code is available at https://github.com/wangkunyu241/UAV-Frequency.
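To make the idea of filtering in the frequency domain concrete, the following is a minimal NumPy sketch, not the authors' implementation (their code is in the repository linked above). It assumes a single-channel image and a single soft mask whose complement acts as the second filter; the function name `split_spectrum`, the `logits` parameter, and the complementary-mask parameterization are all hypothetical choices made for illustration. In the paper the two filters are learnable and trained with contrastive losses, which this sketch does not attempt to reproduce.

```python
import numpy as np


def sigmoid(z):
    """Map real-valued logits to soft mask values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def split_spectrum(image, logits):
    """Split a 2D image into two components with a soft frequency mask.

    image  : real array of shape (H, W).
    logits : real array of shape (H, W // 2 + 1), standing in for the
             learnable filter parameters (an assumption of this sketch).
    Returns two real arrays of shape (H, W). Because the two masks sum
    to one and the transforms are linear, the components add back up
    to the input image.
    """
    spectrum = np.fft.rfft2(image)          # half-spectrum of a real image
    mask = sigmoid(logits)                  # soft "domain-invariant" filter
    invariant = np.fft.irfft2(spectrum * mask, s=image.shape)
    specific = np.fft.irfft2(spectrum * (1.0 - mask), s=image.shape)
    return invariant, specific


# Usage with random data; a trained model would supply the logits.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
logits = rng.normal(size=(8, 5))
invariant, specific = split_spectrum(image, logits)
print(np.allclose(invariant + specific, image))
```

Tying the second filter to the complement of the first keeps the decomposition lossless, which is convenient for a demonstration. Two independently parameterized filters, as the abstract describes, would not generally sum back to the input.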

Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 62225207 and 62276243.

Author information

Corresponding author

Correspondence to Zheng-Jun Zha.

Additional information

Communicated by Hong Liu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, K., Fu, X., Ge, C. et al. Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement. Int J Comput Vis 132, 5410–5438 (2024). https://doi.org/10.1007/s11263-024-02108-5

  • DOI: https://doi.org/10.1007/s11263-024-02108-5
