Underwater acoustic target recognition based on multi-scale feature and CRDNet

Li, Jing; Chen, Yanru; Yang, Xudong; Zhang, Xinglong; Zhang, Lili; Wei, Wei; Yu, Pei; Tan, Hongxin

doi:10.1007/s11227-025-07806-6

Published: 19 September 2025

Volume 81, article number 1358 (2025)

The Journal of Supercomputing

Abstract

To improve the accuracy of underwater acoustic target recognition (UATR) with artificial neural networks, this paper proposes a novel UATR approach based on multi-scale features and a convolutional residual dense network (CRDNet). A multi-scale convolutional structure is incorporated into an enhanced ConvNeXt V2 module, and an acoustic feature structure, SFbank, based on singular value decomposition (SVD) is proposed. Compared with traditional network frameworks and single acoustic filtering features, the proposed approach achieves significant improvements in recognition accuracy, precision, and F1-score. Experiments on the ShipsEar dataset validate the method, which achieves a recognition accuracy of 99.08%.
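
The abstract does not describe how SVD is used to build SFbank. As a hedged illustration only, the sketch below shows one common way SVD is applied to a time-frequency feature matrix (for example, frames × filter-bank channels): a truncated, rank-k reconstruction that keeps only the dominant singular components. The helper names (`top_singular_triplet`, `low_rank_approx`), the use of power iteration with deflation, and the idea of applying it to a filter-bank matrix are all assumptions made for this example, not the authors' SFbank method; in practice a library routine such as `numpy.linalg.svd` would normally replace the hand-written power iteration.

```python
import math
import random

Matrix = list[list[float]]


def _mat_vec(a: Matrix, v: list[float]) -> list[float]:
    # A @ v
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def _mat_t_vec(a: Matrix, u: list[float]) -> list[float]:
    # A^T @ u, computed without building the transpose
    return [sum(a[i][j] * u[i] for i in range(len(a))) for j in range(len(a[0]))]


def _norm(v: list[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def top_singular_triplet(a: Matrix, iters: int = 200, seed: int = 0, tol: float = 1e-12):
    """Power iteration on A^T A; returns (sigma, u, v) for the largest singular value."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(a[0]))]  # strictly positive start vector
    n = _norm(v)
    v = [x / n for x in v]
    for _ in range(iters):
        u = _mat_vec(a, v)
        sigma = _norm(u)
        if sigma < tol:  # A v is numerically zero
            return 0.0, [0.0] * len(a), v
        u = [x / sigma for x in u]
        w = _mat_t_vec(a, u)  # nonzero whenever A v is nonzero
        wn = _norm(w)
        v = [x / wn for x in w]
    u = _mat_vec(a, v)
    sigma = _norm(u)
    if sigma < tol:
        return 0.0, [0.0] * len(a), v
    return sigma, [x / sigma for x in u], v


def low_rank_approx(a: Matrix, k: int, iters: int = 200):
    """Hypothetical helper: rank-k reconstruction of A and its k largest singular values."""
    rows, cols = len(a), len(a[0])
    residual = [row[:] for row in a]
    approx = [[0.0] * cols for _ in range(rows)]
    sigmas: list[float] = []
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual, iters)
        if sigma == 0.0:
            break  # remaining energy is negligible, so rank(A) < k
        sigmas.append(sigma)
        for i in range(rows):
            for j in range(cols):
                term = sigma * u[i] * v[j]  # one rank-1 component sigma * u v^T
                approx[i][j] += term
                residual[i][j] -= term  # deflate so the next pass finds the next component
    return approx, sigmas


if __name__ == "__main__":
    # Toy "frames x channels" matrix of rank 1 (outer product of [1, 2, 3] and [4, 5]).
    feats = [[4.0, 5.0], [8.0, 10.0], [12.0, 15.0]]
    approx, sigmas = low_rank_approx(feats, k=1)
    print(sigmas[0], math.sqrt(574))  # the two values should agree closely
    print(approx)
```

Keeping a small k discards the weaker singular components, which is why truncated SVD is often used to denoise or compress spectral features; whether SFbank relies on this particular idea is not stated in the abstract.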

Data availability

No datasets were generated or analysed during the current study.

Funding

This work is supported in part by Ningxia Natural Science Foundation General Project (2022AAC03757, 2023AAC03889), R&D Program of Beijing Municipal Education Commission (KM202410017006), and National Key Research and Development Program of China (2023YFC3011704-2).

Author information

Authors and Affiliations

Beijing Institute of Petrochemical Technology, Beijing, China
Jing Li, Yanru Chen, Xudong Yang, Xinglong Zhang, Lili Zhang & Wei Wei
China Fire and Rescue Institute, Beijing, China
Pei Yu
Science and Technology On Complex Aviation Systems Simulation Laboratory, Beijing, China
Hongxin Tan

Contributions

J.L., X.Y., and Y.C. contributed to the methodology design of the study. J.L., X.Y., L.Z., and W.W. were responsible for the implementation of the proposed approach. P.Y. and H.T. carried out the formal analysis and investigation. X.Y. wrote the original draft of the manuscript. W.W., Y.C., and X.Z. contributed to the review and editing of the manuscript. J.L., L.Z., and W.W. supervised the entire project. All authors reviewed and approved the final version of the manuscript.

Corresponding author

Correspondence to Wei Wei.

Ethics declarations

Conflict of interest

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, J., Chen, Y., Yang, X. et al. Underwater acoustic target recognition based on multi-scale feature and CRDNet. J Supercomput 81, 1358 (2025). https://doi.org/10.1007/s11227-025-07806-6

Received: 19 May 2025
Accepted: 26 August 2025
Published: 19 September 2025
Version of record: 19 September 2025
DOI: https://doi.org/10.1007/s11227-025-07806-6
