Abstract
To improve the accuracy of underwater acoustic target recognition (UATR) with artificial neural networks, this paper proposes a UATR approach based on multi-scale features and a convolutional residual dense network (CRDNet). A multi-scale convolutional structure is incorporated into an enhanced ConvNeXt V2 module, and a new acoustic feature representation, SFbank, based on singular value decomposition (SVD), is proposed. Compared with conventional network frameworks and single acoustic filter-bank features, the proposed approach yields significant improvements in recognition accuracy, precision, and F1-score. The method is validated experimentally on the ShipsEar dataset, where it achieves a recognition accuracy of 99.08%.
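The abstract does not state how SVD is combined with filter-bank features to form SFbank. As a hedged illustration only, the sketch below assumes one plausible reading: compute log mel filter-bank (Fbank) energies from a waveform, then keep a rank-k SVD reconstruction of the resulting frequency-by-time matrix. The function names (`mel_filterbank`, `log_fbank`, `svd_truncate`), the frame size, hop, number of mel bands, the 16 kHz sample rate, the synthetic test tone, and the rank of 8 are all illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Hypothetical sketch: NOT the paper's SFbank definition, only an assumed
# "log mel filter bank + truncated SVD" pipeline for illustration.

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    fb = np.zeros((n_mels, n_bins))
    for m in range(1, n_mels + 1):
        left, center, right = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[m - 1] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_fbank(signal, sr, n_fft=1024, hop=512, n_mels=64):
    """Log mel filter-bank energies, shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # (n_frames, n_bins)
    energies = mel_filterbank(n_mels, n_fft, sr) @ power.T  # (n_mels, n_frames)
    return np.log(energies + 1e-10)                          # avoid log(0)

def svd_truncate(feature, rank):
    """Best rank-`rank` approximation of a 2-D feature map (Eckart-Young)."""
    u, s, vt = np.linalg.svd(feature, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

# Usage on a synthetic 1 s tone plus noise (assumed input, not ShipsEar data).
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 300.0 * t) + 0.1 * rng.standard_normal(sr)
feat = log_fbank(x, sr)
low_rank = svd_truncate(feat, rank=8)
print(feat.shape, low_rank.shape)  # (64, 30) (64, 30)
```

Truncating the SVD keeps the dominant spectro-temporal structure and discards the smallest singular components, which typically carry noise; whether SFbank uses the reconstruction, the singular vectors, or the singular values themselves is not specified in the abstract.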
Data availability
No datasets were generated or analysed during the current study.
Funding
This work is supported in part by Ningxia Natural Science Foundation General Project (2022AAC03757, 2023AAC03889), R&D Program of Beijing Municipal Education Commission (KM202410017006), and National Key Research and Development Program of China (2023YFC3011704-2).
Author information
Contributions
J.L., X.Y., and Y.C. contributed to the methodology design of the study. J.L., X.Y., L.Z., and W.W. were responsible for the implementation of the proposed approach. P.Y. and H.T. carried out the formal analysis and investigation. The original draft of the manuscript was written by X.Y. W.W., Y.C., and X.Z. contributed to the review and editing of the manuscript. J.L., L.Z., and W.W. supervised the entire project. All authors reviewed and approved the final version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, J., Chen, Y., Yang, X. et al. Underwater acoustic target recognition based on multi-scale feature and CRDNet. J Supercomput 81, 1358 (2025). https://doi.org/10.1007/s11227-025-07806-6