
CN118135388A - An underwater target detection model and method based on image enhancement and AR mechanism - Google Patents

An underwater target detection model and method based on image enhancement and AR mechanism

Info

Publication number
CN118135388A
Authority
CN
China
Prior art keywords
image
target detection
image enhancement
underwater
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410418942.5A
Other languages
Chinese (zh)
Inventor
张沁悦
郑冰
王柘
李继哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sanya Institute Of Oceanography Ocean University Of China
Original Assignee
Sanya Institute Of Oceanography Ocean University Of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sanya Institute Of Oceanography Ocean University Of China
Priority: CN202410418942.5A
Publication of CN118135388A
Legal status: Pending

Classifications

    • G06V20/05: Underwater scenes
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82: Image or video recognition or understanding using neural networks
    • G06V2201/07: Target detection
    • Y02A90/30: Assessment of water resources


Abstract

The invention provides an underwater target detection model and method based on image enhancement and an attention-relation (AR) mechanism, belonging to the technical field of underwater target image enhancement and detection. A residual attention structure allows the network to select effective feature information, and residual feature attention blocks are combined in a novel multi-scale, multi-patch arrangement in which multiple sub-networks extract local features to accommodate a wide variety of underwater images. This addresses the low contrast, unclear edges, and color distortion of raw underwater images. Marine life is then detected using visual attention and relation mechanisms: an improved AR module is applied to an efficient marine organism detector, which markedly improves the recognition of organisms in complex underwater environments.

Description

An underwater target detection model and method based on image enhancement and an AR mechanism

Technical Field

The present invention belongs to the technical field of underwater target image enhancement and detection, and in particular relates to an underwater target detection model and method based on image enhancement and an attention-relation (AR) mechanism.

Background

Underwater target detection is an important topic in underwater exploration. The ocean covers more than 70% of the Earth's surface and provides humanity with rich mineral resources, biological resources, and food. According to the China Marine Economic Bulletin, China's marine-related gross output exceeded 7.7 trillion yuan in 2017, a year-on-year increase of 6.9%, with marine output accounting for 9.4% of GDP. As the value of marine resources becomes ever more prominent, accurately detecting and identifying underwater targets has become a hot issue in marine technology research.

Underwater target detection faces great challenges due to complex scattering, low visibility, and heavy target clutter. Traditional imaging often fails to capture the details and surface features of underwater targets, making their material characteristics hard to detect and identify, so enhancement of underwater images is essential. Existing underwater image enhancement algorithms fall into two broad categories: image-model-based methods and deep-learning-based methods. Image-model-based methods are relatively mature and varied, and can themselves be divided into color-balance-model-based methods and optical-model-based methods; the former mainly perform color correction on color images using properties of natural images such as the gray-world assumption. Although each technique has its own strengths, each also has drawbacks:

1. Real-world underwater image datasets for deep learning are difficult to obtain. Synthetic image datasets are therefore commonly used, but because underwater imagery is complex and diverse, synthetic images differ greatly from real ones. Methods trained on synthetic underwater images work for synthetic images and some types of real underwater images, but lack the ability to handle the full variety of underwater scenes.

2. For marine life detection, biologists must first obtain high-quality images or video and then manually annotate and identify species and their locations across these large-scale images and videos for further study, which is time-consuming, laborious, and unsuitable for real-time analysis in the field.

Deep convolutional neural networks (CNNs) are now widely used for complex object detection in images and video and show strong online detection capability. However, video monitoring in the marine environment differs greatly from that on land, and deep CNNs for marine life detection face severe challenges:

1. Underwater images and video suffer strong absorption, scattering, color distortion, and noise from artificial light sources and marine snow particles, resulting in blurred, hazy images with blue or green color casts.

2. Underwater objects rarely stay still, especially when both the marine life and the underwater vehicle are moving, which causes motion blur and widely varying poses.

3. The underwater imaging environment is complex and changeable due to variations in illumination and in spatial and temporal scale. Low-level image enhancement can mitigate these problems to some extent and make underwater image data clearer, but general image and video understanding algorithms still struggle with such challenging marine conditions.

In summary, existing underwater target detection techniques have notable shortcomings when dealing with complex underwater environments.

Summary of the Invention

In view of the above problems, a first aspect of the present invention proposes an underwater target detection model based on image enhancement and an AR mechanism, comprising an image enhancement network module and a target detection module connected to each other.

The image enhancement network module adopts an end-to-end image enhancement network with a purpose-designed multi-scale loss function and is trained for underwater image enhancement on a collected image dataset (Dataset) to produce enhanced underwater image data. The module contains several FA (feature attention) modules responsible for feature extraction and fusion; by treating different features and pixel regions unequally, the FA modules provide extra flexibility when processing different types of information.

The target detection module comprises a feature extraction module and a detector module, augmented with an AR mechanism. The AR mechanism comprises an improved Relation structure and an Attention structure: the Relation structure improves the ability to capture inter-channel relationships, and the Attention structure uses self-attention to capture global image features and reduce the influence of background noise. The target detection module is trained on the image dataset enhanced by the image enhancement network module and performs underwater target detection.

Preferably, the image enhancement network module contains six FA modules, each consisting of a channel attention (CA) module and a pixel attention (PA) module. CA helps the network learn the relative importance of features across channels, and PA strengthens the interactions between pixels in the network.

Preferably, the channel attention module CA first uses global average pooling to bring channel-wise global spatial information into a channel descriptor:

g_c = H_p(F_c) = (1/(H×W)) Σ_{i=1}^{H} Σ_{j=1}^{W} X_c(i, j)

where g_c is the resulting channel descriptor, F_c is the input feature, X_c(i, j) is the value of the c-th channel X_c at the pixel with coordinates (i, j), H_p is the global pooling function, and W and H are the row and column dimensions of the image;

g_c is then passed through a convolutional layer, a ReLU activation, a second convolutional layer, and a sigmoid activation:

CA_c = σ(Conv(δ(Conv(g_c))))

where CA_c is the weight of channel c, σ is the sigmoid activation function, δ is the ReLU activation function, and Conv denotes a convolutional layer;

Finally, F_c is multiplied element-wise by CA_c:

F* = F_c ⊗ CA_c

where ⊗ denotes the element-wise product.

Preferably, the pixel attention module PA first passes the feature map F* output by CA through a convolutional layer, a ReLU activation, a second convolutional layer, and a sigmoid activation:

PA = σ(Conv(δ(Conv(F*))))

and finally multiplies F* element-wise by PA:

F̃ = F* ⊗ PA

where F̃ is the feature map output by PA.

Preferably, the multi-scale loss function is designed as follows:

For a detected image I ∈ R^(C×H×W), where C×H×W is the shape of the feature map of image I, with W the row dimension, H the column dimension, and C the number of channels, the loss function of the end-to-end image enhancement network is built from three terms: reconstruction loss, perceptual loss, and total variation loss:

L = λ_r·L_r + λ_p·L_p + λ_TV·L_TV

where L_r, L_p, and L_TV are the reconstruction, perceptual, and total variation losses of the image, and λ_r, λ_p, and λ_TV are their weight coefficients;

The image reconstruction loss L_r is computed as

L_r = ||Î − I||_1

where Î is the reconstructed image and ||·||_1 is the L1 norm;

The image perceptual loss L_p is computed as

L_p = ||φ(Î) − φ(I)||_2

where φ(I) is the feature vector of image I, φ(Î) is the feature vector of image Î, and ||·||_2 is the L2 norm;

The total variation loss L_TV is computed as

L_TV = ||∇_W Î||_1 + ||∇_H Î||_1

where ∇_W Î denotes the gradient of Î along the W axis and ∇_H Î the gradient along the H axis.

Preferably, the improved Relation structure uses two fully connected layers to model the feature relationships between channels: two adjacent fully connected layers model the inter-channel relationships and output the same number of channels as the input features. The relation operation is defined as

R(·) = σ(W_c2 δ(W_c1 ·))

where W_c1 and W_c2 are the two fully connected layers, δ(·) is the ReLU activation function, and σ(·) is the sigmoid activation function.

Preferably, the Attention structure uses self-attention. The query vector Q, key vector K, and value vector V of the feature vector feat are first computed:

Q = W_Q · feat

K = W_K · feat

V = W_V · feat

where W_Q, W_K, and W_V are the weight parameter matrices of Q, K, and V respectively;

then the attention weights a are computed:

a = softmax(Q K^T / √D_k)

where D_k is the dimension of K;

and finally the output vectors are computed:

h_n = Σ_j a_nj v_j

where a_nj is the attention weight from the j-th input to the n-th output, v_j is the j-th element of V, and h_n is the attention vector.

A second aspect of the present invention provides an underwater target detection method based on image enhancement and an AR mechanism, comprising the following steps:

acquiring the target image to be detected;

inputting the target image to be detected into the underwater target detection model described in the first aspect; and

outputting the target detection result.

A third aspect of the present invention provides an underwater target detection device based on image enhancement and an AR mechanism, characterized in that the device comprises at least one processor coupled to at least one memory; the memory stores a computer program implementing the underwater target detection model of the first aspect; and when the processor executes the program stored in the memory, the processor carries out the underwater target detection method based on image enhancement and an AR mechanism.

A fourth aspect of the present invention provides a computer-readable storage medium, characterized in that it stores a computer program or instructions for the underwater target detection model of the first aspect; when executed by a processor, the program or instructions cause the processor to carry out the underwater target detection method based on image enhancement and an AR mechanism.

Compared with the prior art, the present invention has the following beneficial effects:

(1) Adaptability: the end-to-end underwater image enhancement network remains highly adaptive in complex underwater environments and can adjust to different underwater scenes, providing a positive boost for underwater organism recognition.

(2) Higher detection accuracy: introducing visual attention and relation mechanisms improves the accuracy of underwater target detection.

(3) Real-time capability: the method frees biologists from manually annotating and identifying species and their locations in large-scale images and video, and can be deployed and analyzed in real time.

Overall, by combining adaptability, visual attention, a relation mechanism, and real-time operation, the method improves the effectiveness of underwater target detection, and is especially suited to complex underwater scenes.

Brief Description of the Drawings

To explain the technical solutions of the present invention or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the following describes only one embodiment of the present invention; those of ordinary skill in the art can derive other drawings from these without creative effort.

FIG. 1 is a diagram of the underwater target detection model of the present invention, which combines an attention mechanism with end-to-end image enhancement.

FIG. 2 is a schematic diagram of the enhancement network of the present invention.

FIG. 3 is the network framework of the FA module of the present invention.

FIG. 4 is a schematic diagram of the target detection module combined with the improved AR model.

FIG. 5 is the framework of the improved AR mechanism of the present invention.

FIG. 6 is a simplified structural block diagram of the underwater target detection device in Embodiment 2.

Detailed Description of the Embodiments

The invention is further described below in conjunction with specific embodiments.

Organically combining image enhancement with underwater target detection via an attention mechanism and a CNN can effectively improve underwater target recognition accuracy and strengthen organism recognition in complex underwater environments. The invention first adopts an end-to-end underwater image enhancement network that combines multi-patch and multi-scale residual attention structures to handle the diversity of real-world underwater images. The residual attention structure allows the network to select effective feature information; residual feature attention blocks are combined with a novel multi-scale and multi-patch structure, and the multi-patch network extracts local features to adapt to various underwater images, overcoming the low contrast, unclear edges, and color distortion of raw underwater images. Marine organisms are then detected using visual attention and relation mechanisms: an improved attention-relation (AR) module is applied to an efficient marine organism detector (EMOD), which substantially improves the recognition of organisms in complex underwater environments.

The flow of the underwater target detection model of the present invention, which combines an attention mechanism with end-to-end image enhancement, is shown in FIG. 1; it comprises an image enhancement network module and a target detection module connected to each other.

The image enhancement network module adopts an end-to-end image enhancement network with a purpose-designed multi-scale loss function and is trained for underwater image enhancement on the collected image dataset (Dataset) to produce enhanced underwater image data. The module contains several FA modules responsible for feature extraction and fusion; by treating different features and pixel regions unequally, the FA modules provide extra flexibility when processing different types of information.

The target detection module comprises a feature extraction module and a detector module, augmented with an AR mechanism. The AR mechanism comprises an improved Relation structure and an Attention structure: the Relation structure improves the ability to capture inter-channel relationships, and the Attention structure uses self-attention to capture global image features and reduce the influence of background noise. The target detection module is trained on the image dataset enhanced by the image enhancement network module and performs underwater target detection.

1. Acquisition of the Image Dataset (Dataset)

The present invention uses underwater cameras in different waters to capture monitoring image data in real time. The collected data are images of a uniform standard size, set here to 640×640 pixels. Image quality varies from picture to picture with the field environment. Video data of the underwater targets to be detected are collected and the processed results are consolidated into the dataset (Dataset), which covers three orders: perch, carp, and scallop.

2. Image Enhancement Network

The structure of the end-to-end image enhancement network is shown in FIG. 2. The network contains six FA modules responsible for feature extraction and fusion; by treating different features and pixel regions unequally, the FA modules provide extra flexibility when processing different types of information. Each FA module consists of a channel attention (CA) module and a pixel attention (PA) module: CA helps the network learn the relative importance of features across channels, and PA strengthens the interactions between pixels in the network. The network framework of the FA module is shown in FIG. 3.

The channel attention module CA first uses global average pooling to bring channel-wise global spatial information into a channel descriptor:

g_c = H_p(F_c) = (1/(H×W)) Σ_{i=1}^{H} Σ_{j=1}^{W} X_c(i, j)

where g_c is the resulting channel descriptor, F_c is the input feature, X_c(i, j) is the value of the c-th channel X_c at the pixel with coordinates (i, j), H_p is the global pooling function, and W and H are the row and column dimensions of the image;

g_c is then passed through a convolutional layer, a ReLU activation, a second convolutional layer, and a sigmoid activation:

CA_c = σ(Conv(δ(Conv(g_c))))

where CA_c is the weight of channel c, σ is the sigmoid activation function, δ is the ReLU activation function, and Conv denotes a convolutional layer;

Finally, F_c is multiplied element-wise by CA_c:

F* = F_c ⊗ CA_c

where ⊗ denotes the element-wise product.
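The CA computation above (global average pooling, two convolutions with ReLU and sigmoid, channel-wise rescaling) can be sketched in a few lines of Python. This is only an illustration of the shape of the computation, not the patent's implementation: the two learned convolution layers are replaced by identity mappings purely so the sketch runs without a deep-learning framework.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(features):
    """Channel attention over a C x H x W feature map given as nested lists.

    The two learned convolution layers of the CA block are simplified to
    identity mappings here (an assumption for illustration only).
    """
    weights = []
    for channel in features:
        h, w = len(channel), len(channel[0])
        # Global average pooling: g_c = (1 / (H*W)) * sum over all pixels.
        g_c = sum(sum(row) for row in channel) / (h * w)
        # Conv -> ReLU -> Conv -> sigmoid, with the convs taken as identity.
        weights.append(sigmoid(max(0.0, g_c)))
    # F* = F_c (x) CA_c: rescale each channel by its attention weight.
    return [[[v * weights[c] for v in row] for row in channel]
            for c, channel in enumerate(features)]
```

For example, a uniformly bright channel with mean 1.0 is rescaled by sigmoid(1) ≈ 0.731, while an all-zero channel receives weight 0.5 but stays zero.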

The pixel attention module PA first passes the feature map F* output by CA through a convolutional layer, a ReLU activation, a second convolutional layer, and a sigmoid activation:

PA = σ(Conv(δ(Conv(F*))))

and finally multiplies F* element-wise by PA:

F̃ = F* ⊗ PA

where F̃ is the feature map output by PA.
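The PA branch can be sketched in the same dependency-free style. As an assumption for illustration only, the two convolutions are collapsed into a ReLU-plus-sigmoid gate on the channel-wise mean at each pixel; a real implementation would learn per-pixel weights from F* itself.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pixel_attention(features):
    """Pixel attention over a C x H x W feature map given as nested lists.

    Produces one attention value per pixel (shared across channels) and
    rescales F* element-wise: F~ = F* (x) PA. The learned convolutions are
    replaced by the channel-wise mean (illustrative assumption).
    """
    c, h, w = len(features), len(features[0]), len(features[0][0])
    # Per-pixel attention map PA, a stand-in for Conv -> ReLU -> Conv -> sigmoid.
    pa = [[sigmoid(max(0.0, sum(features[k][i][j] for k in range(c)) / c))
           for j in range(w)] for i in range(h)]
    # F~ = F* (x) PA: every channel is rescaled by the shared pixel map.
    return [[[features[k][i][j] * pa[i][j] for j in range(w)]
             for i in range(h)] for k in range(c)]
```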

A multi-scale loss function is designed for the end-to-end image enhancement network. For a detected image I ∈ R^(C×H×W), where C×H×W is the shape of the feature map of image I, with W the row dimension, H the column dimension, and C the number of channels, the loss function is built from three terms: reconstruction loss, perceptual loss, and total variation loss:

L = λ_r·L_r + λ_p·L_p + λ_TV·L_TV

where L_r, L_p, and L_TV are the reconstruction, perceptual, and total variation losses of the image, and λ_r, λ_p, and λ_TV are their weight coefficients.

The image reconstruction loss L_r is computed as

L_r = ||Î − I||_1

where Î is the reconstructed image and ||·||_1 is the L1 norm.

The image perceptual loss L_p is computed as

L_p = ||φ(Î) − φ(I)||_2

where φ(I) is the feature vector of image I, φ(Î) is the feature vector of image Î, and ||·||_2 is the L2 norm.

The total variation loss L_TV is computed as

L_TV = ||∇_W Î||_1 + ||∇_H Î||_1

where ∇_W Î denotes the gradient of Î along the W axis and ∇_H Î the gradient along the H axis.
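The three-term loss can be sketched numerically as follows. This is a toy 1-D version under stated assumptions: the perceptual feature vectors are passed in directly rather than extracted by a pretrained network, and the weight values λ_r = 1.0, λ_p = 0.1, λ_TV = 0.01 are illustrative defaults, not values from the patent.

```python
def l1(a, b):
    # L1 distance between two equal-length sequences.
    return sum(abs(x - y) for x, y in zip(a, b))

def l2(a, b):
    # L2 (Euclidean) distance between two equal-length sequences.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def multi_scale_loss(restored, target, feat_restored, feat_target,
                     lam_r=1.0, lam_p=0.1, lam_tv=0.01):
    """L = lam_r*Lr + lam_p*Lp + lam_tv*Ltv on 1-D signals (toy sketch)."""
    # Reconstruction loss: L1 distance between restored and ground truth.
    loss_r = l1(restored, target)
    # Perceptual loss: L2 distance in (precomputed) feature space.
    loss_p = l2(feat_restored, feat_target)
    # Total variation loss: L1 norm of the gradient of the restored signal.
    loss_tv = sum(abs(restored[i + 1] - restored[i])
                  for i in range(len(restored) - 1))
    return lam_r * loss_r + lam_p * loss_p + lam_tv * loss_tv
```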

3. Target Detection Module

The overall structure of the target detection module is shown in FIG. 4. It comprises a feature extraction module and a detector module, augmented with an AR mechanism consisting of an improved Relation structure and an Attention structure, as shown in FIG. 5. The Relation structure improves the ability to capture inter-channel relationships, and the Attention structure uses self-attention to capture global image features and reduce the influence of background noise. The target detection module is trained on the image dataset enhanced by the image enhancement network module and performs underwater target detection.

The present invention improves the ability to capture inter-channel relationships by improving the Relation structure in the AR module. The improved Relation structure uses two adjacent fully connected layers to model the feature relationships between channels and outputs the same number of channels as the input features. The relation operation is defined as:

$$R(\cdot) = \sigma(W_{c2}\,\delta(W_{c1}))$$

where $W_{c1}$ and $W_{c2}$ denote the two fully connected layers, $\delta(\cdot)$ the ReLU activation function, and $\sigma(\cdot)$ the Sigmoid activation function.
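As a hedged sketch of the relation operation above, the two fully connected layers can be emulated with plain matrix-vector products. Biases are omitted for brevity, and the square weight matrices `W_c1` and `W_c2` are illustrative stand-ins, not the trained parameters of the patented model.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def matvec(W, v):
    # one fully connected layer (bias omitted for brevity)
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def relation(z, W_c1, W_c2):
    # R(z) = sigmoid(W_c2 . ReLU(W_c1 . z)); with square weight
    # matrices the output has as many channels as the input z
    return sigmoid(matvec(W_c2, relu(matvec(W_c1, z))))
```

With identity weights and a zero channel descriptor, each output channel is sigmoid(0) = 0.5, and the number of output channels matches the number of input channels, as the claim requires.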

The improved Attention structure adopts Self-Attention. First, the query vector Q, key vector K, and value vector V of the feature vector feat are computed as follows:

$$Q = W_Q \cdot feat$$

$$K = W_K \cdot feat$$

$$V = W_V \cdot feat$$

where $W_Q$, $W_K$, and $W_V$ are the weight parameter matrices for Q, K, and V, respectively;

Then the attention weight $a$ is computed:

$$a = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{D_k}}\right)$$

where $D_k$ is the dimension (length) of K;

Finally, the output vector is computed:

$$h_n = \sum_j a_{nj} v_j$$

where $a_{nj}$ is the attention weight from the j-th input to the n-th output, $v_j$ is the j-th element of V, and $h_n$ is the attention output vector.
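The scaled dot-product weighting described above can be sketched on plain Python lists. This is a minimal illustrative version, not the patented implementation, and it assumes Q, K, and V have already been projected from the feature vector.

```python
import math

def softmax(row):
    # numerically stable softmax over one score row
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(Q, K, V):
    # a = softmax(Q K^T / sqrt(d_k)); h_n = sum_j a_nj * v_j
    d_k = len(K[0])
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k)
               for k_row in K] for q_row in Q]
    weights = [softmax(row) for row in scores]
    # each output row is the attention-weighted sum of value rows
    return [[sum(a * V[j][d] for j, a in enumerate(a_row))
             for d in range(len(V[0]))] for a_row in weights]
```

When all key rows are identical, every value row receives equal weight, so the output collapses to the plain average of V; distinct keys shift that average toward the values whose keys align with the query.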

4. Experimental Results

To verify the proposed underwater target detection method combining the attention mechanism with end-to-end image enhancement, 700 images each of carp, perch, and scallop were used as the training set, and 300 images each as the test set. The proposed method was compared with the well-known algorithms Faster R-CNN, YOLOv5, and SSD512. The evaluation metric is average precision (AP), computed as the area under the precision-recall curve for IoU (intersection over union) thresholds from 0.5 to 0.95. Each algorithm was run five times independently; the results are shown in Table 1.

Table 1. Experimental results

The proposed method achieves the highest average precision and the second-lowest standard deviation, indicating high classification accuracy for fish and good robustness.
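The AP metric above averages precision over IoU thresholds from 0.5 to 0.95. The underlying intersection-over-union of two boxes is computed in the standard way; a minimal sketch, assuming corner-format boxes `(x1, y1, x2, y2)`:

```python
def iou(box_a, box_b):
    # boxes as (x1, y1, x2, y2) with x2 > x1 and y2 > y1
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)  # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A detection counts as a true positive at a given threshold when its IoU with a ground-truth box meets or exceeds that threshold; sweeping the threshold from 0.5 to 0.95 and averaging the resulting APs gives the metric reported in Table 1.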

Embodiment 2:

As shown in Figure 6, the present invention also provides an underwater target detection device based on image enhancement and the AR mechanism. The device includes at least one processor and at least one memory, as well as a communication interface and an internal bus. The memory stores a computer-executable program of the underwater target detection model described in Embodiment 1; when the processor executes this program, the processor performs an underwater target detection method based on image enhancement and the AR mechanism. The internal bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, among others. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, the buses in the drawings of the present application are not limited to a single bus or a single type of bus. The memory may include high-speed RAM and may also include non-volatile memory (NVM), such as at least one disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, or an optical disc.

The device may be provided as a terminal, a server, or a device in another form.

Figure 6 is a block diagram of an exemplary device. The device may include one or more of the following components: a processing component, a memory, a power component, a multimedia component, an audio component, an input/output (I/O) interface, a sensor component, and a communication component. The processing component typically controls the overall operation of the electronic device, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component may include one or more processors to execute instructions so as to complete all or part of the steps of the above method. In addition, the processing component may include one or more modules that facilitate interaction between the processing component and other components; for example, it may include a multimedia module to facilitate interaction between the multimedia component and the processing component.

The memory is configured to store various types of data to support operation of the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and so on. The memory may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.

The power component supplies power to the various components of the electronic device and may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device. The multimedia component includes a screen that provides an output interface between the electronic device and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the panel; the touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component includes a front camera and/or a rear camera. When the electronic device is in an operating mode, such as a shooting mode or a video mode, the front camera and/or rear camera may receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focal length and optical zoom capability.

The audio component is configured to output and/or input audio signals. For example, the audio component includes a microphone (MIC) that is configured to receive external audio signals when the electronic device is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory or transmitted via the communication component. In some embodiments, the audio component also includes a speaker for outputting audio signals. The I/O interface provides an interface between the processing component and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.

The sensor component includes one or more sensors that provide status assessments of various aspects of the electronic device. For example, the sensor component may detect the on/off state of the device and the relative positioning of components such as its display and keypad; it may also detect a change in position of the device or one of its components, the presence or absence of user contact with the device, the orientation or acceleration/deceleration of the device, and changes in its temperature. The sensor component may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a near field communication (NFC) module to facilitate short-range communication; for example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above methods.

Embodiment 3:

The present invention also provides a computer-readable storage medium storing a computer program or instructions of the underwater target detection model described in Embodiment 1. When executed by a processor, the program or instructions cause the processor to perform an underwater target detection method based on image enhancement and the AR mechanism.

Specifically, a system, apparatus, or device equipped with a readable storage medium may be provided, on which software program code implementing the functions of any of the above embodiments is stored, and the computer or processor of the system, apparatus, or device reads and executes the instructions stored in the readable storage medium. In this case, the program code read from the readable medium can itself implement the functions of any of the above embodiments, so the machine-readable code and the readable storage medium storing it form part of the present invention.

The above storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, or DVD+RW), magnetic tape, and the like. The storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.

The computer-readable program instructions described here may be downloaded from a computer-readable storage medium to each computing/processing device, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.

The computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), may be personalized using state information of the computer-readable program instructions, and the electronic circuit may execute the computer-readable program instructions to implement various aspects of the present disclosure.

The above description presents only preferred embodiments of the present application and is not intended to limit it; for those skilled in the art, the present application may be subject to various changes and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included within its scope of protection.

Although specific embodiments of the present invention are described above, they do not limit its scope of protection. Those skilled in the art should understand that various modifications or variations that can be made on the basis of the technical solution of the present invention without creative effort remain within the scope of protection of the present invention.

Claims (10)

1. An underwater target detection model based on image enhancement and an AR mechanism, characterized in that it comprises an image enhancement network module and a target detection module connected to each other;

the image enhancement network module adopts an end-to-end image enhancement network together with a multi-scale loss function, and performs underwater image enhancement training on the collected image dataset Dataset to obtain enhanced underwater image data; the image enhancement network module contains several FA modules responsible for feature extraction and fusion, and the FA modules treat different features and pixel regions equivalently, providing additional flexibility when processing different types of information;

the target detection module comprises a feature extraction module and a detector module, to which an AR mechanism is added; the AR mechanism comprises an improved Relation structure and an Attention structure, the Relation structure being used to improve the ability to capture inter-channel relationships, and the Attention structure adopting Self-Attention to capture the global features of the image and reduce the influence of background noise; the target detection module is trained on the image dataset enhanced by the image enhancement network module and is used for underwater target detection.

2. The underwater target detection model based on image enhancement and an AR mechanism according to claim 1, characterized in that the image enhancement network module contains 6 FA modules, each consisting of a channel attention module CA and a pixel attention module PA, wherein CA helps the network better learn the importance of features across channels, and PA enhances the interaction between pixels in the network.

3. The underwater target detection model based on image enhancement and an AR mechanism according to claim 2, characterized in that the channel attention module CA first uses global average pooling to bring channel-wise global spatial information into a channel descriptor:

$$g_c = H_p(F_c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} X_c(i,j)$$

where $g_c$ is the feature vector, $F_c$ is the input feature, $X_c(i,j)$ is the pixel value of the c-th channel $X_c$ at coordinates $(i,j)$, $H_p$ is the global pooling function, W is the row dimension of the image, and H is the column dimension;

$g_c$ is then passed through two convolutional layers with ReLU and sigmoid activations:

$$CA_c = \sigma(\mathrm{Conv}(\delta(\mathrm{Conv}(g_c))))$$

where $CA_c$ is the per-channel weight, $\sigma$ is the sigmoid activation function, $\delta$ is the ReLU activation function, and Conv is a convolutional layer;

finally, $F_c$ and $CA_c$ are multiplied element-wise:

$$F_c^{*} = F_c \otimes CA_c$$

where $\otimes$ denotes the element-wise multiplication operator.

4. The underwater target detection model based on image enhancement and an AR mechanism according to claim 3, characterized in that the pixel attention module PA first passes the feature $F^{*}$ output by CA through two convolutional layers with ReLU and sigmoid activations:

$$PA = \sigma(\mathrm{Conv}(\delta(\mathrm{Conv}(F^{*}))))$$

and finally multiplies $F^{*}$ and PA element-wise:

$$\tilde{F} = F^{*} \otimes PA$$

where $\tilde{F}$ is the feature output by PA.

5. The underwater target detection model based on image enhancement and an AR mechanism according to claim 1, characterized in that the multi-scale loss function is designed as follows: for a detected image $I \in \mathbb{R}^{C\times H\times W}$, where W is the row dimension, H is the column dimension, and C is the channel dimension, the loss function of the end-to-end image enhancement network is constructed from reconstruction loss, perceptual loss, and total variation loss:

$$L = \lambda_r L_r + \lambda_p L_p + \lambda_{TV} L_{TV}$$

where $L_r$, $L_p$, and $L_{TV}$ are the reconstruction, perceptual, and total variation losses of the image, respectively, and $\lambda_r$, $\lambda_p$, and $\lambda_{TV}$ are their weight coefficients;

the image reconstruction loss is $L_r = \lVert \hat{I} - I \rVert_1$, where $\hat{I}$ is the reconstructed image and $\lVert\cdot\rVert_1$ is the $L_1$ norm;

the image perceptual loss is $L_p = \lVert \phi(\hat{I}) - \phi(I) \rVert_2$, where $\phi(I)$ and $\phi(\hat{I})$ are the feature vectors of images $I$ and $\hat{I}$, and $\lVert\cdot\rVert_2$ is the $L_2$ norm;

the total variation loss is $L_{TV} = \lVert \nabla_W \hat{I} \rVert_1 + \lVert \nabla_H \hat{I} \rVert_1$, where $\nabla_W \hat{I}$ and $\nabla_H \hat{I}$ denote the gradients of $\hat{I}$ along the W and H axes.

6. The underwater target detection model based on image enhancement and an AR mechanism according to claim 1, characterized in that the improved Relation structure uses two adjacent fully connected layers to model the feature relationships between channels and outputs the same number of channels as the input features, the relation operation being defined as:

$$R(\cdot) = \sigma(W_{c2}\,\delta(W_{c1}))$$

where $W_{c1}$ and $W_{c2}$ denote the two fully connected layers, $\delta(\cdot)$ the ReLU activation function, and $\sigma(\cdot)$ the Sigmoid activation function.

7. The underwater target detection model based on image enhancement and an AR mechanism according to claim 1, characterized in that the Attention structure adopts Self-Attention: first, the query vector Q, key vector K, and value vector V of the feature vector feat are computed as $Q = W_Q \cdot feat$, $K = W_K \cdot feat$, and $V = W_V \cdot feat$, where $W_Q$, $W_K$, and $W_V$ are the weight parameter matrices for Q, K, and V, respectively; then the attention weight is computed as

$$a = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{D_k}}\right)$$

where $D_k$ is the dimension (length) of K; finally the output vector is computed as

$$h_n = \sum_j a_{nj} v_j$$

where $a_{nj}$ is the attention weight from the j-th input to the n-th output, $v_j$ is the j-th element of V, and $h_n$ is the attention output vector.

8. An underwater target detection method based on image enhancement and an AR mechanism, characterized by comprising the following steps: acquiring a target image to be detected; inputting the target image into the underwater target detection model according to any one of claims 1 to 7; and outputting the target detection result.

9. An underwater target detection device based on image enhancement and an AR mechanism, characterized in that the device comprises at least one processor and at least one memory coupled to each other; the memory stores a computer-executable program of the underwater target detection model according to any one of claims 1 to 7; and when the processor executes the program stored in the memory, the processor performs an underwater target detection method based on image enhancement and an AR mechanism.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program or instructions of the underwater target detection model according to any one of claims 1 to 7, and when executed by a processor, the program or instructions cause the processor to perform an underwater target detection method based on image enhancement and an AR mechanism.
CN202410418942.5A 2024-04-09 2024-04-09 An underwater target detection model and method based on image enhancement and AR mechanism Pending CN118135388A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410418942.5A CN118135388A (en) 2024-04-09 2024-04-09 An underwater target detection model and method based on image enhancement and AR mechanism


Publications (1)

Publication Number Publication Date
CN118135388A true CN118135388A (en) 2024-06-04

Family

ID=91242798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410418942.5A Pending CN118135388A (en) 2024-04-09 2024-04-09 An underwater target detection model and method based on image enhancement and AR mechanism

Country Status (1)

Country Link
CN (1) CN118135388A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200161A (en) * 2020-12-03 2021-01-08 北京电信易通信息技术股份有限公司 A Face Recognition Detection Method Based on Hybrid Attention Mechanism
CN113344806A (en) * 2021-07-23 2021-09-03 中山大学 Image defogging method and system based on global feature fusion attention network
CN116385896A (en) * 2023-03-20 2023-07-04 西安电子科技大学 A remote sensing small target detection method, system, device and medium based on fusion cascade attention mechanism
CN116402721A (en) * 2023-05-15 2023-07-07 大连海事大学 Underwater image enhancement method based on contrast perception loss
CN116777782A (en) * 2023-06-21 2023-09-19 四川师范大学 A multi-patch defogging method based on dual attention level feature fusion
CN117671473A (en) * 2024-02-01 2024-03-08 中国海洋大学 Underwater target detection model and method based on attention and multi-scale feature fusion

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HAN HU ET AL.: "Relation Networks for Object Detection", arXiv:1711.11575v2, 14 June 2018 (2018-06-14), pages 1-11 *
傅罡: 《人工智能注意力机制体系、模型与算法剖析》, 北京: 机械工业出版社, 31 March 2024, pages 115-116 *
尤洋: 《实战AI大模型》, 北京: 机械工业出版社, 31 January 2024, page 98 *

Similar Documents

Publication Publication Date Title
Yang et al. Underwater image enhancement based on conditional generative adversarial network
CN110610510B (en) Target tracking method, device, electronic device and storage medium
CN111783620B (en) Expression recognition method, device, equipment and storage medium
Yang et al. Single image haze removal via region detection network
CN110399888B (en) Weiqi judging system based on MLP neural network and computer vision
CN110443366B (en) Neural network optimization method and device, and target detection method and device
CN116580305B (en) A tea bud detection method based on deep learning and its model building method
CN112184635A (en) Object detection method, device, storage medium and device
WO2018233254A1 (en) Terminal-based object recognition method, device and electronic device
CN115661628A (en) A Fish Detection Method Based on Improved YOLOv5S Model
CN117671473B (en) Underwater target detection model and method based on attention and multi-scale feature fusion
CN109919073B (en) Pedestrian re-identification method with illumination robustness
CN116863286B (en) Double-flow target detection method and model building method thereof
CN112116620A (en) A method for semantic segmentation and painting display of indoor images
CN116778415A (en) Crowd counting network model for unmanned aerial vehicle and counting method
CN117636341B (en) Multi-frame seaweed microscopic image enhancement recognition method and model building method thereof
CN115331097A (en) Image detection model training method, device and image detection method
CN115623313A (en) Image processing method, image processing apparatus, electronic device, and storage medium
WO2022179087A1 (en) Video processing method and apparatus
CN116704385A (en) Method for detecting and tracking moving object target under unmanned airport scene and model thereof
CN113962873A (en) Image denoising method, storage medium and terminal device
CN117690011B (en) Object detection method and model building method suitable for noisy underwater scenes
CN119012001A (en) Auxiliary shooting method, auxiliary shooting device, electronic equipment and storage medium
CN118379629A (en) A building color recognition detection model and method based on clustering algorithm
CN118135388A (en) An underwater target detection model and method based on image enhancement and AR mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载