
CN114663292A - Ultra-lightweight image dehazing and recognition network model and image dehazing and recognition method


Info

Publication number
CN114663292A
Authority
CN
China
Prior art date
Legal status
Granted
Application number
CN202011527239.6A
Other languages
Chinese (zh)
Other versions
CN114663292B
Inventor
王中风
王美琪
苏天祺
陈思依
林军
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202011527239.6A
Publication of CN114663292A
Application granted
Publication of CN114663292B
Status: Active


Classifications

    • G06T 5/73 Deblurring; Sharpening
    • G06F 18/253 Fusion techniques of extracted features
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/088 Non-supervised learning, e.g. competitive learning
    • G06T 3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T 2200/32 Indexing scheme for image data processing or generation, involving image mosaicing
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • Y02A 90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The application discloses an ultra-lightweight image dehazing and recognition network model through which images are dehazed and recognized. The network model comprises a bidirectional GAN network model and a target detection network model connected in sequence. The bidirectional GAN network model dehazes a hazy input image and outputs a clear image to the target detection network model for feature recognition. The target detection network model is pruned and retrained: the original images of the training set are trained multiple times, the original images are downsampled by a preset factor before each training pass, the scaling coefficients of the batch normalization layers are sorted and compared after each pass, and the convolution kernels of the preceding layer corresponding to channels whose scaling coefficients fall below a preset threshold are removed, realizing the pruning. The target detection network model is further pruned on the basis of an existing tiny recognition model, which greatly reduces the scale of the ultra-lightweight image dehazing and recognition network model and allows it to be deployed on end-side platforms with limited computing power and power budgets.

Description

Ultra-lightweight image dehazing and recognition network model, and image dehazing and recognition method

Technical field

The present application relates to the technical field of image processing, and in particular to an ultra-lightweight image dehazing and recognition network model and an image dehazing and recognition method.

Background

Hazy weather reduces the clarity of images captured by a camera, making it difficult for a computer to recognize the features of objects in them. Images therefore need to be dehazed before they are recognized. Both dehazing and recognition can currently be performed with neural network models.

For dehazing, the commonly used neural network model is the generative adversarial network (GAN). This model adopts an unsupervised learning method: from a large amount of training data it learns the mapping between hazy and clear images and then dehazes a hazy image. For recognition, the commonly used neural network model is the YOLO (You Only Look Once) target detection model. It divides the input image into S*S grid cells, each of which is responsible for detecting objects whose centers fall inside it; by directly regressing bounding boxes and predicted classes at multiple positions of the input image, it recognizes the object features in the image.

The current YOLO target detection model is relatively large, so the neural network model obtained by applying it to image dehazing and recognition is not suitable for deployment on end-side platforms with limited computing power and power budgets, such as mobile phones or autonomous-driving devices.

Summary of the invention

To address the problem that the current YOLO target detection model is relatively large and the neural network model obtained by applying it to image dehazing and recognition is unsuitable for deployment on end-side platforms with limited computing power and power budgets, the present application discloses, through the following embodiments, an ultra-lightweight image dehazing and recognition network model and an image dehazing and recognition method.

A first aspect of the present application discloses an ultra-lightweight image dehazing and recognition network model, comprising a bidirectional GAN network model and a target detection network model connected in sequence.

The bidirectional GAN network model is used to process an input image to be dehazed and output a clear image, and the target detection network model is used to perform feature recognition on the clear image.

The target detection network model is a Yolo-Tiny-S network model obtained by pruning and retraining. During pruning and retraining, the original images in the training set of the target detection network model are trained multiple times; before each training pass the original images are downsampled by a preset factor, and after each pass the scaling coefficients of the batch normalization layers are sorted and compared, and the convolution kernels of the preceding layer corresponding to channels whose scaling coefficients are smaller than a preset threshold are removed, realizing the pruning.

The target detection network model comprises a direct feature processing module and a feature fusion processing module. The direct feature processing module comprises a front extraction unit, a middle extraction unit and a first rear extraction unit connected in sequence, and the clear image is input into the target detection network model through the front extraction unit. The feature fusion processing module comprises a feature fusion splicing unit and a second rear extraction unit connected in sequence, and the outputs of the front extraction unit and the middle extraction unit are both connected to the input of the feature fusion splicing unit.

Optionally, the front extraction unit comprises one DBL-S combination subunit and a plurality of MDBL-S combination subunits connected in sequence.

The middle extraction unit comprises a plurality of the MDBL-S combination subunits and one DBL-S combination subunit connected in sequence.

The first rear extraction unit and the second rear extraction unit each comprise one DBL-S combination subunit and one convolution subunit connected in sequence.

The feature fusion splicing unit comprises an upsampling subunit and a feature splicing subunit. The input of the upsampling subunit is connected to the output of the middle extraction unit, and the output of the upsampling subunit together with the output of the front extraction unit is connected to the input of the feature splicing subunit.

The DBL-S combination subunit consists of a DarkNet convolution layer, a batch normalization layer and a leaky rectified linear layer connected in sequence.

The MDBL-S combination subunit consists of a max pooling layer and the DBL-S combination subunit connected in sequence.

Optionally, the bidirectional GAN network model comprises an input module, a generation module and a discrimination module.

The input module comprises a clear-image input port and a hazy-image input port, the generation module comprises a first generation unit and a second generation unit, and the discrimination module comprises a first discriminator and a second discriminator.

The clear-image input port, the first generation unit and the first discriminator are connected in sequence and used for feature extraction and reconstruction of clear images; the hazy-image input port, the second generation unit and the second discriminator are connected in sequence and used for feature extraction and reconstruction of hazy images.

The first generation unit comprises a first encoder, a shared latent space and a first decoder connected in sequence, and the second generation unit comprises a second encoder, the shared latent space and a second decoder connected in sequence.

The shared latent space is used to store high-level features and output them to the first decoder and the second decoder; the high-level features include those extracted by the first encoder from clear images and those extracted by the second encoder from hazy images.

Both the training set and the validation set of the bidirectional GAN network model contain paired clear and hazy images.

Optionally, the first encoder and the second encoder each comprise a first convolution block, a second convolution block and a first coupled residual block connected in sequence.

The first decoder and the second decoder each comprise a second coupled residual block, a first strided convolution block and a second strided convolution block connected in sequence.

The output of the first coupled residual block is connected to the input of the shared latent space, and the output of the shared latent space is connected to the input of the second coupled residual block.

The output of the first convolution block is skip-connected to the input of the second strided convolution block.

The output of the second convolution block is skip-connected to the input of the first strided convolution block.

Optionally, the first coupled residual block and the second coupled residual block each consist of a cascade of multiple sub-residual blocks.

A sub-residual block at any stage comprises a first convolution layer, an activation function layer and a second convolution layer connected in sequence; it processes the outputs of the preceding sub-residual block and of the sub-residual block before that, and passes its result to the next sub-residual block and to the one after it.

Optionally, the first discriminator and the second discriminator each comprise a three-level discrimination network and one granularity discrimination network layer, and each level of the discrimination network contains three convolution layers and one activation function layer.

Optionally, the bidirectional GAN network model is optimized with the following loss functions: a generative adversarial loss function, an MSE loss function and a total variation loss function.

A second aspect of the present application discloses an image dehazing and recognition method, the method comprising:

extracting the depth information corresponding to different pixels in an image to be dehazed;

performing depth processing on the image to be dehazed according to the depth information;

inputting the depth-processed image to be dehazed into a pre-built ultra-lightweight image dehazing and recognition network model; and

obtaining the image dehazing and recognition results output by the ultra-lightweight image dehazing and recognition network model.

Optionally, extracting the depth information corresponding to different pixels in the image to be dehazed comprises:

performing format conversion on the image to be dehazed, and extracting the brightness and saturation of the image to be dehazed; and

generating the depth information corresponding to different pixels in the image to be dehazed according to the brightness and saturation.

A third aspect of the present application discloses a computer device, comprising:

a memory for storing a computer program; and

a processor for implementing the steps of the image dehazing and recognition method according to the second aspect of the present application when executing the computer program.

A fourth aspect of the present application discloses a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the steps of the image dehazing and recognition method according to the second aspect of the present application.

The present application discloses an ultra-lightweight image dehazing and recognition network model and realizes image dehazing and recognition with it. The network model consists of a bidirectional GAN network model and a target detection network model connected in sequence. The bidirectional GAN network model processes the input image to be dehazed and outputs a clear image, and the target detection network model performs feature recognition on the clear image. The target detection network model is a Yolo-Tiny-S network model obtained by pruning and retraining: the original images in the training set are trained multiple times, the original images are downsampled by a preset factor before each training pass, and after each pass the scaling coefficients of the batch normalization layers are sorted and compared, and the convolution kernels of the preceding layer corresponding to channels whose scaling coefficients are below a preset threshold are removed, realizing the pruning. The target detection network model disclosed in the present application is further pruned on the basis of the existing tiny recognition model Yolov3-tiny, which greatly reduces the scale of the ultra-lightweight image dehazing and recognition network model; it can be conveniently deployed in the chips of mobile terminals such as end-side platforms with limited computing power and power budgets or vehicle-mounted cameras, thereby performing high-speed target detection, and is therefore more practical.

Brief description of the drawings

In order to explain the technical solution of the present application more clearly, the accompanying drawings required by the embodiments are briefly introduced below. Obviously, a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic structural diagram of the ultra-lightweight image dehazing and recognition network model disclosed in an embodiment of the present application;

FIG. 2 is a schematic structural diagram of the target detection network model disclosed in an embodiment of the present application;

FIG. 3 is another schematic structural diagram of the target detection network model disclosed in an embodiment of the present application;

FIG. 4 is a schematic structural diagram of an existing dehazing convolutional neural network;

FIG. 5 is a schematic structural diagram of a bidirectional GAN network model disclosed in an embodiment of the present application;

FIG. 6 is a schematic structural diagram of another bidirectional GAN network model disclosed in an embodiment of the present application;

FIG. 7 is a schematic diagram of the shared latent space in a bidirectional GAN network model disclosed in an embodiment of the present application;

FIG. 8 is a schematic structural diagram of the first generation unit and the second generation unit in a bidirectional GAN network model disclosed in an embodiment of the present application;

FIG. 9 is a schematic structural diagram of the first coupled residual block and the second coupled residual block in a bidirectional GAN network model disclosed in an embodiment of the present application;

FIG. 10 is a schematic structural diagram of the first discriminator and the second discriminator in a bidirectional GAN network model disclosed in an embodiment of the present application;

FIG. 11 is a schematic diagram of the computation of cross-domain conversion consistency by a bidirectional GAN network model disclosed in an embodiment of the present application;

FIG. 12 is a schematic diagram of an application of a bidirectional GAN network model disclosed in an embodiment of the present application;

FIG. 13 is a schematic workflow diagram of an image dehazing and recognition method disclosed in an embodiment of the present application.

Detailed description

To address the problem that the current YOLO target detection model is relatively large and the neural network model obtained by applying it to image dehazing and recognition is unsuitable for deployment on end-side platforms with limited computing power and power budgets, the present application discloses, through the following embodiments, an ultra-lightweight image dehazing and recognition network model and an image dehazing and recognition method.

Referring to FIG. 1, the ultra-lightweight image dehazing and recognition network model disclosed in the first embodiment of the present application comprises a bidirectional GAN network model and a target detection network model connected in sequence.

The bidirectional GAN network model is used to process the input image to be dehazed and output a clear image, i.e. a dehazed image.

The target detection network model is used to perform feature recognition on the clear image. It is a Yolo-Tiny-S network model obtained by pruning and retraining. During pruning and retraining, the original images in the training set of the target detection network model are trained multiple times; before each training pass the original images are downsampled by a preset factor, and after each pass the scaling coefficients of the batch normalization layers are sorted and compared, and the convolution kernels of the preceding layer corresponding to channels whose scaling coefficients are smaller than a preset threshold are removed, realizing the pruning.

In this embodiment, to let the target detection network model better perform fast detection, the model size is compressed and the input image size is reduced, while the convergence of the model has to be preserved during compression. The model is therefore trained with the pruning-and-retraining method, whose specific steps are as follows.

Step 1: train directly on the original images of the training set. At this point the original images have a relatively high resolution; data augmentation with a relatively large scaling ratio is used during training to produce training images of different sizes.

Step 2: sort and compare the scaling coefficients of the batch normalization layers after training, and prune channels with small scaling coefficients according to the compression requirements. Specifically, the convolution kernels of the preceding layer corresponding to the channels with smaller scaling coefficients are removed, and only the kernels corresponding to channels with larger scaling coefficients in the batch normalization layers are kept.

Step 3: downsample the original images by a factor of two and repeat steps 1 and 2 to train again. In this round, the scaling ratio used for augmentation of the original images is reduced, to prevent some objects from becoming too low-resolution when the smaller images are scaled, which would affect model convergence. Note that when the batch normalization layers are sorted and the pruning channels compared, the proportion of pruned channels is increased so as to further reduce the number of channels.

Step 4: downsample the original images by a higher factor and repeat steps 1 and 2 for retraining, until the final amount of computation and the input image size meet the resource and speed requirements. For example, the input resolution of the original images is reduced from 512*512 to 128*128, and the number of model channels is reduced to roughly half of that of the current general-purpose ultra-lightweight target detection model Yolov3-tiny.
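
The sort-and-threshold step of the pruning procedure can be illustrated with a short sketch. This is a minimal Python/PyTorch sketch of the idea only, assuming a model in which every pruned convolution is followed by a BatchNorm2d layer; the function name and the prune_ratio parameter are illustrative and do not come from the patent.

```python
import torch
import torch.nn as nn

def bn_channel_keep_masks(model: nn.Module, prune_ratio: float):
    """Rank the absolute BN scaling factors (gamma) of the whole model and
    return, per BN layer, a boolean mask of the channels to keep. Channels
    whose gamma falls below the global threshold would be pruned, i.e. the
    matching convolution kernels of the preceding layer would be removed."""
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.sort(gammas).values[int(gammas.numel() * prune_ratio)]
    return {name: m.weight.detach().abs() > threshold
            for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)}
```

In the iterative scheme above, the input images would be downsampled again, the pruning ratio increased, and the masks recomputed after each retraining round before the marked kernels are physically removed.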

When the model has converged to good accuracy, the trained ultra-lightweight target detection model Yolo-Tiny-S is connected to the output port of the bidirectional GAN network model to form an integrated dehazing-and-recognition model, i.e. an ultra-lightweight end-to-end image dehazing and recognition network model, through which direct target recognition can be performed on real hazy images in an end-to-end manner.

This embodiment proposes for the first time an ultra-lightweight target detection model, Yolo-Tiny-S (a miniature Yolo detection model), which completes target detection together with the bidirectional GAN network model. The ultra-lightweight target detection model applies, on top of the lightweight target detection model Yolov3-tiny, iterative high-ratio channel pruning training based on the batch normalization layers and aimed at small input images. This training scheme is proposed here for the first time; the resulting target detection network model is far smaller than the Faster-RCNN used in previous systems, and the iterative pruning algorithm can effectively converge a network that quickly recognizes relatively small images. This benefits real-time target detection in practical application scenarios on resource-constrained platforms such as the Internet of Vehicles, mobile phones or autonomous driving.

Referring to FIG. 2, the target detection network model comprises a direct feature processing module and a feature fusion processing module. The direct feature processing module comprises a front extraction unit, a middle extraction unit and a first rear extraction unit connected in sequence, and the clear image is input into the target detection network model through the front extraction unit. The feature fusion processing module comprises a feature fusion splicing unit and a second rear extraction unit connected in sequence; the outputs of the front extraction unit and of the middle extraction unit are both connected to the input of the feature fusion splicing unit.

Referring to FIG. 3, the feature fusion splicing unit comprises an upsampling subunit and a feature splicing subunit. The input of the upsampling subunit is connected to the output of the middle extraction unit, and the output of the upsampling subunit together with the output of the front extraction unit is connected to the input of the feature splicing subunit.

The front extraction unit comprises one DBL-S combination subunit and a plurality of MDBL-S combination subunits connected in sequence. In this embodiment, the front extraction unit comprises one DBL-S combination subunit and four MDBL-S combination subunits connected in sequence.

The middle extraction unit comprises a plurality of the MDBL-S combination subunits and one DBL-S combination subunit connected in sequence. In this embodiment, the middle extraction unit comprises two MDBL-S combination subunits and one DBL-S combination subunit connected in sequence.

The first rear extraction unit and the second rear extraction unit each comprise one DBL-S combination subunit and one convolution (Conv) subunit connected in sequence.

The DBL-S combination subunit consists of a DarkNet convolution layer, a batch normalization (BN) layer and a leaky rectified linear (LeakyReLU) layer connected in sequence. Here, S stands for Slim, indicating that the corresponding module has been slimmed; the DarkNet convolution layer is a channel-slimmed DarkNet convolution.

The MDBL-S combination subunit consists of a max pooling (MaxPool) layer and the DBL-S combination subunit connected in sequence.
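
As a rough PyTorch sketch of these two subunits (the kernel size, stride and channel counts below are assumptions for illustration, not values fixed by the patent):

```python
import torch.nn as nn

class DBLS(nn.Module):
    """DBL-S: slimmed DarkNet convolution -> batch normalization -> LeakyReLU."""
    def __init__(self, in_ch, out_ch, k=3, s=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, stride=s, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class MDBLS(nn.Module):
    """MDBL-S: max pooling followed by a DBL-S subunit."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.dbls = DBLS(in_ch, out_ch)

    def forward(self, x):
        return self.dbls(self.pool(x))
```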

With reference to FIG. 2 and FIG. 3, the dehazed clear image is taken as input, feature extraction is completed through the combination of the front extraction unit and the middle extraction unit, and two feature maps of different sizes are then taken from the middle and the rear of the network for prediction. Specifically, each grid cell on a feature map predicts 3 prediction boxes, and each prediction box requires five basic parameters (x, y, w, h, confidence): the predicted x position, predicted y position, predicted box width, predicted box height and prediction confidence, representing the center position, width and height of the predicted box and the confidence that a box exists; a probability is also output for each class. Afterwards, the feature splicing subunit upsamples the feature map taken from the rear of the network and concatenates it with the feature map from the middle of the network (i.e. the output of the front extraction unit) to form a new fused feature map. The feature map produced directly at the rear of the network (i.e. the output of the middle extraction unit) and the new fused feature map are fed into the first rear extraction unit and the second rear extraction unit, respectively, producing the final predicted classes and regression boxes and completing recognition of the dehazed clear image. In FIG. 3, y1 and y2 denote two branches with the same role; each contains both the predicted classification result and the regression box information. The predicted classification result indicates which class of object has been recognized in the image, and the regression box marks the position of the detected object.
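
The fusion path and the two prediction branches can be sketched as follows. The channel counts, anchor number and class count are placeholder assumptions, and the real Yolo-Tiny-S branches would additionally pass through DBL-S subunits before the 1x1 output convolutions.

```python
import torch
import torch.nn as nn

class FusionHeads(nn.Module):
    """Sketch of the two output branches: y1 from the rear feature map,
    y2 from the fused (upsampled rear + mid-network) feature map."""
    def __init__(self, mid_ch=128, rear_ch=256, num_classes=20, anchors=3):
        super().__init__()
        out_ch = anchors * (5 + num_classes)          # (x, y, w, h, confidence) + class scores
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.head1 = nn.Conv2d(rear_ch, out_ch, 1)             # first rear extraction unit
        self.head2 = nn.Conv2d(rear_ch + mid_ch, out_ch, 1)    # second rear extraction unit

    def forward(self, mid_feat, rear_feat):
        y1 = self.head1(rear_feat)                                # predictions from rear features
        fused = torch.cat([self.up(rear_feat), mid_feat], dim=1)  # feature splicing subunit
        y2 = self.head2(fused)                                    # predictions from fused features
        return y1, y2
```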

For image dehazing, the dehazing convolutional neural network commonly used at present is the generative adversarial network (GAN). This network model adopts an unsupervised learning method and produces high-accuracy output through the mutual game between two modules, a generation module and a discrimination module. The generation module comprises an encoder and a decoder; as shown in FIG. 4, the encoder extracts feature vectors from the input image, and the decoder restores low-level features from the feature vectors.

However, in the prior art, the GAN only performs a one-way conversion from hazy image to clear image, so it contains only the one-way mapping from the hazy domain to the clear domain. After a hazy image is processed by the encoder and decoder, the output clear image usually exhibits halo effects and artificial artifacts, and the information of the original image is poorly preserved.

This embodiment involves a bidirectional GAN network model. Referring to FIG. 5, the network model comprises an input module, a generation module and a discrimination module.

The input module comprises a clear-image input port and a hazy-image input port, the generation module comprises a first generation unit and a second generation unit, and the discrimination module comprises a first discriminator and a second discriminator.

The clear-image input port, the first generation unit and the first discriminator are connected in sequence and used for feature extraction and reconstruction of clear images; the hazy-image input port, the second generation unit and the second discriminator are connected in sequence and used for feature extraction and reconstruction of hazy images.

The bidirectional GAN network model shown in FIG. 5 can dehaze hazy images and can likewise add haze back to clear images, which strengthens the consistency of cross-domain conversion and thus lets the two conversions constrain each other better; in concrete applications the dehazed images look more natural, and the dehazing effect on real hazy images is enhanced. In contrast, existing dehazing techniques use a GAN only for the one-way conversion from hazy image to clear image and can only output the mapping from the hazy domain to the clear domain, so the output images show halo effects and artificial artifacts, and the original image information is poorly preserved.

Referring to FIG. 6, in the bidirectional GAN network model designed in this embodiment, the first generation unit comprises a first encoder, a shared latent space and a first decoder connected in sequence, and the second generation unit comprises a second encoder, the shared latent space and a second decoder connected in sequence.

The shared latent space is used to store high-level features and output them to the first decoder and the second decoder; the high-level features include those extracted by the first encoder from clear images and those extracted by the second encoder from hazy images.

Both the training set and the validation set of the bidirectional GAN network model contain paired clear and hazy images.

In this embodiment, if it is assumed that a latent space is shared between hazy images and clear images, the feature connection from hazy images to clear images can be completed through this latent space. Referring to FIG. 7, a clear image X and a hazy image Y can be connected through a shared latent space Z, and the images in both domains can be recovered. Based on this theoretical foundation, the bidirectional GAN network model disclosed in this embodiment can handle both real and synthetic hazy images well, restoring and reconstructing hazy images by first encoding and then decoding, combined with the bidirectional generative adversarial network. The bidirectional GAN network model is trained and validated with paired clear and hazy images, contains the bidirectional mapping between the hazy domain and the clear domain, can process images from different domains, and effectively guarantees the authenticity of image reconstruction.

This embodiment discloses a bidirectional GAN network model as a dehazing network that effectively handles real dense haze. The network structure is novel and effective; it can extract the depth information of hazy images at different depths and has clear advantages in handling distant regions and dense haze, which can substantially improve the accuracy of subsequent object recognition. In previous methods, because of the limitations of the dehazing model, direct training on real hazy images was not possible, so the corresponding dehazing results could not adapt well to detection in real hazy scenes. The bidirectional GAN network model can be trained end to end on real images, so it can better exploit the information of real hazy images to help improve detection in real hazy scenes. Joint training on real and synthetic datasets makes better use of the haze information of real scenes and avoids the difficulty of network training caused by the lack of paired training data; learning from real information enables the subsequent target detection network model to generalize well in real scenes.

Further, as shown in FIG. 8, the first encoder and the second encoder each comprise a first convolution block, a second convolution block and a first coupled residual block connected in sequence.

The first decoder and the second decoder each comprise a second coupled residual block, a first strided convolution block and a second strided convolution block connected in sequence.

The output of the first coupled residual block is connected to the input of the shared latent space, and the output of the shared latent space is connected to the input of the second coupled residual block.

The output of the first convolution block is skip-connected to the input of the second strided convolution block.

The output of the second convolution block is skip-connected to the input of the first strided convolution block.

The first and second convolution blocks extract the high-level features of the input image, while the first and second coupled residual blocks learn the detailed information of the image's different features, which facilitates restoration and reconstruction. The first and second strided convolution blocks complete the reconstruction from high-level features to the output image (clear or hazy). The encoder and decoder are tightly connected through skip connections: after each convolution block, a skip connection links its output feature map to the input of the corresponding strided convolution block of the decoder, where the maps are concatenated along the channel dimension, so that the feature maps output by the decoder contain the feature information of the original image. The skip connections allow the detailed textures of the image to be learned better; by connecting at both high and low dimensions through two levels of skip connections, different positional information of the image can be processed separately, further ensuring the consistency between the output and input images and the authenticity of the reconstruction.
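
A compact PyTorch sketch of one generation unit with the two skip connections described above; the coupled residual blocks are stood in for by plain residual blocks here, and the channel widths, strides and block count are assumptions rather than the patent's exact configuration.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(c, c, 3, 1, 1), nn.ReLU(inplace=True),
                                  nn.Conv2d(c, c, 3, 1, 1))

    def forward(self, x):
        return x + self.body(x)

class DehazeGenerator(nn.Module):
    """Encoder (two convolution blocks + residual stack) and decoder (residual
    stack + two upsampling blocks), with each encoder block skip-connected to
    the matching decoder block by channel-wise concatenation."""
    def __init__(self, ch=64):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, ch, 3, 1, 1), nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, 2, 1), nn.ReLU(inplace=True))
        self.res = nn.Sequential(*[ResBlock(ch * 2) for _ in range(3)])
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(ch * 4, ch, 4, 2, 1), nn.ReLU(inplace=True))
        self.dec2 = nn.Conv2d(ch * 2, 3, 3, 1, 1)

    def forward(self, x):
        e1 = self.enc1(x)                         # skip 1 -> second decoder block
        e2 = self.enc2(e1)                        # skip 2 -> first decoder block
        z = self.res(e2)                          # features passing through the shared latent space
        d1 = self.dec1(torch.cat([z, e2], 1))     # upsampled back to full resolution
        return self.dec2(torch.cat([d1, e1], 1))  # reconstructed 3-channel image
```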

Further, referring to FIG. 9, the first coupled residual block and the second coupled residual block each consist of a cascade of multiple sub-residual blocks.

A sub-residual block at any stage comprises a first convolution layer, an activation function layer and a second convolution layer connected in sequence; it processes the outputs of the preceding sub-residual block and of the sub-residual block before that, and passes its result to the next sub-residual block and to the one after it.

Coupled residual blocks strengthen the network's capacity to process information and better fit the latent identity mapping of a deep network. In this embodiment, each sub-residual block consists of two convolution layers and one activation function layer; the outputs and inputs of multiple sub-residual blocks (FIG. 9 shows three) are cascaded with one another. Each sub-residual block processes both the input of its own stage and the low-dimensional input result of the previous stage so as to extract feature information of different dimensions, and feeds its computation result to the next sub-residual block and to the one after it. Through the tight connection of multiple levels of sub-residual blocks, the residual information of haze features at different dimensions can be learned.
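
The cascading described above, in which each sub-residual block consumes the outputs of the two preceding stages and feeds the two following ones, can be read as a densely coupled cascade. The sketch below is one possible interpretation, not code taken from the patent.

```python
import torch
import torch.nn as nn

class SubResidualBlock(nn.Module):
    """conv -> activation -> conv, applied to the concatenation of the outputs
    of the previous stage and of the stage before that."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, 1, 1),
        )

    def forward(self, prev, prev_prev):
        return self.body(torch.cat([prev, prev_prev], dim=1))

class CoupledResidualBlock(nn.Module):
    """Cascade of sub-residual blocks; each stage sees the two preceding results."""
    def __init__(self, channels, n_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(SubResidualBlock(channels) for _ in range(n_blocks))

    def forward(self, x):
        prev_prev, prev = x, x            # bootstrap the first stage with the block input
        for blk in self.blocks:
            prev_prev, prev = prev, blk(prev, prev_prev)
        return prev
```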

After encoding and decoding by the encoder and decoder, a generated image is obtained and fed into the discrimination module for discrimination. Referring to FIG. 10, the first discriminator and the second discriminator each comprise a three-level discrimination network, each level of which contains three convolution layers and one activation function layer. Each level of the discrimination network downsamples the input of the previous level; the outputs of the three levels, at different granularities, are compared with the real feature maps to judge whether the network output matches expectations, a penalty term is applied to training results that fall short, and the discrimination loss is computed and propagated back into the network to update the parameters. By using a multi-level discrimination network and subjecting the output of the bidirectional GAN network model to three levels of feature sampling and comparison, both coarse-grained and fine-grained comparisons are performed, which improves the discrimination accuracy of the discrimination module.
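
One possible reading of the multi-level discriminator, where each level downsamples the input of the previous level and applies three convolutions plus an activation; the layer widths and the downsampling operator are assumptions.

```python
import torch
import torch.nn as nn

class DiscLevel(nn.Module):
    """One discriminator level: three convolutions followed by an activation."""
    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, ch, 4, 2, 1),
            nn.Conv2d(ch, ch * 2, 4, 2, 1),
            nn.Conv2d(ch * 2, 1, 4, 1, 1),
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, x):
        return self.net(x)

class MultiLevelDiscriminator(nn.Module):
    """Three levels at full, half and quarter resolution, so the generated
    image is judged at both coarse and fine granularity."""
    def __init__(self):
        super().__init__()
        self.levels = nn.ModuleList(DiscLevel() for _ in range(3))
        self.down = nn.AvgPool2d(3, stride=2, padding=1)

    def forward(self, x):
        outputs = []
        for level in self.levels:
            outputs.append(level(x))
            x = self.down(x)          # downsample the input for the next level
        return outputs
```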

For the bidirectional GAN network model, this embodiment uses loss functions to help the model express itself better; the optimized loss functions comprise a generative adversarial loss function, an MSE loss function and a total variation loss function.

To achieve a good training effect, the network's loss functions all need to contain a generative adversarial loss, which stems from whether an image comes from the generator G(I) or from the real data J. The classic generative adversarial loss is defined as follows:

L_GAN(G, D) = E_I[log D(J)] + E_I[log(1 - D(G(I)))];

With reference to FIG. 11, the bidirectional GAN network model uses the bidirectional generative adversarial loss and computes the cross-domain conversion consistency as follows:

L_adv = L_GAN(G_c(I_c), Dis_h) + L_GAN(G_h(I_h), Dis_c);

where L_GAN() denotes the classic adversarial generation loss, I_h the input hazy image, I_c the input clear image, G_c() the generation network corresponding to the clear scene, G_h() the generation network corresponding to the hazy scene, Dis_c the discrimination module for clear scenes, and Dis_h the discrimination module for hazy scenes. By combining the shared latent space theory with the bidirectional generative adversarial loss, the authenticity of the reconstructed images and the similarity of the results of cross-domain image conversion can be guaranteed.

In training on synthetic hazy images, the MSE loss function is used to ensure that the predicted image G(I) is similar to the ground-truth image J. The MSE loss function is as follows:

L_MSE = ||G_c(I_c) - J_h||_1 + ||G_h(I_h) - J_c||_1;

where J_h and J_c denote the hazy image data and the clear image data of the real scene, respectively. The loss is defined as the pixel-wise difference between the generated hazy (clear) image and the real hazy (clear) image, and ||·||_1 denotes the mean of the squared errors over corresponding pixels (mean squared error).

In training on hazy images, the total variation loss function is used to eliminate related artificial defects, so that the images look better visually and their textures and details are preserved. The total variation loss function is as follows:

L_TV = ||∂_h G(I)||_1 + ||∂_v G(I)||_1;

where ∂_h and ∂_v denote the horizontal and vertical gradient differences.
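
The three loss terms can be written compactly; the sketch below is a hedged Python/PyTorch rendering that assumes the discriminator outputs probabilities in (0, 1) and uses mean-squared-error for the pixel term and absolute gradient differences for the total variation term, which is one common realization of the formulas above rather than the patent's exact implementation.

```python
import torch
import torch.nn.functional as F

def adversarial_loss(disc_real, disc_fake):
    """Classic GAN term: E[log D(J)] + E[log(1 - D(G(I)))]."""
    return (torch.log(disc_real + 1e-8).mean()
            + torch.log(1.0 - disc_fake + 1e-8).mean())

def pixel_loss(generated, target):
    """Pixel-wise reconstruction term between the generated image and ground truth."""
    return F.mse_loss(generated, target)

def total_variation_loss(img):
    """Horizontal and vertical gradient differences of a generated image (N, C, H, W)."""
    dh = (img[..., :, 1:] - img[..., :, :-1]).abs().mean()
    dv = (img[..., 1:, :] - img[..., :-1, :]).abs().mean()
    return dh + dv
```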

Through the above training, when the model has converged to good accuracy, the obtained training weights can be saved, yielding the final bidirectional GAN dehazing network model, which can then dehaze hazy images in an end-to-end manner.

Referring to FIG. 12, when the bidirectional GAN network model disclosed in this embodiment is deployed in a practical application scenario, the brief dehazing steps are as follows:

1. Input a real hazy image from a real scene.

2. Preprocess the image: convert it from RGB space to HSV space and, combined with known parameters, output the depth information of the image.

3. Combined with the depth information, process the hazy image: encode the hazy image into the shared latent space, decode the representation from the shared latent space, and generate a clear, dehazed image of the real scene.

To strengthen the consistency of cross-domain conversion and obtain more realistic dehazed images, the existing one-way GAN network is improved. Based on the same shared latent space, the two one-way GAN networks, clear image → latent space → hazy image and hazy image → latent space → clear image, are fused into one module through two pairs of encoders and decoders, forming a bidirectional GAN network model that can generate hazy/clear images. The cross-domain conversion losses of hazy image → clear image → hazy image and clear image → hazy image → clear image are computed with the corresponding discriminators, so the distributions of hazy and clear images can be learned better.

The bidirectional GAN network model disclosed in this embodiment, based on the shared latent space assumption, proposes a bidirectional dehazing network that first encodes and then decodes. This network can process images from different domains, adapts well to cross-domain image conversion, can handle haze information at different depths, and has strong advantages in handling distant regions and dense haze. Skip connections, coupled residual blocks and a multi-level discrimination structure are applied, which greatly improves the robustness of the network and allows it to fit the image distribution of dehazed clear images more accurately. Real haze information can be used; depth information can be extracted through preprocessing, which facilitates handling haze at deep concentrations and performs well in real scenes.

Using the bidirectional GAN network model disclosed in the above embodiment, images can be dehazed. In practical applications, the depth information corresponding to different pixels of the image to be dehazed is first extracted and used to perform depth processing on the image; the depth-processed image is then input into the pre-built bidirectional GAN network model, and the dehazed image output by the bidirectional GAN network model is obtained.

The present application discloses an ultra-lightweight image dehazing and recognition network model and realizes image dehazing and recognition with it. The network model consists of a bidirectional GAN network model and a target detection network model connected in sequence. The target detection network model is further pruned on the basis of the current miniature recognition model Yolov3-tiny, which greatly reduces the scale of the ultra-lightweight image dehazing and recognition network model, so that it can be conveniently deployed on edge platforms with limited computing power and energy budgets, or in the chips of mobile devices such as vehicle-mounted cameras, to perform high-speed target detection; this makes it more practical.
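The pruning described here follows the familiar idea of ranking the batch-normalization scaling factors and discarding the filters whose factor falls below a threshold. The following PyTorch fragment is only a minimal sketch of that selection step, with an illustrative keep ratio; it does not reproduce the actual Yolo-Tiny-S pruning procedure of the disclosure:

```python
import torch
import torch.nn as nn

def channels_to_keep(bn: nn.BatchNorm2d, keep_ratio: float = 0.5) -> torch.Tensor:
    """Indices of BN channels whose scaling coefficient survives the preset threshold."""
    gamma = bn.weight.detach().abs()
    threshold = torch.quantile(gamma, 1.0 - keep_ratio)  # preset scaling threshold
    return torch.nonzero(gamma >= threshold).flatten()

# Toy example: prune the filters of the convolution layer preceding a BN layer.
conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
bn = nn.BatchNorm2d(32)
bn.weight.data.uniform_(0.0, 1.0)            # stand-in for learned scaling coefficients
keep = channels_to_keep(bn, keep_ratio=0.5)
pruned_filters = conv.weight.detach()[keep]  # drop the previous layer's pruned kernels
```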

The ultra-lightweight image dehazing and recognition network model disclosed in this application includes two modules, dehazing and recognition. The dehazing module adopts the bidirectional GAN network model, and the recognition module adopts Yolo-Tiny-S as the target detection network model. The two modules can be connected directly after being trained independently, so as to perform target detection in hazy scenes with a high recognition rate and high speed, avoiding the enormous manpower and time that joint training would require for annotating a real-haze dataset for the target detection task. Because the dehazing module already learns the real distribution of haze by combining the depth information of real hazy images, connecting the upstream dehazing module to the object recognition network does not require joint training on a target detection dataset of real hazy images (such a dataset is currently missing); instead, the separately trained dehazing module and recognition module can be connected directly to form an integrated dehazing-and-recognition system for real scenes.

Referring to FIG. 13, the second embodiment of the present application discloses an image dehazing and recognition method, the method including:

Step S11: extract the depth information corresponding to different pixels in the image to be dehazed.

Step S12: perform depth processing on the image to be dehazed according to the depth information.

Specifically, the image to be dehazed is format-converted: based on OpenCV, it is converted from RGB format to HSV format, so that the brightness (value) v and the saturation s of the image to be dehazed are extracted. OpenCV is a cross-platform computer vision and machine learning software library released under the BSD (open-source) license. In shadow detection algorithms, images in RGB format are often converted to HSV format: for a shadow region, the hue and saturation change little relative to the original image, while the brightness changes markedly. Converting from RGB to HSV yields the H, S and V components, and hence the hue, saturation and brightness values.

According to the brightness v and saturation s, the depth information corresponding to different pixels in the image to be dehazed is generated.

In practice, the brightness v and saturation s are substituted into the existing depth-information formula, and a linear computation with the known parameters θ0, θ1, θ2 yields the depth information d corresponding to the different pixels of the image to be dehazed.

The depth information is computed as follows:

d(X) = θ0 + θ1·v(X) + θ2·s(X) + ε(X)

where X denotes the image to be dehazed and ε(X) is a random variable representing the random error of the model; ε can be treated as a random image.
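Purely as a numerical illustration (the actual θ0, θ1, θ2 are the known parameters referred to above; the default values below are placeholders of the kind commonly quoted in the colour-attenuation-prior literature), the formula could be evaluated per pixel as follows:

```python
import cv2
import numpy as np

def estimate_depth(bgr, theta0=0.12, theta1=0.96, theta2=-0.78, sigma=0.04):
    """d(X) = θ0 + θ1·v(X) + θ2·s(X) + ε(X), evaluated per pixel of a BGR uint8 image."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    s = hsv[..., 1] / 255.0                            # saturation in [0, 1]
    v = hsv[..., 2] / 255.0                            # brightness (value) in [0, 1]
    eps = np.random.normal(0.0, sigma, size=v.shape)   # random error ε(X)
    return theta0 + theta1 * v + theta2 * s + eps

# Example on a synthetic image:
img = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)
depth = estimate_depth(img)
```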

The depth information is then used to depth-process the input image to be dehazed, strengthening the appearance of deep (distant) regions, for example by raising the brightness and increasing the contrast of distant areas so that the far background becomes clearer.
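The exact enhancement operation is not fixed by the disclosure; one possible sketch, which simply scales contrast and brightness in proportion to the estimated depth so that distant regions are boosted more, is:

```python
import numpy as np

def enhance_by_depth(img, depth, max_gain=0.4):
    """Brighten and add contrast to distant (large-depth) regions of a BGR uint8 image."""
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-6)  # normalise depth to [0, 1]
    gain = 1.0 + max_gain * d[..., None]                    # stronger boost where depth is large
    out = (img.astype(np.float32) - 127.5) * gain + 127.5   # contrast stretch around mid-grey
    out = out * (1.0 + 0.2 * d[..., None])                  # mild extra brightness for far pixels
    return np.clip(out, 0.0, 255.0).astype(np.uint8)
```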

Step S13: input the depth-processed image to be dehazed into the pre-built ultra-lightweight image dehazing and recognition network model.

Step S14: obtain the image dehazing and recognition result output by the ultra-lightweight image dehazing and recognition network model.

The ultra-lightweight image dehazing and recognition network model disclosed in this application includes two modules, dehazing and recognition. The bidirectional GAN network model of the dehazing module can extract the depth information of real hazy images and also process synthetic hazy images; by encoding first and then decoding, the image is mapped into the shared latent space and then reconstructed. The reconstructed clear image is then sent to the recognition module, whose Yolo-Tiny-S model can effectively recognize objects, achieving the goal of object recognition after dehazing.
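Because the two modules are trained separately and then simply chained, the integrated system reduces to function composition at inference time. The sketch below assumes the trained dehazing generator and a Yolo-Tiny-S detector are already available as callables (their loading code is omitted and the names are illustrative):

```python
def dehaze_and_detect(hazy_img, dehazer, detector):
    """Chain the independently trained modules; no joint training is required."""
    clear = dehazer(hazy_img)      # dehazing module: encode -> shared latent space -> decode
    return clear, detector(clear)  # recognition module: Yolo-Tiny-S detections on the dehazed image
```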

The third embodiment of the present application discloses a computer device, including:

a memory, configured to store a computer program; and

a processor, configured to implement the steps of the image dehazing and recognition method according to the second embodiment of the present application when executing the computer program.

The fourth embodiment of the present application discloses a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the image dehazing and recognition method according to the second embodiment of the present application are implemented.

Addressing the problem of the low recognition accuracy of hazy images in object recognition tasks, the present application, through the above embodiments, proposes for the first time a network model that integrates image dehazing and image recognition for miniature object recognition networks. By improving the training scheme of the end-to-end dehazing network, a network model for dehazing and recognition trained on a mixture of real and synthetic images is proposed. The model can extract and analyse the depth information of images, so that the end-to-end dehazing network correctly analyses the depth details and dense-haze information of an image; as a result, images are dehazed well even in dense-haze scenes, and the dehazed output also performs well in the subsequent miniature target detection network.

The present application has been described in detail above with reference to specific embodiments and exemplary examples, but these descriptions should not be construed as limiting the present application. Those skilled in the art will understand that various equivalent substitutions, modifications or improvements can be made to the technical solutions of the present application and their embodiments without departing from the spirit and scope of the present application, and all of these fall within the scope of the present application. The protection scope of the present application is defined by the appended claims.

Claims (10)

1. An ultra-lightweight image dehazing and recognition network model, characterized by comprising: a bidirectional GAN network model and a target detection network model connected in sequence; wherein the bidirectional GAN network model is configured to process an input image to be dehazed and output a clear image, and the target detection network model is configured to perform feature recognition processing on the clear image; the target detection network model is a Yolo-Tiny-S network model obtained by row pruning and retraining; during the row pruning and retraining, the original images in the training set of the target detection network model are trained multiple times, the original images being down-sampled by a preset factor before each training; after each training, the scaling coefficients of the batch normalization layers are sorted and compared, and the convolution kernels of the preceding layer corresponding to the channels whose scaling coefficient is smaller than a preset scaling threshold are removed, thereby realizing pruning; the target detection network model comprises a direct feature processing module and a feature fusion processing module; the direct feature processing module comprises a front extraction unit, a middle extraction unit and a first rear extraction unit connected in sequence, the clear image being input into the target detection network model through the front extraction unit; the feature fusion processing module comprises a feature fusion splicing unit and a second rear extraction unit connected in sequence, and the output terminals of the front extraction unit and of the middle extraction unit are both connected to the input terminal of the feature fusion splicing unit.

2. The ultra-lightweight image dehazing and recognition network model according to claim 1, characterized in that: the front extraction unit comprises one DBL-S combined subunit and a plurality of MDBL-S combined subunits connected in sequence; the middle extraction unit comprises a plurality of the MDBL-S combined subunits and one of the DBL-S combined subunits connected in sequence; the first rear extraction unit and the second rear extraction unit each comprise one DBL-S combined subunit and one convolution subunit connected in sequence; the feature fusion splicing unit comprises an up-sampling subunit and a feature splicing subunit, the input terminal of the up-sampling subunit being connected to the output terminal of the middle extraction unit, and the output terminal of the up-sampling subunit and the output terminal of the front extraction unit being connected to the input terminal of the feature splicing subunit; the DBL-S combined subunit consists of a darknet convolution layer, a batch normalization layer and a leaky rectified linear layer connected in sequence; the MDBL-S combined subunit consists of a max-pooling layer and the DBL-S combined subunit connected in sequence.

3. The ultra-lightweight image dehazing and recognition network model according to claim 1, characterized in that the bidirectional GAN network model comprises an input module, a generation module and a discrimination module; the input module comprises a clear-image input port and a hazy-image input port, the generation module comprises a first generation unit and a second generation unit, and the discrimination module comprises a first discriminator and a second discriminator; the clear-image input port, the first generation unit and the first discriminator are connected in sequence and are used for feature extraction and reconstruction of clear images; the hazy-image input port, the second generation unit and the second discriminator are connected in sequence and are used for feature extraction and reconstruction of hazy images; the first generation unit comprises a first encoder, a shared latent space and a first decoder connected in sequence, and the second generation unit comprises a second encoder, the shared latent space and a second decoder connected in sequence; the shared latent space is used for storing high-level features and outputting the high-level features to the first decoder and the second decoder, the high-level features comprising the high-level features extracted by the first encoder from clear images and the high-level features extracted by the second encoder from hazy images; both the training set and the validation set of the bidirectional GAN network model comprise paired clear images and hazy images.

4. The ultra-lightweight image dehazing and recognition network model according to claim 3, characterized in that the first encoder and the second encoder each comprise a first convolution block, a second convolution block and a first coupled residual block connected in sequence; the first decoder and the second decoder each comprise a second coupled residual block, a first strided convolution block and a second strided convolution block connected in sequence; the output terminal of the first coupled residual block is connected to the input terminal of the shared latent space, and the output terminal of the shared latent space is connected to the input terminal of the second coupled residual block; the output terminal of the first convolution block is skip-connected to the input terminal of the second strided convolution block; and the output terminal of the second convolution block is skip-connected to the input terminal of the first strided convolution block.

5. The ultra-lightweight image dehazing and recognition network model according to claim 4, characterized in that the first coupled residual block and the second coupled residual block each consist of a plurality of sub-residual blocks connected in cascade; the sub-residual block of any stage comprises a first convolution layer, an activation function layer and a second convolution layer connected in sequence, and is configured to process the output of the sub-residual block one stage earlier and the output of the sub-residual block two stages earlier, and to output its processing result to the sub-residual block one stage later and the sub-residual block two stages later.

6. The ultra-lightweight image dehazing and recognition network model according to claim 3, characterized in that the first discriminator and the second discriminator each comprise three layers of discriminant network and one layer of granularity discriminant network, each layer of the discriminant network comprising three convolution layers and one activation function layer.

7. The ultra-lightweight image dehazing and recognition network model according to claim 3, characterized in that the bidirectional GAN network model is optimized with the following loss functions: a generative adversarial loss function, an MSE loss function and a total variation loss function.

8. An image dehazing and recognition method, characterized in that the method comprises: extracting depth information corresponding to different pixels in an image to be dehazed; performing depth processing on the image to be dehazed according to the depth information; inputting the depth-processed image to be dehazed into a pre-built ultra-lightweight image dehazing and recognition network model; and obtaining the image dehazing and recognition result output by the ultra-lightweight image dehazing and recognition network model.

9. The image dehazing and recognition method according to claim 8, characterized in that extracting the depth information corresponding to different pixels in the image to be dehazed comprises: performing format conversion on the image to be dehazed and extracting the brightness and saturation of the image to be dehazed; and generating, according to the brightness and saturation, the depth information corresponding to the different pixels in the image to be dehazed.

10. A computer device, characterized by comprising: a memory, configured to store a computer program; and a processor, configured to implement, when executing the computer program, the steps of the image dehazing and recognition method according to claim 8 or 9.
CN202011527239.6A 2020-12-22 2020-12-22 Ultra-lightweight image defogging and recognition network model, image defogging and recognition method Active CN114663292B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011527239.6A CN114663292B (en) 2020-12-22 2020-12-22 Ultra-lightweight image defogging and recognition network model, image defogging and recognition method

Publications (2)

Publication Number Publication Date
CN114663292A true CN114663292A (en) 2022-06-24
CN114663292B CN114663292B (en) 2025-04-01

Family

ID=82025376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011527239.6A Active CN114663292B (en) 2020-12-22 2020-12-22 Ultra-lightweight image defogging and recognition network model, image defogging and recognition method

Country Status (1)

Country Link
CN (1) CN114663292B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738622A (en) * 2019-10-17 2020-01-31 温州大学 Lightweight neural network single image defogging method based on multi-scale convolution
CN110991311A (en) * 2019-11-28 2020-04-10 江南大学 A target detection method based on densely connected deep network
CN111383192A (en) * 2020-02-18 2020-07-07 清华大学 SAR-fused visible light remote sensing image defogging method
CN112037139A (en) * 2020-08-03 2020-12-04 哈尔滨工业大学(威海) Image dehazing method based on RBW-CycleGAN network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
白士磊; 殷柯欣; 朱建启: "轻量级YOLOv3的交通标志检测算法" [Traffic sign detection algorithm based on lightweight YOLOv3], 计算机与现代化 (Computer and Modernization), no. 09, 15 September 2020 (2020-09-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115457265A (en) * 2022-08-25 2022-12-09 暨南大学 Image defogging method and system based on generation countermeasure network and multi-scale fusion
CN115457265B (en) * 2022-08-25 2023-08-01 暨南大学 Image defogging method and system based on generative confrontation network and multi-scale fusion

Also Published As

Publication number Publication date
CN114663292B (en) 2025-04-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载