CN109886970A

CN109886970A - Detection and segmentation method of target objects in terahertz images and computer storage medium

Info

Publication number: CN109886970A
Application number: CN201910048648.9A
Authority: CN
Inventors: 梁栋; 潘家兴; 吴天鹏; 孙涵
Original assignee: Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing University of Aeronautics and Astronautics
Priority date: 2019-01-18
Filing date: 2019-01-18
Publication date: 2019-06-14
Anticipated expiration: 2039-01-18
Also published as: CN109886970B

Abstract

The invention discloses a method for detecting and segmenting a target object in a terahertz image, and a computer storage medium. A conditional generative adversarial network integrates the two tasks of detection and segmentation. For samples containing weapons, the network generates a segmentation mask. For samples containing no weapons, it generates an image in which every pixel value is zero. With a suitable loss function and an efficient network structure, the model can be trained on a small training set of low signal-to-noise-ratio images. The trained model detects whether an input terahertz image contains the target object and segments the target object from the image. Because the invention trains an accurate detection and segmentation model from few training samples with very low signal-to-noise ratio, and because the model processes images quickly, it can be applied in real-time security inspection systems. This reduces the labor intensity of manual security inspection and improves efficiency.

Description


Technical field

The invention relates to a method for detecting and segmenting a target object and to a computer storage medium. In particular, it relates to a method for detecting and segmenting a target object in a terahertz image, and to a corresponding computer storage medium.

Background

Detecting objects hidden under clothing is a critical task in security checks. Current manual inspection, however, is very inefficient and misses many objects. Terahertz imaging offers a non-contact alternative. Terahertz radiation is electromagnetic radiation with a frequency between 0.1×10¹² Hz and 10×10¹² Hz (0.1 to 10 THz). A terahertz imager obtains images of objects hidden under clothing by measuring the thermal radiation emitted by different parts of the human body. Unlike X-rays, this radiation is harmless to the human body.
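As a quick check on the frequency range above, the free-space wavelength follows from wavelength = c / f. The sketch below is an illustration added here and is not part of the patented method. It shows that 0.1 to 10 THz corresponds to wavelengths from roughly 3 mm down to roughly 30 µm.

```python
# Free-space wavelength for the terahertz band quoted above: wavelength = c / f.
C = 299_792_458.0  # speed of light in vacuum, m/s

def wavelength_m(frequency_hz: float) -> float:
    """Return the free-space wavelength in metres for a frequency in hertz."""
    return C / frequency_hz

for f in (0.1e12, 10e12):
    # Print the frequency in THz and the wavelength in micrometres.
    print(f"{f / 1e12:g} THz -> {wavelength_m(f) * 1e6:.1f} um")
```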

With limited labeled data, the task is twofold: determine whether an image contains a hidden object, and segment any detected object accurately. Physical factors inherent in terahertz imaging, however, give the images low contrast and a low signal-to-noise ratio. The pixel values of a hidden object differ little from those of the surrounding body. The object's edges blend into the body almost indistinguishably. The images also contain considerable background noise. Under such poor conditions, standard object segmentation and saliency detection algorithms do not solve the problem well.

Early work focused on statistical models. Shen (Shen, X., Dietlein, C., Grossman, E.N., Popovic, Z., Meyer, F.G.: Detection and segmentation of concealed objects in terahertz images. IEEE Transactions on Image Processing 17 (2008) 2465-2475) proposed a multi-level thresholding segmentation algorithm that models the radiometric temperature with a Gaussian mixture model. Anisotropic diffusion first removes noise. The algorithm then tracks the boundaries that evolve as the threshold changes, which completes the segmentation. Lee (Lee, D., Yeom, S., Son, J., Kim, S.: Automatic image segmentation for concealed object detection using the expectation-maximization algorithm. Optics Express 18 (2010) 10659-10667) also used a Gaussian mixture model. That method segments the human body and the objects it carries by computing Bayesian decision boundaries with two passes of the expectation-maximization (EM) algorithm. Similarly, Yeom (Lee, D.S., Son, J.Y., Jung, M.K., Jung, S.W., Lee, S.J., Yeom, S., Jang, Y.S.: Real-time outdoor concealed-object detection with passive millimeter wave imaging. Optics Express 19 (2011) 2530-2536) used a multi-scale segmentation algorithm based on a Gaussian mixture model. To reach real-time performance, it applied vector quantization before the EM algorithm. None of these traditional algorithms can perform detection, however. They all segment on the premise that the current image contains an object, and under such poor imaging conditions they do not produce good segmentation results.

Current instance segmentation algorithms based on deep convolutional neural networks have good potential to address this challenge. They fall roughly into two categories.

The first category is based on R-CNN region proposals. These methods use a bottom-up pipeline: segmentation depends on the regions proposed by R-CNN, and those regions are then classified. The second category is based on semantic segmentation. These methods first compute a semantic segmentation and then assign its pixels to different instances.

The state-of-the-art instance segmentation algorithm Mask R-CNN (He, K., Gkioxari, G., Dollar, P., Girshick, R.B.: Mask R-CNN. International Conference on Computer Vision (2017) 2980-2988) shares convolutional features across tasks, so it can perform detection, classification and segmentation simultaneously. These methods, however, generally treat segmentation and detection as two independent units. The result is a complex model structure, long processing times and reduced robustness. Such methods also need a large amount of data to train a usable model. They are therefore unsuitable for target detection and segmentation in terahertz scenes.

Summary of the invention

Purpose of the invention: the technical problem addressed by the present invention is to provide a method for detecting and segmenting a target object in a terahertz image, and a computer storage medium. The method overcomes two prior-art requirements: a large contrast between the target object and the background, and the collection and labeling of a large number of high-quality samples. It obtains accurate detection and segmentation even when the signal-to-noise ratio is very low, the target object and the background have very similar pixel intensities, and the training set is small. It is also fast enough to be applied in real-time security inspection systems.

Technical solution: the method for detecting and segmenting a target object in a terahertz image according to the present invention comprises the following steps:

(1) Take an original terahertz image x from the training set as a training sample, and manually annotate the target object region in x to obtain the real segmentation mask y. Preprocess x and y so that all pixel values are converted to the range -1 to 1;

(2) Input the original terahertz image x and random noise z into the generator, which generates a fake segmentation mask;

(3) Form a real sample pair from the real segmentation mask and its original terahertz image, and a fake sample pair from the fake segmentation mask and its original terahertz image. Input both pairs into the discriminator for evaluation;

(4) Compute the loss functions of the generator and the discriminator. Optimize the generator with gradient descent so that its loss function is as small as possible, and optimize the discriminator with gradient ascent so that its loss function is as large as possible, until the model converges. Return to step (1), and proceed to step (5) once the whole training set has been traversed;

(5) Return to step (1) until the set number of iterations is reached;

(6) Save the model trained in the above steps, input a new image to be detected into the generator, and obtain the corresponding detection and segmentation results.
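Step (6) yields a generator output rather than an explicit yes/no answer. The sketch below makes several assumptions, because the patent does not spell out how the output becomes a detection decision:

- the output lies in the same [-1, 1] range as the preprocessed masks;
- pixels are mapped back to [0, 255] by inverting p/127.5 - 1;
- the result is binarized at a hypothetical `threshold`;
- an image counts as a detection when any pixel survives.

The last assumption mirrors the training convention that weapon-free samples have an all-zero mask. `postprocess` and `threshold` are illustrative names, not taken from the source.

```python
def postprocess(output, threshold=127.5):
    """Convert a generator output (rows of values in [-1, 1]) into a binary
    0/255 mask and an image-level detection flag.
    `threshold` is a hypothetical choice on the [0, 255] pixel scale."""
    mask = [[255 if (v + 1.0) * 127.5 > threshold else 0 for v in row]
            for row in output]
    # Weapon-free images are trained towards an all-zero mask, so any
    # surviving foreground pixel is read as a detection.
    detected = any(v == 255 for row in mask for v in row)
    return mask, detected

mask, detected = postprocess([[-1.0, -0.9], [0.8, 1.0]])
print(mask, detected)
```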

To retain the target region while discarding other useless details, and to keep the network structure simple and efficient, only some layers of the encoder and decoder are connected. The generator is a two-dimensional neural network G with 8 convolutional layers and 8 transposed convolutional layers. Every convolutional and transposed convolutional layer has a 5×5 filter and a 2×2 stride. The depths of the 8 convolutional layers are 64, 128, 256, 512, 512, 512, 512 and 1024 in sequence. The depths of the 8 transposed convolutional layers are 512, 512, 512, 512, 256, 128, 64 and 1 in sequence.
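The layer list above implies a specific shape progression. The sketch below traces height, width and depth through G under two assumptions: the 256×256 input size stated in the detailed description, and 'same' padding (the patent gives only filter size and stride). Eight stride-2 convolutions reduce 256 pixels to a 1×1 bottleneck with 1024 channels. Eight stride-2 transposed convolutions then restore a 256×256 single-channel mask. Skip connections are left out here because they change channel counts, not spatial sizes.

```python
# Shape trace through the generator G, assuming 'same' padding so that every
# stride-2 layer exactly halves (convolution) or doubles (transposed
# convolution) the spatial size. Under this assumption the 5x5 kernel size
# does not affect the shapes.
ENCODER_DEPTHS = [64, 128, 256, 512, 512, 512, 512, 1024]
DECODER_DEPTHS = [512, 512, 512, 512, 256, 128, 64, 1]
STRIDE = 2

def trace_generator(size=256):
    """Return a list of (layer_kind, height, width, depth) after each layer."""
    shapes = []
    h = w = size
    for depth in ENCODER_DEPTHS:
        h, w = -(-h // STRIDE), -(-w // STRIDE)  # ceil division, 'same' padding
        shapes.append(("conv", h, w, depth))
    for depth in DECODER_DEPTHS:
        h, w = h * STRIDE, w * STRIDE            # transposed conv, 'same' padding
        shapes.append(("deconv", h, w, depth))
    return shapes

for kind, h, w, d in trace_generator():
    print(f"{kind:6s} {h:3d} x {w:3d} x {d}")
```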

To make the generator's output more diverse and the model more robust, the random noise z is produced by dropout operations in the generator. Dropout is applied in the first three transposed convolutional layers, and each node is retained with a probability of 50%.
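The noise mechanism above can be sketched on plain lists of activations, with each list standing in for one layer's output. The following details are assumptions, not statements of the patented implementation:

- inverted scaling (survivors multiplied by 1/keep_prob), which is the convention used by common frameworks such as PyTorch's `nn.Dropout`;
- the helper names `dropout` and `decoder_with_noise`.

The source does not state whether dropout stays active at test time, so the `training` flag leaves that choice open.

```python
import random

def dropout(activations, keep_prob=0.5, training=True, rng=random):
    """Inverted dropout: keep each unit with probability `keep_prob` and scale
    survivors by 1 / keep_prob so the expected activation is unchanged;
    dropped units are set to 0. Returns a new list."""
    if not training:
        return list(activations)
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

def decoder_with_noise(layer_outputs, noisy_layers=3, keep_prob=0.5, rng=random):
    """Apply dropout only to the outputs of the first `noisy_layers`
    transposed-convolution layers; later layers pass through unchanged."""
    return [dropout(out, keep_prob, rng=rng) if i < noisy_layers else list(out)
            for i, out in enumerate(layer_outputs)]

rng = random.Random(0)
noisy = decoder_with_noise([[1.0] * 8 for _ in range(5)], rng=rng)
for i, out in enumerate(noisy):
    print(i, out)
```

With keep_prob = 0.5, every surviving unit in the first three layers becomes 2.0 and every dropped unit becomes 0.0. This keeps the expected activation at 1.0.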

Further, the discriminator is a two-dimensional neural network D with 3 convolutional layers and 1 fully connected layer. The 3 convolutional layers all have 5×5 filters and a 2×2 stride, with filter depths of 64, 128 and 256 respectively. The fully connected layer uses sigmoid as its activation function and outputs a scalar.
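A shape-level sketch of D follows. It rests on two assumptions the source does not state: the (image, mask) pair is stacked along the channel axis, giving 2 input channels, and 'same' padding is used. With a 256×256 input, three stride-2 convolutions give a 32×32×256 feature map. That map is flattened into a single fully connected unit whose sigmoid output is the scalar score. The helper names are illustrative.

```python
import math

DISC_DEPTHS = [64, 128, 256]

def trace_discriminator(size=256, in_channels=2):
    """Shapes through D, assuming the (image, mask) pair is stacked along the
    channel axis and 'same' padding; returns the per-layer shapes and the
    number of features entering the fully connected layer."""
    shapes = [(size, size, in_channels)]
    h = size
    for depth in DISC_DEPTHS:
        h = -(-h // 2)  # 5x5 convolution with stride 2
        shapes.append((h, h, depth))
    return shapes, h * h * DISC_DEPTHS[-1]

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fc_sigmoid(features, weights, bias=0.0):
    """Single-output fully connected layer followed by sigmoid: the scalar
    score that D assigns to an (x, mask) pair."""
    return sigmoid(sum(f * w for f, w in zip(features, weights)) + bias)

shapes, n_features = trace_discriminator()
print(shapes, n_features)
```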

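To achieve a good trade-off between the recall rate and the false alarm rate, the loss function of the discriminator is:

$$\mathcal{L}_{cGAN}(G,D)=\mathbb{E}_{x,y\sim p_{data}(x,y)}\big[\log D(x,y)\big]+\mathbb{E}_{x\sim p_{data}(x),\,z\sim p_z(z)}\big[\log\big(1-D(x,G(x,z))\big)\big]$$

The loss function of the generator is:

$$G^{*}=\arg\min_{G}\max_{D}\ \mathcal{L}_{cGAN}(G,D)+\lambda\,\mathcal{L}_{L_1}(G)+\beta\,\mathcal{L}_s$$

where:

- $\mathcal{L}_{L_1}(G)=\mathbb{E}_{x,y\sim p_{data}(x,y),\,z\sim p_z(z)}\big[\|y-G(x,z)\|_1\big]$ is the reconstruction error;
- $\mathcal{L}_s=\|G(x,z)\|_1$ is the sparsity term, and ‖·‖₁ denotes the L₁ norm;
- D(x,y) is the output of the discriminator and G(x,z) is the output of the generator;
- $\mathbb{E}_{x,y\sim p_{data}(x,y)}$ is the expectation over the distribution of real sample pairs, and $\mathbb{E}_{x\sim p_{data}(x),\,z\sim p_z(z)}$ is the expectation over the distribution of fake sample pairs;
- λ and β are the weights of $\mathcal{L}_{L_1}$ and $\mathcal{L}_s$ in the overall loss function.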

Further, the manual annotation in step (1) marks the target region in each terahertz image, setting the pixel values of the target region to 255 and all other positions to 0. The preprocessing converts all pixel values to the range -1 to 1 with the formula p/127.5 - 1, where p is a pixel value.
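The labeling and preprocessing rules above can be sketched directly. Mask values 0 and 255 map to exactly -1 and 1, so the all-zero mask of a weapon-free image becomes a constant -1 map after preprocessing. The coordinate set `target_pixels` is a hypothetical stand-in for the region drawn in the annotation tool.

```python
def normalize(p):
    """Map a pixel value p in [0, 255] to [-1, 1] with p / 127.5 - 1."""
    return p / 127.5 - 1.0

def make_mask(height, width, target_pixels):
    """Ground-truth mask: 255 inside the annotated target region, 0 elsewhere.
    `target_pixels` is a hypothetical set of (row, col) coordinates standing in
    for the annotated region."""
    return [[255 if (r, c) in target_pixels else 0 for c in range(width)]
            for r in range(height)]

mask = make_mask(3, 4, {(1, 1), (1, 2)})
normalized = [[normalize(p) for p in row] for row in mask]
print(normalized)
```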

The computer storage medium of the present invention stores a computer program that, when executed by a computer processor, implements any of the methods described above.

Beneficial effects: the present invention can train an accurate detection and segmentation model even when training samples are few and the signal-to-noise ratio of the images is very low. The model processes images quickly enough for real-time use, so it can be deployed in real-time security inspection systems. This reduces the labor intensity of manual security inspection and improves efficiency.

Description of drawings

Fig. 1 is a schematic diagram of a selected terahertz image;

Fig. 2 is a schematic diagram of the model training and testing process of the method;

Fig. 3 is a schematic diagram comparing the model structure used with other schemes;

Fig. 4 shows the detailed structure of the model of the present invention.

Detailed description of the embodiments

A schematic terahertz image is shown in Fig. 1, where the target object is enclosed by a rectangular box. The procedure of the method, shown in Fig. 2, has a training process and a testing process. The training process builds a trained model. The testing process feeds the images to be tested into that model to evaluate the method.

The generative adversarial network adopted by the present invention is a generative model that, through an adversarial process, learns to transform one image into another. A standard generative adversarial network consists of a generator and a discriminator. The generator's task is to produce, from its input, fake samples realistic enough to deceive the discriminator. The discriminator receives both real samples drawn from the data set and fake samples produced by the generator, and judges which are real and which are fake. Eventually the generator produces samples whose authenticity the discriminator cannot determine; at that point the generator has captured the distribution of the training samples. A conditional generative adversarial network constrains the generator with additional information. It can produce good results from the user's labeled data even with a very small sample size, and this is the design idea adopted by the present invention.

During training, every sample is given its correct detection and segmentation result. Each image is paired with an image of the same size in which all pixel values are 0. For samples containing a weapon, the pixels in the region where the weapon appears are set to 255. Samples containing no weapon are left unmodified, so their mask stays all zeros. Specifically, the target region in each terahertz image is annotated manually with the free annotation tool Colabeler: pixels of the target region are set to 255 and all other positions to 0, giving the real segmentation mask. The original terahertz image and the annotated image are both scaled to 256×256 pixels, after which image preprocessing begins. Preprocessing maps all pixel values to the range -1 to 1 with the formula p/127.5 - 1, where p is a pixel value. The random noise z is input into the generator, and the scaled original image to be detected is also input into the generator as the condition x. The generator learns a mapping from the input image to the detection and segmentation result and outputs the corresponding result G(x,z).

The input of the discriminator has two parts:

- a real sample pair, denoted (x,y): the original terahertz image x together with its true segmentation and detection result, the real segmentation mask y;
- a fake sample pair, denoted (x,G(x,z)): the original terahertz image x together with the fake segmentation mask G(x,z) output by the generator.

The discriminator must give real pairs high scores and fake pairs low scores. The generator must produce increasingly realistic samples G(x,z) to fool the discriminator, so that (x,G(x,z)) obtains a higher score. Eventually the discriminator cannot tell which of the two pairs is fake, and the generator has learned the implicit mapping from terahertz images to detection and segmentation images.

During testing, an image to be detected is input into the trained generator. The model discards redundant information such as the human body and background noise, and finally segments the hidden object. Detection and segmentation are completed simultaneously.

To achieve this goal, the present invention first defines suitable loss functions.

The loss function of the discriminator is:

$$\mathcal{L}_{cGAN}(G,D)=\mathbb{E}_{x,y\sim p_{data}(x,y)}\big[\log D(x,y)\big]+\mathbb{E}_{x\sim p_{data}(x),\,z\sim p_z(z)}\big[\log\big(1-D(x,G(x,z))\big)\big] \qquad (1)$$

The reconstruction error and the sparsity term are defined as:

$$\mathcal{L}_{L_1}(G)=\mathbb{E}_{x,y\sim p_{data}(x,y),\,z\sim p_z(z)}\big[\|y-G(x,z)\|_1\big] \qquad (2)$$

$$\mathcal{L}_s=\|G(x,z)\|_1 \qquad (3)$$

The loss function of the generator is:

$$G^{*}=\arg\min_{G}\max_{D}\ \mathcal{L}_{cGAN}(G,D)+\lambda\,\mathcal{L}_{L_1}(G)+\beta\,\mathcal{L}_s \qquad (4)$$

Here:

- ‖·‖₁ denotes the L₁ norm;
- D(x,y) is the output of the discriminator, a value between 0 and 1, and G(x,z) is the output of the generator;
- $\mathbb{E}_{x,y\sim p_{data}(x,y)}$ is the expectation over the distribution of real sample pairs, and $\mathbb{E}_{x\sim p_{data}(x),\,z\sim p_z(z)}$ is the expectation over the distribution of fake sample pairs;
- λ and β are the weights of $\mathcal{L}_{L_1}$ and $\mathcal{L}_s$ in the overall loss function.

Both x and y are sampled from the real data distribution. The discriminator maximizes the expectation in formula (1) by pushing D(x,y) as close to 1 as possible and D(x,G(x,z)) as close to 0 as possible. Conversely, the generator tries to minimize it by pushing D(x,G(x,z)) as close to 1 as possible.

The noise z is realized as dropout in the generator, which randomly sets the outputs of some nodes to 0 during training. Dropout makes the generator's output more diverse and yields more varied fake samples. This makes it easier for the generator to capture the original data distribution and increases the robustness of the model.

Meanwhile, x is fed to both networks as a condition, forming a conditional generative adversarial network. The condition constrains each generator output to be paired one-to-one with its input; without it, the detection and segmentation results would be random.

Formula (1) considers only the generative adversarial loss, whose goal is to generate images that the discriminator cannot distinguish from real ones. A realistic image, however, is not necessarily a good segmentation result. Formula (2) therefore uses the Manhattan distance to measure the reconstruction error. This additional constraint requires the generated image to be realistic and also close to the correct segmentation result, i.e., an accurate segmentation. The Manhattan distance is used instead of the more common L₂ norm because an L₂ constraint tends to produce blurry generated images.
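The blur argument can be seen with a toy calculation. Suppose similar inputs are paired with different target values for a pixel. The constant that minimizes squared (L₂) error is their mean, an in-between gray value. The constant that minimizes absolute (L₁) error is a median, which is one of the observed values. This is a standard explanation added for illustration, not an analysis from the patent, and the target values below are hypothetical.

```python
import statistics

def best_constant_l2(targets):
    """Constant prediction minimizing the sum of squared errors: the mean."""
    return statistics.fmean(targets)

def best_constant_l1(targets):
    """Constant prediction minimizing the sum of absolute errors: a median."""
    return statistics.median(targets)

# Hypothetical mask values seen for one ambiguous pixel (normalized scale):
# background (-1) in two training pairs, target (+1) in one.
targets = [-1.0, -1.0, 1.0]
print(best_constant_l2(targets))  # the gray average -1/3
print(best_constant_l1(targets))  # -1.0, a value that actually occurs
```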

As in many object detection and segmentation tasks, a hidden weapon occupies only a small part of the whole image, so formula (3) requires the generated image to be sparse as well. The present invention uses the L₁ norm as the sparsity constraint, rather than the theoretically ideal L₀ norm or the commonly used L₂ norm. Minimizing the L₀ norm is an NP-hard problem. The L₂ norm only drives each element towards a small value and does not guarantee that most elements are exactly 0.
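The proximal operators of the two penalties show why L₁ produces exact zeros while L₂ does not. This is a standard result, added here for illustration. The L₁ penalty soft-thresholds: every value with magnitude below the threshold becomes exactly 0. The squared L₂ penalty, the form in which an L₂ regularizer is usually applied, only rescales values towards 0. The threshold t = 0.1 and the sample values are arbitrary choices.

```python
def prox_l1(v, t):
    """argmin_u 0.5*(u - v)**2 + t*|u|: soft-thresholding."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0  # small values are set exactly to zero

def prox_l2_squared(v, t):
    """argmin_u 0.5*(u - v)**2 + 0.5*t*u**2: uniform shrinkage by 1/(1+t)."""
    return v / (1.0 + t)

values = [0.05, -0.2, 0.9, 0.01]
print([prox_l1(v, 0.1) for v in values])
print([prox_l2_squared(v, 0.1) for v in values])
```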

The loss function of the generator is formula (4). λ and β are the weights of the reconstruction term of formula (2) and of the sparsity term L_s in the overall loss; in practice, λ = 800 and β = 5. The condition x and the reconstruction error supervise the generation process, ensuring that each generated image corresponds one-to-one to the input terahertz image. Without them the output would be random and would not fulfill the detection and segmentation task. Minimizing the Manhattan distance also keeps the segmentation accurate. As prior knowledge, the sparsity constraint reduces the false alarm rate.
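To make the objectives concrete, the sketch below evaluates them on toy numbers, using λ = 800 and β = 5 from the text as defaults. It relies on several assumptions:

- the adversarial expectations in formula (1) are estimated as batch means of discriminator scores;
- the reconstruction and sparsity terms of formulas (2) and (3) are computed for one flattened mask;
- only the terms of formula (4) that depend on G are computed, since D is held fixed when G is trained;
- the function names are illustrative.

Many implementations replace log(1 - D(x, G(x, z))) with the non-saturating -log D(x, G(x, z)) for the generator. The saturating form is kept here because it follows the objective literally.

```python
import math

EPS = 1e-12  # guards against log(0)

def cgan_objective(d_real, d_fake):
    """Batch estimate of formula (1): mean log D(x, y) over real pairs plus
    mean log(1 - D(x, G(x, z))) over fake pairs. D maximizes this."""
    real = sum(math.log(max(p, EPS)) for p in d_real) / len(d_real)
    fake = sum(math.log(max(1.0 - p, EPS)) for p in d_fake) / len(d_fake)
    return real + fake

def l1_reconstruction(y, g):
    """Manhattan distance ||y - G(x, z)||_1 between flattened masks (formula (2))."""
    return sum(abs(a - b) for a, b in zip(y, g))

def sparsity(g):
    """L_s = ||G(x, z)||_1 (formula (3))."""
    return sum(abs(v) for v in g)

def generator_objective(d_fake, y, g, lam=800.0, beta=5.0):
    """G-dependent terms of formula (4) with D held fixed: the fake-pair part
    of formula (1) plus lam * reconstruction + beta * sparsity. G minimizes this."""
    adv = sum(math.log(max(1.0 - p, EPS)) for p in d_fake) / len(d_fake)
    return adv + lam * l1_reconstruction(y, g) + beta * sparsity(g)

print(cgan_objective([0.9, 0.8], [0.2, 0.1]))
print(generator_objective([0.2], y=[1.0, -1.0], g=[0.9, -0.8]))
```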

The model is trained with gradient ascent for the discriminator and gradient descent for the generator. When the generator is trained, the discriminator's parameters are kept fixed and formula (4) is minimized. When the discriminator is trained, the generator's current parameters are kept fixed and formula (1) is maximized. The discriminator and the generator are trained alternately until the model converges. In the specific implementation, the learning rate is 0.00012 and the minibatch size is 8. One pass over the entire training set counts as one iteration, and 200 iterations in total produce the final model.
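The alternating schedule above can be sketched independently of any deep-learning framework. The sketch makes these assumptions:

- `d_step` and `g_step` are hypothetical callables that each perform one optimizer update on a minibatch;
- one discriminator update is followed by one generator update (a 1:1 ratio);
- the data are not shuffled.

The patent does not name the optimizer; only the learning rate, batch size and iteration count are taken from the text.

```python
import math

def train(num_samples, d_step, g_step, epochs=200, batch_size=8, lr=0.00012):
    """Alternating adversarial training. Each epoch (one "iteration" in the
    text) walks the training set in minibatches. For every minibatch, one
    discriminator update (G frozen, ascend formula (1)) is followed by one
    generator update (D frozen, descend formula (4)).
    `d_step(batch, lr)` and `g_step(batch, lr)` are hypothetical callables
    supplied by the caller; `batch` is a range of sample indices.
    Returns the total number of minibatch steps performed."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    for _ in range(epochs):
        for b in range(steps_per_epoch):
            batch = range(b * batch_size, min((b + 1) * batch_size, num_samples))
            d_step(batch, lr)
            g_step(batch, lr)
    return epochs * steps_per_epoch

log = []
n = train(20, lambda b, lr: log.append(("D", len(b))),
          lambda b, lr: log.append(("G", len(b))), epochs=1)
print(n, log)
```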

The network structure is also improved. As shown in Fig. 3, the invention adds connections between some decoding and encoding layers compared with the traditional encoder-decoder structure; the specific generator structure is shown in Fig. 4. These connections supply low-level features that improve the model's ability to detect small targets. Compared with U-Net (Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention (2015) 234-241), the new model removes some of the connections close to the output layer. As Fig. 3 indicates, the outputs of the first few layers of the U-Net encoder contain a great deal of edge and background information. Connecting these layers to decoding layers close to the output layer leads to more false alarms. The invention therefore removes these connections, balancing the recall rate against the false alarm rate. With the above loss functions and model structure, the optimization is carried out iteratively with gradient-based updates to obtain the final model parameters.

The discriminator has 3 convolutional layers and one fully connected layer. The 3 convolutional layers all have 5×5 filters and a 2×2 stride, with filter depths of 64, 128 and 256 respectively. The fully connected layer uses sigmoid as its activation function and outputs a scalar.

As shown in Fig. 4, the generator has 8 convolutional layers and 8 transposed convolutional layers. All of them have 5×5 filters and a 2×2 stride. The depths of the 8 convolutional layers are 64, 128, 256, 512, 512, 512, 512 and 1024. The depths of the 8 transposed convolutional layers are 512, 512, 512, 512, 256, 128, 64 and 1. The noise z is produced by dropout in the first three transposed convolutional layers, and each node is retained with a probability of 50%.
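The effect of removing skip connections on the decoder can be expressed as channel bookkeeping. The sketch assumes U-Net-style wiring: the output of decoder layer k is concatenated with the output of encoder layer 8 - k, which has the same spatial size. It compares three cases:

- full U-Net wiring;
- a plain encoder-decoder with no skips;
- a hypothetical pruned variant that drops the two skips fed by the shallowest encoder layers.

The patent does not say exactly which connections were removed, so the pruned set {1, 2, 3, 4, 5} is an illustrative guess, not the patented configuration.

```python
ENCODER_DEPTHS = [64, 128, 256, 512, 512, 512, 512, 1024]
DECODER_DEPTHS = [512, 512, 512, 512, 256, 128, 64, 1]

def decoder_input_channels(skips):
    """Channel count entering each of the 8 transposed-convolution layers.
    For k in `skips`, the output of decoder layer k (1-based) is concatenated
    with the output of encoder layer 8 - k, which has the same spatial size.
    skips = {1, ..., 7} is full U-Net wiring; skips = set() is a plain
    encoder-decoder."""
    channels = [ENCODER_DEPTHS[-1]]          # decoder layer 1 reads the bottleneck
    for k in range(1, len(DECODER_DEPTHS)):  # inputs of decoder layers 2..8
        c = DECODER_DEPTHS[k - 1]
        if k in skips:
            c += ENCODER_DEPTHS[len(ENCODER_DEPTHS) - 1 - k]  # encoder layer 8 - k
        channels.append(c)
    return channels

print("U-Net     :", decoder_input_channels(set(range(1, 8))))
print("pruned    :", decoder_input_channels({1, 2, 3, 4, 5}))  # hypothetical
print("no skips  :", decoder_input_channels(set()))
```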

With the model trained in the above steps, a new image to be detected is input into the generator to obtain the corresponding detection and segmentation results. In actual tests, the model of the present invention reached a recall rate of 88.46% while keeping the false alarm rate at 11.35%, clearly outperforming existing detection and segmentation algorithms.
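The two reported metrics can be computed from image-level detections with a sketch like the one below. The patent does not define the false alarm rate. The sketch assumes FP / (FP + TN), the share of weapon-free images that are flagged; another common definition is FP / (TP + FP). The predictions and labels are toy values and do not reproduce the reported results.

```python
def recall_and_false_alarm(predictions, labels):
    """Image-level metrics from boolean detections and ground-truth labels.
    recall = TP / (TP + FN); the false alarm rate is taken here as
    FP / (FP + TN), i.e. the share of weapon-free images flagged
    (an assumed definition)."""
    tp = sum(1 for p, l in zip(predictions, labels) if p and l)
    fn = sum(1 for p, l in zip(predictions, labels) if not p and l)
    fp = sum(1 for p, l in zip(predictions, labels) if p and not l)
    tn = sum(1 for p, l in zip(predictions, labels) if not p and not l)
    return tp / (tp + fn), fp / (fp + tn)

preds = [True, True, False, True, False]
labels = [True, False, False, True, True]
print(recall_and_false_alarm(preds, labels))
```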

An embodiment of the present invention also provides a computer storage medium storing a computer program. When executed by a processor, the computer program implements the method described above. The computer storage medium is, for example, a computer-readable storage medium.

As will be appreciated by those skilled in the art, the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present application. Each flow and/or block in the flowcharts and/or block diagrams, and each combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine. The instructions executed by that processor then produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to function in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture comprising instruction means, and the instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on it to produce a computer-implemented process. The instructions executed on the computer or other programmable device then provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Claims

1. A method for detecting and segmenting a target object in a terahertz image is characterized by comprising the following steps:

(1) taking an original terahertz image x in a training set as a training sample, manually marking the target object region in x to obtain a real segmentation mask y, and preprocessing x and y so that all pixel values in the images are converted to between -1 and 1;

(2) inputting the original terahertz image x and the random noise z into a generator, which generates a false segmentation mask;

(3) forming a real sample pair from the real segmentation mask and the corresponding original terahertz image, forming a false sample pair from the false segmentation mask and the corresponding original terahertz image, and inputting the sample pairs into a discriminator for evaluation;

(4) calculating the loss functions of the generator and of the discriminator respectively; optimizing the generator by gradient descent to make its loss function as small as possible, and optimizing the discriminator by gradient ascent to make its loss function as large as possible, until the model converges; returning to step (1), and proceeding to step (5) after the whole training set has been traversed;

(5) returning to step (1) until the preset number of iterations is reached;

(6) saving the model trained in the above steps, and inputting a new image to be inspected into the generator to obtain the corresponding detection and segmentation result.
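The training procedure of steps (1) to (5) can be sketched as a plain outer loop. This is a minimal sketch under stated assumptions: `train`, `generator_step` and `discriminator_step` are hypothetical names, each callback stands in for one gradient update in whatever deep learning framework is actually used, the per-sample "optimize until the model converges" of step (4) is simplified to a single alternating update, and shuffling the sample order is an added assumption not stated in the claim.

```python
import random
from typing import Callable, List, Optional, Tuple

# One training sample: (terahertz image x, real segmentation mask y), both already scaled to [-1, 1].
Sample = Tuple[List[float], List[float]]
StepFn = Callable[[List[float], List[float]], None]


def train(training_set: List[Sample], num_epochs: int,
          generator_step: StepFn, discriminator_step: StepFn,
          rng: Optional[random.Random] = None) -> int:
    """Outer loop of steps (1)-(5); returns how many per-sample updates were made.

    generator_step and discriminator_step are hypothetical callbacks standing in for
    one gradient-descent update of G and one gradient-ascent update of D.
    """
    rng = rng or random.Random()
    updates = 0
    for _ in range(num_epochs):              # step (5): repeat up to the preset iteration count
        order = list(range(len(training_set)))
        rng.shuffle(order)                   # assumed: visit samples in random order
        for i in order:                      # step (1): take the next training sample
            x, y = training_set[i]
            generator_step(x, y)             # steps (2), (4): G(x, z) yields a false mask; G minimizes its loss
            discriminator_step(x, y)         # steps (3), (4): D scores real/false pairs; D maximizes its loss
            updates += 1
    return updates


# Usage with recording stubs in place of real gradient updates.
calls: List[str] = []
n = train([([0.0], [1.0])] * 3, num_epochs=2,
          generator_step=lambda x, y: calls.append("G"),
          discriminator_step=lambda x, y: calls.append("D"))
print(n)  # 6 (3 samples x 2 epochs)
```

Passing the updates in as callbacks keeps the loop independent of any particular framework; in practice each callback would compute the corresponding loss of claim 5 and apply one optimizer step.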

2. The method for detecting and segmenting the target object in the terahertz image according to claim 1, wherein: the generator is a two-dimensional neural network G comprising 8 convolutional layers and 8 transposed convolutional layers, all of which use a 5 × 5 filter size and a 2 × 2 stride; the depths of the 8 convolutional layers are, in sequence, 64, 128, 256, 512 and 1024, and the depths of the 8 transposed convolutional layers are, in sequence, 512, 256, 128, 64 and 1.
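The spatial sizes implied by claim 2 can be checked with simple arithmetic. The sketch below assumes "same" padding (the claim does not state the padding) and a square input of a hypothetical size; under that assumption each 5 × 5, stride-2 convolution halves the side length (rounding up) and each stride-2 transposed convolution doubles it, so the output mask matches the input size exactly when the input side is a multiple of 2^8 = 256. The function names are illustrative only.

```python
import math
from typing import List, Tuple


def conv_out(size: int, stride: int = 2) -> int:
    """Side length after a strided convolution with 'same' padding (assumed): ceil(size / stride)."""
    return math.ceil(size / stride)


def transposed_conv_out(size: int, stride: int = 2) -> int:
    """Side length after a strided transposed convolution with 'same' padding: size * stride."""
    return size * stride


def generator_sizes(input_size: int, num_layers: int = 8) -> Tuple[List[int], List[int]]:
    """Side lengths through the 8 convolutional (encoder) and 8 transposed (decoder) layers."""
    encoder = [input_size]
    for _ in range(num_layers):
        encoder.append(conv_out(encoder[-1]))
    decoder = [encoder[-1]]
    for _ in range(num_layers):
        decoder.append(transposed_conv_out(decoder[-1]))
    return encoder, decoder


enc, dec = generator_sizes(256)
print(enc)  # [256, 128, 64, 32, 16, 8, 4, 2, 1]
print(dec)  # [1, 2, 4, 8, 16, 32, 64, 128, 256]
```

With a 256 × 256 input the bottleneck is a single spatial position, which is why eight stride-2 layers fit that input size exactly.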

3. The method for detecting and segmenting the target object in the terahertz image according to claim 2, wherein: the random noise z is introduced by dropout in the generator, applied in each of the first three transposed convolutional layers, with a retention probability of 50% for each node.
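Claim 3 supplies the noise z through dropout rather than through an explicit noise input. The sketch below shows dropout with a retention probability of 0.5 applied to a plain list of activations; rescaling the retained nodes by 1/keep_prob ("inverted" dropout) follows the convention of common frameworks and is an assumption here, as is the function name `dropout`.

```python
import random
from typing import List, Optional


def dropout(activations: List[float], keep_prob: float = 0.5,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: keep each node with probability keep_prob and scale kept values by 1/keep_prob.

    The random keep/drop pattern is what plays the role of the noise z in the generator.
    """
    rng = rng or random.Random()
    scale = 1.0 / keep_prob
    return [a * scale if rng.random() < keep_prob else 0.0 for a in activations]


out = dropout([1.0] * 10, keep_prob=0.5, rng=random.Random(42))
print(out)  # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

Because a different subset of nodes is dropped on every call, the same input image x can yield slightly different masks, which is how dropout substitutes for an explicit z.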

4. The method for detecting and segmenting the target object in the terahertz image according to claim 1, wherein: the discriminator is a two-dimensional neural network D comprising 3 convolutional layers and 1 fully connected layer; each convolutional layer has a 5 × 5 filter and a 2 × 2 stride, with filter depths of 64, 128 and 256 respectively; the fully connected layer uses sigmoid as its activation function, and its output is a single scalar.
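For the discriminator of claim 4, the number of features entering the fully connected layer follows from the three stride-2 convolutions. This sketch assumes "same" padding and a hypothetical 256 × 256 input; the helper names are illustrative, and the sigmoid is written in a numerically stable form so that large negative inputs do not overflow.

```python
import math
from typing import Sequence


def conv_out(size: int, stride: int = 2) -> int:
    """Side length after a strided convolution with 'same' padding (assumed)."""
    return math.ceil(size / stride)


def discriminator_fc_inputs(height: int, width: int,
                            depths: Sequence[int] = (64, 128, 256)) -> int:
    """Features fed to the fully connected layer after the three convolutional layers."""
    for _ in depths:
        height, width = conv_out(height), conv_out(width)
    return height * width * depths[-1]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function; maps the FC output to a scalar in [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


print(discriminator_fc_inputs(256, 256))  # 32 * 32 * 256 = 262144
print(sigmoid(0.0))                       # 0.5
```

The scalar output is read as the probability that the input (image, mask) pair is a real sample pair.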

5. The method of detecting and segmenting a target object in a terahertz image according to claim 1, wherein the loss function of the discriminator is:

the loss function of the generator is:

wherein $L_s = \|G(x,z)\|_1$, where $\|\cdot\|_1$ denotes the $L_1$ norm; $D(x,y)$ is the output of the discriminator; $G(x,z)$ is the output of the generator; $\mathbb{E}_{x,y}[\cdot]$ is the expectation over the probability distribution of the real sample pairs; $\mathbb{E}_{x,z}[\cdot]$ is the expectation over the probability distribution of the false sample pairs; and λ and β are the weights of their corresponding terms in the overall loss function, β being the weight of $L_s$.
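Because the loss equations of claim 5 are not reproduced in this text, the sketch below assumes the standard conditional GAN objective consistent with the definitions above: the discriminator maximizes E[log D(x, y)] + E[log(1 - D(x, G(x, z)))], and the generator minimizes the adversarial term plus λ times an L1 distance between the generated and real masks (the form of the λ term is an assumption) plus β·L_s, with L_s = ||G(x, z)||_1 as stated in the claim. All function names are hypothetical, and batches are plain Python lists.

```python
import math
from typing import List

EPS = 1e-12  # keeps log() finite when D outputs exactly 0 or 1


def l1_norm(v: List[float]) -> float:
    """||v||_1: sum of absolute values."""
    return sum(abs(a) for a in v)


def discriminator_loss(d_real: List[float], d_fake: List[float]) -> float:
    """Assumed cGAN objective that D maximizes (gradient ascent).

    d_real: D(x, y) on real sample pairs; d_fake: D(x, G(x, z)) on false sample pairs.
    """
    real_term = sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake: List[float], fake_masks: List[List[float]],
                   real_masks: List[List[float]], lam: float, beta: float) -> float:
    """Assumed objective that G minimizes: adversarial + lam * L1(G, y) + beta * L_s."""
    n = len(fake_masks)
    adversarial = sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    reconstruction = sum(l1_norm([g - y for g, y in zip(gm, ym)])
                         for gm, ym in zip(fake_masks, real_masks)) / n  # assumed λ term
    sparsity = sum(l1_norm(gm) for gm in fake_masks) / n  # L_s = ||G(x, z)||_1 (claim 5)
    return adversarial + lam * reconstruction + beta * sparsity


print(discriminator_loss([0.5], [0.5]))  # 2 * log(0.5), about -1.386
```

Under this reading, the L_s term pushes the generator toward small-magnitude outputs, which fits the abstract's goal of producing an all-zero picture for samples without a weapon.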

6. The method for detecting and segmenting the target object in the terahertz image according to claim 1, wherein: the manual labeling in step (1) consists of marking the target region in each terahertz image, setting the pixel values of the target region to 255 and all other positions to 0; and the preprocessing converts all pixel values to between -1 and 1 using the formula $p/127.5 - 1$, where $p$ is the original pixel value.
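The labeling and normalization of claim 6 reduce to a few lines. In this sketch `label_mask` and `to_unit_range` are hypothetical helper names, and the labeled target region is given as a list of (row, column) coordinates purely for illustration; the mapping p/127.5 - 1 sends 0 to -1 and 255 to 1.

```python
from typing import List, Tuple


def label_mask(height: int, width: int, target_pixels: List[Tuple[int, int]]) -> List[List[int]]:
    """Hypothetical helper: mask with 255 inside the labeled target region and 0 elsewhere."""
    mask = [[0] * width for _ in range(height)]
    for r, c in target_pixels:
        mask[r][c] = 255
    return mask


def to_unit_range(pixels: List[int]) -> List[float]:
    """Map 8-bit pixel values p in [0, 255] to p / 127.5 - 1, i.e. into [-1, 1]."""
    return [p / 127.5 - 1.0 for p in pixels]


mask = label_mask(2, 2, [(0, 1)])
flat = [p for row in mask for p in row]
print(mask)                 # [[0, 255], [0, 0]]
print(to_unit_range(flat))  # [-1.0, 1.0, -1.0, -1.0]
```

The same `to_unit_range` mapping is applied to both the terahertz image x and its mask y, as step (1) requires.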

7. A computer storage medium having a computer program stored thereon, characterized in that: the computer program, when executed by a computer processor, implements the method of any one of claims 1 to 6.