
CN114639124A - A Pedestrian Re-identification Method Based on Weighted Feature Fusion - Google Patents

A Pedestrian Re-identification Method Based on Weighted Feature Fusion

Info

Publication number
CN114639124A
CN114639124A
Authority
CN
China
Prior art keywords
features
pedestrian
loss function
global
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210321956.6A
Other languages
Chinese (zh)
Inventor
孙劲光
吴明岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning Technical University
Original Assignee
Liaoning Technical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liaoning Technical University filed Critical Liaoning Technical University
Priority to CN202210321956.6A priority Critical patent/CN114639124A/en
Publication of CN114639124A publication Critical patent/CN114639124A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian re-identification method based on weighted feature fusion, comprising the steps of: inputting a pedestrian image and extracting pedestrian features through the backbone network ResNeSt-50; inputting the different features obtained from the backbone network into a weighted feature fusion pyramid model to obtain multi-scale features containing high-level semantic information and low-level high-resolution information; using the high-level semantic features as global features in the global branch network and the low-level high-resolution features as local features in the local branch network; and jointly applying the Softmax loss function, the triplet loss function, and the center loss function to optimize and train the global features and the local features respectively. The present invention performs feature fusion in a weighted feature pyramid network, uses the high-level semantic features as global features, and jointly optimizes the global and local features with the combined Softmax, triplet, and center losses, improving the precision and accuracy of pedestrian recognition.


Description

A Pedestrian Re-identification Method Based on Weighted Feature Fusion

Technical Field

The invention belongs to the technical field of pedestrian re-identification, and in particular relates to a pedestrian re-identification method based on weighted feature fusion.

Background Art

Pedestrian re-identification refers to retrieving a specific pedestrian's video sequence or image across cameras and scenes. As intelligent video systems and intelligent security applications reach more and more fields, pedestrian re-identification attracts more and more researchers. Because camera parameters and scenes differ, problems such as illumination changes, occlusion, and pose variation arise, so images of the same person captured by different cameras can differ greatly, which makes it harder for the network to extract features.

Most current pedestrian re-identification methods fuse global and local features, generally introducing a feature pyramid to strengthen the fusion of high-level semantic features and low-level high-resolution features; however, they do not consider the different contributions of the individual features during fusion.

Summary of the Invention

In view of the above deficiencies of the prior art, the technical problem solved by the present invention is to provide a pedestrian re-identification method based on weighted feature fusion: features are extracted by a ResNeSt-50 network and fused in a weighted feature pyramid network, with the high-level semantic features used as global features and the low-level high-resolution features used as local features; the Softmax loss function, the triplet loss function, and the center loss function are combined to optimize and train the global features and the local features respectively, improving the precision and accuracy of pedestrian recognition.

In order to solve the above technical problems, the present invention is realized through the following technical solutions:

The pedestrian re-identification method based on weighted feature fusion of the present invention comprises the following steps:

Step S1: input a pedestrian image, and extract pedestrian features through the backbone network ResNeSt-50;

Step S2: input the different features obtained from the backbone network into the weighted feature fusion pyramid model to obtain multi-scale features containing high-level semantic information and low-level high-resolution information;

Step S3: use the high-level semantic features as global features in the global branch network, and the low-level high-resolution features as local features in the local branch network;

Step S4: combine the Softmax loss function, the triplet loss function, and the center loss function to optimize and train the global features and the local features respectively.

Preferably, the pedestrian image in step S1 is randomly subjected to random erasing and horizontal flipping operations, which increases the amount of pedestrian data; when the ResNeSt-50 network is used to extract pedestrian features, feature maps downsampled 3, 4, and 5 times are obtained.

Optionally, in step 2, the feature maps downsampled 3, 4, and 5 times obtained in step 1 are input into the weighted feature pyramid network to obtain weighted-fused feature maps downsampled 3, 4, 5, and 6 times.

Optionally, in step 3, the feature maps downsampled 5 and 6 times in step 2 are concatenated along the channel dimension, and global features are obtained through global average pooling; the feature maps downsampled 3 and 4 times are concatenated along the channel dimension, the resulting feature map is evenly divided into 4 horizontal parts, generalized mean pooling is performed on each of the 4 parts, and the results are concatenated to obtain local features.

From the above, the pedestrian re-identification method based on weighted feature fusion of the present invention has at least the following beneficial effects:

1. Using the ResNeSt-50 network to extract features makes the network pay more attention to the key regions of pedestrian images and less attention to non-key regions;

2. Using a feature pyramid network with a weighting operation yields richer hierarchical feature information, improving the generalization ability and robustness of the network;

3. Using a loss that combines the Softmax loss function, the triplet loss function, and the center loss function optimizes the model more effectively.

The above description is only an overview of the technical solution of the present invention. It is provided so that the technical means of the present invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the present invention become more apparent; preferred embodiments are described in detail below in conjunction with the accompanying drawings.

Description of the Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings of the embodiments are briefly introduced below.

Fig. 1 is a diagram of the split structure of ResNeSt-50;

Fig. 2 is a structural diagram of the weighted feature pyramid network;

Fig. 3 is a structural diagram of the branch networks;

Fig. 4 is a flow chart of the pedestrian re-identification method based on weighted feature fusion of the present invention.

Detailed Description of the Embodiments

The specific embodiments of the present invention are described in detail below with reference to the accompanying drawings. As part of this specification, the principles of the present invention are illustrated by way of examples; other aspects, features, and advantages of the present invention will become apparent from this detailed description. In the referenced drawings, the same or similar components are denoted by the same reference numerals across different figures.

The pedestrian re-identification method based on weighted feature fusion of the present invention is described in detail below with reference to Figs. 1-4. The method comprises the following steps:

Step 1: input a pedestrian image, and extract pedestrian features through the backbone network ResNeSt-50;

Step 2: input the different features obtained from the backbone network into the weighted feature fusion pyramid model to obtain multi-scale features containing high-level semantic information and low-level high-resolution information;

Step 3: use the high-level semantic features as global features in the global branch network, and the low-level high-resolution features as local features in the local branch network; the branch networks are shown in Fig. 3;

Step 4: combine the Softmax loss function, the triplet loss function, and the center loss function to optimize and train the global features and the local features respectively.
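The joint objective of step 4 can be sketched numerically. The following is a minimal NumPy sketch, not the patent's implementation: the triplet margin (0.3) and the loss weights beta and gamma are illustrative assumptions, and the center loss is applied to the anchor feature here for simplicity.

```python
import numpy as np

def softmax_ce(logits, label):
    # Softmax cross-entropy (the "Softmax loss" of step 4).
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Margin-based triplet loss on Euclidean distances.
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)

def center_loss(feature, class_center):
    # Pulls the feature toward its class center.
    return 0.5 * np.sum((feature - class_center) ** 2)

def joint_loss(logits, label, anchor, positive, negative, class_center,
               beta=1.0, gamma=5e-4):
    # Weighted sum of the three terms; beta and gamma are illustrative.
    return (softmax_ce(logits, label)
            + beta * triplet_loss(anchor, positive, negative)
            + gamma * center_loss(anchor, class_center))
```

In practice each term would be averaged over a mini-batch and applied to both the global and the local features, as the steps above describe.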

Specifically, step 1 comprises the following sub-steps:

Step 1.1: randomly apply random erasing and horizontal flipping to the pedestrian image, then resize the image to 384×128;

Step 1.2: input the image into the ResNeSt-50 network to extract pedestrian features; this finally yields features downsampled 3, 4, and 5 times, denoted C3, C4, and C5 respectively. The split structure of the ResNeSt-50 network is shown in Fig. 1; ResNeSt-50 makes the network pay more attention to the key regions of pedestrian images and less attention to non-key regions.
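The augmentations of step 1.1 can be illustrated compactly. A minimal NumPy sketch, not the patent's code: the rectangle size (`side_frac`) and the fill value 0 are assumptions, as is the 50% default probability.

```python
import numpy as np

rng = np.random.default_rng(0)

def horizontal_flip(img, p=0.5):
    # img: H x W x C array; flip left-right with probability p.
    return img[:, ::-1, :] if rng.random() < p else img

def random_erase(img, p=0.5, side_frac=0.5):
    # With probability p, zero out a random rectangle whose sides are
    # side_frac of the image height and width (fill value 0 assumed).
    if rng.random() >= p:
        return img
    h, w = img.shape[:2]
    eh, ew = max(1, int(h * side_frac)), max(1, int(w * side_frac))
    y = int(rng.integers(0, h - eh + 1))
    x = int(rng.integers(0, w - ew + 1))
    out = img.copy()
    out[y:y + eh, x:x + ew, :] = 0
    return out
```

After these operations the image would be resized to 384×128 before entering the backbone.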

Specifically, step 2 comprises the following sub-steps:

Step 2.1: input C3, C4, and C5 obtained in step 1.2 into the weighted feature pyramid network, which is shown in Fig. 2;

Step 2.2: after a convolution operation, C3, C4, and C5 yield the feature maps P3_in, P4_in, and P5_in with a uniform channel number of 512; a max-pooling operation is then applied to P5_in to obtain the feature map P6_in, downsampled 6 times. A convolution operation on P6_in produces the feature map P6_td; P6_td is upsampled and added to P5_in to obtain P5_td; P5_td is upsampled and added to P4_in to obtain P4_td; and so on for each level, finally producing the feature maps P6_out, P5_out, P4_out, and P3_out. Taking P5_td and P5_out as an example, the operations are as follows:

P5_td = Conv( (ω1 · P5_in + ω2 · Resize(P6_td)) / (ω1 + ω2 + ε) )

P5_out = Conv( (ω1′ · P5_in + ω2′ · P5_td + ω3′ · Resize(P4_out)) / (ω1′ + ω2′ + ω3′ + ε) )

where the ωi are learnable parameters with 0 ≤ ωi ≤ 1, and ε is a very small value whose main purpose is to avoid a denominator of 0.
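The normalized weighted sum above can be sketched as a single fusion node. A hypothetical NumPy sketch: the convolution and resize steps are omitted, and clipping to non-negative weights stands in for the 0 ≤ ωi constraint.

```python
import numpy as np

def weighted_fuse(feature_maps, weights, eps=1e-4):
    # Normalized weighted sum of same-shape feature maps; weights are
    # clipped to be non-negative, mirroring the 0 <= w_i constraint,
    # and eps keeps the denominator away from zero.
    w = np.clip(np.asarray(weights, dtype=float), 0.0, None)
    fused = sum(wi * f for wi, f in zip(w, feature_maps))
    return fused / (w.sum() + eps)
```

In the full network the weights would be learned jointly with the convolutions, one set per fusion node.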

Specifically, step 3 comprises the following sub-steps:

Step 3.1: upsample P6_out to a feature map of the same size as P5_out, concatenate it with P5_out along the channel dimension, and then feed the result into global average pooling (GAP) to obtain the global features of the pedestrian image.

Step 3.2: downsample P3_out to a feature map of the same size as P4_out and concatenate it with P4_out along the channel dimension to obtain a local feature map; divide the local feature map evenly into 4 parts along the horizontal direction, perform generalized mean pooling (GeM) on each of the 4 parts, and concatenate the results; finally, apply a 1×1 convolution to reduce the dimensionality and obtain the local feature map. The local feature map contains the high-resolution features of the pedestrian image. In the local branch, the formula for generalized mean pooling is:

f(g) = ( (1/|Xg|) · Σ_{x∈Xg} x^pk )^(1/pk)

where Xg is the set of activations in the pooling region.

where pk is a hyperparameter: when pk = 1, f(g) is average pooling, and as pk → ∞, f(g) approaches max pooling. During training, generalized mean pooling learns the hyperparameter pk automatically, which allows the model to better capture feature differences.
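As a concrete illustration, the pooling can be sketched in NumPy. This is a hypothetical sketch: the patent learns pk during training, whereas here it is a fixed argument (default 3.0 chosen for illustration).

```python
import numpy as np

def gem_pool(x, p=3.0, eps=1e-6):
    # x: C x H x W feature map; generalized-mean pooling over the
    # spatial dimensions. p = 1 recovers average pooling and large p
    # approaches max pooling, matching the limits stated above.
    x = np.clip(x, eps, None)  # keep the fractional power well-defined
    return (x ** p).mean(axis=(1, 2)) ** (1.0 / p)
```

Applying this to each of the 4 horizontal parts and concatenating the outputs yields the local feature vector of step 3.2.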

The above is only a preferred embodiment of the present invention and certainly cannot be used to limit the scope of the rights of the present invention. It should be pointed out that, for those of ordinary skill in the art, several improvements and changes can be made without departing from the principles of the present invention, and these improvements and changes are also regarded as falling within the protection scope of the present invention.

Claims (4)

1. A pedestrian re-identification method based on weighted feature fusion is characterized by comprising the following steps:
step S1: inputting a pedestrian image, and extracting the characteristics of the pedestrian through a backbone network ResNeSt-50;
step S2: inputting different features acquired from a backbone network into a weighted feature fusion pyramid model to obtain multi-scale features containing high-level semantic information and low-level high-resolution information;
step S3: taking the high-level semantic features as global features in the global branch network; taking the low-level high-resolution features as local features in the local branch network;
step S4: performing optimization training on the global features and the local features respectively by combining the Softmax loss function, the triplet loss function, and the center loss function.
2. The pedestrian re-identification method based on weighted feature fusion according to claim 1, wherein the pedestrian image in step S1 is randomly subjected to random erasing and horizontal flipping operations, increasing the amount of pedestrian data; when pedestrian features are extracted using the ResNeSt-50 network, feature maps downsampled 3, 4, and 5 times are obtained.
3. The pedestrian re-identification method based on weighted feature fusion as claimed in claim 2, wherein in step 2 the feature maps downsampled 3, 4, and 5 times obtained in step 1 are input into the weighted feature pyramid network to obtain weighted-fused feature maps downsampled 3, 4, 5, and 6 times.
4. The pedestrian re-identification method based on weighted feature fusion as claimed in claim 3, wherein in step 3 the feature maps downsampled 5 and 6 times in step 2 are concatenated along the channel dimension, and global features are obtained through global average pooling; the feature maps downsampled 3 and 4 times are concatenated along the channel dimension, the resulting feature map is divided horizontally and evenly into 4 parts, generalized mean pooling is then performed on each of the 4 parts, and the results are concatenated to obtain local features.
CN202210321956.6A 2022-03-29 2022-03-29 A Pedestrian Re-identification Method Based on Weighted Feature Fusion Pending CN114639124A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210321956.6A CN114639124A (en) 2022-03-29 2022-03-29 A Pedestrian Re-identification Method Based on Weighted Feature Fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210321956.6A CN114639124A (en) 2022-03-29 2022-03-29 A Pedestrian Re-identification Method Based on Weighted Feature Fusion

Publications (1)

Publication Number Publication Date
CN114639124A (en) 2022-06-17

Family

ID=81951755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210321956.6A Pending CN114639124A (en) 2022-03-29 2022-03-29 A Pedestrian Re-identification Method Based on Weighted Feature Fusion

Country Status (1)

Country Link
CN (1) CN114639124A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115223193A (en) * 2022-06-19 2022-10-21 浙江爱达科技有限公司 Capsule endoscope image focus identification method based on focus feature importance

Citations (4)

Publication number Priority date Publication date Assignee Title
CN113408492A (en) * 2021-07-23 2021-09-17 四川大学 Pedestrian re-identification method based on global-local feature dynamic alignment
WO2021203801A1 (en) * 2020-04-08 2021-10-14 苏州浪潮智能科技有限公司 Person re-identification method and apparatus, electronic device, and storage medium
CN113516012A (en) * 2021-04-09 2021-10-19 湖北工业大学 Pedestrian re-identification method and system based on multi-level feature fusion
CN113591697A (en) * 2021-07-30 2021-11-02 上海电科智能系统股份有限公司 Video pedestrian re-identification method based on triple pyramid model and migration fusion

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
WO2021203801A1 (en) * 2020-04-08 2021-10-14 苏州浪潮智能科技有限公司 Person re-identification method and apparatus, electronic device, and storage medium
CN113516012A (en) * 2021-04-09 2021-10-19 湖北工业大学 Pedestrian re-identification method and system based on multi-level feature fusion
CN113408492A (en) * 2021-07-23 2021-09-17 四川大学 Pedestrian re-identification method based on global-local feature dynamic alignment
CN113591697A (en) * 2021-07-30 2021-11-02 上海电科智能系统股份有限公司 Video pedestrian re-identification method based on triple pyramid model and migration fusion

Non-Patent Citations (2)

Title
孙劲光 等: "基于多尺度加权特征融合的行人重识别方法研究" (Research on a pedestrian re-identification method based on multi-scale weighted feature fusion), 《信号处理》 (Journal of Signal Processing), 10 June 2022 (2022-06-10), pages 1-12 *
满船清梦压星河HK: "【论文笔记】EfficientDet(BiFPN)(2020)", pages 1 - 6, Retrieved from the Internet <URL:https://blog.csdn.net/qq_38253797/article/details/118439965> *

Cited By (1)

Publication number Priority date Publication date Assignee Title
CN115223193A (en) * 2022-06-19 2022-10-21 浙江爱达科技有限公司 Capsule endoscope image focus identification method based on focus feature importance

Similar Documents

Publication Publication Date Title
CN110188817B (en) A real-time high-performance semantic segmentation method for street view images based on deep learning
CN111539887B (en) A Neural Network Image Dehazing Method Based on Hybrid Convolutional Attention Mechanism and Hierarchical Learning
CN109389556B (en) Multi-scale cavity convolutional neural network super-resolution reconstruction method and device
WO2023056889A1 (en) Model training and scene recognition method and apparatus, device, and medium
CN110059538B (en) Water body identification method based on deep dense neural network
WO2023212997A1 (en) Knowledge distillation based neural network training method, device, and storage medium
CN114936605A (en) A neural network training method, equipment and storage medium based on knowledge distillation
CN116386104A (en) Self-supervision facial expression recognition method combining contrast learning and mask image modeling
CN116664397B (en) TransSR-Net structured image super-resolution reconstruction method
CN116524307A (en) Self-supervision pre-training method based on diffusion model
CN113298084B (en) Feature map extraction method and system for semantic segmentation
CN115631513B (en) Transformer-based multi-scale pedestrian re-identification method
CN110706239A (en) Scene segmentation method fusing full convolution neural network and improved ASPP module
CN111414988A (en) Remote sensing image super-resolution method based on multi-scale feature self-adaptive fusion network
CN110555461A (en) scene classification method and system based on multi-structure convolutional neural network feature fusion
CN117315752A (en) Training method, device, equipment and medium for face emotion recognition network model
CN114639124A (en) A Pedestrian Re-identification Method Based on Weighted Feature Fusion
CN114596488B (en) Lightweight remote sensing target detection method based on dense feature fusion network
CN116363149A (en) An Improved Medical Image Segmentation Method Based on U-Net
CN115797684A (en) Infrared small target detection method and system based on context information
CN113393385B (en) Multi-scale fusion-based unsupervised rain removing method, system, device and medium
CN113222016B (en) Change detection method and device based on cross enhancement of high-level and low-level features
CN108961270A (en) A kind of Bridge Crack Image Segmentation Model based on semantic segmentation
CN118262256A (en) Multi-scale feature fusion small target detection algorithm for unmanned aerial vehicle aerial image
CN118279587A (en) Multi-scale feature fusion remote sensing image semantic segmentation method based on RemoteCLIP image encoder

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination