CN113269226B - Picture selection labeling method based on local and global information - Google Patents
- Publication number: CN113269226B (application CN202110399472.9A)
- Authority: CN (China)
- Prior art keywords: model, picture, objects, image, information
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/23 — Pattern recognition: clustering techniques
- G06N3/045 — Neural networks: combinations of networks
Abstract
Description
Technical Field

The present invention relates to a picture selection and labeling method based on local and global information. The method uses the local and global information of the feature representation space to efficiently select the objects in a picture database that should be labeled, so that a good picture classification model can be trained at a lower labeling cost. The invention belongs to the technical field of computer artificial intelligence data analysis.
Background Art

With the continuous development of the Internet, a large amount of picture data needs to be processed, such as face pictures in face recognition, road pictures in autonomous driving, and product pictures on e-commerce platforms. Picture data have a relatively complex structure, so deep models are often used for picture classification tasks. Training a deep model, however, requires a large number of labeled pictures, and labeling them typically costs considerable manpower and material resources. To reduce the labeling cost and make better use of labeled pictures, one solution is to let the model automatically select the important pictures that need to be labeled and collect the labels of those pictures to update the model; this is the basic idea of selective labeling. Current selective labeling methods mainly consider the uncertainty and the representativeness of the data when measuring its importance. The lower the model's confidence in its prediction for a data point, the higher that point's uncertainty; the norm of the data point's gradient can also be used to estimate its uncertainty. Because uncertainty-based methods consider only the uncertainty of individual data points, the model can easily pick a batch of highly uncertain but redundant points. Considering the representativeness of the data mitigates this problem to some extent. Representativeness-based methods generally cluster the data features into several clusters and select the center of each cluster as its representative, so that a small amount of data can describe the distribution of the whole dataset. However, because such methods are not guided by information from the model, the selected data are not necessarily helpful for updating the model.
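The uncertainty measures mentioned above — low prediction confidence, or alternatively the entropy of the predictive distribution — can be sketched in plain NumPy. This is an illustrative sketch, not code from the patent, and the function names are ours.

```python
import numpy as np

def least_confidence(probs):
    """Uncertainty as 1 minus the top softmax probability (higher = less confident)."""
    probs = np.asarray(probs, dtype=float)
    return 1.0 - probs.max(axis=-1)

def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy of the softmax output; maximal for a uniform prediction."""
    probs = np.asarray(probs, dtype=float)
    return -(probs * np.log(probs + eps)).sum(axis=-1)
```

A batch chosen by such per-point scores alone can still be highly redundant, which is exactly the weakness of uncertainty-based selection described above.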
Summary of the Invention

Purpose of the invention: in view of the problems and deficiencies of the prior art, the present invention provides a picture selection and labeling method based on local and global information. The method uses the local information of the picture feature representation space, combined with the model's predictions, to measure the information content of each picture, which avoids selecting similar or redundant pictures to a certain extent. It also uses the global information of the feature representation space: the picture data are divided into multiple clusters, and the labeling budget is allocated dynamically according to the model's performance on the different clusters, which further improves the utilization efficiency of picture labels and reduces the labeling cost. Given the same number of labeled pictures, a model trained with this method performs better than one trained with common selective labeling methods.
Technical Solution

A picture selection and labeling method based on local and global information comprises the following steps.

First, the user builds a picture object library. Some picture objects are then selected at random from the library and their labels are obtained to form the initial training set. The user sets the structure of the deep model, the number of picture objects selected per round, and the total number of iteration rounds.

Next, a deep learning model is trained on the training set, and the model converts the picture objects in the library into feature representations, i.e., it extracts the features of the pictures. The output of the penultimate layer of the deep model is commonly used as the feature representation of the corresponding picture object, and the space formed by these representations is called the feature representation space.
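As a sketch of how a penultimate-layer activation can play the role of the feature representation alongside the softmax output, here is a toy two-layer network in NumPy; the layer sizes and weight names are illustrative, not from the patent.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward_with_features(x, hidden_w, out_w):
    """Toy two-layer network: returns the penultimate-layer activation
    (standing in for the feature representation r_theta(x)) together with
    the softmax output (standing in for f(x; Theta))."""
    h = np.maximum(x @ hidden_w, 0.0)  # ReLU hidden layer = feature representation
    logits = h @ out_w                 # final fully connected layer
    return h, softmax(logits)
```

In a real deep model the same idea is usually implemented by reading out the layer just before the final fully connected classifier.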
Then, in the feature representation space, the information content of each object is estimated by the local-information computation method, and the labeling budget is distributed by the global-information budget allocation method. Based on this budget, a batch of picture objects with high information content is selected and their labels are collected. The sets of labeled and unlabeled picture objects are updated; meanwhile, the deep model is retrained on the labeled set and the feature representations are re-extracted with the new model. These steps are iterated for the specified number of rounds, and the model from the last round is the final deep model.

Finally, in the prediction stage, the user inputs a picture object to be classified into the trained deep model, and the model returns the prediction result.
Beneficial Effects

Compared with the prior art, the present invention combines local and global information in the feature representation space: considering the local information of each picture object avoids selecting redundant pictures, while the global information of the feature representation space allows the labeling budget to be allocated where it is needed. This improves the utilization efficiency of picture labels and reduces the labeling cost.
Brief Description of the Drawings

Figure 1 is the flow chart of the present invention;
Figure 2 is the flow chart of the local-information computation method of the present invention;
Figure 3 is the flow chart of the global-information budget allocation method of the present invention.
Detailed Description

The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments are intended only to illustrate the invention and not to limit its scope; after reading this disclosure, modifications of the invention in various equivalent forms by those skilled in the art all fall within the scope defined by the claims appended to this application.
As shown in Figure 1, the picture selection and labeling method based on local and global information comprises the following steps:

Step 100: build a picture object library as the dataset, randomly select a small number of objects from the library, and obtain their labels to form the initial training set. Let C be the number of classes in the picture object library, and denote the set of labeled picture objects and the set of unlabeled picture objects;

Step 101: the user selects the deep model to use, denoted f(·;Θ), where Θ denotes the model parameters, comprising the parameters of the fully connected layer and the other parameters θ of the model; the user also sets the number of samples B selected per round and the total number of rounds T;

Step 102: train the deep model with the labeled picture objects; set the current round t = 1;

Step 103: input the unlabeled picture objects into the deep model and extract, for each object, its feature representation rθ(x) and the softmax-layer output f(x;Θ);
Step 104: estimate the amount of information each object provides to the model with the local-information computation method, as shown in Figure 2. The specific steps are:

Step 1041: the user selects the range ∈ of the local neighborhood region;

Step 1042: for an unlabeled picture object x, the softmax-layer output is f(x;Θ) = (p1, ..., pC) and the label predicted by the model is the class with the highest probability; to increase robustness, the probabilities are smoothed into g(x;Θ) = (g(x;Θ)1, ..., g(x;Θ)C);

Step 1043: for each unlabeled picture object, compute its information content based on the smoothed probabilities g(x;Θ);

Step 1044: denote the neighborhood of picture object x as the set of objects whose feature representations lie within the range ∈ of rθ(x), where rθ(x) is the feature representation of x; the information content of x is then computed over this neighborhood;

Step 1045: compute the information content of every unlabeled picture object and output it.
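The local-information computation of steps 1041–1045 can be sketched as follows. The patent's exact smoothing and scoring formulas are not reproduced in this text, so the sketch assumes that g(x;Θ) averages the softmax outputs over the ∈-neighborhood and that the information score is the entropy of the smoothed distribution; both choices are assumptions.

```python
import numpy as np

def local_information(features, probs, eps_radius):
    """Score each unlabeled object by the entropy of its softmax output
    smoothed over the epsilon-neighborhood in feature space.
    ASSUMPTION: neighborhood averaging + entropy stand in for the
    patent's smoothing and scoring formulas, which are not in the text."""
    features = np.asarray(features, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # Pairwise distances between feature representations r_theta(x).
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    info = np.empty(len(features))
    for i in range(len(features)):
        neighbors = dists[i] <= eps_radius        # neighborhood of x (includes x itself)
        g = probs[neighbors].mean(axis=0)         # smoothed probabilities g(x; Theta)
        info[i] = -(g * np.log(g + 1e-12)).sum()  # entropy as the information score
    return info
```

Under these assumptions, an object whose neighbors disagree about the predicted class gets a high score, while an isolated and confidently predicted object gets a low one, which discourages picking near-duplicate pictures.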
Step 105: with the global-information budget allocation method, cluster the unlabeled data into C clusters in the feature representation space and allocate the budget (B1, ..., BC) among the clusters, where Bj is the labeling budget allocated to the j-th cluster. As shown in Figure 3, the specific steps are:

Step 1051: the user selects the temperature parameter τ of the Gibbs distribution;

Step 1052: cluster the feature representations of the unlabeled picture objects into C clusters with the k-means++ method, the j-th cluster being the set of picture objects assigned to it;

Step 1053: estimate the model's performance on each cluster, denoting its performance on the j-th cluster by γj;

Step 1054: construct from the γj a Gibbs distribution over the budget, α = (α1, ..., αC), where ∑j αj = 1 and τ is the temperature parameter that controls the smoothness of the Gibbs distribution;

Step 1055: draw B samples from the Gibbs distribution α to obtain the budget (B1, ..., BC) allocated to each cluster, where ∑j Bj = B and B is the total labeling budget, and output it;
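Steps 1053–1055 can be sketched as follows. The source does not reproduce the Gibbs formula, so this sketch assumes αj ∝ exp(−γj/τ), i.e. that lower per-cluster performance attracts more budget, and draws (B1, ..., BC) as a multinomial sample of size B from α; both are assumptions.

```python
import numpy as np

def allocate_budget(gamma, tau, total_budget, seed=None):
    """Split the labeling budget B across C clusters with a Gibbs distribution.
    ASSUMPTIONS (the formula is not reproduced in the source): lower
    per-cluster performance gamma_j attracts more labels, so
    alpha_j ∝ exp(-gamma_j / tau), and the per-cluster budgets are a
    multinomial sample of size B from alpha."""
    rng = np.random.default_rng(seed)
    gamma = np.asarray(gamma, dtype=float)
    logits = -gamma / tau
    logits -= logits.max()                         # numerical stability
    alpha = np.exp(logits)
    alpha /= alpha.sum()                           # sum_j alpha_j = 1
    counts = rng.multinomial(total_budget, alpha)  # (B_1, ..., B_C), sum_j B_j = B
    return alpha, counts
```

The temperature behaves as in any Gibbs distribution: a small τ concentrates almost the whole budget on the worst-performing cluster, while a large τ spreads it nearly uniformly.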
Step 106: in each cluster, according to the corresponding budget Bj, j ∈ [C], select the Bj picture objects with the highest information content, obtain their labels, add these picture objects to the set of labeled objects, update the labeled and unlabeled sets, and retrain the deep model;

Step 107: if t < T, set t = t + 1 and go to step 103;

Step 108: take the model obtained in round T as the final model. For an object to be classified, output the label predicted by the model.
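The overall loop of steps 102–108 can be summarized in a skeleton; every callable here is a hypothetical stand-in for the corresponding component described above, not an API from the patent.

```python
def select_and_label(train, unlabeled, rounds_T, budget_B,
                     fit, featurize, local_info, allocate, query_labels):
    """Skeleton of the iterative procedure in steps 102-108.  Stand-ins:
    fit trains the model on the labeled set, featurize returns (features,
    softmax outputs) for the unlabeled objects, local_info scores them,
    allocate picks indices under the per-cluster budgets, and
    query_labels fetches labels for the picked objects."""
    model = fit(train)                              # step 102
    for _ in range(rounds_T):                       # steps 103-107
        feats, probs = featurize(model, unlabeled)  # step 103
        scores = local_info(feats, probs)           # step 104
        picks = set(allocate(scores, feats, budget_B))  # steps 105-106
        train = train + query_labels(unlabeled, sorted(picks))
        unlabeled = [x for i, x in enumerate(unlabeled) if i not in picks]
        model = fit(train)                          # retrain on the enlarged labeled set
    return model                                    # step 108
```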
Claims (1)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202110399472.9A (CN113269226B) | 2021-04-14 | 2021-04-14 | Picture selection labeling method based on local and global information |
Publications (2)

| Publication Number | Publication Date |
| --- | --- |
| CN113269226A | 2021-08-17 |
| CN113269226B | 2022-09-23 |
Family

ID=77229077

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202110399472.9A | Active CN113269226B (en) | 2021-04-14 | 2021-04-14 |

Country Status (1)

| Country | Link |
| --- | --- |
| CN | CN113269226B (en) |
Legal Events

| Date | Code | Title |
| --- | --- | --- |
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |
| | GR01 | Patent grant |