CN117765534A - Automatic traffic image labeling method and device based on difficult sample mining - Google Patents
- Publication number: CN117765534A
- Application number: CN202311743990.3A
- Authority: CN (China)
- Prior art keywords: difficult, sample, image, data set, target detection
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Description
Technical field

The present invention relates to the technical field of traffic image annotation, and in particular to an automatic traffic image annotation method and device based on difficult sample mining.

Background

As artificial intelligence plays an increasingly important role in technological progress and social development, attention has turned to autonomous driving. Whether for L2-level assisted driving or L5-level fully automated driving, object detection algorithms are required. An excellent object detection algorithm needs not only a large volume of traffic sample data behind it, but also good robustness in difficult detection scenarios. By learning difficulty, sample data can be divided into simple samples and difficult samples. Appropriately increasing the proportion of difficult samples in a newly constructed data set strengthens the annotation model's ability to label difficult samples and can improve model accuracy to a large extent, which is of great significance.

The data sets used by current deep-learning networks come mainly from public data sets or self-collected data sets. Public data sets can rarely satisfy the training-sample requirements of real scenarios, while manually labeling a self-collected data set consumes a great deal of manpower and material resources, especially when labeling difficult samples.
Summary of the invention

The purpose of the present invention is to provide, in view of the shortcomings of the prior art, an automatic traffic image annotation method and device based on difficult sample mining.

This purpose is achieved through the following technical solution: an automatic traffic image annotation method based on difficult sample mining, comprising the following steps:

(1) Obtain the SODA10M labeled data set and randomly divide it into two equal parts. One part is divided into an initial-annotation training set, an initial-annotation validation set, and an initial-annotation test set; a Faster R-CNN neural network is trained with the initial-annotation training set as input to obtain the initial annotation model;

(2) Use the initial annotation model to partition the other part of the SODA10M labeled data set into a simple-sample set and a difficult-sample set; construct a difficult-sample data set from the two sets and use it to train a Faster R-CNN neural network, obtaining the difficult-sample annotation model;

(3) Collect video-stream data from vehicle terminals and extract frames to obtain an image data set;

(4) Use the initial annotation model and the difficult-sample annotation model to inspect each image in the image data set: when a difficult positive-sample target in the image is not detected, record the image as a difficult image sample and place it in the difficult-image-sample set; otherwise record it as a simple image sample and place it in the simple-image-sample set;

(5) When the number of image samples in the difficult-image-sample set reaches N, update the difficult-sample data set from the difficult-image-sample set and the simple-image-sample set, and use the updated difficult-sample data set as input to update the difficult-sample annotation model;

(6) Repeat steps (3)-(5) until the accuracy of the difficult-sample annotation model exceeds 95%, and take the updated difficult-sample annotation model as the final sample annotation model;

(7) Input the traffic image to be annotated into the final sample annotation model, and output its set of target detection boxes as the annotation result.
Further, step (1) specifically comprises the following sub-steps:

(1.1) Randomly select from the SODA10M data set a labeled subset containing 2N annotated traffic images;

(1.2) Randomly divide the SODA10M labeled data set into two equal parts: the first SODA10M labeled data set and the second SODA10M labeled data set. From the first, select traffic images covering 11 scene types, with the same number of images per scene, to construct the initial annotation data set; then divide it into an initial-annotation training set, validation set, and test set in a 7:2:1 ratio;

(1.3) Construct the Faster R-CNN neural network: use the convolutional network VGG16 as the base network; VGG16 comprises 13 shared base convolutional layers, 5 max-pooling layers, 3 fully connected layers, and 1 softmax layer; embed an RPN layer and an RoI pooling layer into the VGG16 network to obtain the Faster R-CNN model. Each traffic image is represented as a tensor of height, width, and depth; the shared base convolutional layers extract a convolutional feature map; a region-proposal network then extracts, on the feature map, multiple target detection boxes containing objects of interest; region-of-interest pooling then combines the feature map with the detection boxes to obtain a per-box feature map that is passed to the fully connected layers; finally, the fully connected layers compute each box's category and its precise position in the original traffic image;

(1.4) Train the Faster R-CNN neural network with the initial-annotation training set as input to obtain the initial annotation model.

Further, the 11 scene types include sunny, cloudy, rainy, city street, highway, country road, residential area, daytime, night, dawn, and dusk scenes.
Further, step (2) specifically comprises the following sub-steps:

(2.1) Use the initial annotation model to run inference on every picture mi in the second SODA10M labeled data set, obtaining an inference result ni for each picture mi;

(2.2) For any picture mi in the second SODA10M labeled data set, compare its inference result ni with the ground-truth annotation Ni. If ni leaves a ground-truth target region unmarked, or a marked target region has an IoU with the corresponding ground truth Ni of less than 0.5, record mi as a difficult sample and place it in the difficult-sample set; otherwise record it as a simple sample and place it in the simple-sample set;

(2.3) Repeat step (2.2) for every picture in the second SODA10M labeled data set;

(2.4) Randomly select some simple samples from the simple-sample set and some difficult samples from the difficult-sample set to form the difficult-sample data set, with a simple-to-difficult ratio of 2:1; then divide the difficult-sample data set into a training set, a validation set, and a test set in a 7:2:1 ratio;

(2.5) Train the difficult-sample annotation model: train the Faster R-CNN neural network on the difficult-sample data set to obtain the difficult-sample annotation model.

Further, step (3) is specifically: connect to vehicle terminal devices using the MQTT communication protocol, collect their video-stream data, upload it to an SRS video server, and extract frames from the stream at one frame every 0.1 s, converting the stream into multiple images to obtain the image data set.
Further, step (4) specifically comprises the following sub-steps:

(4.1) For each image Mj in the image data set, run inference with the initial annotation model to obtain the initial target detection box set Dj of image Mj:

Dj = {dj1, dj2, …, dj,a_j}, where a_j is the total number of detection boxes in Dj, i = 1, 2, …, a_j; dji = [(xi1, yi1), (xi2, yi2)] denotes the position of the i-th detection box in Dj, with (xi1, yi1) the coordinates of its upper-left corner and (xi2, yi2) the coordinates of its lower-right corner;

(4.2) Run inference with the difficult-sample annotation model to obtain the difficult target detection box set Rj of image Mj:

Rj = {rj1, rj2, …, rj,b_j}, where b_j is the total number of detection boxes in Rj, h = 1, 2, …, b_j; rjh = [(xh1, yh1), (xh2, yh2)] denotes the position of the h-th detection box in Rj, with (xh1, yh1) the coordinates of its upper-left corner and (xh2, yh2) the coordinates of its lower-right corner;

(4.3) If there exists an element r in the difficult target detection box set Rj of image Mj whose IoU with every element of the initial target detection box set Dj is simultaneously below 0.5, record image Mj as a difficult image sample and place it in the difficult-image-sample set; otherwise record it as a simple image sample and place it in the simple-image-sample set.
Further, step (5) specifically comprises the following sub-steps:

(5.1) When the number of image samples in the difficult-image-sample set reaches N, randomly select some simple image samples from the simple-image-sample set and some difficult image samples from the difficult-image-sample set to form a second difficult-image-sample set, with a simple-to-difficult ratio of 2:1;

(5.2) Merge the second difficult-image-sample set with the difficult-sample data set to obtain the updated difficult-sample data set, and divide it into a training set, a validation set, and a test set in a 7:2:1 ratio;

(5.3) Retrain the Faster R-CNN neural network on the new difficult-sample data set to obtain the updated difficult-sample annotation model.

The present invention also provides an automatic traffic image annotation device based on difficult sample mining, comprising one or more processors configured to implement the above automatic traffic image annotation method based on difficult sample mining.

The present invention also provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, it implements the above automatic traffic image annotation method based on difficult sample mining.

The beneficial effects of the present invention are as follows: by constructing an initial annotation model and a difficult-sample annotation model, the invention can automatically determine whether a traffic image is a difficult image sample, and can control the ratio of difficult to simple image samples in the difficult-sample data set. Appropriately increasing the proportion of difficult image samples in the difficult-sample data set strengthens the difficult-sample annotation model's ability to annotate difficult image samples, gradually improving the model's accuracy until the specified accuracy requirement is met and the final sample annotation model is obtained.
Description of the drawings

Fig. 1 is an overall flow chart of the automatic traffic image annotation method based on difficult sample mining;

Fig. 2 is a schematic flow chart of step (4) in Embodiment 1;

Fig. 3 is an example of annotation results produced by the initial annotation model;

Fig. 4 is an example of annotation results produced by the difficult-sample annotation model;

Fig. 5 is a schematic structural diagram of the automatic traffic image annotation device based on difficult sample mining.

Detailed description of embodiments

To make the purpose, technical solution, and advantages of the present invention clearer, the invention is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely explain the invention and are not all of its embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

Embodiment 1

As shown in Fig. 1, the present invention provides an automatic traffic image annotation method based on difficult sample mining, comprising the following steps:
(1) Obtain the SODA10M labeled data set and randomly divide it into two equal parts. One part is divided into an initial-annotation training set, an initial-annotation validation set, and an initial-annotation test set; a Faster R-CNN neural network is trained with the initial-annotation training set as input to obtain the initial annotation model.

Step (1) specifically comprises the following sub-steps:

(1.1) Randomly select from the SODA10M data set a labeled subset containing 2N annotated traffic images. The SODA10M data set is a general-purpose traffic image data set.

(1.2) Randomly divide the SODA10M labeled data set into two equal parts: the first SODA10M labeled data set and the second SODA10M labeled data set. From the first, select traffic images covering 11 scene types, with the same number of images per scene, to construct the initial annotation data set; then divide it into an initial-annotation training set, validation set, and test set in a 7:2:1 ratio. The 11 scene types include sunny, cloudy, rainy, city street, highway, country road, residential area, daytime, night, dawn, and dusk scenes.
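The random halving of step (1.1) and the 7:2:1 split of step (1.2) amount to a shuffle followed by proportional slicing. The sketch below is only an illustration of that bookkeeping; the helper names and the fixed seed are assumptions, not part of the patent.

```python
import random

def halve_dataset(images, seed=0):
    """Random equal split of the 2N labeled images into two halves (step 1.2)."""
    rng = random.Random(seed)
    images = list(images)
    rng.shuffle(images)
    half = len(images) // 2
    return images[:half], images[half:]

def split_dataset(images, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle `images` and slice them into train/val/test subsets
    according to `ratios` (the 7:2:1 proportion of step 1.2)."""
    rng = random.Random(seed)
    images = list(images)
    rng.shuffle(images)
    n_train = int(len(images) * ratios[0])
    n_val = int(len(images) * ratios[1])
    return (images[:n_train],
            images[n_train:n_train + n_val],
            images[n_train + n_val:])
```

With 2N = 100 images, `halve_dataset` yields two halves of 50, and `split_dataset` applied to the full set yields 70/20/10 images for training, validation, and test.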
(1.3) Construct the Faster R-CNN neural network: use the convolutional network VGG16 as the base network; VGG16 comprises 13 shared base convolutional layers, 5 max-pooling layers, 3 fully connected layers, and 1 softmax layer; embed an RPN layer and an RoI pooling layer into the VGG16 network to obtain the Faster R-CNN model. Each traffic image is represented as a tensor of height, width, and depth; the shared base convolutional layers extract a convolutional feature map; a region-proposal network then extracts, on the feature map, multiple target detection boxes containing objects of interest; region-of-interest pooling then combines the feature map with the detection boxes to obtain a per-box feature map that is passed to the fully connected layers; finally, the fully connected layers compute each box's category and its precise position in the original traffic image.
(1.4) Train the Faster R-CNN neural network with the initial-annotation training set as input to obtain the initial annotation model.

(2) Use the initial annotation model to partition the other part of the SODA10M labeled data set into a simple-sample set and a difficult-sample set; construct a difficult-sample data set from the two sets and use it to train a Faster R-CNN neural network, obtaining the difficult-sample annotation model.

Step (2) specifically comprises the following sub-steps:

(2.1) Use the initial annotation model to run inference on every picture mi in the second SODA10M labeled data set, obtaining an inference result ni for each picture mi.

(2.2) For any picture mi in the second SODA10M labeled data set, compare its inference result ni with the ground-truth annotation Ni. If ni leaves a ground-truth target region unmarked, or a marked target region has an IoU with the corresponding ground truth Ni of less than 0.5, record mi as a difficult sample and place it in the difficult-sample set; otherwise record it as a simple sample and place it in the simple-sample set.
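The comparison in step (2.2) reduces to an IoU computation between predicted and ground-truth boxes, using the corner format [(x1, y1), (x2, y2)] defined in step (4.1). A minimal sketch follows; the greedy best-match pairing between predictions and ground truths is an assumption the patent does not spell out.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as [(x1, y1), (x2, y2)]."""
    (ax1, ay1), (ax2, ay2) = box_a
    (bx1, by1), (bx2, by2) = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

def is_difficult_sample(pred_boxes, gt_boxes, thr=0.5):
    """Step (2.2): a picture is a difficult sample if some ground-truth
    region is left unmarked, or a marked region overlaps its best-matching
    ground truth with IoU below `thr`."""
    if len(pred_boxes) < len(gt_boxes):  # a target region was left unmarked
        return True
    return any(max((iou(p, g) for p in pred_boxes), default=0.0) < thr
               for g in gt_boxes)
```

For example, a prediction [(1,1),(3,3)] against ground truth [(0,0),(2,2)] has IoU 1/7 ≈ 0.14 < 0.5, so that picture would be routed to the difficult-sample set.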
(2.3) Repeat step (2.2) for every picture in the second SODA10M labeled data set.

(2.4) Randomly select some simple samples from the simple-sample set and some difficult samples from the difficult-sample set to form the difficult-sample data set, with a simple-to-difficult ratio of 2:1; then divide the difficult-sample data set into a training set, a validation set, and a test set in a 7:2:1 ratio.

(2.5) Train the difficult-sample annotation model: train the Faster R-CNN neural network on the difficult-sample data set to obtain the difficult-sample annotation model.

(3) Collect video-stream data from vehicle terminals and extract frames to obtain an image data set.

Step (3) is specifically: connect to vehicle terminal devices using the MQTT communication protocol, collect their video-stream data, upload it to an SRS video server, and extract frames from the stream at one frame every 0.1 s, converting the stream into multiple images to obtain the image data set.
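Step (3) samples one frame every 0.1 s from the stream. The MQTT/SRS plumbing is omitted below; this hypothetical `sample_frame_indices` helper only shows the frame-sampling arithmetic one would apply to a decoded stream (e.g. frames read via OpenCV's VideoCapture).

```python
def sample_frame_indices(n_frames, fps, interval_s=0.1):
    """Return the indices of frames kept when sampling one frame every
    `interval_s` seconds from a stream of `n_frames` frames at `fps`."""
    step = max(1, round(fps * interval_s))  # frames between two samples
    return list(range(0, n_frames, step))
```

At 30 fps a 0.1 s interval keeps every 3rd frame; at 10 fps it keeps every frame.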
(4) Use the initial annotation model and the difficult-sample annotation model to inspect each image in the image data set: when a difficult positive-sample target in the image is not detected, record the image as a difficult image sample and place it in the difficult-image-sample set; otherwise record it as a simple image sample and place it in the simple-image-sample set.

As shown in Fig. 2, step (4) specifically comprises the following sub-steps:

(4.1) For each image Mj in the image data set, run inference with the initial annotation model to obtain the initial target detection box set Dj of image Mj:

Dj = {dj1, dj2, …, dj,a_j}, where a_j is the total number of detection boxes in Dj, i = 1, 2, …, a_j; dji = [(xi1, yi1), (xi2, yi2)] denotes the position of the i-th detection box in Dj, with (xi1, yi1) the coordinates of its upper-left corner and (xi2, yi2) the coordinates of its lower-right corner.

(4.2) Run inference with the difficult-sample annotation model to obtain the difficult target detection box set Rj of image Mj:

Rj = {rj1, rj2, …, rj,b_j}, where b_j is the total number of detection boxes in Rj, h = 1, 2, …, b_j; rjh = [(xh1, yh1), (xh2, yh2)] denotes the position of the h-th detection box in Rj, with (xh1, yh1) the coordinates of its upper-left corner and (xh2, yh2) the coordinates of its lower-right corner.

(4.3) If there exists an element r in the difficult target detection box set Rj of image Mj whose IoU with every element of the initial target detection box set Dj is simultaneously below 0.5, record image Mj as a difficult image sample and place it in the difficult-image-sample set; otherwise record it as a simple image sample and place it in the simple-image-sample set.
Taking Fig. 3 as an example, which shows the annotation result of image M* under the initial annotation model: four target detection boxes are annotated in image M*, forming its initial target detection box set D*, and each box marks one vehicle; "car 0.97" in Fig. 3 indicates that the target is annotated as a car with a confidence of 0.97.

Taking Fig. 4 as an example, which shows the annotation result of image M* under the difficult-sample annotation model: five target detection boxes are annotated in image M*, forming its difficult target detection box set R*.

The set R* contains an element [(37,289),(185,393)] whose IoU with every element of the initial set D* is simultaneously below 0.5; therefore image M* is recorded as a difficult image sample and placed in the difficult-image-sample set.
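The decision of step (4.3) can be sketched as below. The IoU helper is inlined so the snippet is self-contained, and the coordinates used for the D* box are hypothetical (the text gives coordinates only for the extra box in R*).

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as [(x1, y1), (x2, y2)]."""
    (ax1, ay1), (ax2, ay2) = box_a
    (bx1, by1), (bx2, by2) = box_b
    inter = (max(0, min(ax2, bx2) - max(ax1, bx1))
             * max(0, min(ay2, by2) - max(ay1, by1)))
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

def is_difficult_image(initial_boxes, hard_boxes, thr=0.5):
    """Step (4.3): the image is difficult if the hard-sample model found
    some box whose IoU with EVERY box of the initial model is below `thr`,
    i.e. the initial model missed that target entirely."""
    return any(all(iou(r, d) < thr for d in initial_boxes)
               for r in hard_boxes)

# Hypothetical D* box, plus the extra R* box [(37,289),(185,393)] from Fig. 4:
d_star = [[(200, 100), (320, 220)]]
r_star = [[(200, 100), (320, 220)], [(37, 289), (185, 393)]]
```

Here the extra R* box does not overlap the hypothetical D* box at all, so `is_difficult_image(d_star, r_star)` classifies M* as a difficult image sample, matching the worked example above.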
(5)当困难图像样本集合中的图像样本数量达到N时,通过困难图像样本集合和简单图像样本集合对困难样本数据集进行更新,得到更新后的困难样本数据集并输入用于更新困难样本标注模型。(5) When the number of image samples in the difficult image sample set reaches N, the difficult sample data set is updated through the difficult image sample set and the simple image sample set, and the updated difficult sample data set is obtained and input to update the difficult sample Label the model.
所述步骤(5)具体包括以下子步骤:The step (5) specifically includes the following sub-steps:
(5.1)当困难图像样本集合中的图像样本数量达到N时,从简单图像样本集合中随机选出部分简单图像样本,并从困难图像样本集合中随机选出部分困难图像样本一起作为第二困难图像样本集合,其中,简单图像样本和困难图像样本的的比例为2:1。(5.1) When the number of image samples in the difficult image sample set reaches N, some simple image samples are randomly selected from the simple image sample set, and some difficult image samples are randomly selected from the difficult image sample set as the second difficulty A collection of image samples, in which the ratio of simple image samples and difficult image samples is 2:1.
(5.2)将第二困难图像样本集合和困难样本数据集合并得到更新后的困难样本数据集,并按照7:2:1的比例划分为困难图像样本训练集、困难图像样本验证集和困难图像样本测试集。(5.2) Merge the second difficult image sample set and the difficult sample data set to obtain the updated difficult sample data set, and divide it into a difficult image sample training set, a difficult image sample verification set and a difficult image sample in a ratio of 7:2:1 Sample test set.
(5.3) The Faster R-CNN neural network is retrained on the new difficult sample data set to obtain the updated difficult sample labeling model.
(6) Steps (3) to (5) are repeated until the accuracy of the difficult sample labeling model exceeds 95%, and the updated difficult sample labeling model is taken as the final sample labeling model.
Each repetition of steps (3) to (5) merges the second difficult image sample set obtained in the current iteration with the difficult sample data set obtained in the previous iteration, yielding the difficult sample data set updated in this iteration. The proportion of difficult image samples in the updated data set therefore rises steadily, strengthening the labeling model's ability to annotate difficult image samples and gradually improving its accuracy until the specified accuracy requirement is met and the final sample labeling model is obtained.
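The iteration of steps (3) to (5) can be sketched as an outer loop (a schematic only: `train_fn`, `eval_fn`, and `mine_fn` are hypothetical stand-ins for Faster R-CNN training, validation-set accuracy, and the hard/simple image mining of steps (3)-(4)):

```python
def hard_sample_mining_loop(dataset, unlabeled, train_fn, eval_fn, mine_fn,
                            target_acc=0.95, max_rounds=10):
    # Steps (3)-(6): train, mine difficult images, enlarge the data set,
    # and repeat until the labeling model's accuracy exceeds the target.
    model = train_fn(dataset)
    acc = eval_fn(model)
    for _ in range(max_rounds):
        if acc > target_acc:
            break
        hard, simple = mine_fn(model, unlabeled)           # steps (3)-(4)
        # Step (5.1)-(5.2): merge a 2:1 simple/hard mix into the data set.
        dataset = dataset + simple[:2 * len(hard)] + hard
        model = train_fn(dataset)                          # step (5.3)
        acc = eval_fn(model)
    return model, acc
```

Because each round folds the newly mined difficult images back into the training pool, the hard-sample fraction grows monotonically, which is the mechanism the patent relies on to lift accuracy.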
(7) The traffic image to be annotated is input into the final sample labeling model, which outputs the set of target detection boxes for that image.
Embodiment 2
Referring to Figure 5, an embodiment of the present invention provides an automatic traffic image annotation device based on difficult sample mining, comprising one or more processors configured to implement the automatic traffic image annotation method based on difficult sample mining of the above embodiment.
The embodiments of the automatic traffic image annotation device based on difficult sample mining of the present invention can be deployed on any device with data processing capabilities, such as a computer. The device embodiments may be implemented in software, in hardware, or in a combination of the two. Taking software implementation as an example, the device in the logical sense is formed by the processor of the hosting device reading the corresponding computer program instructions from non-volatile memory into memory and running them. At the hardware level, Figure 5 shows a hardware structure diagram of a device with data processing capabilities hosting the automatic traffic image annotation device of the present invention. In addition to the processor, memory, network interface, and non-volatile memory shown in Figure 5, the hosting device may include other hardware according to its actual functions, which will not be repeated here.
For the implementation of the functions of each unit in the above device, refer to the implementation of the corresponding steps in the above method; details are not repeated here.
Since the device embodiment substantially corresponds to the method embodiment, the relevant description of the method embodiment applies. The device embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the present solution. Persons of ordinary skill in the art can understand and implement this without creative effort.
An embodiment of the present invention also provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, it implements the automatic traffic image annotation method based on difficult sample mining of the above embodiments.
The computer-readable storage medium may be an internal storage unit of any device with data processing capabilities described in any of the foregoing embodiments, such as a hard disk or memory. It may also be an external storage device of such a device, for example a plug-in hard disk, Smart Media Card (SMC), SD card, or flash card equipped on the device. Further, it may include both an internal storage unit and an external storage device. The computer-readable storage medium is used to store the computer program and other programs and data required by the device, and may also be used to temporarily store data that has been or will be output.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (9)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2023111677895 | 2023-09-11 | ||
| CN202311167789 | 2023-09-11 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117765534A true CN117765534A (en) | 2024-03-26 |
Family
ID=90317484
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311743990.3A Pending CN117765534A (en) | 2023-09-11 | 2023-12-18 | Automatic traffic image labeling method and device based on difficult sample mining |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117765534A (en) |
Cited By (3)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118644662A (en) * | 2024-06-14 | 2024-09-13 | 北京卓视智通科技有限责任公司 | Pre-labeling method, system, device and storage medium based on multi-expert model |
| CN119904894A (en) * | 2025-04-02 | 2025-04-29 | 中国科学院长春光学精密机械与物理研究所 | Multi-scale pedestrian detection method and device based on joint head and overall information |
| CN119904894B (en) * | 2025-04-02 | 2025-07-25 | 中国科学院长春光学精密机械与物理研究所 | Multi-scale pedestrian detection method and device based on joint head and overall information |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |