CN117765534A - Automatic traffic image labeling method and device based on difficult sample mining - Google Patents
- Publication number: CN117765534A
- Application number: CN202311743990.3A
- Authority: CN (China)
- Prior art keywords: difficult, sample, image, data set, target detection
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Description
Technical field

The present invention relates to the technical field of traffic image annotation, and in particular to an automatic traffic image annotation method and device based on difficult sample mining.

Background

As artificial intelligence plays an increasingly important role in technological progress and social development, attention has turned to autonomous driving. Whether for L2-level assisted driving or L5-level fully automated driving, object detection algorithms are required. An excellent object detection algorithm needs not only a large volume of traffic sample data behind it, but also good robustness in difficult detection scenarios. By learning difficulty, sample data can be divided into simple samples and difficult samples. Appropriately increasing the proportion of difficult samples in a newly constructed data set strengthens the annotation model's ability to label difficult samples and can improve model accuracy to a large extent, which is of great significance.

The data sets used by current deep-learning networks come mainly from public data sets or self-collected data sets. Public data sets can rarely satisfy the training-sample requirements of real scenarios, while manually labeling a self-collected data set consumes a great deal of manpower and material resources, especially when labeling difficult samples.
Summary of the invention

The purpose of the present invention is to provide, in view of the shortcomings of the prior art, an automatic traffic image annotation method and device based on difficult sample mining.

This purpose is achieved through the following technical solution: an automatic traffic image annotation method based on difficult sample mining, comprising the following steps:

(1) Obtain the SODA10M labeled data set and randomly divide it into two equal parts. One part is divided into an initial-annotation training set, an initial-annotation validation set, and an initial-annotation test set; a Faster R-CNN neural network is trained with the initial-annotation training set as input to obtain the initial annotation model;

(2) Use the initial annotation model to partition the other part of the SODA10M labeled data set into a simple-sample set and a difficult-sample set; construct a difficult-sample data set from the two sets and use it to train a Faster R-CNN neural network, obtaining the difficult-sample annotation model;

(3) Collect video-stream data from vehicle terminals and extract frames to obtain an image data set;

(4) Use the initial annotation model and the difficult-sample annotation model to inspect each image in the image data set: when a difficult positive-sample target in the image is not detected, record the image as a difficult image sample and place it in the difficult-image-sample set; otherwise record it as a simple image sample and place it in the simple-image-sample set;

(5) When the number of image samples in the difficult-image-sample set reaches N, update the difficult-sample data set from the difficult-image-sample set and the simple-image-sample set, and use the updated difficult-sample data set as input to update the difficult-sample annotation model;

(6) Repeat steps (3)-(5) until the accuracy of the difficult-sample annotation model exceeds 95%, and take the updated difficult-sample annotation model as the final sample annotation model;

(7) Input the traffic image to be annotated into the final sample annotation model, and output its set of target detection boxes as the annotation result.
Further, step (1) specifically comprises the following sub-steps:

(1.1) Randomly select from the SODA10M data set a labeled subset containing 2N annotated traffic images;

(1.2) Randomly divide the SODA10M labeled data set into two equal parts: the first SODA10M labeled data set and the second SODA10M labeled data set. From the first, select traffic images covering 11 scene types, with the same number of images per scene, to construct the initial annotation data set; then divide it into an initial-annotation training set, validation set, and test set in a 7:2:1 ratio;

(1.3) Construct the Faster R-CNN neural network: use the convolutional network VGG16 as the base network; VGG16 comprises 13 shared base convolutional layers, 5 max-pooling layers, 3 fully connected layers, and 1 softmax layer; embed an RPN layer and an RoI pooling layer into the VGG16 network to obtain the Faster R-CNN model. Each traffic image is represented as a tensor of height, width, and depth; the shared base convolutional layers extract a convolutional feature map; a region-proposal network then extracts, on the feature map, multiple target detection boxes containing objects of interest; region-of-interest pooling then combines the feature map with the detection boxes to obtain a per-box feature map that is passed to the fully connected layers; finally, the fully connected layers compute each box's category and its precise position in the original traffic image;

(1.4) Train the Faster R-CNN neural network with the initial-annotation training set as input to obtain the initial annotation model.

Further, the 11 scene types include sunny, cloudy, rainy, city street, highway, country road, residential area, daytime, night, dawn, and dusk scenes.
Further, step (2) specifically comprises the following sub-steps:

(2.1) Use the initial annotation model to run inference on every picture mi in the second SODA10M labeled data set, obtaining an inference result ni for each picture mi;

(2.2) For any picture mi in the second SODA10M labeled data set, compare its inference result ni with the ground-truth annotation Ni. If ni leaves a ground-truth target region unmarked, or a marked target region has an IoU with the corresponding ground truth Ni of less than 0.5, record mi as a difficult sample and place it in the difficult-sample set; otherwise record it as a simple sample and place it in the simple-sample set;

(2.3) Repeat step (2.2) for every picture in the second SODA10M labeled data set;

(2.4) Randomly select some simple samples from the simple-sample set and some difficult samples from the difficult-sample set to form the difficult-sample data set, with a simple-to-difficult ratio of 2:1; then divide the difficult-sample data set into a training set, a validation set, and a test set in a 7:2:1 ratio;

(2.5) Train the difficult-sample annotation model: train the Faster R-CNN neural network on the difficult-sample data set to obtain the difficult-sample annotation model.

Further, step (3) is specifically: connect to vehicle terminal devices using the MQTT communication protocol, collect their video-stream data, upload it to an SRS video server, and extract frames from the stream at one frame every 0.1 s, converting the stream into multiple images to obtain the image data set.
Further, step (4) specifically comprises the following sub-steps:

(4.1) For each image Mj in the image data set, run inference with the initial annotation model to obtain the initial target detection box set Dj of image Mj:

Dj = {dj1, dj2, …, dj,a_j}, where a_j is the total number of detection boxes in Dj, i = 1, 2, …, a_j; dji = [(xi1, yi1), (xi2, yi2)] denotes the position of the i-th detection box in Dj, with (xi1, yi1) the coordinates of its upper-left corner and (xi2, yi2) the coordinates of its lower-right corner;

(4.2) Run inference with the difficult-sample annotation model to obtain the difficult target detection box set Rj of image Mj:

Rj = {rj1, rj2, …, rj,b_j}, where b_j is the total number of detection boxes in Rj, h = 1, 2, …, b_j; rjh = [(xh1, yh1), (xh2, yh2)] denotes the position of the h-th detection box in Rj, with (xh1, yh1) the coordinates of its upper-left corner and (xh2, yh2) the coordinates of its lower-right corner;

(4.3) If there exists an element r in the difficult target detection box set Rj of image Mj whose IoU with every element of the initial target detection box set Dj is simultaneously below 0.5, record image Mj as a difficult image sample and place it in the difficult-image-sample set; otherwise record it as a simple image sample and place it in the simple-image-sample set.
Further, step (5) specifically comprises the following sub-steps:

(5.1) When the number of image samples in the difficult-image-sample set reaches N, randomly select some simple image samples from the simple-image-sample set and some difficult image samples from the difficult-image-sample set to form a second difficult-image-sample set, with a simple-to-difficult ratio of 2:1;

(5.2) Merge the second difficult-image-sample set with the difficult-sample data set to obtain the updated difficult-sample data set, and divide it into a training set, a validation set, and a test set in a 7:2:1 ratio;

(5.3) Retrain the Faster R-CNN neural network on the new difficult-sample data set to obtain the updated difficult-sample annotation model.

The present invention also provides an automatic traffic image annotation device based on difficult sample mining, comprising one or more processors configured to implement the above automatic traffic image annotation method based on difficult sample mining.

The present invention also provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, it implements the above automatic traffic image annotation method based on difficult sample mining.

The beneficial effects of the present invention are as follows: by constructing an initial annotation model and a difficult-sample annotation model, the invention can automatically determine whether a traffic image is a difficult image sample, and can control the ratio of difficult to simple image samples in the difficult-sample data set. Appropriately increasing the proportion of difficult image samples in the difficult-sample data set strengthens the difficult-sample annotation model's ability to annotate difficult image samples, gradually improving the model's accuracy until the specified accuracy requirement is met and the final sample annotation model is obtained.
Description of the drawings

Fig. 1 is an overall flow chart of the automatic traffic image annotation method based on difficult sample mining;

Fig. 2 is a schematic flow chart of step (4) in Embodiment 1;

Fig. 3 is an example of annotation results produced by the initial annotation model;

Fig. 4 is an example of annotation results produced by the difficult-sample annotation model;

Fig. 5 is a schematic structural diagram of the automatic traffic image annotation device based on difficult sample mining.

Detailed description of embodiments

To make the purpose, technical solution, and advantages of the present invention clearer, the invention is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely explain the invention and are not all of its embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

Embodiment 1

As shown in Fig. 1, the present invention provides an automatic traffic image annotation method based on difficult sample mining, comprising the following steps:
(1) Obtain the SODA10M labeled data set and randomly divide it into two equal parts. One part is divided into an initial-annotation training set, an initial-annotation validation set, and an initial-annotation test set; a Faster R-CNN neural network is trained with the initial-annotation training set as input to obtain the initial annotation model.

Step (1) specifically comprises the following sub-steps:

(1.1) Randomly select from the SODA10M data set a labeled subset containing 2N annotated traffic images. The SODA10M data set is a general-purpose traffic image data set.

(1.2) Randomly divide the SODA10M labeled data set into two equal parts: the first SODA10M labeled data set and the second SODA10M labeled data set. From the first, select traffic images covering 11 scene types, with the same number of images per scene, to construct the initial annotation data set; then divide it into an initial-annotation training set, validation set, and test set in a 7:2:1 ratio. The 11 scene types include sunny, cloudy, rainy, city street, highway, country road, residential area, daytime, night, dawn, and dusk scenes.
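The random halving of step (1.1) and the 7:2:1 split of step (1.2) amount to a shuffle followed by proportional slicing. The sketch below is only an illustration of that bookkeeping; the helper names and the fixed seed are assumptions, not part of the patent.

```python
import random

def halve_dataset(images, seed=0):
    """Random equal split of the 2N labeled images into two halves (step 1.2)."""
    rng = random.Random(seed)
    images = list(images)
    rng.shuffle(images)
    half = len(images) // 2
    return images[:half], images[half:]

def split_dataset(images, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle `images` and slice them into train/val/test subsets
    according to `ratios` (the 7:2:1 proportion of step 1.2)."""
    rng = random.Random(seed)
    images = list(images)
    rng.shuffle(images)
    n_train = int(len(images) * ratios[0])
    n_val = int(len(images) * ratios[1])
    return (images[:n_train],
            images[n_train:n_train + n_val],
            images[n_train + n_val:])
```

With 2N = 100 images, `halve_dataset` yields two halves of 50, and `split_dataset` applied to the full set yields 70/20/10 images for training, validation, and test.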
(1.3) Construct the Faster R-CNN neural network: use the convolutional network VGG16 as the base network; VGG16 comprises 13 shared base convolutional layers, 5 max-pooling layers, 3 fully connected layers, and 1 softmax layer; embed an RPN layer and an RoI pooling layer into the VGG16 network to obtain the Faster R-CNN model. Each traffic image is represented as a tensor of height, width, and depth; the shared base convolutional layers extract a convolutional feature map; a region-proposal network then extracts, on the feature map, multiple target detection boxes containing objects of interest; region-of-interest pooling then combines the feature map with the detection boxes to obtain a per-box feature map that is passed to the fully connected layers; finally, the fully connected layers compute each box's category and its precise position in the original traffic image.
(1.4) Train the Faster R-CNN neural network with the initial-annotation training set as input to obtain the initial annotation model.

(2) Use the initial annotation model to partition the other part of the SODA10M labeled data set into a simple-sample set and a difficult-sample set; construct a difficult-sample data set from the two sets and use it to train a Faster R-CNN neural network, obtaining the difficult-sample annotation model.

Step (2) specifically comprises the following sub-steps:

(2.1) Use the initial annotation model to run inference on every picture mi in the second SODA10M labeled data set, obtaining an inference result ni for each picture mi.

(2.2) For any picture mi in the second SODA10M labeled data set, compare its inference result ni with the ground-truth annotation Ni. If ni leaves a ground-truth target region unmarked, or a marked target region has an IoU with the corresponding ground truth Ni of less than 0.5, record mi as a difficult sample and place it in the difficult-sample set; otherwise record it as a simple sample and place it in the simple-sample set.
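The comparison in step (2.2) reduces to an IoU computation between predicted and ground-truth boxes, using the corner format [(x1, y1), (x2, y2)] defined in step (4.1). A minimal sketch follows; the greedy best-match pairing between predictions and ground truths is an assumption the patent does not spell out.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as [(x1, y1), (x2, y2)]."""
    (ax1, ay1), (ax2, ay2) = box_a
    (bx1, by1), (bx2, by2) = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

def is_difficult_sample(pred_boxes, gt_boxes, thr=0.5):
    """Step (2.2): a picture is a difficult sample if some ground-truth
    region is left unmarked, or a marked region overlaps its best-matching
    ground truth with IoU below `thr`."""
    if len(pred_boxes) < len(gt_boxes):  # a target region was left unmarked
        return True
    return any(max((iou(p, g) for p in pred_boxes), default=0.0) < thr
               for g in gt_boxes)
```

For example, a prediction [(1,1),(3,3)] against ground truth [(0,0),(2,2)] has IoU 1/7 ≈ 0.14 < 0.5, so that picture would be routed to the difficult-sample set.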
(2.3) Repeat step (2.2) for every picture in the second SODA10M labeled data set.

(2.4) Randomly select some simple samples from the simple-sample set and some difficult samples from the difficult-sample set to form the difficult-sample data set, with a simple-to-difficult ratio of 2:1; then divide the difficult-sample data set into a training set, a validation set, and a test set in a 7:2:1 ratio.

(2.5) Train the difficult-sample annotation model: train the Faster R-CNN neural network on the difficult-sample data set to obtain the difficult-sample annotation model.

(3) Collect video-stream data from vehicle terminals and extract frames to obtain an image data set.

Step (3) is specifically: connect to vehicle terminal devices using the MQTT communication protocol, collect their video-stream data, upload it to an SRS video server, and extract frames from the stream at one frame every 0.1 s, converting the stream into multiple images to obtain the image data set.
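Step (3) samples one frame every 0.1 s from the stream. The MQTT/SRS plumbing is omitted below; this hypothetical `sample_frame_indices` helper only shows the frame-sampling arithmetic one would apply to a decoded stream (e.g. frames read via OpenCV's VideoCapture).

```python
def sample_frame_indices(n_frames, fps, interval_s=0.1):
    """Return the indices of frames kept when sampling one frame every
    `interval_s` seconds from a stream of `n_frames` frames at `fps`."""
    step = max(1, round(fps * interval_s))  # frames between two samples
    return list(range(0, n_frames, step))
```

At 30 fps a 0.1 s interval keeps every 3rd frame; at 10 fps it keeps every frame.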
(4) Use the initial annotation model and the difficult-sample annotation model to inspect each image in the image data set: when a difficult positive-sample target in the image is not detected, record the image as a difficult image sample and place it in the difficult-image-sample set; otherwise record it as a simple image sample and place it in the simple-image-sample set.

As shown in Fig. 2, step (4) specifically comprises the following sub-steps:

(4.1) For each image Mj in the image data set, run inference with the initial annotation model to obtain the initial target detection box set Dj of image Mj:

Dj = {dj1, dj2, …, dj,a_j}, where a_j is the total number of detection boxes in Dj, i = 1, 2, …, a_j; dji = [(xi1, yi1), (xi2, yi2)] denotes the position of the i-th detection box in Dj, with (xi1, yi1) the coordinates of its upper-left corner and (xi2, yi2) the coordinates of its lower-right corner.

(4.2) Run inference with the difficult-sample annotation model to obtain the difficult target detection box set Rj of image Mj:

Rj = {rj1, rj2, …, rj,b_j}, where b_j is the total number of detection boxes in Rj, h = 1, 2, …, b_j; rjh = [(xh1, yh1), (xh2, yh2)] denotes the position of the h-th detection box in Rj, with (xh1, yh1) the coordinates of its upper-left corner and (xh2, yh2) the coordinates of its lower-right corner.

(4.3) If there exists an element r in the difficult target detection box set Rj of image Mj whose IoU with every element of the initial target detection box set Dj is simultaneously below 0.5, record image Mj as a difficult image sample and place it in the difficult-image-sample set; otherwise record it as a simple image sample and place it in the simple-image-sample set.
Taking Fig. 3 as an example, which shows the annotation result of image M* under the initial annotation model: four target detection boxes are annotated in image M*, forming its initial target detection box set D*, and each box marks one vehicle; "car 0.97" in Fig. 3 indicates that the target is annotated as a car with a confidence of 0.97.

Taking Fig. 4 as an example, which shows the annotation result of image M* under the difficult-sample annotation model: five target detection boxes are annotated in image M*, forming its difficult target detection box set R*.

The set R* contains an element [(37,289),(185,393)] whose IoU with every element of the initial set D* is simultaneously below 0.5; therefore image M* is recorded as a difficult image sample and placed in the difficult-image-sample set.
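The decision of step (4.3) can be sketched as below. The IoU helper is inlined so the snippet is self-contained, and the coordinates used for the D* box are hypothetical (the text gives coordinates only for the extra box in R*).

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as [(x1, y1), (x2, y2)]."""
    (ax1, ay1), (ax2, ay2) = box_a
    (bx1, by1), (bx2, by2) = box_b
    inter = (max(0, min(ax2, bx2) - max(ax1, bx1))
             * max(0, min(ay2, by2) - max(ay1, by1)))
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

def is_difficult_image(initial_boxes, hard_boxes, thr=0.5):
    """Step (4.3): the image is difficult if the hard-sample model found
    some box whose IoU with EVERY box of the initial model is below `thr`,
    i.e. the initial model missed that target entirely."""
    return any(all(iou(r, d) < thr for d in initial_boxes)
               for r in hard_boxes)

# Hypothetical D* box, plus the extra R* box [(37,289),(185,393)] from Fig. 4:
d_star = [[(200, 100), (320, 220)]]
r_star = [[(200, 100), (320, 220)], [(37, 289), (185, 393)]]
```

Here the extra R* box does not overlap the hypothetical D* box at all, so `is_difficult_image(d_star, r_star)` classifies M* as a difficult image sample, matching the worked example above.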
(5)当困难图像样本集合中的图像样本数量达到N时,通过困难图像样本集合和简单图像样本集合对困难样本数据集进行更新,得到更新后的困难样本数据集并输入用于更新困难样本标注模型。(5) When the number of image samples in the difficult image sample set reaches N, the difficult sample data set is updated through the difficult image sample set and the simple image sample set, and the updated difficult sample data set is obtained and input to update the difficult sample Label the model.
所述步骤(5)具体包括以下子步骤:The step (5) specifically includes the following sub-steps:
(5.1)当困难图像样本集合中的图像样本数量达到N时,从简单图像样本集合中随机选出部分简单图像样本,并从困难图像样本集合中随机选出部分困难图像样本一起作为第二困难图像样本集合,其中,简单图像样本和困难图像样本的的比例为2:1。(5.1) When the number of image samples in the difficult image sample set reaches N, some simple image samples are randomly selected from the simple image sample set, and some difficult image samples are randomly selected from the difficult image sample set as the second difficulty A collection of image samples, in which the ratio of simple image samples and difficult image samples is 2:1.
(5.2)将第二困难图像样本集合和困难样本数据集合并得到更新后的困难样本数据集,并按照7:2:1的比例划分为困难图像样本训练集、困难图像样本验证集和困难图像样本测试集。(5.2) Merge the second difficult image sample set and the difficult sample data set to obtain the updated difficult sample data set, and divide it into a difficult image sample training set, a difficult image sample verification set and a difficult image sample in a ratio of 7:2:1 Sample test set.
(5.3) The Faster R-CNN neural network is retrained on the new difficult sample data set to obtain the updated difficult sample labeling model.
(6) Steps (3) to (5) are repeated until the accuracy of the difficult sample labeling model exceeds 95%, and the updated difficult sample labeling model is taken as the final sample labeling model.
Each repetition of steps (3) to (5) merges the second difficult image sample set obtained in the current iteration with the difficult sample data set obtained in the previous iteration, yielding the difficult sample data set updated in this iteration. The proportion of difficult image samples in the updated data set therefore rises steadily, strengthening the labeling model's ability to annotate difficult image samples and gradually improving its accuracy until the specified accuracy requirement is met and the final sample labeling model is obtained.
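The iteration of steps (3) to (5) can be sketched as an outer loop (a schematic only: `train_fn`, `eval_fn`, and `mine_fn` are hypothetical stand-ins for Faster R-CNN training, validation-set accuracy, and the hard/simple image mining of steps (3)-(4)):

```python
def hard_sample_mining_loop(dataset, unlabeled, train_fn, eval_fn, mine_fn,
                            target_acc=0.95, max_rounds=10):
    # Steps (3)-(6): train, mine difficult images, enlarge the data set,
    # and repeat until the labeling model's accuracy exceeds the target.
    model = train_fn(dataset)
    acc = eval_fn(model)
    for _ in range(max_rounds):
        if acc > target_acc:
            break
        hard, simple = mine_fn(model, unlabeled)           # steps (3)-(4)
        # Step (5.1)-(5.2): merge a 2:1 simple/hard mix into the data set.
        dataset = dataset + simple[:2 * len(hard)] + hard
        model = train_fn(dataset)                          # step (5.3)
        acc = eval_fn(model)
    return model, acc
```

Because each round folds the newly mined difficult images back into the training pool, the hard-sample fraction grows monotonically, which is the mechanism the patent relies on to lift accuracy.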
(7) The traffic image to be annotated is input into the final sample labeling model, which outputs the set of target detection boxes for that image.
Embodiment 2
Referring to Figure 5, an embodiment of the present invention provides an automatic traffic image annotation device based on difficult sample mining, comprising one or more processors configured to implement the automatic traffic image annotation method based on difficult sample mining of the above embodiment.
The embodiments of the automatic traffic image annotation device based on difficult sample mining of the present invention can be deployed on any device with data processing capabilities, such as a computer. The device embodiments may be implemented in software, in hardware, or in a combination of the two. Taking software implementation as an example, the device in the logical sense is formed by the processor of the hosting device reading the corresponding computer program instructions from non-volatile memory into memory and running them. At the hardware level, Figure 5 shows a hardware structure diagram of a device with data processing capabilities hosting the automatic traffic image annotation device of the present invention. In addition to the processor, memory, network interface, and non-volatile memory shown in Figure 5, the hosting device may include other hardware according to its actual functions, which will not be repeated here.
For the implementation of the functions of each unit in the above device, refer to the implementation of the corresponding steps in the above method; details are not repeated here.
Since the device embodiment substantially corresponds to the method embodiment, the relevant description of the method embodiment applies. The device embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the present solution. Persons of ordinary skill in the art can understand and implement this without creative effort.
An embodiment of the present invention also provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, it implements the automatic traffic image annotation method based on difficult sample mining of the above embodiments.
The computer-readable storage medium may be an internal storage unit of any device with data processing capabilities described in any of the foregoing embodiments, such as a hard disk or memory. It may also be an external storage device of such a device, for example a plug-in hard disk, Smart Media Card (SMC), SD card, or flash card equipped on the device. Further, it may include both an internal storage unit and an external storage device. The computer-readable storage medium is used to store the computer program and other programs and data required by the device, and may also be used to temporarily store data that has been or will be output.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (9)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2023111677895 | 2023-09-11 | ||
| CN202311167789 | 2023-09-11 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117765534A true CN117765534A (en) | 2024-03-26 |
Family
ID=90317484
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311743990.3A Pending CN117765534A (en) | 2023-09-11 | 2023-12-18 | Automatic traffic image labeling method and device based on difficult sample mining |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117765534A (en) |
Cited By (3)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118644662A (en) * | 2024-06-14 | 2024-09-13 | 北京卓视智通科技有限责任公司 | Pre-labeling method, system, device and storage medium based on multi-expert model |
| CN119904894A (en) * | 2025-04-02 | 2025-04-29 | 中国科学院长春光学精密机械与物理研究所 | Multi-scale pedestrian detection method and device based on joint head and overall information |
| CN119904894B (en) * | 2025-04-02 | 2025-07-25 | 中国科学院长春光学精密机械与物理研究所 | Multi-scale pedestrian detection method and device based on joint head and overall information |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |