CN118351399A - Sample generation method, image recognition model training method and corresponding device - Google Patents
- Publication number
- CN118351399A, CN202410758335.3A
- Authority
- CN
- China
- Prior art keywords
- image
- class
- images
- recognition model
- similarity
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
Description
Technical Field
The present disclosure relates to the field of data processing technology, and in particular to a sample generation method, an image recognition model training method, and corresponding devices.
Background
In-vehicle image recognition applies image recognition technology to vehicles. Once the technology is installed, the vehicle can identify the identity information corresponding to a scanned in-vehicle image and use it to provide personalized services such as door unlocking, ignition start, seat adjustment, and route planning, as well as functions such as in-vehicle payment.
In the related art, a large number of driver images are collected and used as training samples to train an image recognition model. The trained model is then used to identify the driver's identity information so as to provide personalized vehicle services and functions.
Summary of the Invention
The purpose of the present disclosure is to provide a sample generation method, an image recognition model training method, and corresponding devices that solve the technical problems of the related art described above.
To achieve the above purpose, in a first aspect, the present disclosure provides a sample generation method, comprising:
determining a plurality of image sets, each image set including a plurality of images corresponding to the same identity information;
for each image set, clustering the images of the image set to obtain a plurality of intra-class clusters, and selecting images from different intra-class clusters to form positive sample pairs, wherein the similarity between the features of different images in the same intra-class cluster is less than a first similarity threshold;
clustering the plurality of image sets to obtain a plurality of inter-class clusters, and selecting images corresponding to different image sets within the same inter-class cluster to form negative sample pairs, wherein the similarity between the features of different image sets in the same inter-class cluster is less than a second similarity threshold; and
generating triplet sample pairs according to the plurality of image sets, the positive sample pairs, and the negative sample pairs.
Optionally, clustering the images of the image set to obtain a plurality of intra-class clusters includes:
calculating the similarity between every two images in the image set;
treating each image in the image set as a node and connecting the nodes of any two images whose similarity is less than the first similarity threshold, to obtain a first undirected graph; and
determining the images corresponding to each subgraph of the first undirected graph as one intra-class cluster, to obtain the plurality of intra-class clusters.
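The graph-based clustering steps above can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: it assumes that each "subgraph" is a connected component, and that the similarity score behaves like a distance (here the Euclidean distance between feature vectors), since two nodes are connected when the score is below the threshold. The function name `intra_class_clusters` and the sample features are hypothetical.

```python
import math
from itertools import combinations


def intra_class_clusters(features, threshold):
    """Group the images of one identity into connected components.

    features:  list of feature vectors, one per image (hypothetical input).
    threshold: first similarity threshold; two images are linked when
               their score is below it.
    Returns a sorted list of clusters, each a list of image indices.
    """
    parent = list(range(len(features)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(features)), 2):
        # Assumption: the score is a Euclidean distance, so "less than
        # the threshold" means the two images are close to each other.
        if math.dist(features[i], features[j]) < threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(features)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


feats = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
print(intra_class_clusters(feats, threshold=0.5))  # [[0, 1], [2, 3], [4]]
```

Because clusters are connected components, two images can share a cluster without being directly linked, as long as a chain of linked images joins them.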
Optionally, selecting images from different intra-class clusters to form positive sample pairs includes:
randomly selecting a first image from a first intra-class cluster and a second image from a second intra-class cluster, the first and second intra-class clusters being different clusters among the plurality of intra-class clusters; and
forming a positive sample pair from the first image and the second image.
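The selection of a positive sample pair can be sketched as below. The sketch assumes the intra-class clusters of one identity are already available as lists of image identifiers; the function name `sample_positive_pair`, the identifiers, and the choice to return `None` when only one cluster exists are all assumptions made for illustration.

```python
import random


def sample_positive_pair(clusters, rng=random):
    """Pick two images of one identity from two different intra-class clusters.

    clusters: list of intra-class clusters, each a non-empty list of image ids.
    rng:      the random module or a random.Random instance.
    """
    if len(clusters) < 2:
        # Assumption: with a single cluster no cross-cluster pair exists.
        return None
    first_cluster, second_cluster = rng.sample(clusters, 2)
    return rng.choice(first_cluster), rng.choice(second_cluster)


clusters = [["img_0", "img_1"], ["img_2", "img_3"], ["img_4"]]
pair = sample_positive_pair(clusters, random.Random(0))
print(pair)
```

Drawing the two clusters without replacement guarantees that the two images of a pair never come from the same cluster.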
Optionally, clustering the plurality of image sets to obtain a plurality of inter-class clusters includes:
calculating the similarity between the central features of every two image sets among the plurality of image sets, the central feature of an image set being the image in that image set that meets a preset condition;
treating the central feature of each image set as a node and connecting the nodes of any two image sets whose central-feature similarity is less than the second similarity threshold, to obtain a second undirected graph; and
determining the image sets corresponding to each subgraph of the second undirected graph as one inter-class cluster, to obtain the plurality of inter-class clusters.
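A possible reading of the inter-class clustering steps is sketched below. The preset condition that defines the central feature is not specified in this passage, so the sketch assumes the medoid (the feature with the smallest total distance to the other features of the same image set). It also assumes a distance-like score and connected components, and the names `medoid` and `inter_class_clusters` are hypothetical.

```python
import math


def medoid(features):
    """Assumed 'preset condition': smallest total distance to the other features."""
    return min(features, key=lambda f: sum(math.dist(f, g) for g in features))


def inter_class_clusters(image_sets, threshold):
    """Cluster identities whose central features are close to each other.

    image_sets: dict mapping an identity to its list of feature vectors.
    threshold:  second similarity threshold (distance-like, by assumption).
    Returns a list of inter-class clusters, each a sorted list of identities.
    """
    centers = {name: medoid(feats) for name, feats in image_sets.items()}
    names = sorted(centers)
    neighbors = {
        n: [m for m in names
            if m != n and math.dist(centers[n], centers[m]) < threshold]
        for n in names
    }
    seen, clusters = set(), []
    for start in names:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


sets = {
    "alice": [(0.0, 0.0), (0.2, 0.0), (0.1, 0.1)],
    "bob": [(0.5, 0.0), (0.6, 0.1)],
    "carol": [(9.0, 9.0), (9.1, 9.0)],
}
print(inter_class_clusters(sets, threshold=1.0))  # [['alice', 'bob'], ['carol']]
```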
Optionally, selecting images corresponding to different image sets within the same inter-class cluster to form negative sample pairs includes:
randomly selecting a third image and a fourth image from the same inter-class cluster, the third image and the fourth image belonging to different image sets; and
forming a negative sample pair from the third image and the fourth image.
Optionally, selecting images corresponding to different image sets within the same inter-class cluster to form negative sample pairs includes:
for each image set in the same inter-class cluster, randomly selecting a fifth image from that image set;
randomly selecting a sixth image from each of the intra-class clusters of the remaining image sets, the remaining image sets being the image sets in the same inter-class cluster other than that image set; and
forming the negative sample pair from the fifth image and the sixth image.
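The second negative-pair variant (a fifth image paired with one sixth image per intra-class cluster of every other image set) can be sketched as follows. The data layout, the function name `sample_negative_pairs`, and the image identifiers are assumptions made for illustration.

```python
import random


def sample_negative_pairs(inter_cluster, rng=random):
    """Build negative pairs inside one inter-class cluster.

    inter_cluster: dict mapping an identity to its intra-class clusters,
                   each cluster being a non-empty list of image ids.
    Returns a list of (fifth_image, sixth_image) pairs.
    """
    pairs = []
    for identity, clusters in inter_cluster.items():
        # Fifth image: one random image of the current image set.
        all_images = [img for cluster in clusters for img in cluster]
        fifth = rng.choice(all_images)
        for other, other_clusters in inter_cluster.items():
            if other == identity:
                continue
            # Sixth image: one random image per intra-class cluster
            # of each remaining image set.
            for cluster in other_clusters:
                pairs.append((fifth, rng.choice(cluster)))
    return pairs


cluster = {
    "alice": [["a0", "a1"], ["a2"]],
    "bob": [["b0"], ["b1", "b2"]],
}
pairs = sample_negative_pairs(cluster, random.Random(0))
print(len(pairs))  # 4
```

Taking one image from every intra-class cluster of the other identities spreads the negatives over the different appearances of each identity instead of concentrating them on near-duplicates.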
Optionally, determining a plurality of image sets includes:
determining a plurality of image sets collected by a vehicle-mounted image acquisition device.
In a second aspect, the present disclosure provides a method for training an image recognition model, the method comprising:
training a first image recognition model with the triplet sample pairs generated by the sample generation method according to any implementation of the first aspect of the present disclosure.
Optionally, before the first image recognition model is trained, the method further includes:
acquiring a vehicle-mounted image set from the vehicle-mounted image acquisition device, and training a second image recognition model on the vehicle-mounted image set with a softmax (normalized exponential) loss function to obtain the first image recognition model.
Optionally, before the second image recognition model is trained with the softmax loss function to obtain the first image recognition model, the method further includes:
acquiring an RGB image set;
performing grayscale processing on the RGB image set to obtain a grayscale image set; and
pre-training a third image recognition model on the grayscale image set to obtain the second image recognition model.
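The grayscale processing step can be illustrated with a short sketch. The disclosure does not state which conversion is used in this passage; the sketch assumes the widely used ITU-R BT.601 luma weights (0.299, 0.587, 0.114) and represents an image as rows of (r, g, b) tuples. The function name `rgb_to_gray` is hypothetical.

```python
def rgb_to_gray(image):
    """Convert rows of (r, g, b) pixels (0-255) to rows of gray values.

    Assumption: BT.601 luma weights; the source does not name a formula.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in image
    ]


rgb_image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 255, 0)],
]
print(rgb_to_gray(rgb_image))  # [[255, 0], [76, 150]]
```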
In a third aspect, the present disclosure provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the methods provided in the first and second aspects of the present disclosure.
In a fourth aspect, the present disclosure provides a controller, comprising:
a memory storing a computer program; and
a processor configured to execute the computer program in the memory to implement the steps of any of the methods provided in the first and second aspects of the present disclosure.
In a fifth aspect, the present disclosure provides a vehicle comprising the controller provided in the fourth aspect of the present disclosure.
In a sixth aspect, the present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of any of the methods provided in the first and second aspects of the present disclosure.
With the above technical solution, images in the same intra-class cluster are highly similar to one another, while images in different intra-class clusters are less similar. Selecting less-similar images from different intra-class clusters to form positive sample pairs reduces the number of invalid positive sample pairs that arise when nearly identical images are paired. Likewise, the image sets in the same inter-class cluster are highly similar, so selecting images from different image sets within the same inter-class cluster yields negative sample pairs of high similarity and reduces the number of invalid negative sample pairs that arise when very dissimilar images are paired. A large number of triplet sample pairs can therefore be generated from the positive sample pairs, the negative sample pairs, and the image sets, and training the first image recognition model on these triplet sample pairs can improve its convergence stability and recognition accuracy.
Other features and advantages of the present disclosure are described in detail in the Detailed Description below.
Brief Description of the Drawings
The accompanying drawings provide a further understanding of the present disclosure and constitute a part of the specification. Together with the detailed description below, they serve to explain the present disclosure but do not limit it. In the drawings:
FIG. 1 is a schematic flow chart of in-vehicle face recognition in the related art.
FIG. 2 is a schematic flow chart of model training in the related art.
FIG. 3 is a schematic diagram of a sample generation method according to an exemplary embodiment of the present disclosure.
FIG. 4 is a schematic diagram of a first undirected graph according to an exemplary embodiment of the present disclosure.
FIG. 5 is a schematic flow chart of a method for training an image recognition model according to an exemplary embodiment of the present disclosure.
FIG. 6 is a schematic flow chart of triplet sample pair selection according to an exemplary embodiment of the present disclosure.
FIG. 7 is a schematic diagram of a sample generation device according to an exemplary embodiment of the present disclosure.
FIG. 8 is a block diagram of a vehicle according to an exemplary embodiment.
Detailed Description
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to illustrate and explain the present disclosure and do not limit it.
Image recognition technology analyzes and processes images by means of computer vision. In the related art, however, an image recognition model cannot be trained with a small number of samples, or a target image recognition model with stable convergence cannot be obtained. This embodiment is described below using face recognition as an example of image recognition technology.
Once face recognition technology is installed in a vehicle, as shown in FIG. 1, in-vehicle face recognition mainly involves three processes: model training, identity registration, and recognition. During model training, a large number of drivers' face images are collected by the vehicle-mounted camera and preprocessed, and a face recognition model is then trained on the preprocessed images with a deep learning algorithm to obtain a target face recognition model. During identity registration, the driver enters his or her identity information and records face images through the vehicle-mounted camera, including frontal and multi-angle face images. During recognition, a real-time face image of the driver is collected by the vehicle-mounted camera. When the quality of that image is acceptable, the target face recognition model calculates the matching rate between the real-time face image and the registered face images, and the corresponding driver's identity information is obtained from the matching rate and returned to the terminal, so that the corresponding in-vehicle control can be carried out.
As shown in FIG. 2, during model training a large number of drivers' face images are first collected. A face detection model then crops the facial region from each face image, and an affine transformation aligns the facial region with the position of a preset face template. The aligned face images are then fed into the face recognition model, and a suitably designed loss function drives the model to converge, yielding a target face recognition model that can distinguish drivers with different identity information.
During model training, the face recognition model is trained mainly with the loss functions provided in a first related technology and a second related technology.
The first related technology mainly uses margin-based softmax (normalized exponential function) loss functions and their variants, such as CosFace, SphereFace, and ArcFace, where the margin is the distance between the decision boundary and the training samples closest to it. Such loss functions treat the large number of input face images as a classification task and treat the face images corresponding to the same identity information as one class. The trained face recognition model outputs as many classes as there are distinct driver identities in the training samples, and the ultimate goal of training is to classify the input face images correctly.
In the second related technology, the loss function is mainly a metric-learning loss, such as triplet loss or contrastive loss. Such methods construct triplet sample pairs so that the distance between face images of the same identity is smaller than the distance between face images of different identities. They can combine a small number of face images under different strategies to form a very large number of training samples, which makes them better suited to application scenarios with few face images.
When the triplet loss function is used as the loss function of the face recognition model for training on face images, the triplet loss can be expressed by a formula over the input triplet sample (a, p, n).
Here a is the anchor, p is the positive example, and n is the negative example. The pair a and p may represent different face images of the same identity, and the pair a and n may represent face images of different identities.
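A common form of the triplet loss, given here as a sketch under stated assumptions rather than as the formula of the disclosure, is max(d(a, p) - d(a, n) + m, 0), where d is a distance between feature vectors and m is a margin. The sketch below assumes the Euclidean distance and a hypothetical default margin of 0.2; some formulations use squared distances instead.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on three feature vectors.

    Assumptions: Euclidean distance and a margin of 0.2 (hypothetical).
    The loss is zero once the negative is farther from the anchor than
    the positive by at least the margin.
    """
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(d_ap - d_an + margin, 0.0)


# An easy triplet: the negative is already far away, so the loss is zero.
print(triplet_loss((0.0, 0.0), (0.0, 1.0), (0.0, 5.0)))  # 0.0
# A hard triplet: the negative is closer than the positive.
print(triplet_loss((0.0, 0.0), (0.0, 3.0), (0.0, 1.0), margin=0.5))  # 2.5
```

The first call shows why very dissimilar negatives contribute nothing to training: their loss is already zero, so they produce no gradient.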
In the second related technology, triplet sample pairs are selected as follows. In each training iteration of the face recognition model, triplet sample pairs are generated online from the collected face images and the face recognition model. Specifically, in each iteration, p different identities are randomly drawn from the large pool of face images, k face images are randomly drawn from the face images of each of the p identities, and the drawn images form the current sample. Then, for each face image of each identity, that image is paired with each of the other face images of the same identity to form the positive sample pairs. Finally, each positive sample pair is combined with each face image of the other identities to form a negative sample pair, which yields the full set of triplet sample pairs.
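The exhaustive pairing described above can be sketched as follows. Under the assumption that ordered pairs are counted (each image serves as anchor in turn), P identities with K images each give P*K*(K-1) positive pairs and P*K*(K-1)*(P-1)*K triplets. These counts are derived from the description and are not quoted from the source; the function name `batch_all_triplets` and the batch contents are hypothetical.

```python
from itertools import permutations


def batch_all_triplets(batch):
    """Enumerate every (anchor, positive, negative) triplet in a batch.

    batch: dict mapping an identity to its list of sampled image ids.
    """
    triplets = []
    for identity, images in batch.items():
        # Negatives: every image of every other identity in the batch.
        negatives = [
            img
            for other, imgs in batch.items()
            if other != identity
            for img in imgs
        ]
        # Ordered (anchor, positive) pairs within one identity.
        for anchor, positive in permutations(images, 2):
            for negative in negatives:
                triplets.append((anchor, positive, negative))
    return triplets


batch = {"id0": ["a0", "a1"], "id1": ["b0", "b1"], "id2": ["c0", "c1"]}
print(len(batch_all_triplets(batch)))  # 24 = 3 * 2 * 1 * 2 * 2
```

Because every combination is kept regardless of how similar the images are, many of these triplets are uninformative.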
However, the inventors found that training a face recognition model with the loss functions of the first related technology requires a large number of face image samples, whereas the face images obtained in a vehicle are few, so the face recognition model cannot be trained in this way on face images obtained from the vehicle.
When the face recognition model is trained with the triplet loss function of the second related technology, the face images in the current sample are drawn at random. The sample may therefore contain face images that differ greatly, such as those of a man and a woman. These cannot form valuable negative sample pairs, do not help the model converge, and are invalid samples. Likewise, positive sample pairs formed by random drawing from the face images of each identity may include many highly similar face images, which also do not help the model converge and are invalid samples. Moreover, random drawing may fail to produce positive sample pairs of low similarity and negative sample pairs of high similarity, so the accuracy of the resulting target face recognition model is uncertain and its convergence may be unstable.
In view of this, the present disclosure provides a sample generation method, an image recognition model training method, and corresponding devices to solve the problems of the related art described above.
FIG. 3 is a schematic diagram of a sample generation method according to an exemplary embodiment of the present disclosure. Referring to FIG. 3, the method includes:
S301: determining a plurality of image sets, each image set including a plurality of images corresponding to the same identity information;
S302: for each image set, clustering the images of the image set to obtain a plurality of intra-class clusters, and selecting images from different intra-class clusters to form positive sample pairs, wherein the similarity between the features of different images in the same intra-class cluster is less than a first similarity threshold;
S303: clustering the plurality of image sets to obtain a plurality of inter-class clusters, and selecting images corresponding to different image sets within the same inter-class cluster to form negative sample pairs, wherein the similarity between the features of different image sets in the same inter-class cluster is less than a second similarity threshold;
S304: generating triplet sample pairs according to the plurality of image sets, the positive sample pairs, and the negative sample pairs.
With the above technical solution, images in the same intra-class cluster are highly similar to one another, while images in different intra-class clusters are less similar. Selecting less-similar images from different intra-class clusters to form positive sample pairs reduces the number of invalid positive sample pairs that arise when nearly identical images are paired. Likewise, the image sets in the same inter-class cluster are highly similar, so selecting images from different image sets within the same inter-class cluster yields negative sample pairs of high similarity and reduces the number of invalid negative sample pairs that arise when very dissimilar images are paired. A large number of triplet sample pairs can therefore be generated from the positive sample pairs, the negative sample pairs, and the image sets, and training the first image recognition model on these triplet sample pairs can improve its convergence stability and recognition accuracy.
In addition, unlike the related-art approach of training a face recognition model on a large number of face images, the technical solution of the present disclosure does not require a large number of images for training: a small number of images can be combined into a large number of triplet sample pairs, which are then used to train the first image recognition model.
To help those skilled in the art better understand the sample generation method provided by the present disclosure, each of the above steps is described in detail below with examples.
示例地,图像集,可以为采集的一个身份信息对应的多张图像。多个图像集可以为多个身份信息分别对应的多张图像。其中,图像集可以为通过车载摄像头采集的图像,对此,本公开实施例不做限定。图像集可以为人脸图像集,图像集中包括的多张图像可以为多张人脸图像,对此,本公开实施例不做具体限定。For example, an image set may be a plurality of images corresponding to a piece of identity information collected. Multiple image sets may be a plurality of images corresponding to multiple pieces of identity information. Among them, an image set may be an image collected by a vehicle-mounted camera, which is not limited in the embodiments of the present disclosure. An image set may be a face image set, and the multiple images included in the image set may be multiple face images, which is not specifically limited in the embodiments of the present disclosure.
For example, clustering may divide data into several categories such that data within a category are similar to one another and data in different categories differ substantially. An image set may contain multiple images under the same identity information, some of which are highly similar to one another and some of which are not. The images in an image set can therefore be clustered so that highly similar images fall into one intra-class cluster, yielding multiple intra-class clusters, with low similarity between images of different intra-class clusters. Selecting images from different intra-class clusters to form a positive sample pair keeps the similarity of the pair low and avoids positive sample pairs with high similarity. The similarity value (for example, a distance) between the features of different images in the same intra-class cluster is less than a first similarity threshold, so that images within the same intra-class cluster are highly similar. The first similarity threshold may be the critical threshold at which two images are considered similar.
Clustering the images in an image set makes images within the same intra-class cluster highly similar and images from different intra-class clusters less similar. Forming positive sample pairs from less-similar images in different intra-class clusters reduces the invalid positive sample pairs that arise when highly similar images of the same image set are selected.
In a possible implementation, clustering the multiple images corresponding to the image set to obtain multiple intra-class clusters includes:
calculating the similarity between every two images in the image set;
regarding the images in the image set as nodes, and connecting the nodes corresponding to two images whose similarity is less than the first similarity threshold, to obtain a first undirected graph;
determining the images corresponding to each subgraph in the first undirected graph as one intra-class cluster, to obtain the multiple intra-class clusters.
It should be understood that the similarity may be the Euclidean distance, which is not specifically limited in the embodiments of the present disclosure. When the similarity is the Euclidean distance, the distance between every two images in the image set can be expressed as:

$$d(x, y) = \sqrt{\sum_{m=1}^{N} (x_m - y_m)^2}$$

where $N$ is the feature dimension of the images in the image set, $m$ is the index, $d(x, y)$ is the Euclidean distance, and $x_m$ and $y_m$ are feature values in the feature vectors formed from the images in the image set.
When the Euclidean distance is less than the first similarity threshold, the two images are highly similar and can be placed in the same intra-class cluster. When the Euclidean distance is greater than the first similarity threshold, the two images are less similar and may not belong to the same intra-class cluster. The first undirected graph is a mathematical structure composed of vertices and the edges connecting them.
Each image in the image set can then be regarded as a node, and the nodes of any two images whose Euclidean distance is less than the first similarity threshold are connected, giving the first undirected graph. In this graph, a group of nodes connected together is regarded as a subgraph, so the graph may contain one subgraph or several. Subgraphs may contain different numbers of nodes, such as two, three, or four. A subgraph with two nodes indicates that the two corresponding images are highly similar; a subgraph with three nodes indicates that the three corresponding images are highly similar; and so on. Each subgraph in the first undirected graph can then be determined as one intra-class cluster, yielding multiple intra-class clusters.
FIG. 4 is a schematic diagram of a first undirected graph according to an exemplary embodiment of the present disclosure. As shown in FIG. 4, the first undirected graph includes a first subgraph connecting the eight nodes a-h, a second subgraph connecting the three nodes l-n, and a third subgraph connecting the three nodes i-k. In the first subgraph, the images corresponding to any two of the eight nodes a-h are highly similar, so the first subgraph can be determined as one intra-class cluster. In the second subgraph, the images corresponding to the three nodes l-n are highly similar, so the second subgraph can be determined as one intra-class cluster. In the third subgraph, the images corresponding to the three nodes i-k are highly similar, so the third subgraph can be determined as one intra-class cluster. If one image is randomly selected from each of the first and second subgraphs, the two images have low similarity; the same holds for one image from each of the second and third subgraphs, and for one image from each of the first and third subgraphs. The three subgraphs in the first undirected graph can thus be determined as three intra-class clusters.
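The graph-based clustering described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names `euclidean` and `cluster_by_threshold`, the union-find bookkeeping, and the toy 2-D feature vectors are all assumptions made for the example.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cluster_by_threshold(features, threshold):
    """Return the connected components (as lists of indices) of the undirected
    graph whose edges join feature vectors closer than `threshold`."""
    parent = list(range(len(features)))  # union-find forest (illustrative choice)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Connect every pair of nodes whose distance is below the threshold.
    for i, j in combinations(range(len(features)), 2):
        if euclidean(features[i], features[j]) < threshold:
            parent[find(i)] = find(j)

    # Each connected subgraph becomes one cluster.
    clusters = {}
    for i in range(len(features)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Hypothetical 2-D features; real features would come from the recognition model.
features = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
clusters = cluster_by_threshold(features, threshold=0.15)
print(clusters)
```

Note that in this sketch images 0 and 2 land in the same cluster through image 1 even though their direct distance exceeds the threshold, because a subgraph is formed by connectivity rather than by pairwise closeness.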
The images in the image set are divided into multiple intra-class clusters according to similarity. Because the image sets within the same inter-class cluster contain highly similar images, forming negative sample pairs from images of different image sets within the same inter-class cluster yields highly similar negative pairs and reduces the invalid negative sample pairs that arise when dissimilar images are selected.
In a possible implementation, selecting images from different intra-class clusters to form positive sample pairs includes:
randomly selecting a first image from a first intra-class cluster and randomly selecting a second image from a second intra-class cluster, the first intra-class cluster and the second intra-class cluster being different intra-class clusters among the multiple intra-class clusters;
forming a positive sample pair from the first image and the second image.
It should be understood that a positive sample pair should consist of two less-similar images from the same image set. A first image can therefore be randomly selected from a first intra-class cluster and a second image from a second intra-class cluster, where the two clusters are different intra-class clusters among the multiple intra-class clusters. In other words, the first image and the second image have low similarity, and the positive sample pair they form is a low-similarity positive sample pair. This avoids selecting positive sample pairs with high similarity. Accordingly, when the image set includes $M$ intra-class clusters, $C_M^2 = M(M-1)/2$ positive sample pairs can be formed.
Forming positive sample pairs from less-similar images in different intra-class clusters reduces the invalid positive sample pairs that arise when highly similar images of the same image set are selected. When these positive sample pairs are later used to train the first image recognition model, invalid training is also reduced.
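A minimal sketch of the positive-pair selection follows. It assumes one pair is drawn per unordered pair of intra-class clusters, which gives M(M-1)/2 pairs for M clusters; the function name `positive_pairs`, the `rng` parameter, and the image labels are hypothetical.

```python
import random
from itertools import combinations

def positive_pairs(intra_clusters, rng=random):
    """For every unordered pair of different intra-class clusters, pick one
    random image from each and return the resulting (first, second) pairs."""
    return [
        (rng.choice(cluster_a), rng.choice(cluster_b))
        for cluster_a, cluster_b in combinations(intra_clusters, 2)
    ]

# Three intra-class clusters of one identity (labels are illustrative).
intra_clusters = [["a", "b", "c"], ["i", "j"], ["l", "m", "n"]]
pairs = positive_pairs(intra_clusters, random.Random(0))
print(len(pairs), pairs)
```

With three clusters the sketch yields three pairs, and the two images in each pair always come from different clusters.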
For example, an inter-class cluster may be a collection of highly similar images that correspond to different identity information. Across multiple image sets there may be images that belong to different identities but look similar. The image sets can therefore be clustered so that highly similar image sets fall into one inter-class cluster, yielding multiple inter-class clusters. Images from different image sets within the same inter-class cluster can then be selected to form negative sample pairs. This produces negative sample pairs with high similarity and avoids negative sample pairs with low similarity.
Dividing the image sets into multiple inter-class clusters by clustering, and forming negative sample pairs from images of different image sets within each inter-class cluster, yields negative sample pairs with high similarity. This reduces the invalid negative sample pairs that arise when dissimilar images are selected, and reduces invalid training when the negative sample pairs are used to train the first image recognition model.
In a possible implementation, clustering the multiple image sets to obtain multiple inter-class clusters includes:
calculating the similarity between the central features of every two image sets among the multiple image sets, the central feature of an image set being the image in the image set that meets a preset condition;
regarding the central features of the image sets as nodes, and connecting the nodes corresponding to the central features of two image sets whose similarity is less than the second similarity threshold, to obtain a second undirected graph;
determining the image sets corresponding to each subgraph in the second undirected graph as one inter-class cluster, to obtain the multiple inter-class clusters.
It should be understood that the central feature of an image set may be the image in the image set that meets a preset condition.
To determine the central feature of an image set, the sum of the distances from each image in the image set to the remaining images can be calculated, and the image with the smallest distance sum is determined as the central feature. The preset condition may thus be that the sum of the distances from the image to the remaining images in the image set reaches the minimum. The central feature can be calculated as:

$$S_i = \sum_{j=1}^{K} d(f_i, f_j), \qquad C = f_{i^*}, \quad i^* = \arg\min_{1 \le i \le K} S_i$$

where $K$ is the number of images in the image set, $C$ is the central feature of the image set, $d(f_i, f_j)$ is the Euclidean distance between the features of two images in the image set, $S_i$ is the distance sum, and $\min_i S_i$ is the minimum distance sum.
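The central-feature selection can be sketched as a medoid search: the image whose summed distance to all other images is smallest. The function name `central_feature` and the toy feature vectors are assumptions for illustration only.

```python
import math

def central_feature(features):
    """Return (index, feature) of the image whose summed Euclidean distance
    to every other image in the set is minimal."""
    def distance_sum(i):
        # The distance from an image to itself is 0, so it does not affect the sum.
        return sum(math.dist(features[i], other) for other in features)

    best = min(range(len(features)), key=distance_sum)
    return best, features[best]

# Hypothetical 2-D features of one image set.
features = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
center_index, center = central_feature(features)
print(center_index, center)
```

Here the distance sums are 16, 13, 12, 13 and 27, so the third image (index 2) is selected. Choosing an actual image rather than the mean vector keeps the center inside the image set, which matches the preset condition stated above.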
In this embodiment, the similarity may be the Euclidean distance, which is not specifically limited in the embodiments of the present disclosure. When the similarity is the Euclidean distance, the distance between the central features of every two image sets among the multiple image sets can be calculated from the central features of the image sets, and can be expressed as:

$$d(C_a, C_b) = \sqrt{\sum_{m=1}^{N} (C_{a,m} - C_{b,m})^2}$$

where $d(C_a, C_b)$ is the Euclidean distance between the central features of two image sets, $N$ is the feature dimension of the central feature of an image set, $m$ is the index, and $C_a$ and $C_b$ are the central features of the two image sets.
When the Euclidean distance is less than the second similarity threshold, the images of the two corresponding image sets are highly similar; otherwise, they are less similar. The central feature of each image set can be regarded as a node, and the nodes of any two image sets whose central features are closer than the second similarity threshold are connected, giving the second undirected graph. The second undirected graph may contain one subgraph or several. Each subgraph may contain a single node corresponding to one image set, or several nodes corresponding to several image sets, and the images of different image sets within one subgraph are highly similar. In other words, each subgraph may consist of highly similar images that correspond to different identity information. Each subgraph is determined as one inter-class cluster, so that multiple inter-class clusters are obtained from the second undirected graph.
Clustering the central features of the image sets divides the image sets into multiple inter-class clusters, which are then used to determine negative sample pairs, thereby avoiding negative sample pairs with low similarity.
In a possible implementation, selecting images corresponding to different image sets in the same inter-class cluster to form negative sample pairs includes:
randomly selecting a third image and a fourth image from the same inter-class cluster, the third image and the fourth image being images corresponding to different image sets;
forming a negative sample pair from the third image and the fourth image.
It should be understood that each inter-class cluster may include multiple highly similar image sets. A third image and a fourth image can be selected from the same inter-class cluster such that they belong to different image sets. In other words, the third image and the fourth image correspond to different identity information but are highly similar. The third image and the fourth image can then form a negative sample pair with high similarity, which reduces the occurrence of negative sample pairs with low similarity.
Selecting highly similar negative sample pairs within the same inter-class cluster reduces the invalid negative sample pairs that arise when dissimilar images are selected. When these negative sample pairs are later used to train the first image recognition model, invalid training is reduced and the training accuracy of the first image recognition model is improved.
In a possible implementation, selecting images corresponding to different image sets in the same inter-class cluster to form negative sample pairs includes:
for each image set in the same inter-class cluster, randomly selecting a fifth image from the image set;
randomly selecting a sixth image from each of the multiple intra-class clusters of the remaining image sets, where the remaining image sets are the image sets in the same inter-class cluster other than the image set;
forming the negative sample pair from the fifth image and the sixth image.
It should be understood that, for a given inter-class cluster, a fifth image can be randomly selected from each image set in that inter-class cluster. The intra-class clusters of the remaining image sets can then be determined, and a sixth image is randomly selected from each of these intra-class clusters, where the remaining image sets are the image sets in the same inter-class cluster other than the current image set. Forming negative sample pairs from the fifth image and the sixth images yields negative sample pairs with high similarity.
For example, suppose the inter-class cluster includes four image sets, a first through a fourth. To form the negative sample pairs for the first image set, one fifth image is randomly drawn from the first image set. If the second image set includes 4 intra-class clusters, the third includes 5, and the fourth includes 8, the second through fourth image sets include 17 intra-class clusters in total. One sixth image can be randomly selected from each of these 17 intra-class clusters, giving 17 sixth images. Pairing the fifth image with each of the 17 sixth images forms 17 negative sample pairs. Negative sample pairs for the second through fourth image sets are formed in the same way as for the first image set, and the details are not repeated here.
Pairing an image from each image set in an inter-class cluster with an image from every intra-class cluster of the remaining image sets makes the selected negative sample pairs highly similar and avoids negative sample pairs with low similarity. Selecting valuable negative sample pairs based on image similarity, and training the first image recognition model on them, avoids the oscillation of the first image recognition model that randomly selected sample pairs can cause and reduces invalid training.
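The fifth-image and sixth-image scheme can be sketched as below, using the 4, 5 and 8 cluster counts from the example. The data layout (a dictionary from image-set id to its list of intra-class clusters), the helper `make_set`, the function name `negative_pairs`, and the choice of 3 intra-class clusters for the first image set are assumptions; the text does not state the first set's cluster count.

```python
import random

def negative_pairs(inter_cluster, rng=random):
    """inter_cluster maps an image-set id to its intra-class clusters (lists of
    image ids). For each image set, draw one 'fifth' image, then pair it with
    one random 'sixth' image from every intra-class cluster of the other sets."""
    pairs = []
    for set_id, clusters in inter_cluster.items():
        own_images = [image for cluster in clusters for image in cluster]
        fifth = rng.choice(own_images)
        for other_id, other_clusters in inter_cluster.items():
            if other_id == set_id:
                continue  # negatives must come from a different identity
            for cluster in other_clusters:
                pairs.append((fifth, rng.choice(cluster)))
    return pairs

def make_set(name, n_clusters):
    """Build a hypothetical image set with two images per intra-class cluster."""
    return [[f"{name}_c{k}_img{i}" for i in range(2)] for k in range(n_clusters)]

inter_cluster = {
    "set1": make_set("set1", 3),  # assumed count, not given in the text
    "set2": make_set("set2", 4),
    "set3": make_set("set3", 5),
    "set4": make_set("set4", 8),
}
pairs = negative_pairs(inter_cluster, random.Random(0))
from_set1 = [p for p in pairs if p[0].startswith("set1_")]
print(len(from_set1), len(pairs))
```

The first image set contributes 4 + 5 + 8 = 17 negative pairs, matching the worked example; under the assumed counts the whole inter-class cluster contributes 17 + 16 + 15 + 12 = 60.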
For example, triplet sample pairs can then be generated from the multiple image sets, the positive sample pairs, and the negative sample pairs. When generating triplet sample pairs, each positive sample pair in each image set can be combined with the negative sample pairs formed in the inter-class cluster to which that image set belongs.
For example, when the image set includes $A$ intra-class clusters, $C_A^2$ positive sample pairs can be formed in that image set. When the inter-class cluster containing the image set includes 5 image sets, and the four image sets other than the current one contain P1, P2, P3, and P4 intra-class clusters respectively, one image is randomly drawn from each of these intra-class clusters, for a total of P1+P2+P3+P4 images. Combining each positive sample pair in the image set with each of these images forms $C_A^2 \times (P1+P2+P3+P4)$ triplet sample pairs in total. The number of triplet sample pairs is calculated in the same way for every image set, so a large number of triplet sample pairs can be formed from a small number of images.
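As a worked check of this count, the sketch below multiplies the number of positive pairs by the number of sampled negative images. It assumes the positive-pair count is the binomial coefficient "A choose 2"; the function name `triplet_count` and the sample numbers are illustrative.

```python
from math import comb

def triplet_count(num_intra_clusters, other_cluster_counts):
    """Number of triplets for one image set: (positive pairs) x (negative images).
    Assumes one positive pair per unordered pair of intra-class clusters and one
    negative image per intra-class cluster of the other image sets."""
    return comb(num_intra_clusters, 2) * sum(other_cluster_counts)

# An image set with A = 6 intra-class clusters, and four other image sets in the
# same inter-class cluster with P1..P4 = 4, 5, 8, 3 intra-class clusters.
count = triplet_count(6, [4, 5, 8, 3])
print(count)
```

With these numbers a single image set already yields 15 x 20 = 300 triplets from a handful of sampled images, which illustrates how a small image collection expands into many training triplets.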
Combining low-similarity positive sample pairs with high-similarity negative sample pairs into triplet sample pairs, and using them to train the first image recognition model, reduces the cases in which random triplet selection fails to pick low-similarity positive pairs and high-similarity negative pairs. It also reduces the oscillation in training the first image recognition model that randomly selected positive and negative sample pairs can cause, so the trained first image recognition model is more accurate and its convergence is more stable.
In a possible implementation, determining the multiple image sets includes:
determining the multiple image sets captured by a vehicle-mounted image acquisition device.
It should be understood that the vehicle-mounted image acquisition device may be a vehicle-mounted camera for capturing images of the vehicle driver. The vehicle-mounted camera may be an in-vehicle monitoring camera, a panoramic camera, or another camera, which is not specifically limited in the embodiments of the present disclosure. A small vehicle-mounted image set can be captured by the vehicle-mounted image acquisition device, and the feature data of the vehicle-mounted image set can then be extracted by an image recognition model to obtain the multiple image sets.
Based on the same concept, this embodiment further discloses a training method for an image recognition model, the method including:
training a first image recognition model on the triplet sample pairs generated by the sample generation method disclosed in this embodiment.
For example, the sample generation method disclosed in this embodiment can generate a large number of triplet sample pairs from a small number of image sets, and these triplet sample pairs are used to train the first image recognition model. The first image recognition model may be a face recognition model, which is not specifically limited in the embodiments of the present disclosure. Unlike the related art, which acquires a large number of face images to train a face recognition model, the training method provided in this embodiment can train the first image recognition model without a large number of images and obtain the target image recognition model.
In a possible implementation, before the first image recognition model is trained, the method further includes:
acquiring a vehicle-mounted image set from the vehicle-mounted image acquisition device, and training a second image recognition model on the vehicle-mounted image set with a normalized exponential (softmax) loss function, to obtain the first image recognition model.
It should be understood that the vehicle-mounted image set is captured by the vehicle-mounted image acquisition device, which may be a vehicle-mounted camera such as an in-vehicle monitoring camera or a panoramic camera; this is not specifically limited in the embodiments of the present disclosure. The second image recognition model can then be trained on the vehicle-mounted image set with the normalized exponential (softmax) loss function to obtain the first image recognition model, after which the first image recognition model is trained on the triplet sample pairs. Training the second image recognition model with the softmax loss function follows the training process of the related art and is not described again here.
Training the second image recognition model with the normalized exponential (softmax) loss function first, and then training the first image recognition model on triplet sample pairs, avoids the overfitting that occurs when the second image recognition model is trained with the softmax loss function alone and avoids poor generalization of the first image recognition model. Triplet sample pairs expand a small number of vehicle-mounted images into a large amount of triplet data, which avoids the situation in which training is impossible because the vehicle-mounted image set is too small and reduces the difficulty of sample acquisition.
The following operations are performed in each iteration: multiple image sets are extracted from the vehicle-mounted image set by the first image recognition model; the first image recognition model is trained on these image sets using the above training method, giving a new first image recognition model and a loss value; and the loss value is compared with a preset threshold. If the loss value is greater than the preset threshold, new image sets are extracted from the vehicle-mounted image set by the new first image recognition model, and the iteration continues until the loss value is less than the preset threshold. The first image recognition model from the last iteration is taken as the target image recognition model, which is used to recognize images in the vehicle. The preset threshold may represent the threshold at which the model satisfies the convergence condition.
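The iteration described above can be sketched as a loop that alternates feature extraction and training until the loss falls below the threshold. Everything here is a stand-in: the function names, the `max_rounds` safety cap (not part of the described method), and the toy "model" that is just a number whose loss halves each round.

```python
def train_until_converged(model, extract_image_sets, train_one_round,
                          loss_threshold, max_rounds=100):
    """Repeat: extract image sets with the current model, train on the triplets
    built from them, and stop once the loss is below `loss_threshold`.
    `max_rounds` is an added safety cap for this sketch."""
    loss = None
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        image_sets = extract_image_sets(model)          # features from current model
        model, loss = train_one_round(model, image_sets)  # new model and its loss
        if loss < loss_threshold:
            break
    return model, loss, rounds

# Toy stand-ins so the sketch runs; a real system would use a neural network.
def extract_image_sets(model):
    return [model]

def train_one_round(model, image_sets):
    new_model = model * 0.5
    return new_model, new_model  # pretend the loss equals the new model value

final_model, final_loss, rounds = train_until_converged(
    1.0, extract_image_sets, train_one_round, loss_threshold=0.1)
print(final_model, final_loss, rounds)
```

In the toy run the loss goes 0.5, 0.25, 0.125, 0.0625, so the loop stops after the fourth round and returns the model from that last iteration.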
Extracting multiple image sets from the vehicle-mounted image set by the second image recognition model, forming triplet sample pairs from them, and performing iterative optimization training improves both the accuracy and the convergence stability of the target image recognition model.
In a possible implementation, before the second image recognition model is trained with the normalized exponential (softmax) loss function to obtain the first image recognition model, the method further includes:
acquiring an RGB image set;
performing grayscale processing on the RGB image set to obtain a grayscale image set;
pre-training a third image recognition model on the grayscale image set to obtain the second image recognition model.
It should be understood that the RGB image set may be a large collection of miscellaneous image sets downloaded directly from the network. Grayscale processing can be applied to the RGB image set by a per-pixel calculation formula to obtain the grayscale image set.
The terms of that formula are the pixel value of the converted point, the pixel point of an image in the RGB image set, and the pre-conversion pixel values of that point in the R channel, the G channel, and the B channel.
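A per-pixel grayscale conversion can be sketched as a weighted sum of the three channel values. The weights below (0.299, 0.587, 0.114) are the commonly used ITU-R BT.601 luma coefficients and are an assumption; the coefficients actually used by the disclosure are not recoverable from this text. The function name `to_gray` and the nested-list image layout are also illustrative.

```python
def to_gray(rgb_image, weights=(0.299, 0.587, 0.114)):
    """Convert an image given as rows of (R, G, B) tuples to rows of gray values.
    The default weights are the BT.601 luma coefficients (an assumed choice)."""
    w_r, w_g, w_b = weights
    return [
        [round(w_r * r + w_g * g + w_b * b) for (r, g, b) in row]
        for row in rgb_image
    ]

# A 2x2 test image: red, green / blue, white.
rgb_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray_image = to_gray(rgb_image)
print(gray_image)
```

Passing a different `weights` tuple, for example equal thirds for a plain channel average, changes the conversion without touching the rest of the sketch.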
The third image recognition model is then pre-trained on the grayscale image set to obtain the second image recognition model; this pre-training follows the training process of the related art and is not described again here. The second image recognition model is subsequently fine-tuned on the vehicle-mounted image set to obtain the first image recognition model. Converting a large number of RGB images to grayscale and pre-training the third image recognition model on the resulting grayscale image set improves the convergence speed and generalization of the first image recognition model.
Referring to FIG. 5, FIG. 5 is a schematic flowchart of a training method for an image recognition model according to an exemplary embodiment of the present disclosure. As shown in FIG. 5, the training method includes the following steps.
S500: Train the third image recognition model on the RGB image set to obtain the second image recognition model.
S501: Train the second image recognition model with the normalized exponential (softmax) loss function to obtain the first image recognition model.
S502: Generate triplet sample pairs.
S503: Train the first image recognition model.
S504: Perform iterative optimization and return to step S502.
S505: Determine whether the loss value has reached the preset threshold; if so, perform step S506, otherwise perform step S504.
S506: Obtain the target image recognition model.
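The control flow of steps S502 to S506 can be sketched as a loop. This is only an illustration under stated assumptions: `generate_triplets` and `train_one_round` are hypothetical callables standing in for the sample generation and training steps, "reaching the preset threshold" is read as the loss falling to or below it, and `max_rounds` is an added safety limit that the disclosure does not mention.

```python
def train_with_triplets(model, generate_triplets, train_one_round,
                        loss_threshold, max_rounds=100):
    """Repeat S502-S505 until the loss reaches the threshold (S506)."""
    loss = float("inf")
    rounds = 0
    while rounds < max_rounds:
        triplets = generate_triplets(model)              # S502
        model, loss = train_one_round(model, triplets)   # S503
        rounds += 1
        if loss <= loss_threshold:                       # S505
            break                                        # S506: target model
        # S504: otherwise iterate and return to S502
    return model, loss, rounds
```

Regenerating the triplets inside the loop reflects the return to S502: the sample pairs are reselected with the model as updated by the previous round.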
Referring to FIG. 6, FIG. 6 is a schematic flowchart of selecting triplet sample pairs according to an exemplary embodiment of the present disclosure. As shown in FIG. 6, the method for selecting triplet sample pairs includes the following steps.
S600: Extract multiple image sets through the first image recognition model, then perform step S601 and step S603 respectively.
S601: Cluster the multiple images of each image set.
S602: Select positive sample pairs from different intra-class clusters, then perform step S606.
S603: Calculate the central feature of each image set.
S604: Cluster the central features of all image sets.
S605: Select negative sample pairs within the same inter-class cluster, then perform step S606.
S606: Form triplet sample pairs.
The specific implementation of each step above has been described in detail earlier and is not repeated here. It should also be understood that, for simplicity of description, the above embodiments are presented as a series of combined actions; those skilled in the art will appreciate that the present disclosure is not limited by the order of actions described. Those skilled in the art will further appreciate that the embodiments described above are preferred embodiments and that the steps involved are not necessarily all required by the present disclosure.
With the above technical solution, selecting images of lower similarity from different intra-class clusters to form positive sample pairs reduces the invalid positive sample pairs that arise when highly similar images from the same image set are chosen. Selecting different images within the same inter-class cluster to form negative sample pairs reduces the invalid negative sample pairs that arise when dissimilar images from different image sets are chosen. Generating triplet sample pairs from these positive sample pairs, negative sample pairs and the image sets to train the first image recognition model can therefore improve the convergence stability and recognition accuracy of that model, and can reduce the oscillation in training caused by selecting positive and negative sample pairs at random.
Based on the same concept, this embodiment further provides a sample generation apparatus 700. Referring to FIG. 7, FIG. 7 is a schematic diagram of a sample generation apparatus 700 according to an exemplary embodiment of the present disclosure. As shown in FIG. 7, the apparatus includes:
a first determining module 701, configured to determine a plurality of image sets, each image set including multiple images corresponding to the same identity information;
a first selection module 702, configured to, for each image set, cluster the multiple images of the image set to obtain a plurality of intra-class clusters, and select images from different intra-class clusters to form positive sample pairs, where the similarity between the features of different images in the same intra-class cluster is less than a first similarity threshold;
a second selection module 703, configured to cluster the plurality of image sets to obtain a plurality of inter-class clusters, and select images of different image sets within the same inter-class cluster to form negative sample pairs, where the similarity between the features of different image sets in the same inter-class cluster is less than a second similarity threshold;
a generating module 704, configured to generate triplet sample pairs according to the plurality of image sets, the positive sample pairs and the negative sample pairs.
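One way the generating module could combine the pairs into triplets is sketched below. The pairing rule is an assumption for illustration, since the text here does not spell it out: a positive pair (anchor, positive) is joined with the second image of every negative pair whose first image belongs to the anchor's image set. The names `build_triplets` and `set_of` are hypothetical.

```python
def build_triplets(positive_pairs, negative_pairs, set_of):
    """Form (anchor, positive, negative) triplets.

    positive_pairs: list of (image, image) from the same image set.
    negative_pairs: list of (image, image) from different image sets.
    set_of: mapping from image id to the id of its image set.
    """
    # Group candidate negatives by the image set of the pair's first image.
    negatives_by_set = {}
    for first, second in negative_pairs:
        negatives_by_set.setdefault(set_of[first], []).append(second)

    triplets = []
    for anchor, positive in positive_pairs:
        for negative in negatives_by_set.get(set_of[anchor], []):
            triplets.append((anchor, positive, negative))
    return triplets
```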
Optionally, the first selection module 702 includes:
a first calculation module, configured to calculate the similarity between every two images in the image set;
a first connection module, configured to treat the images in the image set as nodes and connect the nodes of any two images whose similarity is less than the first similarity threshold, to obtain a first undirected graph;
a second determining module, configured to determine the images corresponding to each subgraph of the first undirected graph as one intra-class cluster, to obtain the plurality of intra-class clusters.
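The three submodules above amount to finding the connected components of a thresholded graph. The following sketch assumes that the similarity score behaves like a distance, so that a smaller value means two images are more alike; this is one reading of "less than the first similarity threshold", and Euclidean distance between feature vectors is used purely as an illustrative score. The name `intra_class_clusters` is hypothetical.

```python
import numpy as np


def intra_class_clusters(features, threshold):
    """Group the images of one image set into intra-class clusters.

    features: (N, D) array, one feature vector per image.
    Returns a sorted list of clusters, each a list of image indices.
    """
    f = np.asarray(features, dtype=np.float64)
    n = f.shape[0]
    # Assumed score: pairwise Euclidean distance (smaller = more alike).
    score = np.linalg.norm(f[:, None, :] - f[None, :, :], axis=-1)

    parent = list(range(n))  # union-find over the image nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Connect two nodes when their score is below the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if score[i, j] < threshold:
                parent[find(i)] = find(j)

    # Each connected subgraph becomes one intra-class cluster.
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```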
Optionally, the first selection module 702 includes:
a first selection submodule, configured to randomly select a first image from a first intra-class cluster and randomly select a second image from a second intra-class cluster, the first intra-class cluster and the second intra-class cluster being different intra-class clusters among the plurality of intra-class clusters;
a first forming module, configured to form a positive sample pair from the first image and the second image.
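A minimal sketch of this random selection, assuming the intra-class clusters of one image set are given as lists of image identifiers. The function name `sample_positive_pair` and the choice to return `None` when fewer than two clusters exist are assumptions for illustration.

```python
import random


def sample_positive_pair(intra_clusters, rng):
    """Pick one image from each of two different intra-class clusters.

    intra_clusters: list of clusters (lists of image ids) of ONE image set.
    rng: a random.Random instance, passed in for reproducibility.
    """
    if len(intra_clusters) < 2:
        return None  # no two distinct clusters to draw from
    # Two distinct clusters, then one random image from each.
    first_cluster, second_cluster = rng.sample(intra_clusters, 2)
    return rng.choice(first_cluster), rng.choice(second_cluster)
```

Because both images come from the same image set but from different clusters, the pair shares an identity while avoiding the near-duplicate images that sit inside a single cluster.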
Optionally, the second selection module 703 includes:
a second calculation module, configured to calculate the similarity between the central features of every two image sets among the plurality of image sets, where the central feature of an image set is the image in that image set that satisfies a preset condition;
a second connection module, configured to treat the central features of the image sets as nodes and connect the nodes of any two image sets whose central features have a similarity less than the second similarity threshold, to obtain a second undirected graph;
a third determining module, configured to determine the image sets corresponding to each subgraph of the second undirected graph as one inter-class cluster, to obtain the plurality of inter-class clusters.
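The inter-class clustering can be sketched in the same spirit. Two assumptions are made here: the "preset condition" is taken to be the medoid (the image whose feature has the smallest total distance to the other features of its set), and the score is again treated as a distance, so that a smaller value means two image sets are closer. The names `center_feature` and `inter_class_clusters` are hypothetical.

```python
import numpy as np


def center_feature(features):
    """Assumed preset condition: return the medoid feature of an image set."""
    f = np.asarray(features, dtype=np.float64)
    dist = np.linalg.norm(f[:, None, :] - f[None, :, :], axis=-1)
    return f[int(np.argmin(dist.sum(axis=1)))]


def inter_class_clusters(image_sets, threshold):
    """image_sets: dict mapping set id -> (N, D) features of that set."""
    ids = sorted(image_sets)
    centers = {k: center_feature(image_sets[k]) for k in ids}

    # Second undirected graph: one node per image set centre.
    neighbours = {k: [] for k in ids}
    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            if np.linalg.norm(centers[a] - centers[b]) < threshold:
                neighbours[a].append(b)
                neighbours[b].append(a)

    # Each connected subgraph becomes one inter-class cluster.
    clusters, seen = [], set()
    for start in ids:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters
```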
Optionally, the second selection module 703 includes:
a second selection submodule, configured to randomly select a third image and a fourth image from the same inter-class cluster, the third image and the fourth image belonging to different image sets;
a second forming module, configured to form a negative sample pair from the third image and the fourth image.
Optionally, the second selection module 703 includes:
a third selection submodule, configured to, for each image set in the same inter-class cluster, randomly select a fifth image from that image set;
a fourth selection submodule, configured to randomly select a sixth image from each intra-class cluster of the remaining image sets, where the remaining image sets are the image sets in the same inter-class cluster other than that image set;
a third forming module, configured to form the negative sample pair from the fifth image and the sixth image.
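The fifth-image and sixth-image selection can be sketched as follows, assuming one inter-class cluster is represented as a mapping from image set identifier to that set's intra-class clusters. The function name `sample_negative_pairs` and this data layout are hypothetical.

```python
import random


def sample_negative_pairs(inter_cluster, rng):
    """Build negative pairs inside ONE inter-class cluster.

    inter_cluster: dict mapping set id -> list of intra-class clusters,
                   each intra-class cluster being a list of image ids.
    rng: a random.Random instance.
    """
    pairs = []
    for set_id, clusters in inter_cluster.items():
        # Fifth image: a random image of the current image set.
        own_images = [img for cluster in clusters for img in cluster]
        fifth = rng.choice(own_images)
        for other_id, other_clusters in inter_cluster.items():
            if other_id == set_id:
                continue  # only the remaining image sets
            # Sixth image: one per intra-class cluster of a remaining set.
            for cluster in other_clusters:
                pairs.append((fifth, rng.choice(cluster)))
    return pairs
```

Drawing one sixth image from every intra-class cluster of the remaining sets spreads the negatives over the different appearances of each other identity, rather than concentrating them on one cluster.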
Optionally, the first determining module is further configured to:
determine the image sets collected by a vehicle-mounted image acquisition device.
Based on the same concept, this embodiment further discloses a training apparatus for an image recognition model, the training apparatus including:
a first training module, configured to train the first image recognition model on the triplet sample pairs generated by the sample generation method disclosed in this embodiment.
Optionally, the training apparatus further includes:
a second training module, configured to obtain a vehicle-mounted image set from a vehicle-mounted image acquisition device and, based on the vehicle-mounted image set, train the second image recognition model with the normalized exponential (softmax) loss function to obtain the first image recognition model.
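For reference, the normalized exponential (softmax) loss used by the second training module is commonly computed as the cross-entropy of the softmax output. The sketch below is a generic, numerically stable NumPy version and not the training code of this disclosure; the function name `softmax_cross_entropy` is hypothetical.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch.

    logits: (B, C) array of class scores, one row per image.
    labels: length-B sequence of integer class (identity) indices.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row maximum so that exp() cannot overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-probability of the true class, averaged over the batch.
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```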
Optionally, the training apparatus further includes:
an image acquisition module, configured to obtain an RGB image set;
an image processing module, configured to perform grayscale processing on the RGB image set to obtain a grayscale image set;
a third training module, configured to pre-train the third image recognition model on the grayscale image set to obtain the second image recognition model.
Based on the same concept, this embodiment further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the sample generation method and the image recognition model training method provided in this embodiment.
Based on the same concept, this embodiment further provides a controller, including:
a memory storing a computer program;
a processor, configured to execute the computer program in the memory to implement the steps of the sample generation method and the image recognition model training method provided in this embodiment.
In another exemplary embodiment, a controller is also provided. The controller may be an integrated circuit (IC) or a chip, where the integrated circuit may be a single IC or a collection of multiple ICs, and the chip may include, but is not limited to, a GPU (Graphics Processing Unit), a CPU (Central Processing Unit), an FPGA (Field Programmable Gate Array), a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an SoC (System on Chip) and the like. The integrated circuit or chip may execute executable instructions (or code) to implement the sample generation method and the image recognition model training method described above. The executable instructions may be stored in the integrated circuit or chip, or may be obtained from another apparatus or device; for example, the integrated circuit or chip includes a processor, a memory, and an interface for communicating with other apparatuses. The executable instructions may be stored in the memory and, when executed by the processor, implement the sample generation method and the image recognition model training method described above; alternatively, the integrated circuit or chip may receive the executable instructions through the interface and pass them to the processor for execution, so as to implement those methods.
Based on the same concept, this embodiment further provides a vehicle including the controller provided in this embodiment.
FIG. 8 is a block diagram of a vehicle 800 according to an exemplary embodiment. For example, the vehicle 800 may be a hybrid vehicle, a non-hybrid vehicle, an electric vehicle, a fuel cell vehicle, or another type of vehicle. The vehicle 800 may be an autonomous vehicle or a semi-autonomous vehicle.
Referring to FIG. 8, the vehicle 800 may include various subsystems, for example an infotainment system 810, a perception system 820, a decision control system 830, a drive system 840 and a computing platform 850. The vehicle 800 may include more or fewer subsystems, and each subsystem may include multiple components. In addition, the subsystems and the components of the vehicle 800 may be interconnected in a wired or wireless manner.
In some embodiments, the infotainment system 810 may include a communication system, an entertainment system, a navigation system and the like.
The perception system 820 may include several kinds of sensors for sensing information about the environment around the vehicle 800. For example, the perception system 820 may include a global positioning system (which may be GPS, the BeiDou system, or another positioning system), an inertial measurement unit (IMU), a lidar, a millimeter-wave radar, an ultrasonic radar and a camera device.
The decision control system 830 may include a computing system, a vehicle controller, a steering system, a throttle and a braking system.
The drive system 840 may include components that provide powered motion for the vehicle 800. In one embodiment, the drive system 840 may include an engine, an energy source, a transmission system and wheels. The engine may be one of, or a combination of, an internal combustion engine, an electric motor and an air compression engine. The engine converts the energy provided by the energy source into mechanical energy.
Some or all functions of the vehicle 800 are controlled by the computing platform 850. The computing platform 850 may include at least one processor 851 and a memory 852, and the processor 851 may execute instructions 853 stored in the memory 852.
The processor 851 may be any conventional processor, such as a commercially available CPU. The processor may also include, for example, a graphics processing unit (GPU), a field programmable gate array (FPGA), a system on chip (SoC), an application specific integrated circuit (ASIC), or a combination thereof.
The memory 852 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disc.
In addition to the instructions 853, the memory 852 may store data such as road maps, route information, and the position, direction and speed of the vehicle. The data stored in the memory 852 may be used by the computing platform 850.
In the embodiments of the present disclosure, the processor 851 may execute the instructions 853 to complete all or some of the steps of the sample generation method and the image recognition model training method described above.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided; when the program instructions are executed by a processor, the steps of the sample generation method and the image recognition model training method described above are implemented. For example, the computer-readable storage medium may be the memory 852 including program instructions, and the program instructions may be executed by the processor 851 of the vehicle 800 to complete the sample generation method and the image recognition model training method described above.
In another exemplary embodiment, a computer program product is also provided. The computer program product contains a computer program executable by a programmable apparatus, and the computer program has code portions that, when executed by the programmable apparatus, perform the sample generation method and the image recognition model training method described above.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. The present disclosure is not, however, limited to the specific details of those embodiments. Within the scope of the technical concept of the present disclosure, various simple modifications may be made to its technical solutions, and all such simple modifications fall within the protection scope of the present disclosure.
It should further be noted that the specific technical features described in the above embodiments may be combined in any suitable manner provided there is no contradiction. To avoid unnecessary repetition, the present disclosure does not separately describe the possible combinations.
In addition, the various embodiments of the present disclosure may be combined arbitrarily; as long as a combination does not depart from the idea of the present disclosure, it should likewise be regarded as content disclosed by the present disclosure.
Claims (14)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410758335.3A CN118351399B (en) | 2024-06-13 | 2024-06-13 | Sample generation method, training method of image recognition model and corresponding devices |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN118351399A (en) | 2024-07-16 |
| CN118351399B (en) | 2025-05-13 |
Family
ID=91824876
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410758335.3A (Active, CN118351399B) | 2024-06-13 | 2024-06-13 | Sample generation method, training method of image recognition model and corresponding devices |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118351399B (en) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110163265A (en) * | 2019-04-30 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Data processing method, device and computer equipment |
| CN112329826A (en) * | 2020-10-24 | 2021-02-05 | 中国人民解放军空军军医大学 | Training method of image recognition model, image recognition method and device |
| WO2022033150A1 (en) * | 2020-08-11 | 2022-02-17 | Oppo广东移动通信有限公司 | Image recognition method, apparatus, electronic device, and storage medium |
| CN115130536A (en) * | 2022-04-08 | 2022-09-30 | 腾讯科技(深圳)有限公司 | Training method of feature extraction model, data processing method, device and equipment |
| WO2023143016A1 (en) * | 2022-01-26 | 2023-08-03 | 北京字跳网络技术有限公司 | Feature extraction model generation method and apparatus, and image feature extraction method and apparatus |
| US20230386244A1 (en) * | 2022-05-31 | 2023-11-30 | Ubtech Robotics Corp Ltd | Person re-identification method, computer-readable storage medium, and terminal device |
Also Published As
| Publication number | Publication date |
|---|---|
| CN118351399B (en) | 2025-05-13 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |