CN115761851A - Optimization Method of Cosine Optimal Loss Function Based on Global Information

Publication number: CN115761851A (granted as CN115761851B)
Application number: CN202211442334.5A
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Active (granted)
Inventors: 魏欣, 毛日强, 张远来, 万欢, 晏斐, 徐健锋
Applicant/Assignee: Nanchang University
Filing/priority date: 2022-11-16
Prior art keywords: class, cosine, loss function, optimal, range
Classification: Image Analysis

Abstract

The present invention proposes an optimization method for a cosine optimal loss function based on global information, comprising: S1. combining the advantages of existing loss functions with several important new properties and applying L2 weight normalization; S2. explicitly following the two objectives of minimizing intra-class variation and maximizing inter-class variation, relying on a new algorithm to learn the cosine similarity between class centers and class edges, and proposing two lightweight versions of the cosine optimal loss function; S3. integrating the two lightweight versions to create the standard version of the cosine optimal loss function. The invention addresses the problem that existing loss functions either do not apply weight and feature normalization or do not explicitly follow the objectives of minimizing intra-class variation and maximizing inter-class variation. Using global information as feedback for face recognition, the proposed cosine optimal loss function is more effective than existing loss functions and achieves state-of-the-art performance.

Description

Optimization Method of Cosine Optimal Loss Function Based on Global Information

Technical Field

The invention relates to the technical fields of artificial intelligence, machine learning, and face recognition, and in particular to an optimization method for a cosine optimal loss function based on global information that can be applied to face recognition.

Background

Convolutional neural networks (CNNs) have shown impressive performance in face recognition, and the loss function plays an important role in this process. To learn highly discriminative features, many different loss functions have been proposed in recent years. The best-performing loss functions for face recognition currently fall into two categories: those based on Euclidean distance and those based on cosine similarity.

The softmax loss can be expressed as:

$$L_S = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{W_{y_i}^{T}f_i+b_{y_i}}}{\sum_{j=1}^{P}e^{W_j^{T}f_i+b_j}}$$

where $N$ is the batch size, $P$ is the number of classes in the entire training set, $f_i \in \mathbb{R}^d$ is the feature vector of the $i$-th sample, which belongs to class $y_i$, $W_j \in \mathbb{R}^d$ is the $j$-th column of the weight matrix $W$ of the last fully connected layer, and $b_j$ is the bias term of the $j$-th class. Typical Euclidean-distance-based losses include center loss, margin loss, and range loss. All of them add extra penalties for joint supervision with the softmax loss, and all are designed around two objectives: minimizing intra-class variation and maximizing inter-class variation. Both objectives contribute to performance gains. Cosine-similarity-based losses include L-Softmax, A-Softmax, and AM-Softmax; they are derived from the softmax loss by adding extra margin constraints. L2 weight normalization improves performance, although the improvement is quite limited. Feature normalization brings better performance and a cleaner geometric interpretation.

The loss functions proposed so far either do not apply weight and feature normalization (e.g., contrastive loss, triplet loss, center loss, range loss, and margin loss) or do not explicitly follow the two objectives that improve discriminative ability (e.g., L-Softmax, A-Softmax, AM-Softmax, and ArcFace).

Deep neural networks are currently trained by iteratively updating network parameters based on feedback from each mini-batch. This is a practical compromise imposed by two constraints: the computing power and the memory size of GPUs, TPUs, or similar processing units. Without the computing-power constraint, a deep neural network could be trained with the entire training set as the source of feedback, directly optimizing the sample distribution of the whole training set. Without the memory constraint, it could load the entire training set into memory instead of processing it mini-batch by mini-batch. Perhaps precisely because of these two constraints, no existing loss uses the entire dataset as a source of feedback to optimize CNNs for face recognition.

We propose a new loss function: the cosine optimal loss function based on global information. It possesses all four desirable properties, namely optimizing intra-class variation, optimizing inter-class variation, weight normalization, and feature normalization. Moreover, it is guided by the distribution information of the entire training set. Compared with previously proposed loss functions, the cosine optimal loss function is more effective and achieves more advanced performance.

Summary of the Invention

(1) Technical Problem to Be Solved

The loss function plays an important role in convolutional neural networks (CNNs). However, existing loss functions either do not apply weight and feature normalization, or do not explicitly follow the two objectives that improve discriminative ability: minimizing intra-class variation and maximizing inter-class variation. Moreover, all of these functions consider only the feedback from each mini-batch, not the distribution information of the whole training set.

(2) Technical Solution

The cosine optimal loss function based on global information, applicable to face recognition, is constructed through the following steps:

a) The softmax loss is the most commonly used loss function in deep learning and can be expressed as:

$$L_S = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{W_{y_i}^{T}f_i+b_{y_i}}}{\sum_{j=1}^{P}e^{W_j^{T}f_i+b_j}}$$

where $N$ is the batch size, $P$ is the number of classes in the entire training set, $f_i \in \mathbb{R}^d$ is the feature vector of the $i$-th sample, which belongs to class $y_i$, $W_j \in \mathbb{R}^d$ is the $j$-th column of the weight matrix $W$ of the last fully connected layer, and $b_j$ is the bias term of the $j$-th class.

Fix $b_j = 0$ and $\|W_j\| = 1$ in the softmax loss to apply L2 weight normalization. At the same time, apply L2 normalization to the feature vector $f_i$ and rescale $\|f_i\|$ to $S$, then combine with the AM-Softmax loss. The resulting total loss is $L = L_{AM} + \lambda L_G$, where $S$ is a specified constant, $L_G$ is the proposed cosine optimal loss function, $\lambda$ is a hyperparameter that adjusts the influence of the two losses, and $L_{AM}$ is the AM-Softmax loss.

b) To minimize intra-class variation, a first lightweight version of the cosine optimal loss function (denoted here $L_{G1}$) is proposed. Its defining formula appears only as an image in the source; it is built on the per-class cosine range

$$R(j) = \cos(c_j, e_j)$$

where $P$ is the number of classes in the entire training set, $c_j$ is the center of class $j$, and $e_j$ is the edge of class $j$ (i.e., the sample of class $j$ farthest from its center). $R(j)$ is the cosine range of class $j$, that is, the cosine similarity between the class center and the class edge. We use $W_j$ as an approximate substitute for $c_j$ and propose an algorithm that recursively updates the range of each class.

c) In the algorithm mentioned in step b), $R(j)$ is first initialized to 1 and then updated iteratively for $j = 1, 2, \ldots, P$. The update equations appear only as images in the source; in them, $\varphi(y_i, j) = 1$ when $y_i = j$ and $\varphi(y_i, j) = 0$ otherwise, and $\beta$ is the shrinkage rate, which adjusts how quickly the learned class range shrinks.

The basic idea of the learning algorithm proposed in step b) involves two cases: ① if the cosine similarity between an input sample and its corresponding class center is smaller than the recorded class range, the class range is replaced directly by that cosine similarity; ② conversely, if the cosine similarity between an input sample and its corresponding class center is not smaller than the recorded class range, the class range is shrunk by scaling that cosine similarity with $\beta$. Case ① keeps the learned class range up to date; as training progresses, the true class range becomes smaller and smaller. Case ② helps the learned class range converge to the true value.

d) To maximize inter-class variation, a second lightweight version of the cosine optimal loss function (denoted here $L_{G2}$) is proposed. Its defining formula appears only as an image in the source; in it, $\sum\mathrm{Top}(A, K)$ denotes the sum of the $K$ largest elements of set $A$, and $W_a$ and $W_b$ are the approximate substitutes for the class centers of any two different classes. The purpose of $L_{G2}$ is to find the $K$ pairs of nearest class centers in the entire training set and compute the sum of their distances. Compared with non-adjacent class centers, the classes of adjacent centers are far more likely to have small margins or to overlap; if all adjacent classes are properly separated, non-adjacent classes are separated by an even larger margin. It is therefore unnecessary to consider all center pairs: the most effective approach is to optimize the distances of all adjacent centers. Here the value of $K$ is set to $P$, the number of classes, because when all class centers are arranged in a circle on the hypersphere, the minimum number of adjacent center pairs is $P$.

e) The two lightweight versions proposed in steps b) and d) are integrated to create the standard version of the cosine optimal loss function $L_G$; the combined formula appears only as an image in the source.

(3) Beneficial Effects

The cosine optimal loss function of the present invention combines the advantages of the best loss functions proposed for face recognition in recent years, and makes the first attempt to use global information as feedback for face recognition. It employs a new algorithm to learn the cosine similarity between class centers and class edges. Extensive experiments on the LFW, SLLFW, and YTF datasets demonstrate its effectiveness and show that the cosine optimal loss function achieves state-of-the-art performance.

Detailed Description

The present invention is further described below.

The design method of the cosine optimal loss function based on global information, applicable to face recognition, comprises the following steps:

a) Fix $b_j = 0$ and $\|W_j\| = 1$ in the softmax loss to apply L2 weight normalization. At the same time, apply L2 normalization to the feature vector $f_i$ and rescale $\|f_i\|$ to $S$, then combine with the AM-Softmax loss. The resulting total loss is $L = L_{AM} + \lambda L_G$, where $S$ is a specified constant, $L_G$ is the proposed cosine optimal loss function, $\lambda$ is a hyperparameter that adjusts the influence of the two losses, and $L_{AM}$ is the AM-Softmax loss.

The softmax loss is the most commonly used loss function in deep learning and can be expressed as:

$$L_S = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{W_{y_i}^{T}f_i+b_{y_i}}}{\sum_{j=1}^{P}e^{W_j^{T}f_i+b_j}}$$

where $N$ is the batch size, $P$ is the number of classes in the entire training set, $f_i \in \mathbb{R}^d$ is the feature vector of the $i$-th sample, which belongs to class $y_i$, $W_j \in \mathbb{R}^d$ is the $j$-th column of the weight matrix $W$ of the last fully connected layer, and $b_j$ is the bias term of the $j$-th class. A code sketch of this normalization scheme is given below.
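For concreteness, the following is a minimal PyTorch sketch of one way to realize step a). It is an illustration, not the patent's implementation: the class and function names are ours, and the scale s = 30 and margin m = 0.35 are common AM-Softmax defaults assumed here, not values fixed by the source.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NormalizedLinear(nn.Module):
        """Last fully connected layer with b_j = 0 and ||W_j|| = 1; with the
        feature f_i also L2-normalized, the logit for class j is cos(W_j, f_i)."""
        def __init__(self, feat_dim, num_classes):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))

        def forward(self, features):
            w = F.normalize(self.weight, dim=1)   # ||W_j|| = 1 (L2 weight norm)
            f = F.normalize(features, dim=1)      # ||f_i|| = 1 (L2 feature norm)
            return f @ w.t()                      # cosine logits, rescaled by s later

    def am_softmax_loss(cos_logits, labels, s=30.0, m=0.35):
        """AM-Softmax: subtract the margin m from the target-class cosine and
        rescale all logits by s (the values of s and m here are illustrative)."""
        target = F.one_hot(labels, num_classes=cos_logits.size(1)).bool()
        logits = s * torch.where(target, cos_logits - m, cos_logits)
        return F.cross_entropy(logits, labels)

    # Total loss, as in the text: L = L_AM + lambda * L_G
    # (L_G is built up in the sketches for the later steps).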

b) To minimize intra-class variation, a first lightweight version of the cosine optimal loss function (denoted here $L_{G1}$) is proposed. Its defining formula appears only as an image in the source; it is built on the per-class cosine range

$$R(j) = \cos(c_j, e_j)$$

where $P$ is the number of classes in the entire training set, $c_j$ is the center of class $j$, and $e_j$ is the edge of class $j$ (i.e., the sample of class $j$ farthest from its center). $R(j)$ is the cosine range of class $j$, that is, the cosine similarity between the class center and the class edge. We use $W_j$ as an approximate substitute for $c_j$ and propose an algorithm that recursively updates the range of each class.

c) In the algorithm mentioned in step b), $R(j)$ is first initialized to 1 and then updated iteratively for $j = 1, 2, \ldots, P$. The update equations appear only as images in the source; in them, $\varphi(y_i, j) = 1$ when $y_i = j$ and $\varphi(y_i, j) = 0$ otherwise, and $\beta$ is the shrinkage rate, which adjusts how quickly the learned class range shrinks.

The basic idea of the learning algorithm proposed in step b) involves two cases: ① if the cosine similarity between an input sample and its corresponding class center is smaller than the recorded class range, the class range is replaced directly by that cosine similarity; ② conversely, if the cosine similarity between an input sample and its corresponding class center is not smaller than the recorded class range, the class range is shrunk by scaling that cosine similarity with $\beta$. Case ① keeps the learned class range up to date; as training progresses, the true class range becomes smaller and smaller. Case ② helps the learned class range converge to the true value. A sketch of this update rule follows.
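Because the exact update equations are given only as images in the source, the sketch below implements the two cases exactly as described above. The blending rule used for case ②, $R(j) \leftarrow (1-\beta)R(j) + \beta\cos(W_j, f_i)$, is one possible reading of "scaling the cosine similarity with β" and is an assumption, as is the illustrative value of β.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def update_class_ranges(R, features, weights, labels, beta=0.01):
        """Recursively update the learned cosine range R(j) of each class.

        R        : tensor of shape (P,), initialized to all ones
        features : (N, d) mini-batch features f_i
        weights  : (P, d) rows W_j of the last FC layer (proxy for the
                   class centers c_j, as in the text)
        labels   : (N,) class indices y_i
        beta     : shrinkage rate (illustrative value, not from the patent)
        """
        cos = F.normalize(features, dim=1) @ F.normalize(weights, dim=1).t()
        for i, j in enumerate(labels.tolist()):
            c = cos[i, j].item()                  # cos(W_{y_i}, f_i)
            if c < R[j]:
                # Case 1: the sample falls outside the recorded range, so the
                # range is replaced directly by its cosine similarity.
                R[j] = c
            else:
                # Case 2 (assumed reading of "scaling with beta"): blend the
                # recorded range toward the observed cosine at rate beta.
                R[j] = (1 - beta) * R[j] + beta * c
        return R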

d) To maximize inter-class variation, a second lightweight version of the cosine optimal loss function (denoted here $L_{G2}$) is proposed. Its defining formula appears only as an image in the source; in it, $\sum\mathrm{Top}(A, K)$ denotes the sum of the $K$ largest elements of set $A$, and $W_a$ and $W_b$ are the approximate substitutes for the class centers of any two different classes. The purpose of $L_{G2}$ is to find the $K$ pairs of nearest class centers in the entire training set and compute the sum of their distances. Compared with non-adjacent class centers, the classes of adjacent centers are far more likely to have small margins or to overlap; if all adjacent classes are properly separated, non-adjacent classes are separated by an even larger margin. It is therefore unnecessary to consider all center pairs: the most effective approach is to optimize the distances of all adjacent centers. Here the value of $K$ is set to $P$, the number of classes, because when all class centers are arranged in a circle on the hypersphere, the minimum number of adjacent center pairs is $P$. A sketch of this term follows.

e) The two lightweight versions proposed in steps b) and d) are integrated to create the standard version of the cosine optimal loss function $L_G$; the combined formula appears only as an image in the source. A sketch of one possible combination follows.
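Since the combined formula appears only as an image, the sketch below assumes a simple additive combination. The intra-class term shown, a hinge pulling each sample's cosine to its class center up toward the tracked range $R(y_i)$, is our differentiable reading of $L_{G1}$, not the patent's exact formula; inter_class_loss is the sketch from step d), and the 1 : w_inter weighting is likewise assumed.

    import torch
    import torch.nn.functional as F

    def cosine_optimal_loss(R, cos_logits, labels, weights, w_inter=1.0):
        """A sketch of the standard version L_G as an additive combination.
        Intra term: hinge that penalizes samples whose cosine to their class
        center falls below the tracked global range R(y_i) (assumed reading
        of L_G1). Inter term: inter_class_loss() from the step d) sketch."""
        n = labels.size(0)
        cos_to_center = cos_logits[torch.arange(n), labels]  # cos(W_{y_i}, f_i)
        intra = F.relu(R[labels] - cos_to_center).mean()
        inter = inter_class_loss(weights)
        return intra + w_inter * inter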

The cosine optimal loss function combines the advantages of the best loss functions proposed for face recognition in recent years, and makes the first attempt to use global information as feedback for face recognition. It employs a new algorithm to learn the cosine similarity between class centers and class edges. Extensive experiments on the LFW, SLLFW, and YTF datasets demonstrate its effectiveness and show that the cosine optimal loss function achieves state-of-the-art performance.
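To show how the pieces fit together, here is a hypothetical training step on random data, relying on the functions defined in the sketches above being in scope. The toy backbone, the tensor sizes, the learning rate, and λ are all illustrative assumptions, not values from the patent.

    import torch
    import torch.nn as nn
    import torch.optim as optim

    torch.manual_seed(0)
    P, d, N, lam = 10, 64, 32, 0.1                 # classes, feat dim, batch, lambda
    backbone = nn.Sequential(nn.Linear(128, d), nn.ReLU(), nn.Linear(d, d))
    head = NormalizedLinear(d, P)                  # from the step a) sketch
    params = list(backbone.parameters()) + list(head.parameters())
    optimizer = optim.SGD(params, lr=0.1)
    R = torch.ones(P)                              # learned class ranges, init to 1

    for step in range(100):
        x = torch.randn(N, 128)                    # stand-in for face images
        labels = torch.randint(0, P, (N,))
        feats = backbone(x)
        cos = head(feats)                          # cosine logits cos(W_j, f_i)
        R = update_class_ranges(R, feats, head.weight, labels, beta=0.01)
        loss = am_softmax_loss(cos, labels) \
             + lam * cosine_optimal_loss(R, cos, labels, head.weight)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()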

Claims (6)

1. An optimization method of a cosine optimal loss function based on global information, characterized by comprising the following steps:
S1. combining the advantages of existing loss functions with several new properties, and applying L2 weight normalization to obtain a total loss function;
S2. explicitly following the two objectives of minimizing intra-class variation and maximizing inter-class variation, learning the cosine similarity between class centers and class edges by means of a new algorithm, and providing two lightweight versions of the cosine optimal loss function;
S3. integrating the two lightweight versions to create the standard version of the cosine optimal loss function.
2. The method for optimizing a cosine optimal loss function based on global information according to claim 1, wherein step S1 specifically comprises:
expressing the softmax loss function as:

$$L_S = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{W_{y_i}^{T}f_i+b_{y_i}}}{\sum_{j=1}^{P}e^{W_j^{T}f_i+b_j}}$$

where $N$ denotes the batch size, $P$ denotes the number of classes in the entire training set, $f_i \in \mathbb{R}^d$ is the feature vector of the $i$-th sample, which belongs to class $y_i$, $W_j \in \mathbb{R}^d$ is the $j$-th column of the weight matrix $W$ of the last fully connected layer, and $b_j$ is the bias term of the $j$-th class;
fixing $b_j = 0$ and $\|W_j\| = 1$ in the softmax loss to apply L2 weight normalization; simultaneously applying L2 normalization to the feature vector $f_i$ and rescaling $\|f_i\|$ to $S$, $S$ being a specified constant, and combining with the AM-Softmax loss to obtain a total loss $L = L_{AM} + \lambda L_G$;
in the formula, $L_G$ is the proposed cosine optimal loss function, $\lambda$ is the hyperparameter for adjusting the influence of the two losses, and $L_{AM}$ is the AM-Softmax loss.
3. The method for optimizing a cosine optimal loss function based on global information according to claim 1, wherein step S2 comprises:
S21. to minimize intra-class variation, providing a first lightweight version of the cosine optimal loss function (its formula appears only as an image in the source), built on

$$R(j) = \cos(c_j, e_j)$$

where $P$ is the number of classes in the entire training set, $c_j$ is the center of class $j$, $e_j$ denotes the edge of class $j$, and $R(j)$ denotes the cosine range of class $j$, namely the cosine similarity between the class center and the edge of class $j$; using $W_j$ as an approximate substitute for $c_j$, and employing a learning algorithm to recursively update the range of each class;
S22. to maximize inter-class variation, providing a second lightweight version of the cosine optimal loss function (its formula appears only as an image in the source), in which $\sum\mathrm{Top}(A, K)$ denotes the sum of the $K$ largest elements in set $A$, and $W_a$ and $W_b$ are approximate substitutes for the class centers of any two different classes;
the aim of this second version is to find the $K$ pairs of nearest class centers in the entire training set and compute the sum of their distances; the distances of all adjacent centers are optimized, with the value of $K$ set to $P$, where $P$ is the number of classes, since the minimum number of adjacent center pairs is $P$ when all class centers are arranged in a circle on the hypersphere.
4. The method for optimizing a cosine optimal loss function based on global information according to claim 3, wherein step S3 specifically comprises: integrating the two lightweight versions provided in step S2 to create the standard version of the cosine optimal loss function (the combined formula appears only as an image in the source).
5. The method for optimizing a cosine optimal loss function based on global information according to claim 3, wherein the learning algorithm of step S21 is: $R(j)$ is initialized to 1 and then updated iteratively (the update equations appear only as images in the source) for $j = 1, 2, \ldots, P$, where $\varphi(y_i, j) = 1$ when $y_i = j$ and $\varphi(y_i, j) = 0$ otherwise, and $\beta$ is a shrinkage rate for adjusting the shrinkage speed of the learned class range.
6. The method for optimizing a cosine optimal loss function based on global information according to claim 3, wherein the learning algorithm of step S21 involves two cases:
(1) if the cosine similarity between an input sample and its corresponding class center is smaller than the recorded class range, the class range is directly replaced by that cosine similarity;
(2) conversely, if the cosine similarity between an input sample and its corresponding class center is not smaller than the recorded class range, the class range is shrunk by scaling that cosine similarity by $\beta$;
case (1) keeps the learned class range up to date, and the true class range becomes smaller and smaller as training progresses; case (2) helps the learned class range converge to the true value.
CN202211442334.5A (filed 2022-11-16): Optimization method of cosine optimal loss function based on global information. Status: Active; granted as CN115761851B.

Priority Applications (1)

Application Number: CN202211442334.5A; Priority/Filing Date: 2022-11-16; Title: Optimization method of cosine optimal loss function based on global information

Publications (2)

CN115761851A: published 2023-03-07
CN115761851B (grant): published 2025-07-11

Family

Family ID: 85372857

Family Applications (1)

Application Number: CN202211442334.5A (Active); Filing Date: 2022-11-16; Title: Optimization method of cosine optimal loss function based on global information

Country Status (1)

Country: CN; Link: CN115761851B


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190279091A1 (en) * 2018-03-12 2019-09-12 Carnegie Mellon University Discriminative Cosine Embedding in Machine Learning
CN110598603A (en) * 2019-09-02 2019-12-20 深圳力维智联技术有限公司 Face recognition model acquisition method, device, equipment and medium
CN113052261A (en) * 2021-04-22 2021-06-29 东南大学 Image classification loss function design method based on cosine space optimization
CN114627533A (en) * 2022-03-10 2022-06-14 厦门熵基科技有限公司 Face recognition method, face recognition device, face recognition equipment and computer-readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐健锋; 何宇凡; 刘斓: "三支决策代价目标函数的关系及推理研究" (Research on the relationship and reasoning of three-way decision cost objective functions), 计算机科学 (Computer Science), 9 July 2018 *


Similar Documents

Publication Publication Date Title
Sucholutsky et al. Soft-label dataset distillation and text dataset distillation
Wang et al. Unsupervised deep clustering via adaptive GMM modeling and optimization
Xu et al. Weighted multi-view clustering with feature selection
CN110032646A (en) The cross-domain texts sensibility classification method of combination learning is adapted to based on multi-source field
CN114022693A (en) A method for clustering single-cell RNA-seq data based on dual self-supervision
Shukla et al. Semi-supervised clustering with neural networks
Liu et al. A comparable study on model averaging, ensembling and reranking in nmt
Oskouei et al. RDEIC-LFW-DSS: ResNet-based deep embedded image clustering using local feature weighting and dynamic sample selection mechanism
CN114444600A (en) A Small-Sample Image Classification Method Based on Memory Augmented Prototype Network
Song et al. Real-world cross-modal retrieval via sequential learning
Cao et al. CircSSNN: circRNA-binding site prediction via sequence self-attention neural networks with pre-normalization
Zhang et al. Effectiveness of scaled exponentially-regularized linear units (SERLUs)
Shi et al. Federated learning with ℓ1 regularization
Wu et al. Robust deep fuzzy K-means clustering for image data
Shi et al. Efficient federated learning with enhanced privacy via lottery ticket pruning in edge computing
Teng et al. Cluster ensemble framework based on the group method of data handling
CN111507263B (en) Face multi-attribute recognition method based on multi-source data
Li et al. Learning from crowds with robust logistic regression
Zhang et al. Transformer-based dynamic fusion clustering network
CN115761851A (en) Optimization Method of Cosine Optimal Loss Function Based on Global Information
Wu et al. Exponential discriminative metric embedding in deep learning
Yang et al. Modulation recognition based on incremental deep learning
Min et al. Bidirectional domain transfer knowledge distillation for catastrophic forgetting in federated learning with heterogeneous data
CN116662834B (en) Fuzzy hyperplane clustering method and device based on sample style characteristics
Nguyen et al. Model fusion of heterogeneous neural networks via cross-layer alignment

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant