CN115641630A

CN115641630A - A small-sample multi-pose face recognition method based on hypergraph and multi-task collaboration

Info

Publication number: CN115641630A
Application number: CN202211196489.5A
Authority: CN
Inventors: 樊肖锦; 祝烈煌
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2022-09-29
Filing date: 2022-09-29
Publication date: 2023-01-24

Abstract

The invention relates to a small-sample multi-pose face recognition method based on hypergraph and multi-task collaboration, and belongs to the technical field of artificial-intelligence face recognition. The invention uses a hypergraph and non-negative matrix factorization to obtain an image close to a frontal image, and designs a multi-pose face recognition framework based on hypergraph pose-deflection removal. The framework first separates out the images without pose deflection. On this basis, a feature coding method based on improved support vector data description is provided; it extracts features from the images without pose deflection and is jointly optimized with a dictionary-learning-based classifier, so that features are extracted and classified together. The feature coding method uses improved support vector data description and triangular coding to make the extracted features more discriminative. In addition, an effective optimization model for feature extraction and feature classification is established, which makes it easier to obtain a solution closer to the global optimum and improves recognition performance.

Description

Small sample multi-pose face recognition method based on hypergraph and multi-task cooperation

Technical Field

The invention relates to a face recognition method, in particular to a small sample multi-pose face recognition method based on hypergraph and multi-task cooperation, and belongs to the technical field of artificial intelligence face recognition.

Background

As a new identity recognition technology, face recognition has developed rapidly with the wave of artificial intelligence and has become a popular research field in recent years. Multi-pose face recognition mainly studies the influence of changes in face angle on recognition performance. Reducing the influence of pose on face recognition and improving multi-pose recognition performance would help apply face recognition technology to more identity recognition scenarios.

Existing methods based on pose correction mainly convert multi-pose faces into frontal faces. Currently, there are two types of methods for fitting a frontal face. One is the affine transformation method, which transforms a multi-pose face into a frontal face; the transformation is simple, but facial feature information useful for classification is lost, so the recognition performance is not ideal. The other is the virtual fusion method, which fuses the information of several multi-pose face images to construct a frontal face image, performs pose correction to a certain extent, and incorporates the classification error into the objective function, thereby achieving face classification.

Pose-correction-based methods achieve good results in face recognition and, in some scenarios, handle the recognition of multi-pose thumbnail face images reasonably well. In practical applications, however, multi-pose images of the same class pose a great challenge to these methods, because pose-deflected images make it difficult to reveal the relationships among multiple samples.

In summary, existing small-sample multi-pose face recognition methods have the following defects and limitations. First, image features free of pose deflection are not easily extracted. Second, the extracted features are not highly discriminative. As a result, conventional methods cannot reveal the relationships among samples of the same class, and it is difficult for them to learn the dictionary efficiently.

Disclosure of Invention

The invention aims to overcome the defects and shortcomings of the prior art, and provides a small-sample multi-pose face recognition method based on hypergraph and multi-task collaboration that effectively addresses the technical problem of multi-pose small-sample face recognition.

Inspired by non-negative matrix factorization, each image affected by pose change can be decomposed by non-negative matrix factorization: one of the resulting matrices is taken as the image without pose deflection, and the other as the pose change matrix. An image without pose deflection is finally obtained through multiple iterations of the decomposition. Compared with other single-resolution image recognition methods, such methods can better handle the recognition of multi-resolution images. A hypergraph can represent complex relationships among samples. Inspired by the hypergraph, each image is regarded as a node in a hypergraph, non-negative matrix factorization is performed on the hypergraph formed by a plurality of images, and images without pose deflection are extracted with better performance.

On the basis, the invention provides a small sample multi-pose face recognition method based on hypergraph and multi-task cooperation for the first time.

The innovations of the invention are as follows: a hypergraph-based pose-deflection removal and multi-task collaborative optimization method is adopted, and an image close to a frontal image is obtained using a hypergraph and non-negative matrix factorization. On this basis, a feature coding method based on improved support vector data description is provided and is jointly optimized with a dictionary-learning-based classifier for feature extraction and feature classification.

(1) A multi-pose face recognition framework based on hypergraph pose-deflection removal is provided. The framework first separates out the images without pose deflection, then extracts their features using the proposed feature coding method based on improved support vector data description, and classifies the extracted features.

(2) A feature coding method based on improved support vector data description is provided. The feature coding method uses improved support vector data description and triangular coding to make the extracted features more discriminative.

(3) An effective optimization model for feature extraction and feature classification is established, so that a solution closer to the global optimum is easier to obtain and the recognition performance of the algorithm is improved.

Advantageous effects

Compared with the prior art, the invention has the following advantages:

1. The invention introduces the hypergraph and embeds it into the non-negative matrix factorization process, which reveals the relationships among samples of the same class more comprehensively.

2. The invention designs a feature coding method based on improved support vector data description, which improves the discriminability of the extracted features.

Drawings

Fig. 1 is a schematic diagram of hypergraph-based pose-deflection removal and multi-task collaborative optimization in the invention.

Fig. 2 is a flow chart of the face recognition process in the invention.

Fig. 3 is a schematic diagram of a process of extracting a front image.

Fig. 4 is a schematic diagram of encoding.

Detailed Description

The invention is further described below with reference to examples in order to make the technical content clearer and easier to understand. The described embodiments are only some, not all, of the embodiments of the invention. All embodiments based on the embodiments of the invention fall within its scope of protection.

A small-sample multi-pose face recognition method based on hypergraph and multi-task collaboration comprises image frontalization, feature extraction, feature classification and joint optimization, and specifically comprises the following steps:

Step 1: Perform image frontalization.

As shown in Fig. 1, feature discrimination is enhanced based on non-negative matrix factorization and hypergraph embedding. This comprises the following steps.

Step 1.1: non-negative matrix factorization is performed.

Any non-negative matrix X₀ can be decomposed into the product of two non-negative matrices, a basis matrix and P^T, as follows:

where the first factor is the basis matrix and P is the second factor; F denotes the Frobenius norm and T denotes matrix transpose. In the dimension notation, m denotes the dimension of a sample in X₀, N denotes the number of samples of the source data, and r denotes the number of samples of the target data. The abbreviation s.t. introduces the constraints to be satisfied.

Then, the basis matrix and P^T are updated:

where X denotes the source data, P denotes the transformation matrix, and the subscripts ij and jk denote the ij-th and jk-th iterations, respectively.
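The factorization in Step 1.1 can be illustrated with a minimal sketch. It assumes the standard Frobenius-norm objective and the classical multiplicative update rules for non-negative matrix factorization, which may differ in detail from the update formulas of the invention. The name `U` for the basis matrix and the functions `nmf_multiplicative` and `frob_error` are hypothetical and chosen only for illustration.

```python
import numpy as np


def frob_error(X0, U, P):
    """Squared Frobenius reconstruction error ||X0 - U P^T||_F^2."""
    return float(np.linalg.norm(X0 - U @ P.T, "fro") ** 2)


def nmf_multiplicative(X0, r, n_iter=200, eps=1e-10, seed=0):
    """Approximate a non-negative m x N matrix X0 by U @ P.T.

    U (m x r) plays the role of the basis matrix and P (N x r) the second
    factor; both names are assumptions made for this sketch.
    """
    rng = np.random.default_rng(seed)
    m, N = X0.shape
    U = rng.random((m, r)) + eps
    P = rng.random((N, r)) + eps
    for _ in range(n_iter):
        # Classical multiplicative updates: the factors stay non-negative
        # because they are only ever multiplied by non-negative ratios.
        U *= (X0 @ P) / (U @ (P.T @ P) + eps)
        P *= (X0.T @ U) / (P @ (U.T @ U) + eps)
    return U, P
```

Running more iterations with the same seed should not increase the reconstruction error, which is the property that makes this scheme a convenient baseline.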

Step 1.2: Perform hypergraph embedding.

A hypergraph G is an ordered pair G = (V, e). V is a non-empty set whose elements are nodes (vertices), called the vertex set. e is a family of non-empty subsets of V whose elements are called hyperedges.

Unlike an ordinary graph, each edge of a hypergraph is not limited to connecting two vertices and can connect more vertices.

Given a hypergraph G = (V, e), V = {v₁, v₂, …, v_k} is a finite set of data points, where each v_i (i = 1, 2, …, k) is a vertex. Each e_j is a hyperedge, and the hyperedge set e satisfies the following condition:

e₁ ∪ e₂ ∪ e₃ ∪ … ∪ e_t = V

where t denotes the number of hyperedges.

Each hyperedge e_j has a corresponding weight w_j. The vertices and hyperedges form an incidence matrix.

The elements of the incidence matrix are calculated using the following formula:

The degree d_i of each vertex in the hypergraph G is defined as the sum of the weights of the hyperedges to which it belongs. The hyperedge degree ρ_j is defined as the number of vertices belonging to the hyperedge. They are calculated as follows:

where w_j denotes the hyperedge weight.

Let D_v be the diagonal matrix whose main diagonal elements are the vertex degrees.

Let D_e and W be the diagonal matrices generated by ρ_j and w_j, respectively, j = 1, 2, …, t.

Then, the regularized hypergraph Laplacian matrix L^H is calculated by the following equation:

where D_e⁻¹ denotes the inverse of the diagonal matrix D_e.
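The quantities defined in Step 1.2 can be computed as in the following sketch. It assumes a binary incidence matrix (entry 1 when a vertex belongs to a hyperedge, 0 otherwise) and the commonly used unnormalized form L^H = D_v - H W D_e⁻¹ H^T; the exact Laplacian formula of the invention may differ. The symbol `H` for the incidence matrix and the function name `hypergraph_laplacian` are hypothetical.

```python
import numpy as np


def hypergraph_laplacian(H, w):
    """Build a hypergraph Laplacian from an incidence matrix and edge weights.

    H is a k x t binary incidence matrix (assumed form): H[i, j] = 1 when
    vertex i belongs to hyperedge j. w holds the t hyperedge weights.
    Returns (L, D_v, S) with L = D_v - S and S = H W D_e^{-1} H^T.
    """
    H = np.asarray(H, dtype=float)
    w = np.asarray(w, dtype=float)
    d_v = H @ w            # vertex degree: sum of weights of incident hyperedges
    d_e = H.sum(axis=0)    # hyperedge degree: number of vertices in the hyperedge
    D_v = np.diag(d_v)
    S = H @ np.diag(w / d_e) @ H.T
    return D_v - S, D_v, S
```

With this form every row of the Laplacian sums to zero, which is a quick sanity check on the incidence matrix and the weights.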

Further, the invention proposes a new feature coding method that obtains, for each image, features with almost no pose deflection and with good class discriminability. The specific steps are as follows:

The given data set is denoted Y.

Each column in Y represents an image sample.

First, the noise in each image is removed (for example, with a Gaussian filter).

Then, each pixel of each image is checked: negative values are set to 0 and positive values are kept unchanged, which yields Y^W, the preprocessed image set.
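The preprocessing described above can be sketched as follows. The 3 x 3 separable kernel [1, 2, 1] / 4 is an assumption standing in for the Gaussian filter, since the source does not give filter parameters, and the function names `gaussian_blur3` and `preprocess` are hypothetical.

```python
import numpy as np


def gaussian_blur3(img):
    """Smooth a 2-D image with a separable [1, 2, 1] / 4 kernel (assumed filter)."""
    img = np.asarray(img, dtype=float)
    k = np.array([1.0, 2.0, 1.0]) / 4.0
    p = np.pad(img, 1, mode="edge")  # replicate border pixels
    # Horizontal pass, then vertical pass.
    rows = k[0] * p[:, :-2] + k[1] * p[:, 1:-1] + k[2] * p[:, 2:]
    return k[0] * rows[:-2] + k[1] * rows[1:-1] + k[2] * rows[2:]


def preprocess(images):
    """Denoise each image, set negative pixels to 0, and stack images as columns."""
    cols = [np.maximum(gaussian_blur3(img), 0.0).ravel() for img in images]
    return np.stack(cols, axis=1)  # one column per image sample
```

The result has one column per image and contains no negative entries, which is what the subsequent non-negative factorization requires.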

Then, the regularized hypergraph Laplacian matrix L^H of Y^W is constructed. Let the number of hyperedges be t, the number of vertices be N, with N = t, and the number of vertices contained in each hyperedge be s. The vertices of each hyperedge consist of Y_n^W itself and its s - 1 nearest neighbors, where Y_n^W denotes the n-th column of Y^W.

Specifically, w_j is calculated as follows:

where exp() denotes the exponential function.
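The hyperedge construction above (each sample together with its s - 1 nearest neighbors) can be sketched as follows. The weight formula is not recoverable from the source, so the heat-kernel weight used here, exp(-mean squared distance / sigma²), is purely an assumption, as are the parameter `sigma` and the function name `knn_hyperedges`.

```python
import numpy as np


def knn_hyperedges(Y, s, sigma=None):
    """Build one hyperedge per sample from its s - 1 nearest neighbors.

    Y is a d x N matrix with one sample per column. Returns the N x N
    incidence matrix H (column n is the hyperedge generated by sample n)
    and a weight vector w. The weight formula is an assumed heat kernel.
    """
    Y = np.asarray(Y, dtype=float)
    N = Y.shape[1]
    sq = np.sum(Y ** 2, axis=0)
    # Pairwise squared Euclidean distances, clipped against rounding noise.
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Y.T @ Y), 0.0)
    if sigma is None:
        sigma = np.sqrt(D2.mean()) + 1e-12
    H = np.zeros((N, N))
    w = np.zeros(N)
    for n in range(N):
        dist = D2[n].copy()
        dist[n] = -1.0  # guarantee that the sample itself is selected first
        members = np.argsort(dist)[:s]
        H[members, n] = 1.0
        w[n] = np.exp(-D2[n, members].mean() / sigma ** 2)
    return H, w
```

Because every sample generates exactly one hyperedge, the number of hyperedges equals the number of samples, matching N = t in the text.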

After Y^W and L^H are obtained, the objective function is as follows:

where the Frobenius-norm term represents the error produced by the non-negative decomposition of Y^W, and Tr(P^T L^H P) is the hypergraph regularization term, which preserves the local structure of the data and thereby improves the performance of the algorithm. Because the problem is difficult to solve directly, an iterative solution method is adopted. The Lagrangian function Δ corresponding to the above objective is:

where Ψ is the matrix of Lagrange multipliers Ψ_mk corresponding to the non-negativity constraint on the basis matrix, with Ψ_mk denoting the mk-th iteration of Ψ. Φ is the matrix of Lagrange multipliers Φ_nk corresponding to the constraint P_nk ≥ 0, with Φ_nk denoting the nk-th iteration of Φ; λ denotes the coefficient and Tr denotes the trace of a matrix.

For ease of calculation, Δ is rewritten as:

where the rewriting uses the identity Tr(B) = Tr(B^T), with B denoting an arbitrary matrix.

Taking the partial derivatives of Δ with respect to the basis matrix and P yields:

Applying the KKT (Karush-Kuhn-Tucker) conditions, including Φ_nk P_nk = 0, yields:

In the above equations, the subscript of each variable indicates the iteration number of that variable.

The basis matrix and P_nk are updated as follows:

where the product symbol denotes element-wise multiplication of two matrices. The output basis matrix is the set of images that are geometrically free of pose deflection. D_v denotes the diagonal matrix of vertex degrees.

Fig. 3 shows the process of extracting a near-frontal image from images affected by pose change. Y denotes the original image set with pose deflection, Y^W denotes the image set obtained after preprocessing Y, the basis matrix denotes the set of approximately frontal images obtained by decomposition and iteration, and P denotes the pose transformation matrix. In Fig. 3, each image in the original image set is first preprocessed to obtain a non-negative image set free of noise. The hypergraph is then embedded in the non-negative matrix factorization to preserve the structure of the decomposed images. Finally, through matrix decomposition and iterative updating, an image set with almost no pose deflection is obtained.
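The iterative decomposition summarized above can be sketched as follows. The sketch assumes the objective ‖Y^W - U P^T‖_F² + λ Tr(P^T L^H P) with L^H = D_v - S, and uses the multiplicative updates commonly derived for graph-regularized non-negative matrix factorization; the update formulas of the invention are not reproduced in the source and may differ. The names `U`, `S`, `hypergraph_nmf` and `hyper_objective` are hypothetical.

```python
import numpy as np


def hyper_objective(Y, U, P, L, lam):
    """Reconstruction error plus the hypergraph regularization term."""
    recon = np.linalg.norm(Y - U @ P.T, "fro") ** 2
    return float(recon + lam * np.trace(P.T @ L @ P))


def hypergraph_nmf(Y, S, D_v, r, lam=0.1, n_iter=200, eps=1e-10, seed=0):
    """Hypergraph-regularized NMF with multiplicative updates (assumed form).

    Y is the non-negative d x N preprocessed data, S = H W D_e^{-1} H^T and
    D_v is the vertex-degree matrix, so that the Laplacian is D_v - S.
    """
    rng = np.random.default_rng(seed)
    d, N = Y.shape
    U = rng.random((d, r)) + eps  # candidate pose-free images (basis matrix)
    P = rng.random((N, r)) + eps  # pose transformation coefficients
    for _ in range(n_iter):
        U *= (Y @ P) / (U @ (P.T @ P) + eps)
        # The Laplacian is split into D_v (denominator) and S (numerator)
        # so that both parts of the ratio stay non-negative.
        P *= (Y.T @ U + lam * (S @ P)) / (P @ (U.T @ U) + lam * (D_v @ P) + eps)
    return U, P
```

Splitting the Laplacian between numerator and denominator is the design choice that keeps P non-negative without an explicit projection step.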

Step 2: Feature extraction. This comprises the following steps.

Step 2.1: Feature coding based on improved support vector data description.

First, the invention proposes an improved support vector data description (SVDD), which is used to obtain the sphere center and radius of each cluster. Then, feature coding is performed using the radius and center of the sphere corresponding to each cluster.

The existing support vector data description assumes that every data point plays the same role in calculating the radius of each cluster, which does not match the actual situation. Therefore, the invention assigns a learned weight to each data point during model learning and proposes an improved support vector data description, as follows:

where r denotes the radius of the sphere, y_i denotes the i-th sample, ρ(y_i) denotes the weight of y_i, b denotes the center of the sphere, num denotes the number of samples, χ_i denotes the slack variable, and the remaining quantities are parameters.

For convenience of solving, the above problem is written in the form of a Lagrangian function:

where α and β denote the Lagrange multipliers of the Lagrangian function, and α_i, β_i denote the i-th elements of α and β, respectively.

Setting the corresponding partial derivatives of the Lagrangian function to zero yields:

where Q = (⟨x_i, x_j⟩)_{num×num}, Ω = (⟨x_i, x_j⟩)_{num×1}, e = (1, 1, 1, …, 1)^T, and x_i, x_j are the i-th and j-th samples with pose deflection removed. α = [α₁, α₂, …, α_num] is obtained using a linear algorithm, and T denotes transposition.

r denotes the vector composed of the radii of the SVDD cluster spheres, and is obtained by the following formula:

where γ is the set of support vectors, i.e., the sample points used in the above formula are support vectors. A sample point y_i is a support vector when its corresponding α_i is non-zero. r = [r₁, r₂, …, r_C], where C is the number of clusters in the data set.
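How a sphere center and radius follow from the dual coefficients can be illustrated with a short sketch. It does not solve the weighted SVDD problem of the invention; it assumes that the coefficients α of one cluster are already available and sum to 1, uses a linear kernel, and takes the radius as the mean distance from the support vectors to the center, which is an assumption about the omitted radius formula. The function name `svdd_center_radius` is hypothetical.

```python
import numpy as np


def svdd_center_radius(samples, alpha, tol=1e-8):
    """Recover the SVDD sphere of one cluster from its dual coefficients.

    samples: num x d array (one sample per row).
    alpha:   num dual coefficients, assumed non-negative and summing to 1.
    Support vectors are the samples whose alpha exceeds tol.
    """
    samples = np.asarray(samples, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    center = alpha @ samples            # center as a weighted sum of samples
    support = alpha > tol               # non-zero alpha marks a support vector
    dists = np.linalg.norm(samples[support] - center, axis=1)
    return center, float(dists.mean())  # averaged radius (assumption)
```

Repeating this for each of the C clusters would give the vector of radii r = [r₁, r₂, …, r_C] and the corresponding centers used in the coding step.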

Step 2.2: Perform triangular coding.

Each image with its pose deflection removed is decomposed into N blocks, and each block is encoded.

Specifically, an image q with its pose deflection removed is decomposed into N blocks.

Any block q^j (j = 1, 2, …, N, where N denotes the number of blocks into which the image is decomposed) is encoded as U(q^j):

U(q^j) = [U₁(q^j) U₂(q^j) … U_C(q^j)]^T

where U_i(q^j) = [U_{i,1}(q^j) U_{i,2}(q^j)], i = 1, 2, …, C.

U_{i,1}(q^j) and U_{i,2}(q^j) are both obtained by triangular coding.

U_{i,1}(q^j) = max{0, d(s) - s_i(q^j)}, where s_i(q^j) = ‖q^j - o_i‖₂ denotes the distance from q^j to o_i, o_i denotes the center of the SVDD sphere formed by the i-th cluster, and d(s) is the mean of all s_i(q^j). U_{i,2}(q^j) = max{0, A(m) - m_i(q^j)}, where A(m) is the mean of all m_i.

Fig. 4 shows a schematic diagram of the encoding. q^j denotes the j-th block of image q, which is divided into N blocks. O_i denotes the center of the SVDD sphere formed by the i-th cluster, each cluster grouping a plurality of sample points; r_i denotes the radius of the SVDD sphere formed by the i-th cluster; and s_i(q^j) denotes the distance between q^j and O_i. Likewise, O_j denotes the center of the SVDD sphere formed by the j-th cluster, r_j is the radius of that sphere, and s_j(q^j) denotes the distance between q^j and O_j.

Thus, the image q is encoded as F_q, expressed as follows:

F_q = [(U(q¹))^T (U(q²))^T … (U(q^N))^T]^T
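The distance-based part of the triangular coding can be sketched as follows. Only the component U_{i,1} = max{0, d(s) - s_i} is implemented, because the definition of m_i needed for U_{i,2} is not recoverable from the source. The function names `triangle_code` and `encode_image` are hypothetical.

```python
import numpy as np


def triangle_code(block, centers):
    """Triangular code of one block against the C cluster centers.

    block:   flattened block q^j, shape (d,).
    centers: C x d array of SVDD sphere centers o_i.
    Returns max(0, mean(s) - s_i) for every center, where s_i is the
    Euclidean distance from the block to center i.
    """
    block = np.asarray(block, dtype=float)
    centers = np.asarray(centers, dtype=float)
    s = np.linalg.norm(centers - block, axis=1)
    return np.maximum(0.0, s.mean() - s)


def encode_image(blocks, centers):
    """Concatenate the codes of all blocks of one image into a feature vector."""
    return np.concatenate([triangle_code(b, centers) for b in blocks])
```

For example, with centers (0, 0) and (3, 4), a block located at (0, 0) has distances 0 and 5, so its code is [2.5, 0]: only centers closer than the average distance receive a non-zero response.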

Step 3: Feature classification. This comprises the following steps.

Through the pose-deflection removal and feature coding of steps 1 and 2, the influence of pose change on face recognition is greatly reduced. To further improve the recognition rate of the algorithm, a dictionary is learned, the learned dictionary is used to represent the test samples, and the class of each test sample is determined from the representation residuals.

Specifically, the model of the dictionary-learning-based classifier is as follows:

where X denotes the training samples, D is the learned dictionary, Z is the representation coefficient matrix, and d_i denotes the i-th atom in D. F denotes the Frobenius norm.
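Classification by representation residual can be illustrated with a simplified sketch. It assumes one sub-dictionary per class and uses ridge-regularized least squares for the representation coefficients, which is a stand-in for the dictionary learning model of the invention rather than a reproduction of it. The function name `classify_by_residual` and the parameter `lam` are hypothetical.

```python
import numpy as np


def classify_by_residual(x, dictionaries, lam=1e-3):
    """Assign x to the class whose sub-dictionary represents it best.

    dictionaries maps a class label to a d x k matrix whose columns are atoms.
    Coefficients are computed by ridge regression (an assumed simplification).
    Returns the predicted label and the residual of every class.
    """
    x = np.asarray(x, dtype=float)
    residuals = {}
    for label, D in dictionaries.items():
        D = np.asarray(D, dtype=float)
        gram = D.T @ D + lam * np.eye(D.shape[1])
        z = np.linalg.solve(gram, D.T @ x)        # representation coefficients
        residuals[label] = float(np.linalg.norm(x - D @ z))
    return min(residuals, key=residuals.get), residuals
```

The predicted class is simply the one with the smallest reconstruction residual, which mirrors the decision rule stated in the text.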

Step 4: Joint optimization stage. This comprises the following steps.

To obtain a globally optimal solution of HDMCO (the abbreviation of the proposed algorithm), feature extraction and feature classification are jointly optimized as follows:

Thus, α, D and Z are obtained. Z denotes the coefficient matrix.

Here, α is derived from the following formula:

The value of α is obtained using a linear algorithm.

D is obtained from the following formula:

The above problem is transformed and solved as follows:

where J denotes an intermediate matrix, θ denotes a coefficient, and vⁱ denotes the i-th atom of the matrix V.

D is obtained by iteratively solving for the variables in the above equation.

Z is obtained by solving the following formula:

The solution of the above formula is as follows:

where shrink(x, a) = sign(x)·max(|x| - a, 0), x denotes an example variable, and η denotes a coefficient.
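The shrinkage operator defined above is the soft-thresholding function, sketched here applied element-wise to an array. The function name follows the text; the array-based interface is an assumption.

```python
import numpy as np


def shrink(x, a):
    """Soft-thresholding: sign(x) * max(|x| - a, 0), applied element-wise."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - a, 0.0)
```

Entries whose magnitude is at most a are set to zero and the remaining entries are moved toward zero by a, which is what produces sparse coefficients.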

Step 5: The learned classifier is used to predict the classification result and classify the samples, thereby realizing face recognition.

Claims

1. A small-sample multi-pose face recognition method based on hypergraph and multi-task collaboration, characterized by comprising image frontalization, feature extraction, feature classification and joint optimization;

step 1: image frontalization;

feature discrimination is enhanced based on non-negative matrix factorization and hypergraph embedding, comprising the following steps;

step 1.1: performing non-negative matrix factorization;

a non-negative matrix X₀ is decomposed into the product of two non-negative matrices, a basis matrix and P^T, as follows:

wherein the first factor is the basis matrix and P is the second factor, F denotes the Frobenius norm, T denotes matrix transpose, and, in the dimension notation, m denotes the dimension of a sample in X₀, N denotes the number of samples of the source data, and r denotes the number of samples of the target data; s.t. introduces the constraints to be satisfied;

then, the basis matrix and P^T are updated:

wherein X denotes the source data, P denotes the transformation matrix, and the subscripts ij and jk denote the ij-th and jk-th iterations, respectively;

step 1.2: performing hypergraph embedding;

a hypergraph G is an ordered pair G = (V, e); V is a non-empty set whose elements are nodes (vertices), called the vertex set; e is a family of non-empty subsets of V whose elements are called hyperedges;

each edge of the hypergraph is not limited to connecting two vertices and can connect more vertices;

given a hypergraph G = (V, e), V = {v₁, v₂, …, v_k} is a finite set of data points, wherein each v_i (i = 1, 2, …, k) is a vertex; each e_j is a hyperedge, and the hyperedge set e satisfies the following condition:

e₁ ∪ e₂ ∪ e₃ ∪ … ∪ e_t = V

wherein t denotes the number of hyperedges;

the elements of the incidence matrix formed by the vertices and hyperedges are calculated using the following formula:

the degree d_i of each vertex in the hypergraph G is defined as the sum of the weights of the hyperedges to which it belongs; the hyperedge degree ρ_j is defined as the number of vertices belonging to the hyperedge; they are calculated as follows:

wherein w_j denotes the hyperedge weight;

let D_v be the diagonal matrix whose main diagonal elements are the vertex degrees;

let D_e and W be the diagonal matrices generated by ρ_j and w_j, respectively, j = 1, 2, …, t;

wherein D_e⁻¹ denotes the inverse of the diagonal matrix D_e;

step 2: feature extraction;

step 2.1: feature coding based on the improved support vector data description;

a support vector data description (SVDD) is used to obtain the sphere center and radius of each cluster; then, feature coding is performed using the radius and center of the sphere corresponding to each cluster;

a learned weight is assigned to each data point during model learning, and a support vector data description is proposed, as follows:

wherein the remaining quantity is a parameter;

the above problem is written in the form of a Lagrangian function:

wherein α and β denote the Lagrange multipliers of the Lagrangian function, and α_i, β_i denote the i-th elements of α and β, respectively;

setting the corresponding partial derivatives of the Lagrangian function to zero yields:

wherein Q = (⟨x_i, x_j⟩)_{num×num}, Ω = (⟨x_i, x_j⟩)_{num×1}, e = (1, 1, 1, …, 1)^T, and x_i, x_j are the i-th and j-th samples with pose deflection removed; α = [α₁, α₂, …, α_num] is obtained using a linear algorithm, and T denotes transposition;

r denotes the vector composed of the radii of the SVDD cluster spheres, and is obtained by the following formula:

wherein γ is the set of support vectors, i.e., the sample points used in the above formula are support vectors; a sample point y_i is a support vector when its corresponding α_i is non-zero; r = [r₁, r₂, …, r_C], wherein C is the number of clusters in the data set;

step 2.2: performing triangular coding;

each image with its pose deflection removed is decomposed into N blocks, and each block is encoded;

an image q with its pose deflection removed is decomposed into N blocks; any block q^j (j = 1, 2, …, N, wherein N denotes the number of blocks into which the image is decomposed) is encoded as U(q^j):

U(q^j) = [U₁(q^j) U₂(q^j) … U_C(q^j)]^T

wherein U_i(q^j) = [U_{i,1}(q^j) U_{i,2}(q^j)], i = 1, 2, …, C;

U_{i,1}(q^j) and U_{i,2}(q^j) are both obtained by triangular coding;

U_{i,1}(q^j) = max{0, d(s) - s_i(q^j)}, wherein s_i(q^j) = ‖q^j - o_i‖₂ denotes the distance from q^j to o_i, o_i denotes the center of the SVDD sphere formed by the i-th cluster, and d(s) is the mean of all s_i(q^j); U_{i,2}(q^j) = max{0, A(m) - m_i(q^j)}, wherein A(m) is the mean of all m_i;

wherein q^j denotes the j-th block of image q, which is divided into N blocks; O_i denotes the center of the SVDD sphere formed by the i-th cluster, each cluster grouping a plurality of sample points; r_i denotes the radius of the SVDD sphere formed by the i-th cluster; s_i(q^j) denotes the distance between q^j and O_i; O_j denotes the center of the SVDD sphere formed by the j-th cluster; r_j is the radius of the SVDD sphere formed by the j-th cluster; s_j(q^j) denotes the distance between q^j and O_j;

the image is encoded as F_q, expressed as follows:

F_q = [(U(q¹))^T (U(q²))^T … (U(q^N))^T]^T

step 3: feature classification;

a dictionary is learned, the learned dictionary is used to represent the test samples, and the class of each test sample is determined from the representation residuals;

the model of the dictionary-learning-based classifier is as follows:

wherein X denotes the training samples, D is the learned dictionary, Z is the representation coefficient matrix, and d_i denotes the i-th atom in D; F denotes the Frobenius norm;

step 4: joint optimization stage;

to obtain a globally optimal solution of the algorithm HDMCO, feature extraction and feature classification are jointly optimized, specifically as follows:

thereby obtaining α, D and Z; Z denotes the coefficient matrix;

wherein α is derived from the following formula:

the value of α is obtained using a linear algorithm;

D is obtained from the following formula:

the above problem is transformed and solved as follows:

wherein J denotes an intermediate matrix, θ denotes a coefficient, and vⁱ denotes the i-th atom of the matrix V;

D is obtained by iteratively solving for the variables in the above formula;

Z is obtained by solving the following formula:

the solution of the above formula is as follows:

wherein shrink(x, a) = sign(x)·max(|x| - a, 0), x denotes an example variable, and η denotes a coefficient;

2. The small-sample multi-pose face recognition method based on hypergraph and multi-task collaboration according to claim 1, characterized in that in step 1 a feature coding method is adopted to obtain, for each image, features with almost no pose deflection that have good class discriminability, as follows:

the given data set is denoted Y;

each column in Y represents an image sample;

firstly, the noise in each image is removed;

then, each pixel of each image is checked, negative values are set to 0 and positive values are kept unchanged, to obtain Y^W, which denotes the preprocessed image set;

then, the regularized hypergraph Laplacian matrix L^H of Y^W is constructed; the number of hyperedges is t, the number of vertices is N, with N = t, and the number of vertices contained in each hyperedge is s; the vertices of each hyperedge consist of Y_n^W itself and its s - 1 nearest neighbors, wherein Y_n^W denotes the n-th column of Y^W;

w_j is calculated as follows:

wherein exp() denotes the exponential function;

after Y^W and L^H are obtained, the objective function is as follows:

wherein the Frobenius-norm term represents the error produced by the non-negative decomposition of Y^W; Tr(P^T L^H P) is the hypergraph regularization term;

an iterative solution method is adopted, and the Lagrangian function Δ corresponding to the above formula is:

wherein Ψ is the matrix of Lagrange multipliers Ψ_mk corresponding to the non-negativity constraint on the basis matrix, Ψ_mk denoting the mk-th iteration of Ψ; Φ is the matrix of Lagrange multipliers Φ_nk corresponding to the constraint P_nk ≥ 0, Φ_nk denoting the nk-th iteration of Φ; λ denotes a coefficient, and Tr denotes the trace of a matrix;

Δ of the above formula is rewritten as: