CN106874868A

CN106874868A - Face detection method and system based on a three-level convolutional neural network

Info

Publication number: CN106874868A
Application number: CN201710078431.3A
Authority: CN
Inventors: 王鲁许; 白洪亮; 董远
Original assignee: Beijing Faceall Co
Current assignee: Beijing Faceall Co
Priority date: 2017-02-14
Filing date: 2017-02-14
Publication date: 2017-06-20
Anticipated expiration: 2037-02-14
Also published as: CN106874868B

Abstract

The invention discloses a face detection method and system based on a three-level convolutional neural network. The method has the following advantages. During training, the training results of the preceding n levels are added to the input of the next level, which compensates for missing training data, improves the accuracy and recall of face detection, and improves the performance of the overall network. Facial feature points are added to the training samples; they improve both face classification and the positioning precision of the face rectangle frame, bringing the network closer to its performance upper limit and further improving the recall and accuracy of face detection. Picture classification is regression-corrected using only the classification offset within the calculated first (or second) offset, so correctly classified parts are not regression-corrected again, which increases face detection speed and further exploits the network's performance. The system has the same beneficial effects as the detection method.

Description

Face detection method and system based on a three-level convolutional neural network

Technical field

The present invention relates to the technical field of face detection, and in particular to a face detection method and system based on a three-level convolutional neural network.

Background art

Since the start of the 21st century, computer technology has flourished and been widely applied in many fields. With its development, face detection technology emerged and has been continuously iterated and updated. Face detection means searching an arbitrary set of images with some strategy to determine which images contain a face.

Face detection is a key step in an automatic face recognition system. Early face recognition research mainly targeted face images under strong constraints (such as images without background) and often assumed that the face position was known or easy to obtain, so the face detection problem received little attention.

With the development of applications such as e-commerce, face recognition has become one of the most promising means of biometric identity verification. This application background requires automatic face recognition systems to have some ability to handle general images, and the resulting series of problems led researchers to treat face detection as an independent problem. Today, the applications of face detection go far beyond face recognition systems; it has important application value in content-based retrieval, digital video processing, video detection, face modeling, face tracking, and other areas.

The search strategies commonly used in face detection include decision trees, logistic regression, naive Bayes, and three-level convolutional neural networks. Methods and systems based on three-level convolutional neural networks are known for fast detection, high recognition accuracy, and rapid iteration and updating. A prior-art face detection method based on a three-level convolutional neural network works as follows: 1) networks of progressively stronger performance are trained level by level, and the candidate frames that the previous level judges to be faces are passed to the next level as training samples; 2) at each level, the decision is made by a face classification network and a face frame regression network; 3) even when the classification is correct, all of the corrected data is fed back.

The prior art has the following shortcomings. Because the previous-level network performs poorly, some faces are not judged correctly, so face candidate frames are lost before reaching the next level and overall performance is poor. Face classification and face frame regression alone cannot reach the network's performance upper limit, so there is still room for improvement. All data is fed back, so the network does not learn deeply enough and its performance cannot be fully exploited.

Summary of the invention

The object of the invention is to provide a face detection method and system based on a three-level convolutional neural network that solve the following problems: poor overall performance; correction by face classification and face frame alone, which cannot reach the network's performance upper limit; and regression correction being applied even to correctly classified parts.

To achieve the above object, the present invention provides the following technical solution:

A face detection method based on a three-level convolutional neural network comprises the following steps:

Obtain training samples and a detection picture; the training samples include at least face pictures annotated with a face frame and facial feature points;

Input the training samples into the three-level convolutional neural network and train it level by level; the training process is:

Perform prediction followed by dimension reduction according to the training samples and the training results of the preceding n levels to obtain the corresponding two-dimensional feature vector, and calculate the first offset from it;

Perform regression correction on the two-dimensional feature vector using the first offset to obtain the corresponding training result;

Input the detection picture into the trained three-level convolutional neural network and perform face detection level by level to obtain the face rectangle frame.

In the above method, the face pictures in the training samples further contain a picture classification label and a uniquely determined face frame.

In the above method, obtaining the two-dimensional feature vector comprises the following steps:

Obtain an m-dimensional feature vector according to the training samples and the training results of the preceding n levels;

Reduce the dimension of the m-dimensional feature vector through a fully convolutional layer / fully connected layer to obtain the two-dimensional feature vector.

In the above method, the third-level network comprises a first branch, a second branch, and a third branch; the second-level network comprises the first branch and the second branch; and the first branch is identical to the first-level network.

In the above method, obtaining the m-dimensional feature vector in the third-level network comprises the following steps:

Input the training samples and the training result of the previous level into the first branch to obtain a first feature vector, into the second branch to obtain a second feature vector, and into the third branch to obtain a third feature vector;

Concatenate the first feature vector, the second feature vector, and the third feature vector to obtain the m-dimensional feature vector.

In the above method, obtaining the first offset comprises the following steps:

Input the two-dimensional feature vector into the SoftmaxWithLoss layer and calculate the classification offset;

Input the two-dimensional feature vector into the Euclidean Loss layer and calculate the face frame offset and the facial feature point offset.

In the above method, calculating the classification offset comprises the following steps:

Define the two-dimensional feature vector as Z = {z₁, z₂}, where [expression omitted in the source text];

Classify with the softmax function into two classes, specialized to: [formula omitted in the source text]

Calculate, with a loss function, the difference between the predicted two-dimensional feature vector and the training sample;

The loss function is: [formula omitted in the source text]

where [expression omitted in the source text]; calculate [expression omitted in the source text];

Update [expression omitted in the source text], where α is a coefficient.
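The formulas for these steps are omitted in the source text. As a hedged sketch only, one plausible reading assumes the standard two-class softmax, the negative log-likelihood that a SoftmaxWithLoss layer computes, and a plain gradient-descent update. The input $x$, the true class index $y$, the probabilities $p_k$, and the linear form of $z_k$ are assumptions introduced for illustration and are not taken from the source:

```latex
\begin{align}
  z_k &= W_k^{\top} x + b_k, \qquad k \in \{1, 2\} \\
  p_k &= \frac{e^{z_k}}{e^{z_1} + e^{z_2}} \\
  L   &= -\log p_y \\
  \frac{\partial L}{\partial z_k} &= p_k - \mathbb{1}[k = y] \\
  W &\leftarrow W - \alpha \frac{\partial L}{\partial W}, \qquad
  b \leftarrow b - \alpha \frac{\partial L}{\partial b}
\end{align}
```

Under this reading, α plays the role of a learning rate, and a sample that is already classified with high confidence ($p_y \approx 1$) contributes a gradient close to zero.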

In the above method, obtaining the face rectangle frame comprises the following steps:

Input the detection picture into the first-level network, then screen, regression-correct, and merge the results to obtain the first face candidate frames;

Input the first face candidate frames into the second-level network, then screen, regression-correct, and merge the results to obtain the second face candidate frames;

Input the second face candidate frames into the third-level network, then screen, regression-correct, and merge the results to obtain the face rectangle frame.

In the above method, screening, regression correction, and merging comprise the following steps:

According to the detection picture / first face candidate frames / second face candidate frames and the corresponding face probabilities, select the face candidate frames whose probability exceeds a set probability threshold;

Calculate the second offset from the face candidate frames obtained after screening, and regression-correct them using the second offset;

Merge the corrected face candidate frames with the non-maximum suppression algorithm to obtain the first face candidate frames / second face candidate frames / face rectangle frame.

The face detection method based on a three-level convolutional neural network provided by the present invention has the following beneficial effects:

1) during training, the training results of the preceding n levels are added to the input of the next level, which compensates for missing training data, improves the accuracy and recall of face detection, and improves the performance of the overall network;

2) facial feature points are added to the training samples; they improve face classification and the positioning precision of the face rectangle frame, bringing the network closer to its performance upper limit and further improving the recall and accuracy of face detection;

3) picture classification is regression-corrected using only the classification offset within the calculated first (or second) offset, so correctly classified parts are not regression-corrected again, which increases face detection speed and further exploits the network's performance.

A face detection system based on a three-level convolutional neural network comprises a three-level convolutional neural network, which includes:

an acquiring unit, used to obtain training samples and a detection picture; the training samples include at least face pictures annotated with facial feature points;

a network training unit, used to input the training samples into the three-level convolutional neural network and train it level by level;

the network training unit includes a feature vector module and a regression correction module;

the feature vector module is used to perform prediction followed by dimension reduction according to the training samples and the training results of the preceding n levels, obtain the corresponding two-dimensional feature vector, and calculate the first offset from it;

the regression correction module is used to perform regression correction on the two-dimensional feature vector using the first offset and obtain the corresponding training result;

a face detection unit, used to input the detection picture into the trained three-level convolutional neural network and perform face detection level by level to obtain the face rectangle frame.

The face detection system based on a three-level convolutional neural network provided by the present invention has the following beneficial effects:

1) the second-level and third-level networks in the network training unit 2 (or the face detection unit 3) compensate for the poor performance of the first-level network and improve picture classification accuracy, thereby raising the recall and accuracy of face detection and improving the performance of the overall network;

2) facial feature points are added to the face pictures in the training samples of the acquiring unit 1; they improve face classification and the positioning precision of the face rectangle frame, bringing the network closer to its performance upper limit and further improving the recall and accuracy of face detection;

3) picture classification is regression-corrected using only the classification offset obtained through the cooperation of the feature vector module 21 and the regression correction module 22, so correctly classified parts need no correction, which increases face detection speed and further exploits the network's performance.

Brief description of the drawings

In order to illustrate the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below cover only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them.

Fig. 1 is a block diagram of the face detection method based on a three-level convolutional neural network provided by an embodiment of the present invention;

Fig. 2 is a schematic flowchart of the face detection method based on a three-level convolutional neural network provided by one embodiment of the present invention;

Fig. 3 is a schematic flowchart of the face detection method based on a three-level convolutional neural network provided by one embodiment of the present invention;

Fig. 4 is a schematic flowchart of the face detection method based on a three-level convolutional neural network provided by one embodiment of the present invention;

Fig. 5 is a schematic flowchart of the face detection method based on a three-level convolutional neural network provided by one embodiment of the present invention;

Fig. 6 is a schematic flowchart of the face detection method based on a three-level convolutional neural network provided by one embodiment of the present invention;

Fig. 7 is a schematic flowchart of the face detection method based on a three-level convolutional neural network provided by one embodiment of the present invention;

Fig. 8 is a schematic structural diagram of the face detection system based on a three-level convolutional neural network provided by an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of the first-level network provided by one embodiment of the present invention;

Fig. 10 is a schematic structural diagram of the second-level network provided by one embodiment of the present invention;

Fig. 11 is a schematic structural diagram of the third-level network provided by one embodiment of the present invention.

Description of reference numerals：

1, acquiring unit; 2, network training unit; 21, feature vector module; 22, regression correction module; 3, face detection unit.

Detailed description of the embodiments

To help those skilled in the art better understand the technical solution of the present invention, the invention is described in further detail below with reference to the drawings.

As shown in Figs. 1-7 and 9-11, the face detection method based on a three-level convolutional neural network provided by an embodiment of the present invention comprises the following steps:

S101: obtain training samples and a detection picture; the training samples include at least face pictures annotated with a face frame and facial feature points;

Further, as shown in Figs. 9-11, the three-level convolutional neural network comprises a first-level network, a second-level network, and a third-level network. The third-level network comprises a first branch, a second branch, and a third branch; the second-level network comprises the first branch and the second branch; and the first branch has the same network structure as the first-level network. For ease of distinction, the figures denote the first-level network as 12-net, the second-level network as 24-net, and the third-level network as 48-net. That is, 24-net contains a 12-net branch and a 24-net branch, and 48-net contains a 12-net branch, a 24-net branch, and a 48-net branch. 12-net, 24-net, and 48-net are connected level by level, so training samples can be selected level by level: pictures without faces are excluded, and accurate face pictures with correspondingly more accurate face frames (which determine the face position) are obtained.
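The nesting of branches described here can be written down as a small lookup. This is only an illustrative sketch: the names `CASCADE_ORDER`, `branches_for`, and `structure` are hypothetical and say nothing about the layers inside each branch.

```python
# Cascade order as named in the figures: 12-net, then 24-net, then 48-net.
CASCADE_ORDER = ["12-net", "24-net", "48-net"]


def branches_for(level_index):
    """Return the branches contained in the network at `level_index` (0-based).

    Each network contains its own branch plus the branches of all earlier
    networks, so 48-net holds 12-net, 24-net, and 48-net branches.
    """
    return CASCADE_ORDER[: level_index + 1]


# Map each network name to the list of branches it contains.
structure = {name: branches_for(i) for i, name in enumerate(CASCADE_ORDER)}
```

Here `structure["24-net"]` lists the 12-net and 24-net branches, matching the description of the second-level network.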

Further, the face pictures in the training samples also carry a picture classification label. Specifically, the training samples consist of face pictures, each carrying a classification label, a uniquely determined face frame, and annotated facial feature point information, together with other pictures. The classification label enables picture classification training, which divides the training samples into two classes: the labeled face picture set and the set of other pictures. The face frame determines the rectangular region of the face in the face picture, so delimiting that region determines the face position. Facial feature points (landmark points) are prominent parts such as the nose, eyes, mouth, forehead, and facial contour, which make it easy to tell faces apart. Because locating the face by the face frame alone leaves some error, the facial feature points are used to locate it precisely: the face frame is enlarged or shrunk so that the feature points fall within it, which improves the positioning precision of the face frame. The detection picture is a set of face pictures, environment pictures, and any other images; once training is complete, face detection can be performed on the detection picture. Training samples can be obtained by retrieving an existing face database, or by obtaining face pictures by means such as 3D printing, then adding the classification label, the uniquely determined face frame, and the annotated facial feature points, and mixing in other pictures.
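The idea of adjusting a face frame so that the landmark points fall inside it can be illustrated with a minimal sketch. It covers only the enlarging case, and it assumes frames are `(x1, y1, x2, y2)` tuples and landmarks are `(x, y)` pairs; the function name and the sample coordinates are hypothetical.

```python
def fit_frame_to_landmarks(frame, landmarks):
    """Return the smallest frame that contains `frame` and every landmark.

    Only enlargement is shown; shrinking a frame that is too loose would
    need an extra rule that the source does not specify.
    """
    x1, y1, x2, y2 = frame
    for lx, ly in landmarks:
        x1, y1 = min(x1, lx), min(y1, ly)
        x2, y2 = max(x2, lx), max(y2, ly)
    return (x1, y1, x2, y2)


# Hypothetical annotation: the last landmark lies outside the frame on the right.
frame = (10, 10, 50, 50)
landmarks = [(20, 25), (40, 25), (30, 35), (22, 45), (55, 45)]
adjusted = fit_frame_to_landmarks(frame, landmarks)
```

In this example the right edge moves from 50 to 55 so that the outlying landmark is covered, while the other edges stay where they were.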

S102: input the training samples into the three-level convolutional neural network and train it level by level;

Level-by-level training means training the three-level convolutional neural network in the order first-level network, second-level network, third-level network. The network is able to learn: after training it can classify pictures and mark the corresponding position in a picture with a rectangle frame, and it can further correct the position of the rectangle frame by introducing facial feature points. When a large number of different pictures are then input, the trained three-level convolutional neural network performs face classification and positioning.

In step S102, the training further comprises the following steps:

S1021: perform prediction followed by dimension reduction according to the training samples and the training results of the preceding n levels to obtain the corresponding two-dimensional feature vector, and calculate the first offset from it;

A training result is the result obtained after prediction, dimension reduction, and regression correction by one level of the three-level convolutional neural network. When training samples are input into the first-level network, the training results of the preceding n levels are empty; when they are input into the second-level network, the preceding results are the training result of the first-level network; when they are input into the third-level network, the preceding results are the training results of the first-level and second-level networks. Prediction and dimension reduction mean that, during training, the input training samples are classified and the face position is predicted, and the result is converted into a two-dimensional feature vector that is convenient to compute with. The first offset is the difference, during training, between the two-dimensional feature vector obtained after prediction and dimension reduction and the training sample, chiefly the differences between the predicted values and the sample's classification label, uniquely determined face frame, and annotated facial feature points; that is, the difference between the values after and before prediction. Preferably, this difference is calculated by a loss function. Because the training result and the training samples of the previous level are input into the next level, the next-level network compensates for the poor performance of the previous level, which improves picture classification accuracy, raises the recall and accuracy of face detection, and improves the performance of the overall network.
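How each level receives the training samples together with the results of the earlier levels can be sketched as a simple loop. The actual training of a level is replaced by a hypothetical stand-in (`dummy_train_level`) that only records what it was given; `train_cascade` and its parameters are likewise illustrative names, not part of the source.

```python
def train_cascade(samples, train_level, levels=3):
    """Train `levels` networks in order, feeding each the earlier results."""
    results = {}
    for level in range(1, levels + 1):
        # Level 1 gets an empty list; level 3 gets the results of levels 1 and 2.
        previous = [results[k] for k in range(1, level)]
        results[level] = train_level(level, samples, previous)
    return results


def dummy_train_level(level, samples, previous):
    """Hypothetical stand-in for real training: report the inputs received."""
    return {"level": level, "n_samples": len(samples), "n_previous": len(previous)}


results = train_cascade(["img_a", "img_b"], dummy_train_level)
```

Every level sees the full set of training samples, so a face that an earlier level misjudged is still available to the later levels.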

In step S1021, obtaining the two-dimensional feature vector comprises the following steps:

S201: obtain an m-dimensional feature vector according to the training samples and the training results of the preceding n levels;

Before the fully convolutional layer / fully connected layer there is a prediction structure, which produces the m-dimensional feature vector of each network. Because the first-level, second-level, and third-level networks differ in structure and receive different pictures for training, the m-dimensional feature vectors they produce also differ. The second-level network corrects the erroneous part of the first-level network's predictions, and likewise the third-level network corrects the second-level network. The point of the correction is that the predictions of the first-level or second-level network may contain pictures that carry a label but were not assigned to the face picture set, or pictures that carry no label but were assigned to it. The second-level and third-level networks greatly reduce the probability of such cases, which gives the three-level convolutional neural network a self-correcting capability.

In step S201, obtaining the m-dimensional feature vector in the third-level network comprises the following steps:

S301: input the training samples and the training result of the previous level into the first branch to obtain a first feature vector, into the second branch to obtain a second feature vector, and into the third branch to obtain a third feature vector;

S302: concatenate the first feature vector, the second feature vector, and the third feature vector to obtain the m-dimensional feature vector.

The prediction structure of each level has a concatenation function. In the third-level network, the branches run separately and produce different feature vectors (the first, second, and third feature vectors), whose dimensions differ; these are concatenated to obtain the m-dimensional feature vector. The second-level network concatenates in the same way but has one branch fewer, so there is no third feature vector. The first-level network has only one branch, so the concatenation result is simply that branch's output. This prepares for conversion into the two-dimensional feature vector; representing the face in vector form makes computation more convenient. Specifically, the corresponding training data is input into each of the three branches. The first branch is identical to 12-net and yields, before the fully convolutional layer, a feature vector of m dimensions (16 in this example). The second branch yields, after the layer preceding the 24-net fully connected layer, a face feature vector of n dimensions (128 in this example). The third branch yields, after the layer preceding the 48-net fully connected layer, a face feature vector of p dimensions (256 in this example). The three feature vectors are then concatenated. Suppose X₁ is the feature vector of 12-net, X₂ is the feature vector of 24-net, and X₃ is the feature vector of 48-net. Concatenating the three vectors yields a 400-dimensional ((m+n+p)-dimensional) vector X₄, and X₄ is then passed through the fully connected layer.
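The concatenation with the example dimensions 16, 128, and 256 can be checked with a few lines of plain Python. The zero-filled vectors and the helper name `concatenate` are placeholders for illustration; real branch outputs would come from the networks themselves.

```python
def concatenate(*vectors):
    """Join feature vectors end to end into a single vector."""
    joined = []
    for vector in vectors:
        joined.extend(vector)
    return joined


x1 = [0.0] * 16    # placeholder for the 12-net branch feature (m = 16)
x2 = [0.0] * 128   # placeholder for the 24-net branch feature (n = 128)
x3 = [0.0] * 256   # placeholder for the 48-net branch feature (p = 256)

x4 = concatenate(x1, x2, x3)          # third-level network: all three branches
x_second = concatenate(x1, x2)        # second-level network: no third feature vector
x_first = concatenate(x1)             # first-level network: the single branch output
```

The lengths come out as 400, 144, and 16 respectively, which matches the (m+n+p) count given in the text for the third-level network.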

S202: reduce the dimension of the m-dimensional feature vector through the fully convolutional layer / fully connected layer to obtain the two-dimensional feature vector.

Before the fully convolutional layer there is a prediction structure. From the training samples, the prediction structure assigns the pictures it considers to be face pictures to one class and the other pictures to another class; it also obtains the predicted face frame and predicted facial feature points for the face picture set, and expresses these in the form of an m-dimensional feature vector. The fully convolutional layer reduces a multi-dimensional feature vector to two dimensions: passing the m-dimensional feature vector through it yields the two-dimensional feature vector, which makes it convenient to calculate the offset between the predicted values and the training sample.
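A fully connected layer that maps an m-dimensional vector to two outputs is a matrix-vector product plus a bias. The sketch below assumes that form; the tiny 4-dimensional input and the weight values are made up purely so the arithmetic is easy to follow.

```python
def fully_connected(x, weights, bias):
    """Compute one output per weight row: dot(row, x) + bias entry."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


# Hypothetical 4-dimensional feature vector reduced to 2 dimensions.
x = [1.0, 2.0, 3.0, 4.0]
W = [
    [0.5, 0.0, 0.0, 0.0],    # weights producing z1
    [0.0, 0.25, 0.0, 0.25],  # weights producing z2
]
b = [0.0, 1.0]

z = fully_connected(x, W, b)
```

With two weight rows the output always has two components regardless of the input length, which is the dimension reduction the step describes.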

S1022: perform regression correction on the two-dimensional feature vector using the first offset to obtain the corresponding training result;

In each level's network, a feedback structure follows the fully convolutional / fully connected layer and performs regression correction on the predicted values. Regression correction means compensating the predicted values with the first offset, correcting the offset arising from classification, from the face frame, and from the facial feature points, so that face classification and face positioning become more accurate and the face frame finally obtained is more accurate. The parts that the network has already classified correctly do not undergo regression correction of classification, which further exploits the network's performance and preserves detection speed.

In step S1022, obtaining the first offset comprises the following steps:

S401: input the two-dimensional feature vector into the SoftmaxWithLoss layer and calculate the classification offset;

After the two-dimensional feature vector is obtained, the SoftmaxWithLoss layer calculates the classification offset, and the calculated weight W and bias term b are fed back. That is, the classification offset is used for regression correction of the classification, which improves classification recall and accuracy.

In step S401, calculating the classification offset comprises the following steps:

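S501: define the two-dimensional feature vector;

It is defined as Z = {z₁, z₂}, where [expression omitted in the source text]

S502: classify with the softmax function into two classes, specialized to: [formula omitted in the source text]

S503: calculate, with a loss function, the difference between the predicted two-dimensional feature vector and the training sample;

The loss function is: [formula omitted in the source text]

where [expression omitted in the source text]; calculate [expression omitted in the source text]

Update [expression omitted in the source text], where α is a coefficient.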
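As a rough illustration of steps S501 to S503, the sketch below assumes a standard two-class softmax, a negative log-likelihood loss, and a single gradient-descent step on W and b with step size `alpha`. The source does not reproduce its formulas, so these choices, the linear form of z, and all function names and sample values are assumptions rather than the patented computation.

```python
import math


def softmax2(z):
    """Two-class softmax, shifted by the maximum for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grad(z, label):
    """Negative log-likelihood of the true class and its gradient w.r.t. z."""
    p = softmax2(z)
    loss = -math.log(p[label])
    grad = [p[k] - (1.0 if k == label else 0.0) for k in range(2)]
    return loss, grad


def sgd_step(W, b, x, label, alpha):
    """One gradient-descent update of W and b; returns the pre-update loss."""
    z = [sum(w * xi for w, xi in zip(W[k], x)) + b[k] for k in range(2)]
    loss, grad = loss_and_grad(z, label)
    new_W = [[w - alpha * grad[k] * xi for w, xi in zip(W[k], x)] for k in range(2)]
    new_b = [b[k] - alpha * grad[k] for k in range(2)]
    return new_W, new_b, loss


# Hypothetical starting point: zero weights, one sample of class 1.
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
x = [1.0, 2.0]
W1, b1, loss0 = sgd_step(W, b, x, label=1, alpha=0.1)
_, _, loss1 = sgd_step(W1, b1, x, label=1, alpha=0.1)
```

With zero weights both classes start at probability 0.5, and the loss measured after the first update is lower than the initial one, which is the behaviour the feedback of W and b is meant to achieve.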

S402: input the two-dimensional feature vector into the Euclidean Loss layer and calculate the face frame offset and the facial feature point offset.

In each level's network, the face frame offset and the facial feature point offset are regression-corrected by combining the Euclidean distance with a loss function. This corrects the face rectangle frame finally obtained and further improves the face recognition rate while preserving recognition speed.
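A Euclidean loss on frame coordinates can be sketched as half the sum of squared differences between prediction and annotation, which is the per-sample form used by Caffe's EuclideanLoss layer. Treating the offset as the element-wise difference between annotation and prediction is an assumption made here for illustration, as are the sample coordinates.

```python
def euclidean_loss(pred, target):
    """Half the squared Euclidean distance between prediction and target."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(pred, target))


def offset(pred, target):
    """Element-wise correction that moves the prediction onto the target."""
    return [t - p for p, t in zip(pred, target)]


# Hypothetical predicted and annotated face frames as (x1, y1, x2, y2).
pred_frame = [12.0, 10.0, 52.0, 48.0]
true_frame = [10.0, 10.0, 50.0, 50.0]

frame_offset = offset(pred_frame, true_frame)
frame_loss = euclidean_loss(pred_frame, true_frame)
corrected = [p + d for p, d in zip(pred_frame, frame_offset)]
```

The same two functions apply unchanged to a flattened list of landmark coordinates, since the loss does not care whether the numbers describe a frame or feature points.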

S103: input the detection picture into the three-level convolutional neural network and perform face detection level by level to obtain the face rectangle frame.

A detection result is the collective term for what each level of the three-level convolutional neural network produces when it classifies the input detection picture and detects the face position and facial feature points; it is the face candidate frame detected by that level. Corresponding to the three networks there are three detection results; each is screened, regression-corrected, and merged before being input into the next level for detection, and the face rectangle frame is finally obtained. The face rectangle frame is the rectangle frame obtained by first screening with a specific procedure, then correcting with the combination of the facial feature point offset and the face frame offset, and finally merging identical or similar face frames; it determines information such as the face position.

In step S103, obtaining the face rectangle frame comprises the following steps:

S601: input the detection picture into the first-level network, then screen, regression-correct, and merge the results to obtain the first face candidate frames;

S602: input the first face candidate frames into the second-level network, then screen, regression-correct, and merge the results to obtain the second face candidate frames;

S603: input the second face candidate frames into the third-level network, then screen, regression-correct, and merge the results to obtain the face rectangle frame.

The first-level network detects the first face candidate frames, the second-level network detects the second face candidate frames, and the third-level network detects the face rectangle frame (these three correspond to the three detection results in step S103). The first two detection results, after screening, regression correction, and merging, yield the second face candidate frames and the final face rectangle frame respectively. Further, after the first face candidate frames are obtained, they are cropped from the original picture, resized to 24*24 px, and input into the second-level network for detection; after the second face candidate frames are obtained, they are cropped from the original picture, resized to 48*48 px, and input into the third-level network for detection. After detection they are again screened, regression-corrected, and merged to obtain the face rectangle frame. Detecting level by level yields an accurate face rectangle frame (face position), which further improves the recall and accuracy of detection.
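Cropping a candidate frame from the original picture and resizing it to the input size of the next network can be sketched on a plain list-of-lists image. Nearest-neighbour sampling is used here only because it is the simplest choice; the source does not say which interpolation is used, and the synthetic image and candidate coordinates are made up.

```python
def crop(image, frame):
    """Cut the region (x1, y1, x2, y2) out of a row-major image."""
    x1, y1, x2, y2 = frame
    return [row[x1:x2] for row in image[y1:y2]]


def resize_nearest(patch, size):
    """Resize a patch to size x size by nearest-neighbour sampling."""
    height, width = len(patch), len(patch[0])
    return [
        [patch[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]


# Synthetic 100 x 100 image whose pixel value encodes its row and column.
image = [[r * 100 + c for c in range(100)] for r in range(100)]
candidate = (10, 20, 40, 60)  # hypothetical first face candidate frame

patch24 = resize_nearest(crop(image, candidate), 24)  # input for 24-net
patch48 = resize_nearest(crop(image, candidate), 48)  # input for 48-net
```

Both patches are cut from the original picture rather than from an earlier resized patch, as the text specifies.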

In step S103, the screening, regression correction and merging comprise the following steps:

S701, according to the detection picture / first face candidate frame / second face candidate frame and the corresponding face probability, selecting the face candidate frames whose face probability exceeds the set probability threshold;

S702, calculating the second offset from the face candidate frames obtained after screening, and performing regression correction on them using the second offset;

S703, merging the corrected face candidate frames by a non-maximum suppression algorithm to obtain the first face candidate frame / second face candidate frame / face rectangle frame.

The face probability is the probability that a picture contains a face, where the picture belongs to the face picture set obtained by classifying parts of the detection picture as face pictures. The face probability is compared with the set probability threshold; face candidate frames whose probability is below the set value are deleted, leaving the screened face candidate frames. The second offset is calculated by the SoftmaxWithLoss layer and the Euclidean Loss layer. It includes the picture classification offset in the detection process, the offset of the detected face frame and the offset of the detected facial feature points, and these offsets are used to perform regression correction on the screened face candidate frames, giving the corrected face candidate frames. The corrected face candidate frames are then merged by the non-maximum suppression algorithm: the face frames are sorted by face probability, the overlap between the frame with the highest probability and each other frame is calculated, and any frame whose overlap exceeds a certain threshold is deleted. The frames are thereby merged, and the first face candidate frame / second face candidate frame / face rectangle frame is obtained. Screening, regression correction and frame merging further improve the recall rate and the accuracy of face detection while maintaining the detection speed.
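Steps S701 to S703 can be illustrated with the sketch below. It is a simplified example under several assumptions, not the implementation of the embodiment: the function names are hypothetical, the overlap is taken to be intersection-over-union, only the face frame offset is applied (the classification and feature point offsets are left out), and the offset is assumed to be expressed as a fraction of the frame width and height.

```python
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2, face probability); this layout is an assumption of the sketch.
Candidate = Tuple[float, float, float, float, float]


def screen(candidates: Sequence[Candidate], prob_threshold: float) -> List[Candidate]:
    """S701: delete candidates whose face probability is below the threshold."""
    return [c for c in candidates if c[4] >= prob_threshold]


def correct(candidate: Candidate, offsets: Sequence[float]) -> Candidate:
    """S702: shift the frame corners by the face frame offset.

    Assumption: offsets = (dx1, dy1, dx2, dy2) as fractions of width/height.
    """
    x1, y1, x2, y2, prob = candidate
    w, h = x2 - x1, y2 - y1
    dx1, dy1, dx2, dy2 = offsets
    return (x1 + dx1 * w, y1 + dy1 * h, x2 + dx2 * w, y2 + dy2 * h, prob)


def overlap(a: Candidate, b: Candidate) -> float:
    """Intersection-over-union of two frames (assumed overlap measure)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge(candidates: Sequence[Candidate], overlap_threshold: float) -> List[Candidate]:
    """S703: non-maximum suppression over frames sorted by face probability."""
    remaining = sorted(candidates, key=lambda c: c[4], reverse=True)
    kept: List[Candidate] = []
    while remaining:
        best = remaining.pop(0)          # frame with the highest probability
        kept.append(best)
        # Delete every frame that overlaps the chosen frame too strongly.
        remaining = [c for c in remaining if overlap(best, c) <= overlap_threshold]
    return kept
```

For example, `merge(screen(frames, 0.6), 0.5)` keeps only frames with a face probability of at least 0.6 and then removes frames that overlap a more probable frame by more than 0.5; both threshold values are illustrative only.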

As shown in Fig. 8, an embodiment of the present invention further provides a face detection system based on three-level convolutional neural networks. The system includes the three-level convolutional neural networks, which include:

Acquiring unit 1, configured to obtain training samples and a detection picture; the training samples at least include face pictures annotated with facial feature points;

Network training unit 2, configured to input the training samples into the three-level convolutional neural networks for level-by-level training;

It includes a feature vector module 21 and a regression correction module 22:

The feature vector module 21, configured to perform prediction and then dimension reduction according to the training samples and the training results of the previous n levels, obtain the corresponding two-dimensional feature vector, and calculate the first offset from it;

The regression correction module 22, configured to perform regression correction on the two-dimensional feature vector using the first offset, obtaining the corresponding training result;

Face detection unit 3, configured to input the detection picture into the trained three-level convolutional neural networks for level-by-level face detection, obtaining the face rectangle frame.
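The dimension reduction performed by the feature vector module 21 can be sketched as follows, using the branch structure and the softmax expression recited in the claims. This is a minimal illustration only: the function names, the branch vector sizes and the weight values are hypothetical, and a single fully connected layer is assumed for the reduction to two dimensions.

```python
import math
from typing import List, Sequence


def concatenate(*branch_vectors: Sequence[float]) -> List[float]:
    """Splice the branch feature vectors into one m-dimensional vector."""
    return [value for vector in branch_vectors for value in vector]


def fully_connected(x: Sequence[float],
                    weights: Sequence[Sequence[float]],
                    bias: Sequence[float]) -> List[float]:
    """Reduce x to len(weights) dimensions: z_i = sum_k w_ik * x_k + b_i."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(z: Sequence[float]) -> List[float]:
    """y_i = e^{z_i} / sum_j e^{z_j}, shifted by max(z) for numerical stability."""
    shift = max(z)
    exps = [math.exp(value - shift) for value in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical branch outputs of the third-level network (sizes are illustrative).
m_vector = concatenate([1.0, 2.0], [3.0], [4.0, 5.0])        # m = 5
# Hypothetical 2 x m weight matrix reducing the vector to two dimensions.
z = fully_connected(m_vector, [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0]], [0.0, 0.0])
y = softmax(z)   # y[0]: non-face probability, y[1]: face probability (assumed order)
```

Subtracting the maximum before exponentiation does not change the result of the softmax expression; it only avoids overflow for large inputs.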

The above describes only certain exemplary embodiments of the present invention by way of illustration. Needless to say, those of ordinary skill in the art may modify the described embodiments in various ways without departing from the spirit and scope of the present invention. Therefore, the above drawings and description are illustrative in nature and should not be construed as limiting the scope of the claims of the present invention.

Claims

1. A face detection method based on three-level convolutional neural networks, characterized in that it comprises the following steps:

Obtaining training samples and a detection picture; the training samples at least include face pictures annotated with a face frame and facial feature points;

Performing prediction and then dimension reduction according to the training samples and the training results of the previous n levels, obtaining the corresponding two-dimensional feature vector, and calculating the first offset from it;

Inputting the detection picture into the trained three-level convolutional neural networks for level-by-level face detection, obtaining the face rectangle frame.

2. The face detection method according to claim 1, characterized in that the face pictures in the training samples further carry a picture classification label.

3. The face detection method according to claim 1, characterized in that obtaining the two-dimensional feature vector comprises the following step:

Performing dimension reduction on the m-dimensional feature vector through a fully convolutional layer / fully connected layer to obtain the two-dimensional feature vector.

4. The face detection method according to claim 1, characterized in that the three-level convolutional neural networks include a first-level network, a second-level network and a third-level network; the third-level network includes a first branch, a second branch and a third branch; the second-level network includes the first branch and the second branch; and the first branch is identical to the first-level network.

5. The face detection method according to claim 3 or 4, characterized in that, in the third-level network, obtaining the m-dimensional feature vector comprises the following steps:

Inputting the training samples and the training result of the previous level into the first branch to obtain a first feature vector, into the second branch to obtain a second feature vector, and into the third branch to obtain a third feature vector;

Concatenating the first feature vector, the second feature vector and the third feature vector to obtain the m-dimensional feature vector.

6. The face detection method according to claim 1, characterized in that obtaining the first offset comprises the following step:

Inputting the two-dimensional feature vector into the Euclidean Loss layer, and calculating the face frame offset and the facial feature point offset.

7. The face detection method according to claim 6, characterized in that calculating the classification offset comprises the following steps:

y_{1} = h_{\theta}(z_{1}) = \frac{e^{z_{1}}}{\sum_{j=1}^{2} e^{z_{j}}}, \quad y_{2} = h_{\theta}(z_{2}) = \frac{e^{z_{2}}}{\sum_{j=1}^{2} e^{z_{j}}};

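The loss function is: [formula not reproduced in the source text]

Wherein [expression not reproduced], calculate [expression not reproduced];

Correct [expression not reproduced], wherein α is a coefficient.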

8. The face detection method according to claim 1, characterized in that obtaining the face rectangle frame comprises the following steps:

Inputting the detection picture into the first-level network, and screening, regression-correcting and merging the result to obtain the first face candidate frame;

Inputting the first face candidate frame into the second-level network, and screening, regression-correcting and merging the result to obtain the second face candidate frame;

9. The face detection method according to claim 8, characterized in that the screening, regression correction and merging comprise the following steps:

According to the detection picture / first face candidate frame / second face candidate frame and the corresponding face probability, selecting the face candidate frames whose face probability exceeds the set probability threshold;

Calculating the second offset from the face candidate frames obtained after screening, and performing regression correction on them using the second offset;

Merging the corrected face candidate frames by a non-maximum suppression algorithm to obtain the first face candidate frame / second face candidate frame / face rectangle frame.

10. A face detection system based on three-level convolutional neural networks, characterized in that it includes three-level convolutional neural networks, the three-level convolutional neural networks including:

An acquiring unit, configured to obtain training samples and a detection picture; the training samples at least include face pictures annotated with facial feature points;

It includes: a feature vector module and a regression correction module,

The feature vector module, configured to perform prediction and then dimension reduction according to the training samples and the training results of the previous n levels, obtain the corresponding two-dimensional feature vector, and calculate the first offset from it;

The regression correction module, configured to perform regression correction on the two-dimensional feature vector using the first offset, obtaining the corresponding training result;

A face detection unit, configured to input the detection picture into the trained three-level convolutional neural networks for level-by-level face detection, obtaining the face rectangle frame.