CN106874868A - Face detection method and system based on a three-level convolutional neural network
- Publication number
- CN106874868A (application CN201710078431.3A)
- Authority
- CN
- China
- Prior art keywords
- face
- network
- feature vector
- training
- convolutional neural
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a face detection method and system based on a three-level convolutional neural network. The method has the following advantages. During training, the training results of the preceding n levels are added to the input of each subsequent level, which compensates for missing training data, improves the accuracy and recall of face detection, and improves the performance of the overall network. Facial landmarks are added to the training samples; they improve both face classification and the positioning precision of the face rectangle, bringing the network closer to its performance upper limit and further improving recall and accuracy. Only the classification offset within the computed first (or second) offset is used for the regression correction of picture classification, so correctly classified parts are not corrected again, which speeds up face detection and further exploits the network's performance. The system has the same beneficial effects as the method.
Description
Technical field
The present invention relates to the field of face detection technology, and in particular to a face detection method and system based on a three-level convolutional neural network.
Background
Since the start of the 21st century, computer technology has developed rapidly and has been widely applied in many fields. Along with it, face detection technology emerged and has been continuously iterated and updated. Face detection means searching an arbitrary set of images with some strategy to determine which images contain a face.
Face detection is a key step in an automatic face recognition system. Early face recognition research mainly addressed face images under strong constraints (for example, images without a background) and often assumed that the face location was known or easy to obtain, so the face detection problem received little attention.
With the development of applications such as e-commerce, face recognition has become one of the most promising means of biometric identity verification. This application background requires an automatic face recognition system to handle general images, and the resulting series of problems led researchers to treat face detection as an independent problem. Today the applications of face detection go far beyond face recognition systems; it has important value in content-based retrieval, digital video processing, video detection, face modeling, and face tracking.
The search strategies typically used in face detection include decision trees, logistic regression, naive Bayes, and three-level convolutional neural networks. Among these, methods and systems based on a three-level convolutional neural network are known for fast detection, high recognition accuracy, and rapid iteration. A prior-art face detection method based on a three-level convolutional neural network: 1) trains several networks of progressively stronger performance level by level, passing the candidate boxes that the previous level judged to be faces to the next level as training samples; 2) makes its decision at each level through a face classification network and a face box regression network; 3) feeds back all of the corrected data even when the classification is already correct.
The shortcomings of the prior art are as follows. Because the earlier-level network performs poorly, some faces are not judged correctly, so face candidate boxes are lost before they reach the next level and overall performance suffers. Face classification and face box regression alone cannot reach the network's performance upper limit, so there is still room for improvement. All data are fed back, so the depth of network learning is insufficient and the network's performance cannot be fully exploited.
Summary of the invention
The object of the invention is to provide a face detection method and system based on a three-level convolutional neural network that addresses the following problems: poor overall performance; failure to reach the network's performance upper limit when correction relies only on face classification and the face box; and regression correction still being applied to correctly classified parts.
To achieve this object, the invention provides the following technical solution.
A face detection method based on a three-level convolutional neural network, comprising the following steps:
Obtain training samples and detection pictures; the training samples include at least face pictures annotated with a face box and facial landmarks;
Input the training samples into the three-level convolutional neural network and train it level by level, where the training process is:
Perform prediction and then dimensionality reduction according to the training samples and the training results of the preceding n levels to obtain the corresponding two-dimensional feature vector, and calculate the first offset from it;
Apply regression correction to the two-dimensional feature vector using the first offset to obtain the corresponding training result;
Input the detection picture into the trained three-level convolutional neural network and perform face detection level by level to obtain the face rectangle.
In the above face detection method based on a three-level convolutional neural network, the face pictures in the training samples also carry a picture classification label and a uniquely determined face box.
In the above face detection method based on a three-level convolutional neural network, obtaining the two-dimensional feature vector comprises the following steps:
Obtain an m-dimensional feature vector according to the training samples and the training results of the preceding n levels;
Reduce the dimensionality of the m-dimensional feature vector through a fully convolutional layer or fully connected layer to obtain the two-dimensional feature vector.
In the above face detection method based on a three-level convolutional neural network, the third-level network comprises a first branch, a second branch, and a third branch; the second-level network comprises the first branch and the second branch; and the first branch is identical to the first-level network.
In the above face detection method based on a three-level convolutional neural network, obtaining the m-dimensional feature vector in the third-level network comprises the following steps:
Input the training samples and the training result of the previous level into the first branch to obtain a first feature vector, into the second branch to obtain a second feature vector, and into the third branch to obtain a third feature vector;
Concatenate the first, second, and third feature vectors to obtain the m-dimensional feature vector.
In the above face detection method based on a three-level convolutional neural network, obtaining the first offset comprises the following steps:
Input the two-dimensional feature vector into a SoftmaxWithLoss layer and calculate the classification offset;
Input the two-dimensional feature vector into a Euclidean Loss layer and calculate the face box offset and the facial landmark offset.
In the above face detection method based on a three-level convolutional neural network, calculating the classification offset comprises the following steps:
Define the two-dimensional feature vector as $Z = \{z_1, z_2\}$, where [expression missing in the source text];
Classify with the softmax function, specialized to two classes: [formula missing in the source text];
Use a loss function to calculate the difference between the predicted two-dimensional feature vector and the training sample;
The loss function is: [formula missing in the source text]
where [expression missing in the source text]. Calculate [expression missing in the source text].
Update [expression missing in the source text], where α is a coefficient.
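The classification-offset steps above can be illustrated with a minimal NumPy sketch. It assumes the standard two-class softmax followed by a cross-entropy loss, which is what a SoftmaxWithLoss layer conventionally computes. The patent's own formulas are missing from this text, so the function names, the label encoding (index 1 = face), and the step size `alpha` are illustrative assumptions rather than the patent's definitions.

```python
import numpy as np

def softmax2(z):
    """Two-class softmax of a 2-D feature vector z = (z1, z2)."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()

def classification_loss_and_grad(z, label):
    """Cross-entropy loss and its gradient with respect to z.

    label is the index of the true class (assumed: 0 = non-face, 1 = face).
    """
    p = softmax2(z)
    loss = -np.log(p[label])
    grad = p.copy()
    grad[label] -= 1.0               # d(loss)/dz = p - one_hot(label)
    return loss, grad

z = np.array([0.2, 1.5])             # hypothetical network output
loss, grad = classification_loss_and_grad(z, label=1)
alpha = 0.1                          # hypothetical step size
z_corrected = z - alpha * grad       # one gradient step on z
new_loss, _ = classification_loss_and_grad(z_corrected, label=1)
```

In a real network the gradient would be propagated back to the weights and biases rather than applied to `z` directly; the step on `z` is only meant to show that moving against the gradient lowers the loss.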
In the above face detection method based on a three-level convolutional neural network, obtaining the face rectangle comprises the following steps:
Input the detection picture into the first-level network, then screen, regression-correct, and merge the output to obtain the first face candidate boxes;
Input the first face candidate boxes into the second-level network, then screen, regression-correct, and merge the output to obtain the second face candidate boxes;
Input the second face candidate boxes into the third-level network, then screen, regression-correct, and merge the output to obtain the face rectangle.
In the above face detection method based on a three-level convolutional neural network, the screening, regression correction, and merging comprise the following steps:
From the detection picture / first face candidate boxes / second face candidate boxes and the corresponding face probabilities, keep the face candidate boxes whose probability exceeds a set threshold;
Calculate the second offset from the face candidate boxes that remain after screening, and apply regression correction to them using the second offset;
Merge the corrected face candidate boxes with the non-maximum suppression algorithm to obtain the first face candidate boxes / second face candidate boxes / face rectangle.
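The screen, correct, and merge sequence can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the box format `(x1, y1, x2, y2)`, the additive per-coordinate form of the second offset, and the two threshold values are all assumptions made for the example.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def screen_correct_merge(boxes, probs, offsets, prob_thresh=0.6, iou_thresh=0.5):
    boxes = np.asarray(boxes, dtype=float)
    probs = np.asarray(probs, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    # 1) Screening: keep candidates above the probability threshold.
    keep = probs > prob_thresh
    boxes, probs, offsets = boxes[keep], probs[keep], offsets[keep]
    # 2) Regression correction: shift each box by its offset (assumed additive).
    boxes = boxes + offsets
    # 3) Merging: greedy non-maximum suppression, highest probability first.
    order = np.argsort(-probs)
    selected = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        selected.append(best)
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thresh]
    return boxes[selected], probs[selected]

boxes = [[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140], [0, 0, 5, 5]]
probs = [0.9, 0.8, 0.95, 0.3]
offsets = np.zeros((4, 4))
merged_boxes, merged_probs = screen_correct_merge(boxes, probs, offsets)
```

In this example the low-probability box is screened out and the two heavily overlapping boxes collapse into the higher-scoring one, leaving two candidates for the next level.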
The face detection method based on a three-level convolutional neural network provided by the invention has the following beneficial effects:
1) During training, the training results of the preceding n levels are added to the input of each subsequent level, which compensates for missing training data, improves the accuracy and recall of face detection, and improves the performance of the overall network;
2) Facial landmarks are added to the training samples; they improve face classification and the positioning precision of the face rectangle, bringing the network closer to its performance upper limit and further improving the recall and accuracy of face detection;
3) Only the classification offset within the computed first (or second) offset is used for the regression correction of picture classification, so correctly classified parts are not corrected again, which speeds up face detection and further exploits the network's performance.
A face detection system based on a three-level convolutional neural network, comprising a three-level convolutional neural network, which comprises:
An acquiring unit, for obtaining training samples and detection pictures; the training samples include at least face pictures annotated with facial landmarks;
A network training unit, for inputting the training samples into the three-level convolutional neural network and training it level by level;
The network training unit comprises a feature vector module and a regression correction module;
The feature vector module performs prediction and then dimensionality reduction according to the training samples and the training results of the preceding n levels to obtain the corresponding two-dimensional feature vector, and calculates the first offset from it;
The regression correction module applies regression correction to the two-dimensional feature vector using the first offset to obtain the corresponding training result;
A face detection unit, for inputting the detection picture into the trained three-level convolutional neural network and performing face detection level by level to obtain the face rectangle.
The face detection system based on a three-level convolutional neural network provided by the invention has the following beneficial effects:
1) The second-level and third-level networks in network training unit 2 (or face detection unit 3) compensate for the poor performance of the first-level network and improve the accuracy of picture classification, which improves the recall and accuracy of face detection and the performance of the overall network;
2) Facial landmarks are added to the face pictures in the training samples of acquiring unit 1; they improve face classification and the positioning precision of the face rectangle, bringing the network closer to its performance upper limit and further improving the recall and accuracy of face detection;
3) Only the classification offset obtained through the cooperation of feature vector module 21 and regression correction module 22 is used for the regression correction of picture classification, so correctly classified parts need no correction, which speeds up face detection and further exploits the network's performance.
Brief description of the drawings
To illustrate the embodiments of the present application or the prior-art technical solutions more clearly, the drawings needed in the embodiments are briefly described below. Evidently, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a block diagram of the face detection method based on a three-level convolutional neural network provided by an embodiment of the invention;
Fig. 2 is a flow diagram of the face detection method based on a three-level convolutional neural network provided by one embodiment of the invention;
Fig. 3 is a flow diagram of the face detection method based on a three-level convolutional neural network provided by one embodiment of the invention;
Fig. 4 is a flow diagram of the face detection method based on a three-level convolutional neural network provided by one embodiment of the invention;
Fig. 5 is a flow diagram of the face detection method based on a three-level convolutional neural network provided by one embodiment of the invention;
Fig. 6 is a flow diagram of the face detection method based on a three-level convolutional neural network provided by one embodiment of the invention;
Fig. 7 is a flow diagram of the face detection method based on a three-level convolutional neural network provided by one embodiment of the invention;
Fig. 8 is a structural diagram of the face detection system based on a three-level convolutional neural network provided by an embodiment of the invention;
Fig. 9 is a structural diagram of the first-level network provided by one embodiment of the invention;
Fig. 10 is a structural diagram of the second-level network provided by one embodiment of the invention;
Fig. 11 is a structural diagram of the third-level network provided by one embodiment of the invention.
Description of reference numerals:
1. acquiring unit; 2. network training unit; 21. feature vector module; 22. regression correction module; 3. face detection unit.
Detailed description
To help those skilled in the art better understand the technical solution of the invention, the invention is described in further detail below with reference to the drawings.
As shown in Figs. 1-7 and 9-11, the face detection method based on a three-level convolutional neural network provided by an embodiment of the invention comprises the following steps:
S101: Obtain training samples and detection pictures; the training samples include at least face pictures annotated with a face box and facial landmarks.
As shown in Figs. 9-11, the three-level convolutional neural network comprises a first-level network, a second-level network, and a third-level network. The third-level network comprises a first branch, a second branch, and a third branch; the second-level network comprises the first branch and the second branch; and the first branch has the same network structure as the first-level network. For ease of reference, the figures denote the first-level network as 12-net, the second-level network as 24-net, and the third-level network as 48-net. That is, 24-net contains a 12-net branch and a 24-net branch, and 48-net contains a 12-net branch, a 24-net branch, and a 48-net branch. 12-net, 24-net, and 48-net are connected in sequence, so the training samples can be filtered level by level: pictures without faces are excluded, leaving accurate face pictures and correspondingly more accurate face boxes (which determine the face position).
Further, the face pictures in the training samples also carry a picture classification label. Specifically, the training samples consist of face pictures that carry a classification label, a uniquely determined face box, and annotated facial landmark information, together with other pictures. The classification label enables picture classification training, that is, dividing the training samples into two sets: labeled face pictures and other pictures. The face box determines the rectangular region that the face occupies in the face picture, so enclosing that region determines the face location. Facial landmarks (landmark points) are prominent parts such as the nose, eyes, mouth, forehead, and facial contour, which make it easier to tell faces apart. Locating a face by the face box alone introduces error, and the landmarks allow more precise positioning: the face box is enlarged or shrunk so that the landmarks fall within it, which improves the localization precision of the face box. The detection pictures are a set of face pictures, environment pictures, and any other images; once training is complete, face detection can be run on them. Training samples can be obtained by retrieving an existing face database, or by producing face pictures through means such as 3D printing, then adding the classification label, the uniquely determined face box, and the landmark annotations, and mixing the result with other pictures.
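The idea of adjusting a face box so that the landmarks fall inside it can be sketched in a few lines. The example below covers only the enlarging case, and the box format `(x1, y1, x2, y2)`, the function name, and the sample coordinates are assumptions made for illustration; the patent does not specify how the adjustment is computed.

```python
import numpy as np

def fit_box_to_landmarks(box, landmarks):
    """Enlarge (x1, y1, x2, y2) just enough to contain every (x, y) landmark."""
    box = np.asarray(box, dtype=float)
    pts = np.asarray(landmarks, dtype=float)
    x1 = min(box[0], pts[:, 0].min())
    y1 = min(box[1], pts[:, 1].min())
    x2 = max(box[2], pts[:, 0].max())
    y2 = max(box[3], pts[:, 1].max())
    return np.array([x1, y1, x2, y2])

# Hypothetical face box and five landmarks; two landmarks lie outside the box.
box = [30, 30, 80, 90]
landmarks = [(40, 50), (70, 50), (55, 65), (45, 95), (85, 80)]
adjusted = fit_box_to_landmarks(box, landmarks)
```

Here the right and bottom edges move outward to cover the two stray landmarks, while the edges that already contain all landmarks are left unchanged.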
S102: Input the training samples into the three-level convolutional neural network and train it level by level.
Level-by-level training means training the three-level convolutional neural network in the order first-level network, second-level network, third-level network. The network is capable of learning: through training it learns how to classify pictures and how to enclose the relevant region of a picture with a rectangle, and the position of that rectangle can be corrected further by introducing facial landmarks. When a large number of different pictures are then input, the trained network performs face classification and localization.
In step S102, the training comprises the following steps:
S1021: Perform prediction and then dimensionality reduction according to the training samples and the training results of the preceding n levels to obtain the corresponding two-dimensional feature vector, and calculate the first offset from it.
A training result is the result obtained after prediction, dimensionality reduction, and regression correction by one level of the three-level convolutional neural network. When training samples are input to the first-level network, the training results of the preceding n levels are empty; when they are input to the second-level network, the preceding results are the training result of the first-level network; when they are input to the third-level network, the preceding results are the training results of the first-level and second-level networks. Prediction and dimensionality reduction mean that, during training, the network classifies the input training samples, predicts the face location, and converts the result into a two-dimensional feature vector that is convenient to compute with. The first offset is the difference between the two-dimensional feature vector obtained after prediction and dimensionality reduction and the training sample, chiefly the difference between the predicted values and the sample's classification label, uniquely determined face box, and annotated facial landmarks; in other words, the difference between the values after and before prediction. Preferably, this difference is calculated with a loss function. Because each level's training result and the training samples are input to the next level, the next level compensates for the poor performance of the previous level. This improves the accuracy of picture classification, and with it the recall and accuracy of face detection and the performance of the overall network.
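How the input of each level accumulates the results of the levels before it can be sketched as a small loop. The function name, the `train_fns` callables, and the dictionary of results are hypothetical placeholders for illustration; the real per-level training procedure is not shown here.

```python
def train_level_by_level(samples, train_fns):
    """Train each level in order, passing it the results of all earlier levels.

    train_fns is a list of callables, one per level; each takes
    (samples, previous_results) and returns that level's training result.
    """
    results = {}
    for level, train_fn in enumerate(train_fns, start=1):
        previous = [results[k] for k in range(1, level)]  # empty for level 1
        results[level] = train_fn(samples, previous)
    return results

# Stub "training" functions that only report how many earlier results they saw.
def make_stub(name):
    return lambda samples, previous: f"{name} trained with {len(previous)} previous result(s)"

results = train_level_by_level(
    samples=["img_a", "img_b"],
    train_fns=[make_stub("12-net"), make_stub("24-net"), make_stub("48-net")],
)
```

The first level receives an empty list, the second level receives one earlier result, and the third level receives two, matching the description above.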
In step S1021, obtaining the two-dimensional feature vector comprises the following steps:
S201: Obtain an m-dimensional feature vector according to the training samples and the training results of the preceding n levels.
The structure before the fully convolutional layer or fully connected layer is the prediction structure, which produces the m-dimensional feature vector of each network. Because the first-level, second-level, and third-level networks differ in structure, and the pictures input to them for training also differ, the m-dimensional feature vectors they produce differ as well. The second-level network corrects the errors made by the first-level network's prediction, and likewise the third-level network corrects the second-level network. The correction targets two kinds of error that can appear in the prediction results of the first-level or second-level network: pictures that carry a label but were not assigned to the face picture set, and pictures that carry no label but were assigned to it. The second-level and third-level networks greatly reduce the probability of these errors, which gives the three-level convolutional neural network a self-correcting ability.
In step S201, obtaining the m-dimensional feature vector in the third-level network comprises the following steps:
S301: Input the training samples and the training result of the previous level into the first branch to obtain a first feature vector, into the second branch to obtain a second feature vector, and into the third branch to obtain a third feature vector.
S302: Concatenate the first, second, and third feature vectors to obtain the m-dimensional feature vector.
The prediction structure in each level has a concatenation function. In the third-level network, the branches run separately and produce different feature vectors (the first, second, and third feature vectors) whose dimensions differ; these vectors are concatenated to obtain the m-dimensional feature vector. The second-level network concatenates in the same way but has one branch fewer, so there is no third feature vector. The first-level network has only one branch, so the concatenation result is simply that branch's output. This prepares for the conversion to a two-dimensional feature vector; representing the face as a vector makes the computation more convenient. Specifically, the corresponding training data are input into the three branches separately. The first branch is identical to 12-net and yields an m-dimensional feature vector (16 dimensions in this example) before the fully convolutional layer. The second branch yields an n-dimensional face feature vector (128 dimensions in this example) after the layer preceding the 24-net fully connected layer. The third branch yields a p-dimensional face feature vector (256 dimensions in this example) after the layer preceding the 48-net fully connected layer. The three feature vectors are then concatenated. Taking the feature vector of 12-net, the feature vector of 24-net, and the feature vector of 48-net, concatenating the three gives a 400-dimensional, that is (m+n+p)-dimensional, vector X4, which is then passed through the fully connected layer.
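The concatenation described above can be checked with a short NumPy sketch. The 16, 128, and 256 dimensions come from the example in the text; the random vectors merely stand in for real branch outputs, and the function and variable names are illustrative.

```python
import numpy as np

def concat_branch_features(*features):
    """Concatenate per-branch feature vectors into one vector."""
    return np.concatenate([np.asarray(f, dtype=float).ravel() for f in features])

rng = np.random.default_rng(0)      # random stand-ins for real branch outputs
f12 = rng.standard_normal(16)       # 12-net branch
f24 = rng.standard_normal(128)      # 24-net branch
f48 = rng.standard_normal(256)      # 48-net branch

x_level3 = concat_branch_features(f12, f24, f48)   # 16 + 128 + 256 = 400
x_level2 = concat_branch_features(f12, f24)        # 16 + 128 = 144
x_level1 = concat_branch_features(f12)             # single branch, unchanged
```

The same helper covers all three levels, since the second-level and first-level cases are just concatenations over fewer branches.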
S202: Reduce the dimensionality of the m-dimensional feature vector through the fully convolutional layer or fully connected layer to obtain the two-dimensional feature vector.
Before the fully convolutional layer there is a prediction structure that performs the prediction. From the training samples, it places what it considers the face picture set into one class and the other pictures into another class, obtains the predicted face box and predicted facial landmarks for the face picture set, and expresses the result as an m-dimensional feature vector. The fully convolutional layer reduces a multi-dimensional feature vector to two dimensions: passing the m-dimensional feature vector through it yields the two-dimensional feature vector, which simplifies calculating the offset between the predicted values and the training sample.
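As a rough illustration of this reduction, a fully connected layer is an affine map, so a 400-dimensional input becomes a two-dimensional output through a 2 x 400 weight matrix. The weights below are random placeholders rather than trained values, and all names are assumptions made for the sketch.

```python
import numpy as np

def fully_connected(x, weights, bias):
    """Affine map of a fully connected layer: weights @ x + bias."""
    return weights @ x + bias

rng = np.random.default_rng(0)
x4 = rng.standard_normal(400)                   # stand-in for the concatenated feature vector
weights = rng.standard_normal((2, 400)) * 0.01  # placeholder weights, not trained values
bias = np.zeros(2)

z = fully_connected(x4, weights, bias)          # two-dimensional feature vector (z1, z2)
```

The two components of `z` are what a softmax-based classification loss would then consume.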
S1022: Apply regression correction to the two-dimensional feature vector using the first offset to obtain the corresponding training result.
Each level has a feedback structure after the fully convolutional or fully connected layer, through which the predicted values are regression-corrected. Regression correction means compensating the predicted values with the first offset, correcting the offsets in classification, in the face box, and in the facial landmarks, so that face classification and localization become more accurate and the final face box is more accurate as well. Parts that the network has already classified correctly receive no classification regression correction, which further exploits the network's performance and preserves detection speed.
In step S1022, obtaining the first offset comprises the following steps:
S401: Input the two-dimensional feature vector into the SoftmaxWithLoss layer and calculate the classification offset.
Once the two-dimensional feature vector is obtained, the SoftmaxWithLoss layer calculates the classification offset, and the calculated weight W and bias term b are fed back. In other words, the classification offset drives the regression correction of the classification, which improves the recall and accuracy of classification.
In step S401, calculating the classification offset comprises the following steps:
S501: Define the two-dimensional feature vector as $Z = \{z_1, z_2\}$, where [expression missing in the source text].
S502: Classify with the softmax function, specialized to two classes: [formula missing in the source text].
S503: Use a loss function to calculate the difference between the predicted two-dimensional feature vector and the training sample.
The loss function is: [formula missing in the source text]
where [expression missing in the source text]. Calculate [expression missing in the source text].
Update [expression missing in the source text], where α is a coefficient.
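The formulas for steps S501 to S503 did not survive in this text. Assuming the SoftmaxWithLoss layer has its conventional meaning (a softmax followed by a multinomial logistic loss), the standard forms would read as follows. The symbols $p_i$ (predicted class probability), $y_i$ (one-hot label), and $\theta$ (any trainable parameter, such as W or b) are introduced here for illustration and are not taken from the patent; only $z_1$, $z_2$, and the coefficient $\alpha$ appear in the source.

```latex
\begin{align}
p_i &= \frac{e^{z_i}}{e^{z_1} + e^{z_2}}, \qquad i \in \{1, 2\} \\
L &= -\sum_{i=1}^{2} y_i \log p_i \\
\frac{\partial L}{\partial z_i} &= p_i - y_i \\
\theta &\leftarrow \theta - \alpha \, \frac{\partial L}{\partial \theta}
\end{align}
```

Under these assumptions, the gradient $p_i - y_i$ plays the role of the classification offset, and the last line is an ordinary gradient-descent update with step size $\alpha$.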
S402, inputting the two-dimensional feature vector into the Euclidean Loss layer and calculating the face box offset and the facial feature point offset.
In each level of network, regression correction of the face box offset and the facial feature point offset is carried out through the combination of the Euclidean distance and the loss function. This corrects the finally obtained face rectangular box and further improves the face recognition rate while maintaining face recognition speed.
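The two loss layers named in steps S401 and S402 can be illustrated with a minimal sketch. The formulas of the original steps are not reproduced in this text, so the sketch assumes the standard two-class softmax, the cross-entropy loss, a half squared Euclidean distance, and a plain gradient step with coefficient α. The function names `softmax2`, `softmax_loss`, `euclidean_loss`, and `sgd_step` are hypothetical and are not taken from the patent or from any library.

```python
import math

def softmax2(z1, z2):
    """Two-class softmax over the two-dimensional feature vector Z = {z1, z2}."""
    m = max(z1, z2)  # subtract the maximum for numerical stability
    e1, e2 = math.exp(z1 - m), math.exp(z2 - m)
    s = e1 + e2
    return e1 / s, e2 / s

def softmax_loss(z, label):
    """Assumed classification loss: cross-entropy for label 0 (non-face) or 1 (face).

    Returns the loss and its gradient with respect to z (p_i - y_i).
    """
    p = softmax2(z[0], z[1])
    loss = -math.log(p[label])
    grad = [p[0] - (1.0 if label == 0 else 0.0),
            p[1] - (1.0 if label == 1 else 0.0)]
    return loss, grad

def euclidean_loss(pred, target):
    """Assumed regression loss: half the squared Euclidean distance.

    Usable for face box coordinates or facial feature point coordinates.
    Returns the loss and its gradient with respect to pred.
    """
    diff = [p - t for p, t in zip(pred, target)]
    loss = 0.5 * sum(d * d for d in diff)
    return loss, diff

def sgd_step(params, grads, alpha):
    """Assumed correction rule: param <- param - alpha * grad."""
    return [w - alpha * g for w, g in zip(params, grads)]

# Usage: one classification offset and one box offset for a single sample.
cls_loss, cls_grad = softmax_loss((0.3, 1.2), 1)
box_loss, box_grad = euclidean_loss([0.1, 0.1, 0.9, 0.8], [0.0, 0.0, 1.0, 1.0])
corrected = sgd_step([0.3, 1.2], cls_grad, alpha=0.1)
```

In a real network the gradients would be propagated back to the weight W and bias term b of the preceding layers; here the step is applied directly to the scores only to keep the example short.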
S103, inputting the detection picture into the three-level convolutional neural network for level-by-level face detection to obtain the face rectangular box.
A detection result is the collective term for the classification of the input detection picture by a given level of the three-level convolutional neural network together with the face location and facial feature points it detects; it is the face candidate box obtained by that network's detection. Corresponding to the three networks, there are three detection results. Each is screened, regression-corrected, and merged before being input into the next level for detection, and the face rectangular box is finally obtained. The face rectangular box is obtained by first screening with a specific procedure, then correcting with the combination of the facial feature point offset and the face box offset, and then merging identical or similar face boxes; it determines information such as the face location.
In step S103, obtaining the face rectangular box comprises the following steps:
S601, inputting the detection picture into the first-level network, and screening, regression-correcting, and merging the result to obtain the first face candidate box;
S602, inputting the first face candidate box into the second-level network, and screening, regression-correcting, and merging the result to obtain the second face candidate box;
S603, inputting the second face candidate box into the third-level network, and screening, regression-correcting, and merging the result to obtain the face rectangular box.
The first-level network detects the first face candidate box, the second-level network detects the second face candidate box, and the third-level network detects the face rectangular box (these three correspond to the three detection results in step S103). The first two detection results are screened, regression-corrected, and merged to obtain the second face candidate box and the final face rectangular box, respectively. Further, once the first face candidate box is obtained, it is cropped from the original picture, resized to 24*24 px, and input into the second network for detection; once the second face candidate box is obtained, it is cropped from the original picture, resized to 48*48 px, and input into the third network for detection. After detection, the result is again screened, regression-corrected, and merged to obtain the face rectangular box. Detecting level by level yields an accurate face rectangular box (face location), which further improves the recall and accuracy of detection.
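The level-by-level flow of steps S601 to S603 can be sketched as follows. This is a minimal illustration and not the patented implementation: the three networks are passed in as hypothetical callables (`stage1`, `stage2`, `stage3`) that return lists of `(box, probability)` pairs, the picture is a plain list of pixel rows, and nearest-neighbour resizing is an assumption, since the text only states the 24*24 px and 48*48 px input sizes. Screening, regression correction, and merging are assumed to happen inside each stage callable.

```python
def crop(image, box):
    """Crop a list-of-rows image to box = (x1, y1, x2, y2), clamped to the image."""
    h, w = len(image), len(image[0])
    x1, y1 = max(0, int(box[0])), max(0, int(box[1]))
    x2, y2 = min(w, int(box[2])), min(h, int(box[3]))
    return [row[x1:x2] for row in image[y1:y2]]

def resize_nearest(patch, size):
    """Nearest-neighbour resize of a non-empty patch to size x size pixels."""
    h, w = len(patch), len(patch[0])
    return [[patch[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def refine(image, candidates, stage, size):
    """Crop each candidate from the original picture, resize it, and run one stage."""
    results = []
    for box, _prob in candidates:
        patch = crop(image, box)
        if not patch or not patch[0]:
            continue  # candidate lies outside the picture
        results.extend(stage(resize_nearest(patch, size), box))
    return results

def run_cascade(image, stage1, stage2, stage3):
    """First-level network on the whole picture, then 24 px and 48 px refinement."""
    first = stage1(image)                       # first face candidate boxes
    second = refine(image, first, stage2, 24)   # second face candidate boxes
    return refine(image, second, stage3, 48)    # final face rectangular boxes
```

Cropping from the original picture at every level, rather than from the previous level's resized patch, follows the description above and avoids accumulating resampling error.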
In step S103, screening, regression correction, and merging comprise the following steps:
S701, according to the detection picture / first face candidate box / second face candidate box and the corresponding face probability, selecting the face candidate boxes whose probability exceeds the set probability threshold;
S702, calculating the second offset from the face candidate boxes obtained after screening, and regression-correcting them using the second offset;
S703, merging the corrected face candidate boxes with the non-maximum suppression algorithm to obtain the first face candidate box / second face candidate box / face rectangular box.
The face probability is the probability that a picture in the face picture set contains a face, where the face picture set is formed by classifying parts of the detection picture as face pictures. The face probability is compared with the set probability threshold, and face candidate boxes whose probability is below the threshold are deleted, leaving the screened face candidate boxes. The second offset is calculated by the SoftmaxWithLoss layer and the Euclidean Loss layer; it includes the picture classification offset arising during detection, the detected face box offset, and the detected facial feature point offset. These offsets are used to regression-correct the screened face candidate boxes, yielding the corrected face candidate boxes. The corrected face candidate boxes are then merged with the non-maximum suppression algorithm: the face boxes are sorted by face probability, the overlap between the box with the highest probability and each other box is calculated, and any box whose overlap exceeds a certain threshold is deleted. This achieves the merging of boxes and yields the first face candidate box / second face candidate box / face rectangular box. Screening, regression correction, and box merging further improve the recall and accuracy of face detection while preserving detection speed.
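Steps S701 to S703 can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the patented implementation: candidates are dictionaries with hypothetical keys `box`, `prob`, and `offset`; the overlap measure is assumed to be intersection over union, which the text does not specify; and the box offset is assumed to be expressed as a fraction of the box width and height. The default thresholds are placeholders.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def screen(cands, prob_threshold):
    """S701: keep candidates whose face probability exceeds the threshold."""
    return [c for c in cands if c["prob"] > prob_threshold]

def correct(cand):
    """S702: shift each box edge by its offset (assumed relative to box size)."""
    x1, y1, x2, y2 = cand["box"]
    w, h = x2 - x1, y2 - y1
    dx1, dy1, dx2, dy2 = cand["offset"]
    box = (x1 + dx1 * w, y1 + dy1 * h, x2 + dx2 * w, y2 + dy2 * h)
    return {"box": box, "prob": cand["prob"]}

def nms(cands, overlap_threshold):
    """S703: keep the most probable box, delete boxes overlapping it too much, repeat."""
    remaining = sorted(cands, key=lambda c: c["prob"], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [c for c in remaining
                     if iou(best["box"], c["box"]) <= overlap_threshold]
    return kept

def screen_correct_merge(cands, prob_threshold=0.6, overlap_threshold=0.5):
    """Run S701, S702, and S703 in order on one level's candidates."""
    corrected = [correct(c) for c in screen(cands, prob_threshold)]
    return nms(corrected, overlap_threshold)
```

Screening before correction and merging keeps the later steps cheap, because low-probability candidates never reach the overlap computation.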
The face detection method based on a three-level convolutional neural network provided by the present invention has the following advantages:
1) during training, the training results of the preceding n levels are added to the input of the next level, which compensates for missing training data, thereby improving the accuracy and recall of face detection and the performance of the overall network;
2) facial feature points are added to the training samples, and they improve both face classification and the localization precision of the face rectangular box, so that the method approaches the upper limit of the network and further improves the recall and accuracy of face detection;
3) the regression correction of picture classification is carried out only through the classification offset within the calculated first (or second) offset, which ensures that correctly classified parts are not regression-corrected again, so that the speed of face detection is improved and the network's performance is further exploited.
As shown in Fig. 8, an embodiment of the present invention also provides a face detection system based on a three-level convolutional neural network, comprising a three-level convolutional neural network, the three-level convolutional neural network comprising:
Acquiring unit 1, configured to obtain training samples and a detection picture; the training samples at least include face pictures annotated with facial feature points;
Network training unit 2, configured to input the training samples into the three-level convolutional neural network for level-by-level training;
It comprises a feature vector module and a regression correction module,
The feature vector module 21, configured to perform prediction and then dimensionality reduction according to the training samples and the training results of the preceding n levels, obtain the corresponding two-dimensional feature vector, and calculate the first offset from it;
The regression correction module 22, configured to perform regression correction on the two-dimensional feature vector using the first offset and obtain the corresponding training result;
Face detection unit 3, configured to input the detection picture into the trained three-level convolutional neural network for level-by-level face detection and obtain the face rectangular box.
The face detection system based on a three-level convolutional neural network provided by the present invention has the following advantages:
1) the second-level and third-level networks in the network training unit 2 (or the face detection unit 3) compensate for the poor performance of the first-level network, which improves the accuracy of picture classification, thereby improving the recall and accuracy of face detection and the performance of the overall network;
2) facial feature points are added to the face pictures in the training samples of the acquiring unit 1, and they improve both face classification and the localization precision of the face rectangular box, so that the system approaches the upper limit of the network and further improves the recall and accuracy of face detection;
3) the regression correction of picture classification is carried out only through the classification offset obtained by the cooperation of the feature vector module 21 and the regression correction module 22, which ensures that correctly classified parts need not be corrected, so that the speed of face detection is improved and the network's performance is further exploited.
The above describes only some exemplary embodiments of the present invention by way of illustration. Needless to say, those of ordinary skill in the art may modify the described embodiments in various ways without departing from the spirit and scope of the present invention. The above drawings and description are therefore illustrative in nature and should not be construed as limiting the scope of the claims of the present invention.
Claims (10)
1. A face detection method based on a three-level convolutional neural network, characterized by comprising the following steps:
obtaining training samples and a detection picture; the training samples at least include face pictures annotated with face boxes and facial feature points;
inputting the training samples into the three-level convolutional neural network for level-by-level training, the training process being:
performing prediction and then dimensionality reduction according to the training samples and the training results of the preceding n levels to obtain the corresponding two-dimensional feature vector, and calculating the first offset from it;
performing regression correction on the two-dimensional feature vector using the first offset to obtain the corresponding training result;
inputting the detection picture into the trained three-level convolutional neural network for level-by-level face detection to obtain the face rectangular box.
2. The face detection method according to claim 1, characterized in that the face pictures in the training samples also carry picture classification labels.
3. The face detection method according to claim 1, characterized in that obtaining the two-dimensional feature vector comprises the following steps:
obtaining an m-dimensional feature vector according to the training samples and the training results of the preceding n levels;
performing dimensionality reduction on the m-dimensional feature vector with a fully convolutional layer/fully connected layer to obtain the two-dimensional feature vector.
4. The face detection method according to claim 1, characterized in that the three-level convolutional neural network comprises a first-level network, a second-level network, and a third-level network; the third-level network comprises a first branch, a second branch, and a third branch; the second-level network comprises the first branch and the second branch; and the first branch is identical to the first-level network.
5. The face detection method according to claim 3 or 4, characterized in that, in the third-level network, obtaining the m-dimensional feature vector comprises the following steps:
inputting the training samples and the training result of the previous level into the first branch to obtain a first feature vector, into the second branch to obtain a second feature vector, and into the third branch to obtain a third feature vector;
concatenating the first feature vector, the second feature vector, and the third feature vector to obtain the m-dimensional feature vector.
6. The face detection method according to claim 1, characterized in that obtaining the first offset comprises the following steps:
inputting the two-dimensional feature vector into the SoftmaxWithLoss layer and calculating the classification offset;
inputting the two-dimensional feature vector into the Euclidean Loss layer and calculating the face box offset and the facial feature point offset.
7. The face detection method according to claim 6, characterized in that calculating the classification offset comprises the following steps:
defining the two-dimensional feature vector; it is defined as Z = {z1, z2}, where [formula missing in source]
classifying with the softmax function; for two classes, it specializes to: [formula missing in source]
calculating, with the loss function, the difference between the predicted two-dimensional feature vector and the training sample;
the loss function is: [formula missing in source]
where [formula missing in source]; calculate [formula missing in source]
and correct [formula missing in source], where α is a coefficient.
8. The face detection method according to claim 1, characterized in that obtaining the face rectangular box comprises the following steps:
inputting the detection picture into the first-level network, and screening, regression-correcting, and merging the result to obtain the first face candidate box;
inputting the first face candidate box into the second-level network, and screening, regression-correcting, and merging the result to obtain the second face candidate box;
inputting the second face candidate box into the third-level network, and screening, regression-correcting, and merging the result to obtain the face rectangular box.
9. The face detection method according to claim 8, characterized in that screening, regression correction, and merging comprise the following steps:
according to the detection picture / first face candidate box / second face candidate box and the corresponding face probability, selecting the face candidate boxes whose probability exceeds the set probability threshold;
calculating the second offset from the face candidate boxes obtained after screening, and regression-correcting them using the second offset;
merging the corrected face candidate boxes with the non-maximum suppression algorithm to obtain the first face candidate box / second face candidate box / face rectangular box.
10. A face detection system based on a three-level convolutional neural network, characterized by comprising a three-level convolutional neural network, the three-level convolutional neural network comprising:
an acquiring unit, configured to obtain training samples and a detection picture; the training samples at least include face pictures annotated with facial feature points;
a network training unit, configured to input the training samples into the three-level convolutional neural network for level-by-level training;
it comprises a feature vector module and a regression correction module,
the feature vector module, configured to perform prediction and then dimensionality reduction according to the training samples and the training results of the preceding n levels, obtain the corresponding two-dimensional feature vector, and calculate the first offset from it;
the regression correction module, configured to perform regression correction on the two-dimensional feature vector using the first offset and obtain the corresponding training result;
a face detection unit, configured to input the detection picture into the trained three-level convolutional neural network for level-by-level face detection and obtain the face rectangular box.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710078431.3A CN106874868B (en) | 2017-02-14 | 2017-02-14 | Face detection method and system based on three-level convolutional neural network |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710078431.3A CN106874868B (en) | 2017-02-14 | 2017-02-14 | Face detection method and system based on three-level convolutional neural network |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN106874868A | 2017-06-20 |
| CN106874868B | 2020-09-18 |
Family
ID=59167030
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201710078431.3A Active CN106874868B (en) | 2017-02-14 | 2017-02-14 | Face detection method and system based on three-level convolutional neural network |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN106874868B (en) |
Cited By (21)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107665355A (en) * | 2017-09-27 | 2018-02-06 | 重庆邮电大学 | A kind of agricultural pests detection method based on region convolutional neural networks |
| CN107679450A (en) * | 2017-08-25 | 2018-02-09 | 珠海多智科技有限公司 | Obstruction conditions servant's face recognition method based on deep learning |
| CN107688786A (en) * | 2017-08-30 | 2018-02-13 | 南京理工大学 | A kind of method for detecting human face based on concatenated convolutional neutral net |
| CN107784294A (en) * | 2017-11-15 | 2018-03-09 | 武汉烽火众智数字技术有限责任公司 | A kind of persona face detection method based on deep learning |
| CN107784288A (en) * | 2017-10-30 | 2018-03-09 | 华南理工大学 | A kind of iteration positioning formula method for detecting human face based on deep neural network |
| CN107808142A (en) * | 2017-11-09 | 2018-03-16 | 北京小米移动软件有限公司 | Eyeglass detection method and device |
| CN107886074A (en) * | 2017-11-13 | 2018-04-06 | 苏州科达科技股份有限公司 | A kind of method for detecting human face and face detection system |
| CN108363957A (en) * | 2018-01-19 | 2018-08-03 | 成都考拉悠然科技有限公司 | Road traffic sign detection based on cascade network and recognition methods |
| CN108509940A (en) * | 2018-04-20 | 2018-09-07 | 北京达佳互联信息技术有限公司 | Facial image tracking, device, computer equipment and storage medium |
| CN108921131A (en) * | 2018-07-26 | 2018-11-30 | 中国银联股份有限公司 | A kind of method and device generating Face datection model, three-dimensional face images |
| CN108960064A (en) * | 2018-06-01 | 2018-12-07 | 重庆锐纳达自动化技术有限公司 | A kind of Face datection and recognition methods based on convolutional neural networks |
| CN109344740A (en) * | 2018-09-12 | 2019-02-15 | 上海了物网络科技有限公司 | Face identification system, method and computer readable storage medium |
| CN109376693A (en) * | 2018-11-22 | 2019-02-22 | 四川长虹电器股份有限公司 | Method for detecting human face and system |
| CN109389105A (en) * | 2018-12-20 | 2019-02-26 | 北京万里红科技股份有限公司 | A kind of iris detection and viewpoint classification method based on multitask |
| CN109635693A (en) * | 2018-12-03 | 2019-04-16 | 武汉烽火众智数字技术有限责任公司 | A kind of face image detection method and device |
| CN109753931A (en) * | 2019-01-04 | 2019-05-14 | 广州广电卓识智能科技有限公司 | Convolutional neural network training method, system and face feature point detection method |
| CN110263852A (en) * | 2019-06-20 | 2019-09-20 | 北京字节跳动网络技术有限公司 | Data processing method, device and electronic equipment |
| CN110717481A (en) * | 2019-12-12 | 2020-01-21 | 浙江鹏信信息科技股份有限公司 | Method for realizing face detection by using cascaded convolutional neural network |
| CN111209819A (en) * | 2019-12-30 | 2020-05-29 | 新大陆数字技术股份有限公司 | Rotation-invariant face detection method, system equipment and readable storage medium |
| CN111382297A (en) * | 2018-12-29 | 2020-07-07 | 杭州海康存储科技有限公司 | Method and device for reporting user data of user side |
| CN112232215A (en) * | 2020-10-16 | 2021-01-15 | 哈尔滨市科佳通用机电股份有限公司 | Railway wagon coupler yoke key joist falling fault detection method |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105740758A (en) * | 2015-12-31 | 2016-07-06 | 上海极链网络科技有限公司 | Internet video face recognition method based on deep learning |
| CN106096535A (en) * | 2016-06-07 | 2016-11-09 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | A kind of face verification method based on bilinearity associating CNN |
| CN106228137A (en) * | 2016-07-26 | 2016-12-14 | 广州市维安科技股份有限公司 | A kind of ATM abnormal human face detection based on key point location |
| CN106295476A (en) * | 2015-05-29 | 2017-01-04 | 腾讯科技(深圳)有限公司 | Face key point localization method and device |
2017-02-14: CN application CN201710078431.3A, patent CN106874868B (en), status Active
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106295476A (en) * | 2015-05-29 | 2017-01-04 | 腾讯科技(深圳)有限公司 | Face key point localization method and device |
| CN105740758A (en) * | 2015-12-31 | 2016-07-06 | 上海极链网络科技有限公司 | Internet video face recognition method based on deep learning |
| CN106096535A (en) * | 2016-06-07 | 2016-11-09 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | A kind of face verification method based on bilinearity associating CNN |
| CN106228137A (en) * | 2016-07-26 | 2016-12-14 | 广州市维安科技股份有限公司 | A kind of ATM abnormal human face detection based on key point location |
Non-Patent Citations (1)
| Title |
|---|
| HAOXIANG LI.ET AL: ""A Convolutional Neural Network Cascade for Face Detection"", 《2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 * |
Cited By (30)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107679450A (en) * | 2017-08-25 | 2018-02-09 | 珠海多智科技有限公司 | Obstruction conditions servant's face recognition method based on deep learning |
| CN107688786A (en) * | 2017-08-30 | 2018-02-13 | 南京理工大学 | A kind of method for detecting human face based on concatenated convolutional neutral net |
| CN107665355A (en) * | 2017-09-27 | 2018-02-06 | 重庆邮电大学 | A kind of agricultural pests detection method based on region convolutional neural networks |
| CN107784288A (en) * | 2017-10-30 | 2018-03-09 | 华南理工大学 | A kind of iteration positioning formula method for detecting human face based on deep neural network |
| CN107808142A (en) * | 2017-11-09 | 2018-03-16 | 北京小米移动软件有限公司 | Eyeglass detection method and device |
| CN107886074A (en) * | 2017-11-13 | 2018-04-06 | 苏州科达科技股份有限公司 | A kind of method for detecting human face and face detection system |
| CN107886074B (en) * | 2017-11-13 | 2020-05-19 | 苏州科达科技股份有限公司 | Face detection method and face detection system |
| CN107784294A (en) * | 2017-11-15 | 2018-03-09 | 武汉烽火众智数字技术有限责任公司 | A kind of persona face detection method based on deep learning |
| CN107784294B (en) * | 2017-11-15 | 2021-06-11 | 武汉烽火众智数字技术有限责任公司 | Face detection and tracking method based on deep learning |
| CN108363957A (en) * | 2018-01-19 | 2018-08-03 | 成都考拉悠然科技有限公司 | Road traffic sign detection based on cascade network and recognition methods |
| CN108509940A (en) * | 2018-04-20 | 2018-09-07 | 北京达佳互联信息技术有限公司 | Facial image tracking, device, computer equipment and storage medium |
| CN108960064A (en) * | 2018-06-01 | 2018-12-07 | 重庆锐纳达自动化技术有限公司 | A kind of Face datection and recognition methods based on convolutional neural networks |
| CN108921131B (en) * | 2018-07-26 | 2022-05-24 | 中国银联股份有限公司 | A method and device for generating a face detection model and a three-dimensional face image |
| CN108921131A (en) * | 2018-07-26 | 2018-11-30 | 中国银联股份有限公司 | A kind of method and device generating Face datection model, three-dimensional face images |
| CN109344740A (en) * | 2018-09-12 | 2019-02-15 | 上海了物网络科技有限公司 | Face identification system, method and computer readable storage medium |
| CN109376693A (en) * | 2018-11-22 | 2019-02-22 | 四川长虹电器股份有限公司 | Method for detecting human face and system |
| CN109635693A (en) * | 2018-12-03 | 2019-04-16 | 武汉烽火众智数字技术有限责任公司 | A kind of face image detection method and device |
| CN109635693B (en) * | 2018-12-03 | 2023-03-31 | 武汉烽火众智数字技术有限责任公司 | Front face image detection method and device |
| CN109389105A (en) * | 2018-12-20 | 2019-02-26 | 北京万里红科技股份有限公司 | A kind of iris detection and viewpoint classification method based on multitask |
| CN109389105B (en) * | 2018-12-20 | 2022-02-08 | 北京万里红科技有限公司 | Multitask-based iris detection and visual angle classification method |
| CN111382297A (en) * | 2018-12-29 | 2020-07-07 | 杭州海康存储科技有限公司 | Method and device for reporting user data of user side |
| CN111382297B (en) * | 2018-12-29 | 2024-05-17 | 杭州海康存储科技有限公司 | User side user data reporting method and device |
| CN109753931A (en) * | 2019-01-04 | 2019-05-14 | 广州广电卓识智能科技有限公司 | Convolutional neural network training method, system and face feature point detection method |
| CN110263852A (en) * | 2019-06-20 | 2019-09-20 | 北京字节跳动网络技术有限公司 | Data processing method, device and electronic equipment |
| CN110263852B (en) * | 2019-06-20 | 2021-10-08 | 北京字节跳动网络技术有限公司 | Data processing method and device and electronic equipment |
| CN110717481B (en) * | 2019-12-12 | 2020-04-07 | 浙江鹏信信息科技股份有限公司 | Method for realizing face detection by using cascaded convolutional neural network |
| CN110717481A (en) * | 2019-12-12 | 2020-01-21 | 浙江鹏信信息科技股份有限公司 | Method for realizing face detection by using cascaded convolutional neural network |
| CN111209819A (en) * | 2019-12-30 | 2020-05-29 | 新大陆数字技术股份有限公司 | Rotation-invariant face detection method, system equipment and readable storage medium |
| CN112232215B (en) * | 2020-10-16 | 2021-04-06 | 哈尔滨市科佳通用机电股份有限公司 | Railway wagon coupler yoke key joist falling fault detection method |
| CN112232215A (en) * | 2020-10-16 | 2021-01-15 | 哈尔滨市科佳通用机电股份有限公司 | Railway wagon coupler yoke key joist falling fault detection method |
Also Published As
| Publication number | Publication date |
|---|---|
| CN106874868B (en) | 2020-09-18 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN106874868A (en) | A kind of method for detecting human face and system based on three-level convolutional neural networks | |
| CN114648665B (en) | Weak supervision target detection method and system | |
| CN113378686A (en) | Two-stage remote sensing target detection method based on target center point estimation | |
| CN112200143A (en) | Road disease detection method based on candidate area network and machine vision | |
| WO2019140767A1 (en) | Recognition system for security check and control method thereof | |
| CN111368690A (en) | Deep learning-based video image ship detection method and system under influence of sea waves | |
| CN116778277B (en) | Cross-domain model training method based on progressive information decoupling | |
| Wang et al. | Feature extraction and segmentation of pavement distress using an improved hybrid task cascade network | |
| CN113704276B (en) | Map updating method, device, electronic device and computer-readable storage medium | |
| CN112651996B (en) | Target detection tracking method, device, electronic equipment and storage medium | |
| CN118968035B (en) | A method for detecting small targets in target areas based on UAV images | |
| CN110929746A (en) | A deep neural network-based method for location, extraction and classification of electronic file titles | |
| CN116977710A (en) | Remote sensing image long tail distribution target semi-supervised detection method | |
| CN102024149B (en) | Method of object detection and training method of classifier in hierarchical object detector | |
| CN113343989B (en) | Target detection method and system based on self-adaption of foreground selection domain | |
| CN115170611A (en) | Complex intersection vehicle driving track analysis method, system and application | |
| CN118823736A (en) | A wildlife target detection method based on improved yolov8 and knowledge distillation | |
| Han et al. | MS-YOLOv8-based object detection method for pavement diseases | |
| Mu et al. | Small target detection in drone aerial images based on feature fusion | |
| CN112287895A (en) | Model construction method, recognition method and system for river drain outlet detection | |
| Chen et al. | All-in-one YOLO architecture for safety hazard detection of environment along high-speed railway | |
| CN107247967A (en) | A kind of vehicle window annual test mark detection method based on R CNN | |
| Bi et al. | DR-YOLO: An improved multi-scale small object detection model for drone aerial photography scenes based on YOLOv7 | |
| Dou et al. | Analysis of vehicle and pedestrian detection effects of improved YOLOv8 model in drone-assisted urban traffic monitoring system | |
| CN114863122A (en) | Intelligent high-precision pavement disease identification method based on artificial intelligence |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |