CN111079686B - Single-stage face detection and key point positioning method and system - Google Patents


Info

Publication number
CN111079686B
CN111079686B (application CN201911358998.1A)
Authority
CN
China
Prior art keywords: face, key point, frame, loss function, face detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911358998.1A
Other languages
Chinese (zh)
Other versions
CN111079686A (en)
Inventor
黄明飞
姚宏贵
王普
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Open Intelligent Machine Shanghai Co ltd
Original Assignee
Open Intelligent Machine Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Open Intelligent Machine Shanghai Co ltd filed Critical Open Intelligent Machine Shanghai Co ltd
Priority to CN201911358998.1A priority Critical patent/CN111079686B/en
Publication of CN111079686A publication Critical patent/CN111079686A/en
Application granted granted Critical
Publication of CN111079686B publication Critical patent/CN111079686B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/168 Feature extraction; Face representation
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a single-stage face detection and key point positioning method and system, relating to the technical field of face detection and key point positioning. The method comprises: labeling face images to obtain labeled images; training on the labeled images to obtain a fused face detection and key point positioning model; inputting the current frame of the face picture to be detected into the fused model to obtain the current-frame face detection frame and current-frame face key point positions; performing key point anti-shake processing on the next frame of the face picture to be detected according to the current-frame key point positions to obtain the next-frame face detection frame and key point positions; and, while the total number of anti-shake passes does not exceed a preset threshold, continuing with key point anti-shake processing, otherwise returning to the fused model for a fresh detection. The invention effectively improves the accuracy of face detection and key point positioning, alleviates key point jitter, and is suitable for edge computing devices that support only single-model deployment.

Description

Single-stage face detection and key point positioning method and system
Technical Field
The invention relates to the technical field of face detection and key point positioning, in particular to a single-stage face detection and key point positioning method and system.
Background
Face detection is a technique for automatically locating the position and size of a face in an arbitrary input image, and key point positioning is the process of accurately locating key points within a given face frame. In face-related fields, face detection and key point positioning are prerequisite steps of many algorithms, such as face recognition, face beautification and face swapping, and are therefore vital to the face field.
At present, most face and key point detection methods are implemented in separate stages: face detection is performed first, followed by key point detection. The face detection algorithm is responsible only for detecting faces, and the key point algorithm only for locating key points; the two are independent and share no information. This ignores the intrinsic relation between the two tasks and lowers overall detection efficiency. The prior art therefore suffers from three problems. First, accuracy: because the two algorithms are trained independently, they cannot complement and reinforce each other, so accuracy is mediocre. Second, jitter: current key point positioning algorithms suffer from key point jitter across frames. Third, deployment: some edge computing devices support only single-model inference; for example, loading two or more models on certain HiSilicon development boards greatly reduces inference speed, fails to meet speed requirements, and makes real-world deployment very difficult.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention provides a single-stage face detection and key point positioning method, which specifically comprises the following steps:
step S1, acquiring a plurality of face images, and labeling each face image to obtain a labeled image with a real face frame and a real key point position;
step S2, training according to the labeling image to obtain a face detection and key point positioning fusion model;
step S3, inputting a face picture to be detected of a current frame in a video image into the face detection and key point positioning fusion model, obtaining a face detection frame of the current frame corresponding to the face picture to be detected of the current frame and the key point position of the face of the current frame, and outputting the face detection frame and the key point position;
step S4, performing key point anti-shake processing on the face picture to be detected of the next frame according to the position of the key point of the face of the current frame, and recording the total times of performing the key point anti-shake processing;
step S5, comparing the recorded total times with a preset times threshold value:
if the total times is not greater than the times threshold, turning to the step S6;
if the total times is larger than the times threshold, resetting the total times, and returning to the step S3;
step S6, directly obtaining, from the result of the key point anti-shake processing, the next-frame face detection frame and next-frame face key point positions corresponding to the face picture to be detected, outputting them as the current-frame face detection frame and current-frame face key point positions, and then returning to step S4;
the above process continues until all frames of the video image have been processed.
Preferably, the face detection and key point positioning fusion model adopts a RetinaNet network structure, and the feature maps output by the last three convolutional layers of the RetinaNet structure are combined through a feature pyramid network structure.
Preferably, in the training process of the face detection and key point positioning fusion model, an anchor point frame with a preset proportion is adopted to conduct regression prediction of the face detection frame and prediction of the position of the key point of the face.
Preferably, in the training process of the face detection and key point positioning fusion model, in a feature map generated by convolution operation, the size of a receptive field of each pixel point in the corresponding face image is twice the size of the anchor point frame.
Preferably, the preset ratio is 1:1.
Preferably, the step S2 specifically includes:
step S21, inputting the labeling image into a pre-generated initial fusion model to obtain a corresponding face detection prediction result and a key point prediction result;
the face detection prediction result comprises a face classification prediction result, a face frame regression prediction result and a face frame proportion prediction result;
step S22, respectively calculating a first loss function between the face classification prediction result and a real face classification result contained in the real face frame, a second loss function between the face frame regression prediction result and a real face region contained in the real face frame, a third loss function between the face frame proportion prediction result and the preset proportion, and a fourth loss function between the key point prediction result and the real key point position;
step S23, performing weighted summation on the first loss function, the second loss function, the third loss function and the fourth loss function to obtain a total loss function, and comparing the total loss function with a preset loss function threshold value:
if the total loss function is not less than the loss function threshold, turning to step S24;
if the total loss function is smaller than the loss function threshold, turning to step S25;
step S24, training parameters in the initial fusion model are adjusted according to a preset learning rate, and then the step S21 is returned to continue a new training process;
step S25, outputting the initial fusion model as the face detection and key point positioning fusion model.
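The training loop of steps S21 to S25 can be sketched as below. This is a minimal sketch under assumptions: `model` is a hypothetical object whose `total_loss(batch)` stands in for steps S21 to S23 (prediction plus weighted total loss) and whose `step(lr)` adjusts the training parameters of step S24; `max_rounds` is an illustrative safety bound not present in the text.

```python
def train_fusion_model(model, batches, loss_threshold=0.05,
                       lr=1e-4, max_rounds=1000):
    """Iterate until the total loss falls below the threshold (step S25),
    adjusting parameters at the preset learning rate otherwise (step S24)."""
    for _ in range(max_rounds):
        for batch in batches:
            total = model.total_loss(batch)   # steps S21-S23
            if total < loss_threshold:        # step S25: converged, output
                return model
            model.step(lr)                    # step S24: adjust parameters
    return model
```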
Preferably, in the step S4, the key point anti-shake processing specifically includes:
step A1, according to the face key point positions, expanding the corresponding face region in the next frame of the face picture to be detected by a preset factor to obtain a face region picture;
step A2, verifying the face region picture according to a pre-generated face verification model, and judging whether the face region picture is a face or not according to a verification result:
if yes, turning to a step A3;
if not, exiting;
and step A3, tracking the human face by adopting a tracking algorithm to obtain the human face detection frame and the position of the key point of the human face corresponding to the human face picture to be detected in the next frame.
A single-stage face detection and key point positioning system, applying the single-stage face detection and key point positioning method described in any one of the above, the face detection and key point positioning system specifically comprising:
the data labeling module is used for acquiring a plurality of face images, labeling each face image and obtaining a labeling image with a real face frame and a real key point position;
the data training module is connected with the data labeling module and is used for training according to the labeling image to obtain a face detection and key point positioning fusion model;
the model prediction module is connected with the data training module and is used for inputting a face picture to be detected of a current frame into the face detection and key point positioning fusion model, obtaining a face detection frame corresponding to the face picture of the current frame and the position of a key point of the face and outputting the face detection frame and the position of the key point of the face;
the anti-shake processing module is connected with the model prediction module and is used for carrying out key point anti-shake processing on a next frame of face picture to be detected according to the face detection frame and the face key point position, and recording the total times of carrying out the key point anti-shake processing;
the data comparison module is connected with the anti-shake processing module and is used for comparing the recorded total times with a preset times threshold, generating a first comparison result when the total times are not more than the times threshold, and generating a second comparison result when the total times are more than the times threshold;
the first processing module is connected with the data comparison module, and is used for directly obtaining the positions of a next frame of face detection frame and a next frame of face key point corresponding to the face picture to be detected according to the first comparison result and the processing result of the key point anti-shake processing, and outputting the positions as the positions of the current frame of face detection frame and the current frame of face key point;
and the second processing module is connected with the data comparison module and is used for resetting the total times according to the second comparison result.
Preferably, the data training module specifically includes:
the data prediction unit is used for inputting the marked image into a pre-generated initial fusion model to obtain a corresponding face detection prediction result and a key point prediction result;
the face detection prediction result comprises a face classification prediction result, a face frame regression prediction result and a face frame proportion prediction result;
the first processing unit is connected with the data prediction unit and is used for respectively calculating a first loss function between the face classification prediction result and a real face classification result contained in the real face frame, a second loss function between the face frame regression prediction result and a real face region contained in the real face frame, a third loss function between the face frame proportion prediction result and the preset proportion, and a fourth loss function between the key point prediction result and the real key point position;
the second processing unit is connected with the first processing unit and is used for carrying out weighted summation on the first loss function, the second loss function, the third loss function and the fourth loss function to obtain a total loss function;
the data comparison unit is connected with the second processing unit and is used for comparing the total loss function with a preset loss function threshold value, generating a first comparison result when the total loss function is not smaller than the loss function threshold value, and generating a second comparison result when the total loss function is smaller than the loss function threshold value;
the third processing unit is connected with the data comparison unit and is used for adjusting training parameters in the initial fusion model according to the first comparison result and a preset learning rate so as to continue a new training process;
and the fourth processing unit is connected with the data comparison unit and is used for, according to the second comparison result, outputting the initial fusion model as the face detection and key point positioning fusion model.
Preferably, the anti-shake processing module specifically includes:
the image processing unit is used for expanding the corresponding face key point position in the next frame of face picture to be detected by a preset multiple according to the face key point position to obtain a face region picture;
the face verification unit is connected with the image processing unit and is used for verifying the face region picture according to a pre-generated face verification model and outputting a corresponding face verification result when the verification result shows that the face region picture is a face;
and the face tracking unit is connected with the face checking unit and is used for tracking the face by adopting a tracking algorithm to obtain the face detection frame and the face key point position corresponding to the face picture to be detected of the next frame.
The technical scheme has the following advantages or beneficial effects:
1) The face detection and the key point positioning are fused, and the face detection and the key point positioning are mutually promoted by adopting an end-to-end training mode, so that the accuracy of the face detection and the key point positioning is effectively improved;
2) By combining the face verification model and the tracking model, the problem of key point shake is effectively improved;
3) The face detection and the key point positioning are fused into one model, so that the reasoning speed is effectively improved, and the method is suitable for edge computing equipment only supporting single-model deployment.
Drawings
FIG. 1 is a flow chart of a single-stage face detection and key point positioning method according to a preferred embodiment of the present invention;
FIG. 2 is a flow chart of a training process of a face detection and key point localization fusion model according to a preferred embodiment of the present invention;
FIG. 3 is a schematic flow chart of the key point anti-shake processing process according to the preferred embodiment of the invention;
fig. 4 is a schematic structural diagram of a single-stage face detection and key point positioning system according to a preferred embodiment of the present invention.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples. The present invention is not limited to the embodiment, and other embodiments may fall within the scope of the present invention as long as they conform to the gist of the present invention.
In a preferred embodiment of the present invention, based on the above-mentioned problems existing in the prior art, a single-stage face detection and key point positioning method is now provided, as shown in fig. 1, which specifically includes:
step S1, acquiring a plurality of face images, and labeling each face image to obtain a labeled image with a real face frame and a real key point position;
step S2, training according to the labeling image to obtain a face detection and key point positioning fusion model;
step S3, inputting a face picture to be detected of a current frame in a video image into a face detection and key point positioning fusion model, obtaining a face detection frame of the current frame corresponding to the face picture to be detected of the current frame and the key point position of the face of the current frame, and outputting the face detection frame and the key point position;
step S4, performing key point anti-shake processing on the next frame of face picture to be detected according to the key point position of the face of the current frame, and recording the total times of performing the key point anti-shake processing;
step S5, comparing the recorded total times with a preset times threshold value:
if the total times is not greater than the times threshold, turning to the step S6;
if the total times is greater than the times threshold, resetting the total times, and returning to the step S3;
step S6, directly obtaining, from the result of the key point anti-shake processing, the next-frame face detection frame and next-frame face key point positions corresponding to the next frame of the face picture to be detected, outputting them as the current-frame face detection frame and current-frame face key point positions, and then returning to step S4;
The above process is repeated until all frames of the video image have been processed.
The key point positioning method fuses face detection and key point positioning and trains them end to end, so the two tasks reinforce each other, effectively improving the accuracy of both. Combining the face verification model with the tracking model effectively alleviates key point jitter. Furthermore, fusing face detection and key point positioning into a single model improves inference speed and suits edge computing devices that support only single-model deployment, avoiding the severe slowdown such devices suffer when more than one model is loaded.
Further specifically, the technical scheme of the invention comprises the training process of a face detection and key point positioning fusion model:
Firstly, the training data are prepared: the acquired face images are labeled to obtain labeled images carrying real face frames and real key point positions. In this embodiment, the annotations are preferably stored in a plain-text file. Specifically, a text file named train.txt is created as the training set, where each line of train.txt represents one annotated image. Each annotation preferably contains the picture path, a face box (box) and the 6 points of the face key points (landmark), in the following storage format:
Path/xxx.jpg x1,y1,x2,y2,ptx1,pty1,ptx2,pty2,...,ptx6,pty6
where Path is the storage path, xxx.jpg is the file name of the labeled image, x1, y1, x2, y2 are the face frame coordinates, and ptx1, pty1 through ptx6, pty6 are the 6 face key points. The values from x1, y1 through ptx6, pty6 describe one face in the labeled image; if a picture contains multiple faces, the same pattern is appended repeatedly on the line. Fusing the training data for face detection and key point positioning into a single annotation file makes the data convenient to process and read.
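A small parser for the annotation line format just described could look like this; `parse_annotation_line` is an illustrative helper, written under the assumption that each face's comma-separated numbers form one space-separated token after the path.

```python
def parse_annotation_line(line):
    """Parse one train.txt line:
    'Path/xxx.jpg x1,y1,x2,y2,ptx1,pty1,...,ptx6,pty6 [more faces...]'
    Returns (path, [(box, keypoints), ...])."""
    path, *faces = line.strip().split()
    parsed = []
    for face in faces:
        nums = [float(v) for v in face.split(",")]
        assert len(nums) == 16, "4 box coords + 6 keypoints x 2"
        box = tuple(nums[:4])
        keypoints = [(nums[i], nums[i + 1]) for i in range(4, 16, 2)]
        parsed.append((box, keypoints))
    return path, parsed
```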
Secondly, the network framework of the face detection and key point positioning fusion model is preset. The invention builds on a single-stage network framework, preferably a RetinaNet structure, i.e., a single-stage detection network with a feature pyramid network (FPN). Because of the particular geometry of the human face, the aspect ratio of the anchor boxes (anchors) used during training is preferably set to 1:1, which effectively prevents the model from predicting overly long or wide face frames that do not match facial proportions. To further improve the recall and precision of face detection and key point positioning, in the feature maps generated by convolution during training, the receptive field of each pixel in the corresponding face image is preferably set to twice the anchor box size, avoiding the loss of detection precision that results from using default values. To ensure that the fusion model also detects small faces well, the invention preferably upsamples the last three feature maps of the backbone network with the feature pyramid (FPN), realizing feature fusion and effectively improving the recall of small faces.
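Square 1:1 anchor generation over a feature-map grid can be sketched as follows. This is a simplified illustration (one anchor per cell); the stride and anchor size values are assumptions for demonstration, not values given in the text.

```python
def make_anchors(feat_h, feat_w, stride, anchor_size):
    """Generate 1:1 square anchor boxes (x1, y1, x2, y2), one centred
    on each feature-map cell, in input-image coordinates."""
    anchors = []
    half = anchor_size / 2.0
    for y in range(feat_h):
        for x in range(feat_w):
            cx = (x + 0.5) * stride   # cell centre in the input image
            cy = (y + 0.5) * stride
            anchors.append((cx - half, cy - half, cx + half, cy + half))
    return anchors
```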
More preferably, during training of the face detection and key point positioning fusion model, after each training round the positive and negative samples are rebalanced according to that round's predictions, and the resulting samples are fed into the next round. In this embodiment, for the face detection task, an anchor box (anchor) is preferably treated as a positive sample when its intersection-over-union (IoU) with the real face frame (gt) exceeds 0.5, and as a negative sample when the IoU is below 0.3. For key point positioning, the loss between the predicted and real key point positions is preferably computed only when the IoU between the anchor box and the real face frame exceeds 0.7, avoiding the convergence difficulty and inaccurate key point positioning that an overly small IoU would cause.
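The IoU thresholds above (positive above 0.5, negative below 0.3, keypoint loss only above 0.7) can be sketched as follows; `assign_anchor` is an illustrative helper, not part of the patented system.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def assign_anchor(anchor, gt_box):
    """Label an anchor for detection and decide whether the keypoint
    loss is computed, per the thresholds in the text."""
    v = iou(anchor, gt_box)
    label = "positive" if v > 0.5 else "negative" if v < 0.3 else "ignore"
    return label, v > 0.7   # (detection label, compute keypoint loss?)
```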
Further specifically, during training of the face detection and key point positioning fusion model, four loss functions are preferably returned during training to ensure the effectiveness of the network: a first loss function between the face classification prediction and the real face classification contained in the real face frame, a second loss function between the face frame regression prediction and the real face region contained in the real face frame, a third loss function between the face frame proportion prediction and the preset proportion, and a fourth loss function between the key point prediction and the real key point positions. The first loss function preferably adopts the softmax function, the second preferably adopts the smooth L1 function, and the third preferably adopts the MSE function, which keeps the predicted face box proportion at 1:1.
Preferably, the first, second, third and fourth loss functions are weighted and summed to obtain the total loss function, with preferred weights of 1, 1, 0.5 and 0.1 respectively. Because the weight of the fourth loss function is set small, the face frame proportion is still enforced without affecting the network's overall face detection and key point positioning performance. During training the learning rate is preferably preset to 0.0001; when training finishes, the face detection and key point positioning fusion model is obtained.
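The weighted total loss with the preferred weights (1, 1, 0.5, 0.1) can be sketched as below, together with a scalar smooth L1 term as suggested for the second (box) loss. The exact loss formulations (e.g. the smooth L1 beta) are assumptions beyond what the text specifies.

```python
def smooth_l1(pred, target):
    """Scalar smooth L1 (Huber with beta = 1), an assumed form of the
    second (box regression) loss."""
    d = abs(pred - target)
    return 0.5 * d * d if d < 1.0 else d - 0.5

def total_loss(cls_loss, box_loss, ratio_loss, kpt_loss,
               weights=(1.0, 1.0, 0.5, 0.1)):
    """Weighted sum of the four losses with the preferred weights:
    classification, box regression, box proportion, keypoints."""
    losses = (cls_loss, box_loss, ratio_loss, kpt_loss)
    return sum(w * l for w, l in zip(weights, losses))
```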
Further, the face detection and key point positioning fusion model is adopted to carry out face detection and key point positioning on the picture to be detected of the video image, so that the problem of jitter in key point positioning caused by subjectivity of data marking and jitter of a face detection frame is solved.
Specifically, when performing face detection and key point positioning on consecutive frames of a video image, the current frame of the picture to be detected is preferably taken as the detection starting node: the fusion model predicts the current frame to obtain its face detection frame and key point positions, and key point anti-shake processing is applied to the subsequent consecutive frames to obtain their corresponding face detection frames and key point positions. Preferably, one fusion-model prediction is followed by key point anti-shake processing on the next ten frames, then another fusion-model prediction, and so on.
In the preferred embodiment of the invention, the face detection and key point positioning fusion model adopts a RetinaNet network structure, and the feature maps output by the last three convolutional layers of the RetinaNet structure are combined through a feature pyramid network structure.
In the preferred embodiment of the invention, in the training process of the face detection and key point positioning fusion model, an anchor point frame with a preset proportion is adopted to carry out regression prediction of the face detection frame and prediction of the position of the key point of the face.
In the preferred embodiment of the invention, in the training process of the face detection and key point positioning fusion model, the size of the receptive field of each pixel point in the corresponding face image is twice the size of the anchor point frame in the feature map generated by convolution operation.
In a preferred embodiment of the present invention, the predetermined ratio is 1:1.
In a preferred embodiment of the present invention, as shown in fig. 2, step S2 specifically includes:
s21, inputting the labeling image into a pre-generated initial fusion model to obtain a corresponding face detection prediction result and a key point prediction result;
the face detection prediction results comprise a face classification prediction result, a face frame regression prediction result and a face frame proportion prediction result;
step S22, respectively calculating a first loss function between a face classification prediction result and a real face classification result contained in a real face frame, a second loss function between a face frame regression prediction result and a real face region contained in the real face frame, a third loss function between a face frame proportion prediction result and a preset proportion, and a fourth loss function between a key point prediction result and a real key point position;
step S23, the first loss function, the second loss function, the third loss function and the fourth loss function are weighted and summed to obtain a total loss function, and the total loss function is compared with a preset loss function threshold value:
if the total loss function is not less than the loss function threshold, turning to step S24;
if the total loss function is smaller than the loss function threshold, turning to step S25;
step S24, training parameters in the initial fusion model are adjusted according to a preset learning rate, and then step S21 is returned to continue the training process of a new round;
step S25, outputting the initial fusion model as the face detection and key point positioning fusion model.
In a preferred embodiment of the present invention, as shown in fig. 3, the key point anti-shake processing in step S4 specifically comprises:
step A1, enlarging the region around the corresponding face key point positions in the next frame of face picture to be detected by a preset multiple to obtain a face region picture;
step A2, verifying the face region picture according to a pre-generated face verification model, and judging whether the face region picture is a face or not according to a verification result:
if yes, turning to a step A3;
if not, exiting;
and step A3, tracking the human face by adopting a tracking algorithm to obtain a human face detection frame and a human face key point position corresponding to the next frame of human face picture to be detected.
Specifically, in this embodiment, the key point anti-shake processing greatly reduces the amplitude of key point jitter. It mainly comprises two parts: first, a face verification module for judging whether the region is a face, which is preferably implemented as a very simple binary classification network requiring only a few convolution layers; second, a tracking algorithm, which is preferably adopted to track the face and obtain the face detection frame once the face verification module confirms a face. Because the tracking algorithm matches consecutive frames closely, the face regions covered by the face detection frames of two consecutive frames are almost identical. The face region is enlarged by 1.5 times before being fed into the detection algorithm, which ensures that the output picture does not change significantly between frames and thereby preserves the accuracy of the key point regression.
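Steps A1-A3 can be sketched as follows. This is an illustration under stated assumptions: `is_face` stands in for the lightweight binary verification network and `track_face` for the tracking algorithm, neither of which the patent specifies concretely; only the 1.5x enlargement factor is taken from the embodiment above:

```python
# Sketch of the key point anti-shake processing: the face region around
# the previous frame's key points is enlarged by a preset multiple (1.5x
# in this embodiment), verified by a binary face/non-face classifier, and
# then tracked. `is_face` and `track_face` are hypothetical callables.
EXPAND_RATIO = 1.5

def expand_box(x1, y1, x2, y2, ratio=EXPAND_RATIO):
    """Step A1: enlarge a box about its centre by `ratio`."""
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = (x2 - x1) * ratio, (y2 - y1) * ratio
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def anti_shake(frame, prev_keypoints, is_face, track_face):
    # Bounding box of the previous frame's key points.
    x1 = min(p[0] for p in prev_keypoints)
    y1 = min(p[1] for p in prev_keypoints)
    x2 = max(p[0] for p in prev_keypoints)
    y2 = max(p[1] for p in prev_keypoints)
    region = expand_box(x1, y1, x2, y2)
    if not is_face(frame, region):   # step A2: face verification
        return None                  # exit: caller falls back to detection
    return track_face(frame, region)  # step A3: face tracking
```

Returning `None` corresponds to the "if not, exiting" branch of step A2, after which a full detection pass would be required.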
A single-stage face detection and key point positioning system, as shown in fig. 4, applying any one of the above single-stage face detection and key point positioning methods, the face detection and key point positioning system specifically includes:
the data labeling module 1 is used for acquiring a plurality of face images, labeling each face image and obtaining a labeling image with a real face frame and a real key point position;
the data training module 2 is connected with the data labeling module 1 and is used for training according to the labeling images to obtain a face detection and key point positioning fusion model;
the model prediction module 3 is connected with the data training module 2 and is used for inputting the face picture to be detected of the current frame into the face detection and key point positioning fusion model, obtaining the face detection frame and the face key point positions corresponding to the face picture to be detected of the current frame, and outputting them;
the anti-shake processing module 4 is connected with the model prediction module 3 and is used for carrying out key point anti-shake processing on the next frame of face picture to be detected according to the face detection frame and the position of the key point of the face, and recording the total times of carrying out the key point anti-shake processing;
the data comparison module 5 is connected with the anti-shake processing module 4 and is used for comparing the recorded total times with a preset times threshold value, generating a first comparison result when the total times are not more than the times threshold value, and generating a second comparison result when the total times are more than the times threshold value;
the first processing module 6 is connected with the data comparing module 5, and is used for directly obtaining the positions of a next frame face detection frame and a next frame face key point corresponding to the next frame face picture to be detected according to the first comparing result and the processing result of the key point anti-shake processing, and outputting the positions as the positions of the current frame face detection frame and the current frame face key point;
the second processing module 7 is connected with the data comparison module 5 and is used for resetting the total times according to the second comparison result.
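The detect/track scheduling carried out by modules 3-7 can be sketched as follows. This is a hedged sketch, not the claimed system: `detect` and `track` are hypothetical stand-ins for fusion-model inference and the key point anti-shake processing, and the times threshold is a placeholder value:

```python
# Scheduling sketch: full detection runs on one frame, then the cheaper
# anti-shake tracking is reused for subsequent frames while a counter is
# incremented; once the counter exceeds the preset times threshold it is
# reset and full detection runs again. The threshold value is illustrative.
TIMES_THRESHOLD = 10

def process_video(frames, detect, track):
    results, state, count = [], None, 0
    for frame in frames:
        if state is None or count > TIMES_THRESHOLD:
            state = detect(frame)        # full model prediction (module 3)
            count = 0                    # reset the total times (module 7)
        else:
            state = track(frame, state)  # anti-shake tracking (modules 4/6)
            count += 1                   # record the total times (module 4)
        results.append(state)
    return results
```

The counter comparison mirrors the data comparison module 5: while the total times stay at or below the threshold the tracking result is output directly, and once the threshold is exceeded the pipeline falls back to full detection.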
In a preferred embodiment of the present invention, the data training module 2 specifically includes:
the data prediction unit 21 is configured to input the labeling image into a pre-generated initial fusion model, so as to obtain a corresponding face detection prediction result and a corresponding key point prediction result;
the face detection prediction results comprise a face classification prediction result, a face frame regression prediction result and a face frame proportion prediction result;
a first processing unit 22, connected to the data prediction unit 21, configured to calculate a first loss function between the face classification prediction result and a real face classification result included in the real face frame, a second loss function between the face frame regression prediction result and a real face region included in the real face frame, a third loss function between the face frame proportion prediction result and a preset proportion, and a fourth loss function between the key point prediction result and a real key point position, respectively;
the second processing unit 23 is connected to the first processing unit 22, and is configured to perform weighted summation on the first loss function, the second loss function, the third loss function, and the fourth loss function, so as to obtain a total loss function;
a data comparing unit 24, connected to the second processing unit 23, for comparing the total loss function with a preset loss function threshold, and generating a first comparison result when the total loss function is not less than the loss function threshold, and generating a second comparison result when the total loss function is less than the loss function threshold;
the third processing unit 25 is connected with the data comparing unit 24 and is used for adjusting training parameters in the initial fusion model according to the first comparison result and the preset learning rate so as to continue the new training process;
and a fourth processing unit 26, connected to the data comparing unit 24, for taking the initial fusion model as a face detection and key point positioning fusion model according to the second comparison result and outputting the same.
In a preferred embodiment of the present invention, the anti-shake processing module 4 specifically includes:
the image processing unit 41 is configured to enlarge a corresponding face key point position in a next frame of face picture to be detected by a preset multiple according to the face key point position, so as to obtain a face region picture;
a face verification unit 42 connected to the image processing unit 41, and configured to verify the face area picture according to a pre-generated face verification model, and output a corresponding face verification result when the verification result indicates that the face area picture is a face;
the face tracking unit 43 is connected to the face verification unit 42, and is configured to track a face by using a tracking algorithm, so as to obtain a processing result of the key point anti-shake processing, where the processing result includes a face detection frame corresponding to a next frame of face picture to be detected and a key point position of the face.
The data recording unit 44 is connected to the face tracking unit 43, and is configured to record the total number of times of performing the key point anti-shake processing according to the processing result.
The foregoing description is only illustrative of the preferred embodiments of the present invention and is not to be construed as limiting the scope of the invention, and it will be appreciated by those skilled in the art that equivalent substitutions and obvious variations may be made using the description and drawings, and are intended to be included within the scope of the present invention.

Claims (10)

1. A single-stage face detection and key point positioning method is characterized by comprising the following steps:
step S1, acquiring a plurality of face images, and labeling each face image to obtain a labeled image with a real face frame and a real key point position;
step S2, training according to the labeling image to obtain a face detection and key point positioning fusion model;
step S3, inputting a face picture to be detected of a current frame in a video image into the face detection and key point positioning fusion model, obtaining a face detection frame of the current frame corresponding to the face picture to be detected of the current frame and the key point position of the face of the current frame, and outputting the face detection frame and the key point position;
step S4, performing key point anti-shake processing on the face picture to be detected of the next frame according to the position of the key point of the face of the current frame, and recording the total times of performing the key point anti-shake processing;
step S5, comparing the recorded total times with a preset times threshold value:
if the total times is not greater than the times threshold, turning to the step S6;
if the total times is larger than the times threshold, resetting the total times, and returning to the step S3;
step S6, the positions of a next frame of face detection frame and a next frame of face key point corresponding to the face picture to be detected are directly obtained according to the processing result of the key point anti-shake processing, and are output as the positions of the face detection frame of the current frame and the face key point of the current frame, and then the step S4 is returned;
the above process continues until all frames of the video image have been processed.
2. The single-stage face detection and key point positioning method according to claim 1, wherein the face detection and key point positioning fusion model adopts a RetinaNet network structure, and the feature maps output by the three convolution layers in the RetinaNet network structure adopt a feature pyramid network structure.
3. The single-stage face detection and key point positioning method according to claim 2, wherein in the training process of the face detection and key point positioning fusion model, an anchor point frame with a preset proportion is adopted to conduct regression prediction of the face detection frame and prediction of the position of the key point of the face.
4. A single-stage face detection and keypoint location method as claimed in claim 3, wherein in the training process of the face detection and keypoint location fusion model, the size of the receptive field of each pixel point in the corresponding face image is twice the size of the anchor point frame in a feature map generated by convolution operation.
5. A single-stage face detection and keypoint location method as claimed in claim 3, characterized in that the preset ratio is 1:1.
6. The single-stage face detection and key point positioning method according to claim 3, wherein the step S2 specifically comprises:
step S21, inputting the labeling image into a pre-generated initial fusion model to obtain a corresponding face detection prediction result and a key point prediction result;
the face detection prediction result comprises a face classification prediction result, a face frame regression prediction result and a face frame proportion prediction result;
step S22, respectively calculating a first loss function between the face classification prediction result and a real face classification result contained in the real face frame, a second loss function between the face frame regression prediction result and a real face region contained in the real face frame, a third loss function between the face frame proportion prediction result and the preset proportion, and a fourth loss function between the key point prediction result and the real key point position;
step S23, performing weighted summation on the first loss function, the second loss function, the third loss function and the fourth loss function to obtain a total loss function, and comparing the total loss function with a preset loss function threshold value:
if the total loss function is not less than the loss function threshold, turning to step S24;
if the total loss function is smaller than the loss function threshold, turning to step S25;
step S24, training parameters in the initial fusion model are adjusted according to a preset learning rate, and then the step S21 is returned to continue a new training process;
and S25, taking the initial fusion model as a face detection and key point positioning fusion model and outputting the same.
7. The single-stage face detection and keypoint location method according to claim 1, wherein the keypoint anti-shake processing in step S4 specifically comprises:
step A1, expanding the corresponding key point positions of the face in the next frame of face picture to be detected by preset times according to the key point positions of the face to obtain a face region picture;
step A2, verifying the face region picture according to a pre-generated face verification model, and judging whether the face region picture is a face or not according to a verification result:
if yes, turning to a step A3;
if not, exiting;
and step A3, tracking the human face by adopting a tracking algorithm to obtain the human face detection frame and the position of the key point of the human face corresponding to the human face picture to be detected in the next frame.
8. A single-stage face detection and keypoint location system, characterized in that a single-stage face detection and keypoint location method according to any one of claims 1-7 is applied, said face detection and keypoint location system comprising in particular:
the data labeling module is used for acquiring a plurality of face images, labeling each face image and obtaining a labeling image with a real face frame and a real key point position;
the data training module is connected with the data labeling module and is used for training according to the labeling image to obtain a face detection and key point positioning fusion model;
the model prediction module is connected with the data training module and is used for inputting a face picture to be detected of a current frame into the face detection and key point positioning fusion model, obtaining a face detection frame corresponding to the face picture of the current frame and the position of a key point of the face and outputting the face detection frame and the position of the key point of the face;
the anti-shake processing module is connected with the model prediction module and is used for carrying out key point anti-shake processing on a next frame of face picture to be detected according to the face detection frame and the face key point position, and recording the total times of carrying out the key point anti-shake processing;
the data comparison module is connected with the anti-shake processing module and is used for comparing the recorded total times with a preset times threshold, generating a first comparison result when the total times are not more than the times threshold, and generating a second comparison result when the total times are more than the times threshold;
the first processing module is connected with the data comparison module, and is used for directly obtaining the positions of a next frame of face detection frame and a next frame of face key point corresponding to the face picture to be detected according to the first comparison result and the processing result of the key point anti-shake processing, and outputting the positions as the positions of the current frame of face detection frame and the current frame of face key point;
and the second processing module is connected with the data comparison module and is used for resetting the total times according to the second comparison result.
9. The single-stage face detection and key point positioning system of claim 8, wherein the data training module specifically comprises:
the data prediction unit is used for inputting the marked image into a pre-generated initial fusion model to obtain a corresponding face detection prediction result and a key point prediction result;
the face detection prediction result comprises a face classification prediction result, a face frame regression prediction result and a face frame proportion prediction result;
the first processing unit is connected with the data prediction unit and is used for respectively calculating a first loss function between the face classification prediction result and a real face classification result contained in the real face frame, a second loss function between the face frame regression prediction result and a real face region contained in the real face frame, a third loss function between the face frame proportion prediction result and a preset proportion, and a fourth loss function between the key point prediction result and the real key point position;
the second processing unit is connected with the first processing unit and is used for carrying out weighted summation on the first loss function, the second loss function, the third loss function and the fourth loss function to obtain a total loss function;
the data comparison unit is connected with the second processing unit and is used for comparing the total loss function with a preset loss function threshold value, generating a first comparison result when the total loss function is not smaller than the loss function threshold value, and generating a second comparison result when the total loss function is smaller than the loss function threshold value;
the third processing unit is connected with the data comparison unit and is used for adjusting training parameters in the initial fusion model according to the first comparison result and a preset learning rate so as to continue a new training process;
and the fourth processing unit is connected with the data comparison unit and is used for taking the initial fusion model as a face detection and key point positioning fusion model according to the second comparison result and outputting the initial fusion model.
10. The single-stage face detection and key point positioning system of claim 8, wherein the anti-shake processing module specifically comprises:
the image processing unit is used for expanding the corresponding face key point position in the next frame of face picture to be detected by a preset multiple according to the face key point position to obtain a face region picture;
the face verification unit is connected with the image processing unit and is used for verifying the face region picture according to a pre-generated face verification model and outputting a corresponding face verification result when the verification result shows that the face region picture is a face;
the face tracking unit is connected with the face verification unit and is used for tracking the face by adopting a tracking algorithm to obtain a processing result of the key point anti-shake processing, wherein the processing result comprises the face detection frame corresponding to the next frame of face picture to be detected and the face key point positions;
and the data recording unit is connected with the face tracking unit and is used for recording the total times of the anti-shake processing of the key points according to the processing result.
CN201911358998.1A 2019-12-25 2019-12-25 Single-stage face detection and key point positioning method and system Active CN111079686B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911358998.1A CN111079686B (en) 2019-12-25 2019-12-25 Single-stage face detection and key point positioning method and system


Publications (2)

Publication Number Publication Date
CN111079686A CN111079686A (en) 2020-04-28
CN111079686B true CN111079686B (en) 2023-05-23

Family

ID=70317792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911358998.1A Active CN111079686B (en) 2019-12-25 2019-12-25 Single-stage face detection and key point positioning method and system

Country Status (1)

Country Link
CN (1) CN111079686B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783608B (en) * 2020-06-24 2024-03-19 南京烽火星空通信发展有限公司 Face-changing video detection method
CN111881876B (en) * 2020-08-06 2022-04-08 桂林电子科技大学 Attendance checking method based on single-order anchor-free detection network
CN111783749A (en) * 2020-08-12 2020-10-16 成都佳华物链云科技有限公司 Face detection method and device, electronic equipment and storage medium
CN112241700A (en) * 2020-10-15 2021-01-19 希望银蕨智能科技有限公司 Multi-target forehead temperature measurement method for forehead accurate positioning
CN114627519A (en) * 2020-12-14 2022-06-14 阿里巴巴集团控股有限公司 Data processing method, apparatus, electronic device and storage medium
CN112949492A (en) * 2021-03-03 2021-06-11 南京视察者智能科技有限公司 Model series training method and device for face detection and key point detection and terminal equipment
CN113011356B (en) * 2021-03-26 2024-08-06 杭州网易智企科技有限公司 Face feature detection method and device, medium and electronic equipment
CN113449657B (en) * 2021-07-05 2022-08-30 中山大学 Method, system and medium for detecting depth-forged face video based on face key points
CN114257748B (en) * 2022-01-26 2024-08-27 Oppo广东移动通信有限公司 Video anti-shake method and device, computer readable medium and electronic equipment
CN114418901B (en) * 2022-03-30 2022-08-09 江西中业智能科技有限公司 Image beautifying processing method, system, storage medium and equipment based on Retinaface algorithm
CN114926876B (en) * 2022-04-26 2025-06-06 黑芝麻智能科技有限公司 Image key point detection method, device, computer equipment and storage medium
CN114924645A (en) * 2022-05-18 2022-08-19 上海庄生晓梦信息科技有限公司 Interaction method and system based on gesture recognition
CN115050129B (en) * 2022-06-27 2023-06-13 北京睿家科技有限公司 Data processing method and system for intelligent access control
CN117079337B (en) * 2023-10-17 2024-02-06 成都信息工程大学 A high-precision facial attribute feature recognition device and method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018170864A1 (en) * 2017-03-20 2018-09-27 成都通甲优博科技有限责任公司 Face recognition and tracking method
CN109919097A (en) * 2019-03-08 2019-06-21 中国科学院自动化研究所 Joint detection system and method of face and key points based on multi-task learning
CN109977775A (en) * 2019-02-25 2019-07-05 腾讯科技(深圳)有限公司 Critical point detection method, apparatus, equipment and readable storage medium storing program for executing


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xu Weiwei; Li Jun. A robust real-time face key point tracking method. Computer Engineering, 2017, (04). *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant