CN111079686B - Single-stage face detection and key point positioning method and system - Google Patents


Info

Publication number
CN111079686B
CN111079686B (application CN201911358998.1A)
Authority
CN
China
Prior art keywords: face, key point, frame, loss function, face detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911358998.1A
Other languages
Chinese (zh)
Other versions
CN111079686A (en)
Inventor
黄明飞
姚宏贵
王普
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Open Intelligent Machine Shanghai Co ltd
Original Assignee
Open Intelligent Machine Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Open Intelligent Machine Shanghai Co ltd filed Critical Open Intelligent Machine Shanghai Co ltd
Priority to CN201911358998.1A priority Critical patent/CN111079686B/en
Publication of CN111079686A publication Critical patent/CN111079686A/en
Application granted granted Critical
Publication of CN111079686B publication Critical patent/CN111079686B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/168 Feature extraction; Face representation
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a single-stage face detection and key point positioning method and system, relating to the technical field of face detection and key point positioning. The method comprises: labeling face images to obtain labeled images; training on the labeled images to obtain a fused face detection and key point positioning model; inputting the current frame of the face picture to be detected into the fused model to obtain the current-frame face detection frame and current-frame face key point positions; performing key point anti-shake processing on the next frame of the face picture to be detected according to the current-frame key point positions to obtain the next-frame face detection frame and key point positions; and, while the total number of anti-shake passes does not exceed a preset threshold, continuing with key point anti-shake processing, otherwise returning to the fused model for a fresh detection. The invention effectively improves the accuracy of face detection and key point positioning, alleviates key point jitter, and is suitable for edge computing devices that support only single-model deployment.

Description

Single-stage face detection and key point positioning method and system
Technical Field
The invention relates to the technical field of face detection and key point positioning, in particular to a single-stage face detection and key point positioning method and system.
Background
Face detection is a technique for automatically locating the position and size of a face in an arbitrary input image, and key point positioning is the process of accurately locating key points within a given face frame. In face-related fields, face detection and key point positioning are prerequisite steps of many algorithms, such as face recognition, face beautification and face swapping, and are therefore vital to the face field.
At present, most face and key point detection methods are implemented in separate stages: face detection is performed first, followed by key point detection. The face detection algorithm is responsible only for detecting faces, and the key point algorithm only for locating key points; the two are independent and share no information. This ignores the intrinsic relation between the two tasks and lowers overall detection efficiency. The prior art therefore suffers from three problems. First, accuracy: because the two algorithms are trained independently, they cannot complement and reinforce each other, so accuracy is mediocre. Second, jitter: current key point positioning algorithms suffer from key point jitter across frames. Third, deployment: some edge computing devices support only single-model inference; for example, loading two or more models on certain HiSilicon development boards greatly reduces inference speed, fails to meet speed requirements, and makes real-world deployment very difficult.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention provides a single-stage face detection and key point positioning method, which specifically comprises the following steps:
step S1, acquiring a plurality of face images, and labeling each face image to obtain a labeled image with a real face frame and a real key point position;
step S2, training according to the labeling image to obtain a face detection and key point positioning fusion model;
step S3, inputting a face picture to be detected of a current frame in a video image into the face detection and key point positioning fusion model, obtaining a face detection frame of the current frame corresponding to the face picture to be detected of the current frame and the key point position of the face of the current frame, and outputting the face detection frame and the key point position;
step S4, performing key point anti-shake processing on the face picture to be detected of the next frame according to the position of the key point of the face of the current frame, and recording the total times of performing the key point anti-shake processing;
step S5, comparing the recorded total times with a preset times threshold value:
if the total times is not greater than the times threshold, turning to the step S6;
if the total times is larger than the times threshold, resetting the total times, and returning to the step S3;
step S6, directly obtaining, from the result of the key point anti-shake processing, the next-frame face detection frame and next-frame face key point positions corresponding to the face picture to be detected, outputting them as the current-frame face detection frame and current-frame face key point positions, and then returning to step S4;
the above process continues until all frames of the video image have been processed.
Preferably, the face detection and key point positioning fusion model adopts a RetinaNet network structure, and the feature maps output by the last three convolutional layers of the RetinaNet structure are combined through a feature pyramid network structure.
Preferably, in the training process of the face detection and key point positioning fusion model, an anchor point frame with a preset proportion is adopted to conduct regression prediction of the face detection frame and prediction of the position of the key point of the face.
Preferably, in the training process of the face detection and key point positioning fusion model, in a feature map generated by convolution operation, the size of a receptive field of each pixel point in the corresponding face image is twice the size of the anchor point frame.
Preferably, the preset ratio is 1:1.
Preferably, the step S2 specifically includes:
step S21, inputting the labeling image into a pre-generated initial fusion model to obtain a corresponding face detection prediction result and a key point prediction result;
the face detection prediction result comprises a face classification prediction result, a face frame regression prediction result and a face frame proportion prediction result;
step S22, respectively calculating a first loss function between the face classification prediction result and a real face classification result contained in the real face frame, a second loss function between the face frame regression prediction result and a real face region contained in the real face frame, a third loss function between the face frame proportion prediction result and the preset proportion, and a fourth loss function between the key point prediction result and the real key point position;
step S23, performing weighted summation on the first loss function, the second loss function, the third loss function and the fourth loss function to obtain a total loss function, and comparing the total loss function with a preset loss function threshold value:
if the total loss function is not less than the loss function threshold, turning to step S24;
if the total loss function is smaller than the loss function threshold, turning to step S25;
step S24, training parameters in the initial fusion model are adjusted according to a preset learning rate, and then the step S21 is returned to continue a new training process;
step S25, outputting the initial fusion model as the face detection and key point positioning fusion model.
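The training loop of steps S21 to S25 can be sketched as below. This is a minimal sketch under assumptions: `model` is a hypothetical object whose `total_loss(batch)` stands in for steps S21 to S23 (prediction plus weighted total loss) and whose `step(lr)` adjusts the training parameters of step S24; `max_rounds` is an illustrative safety bound not present in the text.

```python
def train_fusion_model(model, batches, loss_threshold=0.05,
                       lr=1e-4, max_rounds=1000):
    """Iterate until the total loss falls below the threshold (step S25),
    adjusting parameters at the preset learning rate otherwise (step S24)."""
    for _ in range(max_rounds):
        for batch in batches:
            total = model.total_loss(batch)   # steps S21-S23
            if total < loss_threshold:        # step S25: converged, output
                return model
            model.step(lr)                    # step S24: adjust parameters
    return model
```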
Preferably, in the step S4, the key point anti-shake processing specifically includes:
step A1, according to the face key point positions, expanding the corresponding face region in the next frame of the face picture to be detected by a preset factor to obtain a face region picture;
step A2, verifying the face region picture according to a pre-generated face verification model, and judging whether the face region picture is a face or not according to a verification result:
if yes, turning to a step A3;
if not, exiting;
and step A3, tracking the human face by adopting a tracking algorithm to obtain the human face detection frame and the position of the key point of the human face corresponding to the human face picture to be detected in the next frame.
A single-stage face detection and key point positioning system, applying the single-stage face detection and key point positioning method described in any one of the above, the face detection and key point positioning system specifically comprising:
the data labeling module is used for acquiring a plurality of face images, labeling each face image and obtaining a labeling image with a real face frame and a real key point position;
the data training module is connected with the data labeling module and is used for training according to the labeling image to obtain a face detection and key point positioning fusion model;
the model prediction module is connected with the data training module and is used for inputting a face picture to be detected of a current frame into the face detection and key point positioning fusion model, obtaining a face detection frame corresponding to the face picture of the current frame and the position of a key point of the face and outputting the face detection frame and the position of the key point of the face;
the anti-shake processing module is connected with the model prediction module and is used for carrying out key point anti-shake processing on a next frame of face picture to be detected according to the face detection frame and the face key point position, and recording the total times of carrying out the key point anti-shake processing;
the data comparison module is connected with the anti-shake processing module and is used for comparing the recorded total times with a preset times threshold, generating a first comparison result when the total times are not more than the times threshold, and generating a second comparison result when the total times are more than the times threshold;
the first processing module is connected with the data comparison module, and is used for directly obtaining the positions of a next frame of face detection frame and a next frame of face key point corresponding to the face picture to be detected according to the first comparison result and the processing result of the key point anti-shake processing, and outputting the positions as the positions of the current frame of face detection frame and the current frame of face key point;
and the second processing module is connected with the data comparison module and is used for resetting the total times according to the second comparison result.
Preferably, the data training module specifically includes:
the data prediction unit is used for inputting the marked image into a pre-generated initial fusion model to obtain a corresponding face detection prediction result and a key point prediction result;
the face detection prediction result comprises a face classification prediction result, a face frame regression prediction result and a face frame proportion prediction result;
the first processing unit is connected with the data prediction unit and is used for respectively calculating a first loss function between the face classification prediction result and a real face classification result contained in the real face frame, a second loss function between the face frame regression prediction result and a real face region contained in the real face frame, a third loss function between the face frame proportion prediction result and the preset proportion, and a fourth loss function between the key point prediction result and the real key point position;
the second processing unit is connected with the first processing unit and is used for carrying out weighted summation on the first loss function, the second loss function, the third loss function and the fourth loss function to obtain a total loss function;
the data comparison unit is connected with the second processing unit and is used for comparing the total loss function with a preset loss function threshold value, generating a first comparison result when the total loss function is not smaller than the loss function threshold value, and generating a second comparison result when the total loss function is smaller than the loss function threshold value;
the third processing unit is connected with the data comparison unit and is used for adjusting training parameters in the initial fusion model according to the first comparison result and a preset learning rate so as to continue a new training process;
and the fourth processing unit is connected with the data comparison unit and is used for, according to the second comparison result, outputting the initial fusion model as the face detection and key point positioning fusion model.
Preferably, the anti-shake processing module specifically includes:
the image processing unit is used for expanding the corresponding face key point position in the next frame of face picture to be detected by a preset multiple according to the face key point position to obtain a face region picture;
the face verification unit is connected with the image processing unit and is used for verifying the face region picture according to a pre-generated face verification model and outputting a corresponding face verification result when the verification result shows that the face region picture is a face;
and the face tracking unit is connected with the face checking unit and is used for tracking the face by adopting a tracking algorithm to obtain the face detection frame and the face key point position corresponding to the face picture to be detected of the next frame.
The technical scheme has the following advantages or beneficial effects:
1) The face detection and the key point positioning are fused, and the face detection and the key point positioning are mutually promoted by adopting an end-to-end training mode, so that the accuracy of the face detection and the key point positioning is effectively improved;
2) By combining the face verification model and the tracking model, the problem of key point shake is effectively improved;
3) The face detection and the key point positioning are fused into one model, so that the reasoning speed is effectively improved, and the method is suitable for edge computing equipment only supporting single-model deployment.
Drawings
FIG. 1 is a flow chart of a single-stage face detection and key point positioning method according to a preferred embodiment of the present invention;
FIG. 2 is a flow chart of a training process of a face detection and key point localization fusion model according to a preferred embodiment of the present invention;
FIG. 3 is a schematic flow chart of the key point anti-shake processing process according to the preferred embodiment of the invention;
fig. 4 is a schematic structural diagram of a single-stage face detection and key point positioning system according to a preferred embodiment of the present invention.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples. The present invention is not limited to the embodiment, and other embodiments may fall within the scope of the present invention as long as they conform to the gist of the present invention.
In a preferred embodiment of the present invention, based on the above-mentioned problems existing in the prior art, a single-stage face detection and key point positioning method is now provided, as shown in fig. 1, which specifically includes:
step S1, acquiring a plurality of face images, and labeling each face image to obtain a labeled image with a real face frame and a real key point position;
step S2, training according to the labeling image to obtain a face detection and key point positioning fusion model;
step S3, inputting a face picture to be detected of a current frame in a video image into a face detection and key point positioning fusion model, obtaining a face detection frame of the current frame corresponding to the face picture to be detected of the current frame and the key point position of the face of the current frame, and outputting the face detection frame and the key point position;
step S4, performing key point anti-shake processing on the next frame of face picture to be detected according to the key point position of the face of the current frame, and recording the total times of performing the key point anti-shake processing;
step S5, comparing the recorded total times with a preset times threshold value:
if the total times is not greater than the times threshold, turning to the step S6;
if the total times is greater than the times threshold, resetting the total times, and returning to the step S3;
step S6, directly obtaining, from the result of the key point anti-shake processing, the next-frame face detection frame and next-frame face key point positions corresponding to the next frame of the face picture to be detected, outputting them as the current-frame face detection frame and current-frame face key point positions, and then returning to step S4;
The above process is repeated until all frames of the video image have been processed.
The key point positioning method fuses face detection and key point positioning and trains them end to end, so the two tasks reinforce each other, effectively improving the accuracy of both. Combining the face verification model with the tracking model effectively alleviates key point jitter. Furthermore, fusing face detection and key point positioning into a single model improves inference speed and suits edge computing devices that support only single-model deployment, avoiding the severe slowdown such devices suffer when more than one model is loaded.
Further specifically, the technical scheme of the invention comprises the training process of a face detection and key point positioning fusion model:
Firstly, the training data are prepared: the acquired face images are labeled to obtain labeled images carrying real face frames and real key point positions. In this embodiment, the annotations are preferably stored in a plain-text file. Specifically, a text file named train.txt is created as the training set, where each line of train.txt represents one annotated image. Each annotation preferably contains the picture path, a face box (box) and the 6 points of the face key points (landmark), in the following storage format:
Path/xxx.jpg x1,y1,x2,y2,ptx1,pty1,ptx2,pty2,...,ptx6,pty6
where Path is the storage path, xxx.jpg is the file name of the labeled image, x1, y1, x2, y2 are the face frame coordinates, and ptx1, pty1 through ptx6, pty6 are the 6 face key points. The values from x1, y1 through ptx6, pty6 describe one face in the labeled image; if a picture contains multiple faces, the same pattern is appended repeatedly on the line. Fusing the training data for face detection and key point positioning into a single annotation file makes the data convenient to process and read.
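A small parser for the annotation line format just described could look like this; `parse_annotation_line` is an illustrative helper, written under the assumption that each face's comma-separated numbers form one space-separated token after the path.

```python
def parse_annotation_line(line):
    """Parse one train.txt line:
    'Path/xxx.jpg x1,y1,x2,y2,ptx1,pty1,...,ptx6,pty6 [more faces...]'
    Returns (path, [(box, keypoints), ...])."""
    path, *faces = line.strip().split()
    parsed = []
    for face in faces:
        nums = [float(v) for v in face.split(",")]
        assert len(nums) == 16, "4 box coords + 6 keypoints x 2"
        box = tuple(nums[:4])
        keypoints = [(nums[i], nums[i + 1]) for i in range(4, 16, 2)]
        parsed.append((box, keypoints))
    return path, parsed
```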
Secondly, the network framework of the face detection and key point positioning fusion model is preset. The invention builds on a single-stage network framework, preferably a RetinaNet structure, i.e., a single-stage detection network with a feature pyramid network (FPN). Because of the particular geometry of the human face, the aspect ratio of the anchor boxes (anchors) used during training is preferably set to 1:1, which effectively prevents the model from predicting overly long or wide face frames that do not match facial proportions. To further improve the recall and precision of face detection and key point positioning, in the feature maps generated by convolution during training, the receptive field of each pixel in the corresponding face image is preferably set to twice the anchor box size, avoiding the loss of detection precision that results from using default values. To ensure that the fusion model also detects small faces well, the invention preferably upsamples the last three feature maps of the backbone network with the feature pyramid (FPN), realizing feature fusion and effectively improving the recall of small faces.
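Square 1:1 anchor generation over a feature-map grid can be sketched as follows. This is a simplified illustration (one anchor per cell); the stride and anchor size values are assumptions for demonstration, not values given in the text.

```python
def make_anchors(feat_h, feat_w, stride, anchor_size):
    """Generate 1:1 square anchor boxes (x1, y1, x2, y2), one centred
    on each feature-map cell, in input-image coordinates."""
    anchors = []
    half = anchor_size / 2.0
    for y in range(feat_h):
        for x in range(feat_w):
            cx = (x + 0.5) * stride   # cell centre in the input image
            cy = (y + 0.5) * stride
            anchors.append((cx - half, cy - half, cx + half, cy + half))
    return anchors
```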
More preferably, during training of the face detection and key point positioning fusion model, after each training round the positive and negative samples are rebalanced according to that round's predictions, and the resulting samples are fed into the next round. In this embodiment, for the face detection task, an anchor box (anchor) is preferably treated as a positive sample when its intersection-over-union (IoU) with the real face frame (gt) exceeds 0.5, and as a negative sample when the IoU is below 0.3. For key point positioning, the loss between the predicted and real key point positions is preferably computed only when the IoU between the anchor box and the real face frame exceeds 0.7, avoiding the convergence difficulty and inaccurate key point positioning that an overly small IoU would cause.
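The IoU thresholds above (positive above 0.5, negative below 0.3, keypoint loss only above 0.7) can be sketched as follows; `assign_anchor` is an illustrative helper, not part of the patented system.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def assign_anchor(anchor, gt_box):
    """Label an anchor for detection and decide whether the keypoint
    loss is computed, per the thresholds in the text."""
    v = iou(anchor, gt_box)
    label = "positive" if v > 0.5 else "negative" if v < 0.3 else "ignore"
    return label, v > 0.7   # (detection label, compute keypoint loss?)
```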
Further specifically, during training of the face detection and key point positioning fusion model, four loss functions are preferably returned during training to ensure the effectiveness of the network: a first loss function between the face classification prediction and the real face classification contained in the real face frame, a second loss function between the face frame regression prediction and the real face region contained in the real face frame, a third loss function between the face frame proportion prediction and the preset proportion, and a fourth loss function between the key point prediction and the real key point positions. The first loss function preferably adopts the softmax function, the second preferably adopts the smooth L1 function, and the third preferably adopts the MSE function, which keeps the predicted face box proportion at 1:1.
Preferably, the first, second, third and fourth loss functions are weighted and summed to obtain the total loss function, with preferred weights of 1, 1, 0.5 and 0.1 respectively. Because the weight of the fourth loss function is set small, the face frame proportion is still enforced without affecting the network's overall face detection and key point positioning performance. During training the learning rate is preferably preset to 0.0001; when training finishes, the face detection and key point positioning fusion model is obtained.
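The weighted total loss with the preferred weights (1, 1, 0.5, 0.1) can be sketched as below, together with a scalar smooth L1 term as suggested for the second (box) loss. The exact loss formulations (e.g. the smooth L1 beta) are assumptions beyond what the text specifies.

```python
def smooth_l1(pred, target):
    """Scalar smooth L1 (Huber with beta = 1), an assumed form of the
    second (box regression) loss."""
    d = abs(pred - target)
    return 0.5 * d * d if d < 1.0 else d - 0.5

def total_loss(cls_loss, box_loss, ratio_loss, kpt_loss,
               weights=(1.0, 1.0, 0.5, 0.1)):
    """Weighted sum of the four losses with the preferred weights:
    classification, box regression, box proportion, keypoints."""
    losses = (cls_loss, box_loss, ratio_loss, kpt_loss)
    return sum(w * l for w, l in zip(weights, losses))
```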
Further, the face detection and key point positioning fusion model is adopted to carry out face detection and key point positioning on the picture to be detected of the video image, so that the problem of jitter in key point positioning caused by subjectivity of data marking and jitter of a face detection frame is solved.
Specifically, when performing face detection and key point positioning on consecutive frames of a video image, the current frame of the picture to be detected is preferably taken as the detection starting node: the fusion model predicts the current frame to obtain its face detection frame and key point positions, and key point anti-shake processing is applied to the subsequent consecutive frames to obtain their corresponding face detection frames and key point positions. Preferably, one fusion-model prediction is followed by key point anti-shake processing on the next ten frames, then another fusion-model prediction, and so on.
In the preferred embodiment of the invention, the face detection and key point positioning fusion model adopts a RetinaNet network structure, and the feature maps output by the last three convolutional layers of the RetinaNet structure are combined through a feature pyramid network structure.
In the preferred embodiment of the invention, in the training process of the face detection and key point positioning fusion model, an anchor point frame with a preset proportion is adopted to carry out regression prediction of the face detection frame and prediction of the position of the key point of the face.
In the preferred embodiment of the invention, in the training process of the face detection and key point positioning fusion model, the size of the receptive field of each pixel point in the corresponding face image is twice the size of the anchor point frame in the feature map generated by convolution operation.
In a preferred embodiment of the present invention, the predetermined ratio is 1:1.
In a preferred embodiment of the present invention, as shown in fig. 2, step S2 specifically includes:
s21, inputting the labeling image into a pre-generated initial fusion model to obtain a corresponding face detection prediction result and a key point prediction result;
the face detection prediction results comprise a face classification prediction result, a face frame regression prediction result and a face frame proportion prediction result;
step S22, respectively calculating a first loss function between a face classification prediction result and a real face classification result contained in a real face frame, a second loss function between a face frame regression prediction result and a real face region contained in the real face frame, a third loss function between a face frame proportion prediction result and a preset proportion, and a fourth loss function between a key point prediction result and a real key point position;
step S23, the first loss function, the second loss function, the third loss function and the fourth loss function are weighted and summed to obtain a total loss function, and the total loss function is compared with a preset loss function threshold value:
if the total loss function is not less than the loss function threshold, turning to step S24;
if the total loss function is smaller than the loss function threshold, turning to step S25;
step S24, training parameters in the initial fusion model are adjusted according to a preset learning rate, and then step S21 is returned to continue the training process of a new round;
step S25, outputting the initial fusion model as the face detection and key point positioning fusion model.
In a preferred embodiment of the present invention, as shown in fig. 3, the key point anti-shake processing in step S4 specifically comprises:
step A1, enlarging the region around the corresponding face key point positions in the next frame of face picture to be detected by a preset multiple to obtain a face region picture;
step A2, verifying the face region picture according to a pre-generated face verification model, and judging whether the face region picture is a face or not according to a verification result:
if yes, turning to a step A3;
if not, exiting;
and step A3, tracking the human face by adopting a tracking algorithm to obtain a human face detection frame and a human face key point position corresponding to the next frame of human face picture to be detected.
Specifically, in this embodiment, the key point anti-shake processing greatly reduces the amplitude of key point jitter. It mainly comprises two parts: first, a face verification module for judging whether the region is a face, which is preferably implemented as a very simple binary classification network requiring only a few convolution layers; second, a tracking algorithm, which is preferably adopted to track the face and obtain the face detection frame once the face verification module confirms a face. Because the tracking algorithm matches consecutive frames closely, the face regions covered by the face detection frames of two consecutive frames are almost identical. The face region is enlarged by 1.5 times before being fed into the detection algorithm, which ensures that the output picture does not change significantly between frames and thereby preserves the accuracy of the key point regression.
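Steps A1-A3 can be sketched as follows. This is an illustration under stated assumptions: `is_face` stands in for the lightweight binary verification network and `track_face` for the tracking algorithm, neither of which the patent specifies concretely; only the 1.5x enlargement factor is taken from the embodiment above:

```python
# Sketch of the key point anti-shake processing: the face region around
# the previous frame's key points is enlarged by a preset multiple (1.5x
# in this embodiment), verified by a binary face/non-face classifier, and
# then tracked. `is_face` and `track_face` are hypothetical callables.
EXPAND_RATIO = 1.5

def expand_box(x1, y1, x2, y2, ratio=EXPAND_RATIO):
    """Step A1: enlarge a box about its centre by `ratio`."""
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = (x2 - x1) * ratio, (y2 - y1) * ratio
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def anti_shake(frame, prev_keypoints, is_face, track_face):
    # Bounding box of the previous frame's key points.
    x1 = min(p[0] for p in prev_keypoints)
    y1 = min(p[1] for p in prev_keypoints)
    x2 = max(p[0] for p in prev_keypoints)
    y2 = max(p[1] for p in prev_keypoints)
    region = expand_box(x1, y1, x2, y2)
    if not is_face(frame, region):   # step A2: face verification
        return None                  # exit: caller falls back to detection
    return track_face(frame, region)  # step A3: face tracking
```

Returning `None` corresponds to the "if not, exiting" branch of step A2, after which a full detection pass would be required.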
A single-stage face detection and key point positioning system, as shown in fig. 4, applying any one of the above single-stage face detection and key point positioning methods, the face detection and key point positioning system specifically includes:
the data labeling module 1 is used for acquiring a plurality of face images, labeling each face image and obtaining a labeling image with a real face frame and a real key point position;
the data training module 2 is connected with the data labeling module 1 and is used for training according to the labeling images to obtain a face detection and key point positioning fusion model;
the model prediction module 3 is connected with the data training module 2 and is used for inputting the face picture to be detected of the current frame into the face detection and key point positioning fusion model, obtaining the face detection frame and the face key point positions corresponding to the face picture to be detected of the current frame, and outputting them;
the anti-shake processing module 4 is connected with the model prediction module 3 and is used for carrying out key point anti-shake processing on the next frame of face picture to be detected according to the face detection frame and the position of the key point of the face, and recording the total times of carrying out the key point anti-shake processing;
the data comparison module 5 is connected with the anti-shake processing module 4 and is used for comparing the recorded total times with a preset times threshold value, generating a first comparison result when the total times are not more than the times threshold value, and generating a second comparison result when the total times are more than the times threshold value;
the first processing module 6 is connected with the data comparing module 5, and is used for directly obtaining the positions of a next frame face detection frame and a next frame face key point corresponding to the next frame face picture to be detected according to the first comparing result and the processing result of the key point anti-shake processing, and outputting the positions as the positions of the current frame face detection frame and the current frame face key point;
the second processing module 7 is connected with the data comparison module 5 and is used for resetting the total times according to the second comparison result.
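The detect/track scheduling carried out by modules 3-7 can be sketched as follows. This is a hedged sketch, not the claimed system: `detect` and `track` are hypothetical stand-ins for fusion-model inference and the key point anti-shake processing, and the times threshold is a placeholder value:

```python
# Scheduling sketch: full detection runs on one frame, then the cheaper
# anti-shake tracking is reused for subsequent frames while a counter is
# incremented; once the counter exceeds the preset times threshold it is
# reset and full detection runs again. The threshold value is illustrative.
TIMES_THRESHOLD = 10

def process_video(frames, detect, track):
    results, state, count = [], None, 0
    for frame in frames:
        if state is None or count > TIMES_THRESHOLD:
            state = detect(frame)        # full model prediction (module 3)
            count = 0                    # reset the total times (module 7)
        else:
            state = track(frame, state)  # anti-shake tracking (modules 4/6)
            count += 1                   # record the total times (module 4)
        results.append(state)
    return results
```

The counter comparison mirrors the data comparison module 5: while the total times stay at or below the threshold the tracking result is output directly, and once the threshold is exceeded the pipeline falls back to full detection.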
In a preferred embodiment of the present invention, the data training module 2 specifically includes:
the data prediction unit 21 is configured to input the labeling image into a pre-generated initial fusion model, so as to obtain a corresponding face detection prediction result and a corresponding key point prediction result;
the face detection prediction results comprise a face classification prediction result, a face frame regression prediction result and a face frame proportion prediction result;
a first processing unit 22, connected to the data prediction unit 21, configured to calculate a first loss function between the face classification prediction result and a real face classification result included in the real face frame, a second loss function between the face frame regression prediction result and a real face region included in the real face frame, a third loss function between the face frame proportion prediction result and a preset proportion, and a fourth loss function between the key point prediction result and a real key point position, respectively;
the second processing unit 23 is connected to the first processing unit 22, and is configured to perform weighted summation on the first loss function, the second loss function, the third loss function, and the fourth loss function, so as to obtain a total loss function;
a data comparing unit 24, connected to the second processing unit 23, for comparing the total loss function with a preset loss function threshold, and generating a first comparison result when the total loss function is not less than the loss function threshold, and generating a second comparison result when the total loss function is less than the loss function threshold;
the third processing unit 25 is connected with the data comparing unit 24 and is used for adjusting training parameters in the initial fusion model according to the first comparison result and the preset learning rate so as to continue the new training process;
and a fourth processing unit 26, connected to the data comparing unit 24, for taking the initial fusion model as a face detection and key point positioning fusion model according to the second comparison result and outputting the same.
In a preferred embodiment of the present invention, the anti-shake processing module 4 specifically includes:
the image processing unit 41 is configured to enlarge a corresponding face key point position in a next frame of face picture to be detected by a preset multiple according to the face key point position, so as to obtain a face region picture;
a face verification unit 42 connected to the image processing unit 41, and configured to verify the face area picture according to a pre-generated face verification model, and output a corresponding face verification result when the verification result indicates that the face area picture is a face;
the face tracking unit 43 is connected to the face verification unit 42, and is configured to track a face by using a tracking algorithm, so as to obtain a processing result of the key point anti-shake processing, where the processing result includes a face detection frame corresponding to a next frame of face picture to be detected and a key point position of the face.
The data recording unit 44 is connected to the face tracking unit 43, and is configured to record the total number of times of performing the key point anti-shake processing according to the processing result.
The foregoing description is only illustrative of the preferred embodiments of the present invention and is not to be construed as limiting the scope of the invention, and it will be appreciated by those skilled in the art that equivalent substitutions and obvious variations may be made using the description and drawings, and are intended to be included within the scope of the present invention.

Claims (10)

1. A single-stage face detection and key point positioning method is characterized by comprising the following steps:
step S1, acquiring a plurality of face images, and labeling each face image to obtain a labeled image with a real face frame and a real key point position;
step S2, training according to the labeling image to obtain a face detection and key point positioning fusion model;
step S3, inputting a face picture to be detected of a current frame in a video image into the face detection and key point positioning fusion model, obtaining a face detection frame of the current frame corresponding to the face picture to be detected of the current frame and the key point position of the face of the current frame, and outputting the face detection frame and the key point position;
step S4, performing key point anti-shake processing on the face picture to be detected of the next frame according to the position of the key point of the face of the current frame, and recording the total times of performing the key point anti-shake processing;
step S5, comparing the recorded total times with a preset times threshold value:
if the total times is not greater than the times threshold, turning to the step S6;
if the total times is larger than the times threshold, resetting the total times, and returning to the step S3;
step S6, the positions of a next frame of face detection frame and a next frame of face key point corresponding to the face picture to be detected are directly obtained according to the processing result of the key point anti-shake processing, and are output as the positions of the face detection frame of the current frame and the face key point of the current frame, and then the step S4 is returned;
the above process continues until all frames of the video image have been processed.
2. The single-stage face detection and key point positioning method according to claim 1, wherein the face detection and key point positioning fusion model adopts a RetinaNet network structure, and the feature maps output by the three convolution layers in the RetinaNet network structure adopt a feature pyramid network structure.
3. The single-stage face detection and key point positioning method according to claim 2, wherein in the training process of the face detection and key point positioning fusion model, an anchor point frame with a preset proportion is adopted to conduct regression prediction of the face detection frame and prediction of the position of the key point of the face.
4. A single-stage face detection and keypoint location method as claimed in claim 3, wherein in the training process of the face detection and keypoint location fusion model, the size of the receptive field of each pixel point in the corresponding face image is twice the size of the anchor point frame in a feature map generated by convolution operation.
5. A single-stage face detection and keypoint location method as claimed in claim 3, characterized in that the preset ratio is 1:1.
6. The single-stage face detection and key point positioning method according to claim 3, wherein the step S2 specifically comprises:
step S21, inputting the labeling image into a pre-generated initial fusion model to obtain a corresponding face detection prediction result and a key point prediction result;
the face detection prediction result comprises a face classification prediction result, a face frame regression prediction result and a face frame proportion prediction result;
step S22, respectively calculating a first loss function between the face classification prediction result and a real face classification result contained in the real face frame, a second loss function between the face frame regression prediction result and a real face region contained in the real face frame, a third loss function between the face frame proportion prediction result and the preset proportion, and a fourth loss function between the key point prediction result and the real key point position;
step S23, performing weighted summation on the first loss function, the second loss function, the third loss function and the fourth loss function to obtain a total loss function, and comparing the total loss function with a preset loss function threshold value:
if the total loss function is not less than the loss function threshold, turning to step S24;
if the total loss function is smaller than the loss function threshold, turning to step S25;
step S24, training parameters in the initial fusion model are adjusted according to a preset learning rate, and then the step S21 is returned to continue a new training process;
and S25, taking the initial fusion model as a face detection and key point positioning fusion model and outputting the same.
7. The single-stage face detection and keypoint location method according to claim 1, wherein the keypoint anti-shake processing in step S4 specifically comprises:
step A1, expanding the corresponding key point positions of the face in the next frame of face picture to be detected by preset times according to the key point positions of the face to obtain a face region picture;
step A2, verifying the face region picture according to a pre-generated face verification model, and judging whether the face region picture is a face or not according to a verification result:
if yes, turning to a step A3;
if not, exiting;
and step A3, tracking the human face by adopting a tracking algorithm to obtain the human face detection frame and the position of the key point of the human face corresponding to the human face picture to be detected in the next frame.
8. A single-stage face detection and keypoint location system, characterized in that a single-stage face detection and keypoint location method according to any one of claims 1-7 is applied, said face detection and keypoint location system comprising in particular:
the data labeling module is used for acquiring a plurality of face images, labeling each face image and obtaining a labeling image with a real face frame and a real key point position;
the data training module is connected with the data labeling module and is used for training according to the labeling image to obtain a face detection and key point positioning fusion model;
the model prediction module is connected with the data training module and is used for inputting a face picture to be detected of a current frame into the face detection and key point positioning fusion model, obtaining a face detection frame corresponding to the face picture of the current frame and the position of a key point of the face and outputting the face detection frame and the position of the key point of the face;
the anti-shake processing module is connected with the model prediction module and is used for carrying out key point anti-shake processing on a next frame of face picture to be detected according to the face detection frame and the face key point position, and recording the total times of carrying out the key point anti-shake processing;
the data comparison module is connected with the anti-shake processing module and is used for comparing the recorded total times with a preset times threshold, generating a first comparison result when the total times are not more than the times threshold, and generating a second comparison result when the total times are more than the times threshold;
the first processing module is connected with the data comparison module, and is used for directly obtaining the positions of a next frame of face detection frame and a next frame of face key point corresponding to the face picture to be detected according to the first comparison result and the processing result of the key point anti-shake processing, and outputting the positions as the positions of the current frame of face detection frame and the current frame of face key point;
and the second processing module is connected with the data comparison module and is used for resetting the total times according to the second comparison result.
9. The single-stage face detection and key point positioning system of claim 8, wherein the data training module specifically comprises:
the data prediction unit is used for inputting the marked image into a pre-generated initial fusion model to obtain a corresponding face detection prediction result and a key point prediction result;
the face detection prediction result comprises a face classification prediction result, a face frame regression prediction result and a face frame proportion prediction result;
the first processing unit is connected with the data prediction unit and is used for respectively calculating a first loss function between the face classification prediction result and a real face classification result contained in the real face frame, a second loss function between the face frame regression prediction result and a real face region contained in the real face frame, a third loss function between the face frame proportion prediction result and a preset proportion, and a fourth loss function between the key point prediction result and the real key point position;
the second processing unit is connected with the first processing unit and is used for carrying out weighted summation on the first loss function, the second loss function, the third loss function and the fourth loss function to obtain a total loss function;
the data comparison unit is connected with the second processing unit and is used for comparing the total loss function with a preset loss function threshold value, generating a first comparison result when the total loss function is not smaller than the loss function threshold value, and generating a second comparison result when the total loss function is smaller than the loss function threshold value;
the third processing unit is connected with the data comparison unit and is used for adjusting training parameters in the initial fusion model according to the first comparison result and a preset learning rate so as to continue a new training process;
and the fourth processing unit is connected with the data comparison unit and is used for taking the initial fusion model as a face detection and key point positioning fusion model according to the second comparison result and outputting the initial fusion model.
10. The single-stage face detection and key point positioning system of claim 8, wherein the anti-shake processing module specifically comprises:
the image processing unit is used for expanding the corresponding face key point position in the next frame of face picture to be detected by a preset multiple according to the face key point position to obtain a face region picture;
the face verification unit is connected with the image processing unit and is used for verifying the face region picture according to a pre-generated face verification model and outputting a corresponding face verification result when the verification result shows that the face region picture is a face;
the face tracking unit is connected with the face verification unit and is used for tracking the face by adopting a tracking algorithm to obtain a processing result of the key point anti-shake processing, wherein the processing result comprises the face detection frame corresponding to the next frame of face picture to be detected and the face key point positions;
and the data recording unit is connected with the face tracking unit and is used for recording the total times of the anti-shake processing of the key points according to the processing result.
CN201911358998.1A 2019-12-25 2019-12-25 Single-stage face detection and key point positioning method and system Active CN111079686B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911358998.1A CN111079686B (en) 2019-12-25 2019-12-25 Single-stage face detection and key point positioning method and system


Publications (2)

Publication Number Publication Date
CN111079686A CN111079686A (en) 2020-04-28
CN111079686B true CN111079686B (en) 2023-05-23

Family

ID=70317792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911358998.1A Active CN111079686B (en) 2019-12-25 2019-12-25 Single-stage face detection and key point positioning method and system

Country Status (1)

Country Link
CN (1) CN111079686B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783608B (en) * 2020-06-24 2024-03-19 南京烽火星空通信发展有限公司 Face-changing video detection method
CN111881876B (en) * 2020-08-06 2022-04-08 桂林电子科技大学 Attendance checking method based on single-order anchor-free detection network
CN111783749A (en) * 2020-08-12 2020-10-16 成都佳华物链云科技有限公司 Face detection method and device, electronic equipment and storage medium
CN112241700A (en) * 2020-10-15 2021-01-19 希望银蕨智能科技有限公司 Multi-target forehead temperature measurement method for forehead accurate positioning
CN114627519A (en) * 2020-12-14 2022-06-14 阿里巴巴集团控股有限公司 Data processing method, apparatus, electronic device and storage medium
CN112949492A (en) * 2021-03-03 2021-06-11 南京视察者智能科技有限公司 Model series training method and device for face detection and key point detection and terminal equipment
CN113011356B (en) * 2021-03-26 2024-08-06 杭州网易智企科技有限公司 Face feature detection method and device, medium and electronic equipment
CN113449657B (en) * 2021-07-05 2022-08-30 中山大学 Method, system and medium for detecting depth-forged face video based on face key points
CN114257748B (en) * 2022-01-26 2024-08-27 Oppo广东移动通信有限公司 Video anti-shake method and device, computer readable medium and electronic equipment
CN114418901B (en) * 2022-03-30 2022-08-09 江西中业智能科技有限公司 Image beautifying processing method, system, storage medium and equipment based on Retinaface algorithm
CN114926876B (en) * 2022-04-26 2025-06-06 黑芝麻智能科技有限公司 Image key point detection method, device, computer equipment and storage medium
CN114924645A (en) * 2022-05-18 2022-08-19 上海庄生晓梦信息科技有限公司 Interaction method and system based on gesture recognition
CN115050129B (en) * 2022-06-27 2023-06-13 北京睿家科技有限公司 Data processing method and system for intelligent access control
CN117079337B (en) * 2023-10-17 2024-02-06 成都信息工程大学 A high-precision facial attribute feature recognition device and method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018170864A1 (en) * 2017-03-20 2018-09-27 成都通甲优博科技有限责任公司 Face recognition and tracking method
CN109919097A (en) * 2019-03-08 2019-06-21 中国科学院自动化研究所 Joint detection system and method of face and key points based on multi-task learning
CN109977775A (en) * 2019-02-25 2019-07-05 腾讯科技(深圳)有限公司 Critical point detection method, apparatus, equipment and readable storage medium storing program for executing


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xu Weiwei; Li Jun. A robust real-time face key point tracking method. Computer Engineering, 2017, (04). *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant