Disclosure of Invention
To address the above problems, the invention provides a method for identifying and scoring training actions in queued parachuting training, which specifically comprises the following steps:
S1, setting up a depth camera for acquiring video and point cloud images of parachuting training;
S2, starting the depth camera when training begins, and calibrating the camera parameters for the training area to obtain a calibration space that is as small as possible;
S3, acquiring, through the depth camera, the video stream and the point cloud information corresponding to each frame after training starts, to obtain continuous time series data;
S4, searching for a training target to lock according to posture actions; entering S5 if a target is locked, otherwise recording the stage as irrelevant stage 0 in a data sequence synchronized with the video, setting the key point data to a 25 × 3 zero matrix, and repeating S4;
S5, continuously tracking the locked target, recording the frame N_s at which locking starts, recording for each frame the three-dimensional coordinates corresponding to the target's posture key points and the action stage recognized by the action stage recognition model, and judging whether the target leaves the calibration space; if the target leaves the calibration space, recording the frame N_e at which it leaves and entering step S6, otherwise repeating step S5;
S6, scoring the target and judging whether training is finished; if finished, entering S7, otherwise returning to S4;
S7, dividing the recorded video into the individual training segments of each trainee, associating each segment with its scoring report, and outputting the segments.
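For concreteness, the synchronized time series kept in S3 to S5 can be pictured as one record per frame. The following is a minimal Python sketch, assuming NumPy; the record layout and field names are illustrative assumptions, not part of the invention.

    from dataclasses import dataclass, field
    import numpy as np

    # Hypothetical per-frame record of the video-synchronized time series.
    @dataclass
    class FrameRecord:
        frame_index: int    # index of the frame in the video stream
        stage: int          # 0 irrelevant, 1 preparation, 2 take-off, 3 landing
        keypoints_2d: np.ndarray = field(default_factory=lambda: np.zeros((25, 2)))
        keypoints_3d: np.ndarray = field(default_factory=lambda: np.zeros((25, 3)))

    # While no target is locked (S4), the record stays at stage 0 with zero matrices.
    unlocked_frame = FrameRecord(frame_index=0, stage=0)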
Further, in S1, the parachuting process is decomposed into three parts, including cabin exit and landing, and the position of the depth camera is then set correspondingly for each part.
Further, the specific method for obtaining the calibration space in S2 is as follows: the initial position of the trainee about to train, together with its region in the two-dimensional image, is calibrated through the depth camera, the calibration region being made just large enough to cover the trainee throughout the training process.
Further, in S4, the specific method for searching for and locking a training target according to posture actions is as follows:
Tracking all persons in the current picture with a target tracking model, assigning each person a unique ID_j, and obtaining a Mask that covers only that person, wherein the Mask is a binary matrix with the same resolution as the picture acquired by the camera and is used to mask out the area outside the target; meanwhile, the spatial position P_j of each person is acquired from the point cloud information, the spatial position being the three-dimensional coordinates (x_j, y_j, z_j) with the camera as origin;
Selecting the persons appearing in the calibration space, masking out non-person areas with the Mask, and recognizing the 25 two-dimensional key point coordinates (x_p', y_p') of each person with the posture key point recognition model, so that the posture of each person is finally represented by a 25 × 2 matrix;
Converting the obtained 25 × 2 matrix into a vector and judging its training stage with the action stage recognition model, whose classification labels are: class 1 for the preparation stage, class 2 for the take-off stage, class 3 for the landing stage, and class 0 for irrelevant actions;
From the persons in the calibration space, searching for the person judged by the action stage recognition model to be in the preparation stage, namely class 1, with the highest confidence;
If such a person in the preparation stage is found, taking that person as the training target, recording the person's ID_t and the frame number N_s acquired by the video at that moment, and locking the target; if no person satisfies the preparation stage, recording a 25 × 2 zero matrix, a 25 × 3 zero matrix, class 0 and an unlocked-target flag in the time series data.
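As an illustration of the locking step, the following Python sketch selects the highest-confidence preparation-stage person. It assumes a pre-trained action stage classifier exposing a scikit-learn-style predict_proba with class labels ordered 0 to 3; the function and variable names are hypothetical.

    import numpy as np

    def find_preparation_target(persons, stage_classifier):
        """Pick the person judged to be in the preparation stage (class 1)
        with the highest confidence among persons in the calibration space.

        persons: list of (person_id, keypoints_2d), keypoints_2d a 25x2 array.
        Returns (person_id, confidence) or None when no one qualifies.
        """
        best = None
        for person_id, kp2d in persons:
            feats = kp2d.reshape(1, -1)        # flatten 25x2 into a 50-dim vector
            probs = stage_classifier.predict_proba(feats)[0]
            if int(np.argmax(probs)) == 1:     # most likely stage is preparation
                if best is None or probs[1] > best[1]:
                    best = (person_id, probs[1])
        return best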
Further, the specific method for continuously tracking the locked target in S5 is as follows:
Acquiring the target's Mask and spatial position P_t(x_t, y_t, z_t) according to ID_t, and masking the image using only this Mask;
Acquiring the target's 25 × 2 key point coordinate matrix through the posture key point recognition model;
Mapping the two-dimensional key point coordinates onto the point cloud information to obtain the corresponding 25 × 3 three-dimensional key point coordinate matrix;
Identifying the target's current training stage through the action stage recognition model;
Recording the obtained two-dimensional coordinate matrix, three-dimensional coordinate matrix and stage in the time series data.
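The mapping from two-dimensional key points to three-dimensional coordinates can be sketched as a lookup into an organized point cloud aligned pixel-for-pixel with the color image; this alignment, and the names below, are assumptions for illustration.

    import numpy as np

    def keypoints_2d_to_3d(kp2d, point_cloud):
        """Look up the 3D point for each 2D key point.

        kp2d: 25x2 array of (x, y) pixel coordinates.
        point_cloud: HxWx3 array aligned with the image, camera-origin coordinates.
        Returns a 25x3 matrix; out-of-range key points map to (0, 0, 0).
        """
        h, w, _ = point_cloud.shape
        kp3d = np.zeros((25, 3))
        for p, (x, y) in enumerate(np.rint(kp2d).astype(int)):
            if 0 <= x < w and 0 <= y < h:
                kp3d[p] = point_cloud[y, x]    # row index is y, column index is x
        return kp3d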
Further, the specific method for scoring the target in S6 is as follows:
When the training target leaves the calibration space, performing stage sequence smoothing on the data acquired in the data sequence interval [N_s, N_e]:
A new sequence S' is set up to store the stage smoothing result. A sliding window of size 5 slides from head to tail over the stage sequence; the number of each classification label inside the window is counted, and the stage with the largest count is recorded in S' at the position corresponding to the third, i.e. middle, position of the window. When the window covers the stages of [N_1, N_5] in S, the middle of the window actually corresponds to N_3, so the smoothing result is recorded at position N_3 of S';
The smoothing is carried out on S' once more to obtain S'';
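A minimal Python sketch of one smoothing pass under the rules above (the zero-removal and tie-breaking rules are spelled out in the detailed description below); edge positions without a full window are treated as irrelevant, as in the detailed description. S'' is obtained by applying the pass twice.

    from collections import Counter

    def smooth_stages(stages, window=5):
        """One sliding-window majority-vote pass over a stage sequence."""
        half = window // 2
        out = [0] * len(stages)                 # edges default to irrelevant (0)
        for i in range(half, len(stages) - half):
            win = list(stages[i - half:i + half + 1])
            dropped_zero = False
            if 2 <= win.count(0) <= 3:          # 2 or 3 irrelevant labels present
                win.remove(0)                   # drop one 0, count the other four
                dropped_zero = True
            counts = Counter(win)
            top = max(counts.values())
            tied = [s for s, c in counts.items() if c == top]
            # tie-break: larger label if a 0 was dropped, smaller label otherwise
            out[i] = max(tied) if dropped_zero else min(tied)
        return out

    # Two passes yield the final sequence S'':
    # s2 = smooth_stages(smooth_stages(s))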
From S'', the longest subsequence whose stage categories are arranged in non-decreasing order, with the first number being 1 and the last number being 3, is taken out, and the frames N_is' and N_ie' at the actual positions corresponding to the first and last positions of the subsequence are recorded; the video and data sequence of the interval [N_is', N_ie'] are the actual segment of the tracked training target. If no subsequence satisfies this rule, the data are regarded as a misrecognized, non-training process and the interval is not recorded;
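One way to read this rule, consistent with the example in the detailed description, is to scan for the longest contiguous non-decreasing run that starts at stage 1 and is trimmed back to its last landing stage; the sketch below implements that reading and is an assumption, not a prescribed algorithm.

    def extract_training_interval(stages):
        """Return (start, end) indices of the longest non-decreasing run that
        begins with 1 (preparation) and ends with 3 (landing), or None when no
        run qualifies (a misrecognized, non-training process)."""
        best = None
        i = 0
        while i < len(stages):
            if stages[i] == 1:
                j = i
                while j + 1 < len(stages) and stages[j] <= stages[j + 1]:
                    j += 1                  # extend the non-decreasing run
                k = j
                while k > i and stages[k] != 3:
                    k -= 1                  # trim back to the last landing stage
                if stages[k] == 3 and (best is None or k - i > best[1] - best[0]):
                    best = (i, k)
                i = j + 1
            else:
                i += 1
        return best

    # extract_training_interval([0,0,1,1,1,2,2,3,3,3,0,0]) returns (2, 9).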
The data sequence S_i of the time series data interval [N_s', N_e'] is taken out, the data are processed and scored according to the index requirements, and a score and report suggestion R_i are generated from the index calculation results. The body overall stability index is calculated by traversing the whole sequence and computing the correlation of adjacent frames, the correlation being calculated from vectors formed by the angles between limbs computed from the three-dimensional key point coordinates, and each correlation value is accumulated as a judgment parameter. The leg stability index is calculated by taking, in each frame, the ratio of the distance between the hips to the distance between the knees and the ratio of the distance between the ankles to the distance between the knees as judgment parameters. The action standard degree index is calculated with the DTW (Dynamic Time Warping) algorithm, which measures the similarity between a standard action sequence and the sequence to be evaluated: a matrix A is filled with the cosine distances between the limb-angle vectors computed from the three-dimensional key point coordinates at corresponding positions of the two sequences, a minimum-cost path through A is found, and the accumulated cost along the path is taken as the similarity judgment parameter. The judgment parameters are finally combined into the overall score; the weight of each scoring parameter can be trained from historical scores by a machine learning algorithm, and the problem position and problem degree corresponding to each scoring parameter value are determined in combination with training experience.
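The stability indices can be sketched as follows. The key point indices and limb pairs are hypothetical placeholders, since the description does not fix the indexing of the 25 points.

    import numpy as np

    # Hypothetical indices into the 25-point skeleton (assumed layout).
    L_HIP, R_HIP, L_KNEE, R_KNEE, L_ANKLE, R_ANKLE = 8, 11, 9, 12, 10, 13

    def limb_angle_vector(kp3d, limb_pairs):
        """Angles (radians) between pairs of limb segments, one per pair.
        limb_pairs: list of ((a, b), (c, d)) joint-index pairs defining segments."""
        angles = []
        for (a, b), (c, d) in limb_pairs:
            u, v = kp3d[b] - kp3d[a], kp3d[d] - kp3d[c]
            cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
            angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
        return np.array(angles)

    def body_stability(seq_kp3d, limb_pairs):
        """Accumulate the correlation of limb-angle vectors of adjacent frames."""
        total = 0.0
        for prev, cur in zip(seq_kp3d, seq_kp3d[1:]):
            a = limb_angle_vector(prev, limb_pairs)
            b = limb_angle_vector(cur, limb_pairs)
            total += np.corrcoef(a, b)[0, 1]
        return total

    def leg_stability(kp3d):
        """Per-frame ratios: hips/knees distance and ankles/knees distance."""
        knees = np.linalg.norm(kp3d[L_KNEE] - kp3d[R_KNEE]) + 1e-9
        hips = np.linalg.norm(kp3d[L_HIP] - kp3d[R_HIP])
        ankles = np.linalg.norm(kp3d[L_ANKLE] - kp3d[R_ANKLE])
        return hips / knees, ankles / knees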
The advantages of the invention are as follows: recording video makes the training process traceable and convenient for the coach to watch and analyze repeatedly, and trainees can likewise see their own training actions in the videos; analyzing the postures of the training objects in the videos recorded by the depth camera, combined with the image depth information, and evaluating the standard degree of their actions avoids the differences caused by the subjective factors of coaches; locking the training target by means of target tracking, action recognition and training space calibration reduces the interference of irrelevant persons on video analysis in crowded scenes; and based on the target locking method, the locked trainee can be analyzed in real time, the video is finally divided into individual video segments for each trainee and the respective evaluation reports are generated, which shortens the time spent waiting for analysis results and improves training efficiency.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings.
As shown in fig. 1, the main flow of the present invention includes:
1. The camera faces the training facility and the first trainee preparing to train; since the facilities of the three training items differ, the camera placement distance differs accordingly and is 5-8 meters, 4-6 meters and 18-20 meters for the respective items;
2. After the whole queue of i trainees (i being a positive integer) is formed, the camera is started and the training space calibration parameters are configured, limiting the training range to the most suitable, smallest movable area;
3. After training starts, the depth camera acquires the video stream and the point cloud information corresponding to each frame, and these data form continuous time series data;
4. As shown in fig. 2, the following is performed for each frame captured by the camera while no training target is locked:
1) Tracking all persons (trainees and irrelevant persons) appearing in the current picture with the target tracking model, assigning each person a unique ID_j, and obtaining a Mask that covers only that person, wherein the Mask is a binary matrix with the same resolution as the picture acquired by the camera and is used to mask out the area outside the target; meanwhile, the spatial position P_j of each person is acquired from the point cloud information, the spatial position being the three-dimensional coordinates (x_j, y_j, z_j) with the camera as origin;
2) Selecting the persons appearing in the calibration space, masking out non-person areas with the Mask, and recognizing the two-dimensional key point coordinates (x_p', y_p') of each person with the posture key point recognition model, the key points comprising 25 points (p ∈ [0, 24]) including the top of the head, the eyes, ears, nose, neck, shoulders, elbows, wrists, hips, knees, ankles, heels, little toes and big toes, so that the posture of each person is finally represented by a 25 × 2 matrix;
3) Converting the 25 × 2 matrix into a vector and judging its training stage with the action stage recognition model, whose classification labels are: 1 for the preparation stage, 2 for the take-off stage, 3 for the landing stage and 0 for irrelevant actions;
4) From the persons in the calibration space, searching for the person judged by the action stage recognition model to be in the preparation stage, namely class 1, with the highest confidence;
5) If no person in the preparation stage is found, recording a 25 × 2 zero matrix, a 25 × 3 zero matrix and the irrelevant-action class 0 in the time series data, and continuing to execute 1) to 4) for each subsequent frame; if such a person is found, locking that person as the training target and recording the ID_t and the frame number N_s acquired by the video at that moment;
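The masking and calibration-space checks used in steps 1) and 2) can be sketched as follows, assuming the Mask is a binary HxW matrix and the calibration space is stored as an axis-aligned box in camera coordinates (the box representation is an assumption for illustration).

    import numpy as np

    def apply_mask(image, mask):
        """Black out everything outside the target: mask is 1 over the person
        and 0 elsewhere, at the same resolution as the image."""
        return image * mask[:, :, None].astype(image.dtype)

    def in_calibration_space(position, bounds):
        """Check whether a camera-origin position (x, y, z) lies inside the
        calibrated region, here an axis-aligned box ((xmin, xmax), ...)."""
        (xmin, xmax), (ymin, ymax), (zmin, zmax) = bounds
        x, y, z = position
        return xmin <= x <= xmax and ymin <= y <= ymax and zmin <= z <= zmax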
5. As shown in fig. 4, the following is performed for each frame acquired by the camera while locked on the training target:
1) Obtaining the target's Mask and spatial position P_t(x_t, y_t, z_t) according to ID_t, and masking the image using only this Mask;
2) Acquiring the target's 25 × 2 key point coordinate matrix through the posture key point recognition model;
3) Mapping the two-dimensional key point coordinates onto the point cloud information to obtain the corresponding 25 × 3 three-dimensional key point coordinate matrix;
4) Identifying the target's current training stage through the action stage recognition model;
5) The two-dimensional coordinate matrix, the three-dimensional coordinate matrix and the stage are recorded in the time sequence data;
6) If the target has not left the calibration space, continuing to execute 1) to 5) for each frame; if the target leaves the calibration space, tracking stops, the frame N_e acquired by the video at departure is recorded and the recorded ID_t is cancelled; meanwhile, a subtask is started to score the target's training process, this subtask not blocking the other processes, and each subsequent frame returns to searching for a training target, i.e., step 4 is executed;
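The non-blocking scoring subtask of step 6) can be sketched with a background thread; score_segment below is a placeholder for the index calculations described in step 6.

    import threading

    def score_segment(segment):
        """Placeholder for the smoothing and index calculations of step 6."""
        return {"score": 0.0, "advice": ""}

    def start_scoring_subtask(time_series, n_s, n_e, on_done):
        """Score the interval [n_s, n_e] in a daemon thread so that per-frame
        tracking and target search continue without blocking."""
        def work():
            on_done(score_segment(time_series[n_s:n_e + 1]))
        threading.Thread(target=work, daemon=True).start()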
6. The subtask for scoring the target comprises the following steps:
1) Taking out the subsequence S of the time series data in the interval [N_s, N_e];
2) As shown in fig. 5, the stage sequence in S is smoothed:
Setting up a new sequence S' to store the stage smoothing result, and sliding a window of size 5 from head to tail over the stage sequence while counting the number of each stage inside the window; for example, [0,0,1,1,2] contains two irrelevant actions, two preparation stages and one take-off stage. If the window contains two or three 0s, one 0 is removed and only the remaining four stages are counted, so that [0,1,0,0,1], for example, becomes [1,0,0,1]. The most frequent stage in the window is recorded in S' at the position corresponding to the middle (third) position of the window; for example, the most frequent stage in [0,1,2,1,1] is 1, so 1 is recorded in S'. When the window covers the stages of [N_1, N_5] in S, its middle actually corresponds to N_3, so the smoothing result is recorded at position N_3 of S'. If two stages tie for the largest count and a 0 was removed as above, the stage with the larger label is recorded; if two stages tie and no 0 was removed, the stage with the smaller label is recorded;
3) Carrying out 2) on S' once more to obtain S'';
4) Taking out from S'' the longest subsequence arranged in non-decreasing order whose first number is 1 and last number is 3; for example, the subsequence [1,1,1,2,2,3,3,3] is taken out of [0,0,1,1,1,2,2,3,3,3,0,0,0,0] according to this rule. The frames N_is' and N_ie' at the actual positions corresponding to the first and last positions of the subsequence are recorded, and the video and data sequence of the interval [N_is', N_ie'] are the actual segment of the tracked training target; if no subsequence satisfies the rule, the data are regarded as a misrecognized, non-training process and the interval is not recorded;
5) The data sequence S_i of the time series data interval [N_s', N_e'] is taken out, the data comprising the 25 × 2 two-dimensional key point coordinate matrix, the 25 × 3 three-dimensional key point coordinate matrix and the stage of each frame. The data are processed and the scores are calculated according to the index requirements; the indices mainly cover three aspects, namely body overall stability, leg stability and action standard degree, and can be dynamically adjusted according to experience. A score and report suggestion R_i are finally generated from the index calculation results. The body overall stability index is calculated by traversing the whole sequence and computing the correlation of adjacent frames, the correlation being based on vectors formed by the angles between limbs computed from the three-dimensional key point coordinates; each correlation value is accumulated as a judgment parameter. The leg stability index is calculated by taking, in each frame, the ratio of the distance between the hips to the distance between the knees and the ratio of the distance between the ankles to the distance between the knees as judgment parameters. The action standard degree index is calculated with the DTW (Dynamic Time Warping) algorithm, which measures the similarity between the standard action sequence and the sequence to be evaluated: a matrix A of size N × M (N and M being the lengths of the standard sequence and the sequence to be evaluated) is filled with the cosine distances between the limb-angle vectors computed from the three-dimensional key point coordinates at corresponding positions of the two sequences, a minimum-cost path from A(1,1) to A(N,M) is found, and the accumulated cost along this path is taken as the similarity judgment parameter. The judgment parameters are finally combined into the overall score; the weight of each scoring parameter can be trained from historical scores by a machine learning algorithm, and the problem position and problem degree corresponding to each scoring parameter value are determined in combination with training experience and opinion;
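A minimal sketch of the DTW similarity used for the action standard degree, with the cosine distance between limb-angle vectors as the local cost; the quadratic dynamic program below is the textbook DTW formulation, given for illustration.

    import numpy as np

    def cosine_distance(u, v):
        """1 - cosine similarity between two limb-angle vectors."""
        return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

    def dtw_similarity(standard, candidate):
        """Accumulated minimum-cost DTW alignment between a standard action
        sequence and the sequence to be evaluated (lists of limb-angle vectors);
        a lower value means the action is closer to the standard."""
        n, m = len(standard), len(candidate)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = cosine_distance(standard[i - 1], candidate[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]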
7. After training finishes, the video is divided into the training segments of the i trainees according to the intervals [N_is', N_ie'] recorded after each target lock, and the respective score reports R_i are associated with them.
To facilitate an overall understanding of the process of the present invention, the process is broken down below into the following more detailed steps:
1. Collecting training videos and point cloud images through the depth camera, with the lens facing, at a relatively short distance, the training facility and the trainees queuing to wait for training in sequence;
2. Calibrating the approximate spatial position (distance from the camera) of the trainee about to train (the trainee in initial preparation) and the corresponding region in the two-dimensional image, the calibrated region being as small as possible while still covering the trainee throughout the whole training process;
3. Tracking all persons appearing in front of the lens through the target tracking model, the model generating for each person a unique identification ID and a Mask matrix capable of cutting that person out of the image independently, and obtaining each person's position from the point cloud image, the position being three-dimensional coordinates with the camera as origin;
4. Blacking out the image everywhere except the positions of persons within the calibration area, using the Mask provided by the target tracking; then recognizing the persons in the processed image with the posture key point recognition model, and extracting the two-dimensional image coordinates of each person's 25 key points;
5. Converting the 25 × 2 key point matrix into a vector and recognizing it with the action stage recognition model to obtain the action stage of the posture; if the most likely result is the preparation stage and the person is within the calibrated area, locking the person and recording the person's ID for subsequent tracking, otherwise regarding the person as an irrelevant person not in a training state. The action stage recognition model is a classifier trained by a machine learning algorithm on key point data labeled with action stages; the stage labels are set to increasing positive integers in the order of preparation, take-off and landing, with all remaining irrelevant actions labeled 0, so that the model both recognizes the stage and distinguishes training persons from non-training persons (a minimal training sketch is given after this step list);
6. If a target is locked, recording the frame N_s at which locking starts, recording for each frame the three-dimensional coordinates corresponding to the target's posture key points and the action stage recognized by the action stage recognition model, until the target leaves the calibration area, and recording the frame N_e at the moment the target leaves;
7. If no target is locked, recording the stage as irrelevant stage 0 in the data sequence synchronized with the video, the key point data being a 25 × 3 zero matrix;
8. When the locked training target leaves the calibration area, carrying out stage sequence smoothing on the data in the acquired data sequence interval [N_s, N_e];
9. The steps of smoothing the stage sequence are as follows:
1) Counting the stage of the current position and the stages of the two positions before and the two positions after it (the stages of 5 positions) to obtain the counts of the four classifications, namely preparation, take-off, landing and irrelevant;
2) If the 5 stages contain 2 or 3 irrelevant stages, namely stages classified as 0, one 0 is removed and only the stages of the remaining four positions are counted;
3) Taking the most frequent stage as the stage of the current position;
4) If the statistical result is that two stages tie for the largest count and step 2) was performed, taking the stage with the larger classification label as the current stage; if two stages tie but step 2) was not performed, taking the stage with the smaller classification label as the current stage;
5) Repeating 1) to 4) until the sequence of the interval N_s to N_e has been traversed, obtaining a new sequence S' of equal length; positions at the beginning or end of the sequence for which fewer than 5 positions are available for counting are directly regarded as the irrelevant stage;
10. Carrying out the above stage sequence smoothing on S' once again to obtain the final stage sequence S'';
11. Finding in S'' the longest subsequence that changes from small to large, running from a first-appearing preparation stage to a last-appearing landing stage, and taking the sequence interval between the two as the training process interval [N_s', N_e'] of the training target; if none is found, the sequence of the interval [N_s, N_e] is regarded as a misrecognized non-training process;
12. Processing the three-dimensional key point coordinates, the two-dimensional key point coordinates and the stage recorded for each frame in the interval [N_s', N_e'] according to the quantization indices of the corresponding training item, finally obtaining the score and evaluation report.
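As a sketch of how the action stage recognition model of step 5 might be trained: the description only specifies a classifier trained by a machine learning algorithm on labeled key point data, so the random forest, the dataset file names and the 50-dimensional flattening below are all assumptions for illustration.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical dataset files: X holds flattened 25x2 key point matrices
    # (shape [n_samples, 50]); y holds stage labels 0 = irrelevant,
    # 1 = preparation, 2 = take-off, 3 = landing.
    X = np.load("keypoints.npy").reshape(-1, 50)
    y = np.load("stage_labels.npy")

    stage_classifier = RandomForestClassifier(n_estimators=200, random_state=0)
    stage_classifier.fit(X, y)

    # predict_proba supplies the confidence used when selecting the
    # preparation-stage person with the highest confidence during locking.
    probs = stage_classifier.predict_proba(X[:1])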