WO2009061283A2 - Human motion analysis system and method - Google Patents
Human motion analysis system and method
- Publication number
- WO2009061283A2 (PCT/SG2008/000428)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- human
- motion
- posture
- candidates
- postures
- Prior art date
Classifications
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63B—APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
- A63B24/00—Electric or electronic controls for exercising apparatus of preceding groups; Controlling or monitoring of exercises, sportive games, training or athletic performances
- A63B24/0003—Analysing the course of a movement or motion sequences during an exercise or trainings sequence, e.g. swing for golf or tennis
- A63B24/0006—Computerised comparison for qualitative assessment of motion sequences or the course of a movement
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/103—Measuring devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
- A61B5/11—Measuring movement of the entire body or parts thereof, e.g. head or hand tremor or mobility of a limb
- A61B5/1113—Local tracking of patients, e.g. in a hospital or private home
- A61B5/1114—Tracking parts of the body
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/103—Measuring devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
- A61B5/11—Measuring movement of the entire body or parts thereof, e.g. head or hand tremor or mobility of a limb
- A61B5/1116—Determining posture transitions
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/103—Measuring devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
- A61B5/11—Measuring movement of the entire body or parts thereof, e.g. head or hand tremor or mobility of a limb
- A61B5/1121—Determining geometric values, e.g. centre of rotation or angular range of movement
- A61B5/1122—Determining geometric values, e.g. centre of rotation or angular range of movement of movement trajectories
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/103—Measuring devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
- A61B5/11—Measuring movement of the entire body or parts thereof, e.g. head or hand tremor or mobility of a limb
- A61B5/1126—Measuring movement of the entire body or parts thereof, e.g. head or hand tremor or mobility of a limb using a particular sensing technique
- A61B5/1128—Measuring movement of the entire body or parts thereof, e.g. head or hand tremor or mobility of a limb using a particular sensing technique using image analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/162—Segmentation; Edge detection involving graph-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/251—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63B—APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
- A63B24/00—Electric or electronic controls for exercising apparatus of preceding groups; Controlling or monitoring of exercises, sportive games, training or athletic performances
- A63B24/0003—Analysing the course of a movement or motion sequences during an exercise or trainings sequence, e.g. swing for golf or tennis
- A63B24/0006—Computerised comparison for qualitative assessment of motion sequences or the course of a movement
- A63B2024/0012—Comparing movements or motion sequences with a registered reference
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63B—APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
- A63B2102/00—Application of clubs, bats, rackets or the like to the sporting activity ; particular sports involving the use of balls and clubs, bats, rackets, or the like
- A63B2102/32—Golf
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63B—APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
- A63B2220/00—Measuring of physical parameters relating to sporting activity
- A63B2220/80—Special sensors, transducers or devices therefor
- A63B2220/806—Video cameras
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63B—APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
- A63B2225/00—Miscellaneous features of sport apparatus, devices or equipment
- A63B2225/20—Miscellaneous features of sport apparatus, devices or equipment with means for remote communication, e.g. internet or the like
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63B—APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
- A63B2225/00—Miscellaneous features of sport apparatus, devices or equipment
- A63B2225/50—Wireless data transmission, e.g. by radio transmitters or telemetry
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Definitions
- the present invention relates broadly to a method and system for human motion analysis.
- 2D video-based software such as V1 Pro [V1 Pro, swing analysis software, www.v1golf.com], MotionView [MotionView, golf swing video and motion analysis software, www.golfcoachsystems.com/golf-swing-software.html]
- MotionCoach [MotionCoach, golf swing analysis system, www.motioncoach.com]
- cSwing 2008 [cSwing 2008, video swing analysis program, www.cswing.com]
- 3D motion capture systems such as Vicon [Vicon 3D motion capture system, www.vicon.com/applications/sports.html] and MAC Eagle [Motion Analysis Corporation, Eagle motion capture system, www.motionanalysis.com] capture 3D human motion by tracking reflective markers attached to the human body and computing the markers' positions in 3D. Using specialized cameras, these systems can capture 3D motion efficiently and accurately. Given the captured 3D motion, it is relatively easy for an add-on algorithm to compute the motion discrepancies of the user's motion relative to domain-specific reference motion. However, they are not equipped with intelligent software for automatic assessment of the motion discrepancies based on domain-specific assessment criteria. They are very expensive systems requiring six or more cameras to function effectively. They are also cumbersome to set up and difficult to use. These are passive marker-based systems.
- the markers are LEDs that each blink a special code that uniquely identifies the marker.
- Such systems can resolve some tracking difficulties of passive marker-based system.
- the LEDs are connected by cables which supply electricity for them to operate.
- Such a tethered system places restriction on the kind of motion that can be captured. So, it is less versatile than untethered systems.
- U.S. Patents US 4891748, US 7095388, disclose systems that capture the video of a person performing a physical skill, project the reference video of an expert scaled according to the body size of the person, and compare the motion in the videos of the person and the expert. In these systems, motion comparison is performed only in 2D videos. They are not accurate enough and may fail due to depth ambiguity in 3D motion and self-occlusions of body parts.
- Japanese Patent JP 2794018 discloses a golf swing analysis system that attaches a large number of markers onto a golfer's body and club, and captures a sequence of golf swing images using a camera. The system then computes the markers' coordinates in 2D, and compares the coordinate data with selected reference data.
- US Patent Publication US 2006/0211522 discloses a system of colored markers placed on a baseball player's arms, legs, bat, pitching mat, etc. for manually facilitating the proper form of the player's body. No computerized analysis and comparison is described in the patent.
- US Patent US 5907819 discloses a golf swing analysis system that attaches motion sensors on the golfer's body. The sensors record the player's motion and send the data to a computer through connecting cables to analyze the player's motion.
- Japanese Patents JP 9-154996, JP 2001-614, and European Patent EP 1688746 describe similar systems that attach sensors to the human body.
- US Patent Publication 2002/0115046 and US Patent 6567536 disclose similar systems except that a video camera is also used to capture video information which is synchronized with the sensor data. Since the sensors are connected to the computer by cables, the motion type that can be captured is restricted. These are tethered systems, as opposed to the marker- based systems described above, which are untethered.
- US Patent US 7128675 discloses a method of analyzing a golf swing by attaching two lasers to the putter. A camera connected to a computer records the laser traces and provides feedback to the golfer regarding his putting swing. For the same reason as the methods that use motion sensors, the motion type that can be captured is restricted.
- a method for human motion analysis comprising the steps of capturing one or more 2D input videos, of the human motion; extracting sets of 2D body regions from respective frames of the 2D input videos; determining 3D human posture candidates for each of the extracted sets of 2D body regions; and selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
- the method may further comprise the step of determining differences between 3D reference data for said human motion and the selected sequence of 3D human postures.
- the method may further comprise the step of visualizing said differences to a user.
- Extracting the sets of 2D body regions may comprise one or more of a group consisting of background subtraction, iterative graph-cut segmentation and skin detection.
- Determining the 3D human posture candidates may comprise the steps of generating a first 3D human posture candidate; and flipping a depth orientation of body parts represented in the first 3D human posture candidate around respective joints to generate further 3D human posture candidates from the first 3D human posture candidate.
- Generating the first 3D human posture candidate may comprise temporally aligning the extracted sets of 2D body portions from each frame with 3D reference data of the human motion and adjusting the 3D reference data to match the 2D body portions.
- Selecting the sequence of 3D human postures from the 3D human posture candidates may be based on a least cost path among the 3D human posture candidates for the respective frames.
- Selecting the sequence of 3D human postures from the 3D human posture candidates may further comprise refining a temporal alignment of the extracted sets of 2D body portions from each frame with 3D reference data of the human motion.
- a system for human motion analysis comprising means for capturing one or more 2D input videos of the human motion; means for extracting sets of 2D body regions from respective frames of the 2D input videos; means for determining 3D human posture candidates for each of the extracted sets of 2D body regions; and means for selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
- the system may further comprise means for determining differences between 3D reference data for said human motion and the selected sequence of 3D human postures.
- the system may further comprise means for visualizing said differences to a user.
- the means for extracting the sets of 2D body regions may perform one or more of a group consisting of background subtraction, iterative graph-cut segmentation and skin detection.
- the means for determining the 3D human posture candidates may generate a first 3D human posture candidate; and flips a depth orientation of body parts represented in the first 3D human posture candidate around respective joints to generate further 3D human posture candidates from the first 3D human posture candidate.
- Generating the first 3D human posture candidate may comprise temporally aligning the extracted sets of 2D body portions from each frame with 3D reference data of the human motion and adjusting the 3D reference data to match the 2D body portions.
- the means for selecting the sequence of 3D human postures from the 3D human posture candidates may determine a least cost path among the 3D human posture candidates for the respective frames.
- the means for selecting the sequence of 3D human postures from the 3D human posture candidates may further comprise means for refining a temporal alignment of the extracted sets of 2D body portions from each frame with 3D reference data of the human motion.
- a data storage medium having computer code means for instructing a computing device to execute a method for human motion detection, the method comprising the steps of capturing one or more 2D input videos of the human motion; extracting sets of 2D body regions from respective frames of the 2D input videos; determining 3D human posture candidates for each of the extracted sets of 2D body regions; and selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
- Figure 1 illustrates the block diagram of a human motion analysis system with the camera connected directly to the computer, according to an example embodiment.
- Figure 2 shows a schematic top-down view drawing of an example embodiment comprising a camera.
- Figure 3(a) illustrates the performer standing in a standard posture.
- Figure 3(b) illustrates a 3D model of the performer standing in a standard posture according to an example embodiment.
- the dots denote joints, straight lines denote bones connecting the joints, and gray scaled regions denote body parts.
- Figure 4 illustrates an example of body region extraction.
- Figure 4(a) shows an input image and
- Figure 4(b) shows the extracted body regions, according to an example embodiment.
- Figure 5 illustrates the flipping of the depth orientation of body part b in the z- direction to the new orientation denoted by a dashed line, according to an example embodiment.
- Figure 6 illustrates an example result of posture candidate estimation according to an example embodiment
- Figure 6(a) shows the input image with a posture candidate overlaid.
- Figure 6(b) shows the skeletons of the posture candidates viewed from the front. At this viewing angle, all the posture candidates overlap exactly.
- Figure 6(c) shows the skeletons of the posture candidates viewed from the side. Each candidate is shown with a different gray scale.
- Figure 7 illustrates an example display of detailed 3D difference by overlapping the estimated performer's postures (dark gray scale) with the corresponding expert's postures (lighter gray scale) according to an example embodiment.
- the overlapping postures can be rotated in 3D to show different views.
- the estimated performer's postures can also be overlapped with the input images for visual verification of their correctness.
- Figure 8 illustrates an example display of color-coded regions overlapped with an input image for quick assessment according to an example embodiment.
- the darker gray scale regions indicate large error, the lighter gray scale regions indicate moderate error, and the transparent regions indicate negligible or no error.
- Figure 9 illustrates the block diagram of a human motion analysis system with the camera and output device connected to the computer through a computer network, according to an example embodiment.
- Figure 10 illustrates the block diagram of a human motion analysis system with the wireless input and output device, such as a hand phone or Personal Digital Assistant equipped with a camera, connected to the computer through a wireless network, according to an example embodiment.
- Figure 11 shows a schematic top-down view of an example embodiment comprising multiple cameras arranged in a straight line.
- Figure 12 shows a schematic top view of an example embodiment comprising multiple cameras placed around the performer.
- Figure 13 shows a flow chart illustrating a method for human motion detection according to an example embodiment.
- Figure 14 shows a schematic drawing of a computer system for implementing the method and system of an example embodiment.
- the described example embodiments provide a system and method for acquiring a human performer's motion in one or more 2D videos, analyzing the 2D videos, comparing the performer's motion in the 2D videos and a 3D reference motion of an expert, computing the 3D differences between the performer's motion and the expert's motion, and delivering information regarding the 3D difference to the performer for improving the performer's motion.
- the system in example embodiments comprises one or more 2D cameras, a computer, an external storage device, and a display device. In a single camera configuration, the camera acquires the performer's motion in a 2D video and passes the 2D video to a computing device. In a multiple camera configuration, the cameras acquire the performer's motion simultaneously in multiple 2D videos and pass the 2D videos to the computing device.
- “calculating”, “determining”, “generating”, “initializing”, “outputting”, or the like refer to the action and processes of a computer system, or similar electronic device, that manipulates and transforms data represented as physical quantities within the computer system into other data similarly represented as physical quantities within the computer system or other information storage, transmission or display devices.
- the present specification also discloses apparatus for performing the operations of the methods.
- Such apparatus may be specially constructed for the required purposes, or may comprise a general purpose computer or other device selectively activated or reconfigured by a computer program stored in the computer.
- the algorithms and displays presented herein are not inherently related to any particular computer or other apparatus.
- Various general purpose machines may be used with programs in accordance with the teachings herein.
- the construction of more specialized apparatus to perform the required method steps may be appropriate.
- the structure of a conventional general purpose computer will appear from the description below.
- the present specification also implicitly discloses a computer program, in that it would be apparent to the person skilled in the art that the individual steps of the method described herein may be put into effect by computer code.
- the computer program is not intended to be limited to any particular programming language and implementation thereof. It will be appreciated that a variety of programming languages and coding thereof may be used to implement the teachings of the disclosure contained herein.
- the computer program is not intended to be limited to any particular control flow. There are many other variants of the computer program, which can use different control flows without departing from the spirit or scope of the invention.
- the computer readable medium may include storage devices such as magnetic or optical disks, memory chips, or other storage devices suitable for interfacing with a general purpose computer.
- the computer readable medium may also include a hard-wired medium such as exemplified in the internet system, or wireless medium such as exemplified in the GSM mobile telephone system.
- the invention may also be implemented as hardware modules. More particularly, in the hardware sense, a module is a functional hardware unit designed for use with other components or modules. For example, a module may be implemented using discrete electronic components, or it can form a portion of an entire electronic circuit such as an Application Specific Integrated Circuit (ASIC). Numerous other possibilities exist. Those skilled in the art will appreciate that the system can also be implemented as a combination of hardware and software modules.
- ASIC Application Specific Integrated Circuit
- the 3D difference can include 3D joint angle difference, 3D velocity difference, etc., depending on the requirements of the application domain. Stage 7 comprises visualizing and highlighting the 3D difference in a display device.
- An example embodiment of the present invention provides a system and method for acquiring a human performer's motion in one 2D video, analyzing the 2D video, comparing the performer's motion in the 2D video and a 3D reference motion of an expert, computing the 3D differences between the performer's motion and the expert's motion, and delivering information regarding the 3D difference to the performer for improving the performer's motion.
- FIG. 1 shows a schematic block diagram of the example embodiment of a human motion analysis system 100.
- the system 100 comprises a camera unit 102 coupled to a processing unit, here in the form of a computer 104.
- the computer 104 is further coupled to an output device 106, and an external storage device 108.
- the example embodiment comprises a stationary camera 200 with a fixed lens, which is used to acquire a 2D video m' of the performer's 202 entire motion.
- the 2D video is then analyzed and compared with a 3D reference motion M of an expert.
- the difference between the performer's 202 2D motion and the expert's 3D reference motion is computed.
- the system displays and highlights the difference in an output device 106 ( Figure 1).
- the software component implemented on the computer 104 ( Figure 1) in the example embodiment comprises the following processing stages:
- the method for Stage 1 in an example embodiment comprises a background subtraction technique described in [C. Stauffer and W.E.L. Grimson. Adaptive background mixture models for real-time tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1998], an iterative graph-cut segmentation technique described in [C. Rother, V. Kolmogorov, and A. Blake. GrabCut: interactive foreground extraction using iterated graph cuts. In Proceedings of ACM SIGGRAPH, 2004], and a skin detection technique described in [M.J. Jones and J.M. Rehg. Statistical color models with application to skin detection. International Journal of Computer Vision, 46:81-96, 2002]. The contents of those references are hereby incorporated by cross-reference.
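As a concrete illustration of how these three cues could be combined, the following Python sketch uses OpenCV's adaptive background subtractor, GrabCut and a simple skin-colour threshold; the video path, the YCrCb skin range and the morphological clean-up are illustrative assumptions rather than the patent's actual parameters.

```python
import cv2
import numpy as np

def extract_body_region(frame, bg_subtractor):
    """Combine background subtraction, GrabCut and skin detection (sketch)."""
    # 1. Background subtraction gives a rough foreground mask.
    fg_mask = bg_subtractor.apply(frame)
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    body = (fg_mask > 0).astype(np.uint8) * 255

    # 2. Refine the rough mask with iterative graph-cut segmentation (GrabCut),
    #    but only when the mask is informative (not almost empty or almost full).
    fg_ratio = (fg_mask > 0).mean()
    if 0.01 < fg_ratio < 0.9:
        gc_mask = np.where(fg_mask > 0, cv2.GC_PR_FGD, cv2.GC_PR_BGD).astype(np.uint8)
        bgd, fgd = np.zeros((1, 65), np.float64), np.zeros((1, 65), np.float64)
        cv2.grabCut(frame, gc_mask, None, bgd, fgd, 5, cv2.GC_INIT_WITH_MASK)
        body = np.where((gc_mask == cv2.GC_FGD) | (gc_mask == cv2.GC_PR_FGD),
                        255, 0).astype(np.uint8)

    # 3. Skin detection (illustrative YCrCb range) recovers bare-skin parts such
    #    as arms and face that the other cues may miss.
    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    skin = cv2.inRange(ycrcb, (0, 135, 85), (255, 180, 135))
    return cv2.bitwise_or(body, skin)

# Usage: feed consecutive video frames through an adaptive background model.
cap = cv2.VideoCapture("performer.avi")       # hypothetical input video path
subtractor = cv2.createBackgroundSubtractorMOG2()
ok, frame = cap.read()
while ok:
    body_region = extract_body_region(frame, subtractor)
    ok, frame = cap.read()
```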
- Figure 4 illustrates an example result of body region extraction.
- Figure 4(a) shows an input image
- Figure 4(b) shows the extracted body region.
- the lighter gray scale region is extracted by the iterative graph-cut segmentation technique
- the darker gray scale parts are extracted using skin detection and iterative graph-cut segmentation techniques.
- the method for Stage 2 in the example embodiment comprises computing the parameters of a scaled-orthographic camera projection, which include the camera's 3D rotation angles, the camera position (c_x, c_y), and the scale factor s. It is assumed that the performer's posture at the first image frame of the video is the same as a standard calibration posture (for example, Figure 3).
- the method comprises the following steps:
- Projecting a 3D model of the performer at the calibration posture under the default camera parameters and rendering it as a 2D projected body region. This step can be performed using OpenGL [OpenGL, www.opengl.org] in the example embodiment. The content of that reference is hereby incorporated by cross-reference.
- the 3D model of the performer can be provided in different forms. For example, a template 3D model may be used, that has been generated to function as a generic template for a large cross section of possible performers.
- a 3D model of an actual performer may first be generated, which will involve an additional pre-processing step for generation of the customized 3D model, as will be appreciated and is understood by a person skilled in the art.
- PCA principal component analysis
- Compute the camera position as the difference between the centers, i.e. c_x = (p'_x - p_x)/s and c_y = (p'_y - p_y)/s.
- the calibration method for stage 2 in the example embodiment thus derives the camera parameters for the particular human motion analysis system in question. It will be appreciated by a person skilled in the art that the same parameters can later be used for human motion analysis of a different performer, provided that the camera settings remain the same for the different performer. On the other hand, as mentioned above, a customized calibration using customized 3D models of an actual performer may be performed for each performer if desired, in different embodiments,.
- the method for stage S2 may comprise using other existing algorithms for the camera calibration, such as for example the Camera Calibration Toolbox for Matlab [www.vision.Caltech.edu/bouguetj/calib_doc/], the contents of which are hereby incorporated by cross-reference.
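The centre-difference and PCA-based steps described above could be realised along the lines of the following sketch, which compares the centroid, principal-axis direction and spread of the rendered model silhouette with those of the extracted body region; the function names and the restriction to a single in-plane rotation are simplifying assumptions for illustration.

```python
import numpy as np

def region_stats(mask):
    """Centroid, principal-axis angle and spread of a binary silhouette (via PCA)."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    centre = pts.mean(axis=0)
    cov = np.cov((pts - centre).T)
    evals, evecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    major = evecs[:, -1]
    angle = np.arctan2(major[1], major[0])
    spread = np.sqrt(evals[-1])                # std. dev. along the major axis
    return centre, angle, spread

def calibrate_scaled_orthographic(projected_model_mask, body_region_mask):
    """Estimate scale s, in-plane rotation and image offset (c_x, c_y)."""
    p_model, a_model, sp_model = region_stats(projected_model_mask)
    p_body, a_body, sp_body = region_stats(body_region_mask)
    s = sp_body / sp_model                     # scale factor s
    roll = a_body - a_model                    # in-plane rotation only (sketch)
    c = (p_body - p_model) / s                 # c_x = (p'_x - p_x)/s, c_y = (p'_y - p_y)/s
    return s, roll, c

# Usage with two same-sized binary masks (model projection vs. extracted region).
model_mask = np.zeros((240, 320), np.uint8); model_mask[60:180, 140:180] = 1
body_mask = np.zeros((240, 320), np.uint8); body_mask[50:190, 150:200] = 1
print(calibrate_scaled_orthographic(model_mask, body_mask))
```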
- the method for Stage 3 in the example embodiment comprises estimating the approximate temporal correspondence C(t') and the approximate rigid transformation T_t' that best align the posture B_C(t') in the 3D reference motion to the extracted body region S'_t'.
- each transformation T_t' at time t' can be determined by finding the best match between the extracted body region S'_t' and the 2D projected model body region P(T(B_C(t'))):
- T_t' = argmin_T d(P(T(B_C(t'))), S'_t'), where the optimal T_t' is computed using a sampling technique.
- the method of computing the optimal temporal correspondence C(t') comprises the application of dynamic programming as follows. Let d(t', t) denote the difference between the 2D projected reference posture at reference frame t and the extracted body region at input frame t', and let D denote an (L' + 1) x (L + 1) correspondence matrix.
- each matrix element at (t', t) corresponds to a possible frame correspondence between t' and t, and the correspondence cost is d(t', t).
- a path in D is a sequence of frame correspondences, one for each input frame t'.
- the least cost path is obtained by tracing back the path from D(L', L) to D(0, 0).
- the optimal C(t') is given by the least cost path.
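A minimal dynamic-programming sketch of this temporal alignment is given below. It assumes a precomputed cost matrix d, where d[t_in, t_ref] measures how poorly the projected reference posture at reference frame t_ref matches the extracted body region at input frame t_in; the particular set of allowed monotonic steps is an illustrative choice.

```python
import numpy as np

def temporal_correspondence(d):
    """DTW-style alignment; returns C mapping each input frame to a reference frame."""
    n_in, n_ref = d.shape
    D = np.full((n_in, n_ref), np.inf)
    D[0, 0] = d[0, 0]
    for i in range(n_in):
        for j in range(n_ref):
            if i == 0 and j == 0:
                continue
            prev = [D[i - 1, j] if i > 0 else np.inf,            # repeat reference frame
                    D[i, j - 1] if j > 0 else np.inf,            # skip reference frame
                    D[i - 1, j - 1] if i > 0 and j > 0 else np.inf]
            D[i, j] = d[i, j] + min(prev)

    # Trace the least-cost path back from D[L', L] to D[0, 0].
    i, j, path = n_in - 1, n_ref - 1, []
    while (i, j) != (0, 0):
        path.append((i, j))
        moves = [(i - 1, j), (i, j - 1), (i - 1, j - 1)]
        i, j = min((m for m in moves if m[0] >= 0 and m[1] >= 0), key=lambda m: D[m])
    path.append((0, 0))

    C = {}      # per input frame, keep the best-matching reference frame on the path
    for i, j in reversed(path):
        if i not in C or D[i, j] < D[i, C[i]]:
            C[i] = j
    return C

# Usage with a random stand-in cost matrix (real costs come from silhouette matching).
C = temporal_correspondence(np.random.rand(50, 40))
```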
- the method for stage 4 in the example embodiment estimates 3D posture candidates that align with the extracted body regions. That is, for each time t', find a set {B'_t',l} of 3D posture candidates whose 2D projected model body regions best match the extracted body region S'_t'.
- the example embodiment uses a nonparametric implementation of the Belief Propagation (BP) technique described in [E.B. Sudderth, A.T. Ihler, W.T. Freeman, and A.S. Willsky. Nonparametric belief propagation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 605-612, 2003] and [M. Isard. PAMPAS: Real-valued graphical models for computer vision. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2003].
- BP Belief Propagation
- the temporally aligned posture in the 3D reference motion computed in Stage 3 forms the initial estimate for each frame.
- each body part at each pose sample is projected to compute the mean image positions of its joints. Then, starting from the root body part, a pose sample is generated for each body part such that the body part at the pose sample is connected to its parent body part, and the projected image positions of its joints match the computed mean positions of its joints.
- Figure 6 illustrates example posture candidates in Figures 6(b) and (c) generated from an input image in Figure 6(a).
- the skeletons of the posture candidates are viewed from the front. At this viewing angle, all the posture candidates overlap exactly, given the way in which they are derived, as explained above for the example embodiment.
- Figure 6(c) shows the different skeletons of the posture candidates viewed from the side, illustrating the differences between the respective posture candidates.
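Because a scaled-orthographic projection discards depth, negating the depth component of a bone about its proximal joint leaves the 2D projection unchanged, which is why the candidates in Figure 6(b) coincide in the front view. The sketch below enumerates such depth flips for a simple joint chain; the chain layout and joint names are illustrative assumptions.

```python
from itertools import product
import numpy as np

def depth_flip_candidates(joints, chain):
    """Generate 2^k posture candidates by flipping each bone's depth component.

    joints: dict name -> np.array([x, y, z]); chain: (parent, child) pairs, root first.
    All candidates share the same scaled-orthographic 2D projection.
    """
    candidates = []
    for flips in product([False, True], repeat=len(chain)):
        pose = {name: pos.copy() for name, pos in joints.items()}
        for (parent, child), flip in zip(chain, flips):
            bone = joints[child] - joints[parent]             # original bone vector
            if flip:
                bone = bone * np.array([1.0, 1.0, -1.0])      # mirror the depth only
            pose[child] = pose[parent] + bone                 # x, y projection unchanged
        candidates.append(pose)
    return candidates

# Usage: a shoulder -> elbow -> wrist chain yields 4 depth-equivalent candidates.
joints = {"shoulder": np.array([0.0, 1.5, 0.0]),
          "elbow":    np.array([0.2, 1.2, 0.1]),
          "wrist":    np.array([0.4, 1.0, 0.3])}
cands = depth_flip_candidates(joints, [("shoulder", "elbow"), ("elbow", "wrist")])
```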
- the method for Stage 5 in the example embodiment comprises refining the estimate of the temporal correspondence C(t') and selecting the best posture candidates B'_t',l that best match the corresponding reference postures B_C(t').
- the method of computing the optimal refined temporal correspondence C(t') comprises the application of dynamic programming as follows. Let d(t', t, l) denote the difference between posture candidate l at input frame t' and the reference posture at frame t.
- let D denote an (L' + 1) x (L + 1) x N correspondence matrix, where N is the maximum number of posture candidates at any time t'.
- each matrix element at (t', t, l) corresponds to a possible correspondence between t', t, and l, and the correspondence cost is d(t', t, l).
- the least cost path is obtained by tracing back the path from D(L', L, l(L')) to D(0, 0, l(0)).
- the optimal C(t') and l(t') are given by the least cost path.
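Selecting one candidate per frame along a least cost path can be pictured as a shortest-path problem through the per-frame candidate sets, as in the following Viterbi-style sketch; the data and smoothness cost functions are placeholders for the silhouette-matching and inter-frame posture distances described above.

```python
import numpy as np

def select_posture_sequence(data_cost, smooth_cost):
    """Pick one posture candidate per frame along the least-cost path.

    data_cost[t] is a length-N_t array: how well candidate l fits frame t.
    smooth_cost(a, b) penalises the posture change between consecutive choices.
    """
    T = len(data_cost)
    best = [np.asarray(data_cost[0], dtype=float)]
    back = []
    for t in range(1, T):
        cur = np.asarray(data_cost[t], dtype=float)
        prev = best[-1]
        trans = np.array([[smooth_cost((t - 1, i), (t, j)) for j in range(len(cur))]
                          for i in range(len(prev))])
        total = prev[:, None] + trans                 # cost of reaching candidate j via i
        back.append(total.argmin(axis=0))             # best predecessor for each j
        best.append(cur + total.min(axis=0))

    # Trace back the least-cost path from the last frame to the first.
    path = [int(best[-1].argmin())]
    for t in range(T - 2, -1, -1):
        path.append(int(back[t][path[-1]]))
    return list(reversed(path))                        # candidate index l(t) per frame

# Usage with toy costs: 3 frames, 4 candidates each, smoothness = index difference.
costs = [np.random.rand(4) for _ in range(3)]
labels = select_posture_sequence(costs, lambda a, b: abs(a[1] - b[1]))
```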
- the method for Stage 6 in the example embodiment comprises computing the 3D difference between the selected 3D posture candidate B'_t',l(t') and the corresponding 3D reference posture B_C(t') at each time t'.
- the 3D difference can include 3D joint angle difference, 3D joint velocity difference, etc. depending on the specific coaching requirements of the sports.
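For concreteness, the sketch below computes two such measures from joint-position sequences: the per-frame angle between corresponding bone vectors, and per-joint velocity differences obtained by finite differences. The array layout (frames x joints x 3) and the 30 fps frame rate are illustrative assumptions.

```python
import numpy as np

def bone_angle_difference(perf, ref, bones):
    """Angle (radians) between corresponding bones of performer and reference.

    perf, ref: arrays of shape (n_frames, n_joints, 3); bones: (parent, child) pairs.
    Returns an array of shape (n_frames, n_bones).
    """
    def unit_bones(pose):
        v = pose[:, [c for _, c in bones]] - pose[:, [p for p, _ in bones]]
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    u, w = unit_bones(perf), unit_bones(ref)
    cos = np.clip(np.sum(u * w, axis=-1), -1.0, 1.0)
    return np.arccos(cos)

def joint_velocity_difference(perf, ref, fps=30.0):
    """Norm of the joint-velocity difference (finite differences), shape (n_frames-1, n_joints)."""
    v_perf = np.diff(perf, axis=0) * fps
    v_ref = np.diff(ref, axis=0) * fps
    return np.linalg.norm(v_perf - v_ref, axis=-1)

# Usage with toy data: 10 frames, 15 joints, bones given as joint-index pairs.
perf = np.random.rand(10, 15, 3)
ref = np.random.rand(10, 15, 3)
angles = bone_angle_difference(perf, ref, [(0, 1), (1, 2), (2, 3)])
vel_err = joint_velocity_difference(perf, ref)
```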
- the method for Stage 7 in the example embodiment comprises displaying and highlighting the 3D difference in a display device.
- An example display of detailed 3D difference is illustrated in Figure 7.
- Figure 7 illustrates an example display of detailed 3D difference by overlapping the estimated performer's postures e.g. 700 (dark gray scale) with the corresponding expert's postures e.g. 702 (lighter gray scale) according to an example embodiment.
- the overlapping postures can be rotated in 3D to show different views (compare rows 704 and 706).
- the estimated performer's postures can also be overlapped with the input images (row 708) for visual verification of their correctness.
- Figure 8 illustrates an example display of color-coded regions e.g. 800, 802 overlapped with an input image 804 for quick assessment according to an example embodiment.
- the darker gray scale regions e.g. 800 indicate large error
- the lighter gray scale regions e.g. 802 indicate moderate error
- the transparent regions e.g. 806 indicate negligible or no error.
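One possible way to produce such a display is to threshold a per-pixel error map into large, moderate and negligible bands and alpha-blend the resulting colours onto the input frame, as in the sketch below; the thresholds, colours and blending weight are illustrative assumptions.

```python
import cv2
import numpy as np

def error_overlay(frame, error_map, moderate=5.0, large=15.0, alpha=0.5):
    """Blend a colour-coded error map onto an input image.

    error_map: float array with the same height/width as frame (e.g. joint-angle
    error in degrees splatted onto body regions). Low-error pixels stay transparent.
    """
    overlay = frame.copy()
    overlay[error_map >= large] = (0, 0, 255)                               # large error: red
    overlay[(error_map >= moderate) & (error_map < large)] = (0, 255, 255)  # moderate: yellow
    blended = cv2.addWeighted(overlay, alpha, frame, 1.0 - alpha, 0.0)
    # Keep negligible-error pixels exactly as in the input frame.
    blended[error_map < moderate] = frame[error_map < moderate]
    return blended

# Usage with a synthetic frame and error map.
frame = np.full((240, 320, 3), 200, np.uint8)
err = np.zeros((240, 320), np.float32)
err[60:120, 100:160] = 20.0           # a region with large error
err[120:180, 100:160] = 8.0           # a region with moderate error
vis = error_overlay(frame, err)
```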
- the 2D input video is first segmented into the corresponding performer's motion segments.
- the method of determining the corresponding performer's segment boundary for each reference segment boundary t comprises the following steps:
- the corresponding boundary t* can be determined from the estimated temporal correspondence between the input video and the reference motion.
- the input body region is extracted with the help of colored markers.
- the appendages carried by the performer e.g., a golf club
- the 3D reference motion of the expert is replaced by the 3D posture sequence of the performer computed from the input video acquired in a previous session.
- the 3D reference motion of the expert is replaced by the 3D posture sequence of the performer computed from the input videos acquired in previous sessions that best matches the 3D reference motion of the expert.
- the camera 900 and output device 902 are connected to a computer 904 through a computer network 906, as shown in Figure 9.
- the computer 904 is coupled to an external storage device 908 directly in this example.
- a wireless input and output device 1000 such as a hand phone or Personal Digital Assistant equipped with a camera, is connected to a computer 1002 through a wireless network 1004, as shown in Figure 10.
- the computer 1002 is coupled to an external storage device 1006 directly in this example.
- multiple cameras 1101-1103 are arranged along a straight line, as shown in Figure 11. Each camera acquires a portion of the performer's 1104 entire motion when the performer 1104 passes in front of the respective camera. This embodiment also allows the system to acquire high-resolution video of a user whose body motion spans a large arena.
- multiple cameras 1201-1204 are placed around the performer 1206, as shown in Figure 12. This arrangement allows different cameras to capture the frontal view of the performer 1206 when he faces different cameras.
- the calibration method for the stage S2 processing in addition to calibration of each of the individual cameras as described above for the single camera embodiment, further comprises computing the relative positions and orientations between the cameras using an inter-relation algorithm between the cameras, as will be appreciated by a person skilled in the art.
- inter-relation algorithms are understood in the art, and will not be described in more detail herein. Reference is made for example to [R. Jain, R. Kasturi, and B. G. Schunck, Machine Vision, McGraw-Hill, 1995] and [R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2000] for example algorithms for use in such an embodiment. The contents of those references are hereby incorporated by cross-reference.
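For a two-camera configuration, one standard way to obtain the relative position and orientation is from point correspondences via the essential matrix, as sketched below using OpenCV; the matched points, the shared intrinsic matrix and the RANSAC settings are illustrative assumptions, and the textbooks cited above describe this and alternative approaches in detail.

```python
import cv2
import numpy as np

def relative_camera_pose(pts_cam1, pts_cam2, K):
    """Recover rotation R and (unit-scale) translation t of camera 2 w.r.t. camera 1.

    pts_cam1, pts_cam2: corresponding image points (N, 2) seen by the two cameras;
    K: shared 3x3 intrinsic matrix (assumed pre-calibrated, as in stage S2).
    """
    E, inliers = cv2.findEssentialMat(pts_cam1, pts_cam2, K,
                                      method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_cam1, pts_cam2, K, mask=inliers)
    return R, t

# Usage with synthetic correspondences (in practice these would come from matched
# features, e.g. on a calibration object visible to both cameras).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
pts1 = np.random.rand(50, 2) * [640, 480]
pts2 = pts1 + [5.0, 0.0]                       # toy horizontal shift between views
R, t = relative_camera_pose(pts1, pts2, K)
```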
- This stage segments the human body in each image frame of the input video.
- the human body, the arms, and the background are assumed to have different colors so that they can be separated. This assumption is reasonable and easily satisfied, for instance, for a user who wears a short-sleeved colored shirt and stands in front of a background of a different color.
- the background can be a natural scene which is nonuniform in color.
- This stage is achieved using a combination of background removal, graph-cut algorithm and skin color detection. In case the background is uniform, the segmentation algorithm can be simplified.
- This stage computes the camera's extrinsic parameters, assuming that its intrinsic parameters have already been pre-computed. This stage can be achieved using existing camera calibration algorithms.
- This stage estimates the approximate temporal correspondence between 3D reference motion and 2D input video.
- Dynamic Programming technique is used to estimate the temporal correspondence between the input video and the reference motion by matching the 2D projections of 3D postures in the reference motion with the segmented human body in the 2D input video.
- This stage also estimates the approximate global rotation and translation of the user's body relative to the 3D reference motion.
- This stage selects the best posture candidates that form smooth motion over time. It also refines the temporal correspondence estimated in Stage 2. This stage is accomplished using Dynamic Programming.
- the framework of the example embodiments can be applied to analyze various types of motion by adopting appropriate 3D reference motion. It will be appreciated by a person skilled in the art that by adapting the system and method to handle specific application domains, these stages can be refined and optimized to reduce computational costs and improve efficiency.
- Figure 13 shows a flow chart 1300 illustrating a method for human motion detection according to an example embodiment.
- one or more 2D input videos of the human motion are captured.
- sets of 2D body regions are extracted from respective frames of the 2D input videos.
- 3D human posture candidates are determined for each of the extracted sets of 2D body regions.
- a sequence of 3D human postures from the 3D human posture candidates for the respective frames is selected as representing the human motion in 3D.
- the method and system of the example embodiment can be implemented on a computer system 1400, schematically shown in Figure 14. It may be implemented as software, such as a computer program being executed within the computer system 1400, and instructing the computer system 1400 to conduct the method of the example embodiment.
- the computer system 1400 comprises a computer module 1402, input modules such as a keyboard 1404 and mouse 1406 and a plurality of output devices such as a display 1408, and printer 1410.
- the computer module 1402 is connected to a computer network 1412 via a suitable transceiver device 1414, to enable access to e.g. the Internet or other network systems such as Local Area Network (LAN) or Wide Area Network (WAN).
- LAN Local Area Network
- WAN Wide Area Network
- the computer module 1402 in the example includes a processor 1418, a Random Access Memory (RAM) 1420 and a Read Only Memory (ROM) 1422.
- the computer module 1402 also includes a number of Input/Output (I/O) interfaces, for example I/O interface 1424 to the display 1408, and I/O interface 1426 to the keyboard 1404.
- I/O Input/Output
- the components of the computer module 1402 typically communicate via an interconnected bus 1428 and in a manner known to the person skilled in the relevant art.
- the application program is typically supplied to the user of the computer system 1400 encoded on a data storage medium such as a CD-ROM or flash memory carrier and read utilising a corresponding data storage medium drive of a data storage device 1430.
- the application program is read and controlled in its execution by the processor 1418.
- Intermediate storage of program data may be accomplished using RAM 1420.
Abstract
A method and system for human motion analysis. The method comprises the steps of capturing one or more 2D input videos of the human motion; extracting sets of 2D body regions from respective frames of the 2D input videos; determining 3D human posture candidates for each of the extracted sets of 2D body regions; and selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
Description
Human Motion Analysis System and Method
FIELD OF INVENTION
The present invention relates broadly to a method and system for human motion analysis.
BACKGROUND
There are two general types of systems that can be used for motion analysis: 2D video-based software and 3D motion capture systems. 2D video-based software such as V1 Pro [V1 Pro, swing analysis software, www.v1golf.com], MotionView [MotionView, golf swing video and motion analysis software, www.golfcoachsystems.com/golf-swing-software.html], MotionCoach [MotionCoach, golf swing analysis system, www.motioncoach.com], and cSwing 2008 [cSwing 2008, video swing analysis program, www.cswing.com] provides a set of tools for the user to manually assess his performance. It is affordable but lacks the intelligence to perform the assessment automatically. The assessment accuracy depends on the user's competence in using the software. Such systems perform assessment only in 2D, which is less accurate than 3D assessment. For example, accuracy may be reduced due to depth ambiguity in 3D motion and self-occlusions of body parts.
3D motion capture systems such as Vicon [Vicon 3D motion capture system, www.vicon.com/applications/sports.html] and MAC Eagle [Motion Analysis Corporation, Eagle motion capture system, www.motionanalysis.com] capture 3D human motion by tracking reflective markers attached to the human body and computing the markers' positions in 3D. Using specialized cameras, these systems can capture 3D motion efficiently and accurately. Given the captured 3D motion, it is relatively easy for an add-on algorithm to compute the motion discrepancies of the user's motion relative to domain-specific reference motion. However, they are not equipped with intelligent software for automatic assessment of the motion discrepancies based on domain-specific assessment criteria. They are very expensive systems requiring six or more cameras to function effectively. They are also cumbersome to set up and difficult to use. These are passive marker-based systems.
There is also available an active marker-based system. In the system, the markers are LEDs that each blink a special code that uniquely identifies the marker. Such systems can resolve some tracking difficulties of passive marker-based system. However, the LEDs are connected by cables which supply electricity for them to operate. Such a tethered system places restriction on the kind of motion that can be captured. So, it is less versatile than untethered systems.
U.S. Patents US 4891748, US 7095388, disclose systems that capture the video of a person performing a physical skill, project the reference video of an expert scaled according to the body size of the person, and compare the motion in the videos of the person and the expert. In these systems, motion comparison is performed only in 2D videos. They are not accurate enough and may fail due to depth ambiguity in 3D motion and self-occlusions of body parts.
Japanese Patent JP 2794018 discloses a golf swing analysis system that attaches a large number of markers onto a golfer's body and club, and captures a sequence of golf swing images using a camera. The system then computes the markers' coordinates in 2D, and compares the coordinate data with selected reference data.
US Patents US 2004/0209698 and US 7097459 disclose similar systems to JP 2794018 except that two or more cameras are used to capture multiple simultaneous image sequences. Therefore, they have the potential to compute 3D coordinates. These are essentially marker-based motion capture systems.
US Patent Publication US 2006/0211522 discloses a system of colored markers placed on a baseball player's arms, legs, bat, pitching mat, etc. for manually facilitating the proper form of the player's body. No computerized analysis and comparison is described in the patent.
US Patent US 5907819 discloses a golf swing analysis system that attaches motion sensors on the golfer's body. The sensors record the player's motion and send the data to a computer through connecting cables to analyze the player's motion.
Japanese Patents JP 9-154996, JP 2001-614, and European Patent EP 1688746 describe similar systems that attach sensors to the human body. US Patent Publication 2002/0115046 and US Patent 6567536 disclose similar systems except that a video camera is also used to capture video information which is synchronized with the sensor data. Since the sensors are connected to the computer by cables, the motion type that can be captured is restricted. These are tethered systems, as opposed to the marker- based systems described above, which are untethered.
US Patent US 7128675 discloses a method of analyzing a golf swing by attaching two lasers to the putter. A camera connected to a computer records the laser traces and provides feedback to the golfer regarding his putting swing. For the same reason as the methods that use motion sensors, the motion type that can be captured is restricted.
A need therefore exists to provide a human motion analysis system and method that seek to address at least one of the above-mentioned problems.
SUMMARY
In accordance with a first aspect of the present invention there is provided a method for human motion analysis, the method comprising the steps of capturing one or more 2D input videos, of the human motion; extracting sets of 2D body regions from respective frames of the 2D input videos; determining 3D human posture candidates for each of the extracted sets of 2D body regions; and selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
The method may further comprise the step of determining differences between 3D reference data for said human motion and the selected sequence of 3D human postures.
The method may further comprise the step of visualizing said differences to a user.
Extracting the sets of 2D body regions may comprise one or more of a group consisting of background subtraction, iterative graph-cut segmentation and skin detection.
Determining the 3D human posture candidates may comprise the steps of generating a first 3D human posture candidate; and flipping a depth orientation of body parts represented in the first 3D human posture candidate around respective joints to generate further 3D human posture candidates from the first 3D human posture candidate.
Generating the first 3D human posture candidate may comprise temporally aligning the extracted sets of 2D body portions from each frame with 3D reference data of the human motion and adjusting the 3D reference data to match the 2D body portions.
Selecting the sequence of 3D human postures from the 3D human posture candidates may be based on a least cost path among the 3D human posture candidates for the respective frames.
Selecting the sequence of 3D human postures from the 3D human posture candidates may further comprise refining a temporal alignment of the extracted sets of 2D body portions from each frame with 3D reference data of the human motion.
In accordance with a second aspect of the present invention there is provided a system for human motion analysis, the system comprising means for capturing one or more 2D input videos of the human motion; means for extracting sets of 2D body regions from respective frames of the 2D input videos; means for determining 3D human posture candidates for each of the extracted sets of 2D body regions; and means for selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
The system may further comprise means for determining differences between 3D reference data for said human motion and the selected sequence of 3D human postures.
The system may further comprise means for visualizing said differences to a user.
The means for extracting the sets of 2D body regions may perform one or more of a group consisting of background subtraction, iterative graph-cut segmentation and skin detection.
The means for determining the 3D human posture candidates may generate a first 3D human posture candidate; and flips a depth orientation of body parts represented in the first 3D human posture candidate around respective joints to generate further 3D human posture candidates from the first 3D human posture candidate.
Generating the first 3D human posture candidate may comprise temporally aligning the extracted sets of 2D body portions from each frame with 3D reference data of the human motion and adjusting the 3D reference data to match the 2D body portions.
The means for selecting the sequence of 3D human postures from the 3D human posture candidates may determine a least cost path among the 3D human posture candidates for the respective frames.
The means for selecting the sequence of 3D human postures from the 3D human posture candidates may further comprise means for refining a temporal alignment of the extracted sets of 2D body portions from each frame with 3D reference data of the human motion.
In accordance with a third aspect of the present invention there is provided a data storage medium having computer code means for instructing a computing device to execute a method for human motion detection, the method comprising the steps of capturing one or more 2D input videos of the human motion; extracting sets of 2D body regions from respective frames of the 2D input videos; determining 3D human posture candidates for each of the extracted sets of 2D body regions; and selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the invention will be better understood and readily apparent to one of ordinary skill in the art from the following written description, by way of example only, and in conjunction with the drawings, in which:
Figure 1 illustrates the block diagram of a human motion analysis system with the camera connected directly to the computer, according to an example embodiment.
Figure 2 shows a schematic top-down view drawing of an example embodiment comprising a camera. Figure 3(a) illustrates the performer standing in a standard posture. Figure 3(b) illustrates a 3D model of the performer standing in a standard posture according to an example embodiment. The dots denote joints, straight lines denote bones connecting the joints, and gray scaled regions denote body parts.
Figure 4 illustrates an example of body region extraction. Figure 4(a) shows an input image and Figure 4(b) shows the extracted body regions, according to an example embodiment.
Figure 5 illustrates the flipping of the depth orientation of body part b in the z-direction to the new orientation denoted by a dashed line, according to an example embodiment. Figure 6 illustrates an example result of posture candidate estimation according to an example embodiment. Figure 6(a) shows the input image with a posture candidate overlaid. Figure 6(b) shows the skeletons of the posture candidates viewed from the front. At this viewing angle, all the posture candidates overlap exactly. Figure 6(c) shows the skeletons of the posture candidates viewed from the side. Each candidate is shown with a different gray scale.
Figure 7 illustrates an example display of detailed 3D difference by overlapping the estimated performer's postures (dark gray scale) with the corresponding expert's postures (lighter gray scale) according to an example embodiment. The overlapping postures can be rotated in 3D to show different views. The estimated performer's postures can also be overlapped with the input images for visual verification of their correctness.
Figure 8 illustrates an example display of color-coded regions overlapped with an input image for quick assessment according to an example embodiment. The darker gray scale regions indicate large error, the lighter gray scale regions indicate moderate error, and the transparent regions indicate negligible or no error.
Figure 9 illustrates the block diagram of a human motion analysis system with the camera and output device connected to the computer through a computer network, according to an example embodiment.
Figure 10 illustrates the block diagram of a human motion analysis system with the wireless input and output device, such as a hand phone or Personal Digital Assistant equipped with a camera, connected to the computer through a wireless network, according to an example embodiment. Figure 11 shows a schematic top-down view of an example embodiment comprising multiple cameras arranged in a straight line.
Figure 12 shows a schematic top view of an example embodiment comprising multiple cameras placed around the performer.
Figure 13 shows a flow chart illustrating a method for human motion detection according to an example embodiment.
Figure 14 shows a schematic drawing of a computer system for implementing the method and system of an example embodiment.
DETAILED DESCRIPTION
The described example embodiments provide a system and method for acquiring a human performer's motion in one or more 2D videos, analyzing the 2D videos, comparing the performer's motion in the 2D videos and a 3D reference motion of an expert, computing the 3D differences between the performer's motion and the expert's motion, and delivering information regarding the 3D difference to the performer for improving the performer's motion. The system in example embodiments comprises one or more 2D cameras, a computer, an external storage device, and a display device. In a single camera configuration, the camera acquires the performer's motion in a 2D video and passes the 2D video to a computing device. In a multiple camera configuration, the cameras acquire the performer's motion simultaneously in multiple 2D videos and pass the 2D videos to the computing device.
Some portions of the description which follows are explicitly or implicitly presented in terms of algorithms and functional or symbolic representations of operations on data within a computer memory. These algorithmic descriptions and functional or symbolic representations are the means used by those skilled in the data processing arts to convey most effectively the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities, such as electrical, magnetic or optical signals capable of being stored, transferred, combined, compared, and otherwise manipulated.
Unless specifically stated otherwise, and as apparent from the following, it will be appreciated that throughout the present specification, discussions utilizing terms such as
"calculating", "determining", "generating", "initializing", "outputting", or the like, refer to the action and processes of a computer system,, or similar electronic device, that manipulates and transforms data represented as physical quantities within the the computer system into other data similarly represented as physical quantities within the computer system or other information storage, transmission or display devices.
The present specification also discloses apparatus for performing the operations of the methods. Such apparatus may be specially constructed for the required purposes, or may comprise a general purpose computer or other device selectively activated or reconfigured by a computer program stored in the computer. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose machines may be used with programs in accordance with the teachings herein. Alternatively, the construction of more specialized
apparatus to perform the required method steps may be appropriate. The structure of a conventional general purpose computer will appear from the description below.
In addition, the present specification also implicitly discloses a computer program, in that it would be apparent to the person skilled in the art that the individual steps of the method described herein may be put into effect by computer code. The computer program is not intended to be limited to any particular programming language and implementation thereof. It will be appreciated that a variety of programming languages and coding thereof may be used to implement the teachings of the disclosure contained herein. Moreover, the computer program is not intended to be limited to any particular control flow. There are many other variants of the computer program, which can use different control flows without departing from the spirit or scope of the invention.
Furthermore, one or more of the steps of the computer program may be performed in parallel rather than sequentially. Such a computer program may be stored on any computer readable medium. The computer readable medium may include storage devices such as magnetic or optical disks, memory chips, or other storage devices suitable for interfacing with a general purpose computer. The computer readable medium may also include a hard-wired medium such as exemplified in the internet system, or wireless medium such as exemplified in the GSM mobile telephone system.
The computer program when loaded and executed on such a general-purpose computer effectively results in an apparatus that implements the steps of the preferred method.
The invention may also be implemented as hardware modules. More particularly, in the hardware sense, a module is a functional hardware unit designed for use with other components or modules. For example, a module may be implemented using discrete electronic components, or it can form a portion of an entire electronic circuit such as an Application Specific Integrated Circuit (ASIC). Numerous other possibilities exist. Those skilled in the art will appreciate that the system can also be implemented as a combination of hardware and software modules.
The motion analysis and comparison is performed in the following stages in an example embodiment:
1. Extracting the performer's body regions in each image frame of the 2D videos.
2. Calibrating the parameters of the cameras.
3. Estimating the temporal correspondence and rigid transformations that best align the postures in a 3D reference motion to the body regions in the image frames.
4. Estimating the 3D posture candidates that produce the human body regions in the image frames, using the results obtained in Stage 3 as the initial estimates.
5. Selecting the 3D posture candidate that best matches the human body region in each time instant of the 2D video and refining the temporal correspondence between the 2D video and the 3D reference motion. In the case of a multiple-camera configuration, the selected 3D posture candidate simultaneously best matches the human body regions in each time instant of the multiple 2D videos.
6. Computing the 3D difference between the selected 3D posture candidates and the corresponding 3D reference posture. The 3D difference can include 3D joint angle difference, 3D velocity difference, etc. depending on the requirements of the application domain. 7. Visualizing and highlighting the 3D difference in a display device.
An example embodiment of the present invention provides a system and method for acquiring a human performer's motion in one 2D video, analyzing the 2D video, comparing the performer's motion in the 2D video and a 3D reference motion of an expert, computing the 3D differences between the performer's motion and the expert's motion, and delivering information regarding the 3D difference to the performer for improving the performer's motion.
Figure 1 shows a schematic block diagram of the example embodiment of a human motion analysis system 100. The system 100 comprises a camera unit 102 coupled to a processing unit, here in the form of a computer 104. The computer 104 is further coupled to an output device 106, and an external storage device 108.
With reference to Figure 2, the example embodiment comprises a stationary camera 200 with a fixed lens, which is used to acquire a 2D video m' of the performer's 202 entire motion. The 2D video is then analyzed and compared with a 3D reference motion M of an expert. The difference between the performer's 202 2D motion and the expert's 3D reference motion is computed. The system displays and highlights the difference in an output device 106 (Figure 1).
The software component implemented on the computer 104 (Figure 1) in the example embodiment comprises the following processing stages:
1. Extracting the input body region S'_t' in each image I'_t' at time t' of the video m'.
2. Calibrating the parameters of the camera 200.
3. Estimating the temporal correspondence C(t') between input video time t' and reference time t, and the rigid transformations T_t' that best align the posture B_C(t') in the 3D reference motion to the body region S'_t' in image I'_t' for each time t'.
4. Estimating the 3D posture candidates B'_t',l' that align with the input body regions S'_t' in the input images I'_t', using the results obtained in Stage 3 as the initial estimates.
5. Selecting the 3D posture candidate that best matches the input body region S'_t' for each time t', and refining the temporal correspondence C(t').
6. Computing the 3D difference between the selected 3D posture candidate B'_t' and the corresponding 3D reference posture B_C(t') at each time t'.
7. Visualizing and highlighting the 3D difference in the display device 106 (Figure 1).
The method for Stage 1 in an example embodiment comprises a background subtraction technique described in [C. Stauffer and W.E.L. Grimson. Adaptive background mixture models for real-time tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1998], an iterative graph-cut segmentation technique described in [C. Rother, V. Kolmogorov, and A. Blake. Grabcut - interactive foreground extraction using iterated graph cuts. In Proceedings of ACM SIGGRAPH, 2004], and a skin detection technique described in [M.J. Jones and J.M. Rehg. Statistical color models with application to skin detection. International Journal of Computer Vision, 46:81-96, 2002]. The contents of those references are hereby incorporated by cross-reference. In different example embodiments, for videos with a simple background, the background subtraction technique is sufficient; for videos with a complex background, the iterative graph-cut and skin detection techniques should be used. Figure 4 illustrates an example result of body region extraction. Figure 4(a) shows an input image and Figure 4(b) shows the extracted body region. The lighter gray scale region is extracted by the iterative graph-cut segmentation technique, and the darker gray scale parts are extracted using the skin detection and iterative graph-cut segmentation techniques.
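By way of illustration only, the following is a minimal sketch of how Stage 1 could be approximated with the OpenCV library, combining a background subtractor for simple backgrounds with a GrabCut (iterative graph-cut) refinement for cluttered scenes. The file name, thresholds and kernel sizes are assumptions, and skin detection is omitted; this is not the patented implementation.

```python
import cv2
import numpy as np

def extract_body_region(frame, subtractor):
    """Return a binary mask of the performer in one 8-bit BGR video frame."""
    fg = subtractor.apply(frame)                      # MOG2 foreground mask
    fg = cv2.medianBlur(fg, 5)                        # suppress speckle noise
    _, fg = cv2.threshold(fg, 127, 255, cv2.THRESH_BINARY)
    if cv2.countNonZero(fg) == 0:                     # nothing moving yet
        return fg

    # Refine with iterative graph-cut (GrabCut) seeded by the foreground mask.
    mask = np.where(fg > 0, cv2.GC_PR_FGD, cv2.GC_PR_BGD).astype(np.uint8)
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(frame, mask, None, bgd, fgd, 3, cv2.GC_INIT_WITH_MASK)
    body = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
    return (body.astype(np.uint8)) * 255

if __name__ == "__main__":
    cap = cv2.VideoCapture("performer.avi")           # assumed input video file
    mog2 = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
    ok, frame = cap.read()
    while ok:
        body_mask = extract_body_region(frame, mog2)  # one mask per frame
        ok, frame = cap.read()
```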
The method for Stage 2 in the example embodiment comprises computing the parameters of a scaled-orthographic camera projection, which include the camera's 3D rotation angles (θ_x, θ_y, θ_z), camera position (c_x, c_y), and scale factor s. It is assumed that the performer's posture at the first image frame of the video is the same as a standard calibration posture (for example, Figure 3). The method comprises the following steps:
1. Setting the camera parameters to default values: θ_x = θ_y = θ_z = 0, c_x = c_y = 0, s = 1.
2. Projecting a 3D model of the performer at the calibration posture under the default camera parameters and rendering it as a 2D projected body region. This step can be performed using OpenGL [OpenGL, www.opengl.org] in the example embodiment. The content of that reference is hereby incorporated by cross-reference. It is noted that in different example embodiments, the 3D model of the performer can be provided in different forms. For example, a template 3D model may be used, that has been generated to function as a generic template for a large cross section of possible performers. In another embodiment a 3D model of an actual performer may first be generated, which will involve an additional pre-processing step for generation of the customized 3D model, as will be appreciated and is understood by a person skilled in the art.
3. Computing the principal direction and the principal length h of the 2D projected model body region by applying principal component analysis (PCA) on the pixel positions in the projected model body region. The principal direction is the first eigenvector computed by PCA, and the principal length is the maximum length of the model body region along the principal direction.
4. Computing the principal direction and the principal length h' of the extracted captured body region in the first image frame of the video in a similar way.
5. Computing the camera scale s = h' / h.
6. Computing the camera position (c_x, c_y): compute the center (p'_x, p'_y) of the extracted body region and the center (p_x, p_y) of the 2D projected model body region, and compute the camera position as the scaled difference between the centers, i.e. c_x = (p'_x - p_x) / s and c_y = (p'_y - p_y) / s.
7. Computing the camera rotation angle θ_z about the Z-axis as the angular difference between the principal directions of the extracted body region and the 2D projected model body region. Camera rotation angles θ_x and θ_y are omitted.
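A simplified numerical sketch of calibration steps 3 to 7 is given below, assuming the two binary masks (the 2D projection of the 3D model in the calibration posture, and the extracted body region from the first frame) are already available as NumPy arrays; the function and variable names are illustrative only.

```python
import numpy as np

def principal_axis(mask):
    """First PCA eigenvector, principal length and centroid of a binary mask."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    centre = pts.mean(axis=0)
    cov = np.cov((pts - centre).T)                  # 2x2 covariance of pixel positions
    vals, vecs = np.linalg.eigh(cov)
    axis = vecs[:, np.argmax(vals)]                 # principal direction (sign ambiguous)
    length = np.ptp((pts - centre) @ axis)          # extent along that direction
    return axis, length, centre

def calibrate(model_mask, body_mask):
    a_m, h_m, c_m = principal_axis(model_mask)      # projected model body region
    a_b, h_b, c_b = principal_axis(body_mask)       # extracted body region
    s = h_b / h_m                                   # step 5: camera scale
    cx, cy = (c_b - c_m) / s                        # step 6: camera position
    # step 7: in-plane rotation between the two principal directions
    # (determined only up to 180 degrees here because of the eigenvector sign)
    theta_z = np.arctan2(a_b[1], a_b[0]) - np.arctan2(a_m[1], a_m[0])
    return s, (cx, cy), theta_z
```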
The calibration method for stage 2 in the example embodiment thus derives the camera parameters for the particular human motion analysis system in question. It will be appreciated by a person skilled in the art that the same parameters can later be used for human motion analysis of a different performer, provided that the camera settings remain the same for the different performer. On the other hand, as mentioned above, a customized calibration using customized 3D models of an actual performer may be performed for each performer if desired, in different embodiments.
It is noted that in different embodiments, the method for stage S2 may comprise using other existing algorithms for the camera calibration, such as for example the "camera calibration tool box for MatLab" [www.vision.Caltech.edu/bouguetj/calib_doc/], the contents of which are hereby incorporated by cross-reference.
The method for Stage 3 in the example embodiment comprises estimating the approximate temporal correspondence C(t') and the approximate rigid transformation T_t' that best align the posture B_C(t') in the 3D reference motion to the extracted body region S'_t' in image I'_t' for each time t' = 0, ..., L', where L' + 1 is the length of the video sequence. The length of the 3D reference motion is L + 1, for t = 0, ..., L. The estimation is subjected to a temporal order constraint: for any two temporally ordered postures in the performer's motion, the two corresponding postures in the reference motion have the same temporal order. That is, for any t'_1 and t'_2 such that t'_1 < t'_2, C(t'_1) < C(t'_2).
Given a particular C, each transformation T_t' at time t' can be determined by finding the best match between the extracted body region S'_t' and the 2D projected model body region P(T(B_C(t'))):

T_t' = argmin_T d_S(P(T(B_C(t'))), S'_t')

where the optimal T_t' is computed using a sampling technique described in [Sampling methods, www.statpac.com/surveys/sampling/htm]. The content of that reference is hereby incorporated by cross-reference.
The method for computing the difference d_S(S, S') between two image regions S and S' comprises computing two parts:

d_S(S, S') = λ_A d_A(A, A') + λ_E d_E(E, E')

where d_A measures the amount of overlap between the set A of pixels in the silhouette of the 2D projected model body region and the set A' of pixels in the silhouette of the extracted body region in the video image, d_E is the Chamfer distance described in [M.A. Butt and P. Maragos, Optimum design of chamfer distance transforms, IEEE Transactions on Image Processing, 7(10), 1998, 1477-1484] between the set E of edges in the 2D projected model body region and the set E' of edges in the extracted body region, and λ_A and λ_E are constant parameters. The content of that reference is hereby incorporated by cross-reference.
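As a rough sketch of this two-part difference, the snippet below uses one minus the intersection-over-union of the silhouettes for the overlap term d_A and a chamfer-style mean edge distance for d_E; the exact overlap measure, the edge extraction, and the weights λ_A and λ_E are application-dependent assumptions rather than values taken from the disclosure.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def region_difference(model_sil, body_sil, model_edges, body_edges,
                      lam_a=1.0, lam_e=1.0):
    """d_S between a projected model region and an extracted body region."""
    # d_A: one minus intersection-over-union of the two silhouettes
    inter = np.logical_and(model_sil, body_sil).sum()
    union = np.logical_or(model_sil, body_sil).sum()
    d_a = 1.0 - inter / max(union, 1)

    # d_E: mean distance from each model edge pixel to the nearest body edge pixel
    dist_to_body_edges = distance_transform_edt(~body_edges.astype(bool))
    if model_edges.any():
        d_e = dist_to_body_edges[model_edges.astype(bool)].mean()
    else:
        d_e = 0.0

    return lam_a * d_a + lam_e * d_e
```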
The method of computing the optimal temporal correspondence C(t') comprises the application of dynamic programming as follows. Let d(t', C(t')) denote the difference d_S:

d(t', C(t')) = d_S(P(T_t'(B_C(t'))), S'_t')

Let D denote a (L' + 1) x (L + 1) correspondence matrix. Each matrix element at (t', t) corresponds to the possible frame correspondence between t' and t, and the correspondence cost is d(t', t). A path in D is a sequence of frame correspondences for t' = 0, ..., L' such that each t' has a unique corresponding t = C(t'). It is assumed that C(0) = 0 and C(L') = L. Let D(t', t) denote the least cost from the frame pair (0, 0) up to (t', t) on the least cost path, with D(0, 0) = d(0, 0). Then, the optimal solution given by D(L', L) can be recursively computed using dynamic programming as follows:

D(t', t) = d(t', t) + min_{i = 0, ..., W} D(t' - 1, t - 1 - i)

Once D(L', L) is computed, the least cost path is obtained by tracing back the path from D(L', L) to D(0, 0). The optimal C(t') is given by the least cost path.
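The recurrence above can be realised with a small dynamic-programming routine such as the sketch below, where d is a precomputed (L' + 1) x (L + 1) cost matrix and W is an assumed window on how many reference frames may be skipped between consecutive input frames; boundary handling is simplified for brevity.

```python
import numpy as np

def temporal_correspondence(d, W=3):
    """Return C(t') for a cost matrix d[t_prime, t] using the DP recurrence."""
    Lp, L = d.shape[0] - 1, d.shape[1] - 1
    D = np.full_like(d, np.inf, dtype=float)
    back = np.zeros(d.shape, dtype=int)
    D[0, 0] = d[0, 0]                               # C(0) = 0 by assumption
    for tp in range(1, Lp + 1):
        for t in range(1, L + 1):
            lo = max(0, t - 1 - W)
            prev = D[tp - 1, lo:t]                  # D[tp-1, t-1-i] for i = 0..W
            if prev.size:
                i = int(np.argmin(prev))
                D[tp, t] = d[tp, t] + prev[i]
                back[tp, t] = lo + i                # remember the chosen predecessor

    # trace back from (L', L) to recover the least cost path, i.e. C(t')
    C = np.zeros(Lp + 1, dtype=int)
    C[Lp], t = L, L
    for tp in range(Lp, 0, -1):
        t = back[tp, t]
        C[tp - 1] = t
    return C
```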
The method for Stage 4 in the example embodiment estimates 3D posture candidates that align with the extracted body regions. That is, for each time t', find a set {B'_t',l'} of 3D posture candidates whose 2D projected model body regions P(T_t'(B'_t',l')) match the extracted body region S'_t' in the input images I'_t'. The computation of the 3D posture candidates is subjected to a joint angle limit constraint: the valid joint rotation of each body part is limited to physically possible ranges.
The example embodiment uses a nonparametric implementation of the Belief Propagation (BP) technique described in [E.B. Sudderth, A.T. Ihler, W.T. Freeman, and A.S. Willsky. Nonparametric belief propagation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 605-612, 2003. M. Isard. Pampas: Real-valued graphical models for computer vision. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 613-620, 2003. G. Hua and Y. Wu. Multi-scale visual tracking by sequential belief propagation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 826-833, 2004. E.B. Sudderth, M.I. Mandel, W.T. Freeman, and A.S. Willsky. Visual hand tracking using nonparametric belief propagation. In IEEE CVPR Workshop on Generative Model based Vision, 2004]. The contents of those references are hereby incorporated by cross-reference.
It comprises the following steps:
1. Run the nonparametric BP algorithm to generate pose samples for each body part using the results in Stage 3 as the initial estimates. That is, based on the results in
Stage 3, the temporally aligned posture in the 3D reference motion forms the initial estimate for each frame.
2. Determine a best matching pose for each body part.
• If the pose samples of each body part converge to a single state, choose any pose sample as the best pose for this body part.
• If the pose samples of each body part do not converge to a single state, project each body part at each pose sample to compute the mean image positions of its joints. Then, starting from the root body part, generate a pose sample for each body part such that the body part at the pose sample is connected to its parent body part, and the projected image positions of its joints match the computed mean positions of its joints.
3. Generate the first posture candidate. For each body part, starting from the root body part, modify the depth orientation of the best pose sample such that it has the same depth orientation as that in the corresponding reference posture. All the pose samples are combined into a posture candidate by translating the depth coordinate in each sample, if necessary, such that the neighboring body parts are connected.
4. Generate new 3D posture candidates. Starting from the first 3D posture candidate, flip the depth orientation of n body parts about their parent joints, starting with n = 1, while keeping the body parts connected at the joints. Figure 5 illustrates flipping of body part b from a position k' to k around a parent joint at j.
5. The above step is repeated for n = 1, 2, ..., until N posture candidates are generated.
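To make the combinatorial structure of steps 4 and 5 concrete, the sketch below enumerates depth-flip hypotheses over a simplified posture representation in which each body part carries a single depth angle about its parent joint; a full implementation would flip 3D rotations and re-connect the parts at their joints as described above. The names and the candidate limit are assumptions, not part of the disclosure.

```python
from itertools import combinations

def generate_candidates(base_depth_angles, max_candidates=16):
    """base_depth_angles: dict part_name -> depth angle of the first candidate."""
    parts = list(base_depth_angles)
    candidates = [dict(base_depth_angles)]           # first candidate, unchanged
    n = 1
    while len(candidates) < max_candidates and n <= len(parts):
        for subset in combinations(parts, n):        # flip n body parts at a time
            cand = dict(base_depth_angles)
            for p in subset:
                cand[p] = -cand[p]                    # mirror the depth orientation
            candidates.append(cand)
            if len(candidates) >= max_candidates:
                break
        n += 1
    return candidates
```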
Figure 6 illustrates example posture candidates in Figures 6(b) and (c) generated from an input image in Figure 6(a). In Figure 6(b) the skeletons of the posture candidates are viewed from the front. At this viewing angle, all the posture candidates overlap exactly, given the nature of how they have been derived, as explained above for the example embodiment. Figure 6(c) shows the different skeletons of the posture candidates viewed from the side, illustrating the differences between the respective posture candidates.
The method for Stage 5 in the example embodiment comprises refining the estimate of the temporal correspondence C(t') and selecting the best posture candidates B'_t',l(t') that best match the corresponding reference postures B_C(t').
The refinement is subjected to a temporal ordering constraint: for any t'_1 and t'_2 such that t'_1 < t'_2, C(t'_1) < C(t'_2); and a constraint of small rate of change of posture errors: for each t', Δε_t' / Δt' = (ε_t' - ε_(t'-Δt')) / Δt' is small.
The method of computing the optimal refined temporal correspondence C(t') comprises the application of dynamic programming as follows. Let d_C(t', t, l') denote the 3D posture difference between the posture candidate B'_t',l' and the reference posture B_t, which is measured as the mean difference between the orientations of the bones in the postures. Let d_B(t', t, s, l', k') denote the change of posture difference between the corresponding pairs (B'_t',l', B_t) and (B'_(t'-1),k', B_s).
Let D denote a (L' + 1) x (L + 1) x N correspondence matrix, where N is the maximum number of posture candidates at any time t'. Each matrix element at (t', t, l') corresponds to the possible correspondence between t', t, and l', and the correspondence cost is d_C(t', t, l'). A path in D is a sequence of correspondences for t' = 0, ..., L' such that each t' has a unique corresponding t = C(t') and a unique corresponding posture candidate l' = l(t'). It is assumed that C(0) = 0 and C(L') = L. Let D(t', t, l') denote the least cost from the triplet (0, 0, l'_0) up to (t', t, l') on the least cost path, and D(0, 0, l'_0) = d_C(0, 0, l'_0). Then, the optimal solution given by D(L', L, l(L')) can be recursively computed using dynamic programming as follows:
D(t', t, l(t')) = min_{l'} D(t', t, l')
l(t') = argmin_{l'} D(t', t, l')
where
D(t', t, l') = d_C(t', t, l') + min_{i, k'} { D(t' - 1, t - 1 - i, k') + d_B(t' - 1, t - 1 - i, l', k') }
Once D(L', L, l(L')) is computed, the least cost path is obtained by tracing back the path from D(L', L, l(L')) to D(0, 0, l(0)). The optimal C(t') and l(t') are given by the least cost path.
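A direct, unoptimised sketch of this three-dimensional recurrence is shown below; dC is assumed to be a precomputed (L' + 1) x (L + 1) x N array of candidate-to-reference differences, dB a caller-supplied smoothness cost function, and W an assumed skip window as in Stage 3. The full traceback of C(t') and l(t') is omitted for brevity.

```python
import numpy as np

def select_candidates(dC, dB, W=3):
    """Fill the (L'+1) x (L+1) x N cost table D and return it with the best final candidate."""
    Lp1, L1, N = dC.shape
    D = np.full((Lp1, L1, N), np.inf)
    D[0, 0, :] = dC[0, 0, :]                          # C(0) = 0 by assumption
    for tp in range(1, Lp1):
        for t in range(1, L1):
            lo = max(0, t - 1 - W)
            for l in range(N):
                best = np.inf
                for s in range(lo, t):                # previous reference frame t-1-i
                    for k in range(N):                # previous candidate k'
                        c = D[tp - 1, s, k] + dB(tp - 1, s, l, k)
                        best = min(best, c)
                D[tp, t, l] = dC[tp, t, l] + best
    l_end = int(np.argmin(D[Lp1 - 1, L1 - 1, :]))     # best candidate at (L', L)
    return D, l_end
```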
The method for Stage 6 in the example embodiment comprises computing the 3D difference between the selected 3D posture candidate B'_t',l(t') and the corresponding 3D reference posture B_C(t') at each time t'. The 3D difference can include 3D joint angle difference, 3D joint velocity difference, etc., depending on the specific coaching requirements of the sports.
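For example, a per-joint angular difference between a selected posture and the corresponding reference posture could be computed as in the sketch below, where both postures are assumed to be given as dictionaries mapping joint (or bone) names to 3D direction vectors; velocity differences would compare finite differences of such vectors across frames in the same way.

```python
import numpy as np

def joint_angle_differences(posture, reference):
    """Angle (degrees) between corresponding bone directions of two postures."""
    diffs = {}
    for joint, v in posture.items():
        r = reference[joint]
        cosang = np.dot(v, r) / (np.linalg.norm(v) * np.linalg.norm(r))
        diffs[joint] = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return diffs
```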
The method for Stage 7 in the example embodiment comprises displaying and highlighting the 3D difference in a display device. An example display of detailed 3D difference is illustrated in Figure 7. Figure 7 illustrates an example display of detailed 3D difference by overlapping the estimated performer's postures e.g. 700 (dark gray scale) with the corresponding expert's postures e.g. 702 (lighter gray scale) according to an example embodiment. The overlapping postures can be rotated in 3D to show different views (compare rows 704 and 706). The estimated performer's postures can also be overlapped with the input images (row 708) for visual verification of their correctness.
An example display of color-coded errors for quick assessment is illustrated in Figure 8. Figure 8 illustrates an example display of color-coded regions e.g. 800, 802 overlapped with an input image 804 for quick assessment according to an example embodiment. The darker gray scale regions e.g. 800 indicate large error, the lighter gray scale regions e.g. 802 indicate moderate error, and the transparent regions e.g. 806 indicate negligible or no error.
In another embodiment where the 3D reference motion contains multiple predefined motion segments, such as Taichi motion, the 2D input video is first segmented
into the corresponding performer's motion segments. The method of determining the corresponding performer's segment boundary for each reference segment boundary t comprises the following steps:
1. Determine an initial estimate of the performer's motion segment boundary t' by C(t') = t.
2. Obtain a temporal window [t' - ω, t' + ω], where ω is the window size.
3. Find one or more smooth sequences of posture candidates in the temporal window.
• Correct posture candidates should change smoothly over time. Suppose B'_τ,l' and B'_(τ+1),k' are correct posture candidates; then the 3D posture difference between them, d_B(B'_τ,l', B'_(τ+1),k'), which is measured as the mean difference between the orientations of the bones in the postures, is small for any τ ∈ [t' - ω, t' + ω].
• Choose a posture candidate for each τ ∈ [t' - ω, t' + ω] to obtain a sequence of posture candidates that satisfies the condition that d_B(B'_τ,l', B'_(τ+1),k') is small for each τ.
4. Find candidate segment boundaries.
• For each smooth sequence of posture candidates, find the candidate segment boundary τ ∈ [t' - ω, t' + ω] and the corresponding posture candidate at τ that satisfies the segment boundary condition: at a segment boundary, there are large changes of motion directions for some joints.
• Denote a candidate segment boundary found above as τ_k and the corresponding posture candidate as B'_k.
5. Identify the optimal segment boundary τ*.
The posture candidate at the optimal segment boundary τ* should be the most similar to the corresponding reference posture B_t. Therefore, τ* can be determined as follows:

τ* = τ_k*, where k* = argmin_k d_B(B'_k, B_t).
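A rough sketch of steps 2 to 5 of this boundary search is given below; the per-frame postures are assumed to be arrays of 3D joint positions, and the direction-change threshold is an assumed parameter rather than a value taken from the disclosure.

```python
import numpy as np

def find_segment_boundary(posture_seq, t0, omega, ref_posture, angle_thresh=60.0):
    """Refine the initial boundary estimate t0 within a window of half-width omega."""
    cands = []
    for tau in range(max(t0 - omega, 1), min(t0 + omega, len(posture_seq) - 1)):
        v_prev = posture_seq[tau] - posture_seq[tau - 1]       # per-joint motion vectors
        v_next = posture_seq[tau + 1] - posture_seq[tau]
        cos = np.sum(v_prev * v_next, axis=1) / (
            np.linalg.norm(v_prev, axis=1) * np.linalg.norm(v_next, axis=1) + 1e-9)
        angles = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
        if np.any(angles > angle_thresh):                      # large direction change
            cands.append(tau)
    if not cands:
        return t0
    # keep the candidate whose posture is most similar to the reference boundary posture
    errors = [np.linalg.norm(posture_seq[tau] - ref_posture) for tau in cands]
    return cands[int(np.argmin(errors))]
```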
In another example embodiment, the input body region is extracted with the help of colored markers.
In another example embodiment, appendages carried by the performer, e.g. a golf club, are also segmented.
In another example embodiment, the 3D reference motion of the expert is replaced by the 3D posture sequence of the performer computed from the input video acquired in a previous session.
In another example embodiment, the 3D reference motion of the expert is replaced by the 3D posture sequence of the performer computed from the input videos acquired in previous sessions that best matches the 3D reference motion of the expert.
In another example embodiment, the camera 900 and output device 902 are connected to a computer 904 through a computer network 906, as shown in Figure 9. The computer 904 is coupled to an external storage device 908 directly in this example.
In another example embodiment, a wireless input and output device 1000, such as a hand phone or Personal Digital Assistant equipped with a camera, is connected to a computer 1002 through a wireless network 1004, as shown in Figure 10. The computer 1002 is coupled to an external storage device 1006 directly in this example.
In another example embodiment, multiple cameras 1101-1103 are arranged along a straight line, as shown in Figure 11. Each camera acquires a portion of the performer's 1104 entire motion when the performer 1104 passes in front of the respective camera. This embodiment also allows the system to acquire high-resolution video of a user whose body motion spans a large arena.
In another example embodiment, multiple cameras 1201-1204 are placed around the performer 1206, as shown in Figure 12. This arrangement allows different cameras to capture the frontal view of the performer 1206 when he faces different cameras.
In another example embodiment, the arrangements of the cameras discussed above are combined.
In the multi-camera configurations in different example embodiments, for example those shown in Figures 11 and 12, the calibration method for the stage S2 processing, in addition to calibration of each of the individual cameras as described
above for the single camera embodiment, further comprises computing the relative positions and orientations between the cameras using an inter-relation algorithm between the cameras, as will be appreciated by a person skilled in the art. Such inter-relation algorithms are understood in the art, and will not be described in more detail herein. Reference is made for example to [R. Jain, R. Kasturi, and B. G. Schunck, Machine Vision, McGraw-Hill 1995] and [R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2000] for example algorithms for use in such an embodiment. The contents of those references are hereby incorporated by cross-reference.
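Purely as an illustration of one possible approach (not necessarily the algorithm of the cited texts), the sketch below recovers the relative rotation and translation direction between two calibrated cameras from matched image points using OpenCV's essential-matrix routines; pts1, pts2 and the shared intrinsic matrix K are assumed inputs.

```python
import cv2
import numpy as np

def relative_pose(pts1, pts2, K):
    """Relative pose of camera 2 with respect to camera 1 from matched points."""
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t   # rotation matrix and unit-norm translation direction
```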
Example embodiments of the method and system for human motion analysis can have the following framework of stages:
1. Input Video Segmentation
This stage segments the human body in each image frame of the input video. The human body, the arms, and the background are assumed to have different colors so that they can be separated. This assumption is reasonable and easily satisfied, for instance, for a user who wears a short-sleeved colored shirt and stands in front of a background of a different color. The background can be a natural scene which is nonuniform in color. This stage is achieved using a combination of background removal, a graph-cut algorithm and skin color detection. In case the background is uniform, the segmentation algorithm can be simplified.
2. Camera Calibration
This stage computes the camera's extrinsic parameters, assuming that its intrinsic parameters have already been pre-computed. This stage can be achieved using existing camera calibration algorithms.
3. Estimation of Approximate Temporal Correspondence
This stage estimates the approximate temporal correspondence between 3D reference motion and 2D input video. Dynamic Programming technique is used to estimate the temporal correspondence between the input video and the reference motion by matching the 2D projections of 3D postures in the reference motion with the segmented human body in the 2D input video. This stage also estimates the approximate global rotation and translation of the user's body relative to the 3D reference motion.
4. Estimation of Posture Candidates
This stage estimates, for each 2D input video frame, a set of 3D posture candidates that can produce 2D projections that are the same as that in the input video frame. This is performed using an improved version of the Belief Propagation method. In a single-camera system, these sets typically have more than one posture candidate each due to depth ambiguity and occlusion. In a multiple-camera system, the number of posture candidates may be reduced.
5. Selection of best posture candidates
This stage selects the best posture candidates that form smooth motion over time. It also refines the temporal correspondence estimated in Stage 3. This stage is accomplished using Dynamic Programming.
The framework of the example embodiments can be applied to analyze various types of motion by adopting appropriate 3D reference motion. It will be appreciated by a person skilled in the art that by adapting the system and method to handle specific application domains, these stages can be refined and optimized to reduce computational costs and improve efficiency.
Figure 13 shows a flow chart 1300 illustrating a method for human motion detection according to an example embodiment. At step 1302, one or more 2D input videos of the human motion are captured. At step 1304, sets of 2D body regions are extracted from respective frames of the 2D input videos. At step 1306, 3D human posture candidates are determined for each of the extracted sets of 2D body regions. At step 1308, a sequence of 3D human postures from the 3D human posture candidates for the respective frames is selected as representing the human motion in 3D.
The method and system of the example embodiment can be implemented on a computer system 1400, schematically shown in Figure 14. It may be implemented as software, such as a computer program being executed within the computer system 1400, and instructing the computer system 1400 to conduct the method of the example embodiment.
The computer system 1400 comprises a computer module 1402, input modules such as a keyboard 1404 and mouse 1406 and a plurality of output devices such as a display 1408, and printer 1410.
The computer module 1402 is connected to a computer network 1412 via a suitable transceiver device 1414, to enable access to e.g. the Internet or other network systems such as Local Area Network (LAN) or Wide Area Network (WAN).
The computer module 1402 in the example includes a processor 1418, a Random Access Memory (RAM) 1420 and a Read Only Memory (ROM) 1422. The computer module 1402 also includes a number of Input/Output (I/O) interfaces, for example I/O interface 1424 to the display 1408, and I/O interface 1426 to the keyboard 1404.
The components of the computer module 1402 typically communicate via an interconnected bus 1428 and in a manner known to the person skilled in the relevant art.
The application program is typically supplied to the user of the computer system 1400 encoded on a data storage medium such as a CD-ROM or flash memory carrier and read utilising a corresponding data storage medium drive of a data storage device 1430. The application program is read and controlled in its execution by the processor 1418. Intermediate storage of program data may be accomplished using RAM 1420.
It will be appreciated by a person skilled in the art that numerous variations and/or modifications may be made to the present invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects to be illustrative and not restrictive.
Claims
1. A method for human motion analysis, the method comprising the steps of: capturing one or more 2D input videos of the human motion; extracting sets of 2D body regions from respective frames of the 2D input videos; determining 3D human posture candidates for each of the extracted sets of 2D body regions; and selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
2. The method as claimed in claim 1, further comprising the step of determining differences between 3D reference data for said human motion and the selected sequence of 3D human postures.
3. The method as claimed in claim 2, further comprising the step of visualizing said differences to a user.
4. The method as claimed in any one of the preceding claims, wherein extracting the sets of 2D body regions comprises one or more of a group consisting of background subtraction, iterative graph-cut segmentation and skin detection.
5. The method as claimed in any one of the preceding claims, wherein determining the 3D human posture candidates comprises the steps of: generating a first 3D human posture candidate; and flipping a depth orientation of body parts represented in the first 3D human posture candidate around respective joints to generate further 3D human posture candidates from the first 3D human posture candidate.
6. The method as claimed in claim 5, wherein generating the first 3D human posture candidate comprises temporally aligning the extracted sets of 2D body portions from each frame with 3D reference data of the human motion and adjusting the 3D reference data to match the 2D body portions.
7. The method as claimed in any one of the preceding claims, wherein selecting the sequence of 3D human postures from the 3D human posture candidates is based on a least cost path among the 3D human posture candidates for the respective frames.
8. The method as claimed in claim 7, wherein selecting the sequence of 3D human postures from the 3D human posture candidates further comprises refining a temporal alignment of the extracted sets of 2D body portions from each frame with 3D reference data of the human motion.
9. A system for human motion analysis, the system comprising: means for capturing one or more 2D input videos of the human motion; means for extracting sets of 2D body regions from respective frames of the
2D input videos; means for determining 3D human posture candidates for each of the extracted sets of 2D body regions; and means for selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
10. The system as claimed in claim 9, further comprising means for determining differences between 3D reference data for said human motion and the selected sequence of 3D human postures.
11. The system as claimed in claim 10, further comprising means for visualizing said differences to a user.
12. The system as claimed in any one of claims 9 to 11, wherein the means for extracting the sets of 2D body regions performs one or more of a group consisting of background subtraction, iterative graph-cut segmentation and skin detection.
13. The system as claimed in any one of claims 9 to 12, wherein the means for determining the 3D human posture candidates generates a first 3D human posture candidate; and flips a depth orientation of body parts represented in the first 3D human posture candidate around respective joints to generate further 3D human posture candidates from the first 3D human posture candidate.
14. The system as claimed in claim 13, wherein generating the first 3D human posture candidate comprises temporally aligning the extracted sets of 2D body portions from each frame with 3D reference data of the human motion and adjusting the 3D reference data to match the 2D body portions.
15. The system as claimed in any one of claims 9 to 14, wherein the means for selecting the sequence of 3D human postures from the 3D human posture candidates determines a least cost path among the 3D human posture candidates for the respective frames.
16. The system as claimed in claim 15, wherein the means for selecting the sequence of 3D human postures from the 3D human posture candidates further comprises means for refining a temporal alignment of the extracted sets of 2D body portions from each frame with 3D reference data of the human motion.
17. A data storage medium having computer code means for instructing a computing device to execute a method for human motion detection, the method comprising the steps of: capturing one or more 2D input videos of the human motion; extracting sets of 2D body regions from respective frames of the 2D input videos; determining 3D human posture candidates for each of the extracted sets of 2D body regions; and selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US262707P | 2007-11-09 | 2007-11-09 | |
US61/002,627 | 2007-11-09 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2009061283A2 true WO2009061283A2 (en) | 2009-05-14 |
WO2009061283A3 WO2009061283A3 (en) | 2009-07-09 |
Family
ID=40626373
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/SG2008/000428 WO2009061283A2 (en) | 2007-11-09 | 2008-11-07 | Human motion analysis system and method |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2009061283A2 (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5111410A (en) * | 1989-06-23 | 1992-05-05 | Kabushiki Kaisha Oh-Yoh Keisoku Kenkyusho | Motion analyzing/advising system |
US5886788A (en) * | 1996-02-09 | 1999-03-23 | Sony Corporation | Apparatus and method for detecting a posture |
US6124862A (en) * | 1997-06-13 | 2000-09-26 | Anivision, Inc. | Method and apparatus for generating virtual views of sporting events |
US6256418B1 (en) * | 1998-04-13 | 2001-07-03 | Compaq Computer Corporation | Method and system for compressing a sequence of images including a moving figure |
WO2006117374A2 (en) * | 2005-05-03 | 2006-11-09 | France Telecom | Method for three-dimensionally reconstructing an articulated member or a set of articulated members |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20240168563A1 (en) * | 2009-01-29 | 2024-05-23 | Sony Group Corporation | Information processing device and method, program and recording medium for identifying a gesture of a person from captured image data |
US8944939B2 (en) | 2012-02-07 | 2015-02-03 | University of Pittsburgh—of the Commonwealth System of Higher Education | Inertial measurement of sports motion |
US9851374B2 (en) | 2012-02-07 | 2017-12-26 | University of Pittsburgh—of the Commonwealth System of Higher Education | Inertial measurement of sports motion |
US10398359B2 (en) | 2015-07-13 | 2019-09-03 | BioMetrix LLC | Movement analysis system, wearable movement tracking sensors, and associated methods |
CN105664462A (en) * | 2016-01-07 | 2016-06-15 | 北京邮电大学 | Auxiliary training system based on human body posture estimation algorithm |
CN109716354A (en) * | 2016-10-12 | 2019-05-03 | 英特尔公司 | The complexity of human interaction object identification reduces |
CN109716354B (en) * | 2016-10-12 | 2024-01-09 | 英特尔公司 | Complexity reduction for human interactive recognition |
US11638853B2 (en) | 2019-01-15 | 2023-05-02 | Live View Sports, Inc. | Augmented cognition methods and apparatus for contemporaneous feedback in psychomotor learning |
EP3911423A4 (en) * | 2019-01-15 | 2022-10-26 | Shane Yang | Augmented cognition methods and apparatus for contemporaneous feedback in psychomotor learning |
US11804076B2 (en) | 2019-10-02 | 2023-10-31 | University Of Iowa Research Foundation | System and method for the autonomous identification of physical abuse |
WO2021085453A1 (en) * | 2019-10-31 | 2021-05-06 | 株式会社ライゾマティクス | Recognition processing device, recognition processing program, recognition processing method, and visualizer system |
JP2021071953A (en) * | 2019-10-31 | 2021-05-06 | 株式会社ライゾマティクス | Recognition processor, recognition processing program, recognition processing method, and visualization system |
JP7281767B2 (en) | 2019-10-31 | 2023-05-26 | 株式会社アブストラクトエンジン | Recognition processing device, recognition processing program, recognition processing method, and visualization system |
US12067677B2 (en) | 2019-12-27 | 2024-08-20 | Sony Group Corporation | Information processing apparatus, information processing method, and computer-readable storage medium |
EP4083926A4 (en) * | 2019-12-27 | 2023-07-05 | Sony Group Corporation | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD AND INFORMATION PROCESSING PROGRAM |
EP3933669A1 (en) * | 2020-06-29 | 2022-01-05 | KS Electronics Co., Ltd. | Posture comparison and correction method using application configured to check two golf images and result data in overlapping state |
CN113926172A (en) * | 2020-06-29 | 2022-01-14 | 韩标电子 | Posture comparison and correction method using application program configured to check two golf images and result data in overlapped state |
GB2608576A (en) * | 2021-01-07 | 2023-01-11 | Wizhero Ltd | Exercise performance system |
CN112998693B (en) * | 2021-02-01 | 2023-06-20 | 上海联影医疗科技股份有限公司 | Head movement measuring method, device and equipment |
CN112998693A (en) * | 2021-02-01 | 2021-06-22 | 上海联影医疗科技股份有限公司 | Head movement measuring method, device and equipment |
CN114037729A (en) * | 2021-11-26 | 2022-02-11 | 天津天瞳威势电子科技有限公司 | Target tracking method, device and equipment and vehicle |
Also Published As
Publication number | Publication date |
---|---|
WO2009061283A3 (en) | 2009-07-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2009061283A2 (en) | Human motion analysis system and method | |
Memo et al. | Head-mounted gesture controlled interface for human-computer interaction | |
US9898651B2 (en) | Upper-body skeleton extraction from depth maps | |
US9235753B2 (en) | Extraction of skeletons from 3D maps | |
EP2707834B1 (en) | Silhouette-based pose estimation | |
US8755569B2 (en) | Methods for recognizing pose and action of articulated objects with collection of planes in motion | |
Van der Aa et al. | Umpm benchmark: A multi-person dataset with synchronized video and motion capture data for evaluation of articulated human motion and interaction | |
US20100208038A1 (en) | Method and system for gesture recognition | |
CN101158883A (en) | A virtual sports system based on computer vision and its implementation method | |
WO2014139079A1 (en) | A method and system for three-dimensional imaging | |
JP6515039B2 (en) | Program, apparatus and method for calculating a normal vector of a planar object to be reflected in a continuous captured image | |
CN111488775A (en) | Apparatus and method for determining gaze degree | |
JP2000251078A (en) | Method and device for estimating three-dimensional posture of person, and method and device for estimating position of elbow of person | |
Gurbuz et al. | Model free head pose estimation using stereovision | |
CN109448105A (en) | Three-dimensional human skeleton generation method and system based on more depth image sensors | |
CN106504283A (en) | Information broadcasting method, apparatus and system | |
Zou et al. | Automatic reconstruction of 3D human motion pose from uncalibrated monocular video sequences based on markerless human motion tracking | |
Hori et al. | Silhouette-based 3d human pose estimation using a single wrist-mounted 360 camera | |
US12249015B2 (en) | Joint rotation inferences based on inverse kinematics | |
He | Generation of human body models | |
Zhu et al. | Kinematic motion analysis with volumetric motion capture | |
US8948461B1 (en) | Method and system for estimating the three dimensional position of an object in a three dimensional physical space | |
El-Sallam et al. | Towards a Fully Automatic Markerless Motion Analysis System for the Estimation of Body Joint Kinematics with Application to Sport Analysis. | |
Marcialis et al. | A novel method for head pose estimation based on the “Vitruvian Man” | |
CN115205744A (en) | Intelligent exercise assisting method and device for figure skating |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 08847494 Country of ref document: EP Kind code of ref document: A2 |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 08847494 Country of ref document: EP Kind code of ref document: A2 |