US20230316552A1 - Repairing image depth values for an object with a light absorbing surface - Google Patents
Repairing image depth values for an object with a light absorbing surface
- Publication number
- US20230316552A1 (application US 17/713,038)
- Authority
- US
- United States
- Prior art keywords
- depth
- image
- dimensional
- color
- space
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/77—Retouching; Inpainting; Scratch removal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/86—Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/88—Lidar systems specially adapted for specific applications
- G01S17/89—Lidar systems specially adapted for specific applications for mapping or imaging
- G01S17/894—3D imaging with simultaneous measurement of time-of-flight at a 2D array of receiver pixels, e.g. time-of-flight cameras or flash lidar
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/20—Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/75—Determining position or orientation of objects or cameras using feature-based methods involving models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
- G06V20/647—Three-dimensional objects by matching two-dimensional images to three-dimensional objects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/45—Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from two or more image sensors being of different type or operating in different modes, e.g. with a CMOS sensor for moving images in combination with a charge-coupled device [CCD] for still images
-
- H04N5/2258—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/08—Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
- G06T2207/10021—Stereoscopic video; Stereoscopic image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/56—Particle system, point based geometry or rendering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2219/00—Indexing scheme for manipulating 3D models or images for computer graphics
- G06T2219/20—Indexing scheme for editing of 3D models
- G06T2219/2004—Aligning objects, relative positioning of parts
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2219/00—Indexing scheme for manipulating 3D models or images for computer graphics
- G06T2219/20—Indexing scheme for editing of 3D models
- G06T2219/2016—Rotation, translation, scaling
Definitions
- the image capture device 104 includes a time-of-flight (ToF) depth sensor 106 (e.g., a ToF depth sensor that emits an infra-red signal) and color sensor 108 (e.g., RGB sensor).
- the image capture device 104 is configured to capture a sequence of frames (e.g., image or video frames) that represent a real-world scene 110 that includes a physical object that is known to absorb the light emitted by the ToF depth sensor 106 .
- this object is a head-mounted display device 112 where the transparent visor prevents the light from being reflected. Consequently, the ToF depth sensor 106 is unable to accurately determine depth values for pixels associated with a region of the object that absorbs the light emitted by the ToF depth sensor 106 .
- the color sensor 108 is configured to generate a color image in a color space 114 and the ToF depth sensor 106 is configured to generate a depth image in a depth space 116 .
- the color image 114 of a person wearing a head-mounted display device 112 does not have any visible problems.
- the depth image 116 of the person wearing the head-mounted display device 112 includes dark holes around the user's eyes where the transparent visor is located. Ultimately, this causes an incomplete color-depth (e.g., RGB-D) image 118 to be reproduced that clearly has problems with respect to representing and/or reconstructing the head-mounted display device and/or the user's head and eyes.
- the image capture device 104 or an application that uses the images 114 , 116 captured by the image capture device 104 , is configured to provide the captured image data 120 to a repair module 122 .
- the captured image data 120 may include a sequence of frames that comprise a video (e.g., of a person wearing a head-mounted display device 112 ).
- the image capture device 104 is stationary.
- the image capture device 104 is moveable such that image frames can be captured from multiple different viewpoints within a physical environment.
- the repair module 122 is configured to use accessible three-dimensional model data 124 (e.g., a CAD mesh model) associated with the object (e.g., the head-mounted display device) to repair the depth image 126 so that it no longer includes the dark holes shown in the initially captured depth image 116 . Accordingly, a complete color-depth (e.g., RGB-D) image 128 can be reproduced that no longer has problems with respect to representing and/or reconstructing the head-mounted display device and/or the user's head and eyes.
- FIG. 2 illustrates further components and/or modules useable in the depth image repair system 102 .
- the depth image repair system 102 obtains the color image(s) in the color space 114 of the image capture device 104 and the depth image(s) in the depth space 116 of the image capture device 104 .
- a recognition and tracking module 202 is configured to detect (e.g., recognize) an object 112 in the color image 114 that is known to include a region (e.g., a surface) that absorbs the light emitted by the time-of-flight depth sensor 106 .
- the recognition and tracking module 202 is configured with a neural network 204 or another form of artificial intelligence which can detect any one of a plurality of objects known to cause the aforementioned problems in the depth image.
- a Deep Neural Network (DNN) model may be trained using thousands or even millions of color image frames that are each individually annotated to indicate the shape, position, and/or orientation of an object known to cause problems with respect to depth value determination.
- the recognition and tracking module 202 is configured to predict two-dimensional points 206 on the object in the color image. This prediction can be implemented via another neural network 207 .
- the two-dimensional points 206 correspond to three-dimensional points that are predefined in the accessible three-dimensional model of the object 124 .
- the three-dimensional points in the three-dimensional model of the object 124 are manually defined in advance and can be any points on the three-dimensional model of the object 124 . In some instances, they may be points associated with important/distinctive corners and edges of the region of the object 206 that absorbs the light emitted by the time-of-flight depth sensor.
- Example three-dimensional models may be readily generated by use of computer-aided design (CAD) software programs, and thus, the three-dimensional model data that defines the three-dimensional points 124 may be a three-dimensional CAD “mesh” model.
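- A landmark regressor of the kind attributed to neural network 207 can be sketched as below in PyTorch. The architecture, the number of keypoints, and the output convention are assumptions for illustration; the patent does not disclose the network actually used.

```python
import torch
import torch.nn as nn

class KeypointPredictor(nn.Module):
    """Illustrative landmark regressor standing in for neural network 207.
    The layer sizes and the number of keypoints are assumptions."""

    def __init__(self, num_keypoints: int = 8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2 * num_keypoints)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: N x 3 x H x W color crops of the detected object.
        features = self.backbone(x).flatten(1)          # N x 32
        # N x K x 2 predicted (x, y) locations; whether these are pixels or
        # normalized coordinates depends on how the network is trained.
        return self.head(features).view(x.shape[0], -1, 2)
```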
- the color image with the two-dimensional points 206 is then passed to an alignment module 208 configured to apply a prediction algorithm 210 to the color image to compute a three-dimensional pose of the object in the color space of the color image 212 .
- the prediction algorithm 210 computes the three-dimensional pose of the object in the color space of the color image 212 by positioning and/or rotating the three-dimensional model of the object 124 until the two-dimensional points on the object in the color image align with the corresponding three-dimensional points that are predefined in the three-dimensional model of the object 124 .
- the prediction algorithm 210 uses a six degrees of freedom (6DoF) approach to predict the alignment.
- a Perspective-n-Point (PnP) algorithm is configured to estimate the pose of the image capture device 104 relative to the captured scene, and this estimation can be extended to align the three-dimensional landmarks, defined via an accessible three-dimensional mesh model, with the two-dimensional landmarks on the object in the color image.
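- As a concrete illustration of this alignment step, OpenCV's solvePnP can recover a 6DoF object pose from such 2D-3D correspondences. The landmark coordinates and camera intrinsics below are assumed placeholder values, not data from the patent.

```python
import numpy as np
import cv2

# Three-dimensional landmarks predefined on the CAD mesh model, in the model's
# local frame (illustrative values only).
model_points_3d = np.array([
    [ 0.00,  0.05, 0.02], [ 0.06,  0.05, 0.02], [-0.06,  0.05, 0.02],
    [ 0.00, -0.03, 0.04], [ 0.04,  0.00, 0.05], [-0.04,  0.00, 0.05],
], dtype=np.float64)

# Corresponding two-dimensional landmarks predicted on the color image (pixels).
image_points_2d = np.array([
    [320.0, 210.0], [372.0, 208.0], [268.0, 212.0],
    [321.0, 260.0], [350.0, 240.0], [292.0, 241.0],
], dtype=np.float64)

# Intrinsics of the color sensor (assumed values; real ones come from calibration).
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist_coeffs = np.zeros(5)  # assume the color image is already undistorted

ok, rvec, tvec = cv2.solvePnP(model_points_3d, image_points_2d, K, dist_coeffs,
                              flags=cv2.SOLVEPNP_ITERATIVE)

# Express the result as a 4x4 pose of the object in the color coordinate space.
R, _ = cv2.Rodrigues(rvec)
pose_in_color_space = np.eye(4)
pose_in_color_space[:3, :3] = R
pose_in_color_space[:3, 3] = tvec.ravel()
```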
- a transformation module 214 applies a transform 216 between the color space of the color image and the depth space of the depth image to compute a three-dimensional pose of the object in the depth space of the depth image 218 .
- This transform 216 may be necessary in scenarios where the color and depth coordinate spaces do not have the same origin and the axes are not colinear due to camera/sensor differences.
- a depth determination module 220 can use the three-dimensional pose of the object in the depth space of the depth image 218 to repair depth values for pixels in the depth image 118 that are associated with the region of the object that absorbs the light emitted by the ToF depth sensor 106 .
- the depth determination module 220 can apply a rasterization algorithm 222 that is configured to determine the distance between the image capture device 104 (e.g., the ToF depth sensor 106 ) and a point (e.g., pixel) on the three-dimensional pose of the object in the depth space of the depth image 218 .
- the rasterization algorithm 222 projects the vertices that make up the mesh triangles onto a depth plane and fills the pixels that are covered by each triangle with a new depth value 224 .
- the depth determination module 220 can determine whether a new depth value 224 for a pixel associated with the three-dimensional pose of the object in the depth space of the depth image should replace a previous depth value initially captured and computed for the depth image 118 .
- the depth determination module 220 may be configured to replace a previous depth value for the pixel with the new depth value 224 if the previous depth value is corrupted (e.g., is completely missing or is greater than the new depth value 224 ).
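- The following is a simplified sketch of such a rasterization-and-replacement step, assuming the mesh has already been posed in the depth coordinate space, that the depth image is a float array in the same units as the mesh, and that a value of zero marks a missing depth. Function and parameter names are illustrative.

```python
import numpy as np

def rasterize_object_depth(vertices, triangles, K_depth, depth_image):
    """vertices: V x 3 mesh vertices posed in the depth coordinate space.
    triangles: T x 3 vertex indices. K_depth: 3 x 3 depth-sensor intrinsics.
    depth_image: float H x W array, 0 marking a missing value.
    Returns a repaired copy of the depth image."""
    h, w = depth_image.shape
    repaired = depth_image.copy()

    # Pinhole projection of every vertex into depth-image pixel coordinates.
    z = vertices[:, 2]
    u = K_depth[0, 0] * vertices[:, 0] / z + K_depth[0, 2]
    v = K_depth[1, 1] * vertices[:, 1] / z + K_depth[1, 2]
    pts = np.stack([u, v], axis=1)

    for tri in triangles:
        p, zs = pts[tri], z[tri]
        if np.any(zs <= 0):
            continue                       # skip geometry behind the sensor
        x0, y0 = np.maximum(np.floor(p.min(axis=0)).astype(int), 0)
        x1 = min(int(np.ceil(p[:, 0].max())), w - 1)
        y1 = min(int(np.ceil(p[:, 1].max())), h - 1)
        # Barycentric coordinates over the triangle's pixel bounding box.
        T = np.array([[p[0, 0] - p[2, 0], p[1, 0] - p[2, 0]],
                      [p[0, 1] - p[2, 1], p[1, 1] - p[2, 1]]])
        if abs(np.linalg.det(T)) < 1e-9:
            continue                       # degenerate triangle
        Tinv = np.linalg.inv(T)
        for py in range(y0, y1 + 1):
            for px in range(x0, x1 + 1):
                l0, l1 = Tinv @ np.array([px - p[2, 0], py - p[2, 1]])
                l2 = 1.0 - l0 - l1
                if l0 < 0 or l1 < 0 or l2 < 0:
                    continue               # pixel lies outside the triangle
                new_depth = l0 * zs[0] + l1 * zs[1] + l2 * zs[2]
                old = repaired[py, px]
                # Replace only missing or farther-than-the-mesh values.
                if old == 0 or old > new_depth:
                    repaired[py, px] = new_depth
    return repaired
```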
- the recognition and tracking module 202 is configured to track the object in subsequent color image frames of the video.
- different neural networks can be used to first detect the object and then to track the object. Tests have shown that a first neural network for detection takes about nine milliseconds per frame to repair the depth values and that a second neural network for tracking (once the object is already detected) takes about three milliseconds per frame to repair depth values.
- FIG. 3 illustrates how a bounding box 302 can be used to focus the recognition and tracking module 202 on an area of a color image frame in which an object is located and/or is likely to move from one color image frame to the next in a sequence of color image frames (e.g., a video).
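- A minimal sketch of the cropping that such a bounding box enables is shown below; the (x, y, width, height) convention and the margin value are assumptions.

```python
import numpy as np

def crop_to_box(color_frame: np.ndarray, box, margin: float = 0.2):
    """Crop a color frame to an expanded bounding box so the tracking network
    only processes the area where the object was found in the previous frame.
    box is (x, y, width, height) in pixels; margin allows for object motion."""
    h, w = color_frame.shape[:2]
    x, y, bw, bh = box
    dx, dy = int(bw * margin), int(bh * margin)
    x0, y0 = max(x - dx, 0), max(y - dy, 0)
    x1, y1 = min(x + bw + dx, w), min(y + bh + dy, h)
    return color_frame[y0:y1, x0:x1], (x0, y0)   # crop and its pixel offset
```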
- FIG. 4 illustrates a transformation function 402 that is used to convert between a color coordinate space 404 and a depth coordinate space 406 so that the depth image can be repaired.
- the transformation function 402 comprises a 4×4 matrix multiplication as follows:
- pose_in_color_space = pose_in_depth_space * depth_to_color_transform
- the depth_to_color_transform is a 4 ⁇ 4 rigid transform that can be derived or retrieved from calibration information (e.g., calibration functions) associated with the image capture device 104 (e.g., the calibration information may be baked into the image capture device 104 when out-of-factory).
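- Taking the relation above literally, the pose can be moved from the color space back to the depth space with one 4×4 inversion and multiplication, as sketched below; the actual composition convention is fixed by the device calibration.

```python
import numpy as np

def pose_color_to_depth(pose_in_color_space: np.ndarray,
                        depth_to_color_transform: np.ndarray) -> np.ndarray:
    """Invert the relation
        pose_in_color_space = pose_in_depth_space @ depth_to_color_transform
    to recover the object pose in the depth coordinate space. Both inputs are
    4x4 matrices; depth_to_color_transform comes from device calibration."""
    return pose_in_color_space @ np.linalg.inv(depth_to_color_transform)
```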
- FIG. 5 illustrates an example environment 500 in which the depth image repair system 102 can be accessed by any one of multiple different applications 502 via an application programming interface 504 .
- the application 502 may be configured to obtain color and depth image frames 506 of a real-world scene 508 from an image capture device 104 .
- these image frames 506 may include corrupted depth data 510 (e.g., missing or incorrect depth values for pixels) due to an object in the scene that absorbs light emitted by a ToF depth sensor 106 .
- the application 502 calls on the depth image repair system 102 and submits the image frames 506 with the corrupted depth data 510 via the application programming interface 504 .
- the depth image repair system 102 may store or have access to a large number of neural networks 512 and three-dimensional models 514 of objects that are known to absorb the light emitted by a ToF depth sensor 106 .
- the depth image repair system 102 is configured to repair the depth data, as discussed above with respect to FIGS. 1 - 4 , and return the repaired depth data 516 (e.g., the complete RGB-D images) to the application 502 .
- the application 502 is a teleportation application that teleports and reconstructs a user 518 wearing a head-mounted display device in a different real-world scene 520 compared to the scene 508 where the user is actually located.
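- A client-side sketch of this exchange is shown below. The Frame fields and the repair_call entry point are hypothetical stand-ins; the patent does not name the actual API surface.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence
import numpy as np

@dataclass
class Frame:
    color: np.ndarray   # H x W x 3 color image
    depth: np.ndarray   # H x W depth image; 0 marks a missing/corrupted value

def repair_frames(frames: Sequence[Frame],
                  repair_call: Callable[[Frame], Frame]) -> List[Frame]:
    """Submit captured frames to the depth image repair system through its
    application programming interface and collect the repaired frames."""
    return [repair_call(frame) for frame in frames]
```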
- a process 600 is described that facilitates repairing a depth image using a corresponding color image and an accessible three-dimensional model of an object that causes corrupt or missing depth values to exist in the depth image. It should be understood that the operations of the methods disclosed herein are not presented in any particular order and that performance of some or all of the operations in an alternative order(s) is possible and is contemplated. The operations have been presented in the demonstrated order for ease of description and illustration. Operations may be added, omitted, and/or performed simultaneously, without departing from the scope of the appended claims.
- Computer-readable instructions, and variants thereof, as used in the description and claims, are used expansively herein to include routines, applications, application modules, program modules, programs, components, data structures, algorithms, and the like.
- Computer-readable instructions can be implemented on various system configurations, including single-processor or multiprocessor systems, minicomputers, mainframe computers, personal computers, hand-held computing devices, microprocessor-based, programmable consumer electronics, combinations thereof, and the like.
- image data comprised of a color image and a depth image is obtained.
- the color image and the depth image are captured by an image capture device configured with a color sensor and a time-of-flight depth sensor.
- an object in the color image that is known to include a region that absorbs light emitted by the time-of-flight depth sensor is detected.
- two-dimensional points on the object in the color image that correspond to three-dimensional points predefined in an accessible three-dimensional model of the object are predicted.
- a prediction algorithm to compute a three-dimensional pose of the object in a color space of the color image is applied.
- the prediction algorithm computes the three-dimensional pose of the object in the color space of the color image by at least one of positioning or rotating the three-dimensional model of the object until the two-dimensional points on the color image align with corresponding three-dimensional points that are predefined in the three-dimensional model of the object.
- a transform between the color space and the depth space is applied to the three-dimensional pose of the object in the color space of the color image to compute a three-dimensional pose of the object in the depth space of the depth image.
- depth values for pixels in the depth image that are associated with the region of the object that absorbs the light emitted by the time-of-flight depth sensor are repaired using the three-dimensional pose of the object in the depth space of the depth image.
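- Taken together, the operations of process 600 can be sketched as a single function in which every callable argument is a hypothetical stand-in for a component described above (the detection network, the keypoint network, the PnP solver, and the rasterization step).

```python
import numpy as np

def repair_depth_frame(color, depth, detect_object, predict_points,
                       solve_pose, depth_to_color_transform, rasterize):
    """Sketch of process 600: detect the object, predict its two-dimensional
    points, compute its pose in the color space, transform the pose into the
    depth space, and repair the depth values. All names are illustrative."""
    box = detect_object(color)
    if box is None:
        return depth                                  # nothing to repair
    points_2d, model_points_3d = predict_points(color, box)
    pose_in_color_space = solve_pose(points_2d, model_points_3d)   # 4 x 4
    pose_in_depth_space = pose_in_color_space @ np.linalg.inv(
        depth_to_color_transform)
    return rasterize(pose_in_depth_space, depth)
```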
- FIG. 7 shows additional details of an example computer architecture 700 for a computer, such as a server and/or server cluster, capable of executing the program components described herein.
- the computer architecture 700 illustrated in FIG. 7 is an architecture for a server computer, a mobile phone, a PDA, a smart phone, a desktop computer, a netbook computer, a tablet computer, and/or a laptop computer.
- the computer architecture 700 may be utilized to execute any aspects of the software components presented herein.
- the computer architecture 700 illustrated in FIG. 7 includes a central processing unit 702 (“CPU”), a system memory 704 , including a random-access memory 706 (“RAM”) and a read-only memory (“ROM”) 708 , and a system bus 710 that couples the memory 704 to the CPU 702 .
- the computer architecture 700 further includes a mass storage device 712 for storing an operating system 707 , other data, and one or more applications.
- the mass storage device 712 can also store computer-executable instructions for implementing the depth image repair system 102 .
- the mass storage device 712 is connected to the CPU 702 through a mass storage controller connected to the bus 710 .
- the mass storage device 712 and its associated computer-readable media provide non-volatile storage for the computer architecture 700 .
- computer-readable media can be any available computer storage media or communication media that can be accessed by the computer architecture 700 .
- Communication media includes computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any delivery media.
- modulated data signal means a signal that has one or more of its characteristics changed or set in a manner so as to encode information in the signal.
- communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
- computer storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data.
- computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid-state memory technology, CD-ROM, digital versatile disks (“DVD”), HD-DVD, BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer architecture 700 .
- computer storage medium does not include waves, signals, and/or other transitory and/or intangible communication media, per se.
- the computer architecture 700 may operate in a networked environment using logical connections to remote computers through the network 756 and/or another network.
- the computer architecture 700 may connect to the network 756 through a network interface unit 714 connected to the bus 710 . It should be appreciated that the network interface unit 714 also may be utilized to connect to other types of networks and remote computer systems.
- the computer architecture 700 also may include an input/output controller 716 for receiving and processing input from a number of other devices, including a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 716 may provide output to a display screen, a printer, or other type of output device.
- the software components described herein may, when loaded into the CPU 702 and executed, transform the CPU 702 and the overall computer architecture 700 from a general-purpose computing system into a special-purpose computing system customized to facilitate the functionality presented herein.
- the CPU 702 may be constructed from any number of transistors or other discrete circuit elements, which may individually or collectively assume any number of states. More specifically, the CPU 702 may operate as a finite-state machine, in response to executable instructions contained within the software modules disclosed herein. These computer-executable instructions may transform the CPU 702 by specifying how the CPU 702 transitions between states, thereby transforming the transistors or other discrete hardware elements constituting the CPU 702 .
- Encoding the software modules presented herein also may transform the physical structure of the computer-readable media presented herein.
- the specific transformation of physical structure may depend on various factors, in different implementations of this description. Examples of such factors may include, but are not limited to, the technology used to implement the computer-readable media, whether the computer-readable media is characterized as primary or secondary storage, and the like.
- the computer-readable media is implemented as semiconductor-based memory
- the software disclosed herein may be encoded on the computer-readable media by transforming the physical state of the semiconductor memory.
- the software may transform the state of transistors, capacitors, or other discrete circuit elements constituting the semiconductor memory.
- the software also may transform the physical state of such components in order to store data thereupon.
- the computer-readable media disclosed herein may be implemented using magnetic or optical technology.
- the software presented herein may transform the physical state of magnetic or optical media, when the software is encoded therein. These transformations may include altering the magnetic characteristics of particular locations within given magnetic media. These transformations also may include altering the physical features or characteristics of particular locations within given optical media, to change the optical characteristics of those locations. Other transformations of physical media are possible without departing from the scope and spirit of the present description, with the foregoing examples provided only to facilitate this discussion.
- the computer architecture 700 may include other types of computing devices, including hand-held computers, embedded computer systems, personal digital assistants, and other types of computing devices known to those skilled in the art. It is also contemplated that the computer architecture 700 may not include all of the components shown in FIG. 7 , may include other components that are not explicitly shown in FIG. 7 , or may utilize an architecture completely different than that shown in FIG. 7 .
- Example Clause A a method comprising: obtaining image data comprised of a color image and a depth image captured by an image capture device configured with a color sensor and a time-of-flight depth sensor; detecting, using a first neural network, an object in the color image that is known to include a region that absorbs light emitted by the time-of-flight depth sensor; accessing a three-dimensional model of the object; in response to the detected object, predicting, using a second neural network, two-dimensional points on the color image that correspond to three-dimensional points that are predefined in the three-dimensional object model; applying a prediction algorithm to compute a three-dimensional pose of the object in a color space of the color image, wherein application of the prediction algorithm computes the three-dimensional pose of the object in the color space of the color image by at least one of positioning or rotating the three-dimensional model of the object until the two-dimensional points on the color image align with the corresponding three-dimensional points that are predefined in the three-dimensional model of the object; applying, to the three-
- Example Clause B the method of Example Clause A, wherein repairing the depth values associated with the region of the object that absorbs the light emitted by the time-of-flight depth sensor comprises: applying a rasterization algorithm to determine a new depth value for a pixel associated with the three-dimensional pose of the object in the depth space of the depth image; and replacing a previous depth value for the pixel with the new depth value if the previous depth value is missing or is greater than the new depth value.
- Example Clause C the method of Example Clause A or Example Clause B, wherein the color image and the depth image are configured to generate an RGB-D image.
- Example Clause D the method of any one of Example Clauses A through C, wherein the prediction algorithm comprises a perspective-n-point algorithm.
- Example Clause E the method of any one of Example Clauses A through D, further comprising using the first neural network to configure a bounding box to track movement of the object in a scene.
- Example Clause F the method of any one of Example Clauses A through E, wherein the transform between the color space and the depth space comprises a four-by-four matrix multiplication rigid transform.
- Example Clause G the method of Example Clause F, wherein the four-by-four matrix multiplication rigid transform is defined via a calibration function defined for the color sensor and the time-of-flight depth sensor.
- Example Clause H the method of any one of Example Clauses A through G, wherein: the image frame is obtained from an application via an application programming interface as part of a sequence of image frames; repairing the depth values associated with the region of the object that absorbs the infra-red signal emitted by the time-of-flight depth sensor enables a corrected RGB-D image to be produced; the method further comprises providing the corrected RGB-D image to the application.
- Example Clause I a system comprising: one or more processing units; and computer storage media storing instructions that, when executed by the one or more processing units, cause the system to perform operations comprising: obtaining image data comprised of a color image and a depth image captured by an image capture device configured with a color sensor and a time-of-flight depth sensor; detecting, using a first neural network, an object in the color image that is known to include a region that absorbs light emitted by the time-of-flight depth sensor; accessing a three-dimensional model of the object; in response to the detected object, predicting, using a second neural network, two-dimensional points on the color image that correspond to three-dimensional points that are predefined in the three-dimensional object model; applying a prediction algorithm to compute a three-dimensional pose of the object in a color space of the color image, wherein application of the prediction algorithm computes the three-dimensional pose of the object in the color space of the color image by at least one of positioning or rotating the three-dimensional model of the object until the two-dimensional
- Example Clause J the system of Example Clause I, wherein repairing the depth values associated with the region of the object that absorbs the light emitted by the time-of-flight depth sensor comprises: applying a rasterization algorithm to determine a new depth value for a pixel associated with the three-dimensional pose of the object in the depth space of the depth image; and replacing a previous depth value for the pixel with the new depth value if the previous depth value is missing or is greater than the new depth value.
- Example Clause K the system of Example Clause I or Example Clause J, wherein the color image and the depth image are configured to generate an RGB-D image.
- Example Clause L the system of any one of Example Clauses I through K, wherein the prediction algorithm comprises a perspective-n-point algorithm.
- Example Clause M the system of any one of Example Clauses I through L, wherein the operations further comprise using the first neural network to configure a bounding box to track movement of the object in a scene.
- Example Clause N the system of any one of Example Clauses I through M, wherein the transform between the color space and the depth space comprises a four-by-four matrix multiplication rigid transform.
- Example Clause O the system of Example Clause N, wherein the four-by-four matrix multiplication rigid transform is defined via a calibration function defined for the color sensor and the time-of-flight depth sensor.
- Example Clause P the system of any one of Example Clauses I through O, wherein: the image frame is obtained from an application via an application programming interface as part of a sequence of image frames; repairing the depth values associated with the region of the object that absorbs the infra-red signal emitted by the time-of-flight depth sensor enables a corrected RGB-D image to be produced; the operations further comprise providing the corrected RGB-D image to the application.
- Example Clause Q computer storage media storing instructions that, when executed by one or more processing units, cause a system to perform operations comprising: obtaining image data comprised of a color image and a depth image captured by an image capture device configured with a color sensor and a time-of-flight depth sensor; detecting, using a first neural network, an object in the color image that is known to include a region that absorbs light emitted by the time-of-flight depth sensor; accessing a three-dimensional model of the object; in response to the detected object, predicting, using a second neural network, two-dimensional points on the color image that correspond to three-dimensional points that are predefined in the three-dimensional object model; applying a prediction algorithm to compute a three-dimensional pose of the object in a color space of the color image, wherein application of the prediction algorithm computes the three-dimensional pose of the object in the color space of the color image by at least one of positioning or rotating the three-dimensional model of the object until the two-dimensional points on the color image align with the corresponding three-dimensional
- Example Clause R the computer storage media of Example Clause Q, wherein repairing the depth values associated with the region of the object that absorbs the light emitted by the time-of-flight depth sensor comprises: applying a rasterization algorithm to determine a new depth value for a pixel associated with the three-dimensional pose of the object in the depth space of the depth image; and replacing a previous depth value for the pixel with the new depth value if the previous depth value is missing or is greater than the new depth value.
- Example Clause S the computer storage media of Example Clause Q or Example Clause R, wherein the prediction algorithm comprises a perspective-n-point algorithm.
- Example Clause T the computer storage media of any one of Example Clauses Q through S, wherein the operations further comprise using the first neural network to configure a bounding box to track movement of the object in a scene.
- any reference to “first,” “second,” etc. elements within the Summary and/or Detailed Description is not intended to and should not be construed to necessarily correspond to any reference of “first,” “second,” etc. elements of the claims. Rather, any use of “first” and “second” within the Summary, Detailed Description, and/or claims may be used to distinguish between two different instances of the same element (e.g., two different images).
Abstract
Description
- Applications may use a depth image to display or reconstruct a three-dimensional environment. Some image capture devices use infra-red (IR) technology or other light-based technology to determine depth in a scene and create a depth image (e.g., a depth map). For example, a camera may use a time-of-flight depth sensor (e.g., an array of time-of-flight pixels) to illuminate a scene with light (e.g., an IR pattern) emitted from an artificial light source and to detect light that is reflected. The phase shift between the emitted light and the reflected light is measured and depth information for various pixels in a depth image can be determined based on the phase shift.
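- For context, a continuous-wave time-of-flight sensor converts the measured phase shift into distance roughly as follows; this is the textbook relation, not a description of any particular device.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def depth_from_phase(phase_shift_rad: float, modulation_frequency_hz: float) -> float:
    """Distance implied by the phase shift between emitted and reflected light:
    depth = c * phase_shift / (4 * pi * f_mod)."""
    return C * phase_shift_rad / (4.0 * math.pi * modulation_frequency_hz)

# Example: a phase shift of pi/2 at an 80 MHz modulation frequency is ~0.47 m.
print(depth_from_phase(math.pi / 2, 80e6))
```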
- Unfortunately, the time-of-flight depth sensor may experience issues with respect to accurately determining depth information for a scene. For instance, an object in the scene may include a surface that is made of material that absorbs the emitted light (e.g., the IR pattern) so the time-of-flight depth sensor cannot clearly detect, or see, the light that is reflected. This lack of detection and/or visibility translates to missing or corrupted depth values in the depth image. In one example, a head-mounted device (e.g., augmented reality device, mixed reality device, etc.) includes a transparent visor that is composed of material that absorbs the emitted light. Consequently, the time-of-flight depth sensor is unable to accurately determine the depth values for the pixels that are associated with the transparent visor, and therefore, the resulting depth image includes dark holes on and around the user's eyes.
- It is with respect to these and other considerations that the disclosure made herein is presented.
- The techniques disclosed herein enable a system to detect and track the three-dimensional pose of an object (e.g., a head-mounted display device) in a color image using an accessible three-dimensional model of the object. The system uses the three-dimensional pose of the object to repair pixel depth values associated with a region (e.g., a surface) of the object that is composed of material that absorbs light emitted by a time-of-flight depth sensor to determine depth. Consequently, a color-depth image can be produced that does not include dark holes on and around the region of the object that is composed of material that absorbs light emitted by the time-of-flight depth sensor.
- The system is configured to obtain image data for a scene that was captured by an image capture device (e.g., a camera). The image data may include a sequence of frames that comprise a video (e.g., of a user wearing a head-mounted display device). The image capture device includes a color (e.g., Red-Green-Blue or RGB) sensor and a time-of-flight depth sensor, and thus, each frame includes a color image in a color coordinate space and a corresponding depth image in a depth coordinate space. In various examples, both coordinate spaces are right-handed coordinate systems (e.g., X, Y, Z) with Z pointed out (e.g., towards a camera lens) and Y pointed up, but the coordinate spaces do not have the same origin and the axes are not collinear due to camera/sensor differences.
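- One frame of such image data might be represented as sketched below; the field names are illustrative, and the calibration entries anticipate the depth-to-color transform discussed later.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CapturedFrame:
    """One captured frame: a color image in the color coordinate space and a
    corresponding depth image in the depth coordinate space."""
    color: np.ndarray             # H x W x 3 RGB image
    depth: np.ndarray             # H x W depth image (0 can mark missing values)
    color_intrinsics: np.ndarray  # 3 x 3 camera matrix of the color sensor
    depth_intrinsics: np.ndarray  # 3 x 3 camera matrix of the ToF depth sensor
    depth_to_color: np.ndarray    # 4 x 4 rigid transform between the two spaces
```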
- As described above, if the scene includes an object that has a region composed of material that absorbs emitted light (e.g., the IR pattern), then the time-of-flight depth sensor cannot clearly detect, or see, the light that is reflected. This lack of detection and/or visibility translates to missing or corrupted depth values in the depth image. That is, the depth image is likely to include dark holes on and around the region that absorbs the light emitted by the time-of-flight depth sensor.
- To resolve the depth issues, the system is configured to detect (e.g., recognize) an object in the color image that is known to include a region (e.g., a surface) that absorbs the light emitted by the time-of-flight depth sensor. Once detected, the system predicts a set of two-dimensional points on the object in the color image that correspond to three-dimensional points that are predefined in an accessible three-dimensional model of the object. For instance, the three-dimensional points on the three-dimensional model of the object (which may alternatively be referred to as key points or landmarks) are manually defined in advance and can be any points on the three-dimensional model. In some instances, they may be points associated with important/distinctive corners and edges of the region of the object that absorbs the light emitted by the time-of-flight depth sensor.
- A first neural network or other form of artificial intelligence can be used to detect the object. For example, a Deep Neural Network (DNN) model may be trained using thousands or even millions of color image frames that are each individually annotated to indicate the shape, position, and/or orientation of an object known to cause problems with respect to depth value determination. A second neural network or other form of artificial intelligence can be used to predict the two-dimensional points. Example three-dimensional models may be readily generated by use of computer-aided design (CAD) software programs, and thus, the three-dimensional model may be a three-dimensional CAD “mesh” model.
- Next, the system is configured to apply a prediction algorithm to the color image to compute a three-dimensional pose of the object in the color space. The prediction algorithm computes the three-dimensional pose of the object in the color space by positioning and/or rotating the three-dimensional model of the object until the two-dimensional points on the color image align with the corresponding three-dimensional points that are predefined in the three-dimensional model of the object. In various examples, the prediction algorithm uses a six degrees of freedom (6DoF) approach to predict the alignment. For instance, a Perspective-n-Point (PnP) algorithm is configured to estimate the pose of the image capture device, and this estimation can be extended to align the three-dimensional landmarks, defined via an accessible three-dimensional mesh model, with the two-dimensional landmarks on the color image.
- Now that the system has predicted the three-dimensional pose of the object in the color image, the system applies a transform between the color space of the color image and the depth space of the depth image to compute a three-dimensional pose of the object in the depth space of the depth image. This transform may be necessary in scenarios where the color and depth coordinate spaces do not have the same origin and the axes are not colinear due to camera/sensor differences. The system can then use the three-dimensional pose of the object in the depth space of the depth image to repair depth values for pixels in the depth image that are associated with the region of the object that absorbs the light emitted by the time-of-flight depth sensor.
- In various examples, the color image and the repaired depth image enable an RGB-Depth (RGB-D) image to be produced. Moreover, once the object has been detected in a first color image frame of a video, the system is configured to track the object in subsequent color image frames of the video. In various examples, different neural networks can be used to first detect the object and then to track the object. Tests have shown that a first neural network for detection takes about nine milliseconds per frame to repair the depth values and that a second neural network for tracking (once the object is already detected) takes about three milliseconds per frame to repair depth values.
- This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. The term “techniques,” for instance, may refer to system(s), method(s), computer-readable instructions, module(s), algorithms, hardware logic, and/or operation(s) as permitted by the context described above and throughout the document.
- The Detailed Description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same reference numbers in different figures indicate similar or identical items. References made to individual items of a plurality of items can use a reference number with a letter of a sequence of letters to refer to each individual item. Generic references to the items may use the specific reference number without the sequence of letters.
-
FIG. 1 illustrates an example environment in which a depth image repair system can repair a depth image using a corresponding color image and an accessible three-dimensional model of an object that causes corrupt or missing depth values to exist in the depth image. -
FIG. 2 illustrates further components and/or modules useable in the depth image repair system. -
FIG. 3 illustrates how a bounding box can be used to focus a recognition and tracking module on an area of a color image frame in which an object is located and/or is likely to move from one color image frame to the next in a sequence of color image frames (e.g., a video). -
FIG. 4 illustrates a transform that is used to convert between a color coordinate space and a depth coordinate space so that the depth image can be repaired. -
FIG. 5 illustrates an example environment in which the depth image repair system can be accessed by any one of multiple different applications via an application programming interface. -
FIG. 6 illustrates an example process that repairs a depth image using a corresponding color image and an accessible three-dimensional model of an object that causes corrupt or missing depth values to exist in the depth image. -
FIG. 7 shows additional details of an example computer architecture for a computer, such as a server and/or server cluster, capable of executing the program components described herein. - The following Detailed Description discloses an image repair system that is configured to detect and track the three-dimensional pose of an object (e.g., a head-mounted display device) in a color image using an accessible three-dimensional model of the object. The system uses the three-dimensional pose of the object to repair pixel depth values associated with a region (e.g., a surface) of the object that is composed of material that absorbs light emitted by a time-of-flight depth sensor to determine depth. Consequently, a color-depth image (e.g., a Red-Green-Blue-Depth image or RGB-D image) can be produced that does not include dark holes on and around the region of the object that is composed of material that absorbs light emitted by the time-of-flight depth sensor. Various examples, scenarios, and aspects of the disclosed techniques are described below with reference to
FIGS. 1-7.
- FIG. 1 illustrates an example environment 100 in which a depth image repair system 102 can repair a depth image using a corresponding color image and an accessible three-dimensional model of an object that causes corrupt or missing depth values to exist in the depth image. The depth image repair system 102 includes an image capture device 104, or is in some way connected (e.g., via a network connection) to an image capture device 104.
- The image capture device 104 includes a time-of-flight (ToF) depth sensor 106 (e.g., a ToF depth sensor that emits an infra-red signal) and color sensor 108 (e.g., RGB sensor). The image capture device 104 is configured to capture a sequence of frames (e.g., image or video frames) that represent a real-world scene 110 that includes a physical object that is known to absorb the light emitted by the ToF depth sensor 106. In one example, this object is a head-mounted display device 112 where the transparent visor prevents the light from being reflected. Consequently, the ToF depth sensor 106 is unable to accurately determine depth values for pixels associated with a region of the object that absorbs the light emitted by the ToF depth sensor 106.
- To this end, the color sensor 108 is configured to generate a color image in a color space 114 and the ToF depth sensor 106 is configured to generate a depth image in a depth space 116. As shown in FIG. 1, the color image 114 of a person wearing a head-mounted display device 112 does not have any visible problems. However, the depth image 116 of the person wearing the head-mounted display device 112 includes dark holes around the user's eyes where the transparent visor is located. Ultimately, this causes an incomplete color-depth (e.g., RGB-D) image 118 to be reproduced that clearly has problems with respect to representing and/or reconstructing the head-mounted display device and/or the user's head and eyes.
- To resolve this problem, the image capture device 104, or an application that uses the images captured by the image capture device 104, is configured to provide the captured image data 120 to a repair module 122. As described above, the captured image data 120 may include a sequence of frames that comprise a video (e.g., of a person wearing a head-mounted display device 112). In one example, the image capture device 104 is stationary. However, in another example, the image capture device 104 is moveable such that image frames can be captured from multiple different viewpoints within a physical environment.
- The repair module 122 is configured to use accessible three-dimensional model data 124 (e.g., a CAD mesh model) associated with the object (e.g., the head-mounted display device) to repair the depth image 126 so that it no longer includes the dark holes shown in the initially captured depth image 116. Accordingly, a complete color-depth (e.g., RGB-D) image 128 can be reproduced that no longer has problems with respect to representing and/or reconstructing the head-mounted display device and/or the user's head and eyes.
- FIG. 2 illustrates further components and/or modules useable in the depth image repair system 102. As shown, the depth image repair system 102 obtains the color image(s) in the color space 114 of the image capture device 104 and the depth image(s) in the depth space 116 of the image capture device 104.
- A recognition and tracking module 202 is configured to detect (e.g., recognize) an object 112 in the color image 114 that is known to include a region (e.g., a surface) that absorbs the light emitted by the time-of-flight depth sensor 106. In one example, the recognition and tracking module 202 is configured with a neural network 204 or another form of artificial intelligence which can detect any one of a plurality of objects known to cause the aforementioned problems in the depth image. For example, a Deep Neural Network (DNN) model may be trained using thousands or even millions of color image frames that are each individually annotated to indicate the shape, position, and/or orientation of an object known to cause problems with respect to depth value determination.
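- The disclosure does not prescribe a particular detector architecture. As a non-authoritative illustration only, the sketch below shows how a generic, pretrained two-stage detector (here torchvision's Faster R-CNN, fine-tuned on annotated frames) could play the role of the neural network 204 and return a bounding box for the light-absorbing object; the class index and score threshold are assumptions of this sketch, not values from the disclosure.

```python
import torch
import torchvision

# Illustrative stand-in for the "first neural network" (204). In practice the
# detector would be fine-tuned on color frames annotated with the shape and
# position of the light-absorbing object (e.g., a head-mounted display).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

HEADSET_CLASS_ID = 1  # assumption: label index assigned during fine-tuning


def detect_object(color_image, score_threshold=0.5):
    """Return the highest-scoring (x1, y1, x2, y2) box for the object, or None.

    color_image: float tensor of shape (3, H, W) with values in [0, 1].
    """
    with torch.no_grad():
        output = model([color_image])[0]
    keep = (output["labels"] == HEADSET_CLASS_ID) & (output["scores"] > score_threshold)
    if not keep.any():
        return None
    boxes, scores = output["boxes"][keep], output["scores"][keep]
    return boxes[scores.argmax()]
```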
- Once detected, the recognition and tracking module 202 is configured to predict two-dimensional points 206 on the object in the color image. This prediction can be implemented via another neural network 207. The two-dimensional points 206 correspond to three-dimensional points that are predefined in the accessible three-dimensional model of the object 124. For instance, the three-dimensional points in the three-dimensional model of the object 124 are manually defined in advance and can be any points on the three-dimensional model of the object 124. In some instances, they may be points associated with important/distinctive corners and edges of the region of the object 206 that absorbs the light emitted by the time-of-flight depth sensor. Example three-dimensional models may be readily generated by use of computer-aided design (CAD) software programs, and thus, the three-dimensional model data that defines the three-dimensional points 124 may be a three-dimensional CAD “mesh” model.
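- The architecture of the second neural network 207 is likewise left open. Purely as an assumed sketch, a small convolutional regressor could map a crop of the detected object to N two-dimensional landmark coordinates; the landmark count and the layer sizes below are illustrative choices, not part of the disclosure.

```python
import torch
import torch.nn as nn

NUM_LANDMARKS = 16  # assumption: number of predefined model points (landmarks)


class LandmarkRegressor(nn.Module):
    """Illustrative stand-in for the "second neural network" (207): maps a
    cropped color image of the detected object to NUM_LANDMARKS (x, y) pairs,
    normalized to [0, 1] within the crop."""

    def __init__(self, num_landmarks: int = NUM_LANDMARKS):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, num_landmarks * 2)

    def forward(self, crop: torch.Tensor) -> torch.Tensor:
        # crop: (batch, 3, H, W) -> (batch, NUM_LANDMARKS, 2)
        features = self.backbone(crop).flatten(1)
        return torch.sigmoid(self.head(features)).view(-1, NUM_LANDMARKS, 2)
```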
- The color image with the two-dimensional points 206 is then passed to an alignment module 208 configured to apply a prediction algorithm 210 to the color image to compute a three-dimensional pose of the object in the color space of the color image 212. The prediction algorithm 210 computes the three-dimensional pose of the object in the color space of the color image 212 by positioning and/or rotating the three-dimensional model of the object 124 until the two-dimensional points on the object in the color image align with the corresponding three-dimensional points that are predefined in the three-dimensional model of the object 124.
- In various examples, the prediction algorithm 210 uses a six degrees of freedom (6DoF) approach to predict the alignment. For instance, a Perspective-n-Point (PnP) algorithm is configured to estimate the pose of the image capture device 104 relative to the captured scene, and this estimation can be extended to align the three-dimensional landmarks, defined via an accessible three-dimensional mesh model, with the two-dimensional landmarks on the object in the color image.
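- As a hedged illustration of this alignment step, the sketch below uses OpenCV's solvePnP to recover a rotation and translation that bring the predefined model landmarks onto the predicted image landmarks, and packs the result into a 4×4 pose matrix. The function name, the iterative PnP flag, and the zero distortion coefficients are assumptions; the disclosure itself only requires that a PnP-style 6DoF estimate be produced.

```python
import cv2
import numpy as np


def estimate_pose_in_color_space(model_points_3d, image_points_2d, color_camera_matrix):
    """Align predefined 3-D model landmarks (N, 3) with predicted 2-D landmarks
    (N, 2) observed in the color image, returning a 4x4 object-to-color-camera
    pose matrix."""
    success, rvec, tvec = cv2.solvePnP(
        np.asarray(model_points_3d, dtype=np.float64),
        np.asarray(image_points_2d, dtype=np.float64),
        np.asarray(color_camera_matrix, dtype=np.float64),
        np.zeros(5),                      # assume negligible lens distortion
        flags=cv2.SOLVEPNP_ITERATIVE,
    )
    if not success:
        raise RuntimeError("PnP did not converge on a pose")
    rotation, _ = cv2.Rodrigues(rvec)     # 3x3 rotation from the Rodrigues vector
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = tvec.ravel()
    return pose
```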
- Now that the alignment module 208 has computed a predicted three-dimensional pose of the object in the color image 212, a transformation module 214 applies a transform 216 between the color space of the color image and the depth space of the depth image to compute a three-dimensional pose of the object in the depth space of the depth image 218. This transform 216 may be necessary in scenarios where the color and depth coordinate spaces do not have the same origin and the axes are not colinear due to camera/sensor differences.
- Next, a depth determination module 220 can use the three-dimensional pose of the object in the depth space of the depth image 218 to repair depth values for pixels in the depth image 118 that are associated with the region of the object that absorbs the light emitted by the ToF depth sensor 106. For instance, the depth determination module 220 can apply a rasterization algorithm 222 that is configured to determine the distance between the image capture device 104 (e.g., the ToF depth sensor 106) and a point (e.g., pixel) on the three-dimensional pose of the object in the depth space of the depth image 218. In one example, the rasterization algorithm 222 projects vertices that make up triangles onto a depth plane and uses a technique to fill up the pixels that are covered by a triangle with a new depth value 224.
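- The rasterization technique is not spelled out in the disclosure, so the following sketch is only one plausible reading: project the posed mesh vertices through the depth sensor's intrinsics and fill each covered pixel with an interpolated depth, keeping the nearest surface (a simple z-buffer). The helper names and the use of linear rather than perspective-correct interpolation are simplifying assumptions.

```python
import numpy as np


def barycentric_weights(triangle_2d, point_2d):
    """Barycentric coordinates of a 2-D point with respect to a (3, 2) triangle,
    or None if the triangle is degenerate."""
    a, b, c = triangle_2d
    m = np.column_stack([b - a, c - a])
    if abs(np.linalg.det(m)) < 1e-12:
        return None
    u, v = np.linalg.solve(m, point_2d - a)
    return np.array([1.0 - u - v, u, v])


def rasterize_model_depth(vertices, triangles, pose_in_depth_space, depth_camera_matrix, height, width):
    """Render the posed model into the depth camera: (H, W) array of new depth
    values, zero where the model does not cover the pixel."""
    homogeneous = np.hstack([vertices, np.ones((len(vertices), 1))])
    camera_space = (pose_in_depth_space @ homogeneous.T).T[:, :3]
    z = camera_space[:, 2]
    projected = (depth_camera_matrix @ camera_space.T).T
    pixels = projected[:, :2] / projected[:, 2:3]

    depth_buffer = np.zeros((height, width), dtype=np.float32)
    for tri in triangles:
        pts, tz = pixels[tri], z[tri]
        if np.any(tz <= 0):
            continue  # triangle behind the camera
        x0, y0 = np.maximum(np.floor(pts.min(axis=0)).astype(int), 0)
        x1 = min(int(np.ceil(pts[:, 0].max())), width - 1)
        y1 = min(int(np.ceil(pts[:, 1].max())), height - 1)
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                w = barycentric_weights(pts, np.array([x + 0.5, y + 0.5]))
                if w is None or np.any(w < 0):
                    continue  # pixel center lies outside the triangle
                d = float(w @ tz)  # interpolated depth at this pixel
                if depth_buffer[y, x] == 0 or d < depth_buffer[y, x]:
                    depth_buffer[y, x] = d  # keep the nearest surface
    return depth_buffer
```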
- The depth determination module 220 can determine whether a new depth value 224 for a pixel associated with the three-dimensional pose of the object in the depth space of the depth image should replace a previous depth value initially captured and computed for the depth image 118. The depth determination module 220 may be configured to replace a previous depth value for the pixel with the new depth value 224 if the previous depth value is corrupted (e.g., is completely missing or is greater than the new depth value 224).
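- That replacement rule can be stated compactly. The sketch below assumes, as many depth sensors do, that a missing measurement is encoded as zero; the encoding is an assumption of this example rather than something the disclosure fixes.

```python
import numpy as np


def repair_depth(captured_depth, rendered_depth):
    """Apply the replacement rule: keep the captured depth unless it is missing
    (zero here, by assumption) or farther away than the rendered model surface."""
    model_covers_pixel = rendered_depth > 0
    previous_is_corrupted = (captured_depth == 0) | (captured_depth > rendered_depth)
    replace = model_covers_pixel & previous_is_corrupted
    return np.where(replace, rendered_depth, captured_depth)
```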
- Consequently, the original depth image is repaired with more accurate depth values. This allows for an improved RGB-Depth (RGB-D) image to be produced. Moreover, once the object has been detected in a first color image frame of a video, the recognition and tracking module 202 is configured to track the object in subsequent color image frames of the video. In various examples, different neural networks can be used to first detect the object and then to track the object. Tests have shown that a first neural network for detection takes about nine milliseconds per frame to repair the depth values and that a second neural network for tracking (once the object is already detected) takes about three milliseconds per frame to repair depth values.
- FIG. 3 illustrates how a bounding box 302 can be used to focus the recognition and tracking module 202 on an area of a color image frame in which an object is located and/or is likely to move from one color image frame to the next in a sequence of color image frames (e.g., a video). This enables the neural network 204 to operate more efficiently with regard to tracking the movement of the object within the color image frames since a whole image frame does not need to be analyzed for tracking purposes.
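- One simple way to realize this focusing step, offered here only as an assumed sketch, is to expand the previous frame's bounding box 302 by a margin (to allow for motion) and crop the new frame to that window before running the tracking network; the margin value is arbitrary.

```python
import numpy as np


def crop_to_tracking_window(color_frame, previous_box, margin=0.2):
    """Expand (x1, y1, x2, y2) from the last detection/track by `margin` and
    crop the new (H, W, 3) frame to it, returning the crop and the clipped box."""
    height, width = color_frame.shape[:2]
    x1, y1, x2, y2 = previous_box
    dx, dy = margin * (x2 - x1), margin * (y2 - y1)
    x1, y1 = int(max(x1 - dx, 0)), int(max(y1 - dy, 0))
    x2, y2 = int(min(x2 + dx, width)), int(min(y2 + dy, height))
    return color_frame[y1:y2, x1:x2], (x1, y1, x2, y2)
```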
- FIG. 4 illustrates a transformation function 402 that is used to convert between a color coordinate space 404 and a depth coordinate space 406 so that the depth image can be repaired. In one example, the transformation function 402 comprises 4×4 matrix multiplication as follows:
pose_in_color_space = pose_in_depth_space * depth_to_color_transform
- Here, the depth_to_color_transform is a 4×4 rigid transform that can be derived or retrieved from calibration information (e.g., calibration functions) associated with the image capture device 104 (e.g., the calibration information may be baked into the image capture device 104 at the factory).
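- In code, and following the relation given above, moving a pose between the two spaces is a single 4×4 multiplication; the sketch below simply applies that relation and its inverse with NumPy, and assumes the calibration transform is available as a 4×4 array.

```python
import numpy as np


def depth_pose_to_color_pose(pose_in_depth_space, depth_to_color_transform):
    """pose_in_color_space = pose_in_depth_space * depth_to_color_transform."""
    return pose_in_depth_space @ depth_to_color_transform


def color_pose_to_depth_pose(pose_in_color_space, depth_to_color_transform):
    """Invert the calibration transform to recover the pose in the depth space
    of the depth image from the pose estimated in the color space."""
    return pose_in_color_space @ np.linalg.inv(depth_to_color_transform)
```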
- FIG. 5 illustrates an example environment 500 in which the depth image repair system 102 can be accessed by any one of multiple different applications 502 via an application programming interface 504. For example, the application 502 may be configured to obtain color and depth image frames 506 of a real-world scene 508 from an image capture device 104. As described above, these image frames 506 may include corrupted depth data 510 (e.g., missing or incorrect depth values for pixels) due to an object in the scene that absorbs light emitted by a ToF depth sensor 106.
- Accordingly, the application 502 calls on the depth image repair system 102 and submits the image frames 506 with the corrupted depth data 510 via the application programming interface 504. The depth image repair system 102 may store or have access to a large number of neural networks 512 and three-dimensional models 514 of objects that are known to absorb the light emitted by a ToF depth sensor 106. The depth image repair system 102 is configured to repair the depth data, as discussed above with respect to FIGS. 1-4, and return the repaired depth data 516 (e.g., the complete RGB-D images) to the application 502. In one example, the application 502 is a teleportation application that teleports and reconstructs a user 518 wearing a head-mounted display device in a different real-world scene 520 compared to the scene 508 where the user is actually located.
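- The disclosure does not define the shape of the application programming interface 504, so the following client-side sketch is entirely hypothetical: the endpoint URL, field names, and PNG encoding are invented for illustration and should not be read as the actual interface.

```python
import requests  # assumption: the repair system is reachable over HTTP

REPAIR_ENDPOINT = "https://example.invalid/depth-repair/v1/frames"  # hypothetical URL


def request_depth_repair(color_png: bytes, depth_png: bytes, object_id: str = "hmd") -> bytes:
    """Submit one color/depth frame pair and return the repaired depth image
    (assumed here to come back as PNG bytes)."""
    response = requests.post(
        REPAIR_ENDPOINT,
        files={"color": ("color.png", color_png), "depth": ("depth.png", depth_png)},
        data={"object_id": object_id},
        timeout=10,
    )
    response.raise_for_status()
    return response.content
```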
- Turning now to FIG. 6, a process 600 is described that facilitates repairing a depth image using a corresponding color image and an accessible three-dimensional model of an object that causes corrupt or missing depth values to exist in the depth image. It should be understood that the operations of the methods disclosed herein are not presented in any particular order and that performance of some or all of the operations in an alternative order(s) is possible and is contemplated. The operations have been presented in the demonstrated order for ease of description and illustration. Operations may be added, omitted, and/or performed simultaneously, without departing from the scope of the appended claims.
- At
operation 602, image data comprised of a color image and a depth image is obtained. As described above, the color image and the depth image are captured by an image capture device configured with a color sensor and a time-of-flight depth sensor. - At
operation 604, an object in the color image that is known to include a region that absorbs light emitted by the time-of-flight depth sensor is detected. - Next, at
operation 606, a three-dimensional model of the object is accessed. - Moving to
operation 608, two-dimensional points on the color image that corresponds to three-dimensional points that are predefined in the three-dimensional object model are predicted. - At
operation 610, a prediction algorithm to compute a three-dimensional pose of the object in a color space of the color image is applied. In various examples, the prediction algorithm computes the three-dimensional pose of the object in the color space of the color image by at least one of positioning or rotating the three-dimensional model of the object until the two-dimensional points on the color image align with corresponding three-dimensional points that are predefined in the three-dimensional model of the object. - At
operation 612, a transform between the color space and the depth space is applied to the three-dimensional pose of the object in the color space of the color image to compute a three-dimensional pose of the object in the depth space of the depth image. - At
operation 614, depth values for pixels in the depth image that are associated with the region of the object that absorbs the light emitted by the time-of-flight depth sensor are repaired using the three-dimensional pose of the object in the depth space of the depth image. -
- FIG. 7 shows additional details of an example computer architecture 700 for a computer, such as a server and/or server cluster, capable of executing the program components described herein. Thus, the computer architecture 700 illustrated in FIG. 7 represents an architecture for a server computer, a mobile phone, a PDA, a smart phone, a desktop computer, a netbook computer, a tablet computer, and/or a laptop computer. The computer architecture 700 may be utilized to execute any aspects of the software components presented herein.
- The computer architecture 700 illustrated in FIG. 7 includes a central processing unit 702 (“CPU”), a system memory 704, including a random-access memory 706 (“RAM”) and a read-only memory (“ROM”) 708, and a system bus 710 that couples the memory 704 to the CPU 702. A basic input/output system containing the basic routines that help to transfer information between elements within the computer architecture 700, such as during startup, is stored in the ROM 708. The computer architecture 700 further includes a mass storage device 712 for storing an operating system 707, other data, and one or more applications. The mass storage device 712 can also store computer-executable instructions for implementing the depth image repair system 102. -
mass storage device 712 is connected to theCPU 702 through a mass storage controller connected to thebus 710. Themass storage device 712 and its associated computer-readable media provide non-volatile storage for thecomputer architecture 700. Although the description of computer-readable media contained herein refers to a mass storage device, such as a solid state drive, a hard disk or CD-ROM drive, it should be appreciated by those skilled in the art that computer-readable media can be any available computer storage media or communication media that can be accessed by thecomputer architecture 700. - Communication media includes computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics changed or set in a manner so as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of the any of the above should also be included within the scope of computer-readable media.
- By way of example, and not limitation, computer storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. For example, computer media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid-state memory technology, CD-ROM, digital versatile disks (“DVD”), HD-DVD, BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the
computer architecture 700. For purposes of the claims, the phrase “computer storage medium,” “computer-readable storage medium” and variations thereof, does not include waves, signals, and/or other transitory and/or intangible communication media, per se. - According to various configurations, the
computer architecture 700 may operate in a networked environment using logical connections to remote computers through thenetwork 756 and/or another network. Thecomputer architecture 700 may connect to thenetwork 756 through anetwork interface unit 714 connected to thebus 710. It should be appreciated that thenetwork interface unit 714 also may be utilized to connect to other types of networks and remote computer systems. Thecomputer architecture 700 also may include an input/output controller 716 for receiving and processing input from a number of other devices, including a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 716 may provide output to a display screen, a printer, or other type of output device. - It should be appreciated that the software components described herein may, when loaded into the
CPU 702 and executed, transform theCPU 702 and theoverall computer architecture 700 from a general-purpose computing system into a special-purpose computing system customized to facilitate the functionality presented herein. TheCPU 702 may be constructed from any number of transistors or other discrete circuit elements, which may individually or collectively assume any number of states. More specifically, theCPU 702 may operate as a finite-state machine, in response to executable instructions contained within the software modules disclosed herein. These computer-executable instructions may transform theCPU 702 by specifying how theCPU 702 transitions between states, thereby transforming the transistors or other discrete hardware elements constituting theCPU 702. - Encoding the software modules presented herein also may transform the physical structure of the computer-readable media presented herein. The specific transformation of physical structure may depend on various factors, in different implementations of this description. Examples of such factors may include, but are not limited to, the technology used to implement the computer-readable media, whether the computer-readable media is characterized as primary or secondary storage, and the like. For example, if the computer-readable media is implemented as semiconductor-based memory, the software disclosed herein may be encoded on the computer-readable media by transforming the physical state of the semiconductor memory. For example, the software may transform the state of transistors, capacitors, or other discrete circuit elements constituting the semiconductor memory. The software also may transform the physical state of such components in order to store data thereupon.
- As another example, the computer-readable media disclosed herein may be implemented using magnetic or optical technology. In such implementations, the software presented herein may transform the physical state of magnetic or optical media, when the software is encoded therein. These transformations may include altering the magnetic characteristics of particular locations within given magnetic media. These transformations also may include altering the physical features or characteristics of particular locations within given optical media, to change the optical characteristics of those locations. Other transformations of physical media are possible without departing from the scope and spirit of the present description, with the foregoing examples provided only to facilitate this discussion.
- In light of the above, it should be appreciated that many types of physical transformations take place in the
computer architecture 700 in order to store and execute the software components presented herein. It also should be appreciated that thecomputer architecture 700 may include other types of computing devices, including hand-held computers, embedded computer systems, personal digital assistants, and other types of computing devices known to those skilled in the art. It is also contemplated that thecomputer architecture 700 may not include all of the components shown inFIG. 7 , may include other components that are not explicitly shown inFIG. 7 , or may utilize an architecture completely different than that shown inFIG. 7 . - The disclosure presented herein also encompasses the subject matter set forth in the following clauses.
- Example Clause A, a method comprising: obtaining image data comprised of a color image and a depth image captured by an image capture device configured with a color sensor and a time-of-flight depth sensor; detecting, using a first neural network, an object in the color image that is known to include a region that absorbs light emitted by the time-of-flight depth sensor; accessing a three-dimensional model of the object; in response to the detected object, predicting, using a second neural network, two-dimensional points on the color image that correspond to three-dimensional points that are predefined in the three-dimensional object model; applying a prediction algorithm to compute a three-dimensional pose of the object in a color space of the color image, wherein application of the prediction algorithm computes the three-dimensional pose of the object in the color space of the color image by at least one of positioning or rotating the three-dimensional model of the object until the two-dimensional points on the color image align with the corresponding three-dimensional points that are predefined in the three-dimensional model of the object; applying, to the three-dimensional pose of the object in the color space of the color image, a transform between the color space and the depth space to compute a three-dimensional pose of the object in the depth space of the depth image; and repairing, using the three-dimensional pose of the object in the depth space of the depth image, depth values for pixels in the depth image that are associated with the region of the object that absorbs the light emitted by the time-of-flight depth sensor.
- Example Clause B, the method of Example Clause A, wherein repairing the depth values associated with the region of the object that absorbs the light emitted by the time-of-flight depth sensor comprises: applying a rasterization algorithm to determine a new depth value for a pixel associated with the three-dimensional pose of the object in the depth space of the depth image; and replacing a previous depth value for the pixel with the new depth value if the previous depth value is missing or is greater than the new depth value.
- Example Clause C, the method of Example Clause A or Example Clause B, wherein the color image and the depth image are configured to generate an RGB-D image.
- Example Clause D, the method of any one of Example Clauses A through C, wherein the prediction algorithm comprises a perspective-n-point algorithm.
- Example Clause E, the method of any one of Example Clauses A through D, further comprising using the first neural network to configure a bounding box to track movement of the object in a scene.
- Example Clause F, the method of any one of Example Clauses A through E, wherein the transform between the color space and the depth space comprises a four-by-four matrix multiplication rigid transform.
- Example Clause G, the method of Example Clause F, wherein the four-by-four matrix multiplication rigid transform is defined via a calibration function defined for the color sensor and the time-of-flight depth sensor.
- Example Clause H, the method of any one of Example Clauses A through G, wherein: the image frame is obtained from an application via an application programming interface as part of a sequence of image frames; repairing the depth values associated with the region of the object that absorbs the infra-red signal emitted by the time-of-flight depth sensor enables a corrected RGB-D image to be produced; the method further comprises providing the corrected RGB-D image to the application.
- Example Clause I, a system comprising: one or more processing units; and computer storage media storing instructions that, when executed by the one or more processing units, cause the system to perform operations comprising: obtaining image data comprised of a color image and a depth image captured by an image capture device configured with a color sensor and a time-of-flight depth sensor; detecting, using a first neural network, an object in the color image that is known to include a region that absorbs light emitted by the time-of-flight depth sensor; accessing a three-dimensional model of the object; in response to the detected object, predicting, using a second neural network, two-dimensional points on the color image that correspond to three-dimensional points that are predefined in the three-dimensional object model; applying a prediction algorithm to compute a three-dimensional pose of the object in a color space of the color image, wherein application of the prediction algorithm computes the three-dimensional pose of the object in the color space of the color image by at least one of positioning or rotating the three-dimensional model of the object until the two-dimensional points on the color image align with the corresponding three-dimensional points that are predefined in the three-dimensional model of the object; applying, to the three-dimensional pose of the object in the color space of the color image, a transform between the color space and the depth space to compute a three-dimensional pose of the object in the depth space of the depth image; and repairing, using the three-dimensional pose of the object in the depth space of the depth image, depth values for pixels in the depth image that are associated with the region of the object that absorbs the light emitted by the time-of-flight depth sensor.
- Example Clause J, the system of Example Clause I, wherein repairing the depth values associated with the region of the object that absorbs the light emitted by the time-of-flight depth sensor comprises: applying a rasterization algorithm to determine a new depth value for a pixel associated with the three-dimensional pose of the object in the depth space of the depth image; and replacing a previous depth value for the pixel with the new depth value if the previous depth value is missing or is greater than the new depth value.
- Example Clause K, the system of Example Clause I or Example Clause J, wherein the color image and the depth image are configured to generate an RGB-D image.
- Example Clause L, the system of any one of Example Clauses I through K, wherein the prediction algorithm comprises a perspective-n-point algorithm.
- Example Clause M, the system of any one of Example Clauses I through L, wherein the operations further comprise using the first neural network to configure a bounding box to track movement of the object in a scene.
- Example Clause N, the system of any one of Example Clauses I through M, wherein the transform between the color space and the depth space comprises a four-by-four matrix multiplication rigid transform.
- Example Clause O, the system of Example Clauses N, wherein the four-by-four matrix multiplication rigid transform is defined via a calibration function defined for the color sensor and the time-of-flight depth sensor.
- Example Clause P, the system of any one of Example Clauses I through O, wherein: the image frame is obtained from an application via an application programming interface as part of a sequence of image frames; repairing the depth values associated with the region of the object that absorbs the infra-red signal emitted by the time-of-flight depth sensor enables a corrected RGB-D image to be produced; the operations further comprise providing the corrected RGB-D image to the application.
- Example Clause Q, computer storage media storing instructions that, when executed by one or more processing units, cause a system to perform operations comprising: obtaining image data comprised of a color image and a depth image captured by an image capture device configured with a color sensor and a time-of-flight depth sensor; detecting, using a first neural network, an object in the color image that is known to include a region that absorbs light emitted by the time-of-flight depth sensor; accessing a three-dimensional model of the object; in response to the detected object, predicting, using a second neural network, two-dimensional points on the color image that correspond to three-dimensional points that are predefined in the three-dimensional object model; applying a prediction algorithm to compute a three-dimensional pose of the object in a color space of the color image, wherein application of the prediction algorithm computes the three-dimensional pose of the object in the color space of the color image by at least one of positioning or rotating the three-dimensional model of the object until the two-dimensional points on the color image align with the corresponding three-dimensional points that are predefined in the three-dimensional model of the object; applying, to the three-dimensional pose of the object in the color space of the color image, a transform between the color space and the depth space to compute a three-dimensional pose of the object in the depth space of the depth image; and repairing, using the three-dimensional pose of the object in the depth space of the depth image, depth values for pixels in the depth image that are associated with the region of the object that absorbs the light emitted by the time-of-flight depth sensor.
- Example Clause R, the computer storage media of Example Clause Q, wherein repairing the depth values associated with the region of the object that absorbs the light emitted by the time-of-flight depth sensor comprises: applying a rasterization algorithm to determine a new depth value for a pixel associated with the three-dimensional pose of the object in the depth space of the depth image; and replacing a previous depth value for the pixel with the new depth value if the previous depth value is missing or is greater than the new depth value.
- Example Clause S, the computer storage media of Example Clause Q or Example Clause R, wherein the prediction algorithm comprises a perspective-n-point algorithm.
- Example Clause T, the computer storage media of any one of Example Clauses Q through S, wherein the operations further comprise using the first neural network to configure a bounding box to track movement of the object in a scene.
- While certain example embodiments have been described, these embodiments have been presented by way of example only and are not intended to limit the scope of the inventions disclosed herein. Thus, nothing in the foregoing description is intended to imply that any particular feature, characteristic, step, module, or block is necessary or indispensable. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the inventions disclosed herein. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of certain of the inventions disclosed herein.
- It should be appreciated that any reference to “first,” “second,” etc. elements within the Summary and/or Detailed Description is not intended to and should not be construed to necessarily correspond to any reference of “first,” “second,” etc. elements of the claims. Rather, any use of “first” and “second” within the Summary, Detailed Description, and/or claims may be used to distinguish between two different instances of the same element (e.g., two different images).
- In closing, although the various configurations have been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended representations is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as example forms of implementing the claimed subject matter.
Claims (20)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/713,038 US12190537B2 (en) | 2022-04-04 | 2022-04-04 | Repairing image depth values for an object with a light absorbing surface |
PCT/US2023/013441 WO2023196057A1 (en) | 2022-04-04 | 2023-02-21 | Repairing image depth values for an object with a light absorbing surface |
CN202380027164.0A CN119013689A (en) | 2022-04-04 | 2023-02-21 | Repairing image depth values of an object using light absorbing surfaces |
EP23711303.0A EP4505393A1 (en) | 2022-04-04 | 2023-02-21 | Repairing image depth values for an object with a light absorbing surface |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/713,038 US12190537B2 (en) | 2022-04-04 | 2022-04-04 | Repairing image depth values for an object with a light absorbing surface |
Publications (2)
Publication Number | Publication Date |
---|---|
US20230316552A1 true US20230316552A1 (en) | 2023-10-05 |
US12190537B2 US12190537B2 (en) | 2025-01-07 |
Family
ID=85641114
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/713,038 Active 2042-05-29 US12190537B2 (en) | 2022-04-04 | 2022-04-04 | Repairing image depth values for an object with a light absorbing surface |
Country Status (4)
Country | Link |
---|---|
US (1) | US12190537B2 (en) |
EP (1) | EP4505393A1 (en) |
CN (1) | CN119013689A (en) |
WO (1) | WO2023196057A1 (en) |
Legal Events
Code | Title | Description |
---|---|---|
FEPP | Fee payment procedure | Entity status set to undiscounted (original event code: BIG.); entity status of patent owner: large entity |
AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON; assignment of assignors interest; assignors: SHEN, JINGJING; WOOD, ERROLL WILLIAM; RAZUMENIC, IVAN; and others; signing dates from 20220404 to 20221205; reel/frame: 062414/0621 |
AS | Assignment | Owner name: MICROSOFT RESEARCH LIMITED, WASHINGTON; employment agreement; assignor: SHARP, TOBY; reel/frame: 065019/0912; effective date: 20050630 |
STPP | Information on status: patent application and granting procedure in general | Non-final action mailed |
AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON; assignment of assignors interest; assignor: MICROSOFT RESEARCH LIMITED; reel/frame: 065140/0993; effective date: 20231003 |
STPP | Information on status: patent application and granting procedure in general | Response to non-final office action entered and forwarded to examiner |
STPP | Information on status: patent application and granting procedure in general | Final rejection mailed |
STPP | Information on status: patent application and granting procedure in general | Docketed new case - ready for examination |
STPP | Information on status: patent application and granting procedure in general | Notice of allowance mailed -- application received in Office of Publications |
STPP | Information on status: patent application and granting procedure in general | Publications -- issue fee payment verified |
STCF | Information on status: patent grant | Patented case |