
WO2011080669A1 - System and method for reconstruction of range images from multiple two-dimensional images using a range based variational method - Google Patents

System and method for reconstruction of range images from multiple two-dimensional images using a range based variational method

Info

Publication number
WO2011080669A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
images
scene
range image
cost
Prior art date
Application number
PCT/IB2010/056004
Other languages
French (fr)
Inventor
Tomer Avidor
Gil Briskin
Omri Peleg
Original Assignee
Rafael Advanced Defense Systems Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rafael Advanced Defense Systems Ltd.
Publication of WO2011080669A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/593 Depth or shape recovery from multiple images from stereo images
    • G06T7/596 Depth or shape recovery from multiple images from stereo images from three or more stereo images

Definitions

  • the present embodiment generally relates to the field of image processing and computer vision, and in particular, concerns a system and method for reconstruction of range images.
  • 3D models are used in many applications including automatic analysis of a scene, understanding navigability of a scene, visibility of points from different areas, and segmentation of 3D models (both in civilian and military applications).
  • One popular application uses a three-dimensional model to generate a view of a scene for a user.
  • One or more cameras, or more generally image capture devices, can capture images of a real scene.
  • the application must render a view of the scene from the desired viewpoint of the user.
  • a variety of conventional techniques exists for generating views from a new viewpoint, also known as a virtual location, or a virtual camera angle.
  • One of the challenges in generating new views from images is accurately rendering objects in a view from a new (virtual) camera angle.
  • Using a three-dimensional model of the scene to be rendered is a known method for improving view generation.
  • Techniques for using a three-dimensional model to facilitate the generation of views of a scene are known in the industry.
  • Range images can also be used for many other applications.
  • 3D model refers to a model that includes descriptions of surfaces.
  • range image, also known as a range map or 3D map, refers to a collection of information including ranges from a viewpoint to points in a scene.
  • LIDAR Light Detection and Ranging
  • a known approach to generating 3D models is to use feature identification from one image to other images and then use linear or non-linear multi-view triangulation of the feature to determine the distance from the reference camera location to the feature.
  • the images are correlated by tracking features such as corner points (edges with gradients in multiple directions) from one image to the next.
  • the feature trajectories over time are then used to reconstruct the 3D positions of the features.
  • Triangulation using two or more camera views provides 3D positions for the features, and these 3D positions are combined to create a 3D model or a range image.
  • Disadvantages of feature matching and triangulation techniques include known limitations when a scene (and the corresponding 2D images of the scene) has smooth areas and/or areas of low texture.
  • Feature correlation generally fails for smooth and low texture areas, resulting in the failure of this family of algorithms to reconstruct the 3D positions of smooth and low texture points in the scene.
  • Correlation is also dependent on the size of the aperture window.
  • the aperture window includes multiple pixels to facilitate correlation of areas between images. A consequence of this correlation aperture is the loss of accuracy in reconstruction of an object, particularly the edges of objects, and possible fattening of objects.
  • Optical flow is an approximation to image motion, defined as the projection of velocities of 3D surface points onto the imaging plane of a visual sensor.
  • 2D image motion is the projection of the 3D motion of objects, relative to a visual sensor, onto the 2D image plane.
  • Sequences of time-ordered images allow the estimation of projected 2D image motion as either instantaneous image velocities or discrete image displacements. These are usually called the optical flow field or the image velocity field.
  • optical flow may then be used to generate a 3D model of the surface structure (shape or relative depth) through assumptions concerning the structure of the optical flow field, the 3D environment, and the motion of the sensor.
  • the optical flow field together with the known camera angles of the two images, is used to create a range map (or a 3D point cloud) using triangulation techniques.
  • optical flow is a solution to the problem of determining how points (pixels) move in relation to each other in a pair of images.
  • optical flow The motion of pixels in a 2D image, in other words motion in the (x, y) plane, is determined.
  • Advantages of optical flow include being able to reconstruct the 3D positions of smooth and low texture areas in a scene.
  • 3D information for smooth and low texture areas is derived from the 3D information of features relatively local to the smooth and low texture areas.
  • Disadvantages of optical flow include the sensitivity of optical flow techniques to large differences between the input images, and being limited to processing only one pair of images at a time. Optical flow techniques are unable to make optimal use of all the available images.
  • Active techniques require the generation of a known input, such as radio waves or light, to be captured and analyzed to generate 3D position information of the points in a scene, generally in the form of a range image.
  • Advantages of active techniques include reconstruction of 3D positions of smooth and low texture areas of a scene.
  • Disadvantages of active techniques include having to provide an active input and lack of texture information (necessary for generating novel views of the scene).
  • Techniques for generating 3D models from range images are known in the art.
  • a method for generating range images including the steps of: providing a plurality of two-dimensional (2D) images of a scene from a plurality of viewing directions, each of the 2D images of the scene including texture values associated with pixels of the 2D image; defining a reference 2D image of the scene; defining an initial generated range image of the scene associated with the reference 2D image of the scene, the generated range image including depth information associated with pixels of the generated range image, and the pixels of the generated range image corresponding to the pixels of the reference 2D image of the scene; providing a cost-functional including at least one cost sub-functional, wherein one cost sub-functional uses the generated range image to project pixels from the reference 2D image according to a viewing direction of at least one of the plurality of 2D images of a scene to find projected pixels, and compares the texture values of the projected pixels with texture values of corresponding pixels from the 2D image associated with the viewing direction; and updating the generated range image using a variational method including: calculating a cost for the cost-functional and updating the generated range image based on the calculated cost until a given stopping criterion has been reached.
  • the method defines the reference 2D image of the scene by choosing an image from the plurality of 2D images of a scene
  • the method defines the initial generated range image by using a structure from motion (SFM) technique with the plurality of 2D images of a scene. In another optional embodiment, the method defines the initial generated range image by using optical flow on images from the plurality of 2D images of a scene. In another optional embodiment, the method defines the initial generated range image by using the technique of simultaneous location and mapping (SLAM) with the plurality of 2D images of a scene. In another optional embodiment, the method defines the initial generated range image by calculating an average plane from sparse feature matches of the plurality of 2D images of a scene. In another optional embodiment, the method defines the initial generated range image using a digital terrain map (DTM) corresponding to at least part of the scene. In another optional embodiment, the method defines the initial generated range image by assigning a uniform range value to all points in the initial generated range image.
  • SFM structure from motion
  • the method defines the initial generated range image by using optical flow on images from the plurality of 2D images of a scene.
  • the method defines the initial generated range image by using the technique of simultaneous location and mapping (SLAM) with the plurality of 2D images of a scene.
  • the cost-functional further includes a second cost sub-functional that processes the generated range image to calculate a smoothness value.
  • the method further includes updating the generated range image when additional 2D images become available.
  • the range image is used to generate a three-dimensional (3D) model of the scene.
  • the method is repeated, defining different reference 2D images thereby generating a plurality of range images.
  • one or more of the plurality of range images are used to generate a three-dimensional (3D) model of the scene.
  • the method further includes updating the 3D model when additional 2D images become available.
  • a system for generating range images including: one or more image providing devices configured for providing a plurality of two-dimensional (2D) images of a scene from a plurality of viewing directions, each of the plurality of 2D images of a scene including texture values associated with pixels of the 2D image; a processing system containing one or more processors configured for: defining a reference 2D image of the scene; defining an initial generated range image of the scene associated with the reference 2D image of the scene, the generated range image including depth information associated with pixels of the generated range image, and the pixels of the generated range image corresponding to the pixels of the reference 2D image of the scene; providing a cost-functional including at least one cost sub-functional, wherein one cost sub-functional uses the generated range image to project pixels from the reference 2D image according to a viewing direction of at least one of the plurality of 2D images of a scene to find projected pixels, and compares the texture values of the projected pixels with texture values of corresponding pixels from the 2D image associated with the viewing direction; and updating the generated range image using a variational method including: calculating a cost for the cost-functional and updating the generated range image based on the calculated cost until a given stopping criterion has been reached.
  • the one or more image providing devices includes a digital picture camera. In another optional embodiment, the one or more image providing devices includes a digital video camera. In another optional embodiment, the one or more image providing devices includes a storage system. In an optional embodiment, the plurality of 2D images of a scene are infrared (IR) images.
  • IR infrared
  • the processing system is configured to define the reference 2D image of the scene by choosing an image from the plurality of 2D images of a scene.
  • the processing system is configured to define the initial generated range image by using a structure from motion (SFM) technique with the plurality of 2D images of a scene.
  • the processing system is configured to define the initial generated range image by using optical flow on images from the plurality of 2D images of a scene.
  • the processing system is configured to define the initial generated range image by using the technique of simultaneous location and mapping (SLAM) with the plurality of 2D images of a scene.
  • SLAM simultaneous location and mapping
  • the processing system is configured to define the initial generated range image by calculating an average plane from sparse feature matches of the plurality of 2D images of a scene.
  • the processing system is configured to define the initial generated range image using a digital terrain map (DTM) corresponding to at least part of the scene.
  • DTM digital terrain map
  • the processing system is configured to define the initial generated range image by assigning a uniform range value to all points in the initial generated range image.
  • the processing system is configured to provide a second cost sub-functional that processes the generated range image to calculate a smoothness value.
  • the processing system is configured to update the generated range image when additional 2D images become available
  • the processing system is further configured to generate a three-dimensional (3D) model of the scene.
  • the processing is repeated defining different reference 2D images thereby generating a plurality of range images
  • the processing system is further configured to generate a three-dimensional (3D) model of the scene using one or more of the plurality of range images.
  • processing system is further configured to update the 3D model when additional 2D images become available.
  • FIGURE 1, a flowchart of a method for reconstruction of range images from multiple two-dimensional images using a range based variational method.
  • FIGURE 2, an example diagram of a cost sub-functional with a constraint on the data.
  • FIGURE 3, a diagram of a system for generating range images.
  • One embodiment of a system and method for reconstruction of range images from multiple two-dimensional (2D) images using a range based variational method overcomes the limitations of conventional techniques and provides increased success in generation of range images from 2D images.
  • This innovative technique generates range images similar to LIDAR generated range images but does not require the use of an active input, instead using passively collected 2D images. Additionally, this technique has access to the original texture in the 2D images, in contrast to LIDAR, which does not provide texture information.
  • This technique is not limited to processing only a pair of images at a time, instead processing multiple images.
  • This technique is not limited to implementations requiring a local aperture window, instead being able to compare single pixels, hence improving processing and reconstruction of the 3D positions of edges in a scene.
  • An additional advantage is that this method can be implemented without requiring non-volatile storage of data during processing, facilitating implementation on a graphics card or similar hardware. The range images can then be used to generate a 3D model of the scene.
  • 2D images 300 of a scene are optionally preprocessed 302 and camera angles are calculated 304.
  • key frames 308 are chosen 306 from the 2D images.
  • a reference 2D image 308 is defined 307 from the 2D images or using another method.
  • a generated range image is generated 310 by first defining 312 an initial generated range image associated with the reference 2D image.
  • the initial generated range image includes depth information for at least a portion of the pixels of the reference 2D image.
  • a cost-functional is provided to calculate a cost 314 reflecting how well the generated range image matches a set of criteria.
  • Each criterion is evaluated as a cost sub-functional of the cost-functional, together with a factor that reflects the criterion's importance to the overall cost.
  • a variational method is used to find an optimal generated range image in regards to this cost.
  • the variational method includes repeatedly updating 316 the generated range image and re-calculating 314 the cost using the cost-functional until a given stopping criterion is reached.
  • the resulting generated range image for the reference 2D image 318 can be post-processed 320 to provide a final 3D model 322.
  • Each of the plurality of 2D images 300 of a scene includes texture values associated with pixels of the 2D image.
  • texture refers to a value corresponding to the image content of the pixel, in other words representing the intensity of a surface, or representing how the pixel can be viewed.
  • texture corresponds to the color or grayscale (of the pixel) in the form of a red-green-blue (RGB) value, although other image content such as infrared (IR) can be represented.
  • the 2D images may be optionally preprocessed 302, including changing the data format, size, normalization, calculating image related information, and other image processing necessary to prepare the image or related information.
  • the 2D images can be provided with associated camera angle information.
  • camera angle information can include information such as the position and orientation of the image capture device (camera) in relation to the scene being imaged.
  • a camera angle viewing direction
  • camera angles can be generated using known techniques such as structure from motion (SFM) techniques.
  • SFM structure from motion
  • Key frames are chosen in block 306 and a reference 2D image is defined in block 307.
  • key frames are 2D images chosen, based on criteria, from the plurality of 2D images, for further processing.
  • criteria can include an image baseline, image similarity, the difference between images, and computation limitations.
  • a reference 2D image of the scene is defined, the reference 2D image including texture values associated with pixels of the 2D image.
  • a preferred implementation is to choose the reference 2D image from the plurality of 2D images. Testing has shown that a preferable implementation is for the reference 2D image to be a middle frame from a sequence of video images or a middle image from a range of still cameras.
  • the key frames and the reference 2D image 308 can be used to generate a generated range image 310.
  • an initial generated range image is defined 312 using the reference 2D image and at least one of the key 2D images.
  • the generated range image includes depth information for pixels of at least a portion of the reference 2D image.
  • the initial generated range image can be from an existing camera angle or from a new (virtual) camera angle.
  • SFM structure from motion
  • SFM post-processing can be used to increase range image detail.
  • Optical flow, linear triangulation, and non-linear triangulation are other conventional techniques that can be used to generate a range image.
  • the technique of simultaneous location and mapping (SLAM) can be used to generate a range image of an unknown environment (without a priori knowledge) or a known environment (with a priori knowledge) while at the same time keeping track of the current location.
  • SLAM simultaneous location and mapping
  • a simple technique for calculating an initial range image to initialize the algorithm is to use a range estimate.
  • Another technique is to calculate an average plane from sparse feature matches (for example, given a sparse matching of points between two images, calculate a plane in three dimensions that best approximates the matches).
  • a disadvantage of calculating an average plane is the lack of 3D geometry in the generated range image.
  • the position information can be used to calculate an initial range image by calculating the range of each pixel to a provided digital terrain map.
  • Another technique is to interpolate a sparse range map (a range map calculated on sparsely selected points) to generate a dense range map.
  • a cost (score) is calculated 314 for the range image.
  • a variational method is a term known in the field and refers to finding the parameters that minimize a cost-functional.
  • the parameter of interest is the range image (which gives a range for each pixel)
  • the cost-functional is a provided cost-functional.
  • Variational methods are used to solve the minimization problem for the cost-functional, which includes one or more cost sub-functionals to find the generated range image (which in this case may be more easily understood as a 3D map or even more simply as a collection of pixel depths) that minimizes the cost.
  • the cost-functional can be expressed as:
  • "L" is a cost-functional calculated on a range image "r".
  • "Ld" is one of the at least one cost sub-functionals, with a constraint on the data, in this case the grayscale data, or texture, as will be described below. "a" is a scale factor (weight).
  • "Ls" is an optional second cost sub-functional with a constraint on smoothness, as will be described below. Additional scale factors and cost sub-functionals can be added to the cost-functional as appropriate to the application. The calculated costs of at least two cost sub-functionals are combined to generate a cost (score) for the current range image. Multiple images can be used to calculate a cost for a pixel. (A code sketch of this cost-functional and its iterative minimization follows this list.)
  • One technique for finding a minimum to a cost-functional is to transform the cost- functional into a set of equations, and then solve the set of equations for the parameters that minimize the cost-functional.
  • the set of equations is commonly reached by the Euler-Lagrange method.
  • common practice is to solve the set of equations iteratively. Iteratively solving the set of equations can be done by linearizing the equations (as necessary) using a technique such as Newton's method, and solving the linear set of equations by using an iterative solver, for example Jacobi or Gauss-Seidel. Techniques of solving functions to find a local minimum are known in the art and other techniques can be used as appropriate to the application.
  • the range image is then updated 316 and a new cost calculated 314 using the provided cost-functional for the updated generated range image.
  • How the generated range image is updated depends on the application. Experimentation has shown that it may not initially be clear how to update the generated range image to improve the cost.
  • the updates and costs can be used to determine a more methodical approach to updates. Updating of the generated range image and calculating of a cost are repeated until a given stopping criterion is reached. This stopping criterion can be provided, determined by the specific application of the method, or determined manually by examining costs and/or the range image as costs and/or the range image are calculated and updated.
  • the value of a cost (the score) is only relevant in relation to the value of other costs.
  • the comparison of the costs will indicate the improvement or degradation in the updated range image.
  • one stopping criterion is that the difference between the previous range map and the updated range map is within a given limit.
  • another stopping criterion is that the cost has decreased significantly compared to the initial cost.
  • the provided cost sub-functional uses the depth information in the generated range image to project the pixels in the generated range image onto an image other than the reference 2D image. Then the texture values of the pixels in the reference 2D image are compared to the texture value of the projected pixels in the one or more 2D images other than the reference 2D image.
  • a scene 401 contains an object 416.
  • a point P2 in the scene is associated with a data point P1 in reference 2D image 410 and a corresponding point in the generated range image.
  • Data point P1, also known as pixel P1, includes information from the generated range image including the depth 418 from camera angle 400 and information from the reference 2D image including the texture or grayscale of point P2.
  • this is an example of a cost sub-functional with a constraint on the data.
  • the camera angle 400 for reference 2D image 410 can be an existing camera angle associated with a provided 2D image, or the reference 2D image can be created from a virtual camera angle. The calculation of this cost sub-functional starts by using the known camera angle 400 of the reference 2D image 410 and the calculated depth 418 of a pixel P1 in the generated range image.
  • a pixel P1 in the reference 2D image 410 can be projected onto a 2D image 412 as pixel P3.
  • the texture value of pixel P1 in the reference 2D image 410 is compared to the texture value of pixel P3 in the 2D image 412, and the difference between the two textures is used as a value in the cost sub-functional.
  • a particularly successful implementation is using a grayscale to represent the pixels.
  • the calculation and comparison is repeated for other pixels in the reference 2D image.
  • the calculation and comparison is also repeated for additional images (for example, image 414 from camera angle 404).
  • Using multiple images provides additional information for each pixel, including multiple camera angles to increase robustness of the calculation, facilitating reducing the errors and noise inherent in determining the depth of a pixel in the generated range image.
  • the cost sub-functional with a constraint on the data further includes comparing the texture values of all possible pairs of 2D images from the plurality of 2D images of a scene.
  • the cost sub- functional with a constraint on the data can be calculated using other methods of comparing the texture values of 2D images from the plurality of 2D images of a scene.
  • the value of a cost sub-functional with a constraint on the data can be calculated by simply summing the differences in the values of the textures for corresponding pixels. Other methods for calculating the value of a cost function can be used depending on the specific application.
  • a provided cost sub-functional has a constraint on smoothness and compares the depth information for each of the pixels in the generated range image for piecewise smoothness. The calculation of this cost sub-functional compares the depth information of a pixel in the current generated range image to the depth information of neighboring pixels in the same current generated range image. Conventionally, a similar type of comparison is done by optical flow techniques to enforce smoothness on the motion field that describes the 2D movement of pixels between images.
  • the comparison is referred to as piecewise smooth because edges of objects in the range image are areas where the depth of neighboring pixels does not change smoothly, rather there is an abrupt change of depth at an edge.
  • the smoothness cost sub-functional can be defined as piecewise smooth, totally smooth, or other types of smoothness appropriate for the particular application.
  • the closeness in depth of a pixel to the neighbors of the pixel is used as a value in the cost sub-functional.
  • the comparison is repeated for other pixels in the current range image. Smoothness of depth can be used to correct noise and enforce constraints on the generated range image.
  • optical flow techniques do not use aperture windows, instead looking at individual pixels. Pixel comparison improves the accuracy in reconstruction of an object, particularly the edges of objects, and reduces possible fattening of objects.
  • the value of a cost sub-functional with a constraint on the smoothness can be calculated by simply summing the differences in the values of the depths for neighboring pixels. Other methods for calculating the value of a cost function can be used depending on the specific application. Convergence of the range image is limited by practical considerations. Convergence of a majority of the range image is normally desirable; in practice, not all of the range image will converge.
  • the resulting range images 318 can be post-processed 320 by conventional means depending on the application.
  • additional 2D images are provided and used to update the generated range image.
  • the above-described method is repeated on the provided 2D images to generate a plurality of range images.
  • the plurality of range images are processed to combine multiple range images of a scene, or of portions of a scene, to produce a single 3D model.
  • for range image merging, refer to Poisson Surface Reconstruction, Michael Kazhdan, Matthew Bolitho, and Hugues Hoppe, Eurographics Symposium on Geometry Processing (2006), and A Volumetric Method for Building Complex Models from Range Images, Brian Curless and Marc Levoy, Stanford University, presented at SIGGRAPH '96.
  • Another non-limiting example is the case where the location of the range image in the world is provided, and the range image is used to generate a 3D map, which can be in the form of a digital terrain map (DTM) or digital surface map (DSM).
  • DTM digital terrain map
  • DSM digital surface map
  • the results of postprocessing 320 include a 3D model 322 and/or other results depending on the application.
  • additional 2D images are provided and used to update or extend the 3D model.
  • New range images can also be used to increase the level of detail of a 3D model.
  • a processing system 502 includes one or more processors 504 configured with a variety of processing modules, depending on the implementation.
  • 2D images 500A, 500B, 500C are sent to an image-preprocessing module 506.
  • an image selection module 508 chooses key frames and provides a reference 2D image of the scene.
  • a range image generation module 510 generates a range image that can be optionally post-processed in post-processing module 512. The results of post-processing can be provided to a user 514 or sent to another location, such as storage 516.
  • One or more image providing devices are configured for providing a plurality of 2D images of a scene from a plurality of camera angles, each of the 2D images of a scene including pixels and associated texture values.
  • the image providing device is a digital picture camera 500A providing still images.
  • the image capture device is a digital video camera 500B providing video images.
  • If the amount of provided video images is greater than required for the specific application, the video images can be decimated as appropriate for the given application.
  • One or more image providing devices can provide the 2D images simultaneously.
  • Another example of an image-providing device is a cellular phone using its camera function to capture an image.
  • the 2D images are provided from storage 500C.
  • the 2D images are provided from a combination of sources.
  • the types of images include, but are not limited to visible and infrared (IR), from sources including, but not limited to aerial photographs, video from vehicle mounted cameras, photographs taken from street-level, and satellite imagery (raw images with projection information or
  • 2D images are sent to a processing system 502 containing one or more processors 504 configured with a variety of processing modules, depending on the implementation.
  • 2D images can be preprocessed by image preprocessing module 506, as described above.
  • an image selection module 508 chooses key frames and provides a reference 2D image of the scene.
  • the reference 2D image of the scene includes pixels and associated texture values.
  • the chosen 2D images and reference 2D image are used by the range image generation module 510 to calculate a generated range image.
  • An initial generated range image is provided which includes pixels and associated depth information.
  • the pixels of the generated range image correspond to the pixels of the reference 2D image.
  • While the term image is used for clarity in this description, it should be understood that image also refers to a portion of the image, a subsample of the image, or other sub-set of data from the 2D image.
  • the reference 2D image of the scene is provided by choosing an image from the plurality of 2D images of a scene.
  • the initial generated range image can be defined using a variety of techniques depending on the application.
  • the initial generated range image is defined by using a structure from motion (SFM) technique with the provided 2D images.
  • the initial generated range image is defined by using optical flow with the 2D images.
  • the initial generated range image is defined by using the technique of simultaneous location and mapping (SLAM) with the 2D images.
  • SLAM simultaneous location and mapping
  • the initial generated range image is defined by calculating an average plane from sparse feature matches of the 2D images.
  • the position and orientation of the initial generated range image is known in world coordinates, a DTM or DSM is given, and the initial generated range image is calculated using range calculations to the DTM or DSM.
  • a cost is calculated for the initial generated range image using a variational method with at least one cost sub-functional, as described above.
  • the at least one cost sub-functional further includes a cost sub-functional that compares the depth information of the pixels from the generated range image to calculate a smoothness value.
  • the costs for all the cost sub-functionals are combined to provide a cost (score) for the initial generated range image.
  • the initial generated range image is then updated based on the calculated cost and a new cost (score) calculated on the updated generated range image.
  • the process of updating the generated range image and calculating a new cost is repeated until a given stopping criterion is reached, as described above.
  • additional 2D images are provided and the processing system uses the additional 2D images to update the generated range image.
  • the above-described process is repeated on the provided 2D images to generate a plurality of generated range images.
  • a postprocessing module 512 is configured to use one or more generated range images to generate a three-dimensional (3D) model of the scene.
  • additional 2D images are provided and used to update the 3D model. The results of post-processing can be provided to a user 514 or sent to another location, such as storage 516.
  • modules and processing are possible, depending on the application.
  • multiple image preprocessing modules 506 can be used, where each image-preprocessing module configured to process one or more types of 2D images.
  • the image selection module 508 can be implemented as two modules, where a first module chooses key frames and a second module provides a reference 2D image of the scene.
  • image preprocessing can be done after the 2D images have been selected 508. Based on this description, further variations will be obvious to one skilled in the art.
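
The items above describe the two cost sub-functionals and their iterative minimization in prose; the following Python/NumPy sketch makes them concrete. It is a minimal illustration under simplifying assumptions, not the patented implementation: a single calibrated pinhole pair stands in for the plurality of views (intrinsics K, and rotation R and translation t from the reference camera to the other camera), grayscale arrays stand in for texture, nearest-neighbor sampling replaces interpolation, and plain gradient descent replaces the Euler-Lagrange/Gauss-Seidel machinery described above. All function and parameter names are hypothetical.

```python
import numpy as np

def data_residuals(r, I_ref, I_other, K, R, t):
    """Per-pixel squared grayscale difference after projecting each
    reference pixel, at its current range, into the other view (Ld)."""
    h, w = r.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    rays = np.linalg.inv(K) @ np.stack([u, v, np.ones_like(u)]).reshape(3, -1)
    X = rays * r.ravel()                       # 3D points in the reference frame
    x = K @ (R @ X + t[:, None])               # project into the other view
    with np.errstate(divide='ignore', invalid='ignore'):
        pu = np.round(x[0] / x[2])             # nearest-neighbor sampling
        pv = np.round(x[1] / x[2])
    ok = np.where((x[2] > 1e-9) & (pu >= 0) & (pu < w) & (pv >= 0) & (pv < h))[0]
    res = np.zeros(h * w)
    res[ok] = (I_ref.ravel()[ok] - I_other[pv[ok].astype(int), pu[ok].astype(int)]) ** 2
    return res.reshape(h, w)

def smoothness_gradient(r):
    """Analytic gradient of Ls, the sum of squared neighbor depth differences."""
    p = np.pad(r, 1, mode='edge')              # edge padding: no cost at borders
    return 2 * (4 * r - p[:-2, 1:-1] - p[2:, 1:-1] - p[1:-1, :-2] - p[1:-1, 2:])

def refine_range_image(r0, I_ref, I_other, K, R, t,
                       alpha=0.1, lr=1e-3, tol=1e-6, max_iters=500, eps=1e-2):
    """Iteratively update the range image until the cost stops improving."""
    r = r0.astype(float)
    prev = np.inf
    for _ in range(max_iters):
        Ld = data_residuals(r, I_ref, I_other, K, R, t).sum()
        Ls = (np.diff(r, axis=0) ** 2).sum() + (np.diff(r, axis=1) ** 2).sum()
        cost = Ld + alpha * Ls                 # L(r) = Ld + a * Ls
        if abs(prev - cost) < tol:             # stopping criterion: cost settled
            break
        prev = cost
        # Each pixel's data residual depends only on its own range value, so a
        # central finite difference gives an elementwise data-term gradient.
        g = (data_residuals(r + eps, I_ref, I_other, K, R, t)
             - data_residuals(r - eps, I_ref, I_other, K, R, t)) / (2 * eps)
        r = np.maximum(r - lr * (g + alpha * smoothness_gradient(r)), 1e-3)
    return r
```

In practice, more 2D images would contribute additional data terms, the texture would be interpolated rather than rounded to the nearest pixel, and the minimization would use the linearized Euler-Lagrange equations with an iterative solver such as Jacobi or Gauss-Seidel, as described above.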

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Reconstruction of range images from multiple two-dimensional images uses 2D images of a scene, defining a reference 2D image and then defining an initial generated range image associated with the reference 2D image. The initial generated range image includes depth information for at least a portion of the pixels of the reference 2D image. A cost-functional is provided to calculate a cost reflecting how well the generated range image matches a set of criteria. Each criterion is evaluated as a cost sub-functional of the cost-functional, together with a factor that reflects the criterion's importance to the overall cost. A variational method is used to find an optimal generated range image in regards to this cost. The variational method includes repeatedly updating the generated range image and re-calculating the cost using the cost-functional until a given stopping criterion is reached.

Description

System and method for reconstruction of range images from multiple two-dimensional images using a range based variational method
FIELD OF THE INVENTION
The present embodiment generally relates to the field of image processing and computer vision, and in particular, concerns a system and method for reconstruction of range images.
BACKGROUND OF THE INVENTION
In the field of image processing, generating three-dimensional (3D) models has many uses. 3D models are used in many applications including automatic analysis of a scene, understanding navigability of a scene, visibility of points from different areas, and segmentation of 3D models (both in civilian and military applications). One popular application uses a three-dimensional model to generate a view of a scene for a user. One or more cameras, or more generally image capture devices, can capture images of a real scene. In a case where the user desires to view the scene from an angle other than the angle from which the original images were captured, the application must render a view of the scene from the desired viewpoint of the user. A variety of conventional techniques exists for generating views from a new viewpoint, also known as a virtual location, or a virtual camera angle. One of the challenges in generating new views from images is accurately rendering objects in a view from a new (virtual) camera angle. Using a three-dimensional model of the scene to be rendered is a known method for improving view generation. Techniques for using a three-dimensional model to facilitate the generation of views of a scene are known in the industry.
Given the existence of well-known techniques to combine range images to create three-dimensional (3D) models, a challenge is to generate more accurate range images from two-dimensional (2D) images. Range images can also be used for many other applications. In the context of this document, the term "3D model" refers to a model that includes descriptions of surfaces. The term "range image", also known as a range map, or 3D map, refers to a collection of information including ranges from a viewpoint to points in a scene. One non-limiting example of a range image is the output of a LIDAR (Light Detection and Ranging) system that generates ranges from the viewpoint of the LIDAR to objects in the scene being captured. A known approach to generating 3D models is to use feature identification from one image to other images and then use linear or non-linear multi-view triangulation of the feature to determine the distance from the reference camera location to the feature. The images are correlated by tracking features such as corner points (edges with gradients in multiple directions) from one image to the next. The feature trajectories over time are then used to reconstruct the 3D positions of the features. Triangulation using two or more camera views provides 3D positions for the features, and these 3D positions are combined to create a 3D model or a range image. An advantage of feature matching and triangulation techniques is being able to use multiple images to improve processing and reconstruction of the 3D positions of features in a scene. Disadvantages of feature matching and triangulation techniques include known limitations when a scene (and the corresponding 2D images of the scene) has smooth areas and/or areas of low texture. Feature correlation generally fails for smooth and low texture areas, resulting in the failure of this family of algorithms to reconstruct the 3D positions of smooth and low texture points in the scene. Correlation is also dependent on the size of the aperture window. The aperture window includes multiple pixels to facilitate correlation of areas between images. A consequence of this correlation aperture is the loss of accuracy in reconstruction of an object, particularly the edges of objects, and possible fattening of objects.
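
For background, the triangulation step described above can be illustrated with the standard linear (direct linear transform) method, which recovers a feature's 3D position from its pixel coordinates in two views with known 3x4 projection matrices. This is a generic textbook construction, not text from the patent, and the names are illustrative.

```python
import numpy as np

def triangulate_linear(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one tracked feature seen in two views.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: (u, v) pixel coordinates of the matched feature.
    Returns the 3D point minimizing the algebraic error."""
    A = np.array([
        x1[0] * P1[2] - P1[0],                 # two rows per view
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)                # least-squares null vector of A
    X = vt[-1]
    return X[:3] / X[3]                        # dehomogenize
```

Repeating this for every tracked feature, over two or more views, yields the sparse 3D positions that are then combined into a 3D model or range image.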
Another known approach for 3D reconstruction is to use optical flow techniques. Optical flow is an approximation to image motion, defined as the projection of velocities of 3D surface points onto the imaging plane of a visual sensor. In other words, 2D image motion is the projection of the 3D motion of objects, relative to a visual sensor, onto the 2D image plane. Sequences of time-ordered images allow the estimation of projected 2D image motion as either instantaneous image velocities or discrete image displacements. These are usually called the optical flow field or the image velocity field. Provided that optical flow is a reliable approximation to 2D image motion, optical flow may then be used to generate a 3D model of the surface structure (shape or relative depth) through assumptions concerning the structure of the optical flow field, the 3D environment, and the motion of the sensor. Refer to The Computation of Optical Flow, Beauchemin and Barron, ACM Computing Surveys, Vol. 27, No. 3, September 1995, for additional background information on optical flow. The optical flow field, together with the known camera angles of the two images, is used to create a range map (or a 3D point cloud) using triangulation techniques. By definition, optical flow is a solution to the problem of determining how points (pixels) move in relation to each other in a pair of images. The motion of pixels in a 2D image, in other words motion in the (x, y) plane, is determined. Advantages of optical flow include being able to reconstruct the 3D positions of smooth and low texture areas in a scene. 3D information for smooth and low texture areas is derived from the 3D information of features relatively local to the smooth and low texture areas. Disadvantages of optical flow include the sensitivity of optical flow techniques to large differences between the input images, and being limited to processing only one pair of images at a time. Optical flow techniques are unable to make optimal use of all the available images.
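
For intuition on how a flow field yields range, consider the simplified rectified case in which the flow between the two images is purely horizontal and equals the stereo disparity; depth then follows from Z = f*B/d. The sketch below assumes this special geometry (focal length f in pixels, baseline B in meters) and is only a hedged illustration; general camera motion requires the triangulation described above.

```python
import numpy as np

def depth_from_horizontal_flow(flow_u, focal_px, baseline_m, min_disp=1e-6):
    """Depth map from the horizontal flow component of a rectified pair:
    Z = f * B / d, where d is the per-pixel disparity in pixels."""
    disparity = np.maximum(np.abs(flow_u), min_disp)   # avoid division by zero
    return focal_px * baseline_m / disparity
```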
Other known approaches for 3D reconstruction include using RADAR (RAdio Detection and Ranging), LIDAR (Light Detection and Ranging), and structured lighting to generate range images. These approaches are also known as active techniques. Active techniques require the generation of a known input, such as radio waves or light, to be captured and analyzed to generate 3D position information of the points in a scene, generally in the form of a range image. Advantages of active techniques include reconstruction of 3D positions of smooth and low texture areas of a scene. Disadvantages of active techniques include having to provide an active input and lack of texture information (necessary for generating novel views of the scene). Techniques for generating 3D models from range images are known in the art.
It is desirable to have a system and method for reconstruction of 3D models of a scene, particularly in cases where the scene includes smooth areas and/or areas of low texture. It is further desirable for this system and method to be able to reconstruct 3D models passively from 2D images of a scene.
SUMMARY
According to the teachings of the present embodiment there is provided a method for generating range images, including the steps of: providing a plurality of two-dimensional (2D) images of a scene from a plurality of viewing directions, each of the 2D images of the scene including texture values associated with pixels of the 2D image; defining a reference 2D image of the scene; defining an initial generated range image of the scene associated with the reference 2D image of the scene, the generated range image including depth information associated with pixels of the generated range image, and the pixels of the generated range image corresponding to the pixels of the reference 2D image of the scene; providing a cost- functional including at least one cost sub-functional, wherein one cost sub-functional uses the generated range image to project pixels from the reference 2D image according to a viewing direction of at least one of the plurality of 2D images of a scene to find projected pixels, and compares the texture values of the projected pixels with texture values of corresponding pixels from the 2D image associated with the viewing direction; and updating the generated range image using a variational method including: calculating a cost for the cost-functional and updating the generated range image based on the calculated cost until a given stopping criterion has been reached.
In an optional embodiment, the method defines the reference 2D image of the scene by choosing an image from the plurality of 2D images of a scene.
In an optional embodiment, the method defines the initial generated range image by using a structure from motion (SFM) technique with the plurality of 2D images of a scene. In another optional embodiment, the method defines the initial generated range image by using optical flow on images from the plurality of 2D images of a scene. In another optional embodiment, the method defines the initial generated range image by using the technique of simultaneous location and mapping (SLAM) with the plurality of 2D images of a scene. In another optional embodiment, the method defines the initial generated range image by calculating an average plane from sparse feature matches of the plurality of 2D images of a scene. In another optional embodiment, the method defines the initial generated range image using a digital terrain map (DTM) corresponding to at least part of the scene. In another optional embodiment, the method defines the initial generated range image by assigning a uniform range value to all points in the initial generated range image.
In an optional embodiment, the cost- functional further includes a second cost sub- functional that processes the generated range image to calculate a smoothness value.
In an optional embodiment, the method further includes updating the generated range image when additional 2D images become available.
In an optional embodiment, the range image is used to generate a three-dimensional (3D) model of the scene.
In an optional embodiment, the method is repeated, defining different reference 2D images thereby generating a plurality of range images.
In an optional embodiment, one or more of the plurality of range images are used to generate a three-dimensional (3D) model of the scene. In an optional embodiment, the method further includes updating the 3D model when additional 2D images become available.
According to the teachings of the present embodiment there is provided a system for generating range images, including: one or more image providing devices configured for providing a plurality of two-dimensional (2D) images of a scene from a plurality of viewing directions, each of the plurality of 2D images of a scene including texture values associated with pixels of the 2D image; a processing system containing one or more processors configured for: defining a reference 2D image of the scene; defining an initial generated range image of the scene associated with the reference 2D image of the scene, the generated range image including depth information associated with pixels of the generated range image, and the pixels of the generated range image corresponding to the pixels of the reference 2D image of the scene; providing a cost-functional including at least one cost sub-functional, wherein one cost sub-functional uses the generated range image to project pixels from the reference 2D image according to a viewing direction of at least one of the plurality of 2D images of a scene to find projected pixels, and compares the texture values of the projected pixels with texture values of corresponding pixels from the 2D image associated with the viewing direction; and updating the generated range image using a variational method including: calculating a cost for the cost-functional and updating the generated range image based on the calculated cost until a given stopping criterion has been reached.
In an optional embodiment, the one or more image providing devices includes a digital picture camera. In another optional embodiment, the one or more image providing devices includes a digital video camera. In another optional embodiment, the one or more image providing devices includes a storage system. In an optional embodiment, the plurality of 2D images of a scene are infrared (IR) images.
In an optional embodiment, the processing system is configured to define the reference
2D image of the scene by choosing an image from the plurality of 2D images of a scene. In another optional embodiment, the processing system is configured to define the initial generated range image by using a structure from motion (SFM) technique with the plurality of 2D images of a scene. In another optional embodiment, the processing system is configured to define the initial generated range image by using optical flow on images from the plurality of 2D images of a scene. In another optional embodiment, the processing system is configured to define the initial generated range image by using the technique of simultaneous location and mapping (SLAM) with the plurality of 2D images of a scene. In another optional embodiment, the processing system is configured to define the initial generated range image by calculating an average plane from sparse feature matches of the plurality of 2D images of a scene. In another optional embodiment, the processing system is configured to define the initial generated range image using a digital terrain map (DTM) corresponding to at least part of the scene. In another optional embodiment, the processing system is configured to define the initial generated range image by assigning a uniform range value to all points in the initial generated range image.
In an optional embodiment, the processing system is configured to provide a second cost sub-functional that processes the generated range image to calculate a smoothness value.
In an optional embodiment, the processing system is configured to update the generated range image when additional 2D images become available.
In an optional embodiment, the processing system is further configured to generate a three-dimensional (3D) model of the scene.
In an optional embodiment, the processing is repeated, defining different reference 2D images, thereby generating a plurality of range images.
In an optional embodiment, the processing system is further configured to generate a three-dimensional (3D) model of the scene using one or more of the plurality of range images.
In an optional embodiment, the processing system is further configured to update the 3D model when additional 2D images become available.
BRIEF DESCRIPTION OF FIGURES
The embodiment is herein described, by way of example only, with reference to the accompanying drawings, wherein:
FIGURE 1, a flowchart of a method for reconstruction of range images from multiple two-dimensional images using a range based variational method.
FIGURE 2, an example diagram of a cost sub-functional with a constraint on the data.
FIGURE 3, a diagram of a system for generating range images.
DETAILED DESCRIPTION
The principles and operation of this system and method according to the present embodiment may be better understood with reference to the drawings and the accompanying description. One embodiment of a system and method for reconstruction of range images from multiple two-dimensional (2D) images using a range based variational method overcomes the limitations of conventional techniques and provides increased success in generation of range images from 2D images. This innovative technique generates range images similar to LIDAR generated range images but does not require the use of an active input, instead using passively collected 2D images. Additionally, this technique has access to the original texture in the 2D images, in contrast to LIDAR, which does not provide texture information. This technique is not limited to processing only a pair of images at a time, instead processing multiple images. This technique is not limited to implementations requiring a local aperture window, instead being able to compare single pixels, hence improving processing and reconstruction of the 3D positions of edges in a scene. An additional advantage is that this method can be implemented without requiring non-volatile storage of data during processing, facilitating implementation on a graphics card or similar hardware. The range images can then be used to generate a 3D model of the scene.
Conventional feature matching and triangulation techniques do not successfully process smooth and low-texture areas in the input 2D images, and the resulting output has "holes" or non-value areas. In contrast, using an implementation of the present description, smooth and low-texture areas in the input 2D images are processed successfully and grayscale areas can be determined for all surfaces.
Referring to FIGURE 1, a flowchart of a method for reconstruction of range images from multiple two-dimensional images using a range based variational method, 2D images 300 of a scene are optionally preprocessed 302 and camera angles are calculated 304. From the 2D images, key frames 308 are chosen 306. From the 2D images, or using another method, a reference 2D image 308 is defined 307. A generated range image is generated 310 by first defining 312 an initial generated range image associated with the reference 2D image. The initial generated range image includes depth information for at least a portion of the pixels of the reference 2D image. A cost-functional is provided to calculate a cost 314 reflecting how well the generated range image matches a set of criteria. Each criterion is evaluated as a cost sub-functional of the cost-functional, together with a factor that reflects the criterion's importance to the overall cost. A variational method is used to find an optimal generated range image in regards to this cost. The variational method includes repeatedly updating 316 the generated range image and re-calculating 314 the cost using the cost-functional until a given stopping criterion is reached. The resulting generated range image for the reference 2D image 318 can be post-processed 320 to provide a final 3D model 322. Each of the plurality of 2D images 300 of a scene includes texture values associated with pixels of the 2D image. In the context of this document, texture refers to a value corresponding to the image content of the pixel, in other words representing the intensity of a surface, or representing how the pixel can be viewed. Typically, texture corresponds to the color or grayscale (of the pixel) in the form of a red-green-blue (RGB) value, although other image content such as infrared (IR) can be represented. The 2D images may be optionally preprocessed 302, including changing the data format, size, normalization, calculating image related information, and other image processing necessary to prepare the image or related information. The 2D images can be provided with associated camera angle information. In this context, camera angle information, or more simply the camera angle, also known as the viewing direction, can include information such as the position and orientation of the image capture device (camera) in relation to the scene being imaged. In this field, a camera angle (viewing direction) is generally provided with the image, but this provided camera angle is generally not sufficiently accurate for the calculations that need to be performed, and so the camera angle needs to be optionally calculated (corrected) 304. If camera angles are not provided with the images, camera angles can be generated using known techniques such as structure from motion (SFM) techniques. Techniques to calculate camera angles are known in the art. In a case where the camera is moving, or multiple cameras in known locations capture images of a scene, ego motion algorithms can be used to determine camera information from the images. The output of an ego motion algorithm includes the camera information associated with the input image, including the position and orientation of the camera relative to the scene.
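
The flow just described can be summarized as a driver loop. The following sketch mirrors the numbered blocks of FIGURE 1; because the text leaves the concrete technique for each stage open, every stage is passed in as a callable. This is a structural sketch with hypothetical names, not the patented implementation.

```python
def reconstruct_range_image(images, preprocess, calc_camera_angles,
                            choose_key_frames, init_range_image,
                            cost_functional, update_range_image,
                            tol=1e-6, max_iters=500):
    """Drive the FIGURE 1 flow: preprocess, pick frames, then iterate."""
    images = [preprocess(im) for im in images]              # block 302 (optional)
    cameras = calc_camera_angles(images)                    # block 304
    key_frames = choose_key_frames(images, cameras)         # block 306
    reference = key_frames[len(key_frames) // 2]            # block 307: middle frame
    r = init_range_image(reference, key_frames, cameras)    # block 312
    prev_cost = float('inf')
    for _ in range(max_iters):                              # variational loop
        cost = cost_functional(r, reference, key_frames, cameras)    # block 314
        if abs(prev_cost - cost) < tol:                     # stopping criterion
            break
        prev_cost = cost
        r = update_range_image(r, reference, key_frames, cameras)    # block 316
    return r                                                # result 318
```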
Key frames are chosen in block 306 and a reference 2D image is defined in block 307. In this context, key frames are 2D images chosen, based on criteria, from the plurality of 2D images for further processing. Depending on the application, criteria can include an image baseline, image similarity, the difference between images, and computation limitations. A reference 2D image of the scene is defined, the reference 2D image including texture values associated with pixels of the 2D image. A preferred implementation is to choose the reference 2D image from the plurality of 2D images. Testing has shown that a preferable implementation is for the reference 2D image to be a middle frame from a sequence of video images or a middle image from a range of still cameras.
The key frames and the reference 2D image 308 can be used to generate a generated range image 310. In one implementation, an initial generated range image is defined 312 using the reference 2D image and at least one of the key 2D images. The generated range image includes depth information for pixels of at least a portion of the reference 2D image. The initial generated range image can be from an existing camera angle or from a new (virtual) camera angle. Techniques for generating a range image from 2D images are known in the art. One conventional technique is to use structure from motion (SFM) to generate the range image. SFM generates a sparse range image, and SFM post-processing can be used to increase range image detail. Optical flow, linear triangulation, and non-linear triangulation are other conventional techniques that can be used to generate a range image. The technique of simultaneous location and mapping (SLAM) can be used to generate a range image of an unknown environment (without a priori knowledge) or a known environment (with a priori knowledge) while at the same time keeping track of the current location. A simple technique for calculating an initial range image to initialize the algorithm is to use a range estimate. Another technique is to calculate an average plane from sparse feature matches (for example, given a sparse matching of points between two images, calculate a plane in three dimensions that best approximates the matches). A disadvantage of calculating an average plane is the lack of 3D geometry in the generated range image. In a case where sufficiently accurate position information (for example, from a global positioning system [GPS]) is available, the position information can be used to calculate an initial range image by calculating the range of each pixel to a provided digital terrain map. Another technique is to interpolate a sparse range map (a range map calculated on sparsely selected points) to generate a dense range map. A minimal sketch of the average-plane initialization follows.
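As one concrete example, the average-plane initialization can be sketched as a least-squares fit of z = a·x + b·y + c to sparse triangulated points, then sampled at every pixel. This is a simplified sketch that treats range as depth over the image grid; the function names and the planar parameterization are illustrative assumptions, not the patent's prescribed implementation:

```python
import numpy as np

def fit_average_plane(points):
    """Least-squares fit of the plane z = a*x + b*y + c to sparse
    triangulated 3D points (an N x 3 array from feature matches)."""
    A = np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])
    (a, b, c), *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return a, b, c

def initial_range_image(height, width, a, b, c):
    """Sample the fitted plane at every pixel for a dense initial range map."""
    ys, xs = np.mgrid[0:height, 0:width]
    return a * xs + b * ys + c
```

As the text notes, such an initialization carries no 3D geometry of its own; it merely gives the variational iteration a plausible starting surface.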
Using the initial generated range image and at least one cost sub-functional, a cost is calculated 314 for the range image to provide a cost (score) for the range image. A variational method is a term known in the field and refers to finding the parameters that minimize a cost-functional. In this case, the parameter of interest is the range image (which gives a range for each pixel), and the cost-functional is the provided cost-functional. Variational methods are used to solve the minimization problem for the cost-functional, which includes one or more cost sub-functionals, to find the generated range image (which in this case may be more easily understood as a 3D map, or even more simply as a collection of pixel depths) that minimizes the cost. The cost-functional can be expressed as:
L(r) = L_d + a·L_s
Where "L" is a cost- functional calculated on a range image "r". '.Ld" is one of the at least one cost sub-functionals with a constraint on the data, in this case the grey scale data, or texture, as will be described below, "a" is a scale factor (weight). "Ls" is an optional second cost sub-functional with a constraint on smoothness, as will be described below. Additional scale factors and cost sub-functionals can be added to the cost-functional as appropriate to the application. The calculated costs of at least two cost sub-functionals are combined to generate a cost (score) for the current range image. Multiple images can be used to calculate a cost for a pixel.
One technique for finding a minimum of a cost-functional is to transform the cost-functional into a set of equations, and then solve the set of equations for the parameters that minimize the cost-functional. The set of equations is commonly reached by the Euler-Lagrange method. In cases where the set of equations cannot be solved directly, common practice is to solve the set of equations iteratively. Iteratively solving the set of equations can be done by linearizing the equations (as necessary) using a technique such as Newton's method, and then solving the linear set of equations with an iterative solver, for example Jacobi or Gauss-Seidel. Techniques for solving such systems to find a local minimum are known in the art, and other techniques can be used as appropriate to the application. A generic sketch of one such iterative solver follows.
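For reference, a textbook Jacobi iteration for a linear system A x = b can be sketched as below. This is a generic solver in Python with NumPy, not the specific linearized system that the Euler-Lagrange equations of this method would produce:

```python
import numpy as np

def jacobi(A, b, iters=200, tol=1e-8):
    """Solve A x = b by Jacobi iteration; convergence is guaranteed when A
    is strictly diagonally dominant, as is typical of linear systems from
    discretized Euler-Lagrange equations."""
    x = np.zeros_like(b, dtype=float)
    D = np.diag(A)                 # diagonal entries of A
    R = A - np.diagflat(D)         # off-diagonal remainder
    for _ in range(iters):
        x_new = (b - R @ x) / D    # update every unknown from the last iterate
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

Gauss-Seidel differs only in using each freshly updated unknown within the same sweep, which usually converges faster at the price of less parallelism.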
The range image is then updated 316 and a new cost calculated 314 using the provided cost-functional for the updated generated range image. How the generated range image is updated depends on the application. Experimentation has shown that it may not initially be known how to update the generated range image so as to improve the cost. After calculating a plurality of costs using a pre-defined update strategy, the updates and costs can be used to determine a more methodical approach to updates. Updating of the generated range image and calculating of a cost are repeated until a given stopping criterion is reached. This stopping criterion can be provided, determined by the specific application of the method, or determined manually by examining costs and/or the range image as they are calculated and updated. Note that the value of a cost (the score) is only relevant in relation to the value of other costs. As generated range images are updated and costs calculated, the comparison of the costs will indicate the improvement or degradation in the updated range image. When a generated range image has reached a given stopping criterion, the range image can be considered a successful solution. In one implementation, the stopping criterion is that the difference between the previous range map and the updated range map is within a given limit. In another implementation, the stopping criterion is that the cost has decreased significantly compared to the initial cost. Both criteria are sketched below.
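The two stopping criteria named above might be checked as follows; this is a minimal sketch, and the threshold values are illustrative assumptions, not values specified by the method:

```python
import numpy as np

def has_converged(r_prev, r_new, cost_new, initial_cost,
                  range_tol=1e-3, cost_fraction=0.01):
    """First criterion: the range map changed by less than range_tol at
    every pixel. Second criterion: the cost fell below a small fraction
    of the initial cost."""
    range_stable = np.max(np.abs(r_new - r_prev)) < range_tol
    cost_reduced = cost_new < cost_fraction * initial_cost
    return range_stable or cost_reduced
```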
Referring to FIGURE 2, an example diagram of a cost sub-functional with a constraint on the data is shown. In one implementation, the provided cost sub-functional uses the depth information in the generated range image to project the pixels in the generated range image onto an image other than the reference 2D image. Then the texture values of the pixels in the reference 2D image are compared to the texture values of the projected pixels in the one or more 2D images other than the reference 2D image. A scene 401 contains an object 416. A point P2 in the scene is associated with a data point P1 in reference 2D image 410 and a corresponding point in the generated range image. Data point P1, also known as pixel P1, includes information from the generated range image, including the depth 418 from camera angle 400, and information from the reference 2D image, including the texture or grayscale of point P2. Note that this is an example of a cost sub-functional with a constraint on the data. Also note that the camera angle 400 for reference 2D image 410 can be an existing camera angle associated with a provided 2D image, or the reference 2D image can be created from a virtual camera angle.

The calculation of this cost sub-functional starts by using the known camera angle 400 of the reference 2D image 410 and the calculated depth 418 of a pixel P1 in the generated range image. Given an image 412 and the associated camera angle 402, a pixel P1 in the reference 2D image 410 can be projected onto the 2D image 412 as pixel P3. The texture value of pixel P1 in the reference 2D image 410 is compared to the texture value of pixel P3 in the 2D image 412, and the difference between the two textures is used as a value in the cost sub-functional. Experimentation has shown that a particularly successful implementation is using grayscale to represent the pixels. The calculation and comparison are repeated for other pixels in the reference 2D image. The calculation and comparison are also repeated for additional images (for example, image 414 from camera angle 404). Using multiple images provides additional information for each pixel, including multiple camera angles to increase the robustness of the calculation, facilitating reduction of the errors and noise inherent in determining the depth of a pixel in the generated range image. A minimal sketch of this projection-and-compare computation follows.
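The following Python sketch assumes calibrated pinhole cameras: the intrinsic matrices K_ref and K_other and the relative pose (R, t) are assumptions of the sketch, since the patent speaks only of camera angles. Equal image sizes, a squared grayscale difference, nearest-pixel sampling, and no occlusion handling are further simplifications:

```python
import numpy as np

def data_cost(depth, ref_img, other_img, K_ref, K_other, R, t):
    """Project each reference pixel into the other view using its depth and
    sum squared grayscale differences. Back-projection X = d * K_ref^-1 [u,v,1]
    recovers the scene point (P2 in FIGURE 2); R, t map reference-camera
    coordinates into the other camera's frame."""
    h, w = depth.shape
    K_ref_inv = np.linalg.inv(K_ref)
    cost = 0.0
    for v in range(h):
        for u in range(w):
            X = depth[v, u] * (K_ref_inv @ np.array([u, v, 1.0]))  # P1 -> P2
            x = K_other @ (R @ X + t)                              # P2 -> image 412
            if x[2] <= 0:
                continue                                           # behind camera
            u2, v2 = int(round(x[0] / x[2])), int(round(x[1] / x[2]))
            if 0 <= u2 < w and 0 <= v2 < h:                        # P3 in bounds
                diff = float(ref_img[v, u]) - float(other_img[v2, u2])
                cost += diff * diff                                # texture mismatch
    return cost
```

Summing this cost over several non-reference views corresponds to the multi-image comparison described above.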
In another implementation, the cost sub-functional with a constraint on the data further includes comparing the texture values of all possible pairs of 2D images from the plurality of 2D images of a scene. Depending on the specific application, the cost sub-functional with a constraint on the data can be calculated using other methods of comparing the texture values of 2D images from the plurality of 2D images of a scene.
In one implementation, the value of a cost sub-functional with a constraint on the data can be calculated by simply summing the differences in the values of the textures for corresponding pixels. Other methods for calculating the value of a cost function can be used depending on the specific application.

In another implementation, a provided cost sub-functional has a constraint on smoothness and compares the depth information for each of the pixels in the generated range image for piecewise smoothness. The calculation of this cost sub-functional compares the depth information of a pixel in the current generated range image to the depth information of neighboring pixels in the same current generated range image. Conventionally, a similar type of comparison is done by optical flow techniques to enforce smoothness on the motion field that describes the 2D movement of pixels between images. Defining a cost of smoothness (L_s in the above equation) is known from the field of optical flow. For a description of optical flow and smoothness constraints refer, for example, to B. Horn and B. Schunck, Determining Optical Flow, Artificial Intelligence, 17:185-203, 1981, and to T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, High Accuracy Optical Flow Estimation Based on a Theory for Warping, in T. Pajdla and J. Matas, editors, Computer Vision - ECCV 2004, volume 3024 of Lecture Notes in Computer Science, pages 25-36, Springer, Berlin, 2004. Comparing the depth information determines the smoothness of the change in depth for neighboring pixels. The comparison is referred to as piecewise smooth because edges of objects in the range image are areas where the depth of neighboring pixels does not change smoothly; rather, there is an abrupt change of depth at an edge. Depending on the application, the smoothness cost sub-functional can be defined as piecewise smooth, totally smooth, or another type of smoothness appropriate for the particular application. The closeness in depth of a pixel to the neighbors of the pixel is used as a value in the cost sub-functional. The comparison is repeated for other pixels in the current range image. Smoothness of depth can be used to correct noise and enforce constraints on the generated range image. In addition, optical flow techniques do not use aperture windows, instead looking at individual pixels. Pixel comparison improves the accuracy in reconstruction of an object, particularly the edges of objects, and reduces possible fattening of objects. A minimal sketch of the neighbor comparison follows.
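The sketch below assumes a dense depth array and uses a plain absolute-difference penalty; a piecewise-smooth variant would substitute a robust, edge-preserving penalty of the kind used in the optical flow literature cited above:

```python
import numpy as np

def smoothness_cost(depth):
    """Sum absolute depth differences between horizontal and vertical
    neighbors. This is the 'totally smooth' variant; a piecewise-smooth
    variant would tolerate large jumps at object edges."""
    dx = np.abs(np.diff(depth, axis=1))  # left-right neighbor differences
    dy = np.abs(np.diff(depth, axis=0))  # up-down neighbor differences
    return float(dx.sum() + dy.sum())
```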
In one implementation, the value of a cost sub-functional with a constraint on smoothness can be calculated by simply summing the differences in the values of the depths for neighboring pixels, as in the sketch above. Other methods for calculating the value of a cost function can be used depending on the specific application.

Convergence of the range image is limited by practical considerations. Convergence of a majority of the range image is normally desirable, and in practice not all of the range image will converge. The resulting range images 318 can be post-processed 320 by conventional means depending on the application. In an optional implementation, additional 2D images are provided and used to update the generated range image. In another optional implementation, the above-described method is repeated on the provided 2D images to generate a plurality of range images. In another optional implementation, the plurality of range images is processed to combine multiple range images of a scene, or of portions of a scene, to produce a single 3D model. For techniques for range image merging, refer to Poisson Surface Reconstruction by Michael Kazhdan, Matthew Bolitho, and Hugues Hoppe, Eurographics Symposium on Geometry Processing (2006), and A Volumetric Method for Building Complex Models from Range Images by Brian Curless and Marc Levoy, Stanford University, presented at SIGGRAPH '96. Another non-limiting example is the case where the location of the range image in the world is provided, and the range image is used to generate a 3D map, which can be in the form of a digital terrain map (DTM) or digital surface map (DSM). The results of post-processing 320 include a 3D model 322 and/or other results depending on the application. In another optional implementation, additional 2D images are provided and used to update or extend the 3D model. New range images can also be used to increase the level of detail of a 3D model.
Referring to FIGURE 3, a diagram of a system for generating range images, the system inputs a plurality of two-dimensional (2D) images of a scene 500A, 500B, and 500C. A processing system 502 includes one or more processors 504 configured with a variety of processing modules, depending on the implementation. In one implementation, 2D images 500A, 500B, 500C are sent to an image-preprocessing module 506. From the pre-processed images 506, or optionally from storage 500C, an image selection module 508 chooses key frames and provides a reference 2D image of the scene. A range image generation module 510 generates a range image that can be optionally post-processed in post-processing module 512. The results of post-processing can be provided to a user 514 or sent to another location, such as storage 516.
One or more image providing devices are configured for providing a plurality of 2D images of a scene from a plurality of camera angles, each of the 2D images of a scene including pixels and associated texture values. In one implementation, the image providing device is a digital picture camera 500A providing still images. In another implementation, the image capture device is a digital video camera 500B providing video images. In a case where the amount of provided video is greater than required for the specific application, the video images can be decimated as appropriate for the given application. One or more image providing devices can provide the 2D images simultaneously. One non-limiting example of an image providing device is a camera function on a cellular phone used to capture an image. In a case where the cellular phone has compass and/or global positioning system (GPS) functionality, location and orientation information can be provided with the captured images. In another implementation, the 2D images are provided from storage 500C. In another implementation, the 2D images are provided from a combination of sources. The types of images include, but are not limited to, visible and infrared (IR), from sources including, but not limited to, aerial photographs, video from vehicle-mounted cameras, photographs taken from street level, and satellite imagery (raw images with projection information, or orthorectified to orthophotos).
2D images are sent to a processing system 502 containing one or more processors 504 configured with a variety of processing modules, depending on the implementation. 2D images can be preprocessed by image preprocessing module 506, as described above. From the pre-processed images 506, or optionally from storage 500C, an image selection module 508 chooses key frames and provides a reference 2D image of the scene. The reference 2D image of the scene includes pixels and associated texture values.
The chosen 2D images and reference 2D image are used by the range image generation module 510 to calculate a generated range image. An initial generated range image is provided which includes pixels and associated depth information. The pixels of the generated range image correspond to the pixels of the reference 2D image. Note that although the term image is used for clarity in this description, it should be understood that image also refers to a portion of the image, a subsample of the image, or another sub-set of data from the 2D image. In one implementation, the reference 2D image of the scene is provided by choosing an image from the plurality of 2D images of a scene. The initial generated range image can be defined using a variety of techniques depending on the application. In one implementation, the initial generated range image is defined by using a structure from motion (SFM) technique with the provided 2D images. In another implementation, the initial generated range image is defined by using optical flow with the 2D images. In another implementation, the initial generated range image is defined by using the technique of simultaneous location and mapping (SLAM) with the 2D images. In another implementation, the initial generated range image is defined by calculating an average plane from sparse feature matches of the 2D images. In another implementation, the position and orientation of the initial generated range image are known in world coordinates, a DTM or DSM is given, and the initial generated range image is calculated using range calculations to the DTM or DSM.
A cost is calculated for the initial generated range image using a variational method with at least one cost sub-functional, as described above. In one implementation, the at least one cost sub-functional further includes a cost sub-functional that compares the depth information of the pixels from the generated range image to calculate a smoothness value. The costs for all the cost sub-functionals are combined to provide a cost (score) for the initial generated range image. The initial generated range image is then updated based on the calculated cost, and a new cost (score) is calculated on the updated generated range image. The process of updating the generated range image and calculating a new cost is repeated until a given stopping criterion is reached, as described above.
In an optional implementation, additional 2D images are provided and the processing system uses the additional 2D images to update the generated range image. In another optional implementation, the above-described process is repeated on the provided 2D images to generate a plurality of generated range images. In another optional implementation, a post-processing module 512 is configured to use one or more generated range images to generate a three-dimensional (3D) model of the scene. In another optional implementation, additional 2D images are provided and used to update the 3D model. The results of post-processing can be provided to a user 514 or sent to another location, such as storage 516.
Note that a variety of implementations for modules and processing are possible, depending on the application. Optionally, multiple image preprocessing modules 506 can be used, where each image preprocessing module is configured to process one or more types of 2D images. Optionally, the image selection module 508 can be implemented as two modules, where a first module chooses key frames and a second module provides a reference 2D image of the scene. Optionally, image preprocessing can be done after the 2D images have been selected 508. Based on this description, further variations will be obvious to one skilled in the art.
It will be appreciated that the above descriptions are intended only to serve as examples, and that many other embodiments are possible within the scope of the present invention as defined in the appended claims.

Claims

WHAT IS CLAIMED IS:
1. A method for generating range images, comprising the steps of:
(a) providing a plurality of two-dimensional (2D) images of a scene from a plurality of viewing directions, each of said 2D images of the scene including texture values associated with pixels of the 2D image;
(b) defining a reference 2D image of the scene;
(c) defining an initial generated range image of the scene associated with said reference 2D image of the scene, the generated range image including depth information associated with pixels of said generated range image, and said pixels of said generated range image corresponding to said pixels of said reference 2D image of the scene;
(d) providing a cost-functional including at least one cost sub-functional,
wherein one cost sub-functional uses the generated range image to project pixels from said reference 2D image according to a viewing direction of at least one of said plurality of 2D images of a scene to find projected pixels, and compares the texture values of said projected pixels with texture values of corresponding pixels from the 2D image associated with said viewing direction; and
(e) updating the generated range image using a variational method comprising:
(i) calculating a cost for said cost-functional;
(ii) updating the generated range image based on the calculated cost; and
(iii) repeating steps (i) and (ii) until a given stopping criterion has been reached.
2. The method of claim 1 wherein said reference 2D image of the scene is defined by choosing an image from said plurality of 2D images of a scene.
3. The method of claim 1 wherein said initial generated range image is defined by using a structure from motion (SFM) technique with said plurality of 2D images of a scene.
4. The method of claim 1 wherein said initial generated range image is defined by using optical flow on images from said plurality of 2D images of a scene.
5. The method of claim 1 wherein said initial generated range image is defined by using the technique of simultaneous location and mapping (SLAM) with said plurality of 2D images of a scene.
6. The method of claim 1 wherein said initial generated range image is defined by calculating an average plane from sparse feature matches of said plurality of 2D images of a scene.
7. The method of claim 1 wherein said initial generated range image is defined using a digital terrain map (DTM) corresponding to at least part of said scene.
8. The method of claim 1 wherein said initial generated range image is defined by assigning a uniform range value to all points in said initial generated range image.
9. The method of claim 1 wherein said cost-functional further includes a second cost sub-functional that processes the generated range image to calculate a smoothness value.
10. The method of claim 1 further comprising updating said generated range image when additional 2D images become available.
11. The method of claim 1 wherein said range image is used to generate a three- dimensional (3D) model of the scene.
12. The method of claim 1 wherein steps (b) to (e)(iii) are repeated defining different reference 2D images thereby generating a plurality of range images.
13. The method of claim 12 wherein one or more of said plurality of range images are used to generate a three-dimensional (3D) model of the scene.
14. The method of claim 13 further comprising updating said 3D model when additional 2D images become available.
15. A system for generating range images, comprising:
(a) one or more image providing devices configured for providing a plurality of two-dimensional (2D) images of a scene from a plurality of viewing directions, each of said plurality of 2D images of a scene including texture values associated with pixels of the 2D image;
(b) a processing system containing one or more processors configured for:
(i) defining a reference 2D image of the scene;
(ii) defining an initial generated range image of the scene associated with said reference 2D image of the scene, the generated range image including depth information associated with pixels of said generated range image, and said pixels of said generated range image corresponding to said pixels of said reference 2D image of the scene;
(iii) providing a cost-functional including at least one cost sub-functional, wherein one cost sub-functional uses the generated range image to project pixels from said reference 2D image according to a viewing direction of at least one of said plurality of 2D images of a scene to find projected pixels, and compares the texture values of said projected pixels with texture values of corresponding pixels from the 2D image associated with said viewing direction; and
(iv) updating the generated range image using a variational method comprising:
(A) calculating a cost for said cost-functional;
(B) updating the generated range image based on the calculated cost; and
(C) repeating steps (A) and (B) until a given stopping criterion has been reached.
16. The system of claim 15 wherein said one or more image providing devices includes a digital picture camera.
17. The system of claim 15 wherein said one or more image providing devices includes a digital video camera.
18. The system of claim 15 wherein said one or more image providing devices includes a storage system.
19. The system of claim 15 wherein said plurality of 2D images of a scene are infrared (IR) images.
20. The system of claim 15 wherein said processing system is configured to define said reference 2D image of the scene by choosing an image from said plurality of 2D images of a scene.
21. The system of claim 15 wherein said processing system is configured to define said initial generated range image by using a structure from motion (SFM) technique with said plurality of 2D images of a scene.
22. The system of claim 15 wherein said processing system is configured to define said initial generated range image by using optical flow on images from said plurality of 2D images of a scene.
23. The system of claim 15 wherein said processing system is configured to define said initial generated range image by using the technique of simultaneous location and mapping (SLAM) with said plurality of 2D images of a scene.
24. The system of claim 15 wherein said processing system is configured to define said initial generated range image by calculating an average plane from sparse feature matches of said plurality of 2D images of a scene.
25. The system of claim 15 wherein said processing system is configured to define said initial generated range image using a digital terrain map (DTM) corresponding to at least part of said scene.
26. The system of claim 15 wherein said processing system is configured to define said initial generated range image by assigning a uniform range value to all points in said initial generated range image.
27. The system of claim 15 wherein said processing system is configured to provide a second cost sub-functional that processes the generated range image to calculate a smoothness value.
28. The system of claim 15 wherein said processing system is configured to update said generated range image when additional 2D images become available.
29. The system of claim 15 wherein said processing system is further configured to generate a three-dimensional (3D) model of the scene.
30. The system of claim 15 wherein the processing step of (b) is repeated defining different reference 2D images, thereby generating a plurality of range images.
31. The system of claim 30 wherein said processing system is further configured to generate a three-dimensional (3D) model of the scene using one or more of said plurality of range images.
32. The system of claim 31 wherein said processing system is further configured to update said 3D model when additional 2D images become available.
Non-Patent Citations

Barron, J. L. et al., "On Optical Flow", Proceedings of the International Conference on Artificial Intelligence and Information-Control Systems of Robots, 1994, pages 3-14.
Beauchemin, S. S.; Barron, J. L., "The Computation of Optical Flow", ACM Computing Surveys, vol. 27, no. 3, September 1995.
Brox, T.; Bruhn, A.; Papenberg, N.; Weickert, J., "High Accuracy Optical Flow Estimation Based on a Theory for Warping", Computer Vision - ECCV 2004, Lecture Notes in Computer Science, vol. 3024, Springer, 2004, pages 25-36.
Curless, Brian; Levoy, Marc, "A Volumetric Method for Building Complex Models from Range Images", SIGGRAPH '96.
Goesele, Michael et al., "Multi-View Stereo for Community Photo Collections", IEEE 11th International Conference on Computer Vision (ICCV 2007), October 2007, pages 1-8.
Horn, B.; Schunck, B., "Determining Optical Flow", Artificial Intelligence, vol. 17, 1981, pages 185-203.
Kang, Sing Bing et al., "Extracting View-Dependent Depth Maps from a Collection of Images", International Journal of Computer Vision, vol. 58, no. 2, July 2004, pages 139-163.
Kazhdan, Michael; Bolitho, Matthew; Hoppe, Hugues, "Poisson Surface Reconstruction", Eurographics Symposium on Geometry Processing, 2006.
Klein, G. et al., "Parallel Tracking and Mapping for Small AR Workspaces", 6th IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR 2007), November 2007, pages 225-234.
Lang et al., "3D Scene Reconstruction from IR Image Sequences for Image-Based Navigation Update and Target Detection of an Autonomous Airborne System", Proc. of SPIE, vol. 6940, May 2008, pages 1-9.
Seitz, S. M. et al., "A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms", IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2006), vol. 1, June 2006, pages 519-528.
Sinha, Sudipta N. et al., "Piecewise Planar Stereo for Image-Based Rendering", IEEE 12th International Conference on Computer Vision (ICCV 2009), September 2009, pages 1881-1888.
Szeliski, R. et al., "Stereo Matching with Transparency and Matting", 6th International Conference on Computer Vision (ICCV '98), January 1998, pages 517-524.
Szeliski, R., "Estimating Motion from Sparse Range Data Without Correspondence", December 1988, pages 207-215.

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载