WO2011080669A1

WO2011080669A1 - System and method for reconstruction of range images from multiple two-dimensional images using a range based variational method

Info

Publication number: WO2011080669A1
Application number: PCT/IB2010/056004
Authority: WO
Inventors: Tomer Avidor; Gil Briskin; Omri Peleg
Original assignee: Rafael Advanced Defense Systems Ltd.
Priority date: 2009-12-31
Filing date: 2010-12-22
Publication date: 2011-07-07
Also published as: IL203089A0

Abstract

Reconstruction of range images from multiple two-dimensional images uses 2D images of a scene, defining a reference 2D image and then defining an initial generated range image associated with the reference 2D image. The initial generated range image includes depth information for at least a portion of the pixels of the reference 2D image. A cost-functional is provided to calculate a cost reflecting how well the generated range image matches a set of criteria. Each criterion is evaluated as a cost sub-functional of the cost-functional, together with a factor that reflects the criterion's importance to the overall cost. A variational method is used to find an optimal generated range image in regards to this cost. The variational method includes repeatedly updating the generated range image and re-calculating the cost using the cost-functional until a given stopping criterion is reached.

Description

System and method for reconstruction of range images from multiple two- dimensional images using a range based variational method

FIELD OF THE INVENTION

The present embodiment generally relates to the field of image processing and computer vision, and in particular, concerns a system and method for reconstruction of range images.

BACKGROUND OF THE INVENTION

In the field of image processing, generating three-dimensional (3D) models has many uses. 3D models are used in many applications including automatic analysis of a scene, understanding navigability of a scene, visibility of points from different areas, and segmentation of 3D models (both in civilian and military applications). One popular application uses a three-dimensional model to generate a view of a scene for a user. One or more cameras, or more generally image capture devices, can capture i mages of a real scene. In a case where the user desires to view the scene from an angle other than the angle from which the original images were captured, the application must render a view of the scene from the desired viewpoint of the user. A variety of conventional techniques exists for generating views from a new viewpoint, also known as a virtual location, or a virtual camera angle. One of the challenges in generating new views from images is accurately rendering objects in a view from a new (virtual) camera angle. Using a three-dimensional model of the scene to be rendered is a known method for improving view generation. Techniques for using a three- dimensional model to facilitate the generation of views of a scene are known in the industry.

Given the existence of well-known techniques to combine range images to create three-dimensional (3D) models, a challenge is to generate more accurate range images from two-dimensional (2D) images. Range images can also be used for many other applications. In the context of this document, the term "3D model" refers to a model that includes descriptions of surfaces. The term "range image", also known as a range map, or 3D map, refers to a collection of information including ranges from a viewpoint to points in a scene. One non-limiting example of a range image is the output of a LIDAR (Light Detection and Ranging) system that generates ranges from the viewpoint of the LIDAR to objects in the scene being captured. A known approach to generating 3D models is to use feature identification from one image to other images and then use linear or non-linear multi-view triangulation of the feature to determine the distance from the reference camera location to the feature. The images are correlated by tracking features such as corner points (edges with gradients in multiple directions) from one image to the next. The feature trajectories over time are then used to reconstruct the 3D positions of the features. Triangulation using two or more camera views provided 3D positions for the features, and these 3D positions are combined to create a 3D model or a range image. An advantage of feature matching and triangulation techniques is being able to use multiple images to improve processing and reconstruction of the 3D positions of features in a scene. Disadvantages of feature matching and triangulation techniques include known limitations when a scene (and the corresponding 2D images of the scene) has smooth areas and/or areas of low texture. Feature correlation generally fails for smooth and low texture areas, resulting in the failure of this family of algorithms to reconstruct the 3D positions of smooth and low texture points in the scene. Correlation is also dependant on the size of the aperture window. The aperture window includes multiple pixels to facilitate correlation of areas between images. A consequence of this correlation aperture is the loss of accuracy in reconstruction of an object, particularly the edges of objects, and possible fattening of objects.

Another known approach for 3D reconstruction is to use optical flow techniques. Optical flow is an approximation to image motion, defined as the projection of velocities of 3D surface points onto the imaging plane of a visual sensor. In other words, 2D image motion is the projection of the 3D motion of objects, relative to a visual sensor, onto the 2D image plane. Sequences of time-ordered images allow the estimation of projected 2D image motion as either instantaneous image velocities or discrete image displacements. These are usually called the optical flow field or the image velocity field. Provided that optical flow is a reliable approximation to 2D image motion, optical flow may then be used to generate a 3D model of the surface structure (shape or relative depth) through assumptions concerning the structure of the optical flow field, the 3D environment, and the motion of the sensor. Refer to The Computation of Optical Flow - Beauchemin and Barron, ACM Computing Surveys, Vol 27, No. 3, September 1995 for additional background information on optical flow. The optical flow field, together with the known camera angles of the two images, is used to create a range map (or a 3D point cloud) using triangulation techniques. By definition, optical flow is a solution to the problem of determining how points (pixels) move in relation to each other in a pair of images. The motion of pixels in a 2D image, in other words motion in the (x, y) plane, is determined. Advantages of optical flow include being able to reconstruct the 3D positions of smooth and low texture areas in a scene. 3D information for smooth and low texture areas is derived from the 3D information of features relatively local to the smooth and low texture areas. Disadvantages of optical flow include the sensitivity of optical flow techniques to large differences between the input images, and being limited to processing using only one pair of images at a time. Optical flow techniques are not able to use optimally all of the available images.

Other known approaches for 3D reconstruction includes using RADAR (RAdio

Detection and Ranging), LIDAR (Light Detection and Ranging), and structured lighting to generate range images. These approaches are also known as active techniques. Active techniques require the generation of a known input, such as radio waves or light, to be captured and analyzed to generate 3D position information of the points in a scene, generally in the form of a range image. Advantages of active techniques include reconstruction of 3D positions of smooth and low texture areas of a scene. Disadvantages of active techniques include having to provide an active input and lack of texture information (necessary for generating novel views of the scene). Techniques for generating 3D models from range images are known in the art.

It is desirable to have a system and method for reconstruction of 3D models of a scene, particularly in cases where the scene includes smooth areas and/or areas of low texture. It is further desirable for this system and method to be able to reconstruct 3D models passively from 2D images of a scene. SUMMARY

According to the teachings of the present embodiment there is provided a method for generating range images, including the steps of: providing a plurality of two-dimensional (2D) images of a scene from a plurality of viewing directions, each of the 2D images of the scene including texture values associated with pixels of the 2D image; defining a reference 2D image of the scene; defining an initial generated range image of the scene associated with the reference 2D image of the scene, the generated range image including depth information associated with pixels of the generated range image, and the pixels of the generated range image corresponding to the pixels of the reference 2D image of the scene; providing a cost- functional including at least one cost sub-functional, wherein one cost sub-functional uses the generated range image to project pixels from the reference 2D image according to a viewing direction of at least one of the plurality of 2D images of a scene to find projected pixels, and compares the texture values of the projected pixels with texture values of corresponding pixels from the 2D image associated with the viewing direction; and updating the generated range image using a variational method including: calculating a cost for the cost-functional and updating the generated range image based on the calculated cost until a given stopping criterion has been reached.

In an optional embodiment, the method defines the reference 2D image of the scene by choosing an image from the plurality of 2D images of a scene,

In an optional embodiment, the method defines the initial generated range image by using a structure from motion (SFM) technique with the plurality of 2D images of a scene. In another optional embodiment, the method defines the initial generated range image by using optical flow on images from the plurality of 2D images of a scene. In another optional embodiment, the method defines the initial generated range image by using the technique of simultaneous location and mapping (SLAM) with the plurality of 2D images of a scene. In another optional embodiment, the method defines the initial generated range image by calculating an average plane from sparse feature matches of the plurality of 2D images of a scene. In another optional embodiment, the method defines the initial generated range image using a digital terrain map (DTM) corresponding to at least part of the scene. In another optional embodiment, the method defines the initial generated range image by assigning a uniform range value to all points in the initial generated range image.

In an optional embodiment, the cost- functional further includes a second cost sub- functional that processes the generated range image to calculate a smoothness value.

In an optional embodiment, the method further includes updating the generated range image when additional 2D images become available.

In an optional embodiment, the range image is used to generate a three-dimensional (3D) model of the scene.

In an optional embodiment, the method is repeated, defining different reference 2D images thereby generating a plurality of range images.

In an optional embodiment, one or more of the plurality of range images are used to generate a three-dimensional (3D) model of the scene. In an optional embodiment, the method further includes updating the 3D model when additional 2D images become available.

According to the teachings of the present embodiment there is provided a system for generating range images, including: one or more image providing devices configured for providing a plurality of two-dimensional (2D) images of a scene from a plurality of viewing directions, each of the plurality of 2D images of a scene including texture values associated with pixels of the 2D image; a processing system containing one or more processors configured for: defining a reference 2D image of the scene; defining an initial generated range image of the scene associated with the reference 2D image of the scene, the generated range image including depth information associated with pixels of the generated range image, and the pixels of the generated range image corresponding to the pixels of the reference 2D image of the scene; providing a cost-functional including at least one cost sub-functional, wherein one cost sub-functional uses the generated range image to project pixels from the reference 2D image according to a viewing direction of at least one of the plurality of 2D images of a scene to find projected pixels, and compares the texture values of the projected pixels with texture values of corresponding pixels from the 2D image associated with the viewing direction; and updating the generated range image using a variational method including: calculating a cost for the cost-functional and updating the generated range image based on the calculated cost until a given stopping criterion has been reached.

In an optional embodiment, the one or more image providing devices includes a digital picture camera. In another optional embodiment, the one or more image providing devices includes a digital video camera. In another optional embodiment, the one or more image providing devices includes a storage system. In an optional embodiment, the plurality of 2D images of a scene are infrared (IR) images.

In an optional embodiment, the processing system is configured to define the reference

2D image of the scene by choosing an image from the plurality of 2D images of a scene. In another optional embodiment, the processing system is configured to define the initial generated range image by using a structure from motion (SFM) technique with the plurality of 2D images of a scene. In another optional embodiment, the processing system is configured to define the initial generated range image by using optical flow on images from the plurality of 2D images of a scene. In another optional embodiment, the processing system is configured, to define the initial, generated range image by using the technique of simultaneous location and mapping (SLAM) with the plurality of 2D images of a scene. In another optional embodiment, the processing system is configured to define the initial generated range image by calculating an average plane from sparse feature matches of the plurality of 2D images of a scene. In another optional embodiment, the processing system is configured to define the initial generated range image using a digital terrain map (DTM) corresponding to at least part of the scene. In another optional embodiment, the processing system is configured to define the initial generated range image by assigning a uniform range value to all points in the initial generated range image.

In an optional embodiment, the processing system is configured to provide a second cost sub-functional that processes the generated range image to calculate a smoothness value.

In an optional embodiment, the processing system is configured to update the generated range image when additional 2D images become available

In an optional embodiment, the processing system is further configured to generate a three-dimensional (3D) model of the scene.

In an optional embodiment, the processing is repeated defining different reference 2D images thereby generating a plurality of range images

In an optional embodiment, the processing system is further configured to generate a three-dimensional (3D) model of the scene using one or more of the plurality of range images.

In an optional embodiment, the processing system is further configured to update the 3D model when additional 2D images become available.

BRIEF DESCRIPTION OF FIGURES

The embodiment is herein described, by way of example only, with reference to the accompanying drawings, wherein:

FIGURE 1, a flowchart of a method for reconstruction of range images from multiple two-dimensional images using a range based variational method.

FIGURE 2, an example diagram of a cost sub-functional with a constraint on the data.

FIGURE 3, a diagram of a system for generating range images.

DETAILED DESCRIPTION

The principles and operation of this system and method according to the present embodiment may be better understood with reference to the drawings and the accompanying description. One embodiment of a system and method for reconstruction of range images from multiple two-dimensional (2D) images using a range based variational method overcomes the limitations of conventional techniques and provides increased success in generation of range images from 2D images. This innovative technique generates range images similar to LIDAR generated range images but does not require the use of an active input, in contrast using passively collected 2D images. Additionally this technique has access to the original texture in the 2D images, in contrast to LIDAR that does not provide texture information. This technique is not limited to only processing a pair of images at a time, in contrast processing multiple images. This technique is not limited to implementations requiring a local aperture window, in contrast being able to compare single pixels hence improving processing and reconstruction of the 3D positions of edges in a scene. An additional advantage is that this method can be implemented without requiring non-volatile storage of data during processing, facilitating implementation on a graphics card or similar hardware. The range images can then be used to generate a 3D model of the scene.

Conventional feature matching and triangulation techniques do not successfully process smooth and low-texture areas in the input 2D images, and the resulting output has "holes" or non-value areas. In contrast, using an implementation of the present description, smooth and low-texture areas in the input 2D images are processed successfully and grayscale areas are can be determined for all surfaces.

Referring to FIGURE 1 , a flowchart of a method for reconstruction of range images from multiple two-dimensional images using a range based variational method, 2D images 300 of a scene are optionally preprocessed 302 and camera angles are calculated 304. From, the 2D images are chosen 306 key frames 308. From the 2D images or using another method is defined 307 a reference 2D image 308. A generated range image is generated 310 by first defining 312 an initial generated range image associated with the reference 2D image. The initial generated range image includes depth information for at least a portion of the pixels of the reference 2D image. A cost-functional is provided to calculate a cost 314 reflecting how well the generated range image matches a set of criteria. Each criterion is evaluated as a cost sub-functional of the cost-functional, together with a factor that reflects the criterion's importance to the overall cost. A variational method is used to find an optimal generated range image in regards to this cost. The variational method includes repeatedly updating 316 the generated range image and re-calculating 314 the cost using the cost-functional until a given stopping criterion is reached. The resulting generated range image for the reference 2D image 318 can be post-processed 320 to provide a final 3D model 322. Each of the plurality of 2D images 300 of a scene includes texture values associated with pixels of the 2D image. In the context of this document, texture refers to a value corresponding to the image content of the pixel, in other words representing the intensity of a surface, or representing how the pixel can be viewed. Typically, texture corresponds to the color or grayscale (of the pixel) in the form of a red-green-blue (RGB) value, although other image content such as infrared (IR) can be represented. The 2D images may be optionally preprocessed 302, including changing the data format, size, normalization, calculating image related information, and other image processing necessary to prepare the image or related information. The 2D images can be provided with associated camera angle information. In this context, camera angle information, or more simply the camera angle, also known as the viewing direction, can include information such as the position and orientation of the image capture device (camera) in relation to the scene being imaged. In this field, a camera angle (viewing direction) is generally provided with the image, but this provided camera angle is generally not sufficiently accurate for the calculations that need to be performed, and so the camera angle needs to be optionally calculated (corrected) 304. If camera angles are not provided with the images, camera angles can be generated using known techniques such as structure from motion (SFM) techniques. Techniques to calculate camera angles are known in the art. In a case where the camera is moving, or multiple cameras in known locations captures images of a scene, ego motion algorithms can be used to determine camera information from the images. The output of an ego motion algorithm includes the camera information associated with the input image, including the position and orientation of the camera relative to the scene.

Key frames are chosen in block 306 and a reference 2D image is defined in block 307. In this context, key frames are 2D images chosen, based on a criteria, from the plurality of 2D images, for further processing. Depending on the application, criteria can include an image baseline, image similarity, the difference between images, and computation limitations. A reference 2D image of the scene is defined, the reference 2D image including texture values associated with pixels of the 2D image. A preferred implementation is to choose the reference 2D image from the plurality of 2D images. Testing has shown that a preferable

implementation is for the reference 2D image to be a middle frame from a sequence of video images or a middle image from a range of still cameras.

The key frames and the reference 2D image 308 can be used to generate a generated range image 310. In one implementation, an initial generated range image is defined 31.2 using the reference 2D image and at least one of the key 2D images. The generated range image includes depth information for pixels of at least a portion of the reference 2D image. The initial generated range image can be from an existing camera angle or from a new

(virtual) camera angle. Techniques for generating a range image from 2D images are known in the art. One conventional technique is to use structure from motion (SFM) to generate the range image. SFM generates a sparse range image, and SFM post-processing can be used to increase range image detail. Optical flow, linear triangulation, and non-linear triangulation are other conventional techniques that can be used to generate a range image. The technique of simultaneous location and mapping (SLAM) can be used to generate a range image of an unknown environment (without a priori knowledge) or a known environment (with a priori knowledge) while at the same time keeping track of the current location, A simple technique for calculating an initial range image to initialize the algorithm is to use a range estimate. Another technique is to calculate an average plane from sparse feature matches (for example, given a sparse matching of points between two images, calculate a plane in three dimensions that best approximates the matches). A disadvantage of calculating an average plane is the lack of 3D geometry in the generated range image. In a case where sufficiently accurate position information (for example from a global positioning system [GPS]) is available, the position information can be used to calculate an initial range image by calculating the range of each pixel to a provided digital terrain map. Another technique is to interpolate a sparse range map (a range map calculated on sparsely selected points) to generate a dense range map.

Using the initial generated range image and at least one cost sub-functional, a cost is calculated 314 for the range image to provide a cost (score) for the range image. A variational method is a term known in the field and refers to finding the parameters that minimize a cost- functional. In this case, the parameter of interest is the range image (which gives a range for each pixel), and the cost-functional is a provided cost-functional. Variational methods are used to solve the minimization problem for the cost-functional, which includes one or more cost sub-functionals to find the generated range image (which in this case may be more easily understood as a 3D map or even more simply as a collection of pixel depths) that minimizes the cost. The cost-functional can be expressed as:

L(r) = Ld + aLs.

Where "L" is a cost- functional calculated on a range image "r". '.Ld" is one of the at least one cost sub-functionals with a constraint on the data, in this case the grey scale data, or texture, as will be described below, "a" is a scale factor (weight). "Ls" is an optional second cost sub-functional with a constraint on smoothness, as will be described below. Additional scale factors and cost sub-functionals can be added to the cost-functional as appropriate to the application. The calculated costs of at least two cost sub-functionals are combined to generate a cost (score) for the current range image. Multiple images can be used to calculate a cost for a pixel.

One technique for finding a minimum to a cost-functional is to transform the cost- functional into a set of equations, and then solve the set of equations for the parameters that minimize the cost-functional. The set of equations is commonly reached by the Buler- LaGrange method. In cases where the set of equations cannot be solved directly, common practice is to solve the set of equations iteratively. Iteratively solving the set of equations can be done by linearizing the equations (as necessary) using a technique such as Newton's method, and solving the linear set of equations by using an iterative solver, for example Jacobi or Gauss-Seidel. Techniques of solving functions to find a local minimum are known in the art and other techniques can be used as appropriate to the application.

The range image is then updated 316 and a new cost calculated 314 using the provided cost-functional for the updated generated range image. How the generated range image is updated depends on the application. Experimentation has shown that knowing how to update the generated range image to improve the cost may not be possible initially. After calculating a plurality of costs using a pre-defined update strategy, the updates and costs can be used to determine a more methodical approach to updates. Updating of the generated range image and calculating of a cost are repeated until a given stopping criteria is reached. This stopping criterion can be provided, determined by the specific application of the method, or determined manually by examining costs and/or the range image as costs and/or the range image are calculated and updated. Note that the value of a cost (the score) is only relevant in relation to the value of other costs. As generated range-images are updated and costs calculated, the comparison of the costs will indicate the improvement or degradation in the updated range image. When a generated range image has reached a given stopping criteria, the range image can be considered a successful solution. In one implementation, the stopping criteria is that the difference between the previous range map and the updated range map is within a given limit. In another implementation, the stopping a criterion is that the cost has decreased significantly compared to the initial cost.

Referring to FIGURE 2, an example diagram of a cost sub-functional with a constraint on the data, in one implementation, the provided cost sub-functional uses the depth information in the generated range image to project the pixels in the generated range image onto an image other than the reference 2D image. Then the texture values of the pixels in the reference 2D image are compared to the texture value of the projected pixels in the one or more 2D images other than the reference 2D image. A scene 401 contains an object 416. A point P2 in the scene is associated with a data point P1 in reference 2D image 410 and a corresponding point in the generated range image. Data point P1 also known as pixel PI, includes information from the generated range image including the depth 418 from camera angle 400 and information from the reference 2D image including the texture or grayscale of point P2. Note that this is an example of a cost sub-functional with a constraint on the data. Also, note that the camera angle 400 for reference 2D image 410 can be an existing camera angle associated with a provided 2D image, or the reference 2D image can be created from a virtual camera angle. The calculation of this cost sub-functional starts by using the known camera angle 400 of the reference 2D image 410 and the calculated depth 418 of a pixel PI in the generated range image. Given an image 412 and the associated camera angle 402, a pixel P1 in the reference 2D image 410 can be projected onto a 2D image 412 as pixel P3. The texture value of pixel PI in the reference 2D image 410 is compared to the texture value of pixel P3 in the 2D image 412, and the difference between the two textures is used as a value in the cost sub-functional. Experimentation has shown that a particularly successful implementation is using a grayscale to represent the pixels. The calculation and comparison is repeated for other pixels in the reference 2D image. The calculation and comparison is also repeated for additional images (for example, image 414 from camera angle 404). Using multiple images provides additional information for each pixel, including multiple camera angles to increase robustness of the calculation, facilitating reducing the errors and noise inherent in determining the depth of a pixel in the generated range image.

In another implementation, the cost sub-functional with a constraint on the data further includes comparing the texture values of all possible pairs of 2D images from the plurality of 2D images of a scene. Depending on the specific application, the cost sub- functional with a constraint on the data can be calculated using other methods of comparing the texture values of 2D images from the plurality of 2D images of a scene.

In one implementation, the value of a cost sub-functional with a constraint on the data can be calculated by simply summing the differences in the values of the textures for corresponding pixels. Other methods for calculating the value of a cost function can be used depending on the specific application. In another implementation, a provided cost sub-functional has a constraint on smoothness and compares the depth information for each of the pixels in the generated range image for piecewise smoothness. The calculation of this cost sub-functional compares the depth information of a pixel in the current generated range image to the depth information of neighboring pixels in the same current generated range image. Conventionally, a similar type of comparison is done by optical flow techniques to enforce smoothness on the motion field that describes the 2D movement of pixels between images. Defining a cost of smoothness (Ls in the above equation) is known from the field of optical flow. For a description of optical flow and smoothness constraints refer, for example, to B. Horn and B. Schunck, Determining optical flow, Artificial Intelligence, 17; 185-203, 1981 and T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, High accuracy optic flow estimation based on a theory for warping, In T. Pajdla and J. Matas, editors, Computer Vision - ECCV 2004, volume 3024 of Lecture Notes in Computer Science, pages 25-36, Springer, Berlin, 2004. Comparing the depth information determines the smoothness of the change in depth for neighboring pixels. The comparison is referred to as piecewise smooth because edges of objects in the range image are areas where the depth of neighboring pixels does not change smoothly, rather there is an abrupt change of depth at an edge. Depending on the application, the smoothness cost sub-functional can be defined as piecewise smooth, totally smooth, or other types of smoothness appropriate for the particular application. The closeness in depth of a pixel to the neighbors of the pixel is used as a value in the cost sub-functional. The comparison is repeated for other pixels in the current range image. Smoothness of depth can be used to correct noise and enforce constraints on the generated range image. In addition, optical flow techniques do not use aperture windows, in contrast looking at individual pixels. Pixel comparison, improves the accuracy in reconstruction of an object, particularly the edges of objects, and reduces possible fattening of objects.

In one implementation, the value of a cost sub-functional with a constraint on the smoothness can be calculated by simply summing the differences in the values of the depths for neighboring pixels. Other methods for calculating the value of a cost function can be used depending on the specific application. Convergence of the range image is limited by practical considerations. Convergence of a majority of the range image is normally desirable, and practically not all of the range image will converge. The resulting range images 318 can be post-processed 320 by conventional means depending on the application. In an optional implementation, additional 2D images are provided and used to update the generated range image. In another optional implementation, the above-described method is repeated on the provided 2D images to generate a plurality of range images. In another optional implementation, the plurality of range images are processed to combine multiple range images of a scene, or of portions of a scene, to produce a single 3D model. For techniques for range image merging, refer to Poisson Surface Reconstruction by Michael Kazhdan, Matthew Bolitho and Hugues Hoppe, Eurographics Symposium on Geometry Processing (2006) and A Volumetric Method for Building Complex Models from Range images, Brian Curless and Marc Levoy, Stanford University, presented at SIGGRAPH '96. Another non-limiting example is the case where the location of the range image in the world is provided, and the range image is used to generate a 3D map, which can be in the form, of a digital terrain map (DTM) or digital surface map (DSM). The results of postprocessing 320 include a 3D model 322 and/or other results depending on the application. In another optional implementation, additional 2D images are provided and used to update or extend the 3D model. New range images can also be used to increase the level of detail of a 3D model.

Referring to FIGURE 3, a diagram of a system for generating range images, the system inputs a plurality of two-dimensional (2D) images of a scene 500A, 500B, and 500C. A processing system 502 includes one or more processors 504 configured with a variety of processing modules, depending on the implementation. In one implementation, 2D images 500A, 500B, 500C are sent to an image-preprocessing module 506. From the pre-processed images 506, or optionally from storage 500C, an image selection module 508 chooses key frames and provides a reference 2D image of the scene. A range image generation module 510 generates a range image that can be optionally post-processed in post-processing module 512. The results of post-processing can be provided to a user 514 or sent to another location, such as storage 516.

One or more image providing devices are configured for providing a plurality of 2D images of a scene from a plurality of camera angles, each of the 2D images of a scene including pixels and associated texture values. In one implementation, the image providing device is a digital picture camera 500A providing still images. In another implementation, the image capture device is a digital video camera 500B providing video images. In a case where the amount of provided video images is greater than required for the specific application., the video images can be decimated as appropriate for the given application. One or more image providing devices can provide the 2D images simultaneously. One non-limiting example of an image-providing device is using a camera function on a cellular phone to capture an image. In the case where the cellular phone has compass and/or global positioning system (GPS) functionality, location and orientation information can be provided with the captured images. In another implementation, the 2D images are provided from storage 500C. In another implementation, the 2D images are provided from a combination of sources. The types of images include, but are not limited to visible and infrared (IR), from sources including, but not limited to aerial photographs, video from vehicle mounted cameras, photographs taken from street-level, and satellite imagery (raw images with projection information or

orthorectified to orthophotos).

2D images are sent to a processing system 502 containing one or more processors 504 configured with a variety of processing modules, depending on the implementation. 2D images can be preprocessed by image preprocessing module 506, as described above. From the pre-processed images 506, or optionally from storage 500C, an image selection module 508 chooses key frames and provides a reference 2D image of the scene. The reference 2D image of the scene includes pixels and associated texture values.

The chosen 2D images and reference 2D image are used by the range image generation module 510 to calculate a generated range image. An initial generated range image is provided which includes pixels and associated depth information. The pixels of the generated range image correspond to the pixels of the reference 2D image. Note that although the term image is used for clarity in this description, it should be understood that image also refers to a portion of the image, subsaraple of the image, or other sub-set of data from the 2D image. In one implementation, the reference 2D image of the scene is provided by choosing an image from the plurality of 2D images of a scene. The initial generated range image can be defined using a variety of techniques depending on the application. In one implementation, the initial generated range image is defined by using a structure from motion (SFM) technique with the provided 2D images. In another implementation, the initial generated range image is defined by using optical flow with the 2D images. In another implementation, the initial generated range image is defined by using the technique of simultaneous location and mapping (SLAM) with the 2D images. In another implementation, the initial generated range image is defined by calculating an average plane from sparse feature matches of the 2D images. In another implementation, the position and orientation of the initial generated range image is known in world coordinates, a DTM or DSM is given, and the initial generated range image is calculated using range calculations to the DTM or DSM.

A cost is calculated for the initial generated range image using a variational method with at least one cost sub-functional, as described above. In one implementation, the at least one cost sub-functional further includes a cost sub-functional that compares the depth information of the pixels from the generated range image to calculate a smoothness value. The costs for all the cost sub-functionals are combined to provide a cost (score) for the initial generated range image. The initial generated range image is then updated based on the calculated cost and a new cost (score) calculated on the updated generated range image. The process of updating the generated range image and calculating a new cost is repeated until a given stopping criterion is reached, as described above.

In an optional implementation, additional 2D images are provided and the processing system uses the additional 2D images to update the generated range image. In another optional implementation, the above-described process is repeated on the provided 2D images to generate a plurality of generated range images. In another optional implementation, a postprocessing module 512 is configured to use one or more generated range images to generate a three-dimensional (3D) model of the scene. In another optional implementation, additional 2D images are provided and used to update the 3D model. The results of post-processing can be provided to a user 514 or sent to another location, such as storage 516.

Note that a variety of implementations for modules and processing are possible, depending on the application. Optionally multiple image preprocessing modules 506 can be used, where each image-preprocessing module configured to process one or more types of 2D images. Optionally the image selection module 508 can be implemented as two modules, where a first module chooses key frames and a second module provides a reference 2D image of the scene. Optionally image preprocessing can be done after the 2D images have been selected 508. Based on this description, further variations will be obvious to one skilled in the art.

It will be appreciated that the above descriptions are intended only to serve as examples, and that many other embodiments are possible within the scope of the present invention as defined in the appended claims.

Claims

WHAT IS CLAIMED IS:

1. A method for generating range images, comprising the steps of:

(a) providing a plurality of two-dimensional (2D) images of a scene from a plurality of viewing directions, each of said 2D images of the scene including texture values associated with pixels of the 2D image;

(b) defining a reference 2D image of the scene;

(c) defining an initial generated range image of the scene associated with said reference 2D image of the scene, the generated range image including depth information associated with pixels of said generated range image, and said pixels of said generated range image corresponding to said pixels of said reference 2D image of the scene;

(d) providing a cost-functional including at least one cost sub-functional,

wherein one cost sub-functional uses the generated range image to project pixels from said reference 2D image according to a viewing direction of at least one of said plurality of 2D images of a scene to find projected pixels, and compares the texture values of said projected pixels with texture values of corresponding pixels from the 2D image associated with said viewing direction; and

(e) updating the generated range image using a variational method comprising:

(i) calculating a cost for said cost-functional;

(ii) updating the generated range image based on the calculated cost; and

(iii) repeating steps (i) and (ii) until a given stopping criterion has been reached.

2. The method of claim 1 wherein said reference 2D image of the scene is defined by choosing an image from said plurality of 2D images of a scene.

3. The method of claim 1 wherein said initial generated range image is defined by using a structure from motion (SFM) technique with said plurality of 2D images of a scene.

4. The method of claim 1 wherein said initial generated range image is defined by using optical flow on images from said plurality of 2D images of a scene.

5. The method of claim 1 wherein said initial generated range image is defined by using the technique of simultaneous location and mapping (SLAM) with said, plurality of 2D images of a scene.

6. The method of claim 1 wherein said initial generated range image is defined by calculating an average plane from sparse feature matches of said plurality of 2D images of a scene.

7. The method of claim 1 wherein said initial generated range image is defined using a digital terrain map (DTM) corresponding to at least part of said scene.

8. The method of claim 1 wherein said initial generated range image is defined by assigning a uniform range value to all points in said initial generated range image.

9. The method of claim 1 wherein said cost-functional further includes a second cost sub-functional that processes the generated range image to calculate a smoothness value.

10. The method of claim 1 further comprising updating said generated range image when additional 2D images become available.

11. The method of claim 1 wherein said range image is used to generate a three- dimensional (3D) model of the scene.

12. The method of claim 1 wherein steps (b) to (e)(iii) are repeated defining different reference 2D images thereby generating a plurality of range images.

13. The method of claim 12 wherein one or more of said plurality of range images are used to generate a three-dimensional (3D) model of the scene.

14. The method of claim 13 further comprising updating said 3D model when additional 2D images become available.

15. A system for generating range images, comprising:

(a) one or more image providing devices configured for providing a plurality of two-dimensional (2D) images of a scene from a plurality of viewing directions, each of said plurality of 2D images of a scene including texture values associated with pixels of the 2D image;

(b) a processing system containing one or more processors configured for:

(i) defining a reference 2D image of the scene;

(ii) defining an initial generated range image of the scene associated with said reference 2D image of the scene, the generated range image including depth information associated with pixels of said generated range image, and said pixels of said generated range image corresponding to said pixels of said reference 2D image of the scene;

(iii) providing a cost-functional including at least one cost sub-functional, wherein one cost sub-functional uses the generated range image to project pixels from said reference 2D image according to a viewing direction of at least one of said plurality of 2D images of a scene to find projected pixels, and compares the texture values of said projected pixels with texture values of corresponding pixels from the 2D image associated with said viewing direction; and

(iv) updating the generated range image using a variational method comprising:

(A) calculating a cost for said cost-functional;

(B) updating the generated range image based on the calculated cost; and

(C) repeating steps (A) and (B) until a given stopping criterion has been reached.

16. The system of claim 15 wherein said one or more image providing devices includes a digital picture camera.

17. The system of claim 15 wherein said one or more image providing devices includes a digital video camera.

18. The system of claim 15 wherein said one or more image providing devices includes a storage system.

19. The system of claim 15 wherein said plurality of 2D images of a scene are infrared (IR) images.

20. The system of claim 15 wherein said processing system is configured to define said reference 2D image of the scene by choosing an image from said plurality of 2D images of a scene.

21. The system of claim 15 wherein said processing system is configured to define said initial generated range image by using a structure from motion (SFM) technique with said plurality of 2D images of a scene.

22. The system of claim 15 wherein said processing system is configured to define said initial generated range image by using optical flow on images from said plurality of 2D images of a scene.

23. The system of claim 15 wherein said processing system is configured to define said initial generated range image by using the technique of simultaneous location and mapping (SLAM) with said plurality of 2D images of a scene.

24. The system of claim 15 wherein said processing system is configured to define said initial generated range image by calculating an average plane from sparse feature matches of said plurality of 2D images of a scene.

25. The system of claim 15 wherein said processing system is configured to define said initial generated range image using a digital terrain map (DTM) corresponding to at least part of said scene.

26. The system of claim 15 wherein said processing system is configured to define said initial generated range image by assigning a uniform range value to all points in said initial generated range image.

27. The system of claim 15 wherein said processing system is configured to provide a second cost sub-functional that processes the generated range image to calculate a smoothness value.

28. The system of claim 15 wherein said processing system is configured to update said generated range image when additional 2D images become available

29. The system of claim 15 wherein said processing system is further configured to generate a three-dimensional (3D) model of the scene.

30. The system of claim 15 wherein the processing step of (b) is repeated defining different reference 2D images thereby generating a plurality of range images

31. The system of claim 30 wherein said processing system is further configured to generate a three-dimensional (3D) model of the scene using one or more of said plurality of range images.

32. The system of claim 31 wherein said processing system is further configured to update said 3D model when additional 2D images become available.