WO2008152607A1 - Method, apparatus, system and computer program product for depth-related information propagation - Google Patents

Info

Publication number
WO2008152607A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
related information
depth
segmentation
information
Prior art date
Application number
PCT/IB2008/052340
Other languages
French (fr)
Inventor
Vasanth Philomin
Fang Liu
Chunfeng Shen
Original Assignee
Koninklijke Philips Electronics N.V.
Philips Intellectual Property & Standards Gmbh
Priority date
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V., Philips Intellectual Property & Standards Gmbh filed Critical Koninklijke Philips Electronics N.V.
Publication of WO2008152607A1 publication Critical patent/WO2008152607A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/579 Depth or shape recovery from multiple images from motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/215 Motion-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/128 Adjusting depth or disparity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/261 Image signal generators with monoscopic-to-stereoscopic image conversion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20076 Probabilistic image processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30241 Trajectory

Definitions

  • the present invention relates to a method, apparatus, system and computer program product for depth-related information generation.
  • Autostereoscopic displays generally generate an impression of a three-dimensional image by rendering different views of an object for different viewing angles. In this manner a first image can be generated for the left eye of a viewer and a second image for the right eye of the viewer.
  • the source material for use with autostereoscopic displays can be generated in a variety of manners. For example, multiple image sequences may be recorded using multiple suitably positioned cameras in order to record an image sequence corresponding to each and every view. Alternatively, individual image sequences can be generated by the autostereoscopic display using a three-dimensional computer model.
  • depth maps provide depth information indicative of the absolute or relative distance of objects depicted in the image to a (virtual) camera.
  • Depth maps can provide depth information on a per-pixel basis, but as will be clear to the skilled person, may also provide depth information at a coarser granularity. In certain applications it may be desirable to use a lower-resolution depth map wherein each depth-map value provides depth information for multiple pixels in the image.
  • Disparity maps can be used as an alternative to the above mentioned depth maps. Disparity refers to the apparent shift of objects in a scene when it is observed from two distinct viewpoints, such as a left-eye and a right-eye viewpoint. This shift is larger for objects near by.
  • Disparity information and depth information are related and can be mapped onto one another using a model, such as a pinhole camera model. More information with regard to the mapping of disparity information to depth-map information can be found in "Depth Estimation from Stereoscopic Image Pairs Assuming Piecewise Continuous Surfaces" by L. Falkenhagen, published in Proc. of European Workshop on Combined Real and Synthetic Image Processing for Broadcast and Video Production, Hamburg, November 1994, hereby incorporated by reference.
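The pinhole-model relation between disparity and depth mentioned above can be sketched as follows. This is a minimal illustration; the function name and parameters are chosen for this example and do not appear in the patent:

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Pinhole model with two parallel cameras: depth Z = f * B / d,
    where f is the focal length (pixels), B the camera baseline (metres)
    and d the disparity (pixels). Nearby objects have larger disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px
```

For example, with a 1000-pixel focal length and a 6.5 cm baseline, a 10-pixel disparity corresponds to a depth of 6.5 m; halving the disparity doubles the depth, reflecting that the apparent shift is larger for nearby objects.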
  • depth-related information is used throughout the description and is understood to comprise information such as depth information as well as disparity information.
  • the autostereoscopic display can render multiple views of the content for one or more viewers.
  • newly created content might be provided with accurately recorded depth-related information
  • conventional 2-D image sequences generally do not comprise the required depth-related information.
  • a known approach for converting 2D to 3D information is presented in International Patent Application WO200213141.
  • a neural network is trained using a manually annotated key image in order to learn the relationship between image characteristics and depth characteristics.
  • the trained neural network is subsequently used to generate depth information for key-frames.
  • depth maps of one or more key-frames are used to generate depth maps for non-key-frames, using information such as the relative location, image characteristics and distance to the respective key-frame(s).
  • a problem with the above approach is that it cannot properly handle scenes wherein objects and/or regions with similar color and/or image characteristics are located at different depths.
  • the present invention proposes to evolve depth-related information and segmentation-related information in an image sequence using a probabilistic network.
  • depth-related information such as depth-information or disparity information
  • segmentation-related information is annotated to at least one image of the image sequence. This information is used as initialization for the probabilistic network.
  • the present invention proposes to pose the depth-related information and segmentation-related information propagation problem as a Bayesian inference problem, wherein the solution is defined as being the maximum a posteriori (MAP) probability estimate of the true labeling.
  • a Markov Random Field (MRF) network is used to evolve a labeling comprising the depth-related information as well as a mapping indicative of the relationship between respective (groups of) pixels in the current image and those of the previously processed image.
  • the MRF network evolves the labeling and mapping based on observations in the form of image characteristics such as e.g. color, intensity, texture and/or curvature of edges in the current image.
  • the MRF network can incorporate a variety of prior contextual information, such as the labeling and the mapping, in a quantitative manner.
  • the MRF network can achieve an optimal solution within the limitations of the probabilistic network by minimizing the posterior energy, which comprises contributions from both the labeling and the mapping. In this manner the present invention can generate depth-related information for images while simultaneously taking depth-related information, segmentation-related information and motion of objects in the image into account.
  • a further advantage of the use of an MRF network is that it is well suited for further extensions. Yet another advantage of using an MRF network is that it is substantially parallel in nature and as a result is well suited for parallel processing.
  • the Bayesian inference problem is extended to also comprise segmentation of the image.
  • a segment here is understood to be a region with some common properties, such as chrominance, luminance, and/or texture, that typically moves as a whole and has depth- related information associated with it.
  • the labeling comprises both segmentation information such as e.g. a segment index, as well as the depth-related information.
  • the MRF network may be further enhanced to capitalize on the fact that in general segment shapes do not exhibit major variations between consecutive images.
  • the MRF network may be enhanced to encode a curvature smoothness model which allows enforcement of constraints on the shape of the respective segments, thereby warranting temporal segment shape continuity.
  • the MRF network is arranged to take account of time-invariant segment-specific color models.
  • a color model can be generated based on image characteristics of the segment from the image for which the depth-related information was annotated.
  • the nodes within the MRF are organized in a pair-wise manner. Consequently the MRF is capable of encoding a spatial smoothness constraint and enables the use of a fast inference algorithm, such as e.g. Graph Cut or Belief Propagation, to estimate the MAP solution.
  • the neighborhood of a node is defined by its 8-neighborhood. This allows enforcement of more complicated constraints such as curvature smoothness.
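For illustration, an 8-neighborhood on a pixel grid can be enumerated as follows (an illustrative helper, not from the patent):

```python
def neighborhood_8(x, y, width, height):
    """All up-to-eight neighbors of pixel (x, y), clipped to the image
    boundary; corner pixels have 3 neighbors, edge pixels have 5."""
    return [(x + dx, y + dy)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dx, dy) != (0, 0)
            and 0 <= x + dx < width and 0 <= y + dy < height]
```

An interior pixel thus interacts with all eight surrounding pixels, which is what allows constraints spanning more than a horizontal/vertical pair, such as curvature smoothness along a segment boundary.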
  • the image sequence corresponds to a shot, and one image within the shot is annotated, preferably the image with the largest number of visible objects and/or segments.
  • Depth-related information is provided for the image, as well as any segment information that is required by the embodiment in question.
  • the labeling can be evolved both forward and backward in time towards the shot boundaries.
  • the image with the largest number of visible objects is annotated in order to provide an efficient initialization.
  • at least two images within an image sequence are annotated and the MRF network is arranged to apply bi-directional propagation of the depth-related information. To this end the MRF network needs to estimate the labeling and mapping for both the forward and backward propagation at the same time.
  • further constraints such as a constraint with regard to the similarity between forward and backward mapping could be encoded into the MRF network in order to improve temporal consistency of the mapping.
  • a further embodiment of the present invention comprises an apparatus according to claim 14 that is arranged to propagate depth-related information and segmentation-related information in a manner that takes movement of objects in a scene into account in a more direct manner.
  • a further embodiment of the present invention comprises a system according to claim 18 that is arranged to propagate depth-related information and segmentation-related information in a manner that takes movement of objects in a scene into account in a more direct manner.
  • a further embodiment of the present invention comprises a computer program product according to claim 23.
  • Fig. 1 shows a schematic overview of the composition of an image as an overlay of segments
  • Fig. 2 shows a graphical model of a single node in an MRF according to the present invention
  • Fig. 3 shows a graphical model of a MRF according to the present invention
  • Fig. 4 shows a flow-chart of a method according to the present invention
  • FIG. 5 shows several depth maps propagated using the present invention
  • Fig. 6 shows an apparatus according to the present invention
  • FIG. 7 shows a system according to the present invention.
  • the Figures are not drawn to scale. Generally, identical components are denoted by the same reference numerals in the Figures.
  • the present invention proposes to evolve depth-related information and segmentation-related information in an image sequence using a probabilistic network.
  • the present invention can be applied to images without the need for segmentation.
  • the present invention may advantageously incorporate segmentation of images.
  • the probabilistic network is arranged to evolve both the depth-related information as well as the segments.
  • a segment within the context of this specification is understood to be a region with some common properties, such as chrominance, luminance, and/or texture, that typically moves as a whole and has depth-related information associated with it. It will be clear to the skilled person that such image characteristics can be defined regardless of the color space representation, such as e.g. RGB, or YUV.
  • the depth-related information discussed here may be information such as, but not limited to, depth-information, disparity information and/or (de-)occlusion information
  • the method according to the invention utilizes annotated depth-related information and segmentation information of at least one image of the image sequence. This information may comprise e.g. a segment index and depth-values associated with the respective segments.
  • Although the method according to this embodiment comprises a step for annotating the image sequence, this is not always required. Instead, a previously annotated image sequence can be used to equal effect.
  • the annotation step in turn might be a manual, semi-automatic, or fully automatic annotation process.
  • the annotated information provides an initial labeling for at least one image of the image sequence. Although it is not mandatory to provide an initial segment labeling for an image, it will be clear to the skilled person that by providing a reliable initial labeling the evolution process can be made much more efficient.
  • the segments are evolved across subsequent images of the image sequence together with their depth-related information using a probabilistic network.
  • the problem of evolving the segments can be posed as a Bayesian inference problem wherein the solution is defined as the Maximum A Posteriori (MAP) labeling.
  • the MAP labeling is obtained by minimizing the a posteriori energy.
  • the MAP labeling is an estimate of the true optimum, but is the best possible based on random observations within the probabilistic network.
  • Sites may represent e.g. pixels, or multiple pixels, but may equally well represent more complex objects such as lines.
  • sites may represent either pixels or multiple pixels that are substantially spatially homogeneous. The actual relationship between sites can be determined by a so-called neighborhood system, which will be discussed later.
  • Let D be a set of labels. Labeling is to assign a label from the set of labels D to each of the sites in d.
  • F = {F_1, ..., F_N}, i.e. F comprises a random variable for each site.
  • segmentation of an image can be posed as a labeling problem.
  • the set of sites d comprises sites corresponding with the pixels of the image being segmented.
  • Segmentation now corresponds to labeling each of the sites/pixels with a segment index such that all pixels belonging to a segment have one and the same segment index.
  • Bayesian statistics are subsequently used to define an optimization problem that aims to find an optimal configuration/labeling based on quantitative criteria that incorporate observations in the form of image characteristics.
  • observations may relate e.g. to the color, intensity, or texture distribution within respective segments.
  • Bayesian statistics can be used to incorporate such information in the optimization process.
  • the a priori probability P(f), also referred to as the prior, comprises probability distributions that express uncertainty before evidence is taken into account.
  • the prior depends on how various prior constraints are expressed.
  • the likelihood function p(c | f) in turn relates to how data is observed and is problem-domain dependent.
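In standard Bayesian notation, the MAP labeling described above combines the prior and the likelihood, and maximizing the posterior is equivalent to minimizing a posterior energy (the notation follows common MRF literature rather than the patent's own equations):

```latex
f^{*} = \arg\max_{f} P(f \mid c)
      = \arg\max_{f} \frac{p(c \mid f)\, P(f)}{p(c)}
      = \arg\min_{f} \bigl( -\ln p(c \mid f) - \ln P(f) \bigr)
```

Since p(c) does not depend on f, it can be dropped from the maximization, which is why the computation can be performed on negative log probabilities, i.e. energies.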
  • Belief Propagation techniques can estimate the MAP solution of MRF networks through independent local (message-passing) operations. Consequently, they are particularly well suited for parallel processing. More background on the use of MRF modeling can be found in "Markov Random Field modeling in image analysis" by S.Z. Li, Springer-Verlag, 2001, hereby incorporated by reference.
  • the present invention can be applied in the generation of depth-related information from 2D content.
  • the present invention will typically be used to propagate manually or automatically generated depth related-information available for images in the sequence, to further images in the image sequence.
  • the propagation of depth information throughout the image sequence can be posed as a labeling problem.
  • the present invention capitalizes on the fact that there is a strong correlation between the labeling and the mapping in the images. Moreover there is a strong correlation between the mapping and segmentation. By evolving all of these together the present invention effectively improves the depth map propagation compared to the prior art, in that it takes into account both the labeling and the mapping simultaneously, whereas the prior art does not.
  • the use of segments moreover provides a further advantage in that the segments provide a compact abstraction of the image contents.
  • the detection and tracking of segments enables segment based operations. This can be useful e.g. when an object is moving from the background to the foreground, or is appearing from/disappearing behind another segment.
  • By using segments, the present invention effectively simplifies handling of occlusion and de-occlusion in an elegant manner.
  • Segments here are understood to be regions with some common properties, such as chrominance, luminance, texture and/or curvature of the edge. Typically segments move as a whole and have depth-related information associated with them.
  • Fig. 1 shows an image sequence comprising images 100, 101, 102, 103, and 104. Each image of the image sequence can be interpreted as a combination of several segments.
  • Fig. 1 illustrates image 100, which shows a triangular object 121 and a circular object 131 over a grey background. The image 100 can be interpreted as three overlaid segments: a first segment 110 comprising the grey background, a second segment 120 comprising the triangular object, and a third segment 130 comprising the circular object.
  • An MRF network is a model of a joint probability distribution of a set of random variables.
  • An MRF network comprises multiple nodes.
  • the nodes are the basic units of the network.
  • Fig. 2 presents a graphical model of a single node in an MRF network according to the present invention.
  • a node may represent a single pixel, or could alternatively represent multiple pixels, such as a region of an image.
  • An example of a scenario wherein a node represents multiple pixels would be a scenario wherein depth-related information is calculated at a resolution lower than that of the 2D image. In this manner, calculating and propagating the depth-related information will be more computationally efficient. For the sake of simplicity, here we consider the scenario wherein a node corresponds to a single pixel.
  • the circles represent the hidden states of the node, whereas the boxes represent the observations.
  • In this model:
  • l represents the depth-related information and segment labeling of the node in the current image
  • m represents the mapping of the current image with respect to the previous image
  • c represents the set of observed image characteristics, such as color characteristics, derived from the current image
  • θ represents a set of color models θ_i, wherein i is a segment index and θ_i represents the color model conditioned on segment i.
  • the hidden state information l will typically comprise the depth-related information and segment label of that pixel.
  • the state information m represents the mapping of the pixel in the current image with respect to the previous image.
  • the state information m can comprise e.g. a 2D vector (x', y') which indicates the corresponding pixel in the previous image.
  • the 2D vector may encode an offset in the previous image (dx, dy) .
  • the actual format of the information is not relevant as long as it provides information with regard to the mapping.
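As a concrete illustration, a mapping stored as a per-pixel offset can be used to pull the label of the corresponding pixel from the previous image. The names and data layout here are assumptions of this sketch, not the patent's format:

```python
def label_from_previous(labels_prev, mapping, x, y):
    """Given a mapping field that stores an offset m = (dx, dy) per
    pixel, return the label of the corresponding pixel (x+dx, y+dy)
    in the previous image (row-major layout: labels_prev[y][x])."""
    dx, dy = mapping[(x, y)]
    return labels_prev[y + dy][x + dx]
```

Whether the mapping encodes absolute coordinates or offsets is immaterial, as the text notes; either form identifies the same corresponding pixel.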
  • Although the observed set of image characteristics c here represents color characteristics, it will be clear to the skilled person that these can be replaced by and/or augmented with intensity characteristics, texture characteristics, or other image characteristics known to those skilled in the art.
  • the set of segment specific color models are typically time-invariant. Segment specific color models are typically constructed using the color information from the respective segment in an annotated image in the image sequence. Based on the initial segment labels color models can be generated for each segment.
  • the color model can be parametric, such as when using Gaussian mixtures, or can be non-parametric, such as when using histograms. Alternatively they may be discriminative; i.e. differentiating between foreground and background.
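A non-parametric (histogram) segment color model of the kind mentioned above can be sketched as follows; the color quantization and the floor value are choices of this example, not of the patent:

```python
from collections import Counter

def build_color_model(segment_pixels):
    """Non-parametric color model: a normalized histogram over the
    (quantized) colors of the pixels belonging to one segment."""
    counts = Counter(segment_pixels)
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}

def color_likelihood(model, color, floor=1e-6):
    """Histogram lookup for p(c | segment); a small floor keeps unseen
    colors from receiving exactly zero probability."""
    return model.get(color, floor)
```

A parametric alternative would fit a Gaussian mixture to the same segment pixels; the histogram variant avoids distributional assumptions at the cost of needing more samples per bin.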
  • Although segment-specific color models are typically time-invariant, there are situations where time-variant color models may be beneficial.
  • the image sequence comprises multiple annotated images.
  • the two color models for one and the same segment may be mixed.
  • the contribution of the color models is weighted based on the distance to the annotated image.
  • the joint probability corresponding to the graphical model presented in Fig. 2 can be factorized as shown in eq. 3.
  • p ⁇ l, m,c ⁇ ⁇ ) p ⁇ l)p ⁇ m
  • p(c I m,l, ⁇ ) as such is typically intractable due to the large number of combination of labeling and mapping in practical applications.
  • m,l, ⁇ ) is preferably approximated using p(c ⁇ m)p(c
  • alternative approximations can be used such as approximations based on the use of the principle of structured variational approximation.
  • the variational technique approximates the intractable probability distribution with another tractable probability distribution through minimizing the Kullback-Leibler divergence between them.
  • the variational technique refers to the paper "On structured variational approximations.”, by Ghahramani, Z. (1997), hereby incorporated by reference, Technical Report CRG-TR-97-1, Department of Computer Science, University of Toronto.
  • the MRF network described above comprises multiple nodes which connect to their respective neighbors.
  • the connectivity of the nodes or neighborhood is shown in Fig. 3.
  • the connectivity of the MRF network is pair- wise. It will be clear to the skilled person that the present invention is not limited to two-node cliques MRF networks, but the example is restricted thereto for the sake of simplicity.
  • Fig. 3 the nodes i andy represent neighboring nodes in the MRF.
  • / represent the labeling of the respective nodes.
  • MRF network forms a segment labeling field L.
  • Wi 1 and m ⁇ represent the mapping of the nodes i and j, the set of all mappings in turn defines a mapping field M.
  • C 1 and c ⁇ represent the observations for the respective nodes. These could correspond to e.g. the color value(s) of the corresponding (plurality of) pixel(s) or other image characteristics.
  • the set of all observations defines an observation field C.
  • corresponds to the set of color models as defined before.
  • Z is a normalization constant ⁇ ; is defined as the evidence of the hidden state or the compatibility function of the hidden state and the observations
  • is defined as the compatibility function between the neighboring nodes.
  • N is the neighborhood system defined on the network, here the set of all possible node pairs.
  • the solution of the segment evolution problem is defined as the MAP probability estimate of the labeling and the mapping.
  • the computation is preferably performed on negative log probability as shown below, which corresponds to the energy function definitions.
  • the definition of the energy functions is given for this particular case by way of example.
  • E(L,M) ⁇ E ⁇ (c 1 ,l 1 ,m 1 ) + ⁇ ⁇ E 2 (l ,,m, , / ⁇ ,m , ) (eq. 5)
  • the energy function E(L, M) comprises the contextual information of a first node i, as well as the contextual information encoded in the links between neighboring nodes i andy.
  • the energy of the contextual information of the first node i can be written as:
  • E 1 (C 1 , l 1 ,m 1 ) E ⁇ (l i ,c 1 ) + E: (m 1 , C 1 ) (eq. 6)
  • ETM (Tn 1 , C 1 ) can be expressed as being:
  • the second energy component E 2 (I 1 is the interaction term between the two neighboring nodes i andy.
  • the second energy component can be expressed as:
  • T 0 is a predefined cost and E 2 (M 1 ,m ⁇ ) is defined as:
  • Fig. 4 shows a flowchart of a method according to the present invention.
  • step 410 image sequence 405 is processed and depth-related information 415 is generated for at least one image of the image sequence 405.
  • depth-related information 415 is generated for an image sequence, such as those disclosed in International Patent Applications WO2005/013623 and WO2005/083630 by the same applicant, hereby incorporated by reference.
  • step 420 the generated depth-related information is combined with manually entered depth-related information 425 and segmentation-related information resulting in annotation information 435 for the at least one image.
  • annotation information 435 may comprise other information that can be annotated to the at least one image.
  • the annotation information 435 is subsequently annotated to the image sequence 405 in step 430.
  • the annotated image sequence 445 is subsequently used to determine the MAP solution for both the labeling and mapping for at least one further image in the image sequence 405, in step 440, in the process taking into account the evidence in the form image sequence 405.
  • the MAP solution in turn comprises the propagated depth-related information.
  • the process of determining a MAP solution can be subsequently repeated for further images in the sequence 405 based on the MAP solution that was just established until all images with the image sequence are annotated.
  • the present invention can be used to propagate depth-related information and segmentation-related information for consecutive images in an image sequence.
  • depth-related information and segmentation-related information propagation would be in a forward direction; i.e. from the current image to another image forward in time
  • the present invention may also be applied on images in an image sequence in reverse order, thereby effectively propagating annotated depth-related information and segmentation-related information backwards in time.
  • Fig. 5 shows an example of three input images for which a depth map and segmentation were generated according to the present invention.
  • the images 501, 502 and 503 represent the original images in sequence.
  • the images 504, 505 and 506 represent the corresponding propagated depth-related information.
  • FIG. 6 shows an apparatus 600 according to the present invention arranged to propagate depth-related information and segmentation-related information in an image sequence.
  • the apparatus comprises two input connectors 605 and 615.
  • the input connectors 615 and 605 are used to receive annotation information 435 for at least one image of the image sequence 405 and the image sequence 405 respectively.
  • the annotating means 610 is arranged to annotate at least one image of the image sequence 405 using annotation information 435.
  • the annotated image sequence 445 is subsequently processed in accordance with the present invention by processing means 620 in order to establish the MAP solution for both the labeling and mapping using a probabilistic network for at least one further image in the image sequence.
  • the processing means 620 may also provide the image sequence 405 on optional output connector 625 and segmentation information on optional output connector 645.
  • the apparatus 600 is well-suited for embedding within more complex devices, such as set-top boxes and/or auto stereoscopic displays.
  • a further apparatus 650 in accordance with the present invention.
  • This device further comprises generation means 630 which is arranged to generate annotation information 435 based on an input sequence 405.
  • the apparatus 650 comprises an input connector 605 for receiving an image sequence 405.
  • the image sequence 405 is subsequently presented to the generation means 630 for generating annotation information 435.
  • the apparatus 650 autonomously generates the annotation information 435 for use in the processing means 620.
  • the present invention may be implemented on a variety of processing platforms. These may range from dedicated hardware platforms that comprise a plurality of massively parallel processor arrays, to general purpose processing on single processor platforms. Moreover the generation means 630, the annotation means 610 and the processing means 620 may be implemented on one and the same processing platform in a substantially sequential or parallel manner, i.e. as far as algorithmic constraints allow parallelism. Finally the implementation of the present invention may be implemented primarily in software e.g. on a programmable computing platform, or alternatively can be mapped primarily on hardware e.g. on a dedicated Application Specific Integrated Circuit (ASIC).
  • ASIC Application Specific Integrated Circuit
  • Fig. 7 shows a system 700 according to the present invention.
  • the system 700 comprises several devices according to the present invention.
  • the system comprises a storage server 755, which might be local or remote and/or a network server 760.
  • Each of these servers 755,760 can provide both image sequence data 405 over network 750.
  • they may be further arranged to also provide annotation information 435 over network 750.
  • This information may be provided e.g. to an apparatus 600 according to the present invention for further processing.
  • the image sequence 405 can be provided to a Set Top Box (STB) 707 comprising an apparatus 650 according to the present invention that is connected to an autostereoscopic display 705.
  • STB Set Top Box
  • the image sequence data 405 can be provided to an apparatus 710 that comprises the functionality of the above mentioned STB 707 and the autostereoscopic display 705.
  • the image sequence 405 may also be provided to a compute server 720 that is arranged to execute instructions stored on a data carrier 730, which instructions when executed by the compute server 720 perform the steps of a method in accordance with the present invention.
  • the MRF network can be furthermore enhanced to constrain the effects that such imported depth-related information estimations may have on the labeling and mapping. In this manner temporal stability can be substantially preserved and erratic behavior resulting from external depth-related information can be prevented. It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims.


Abstract

The present invention relates to an apparatus, a system, a method and a computer program product for propagating depth-related information and segmentation-related information associated with at least one image from an image sequence to a consecutive image in the image sequence using a probabilistic network. The method comprises use of the probabilistic network to solve a Bayesian labeling problem wherein the labeling comprises the depth-related information and segmentation-related information, and wherein the node links of the probabilistic network are configured to simultaneously account for constraints imposed by the depth-related information and segmentation-related information of the consecutive image and mapping information for the respective node from the at least one image to the consecutive image. The nodes take into account evidence in the form of image characteristics from the consecutive image, such that propagated depth-related information and segmentation-related information is established for the consecutive image by establishing a Maximum A Posteriori solution for both the labeling and the mapping. The Bayesian problem comprises segmentation of the consecutive image, and propagation of depth information, based on segmentation and mapping information stored in the probabilistic network constructed from the at least one image to the consecutive image.

Description

Method, apparatus, system and computer program product for depth-related information propagation
FIELD OF THE INVENTION
The present invention relates to a method, apparatus, system and computer program product for depth-related information generation.
BACKGROUND OF THE INVENTION
Over the last few years a variety of auto-stereoscopic displays have been designed suitable for rendering three-dimensional imagery without the need for special headgear and/or glasses. Autostereoscopic displays generally generate an impression of a three-dimensional image by rendering different views of an object for different viewing angles. In this manner a first image can be generated for the left eye of a viewer and a second image for the right eye of the viewer. By displaying appropriate images, i.e. appropriate from the viewpoint of the left and right eye respectively, it is possible to convey an impression of a three-dimensional image to the viewer.
The source material for use with autostereoscopic displays can be generated in a variety of manners. For example multiple image sequences may be recorded using multiple suitably positioned cameras in order to record an image sequence corresponding to each and every view. Alternatively, the individual image sequences can be generated by the autostereoscopic display using a three-dimensional computer model.
However, in order to maintain backwards compatibility and improve on bandwidth usage many of the current auto-stereoscopic displays and/or auto-stereoscopic display drivers use an input in the form of a sequence of conventional images and a corresponding sequence of depth maps.
Generally such depth maps provide depth information indicative of the absolute or relative distance of objects depicted in the image to a/the (virtual) camera. Depth maps can provide depth-information on a per-pixel basis but as will be clear to the skilled person may also provide depth information at a coarser granularity. In certain applications it may be desirable to use a lower resolution depth-map wherein each depth-map value provides depth-information for multiple pixels in the image. Disparity maps can be used as an alternative to the above mentioned depth maps. Disparity refers to the apparent shift of objects in a scene when it is observed from two distinct viewpoints, such as a left-eye and a right-eye viewpoint. This shift is larger for objects nearby. Disparity information and depth information are related and can be mapped onto one another using a model, such as a pinhole camera model. More information with regard to the mapping of disparity information to depth-map information can be found in "Depth Estimation from Stereoscopic Image Pairs Assuming Piecewise Continuous Surfaces" by L. Falkenhagen, published in Proc. of European Workshop on combined Real and Synthetic Image Processing for Broadcast and Video Production, Hamburg, November 1994, hereby incorporated by reference.
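By way of illustration, the disparity/depth relationship under a simple pinhole stereo model can be sketched as follows. This is a minimal sketch; the focal length and baseline values are invented, and practical conversions such as those in the Falkenhagen paper involve calibrated camera parameters.

```python
# Sketch of the disparity/depth relationship under a simple pinhole stereo
# model: depth Z = f * B / d, with focal length f (pixels), baseline B and
# disparity d (pixels). All numeric values are invented for illustration.

def disparity_to_depth(disparity, focal_length_px, baseline):
    """Map a disparity (pixels) to a depth (same unit as the baseline)."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline / disparity

def depth_to_disparity(depth, focal_length_px, baseline):
    """Inverse mapping from depth back to disparity."""
    return focal_length_px * baseline / depth

# Nearby objects exhibit a larger apparent shift (disparity), hence a
# smaller depth value:
near = disparity_to_depth(40.0, focal_length_px=800.0, baseline=0.1)  # 2.0
far = disparity_to_depth(10.0, focal_length_px=800.0, baseline=0.1)   # 8.0
```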
In view of the above the term depth-related information is used throughout the description and is understood to comprise information such as depth information as well as disparity information.
By providing an autostereoscopic display with an image sequence and a corresponding sequence of depth-related information the autostereoscopic display can render multiple views of the content for one or more viewers. Although newly created content might be provided with accurately recorded depth-related information, conventional 2-D image sequences generally do not comprise the required depth-related information.
Various approaches exist that address the conversion of two-dimensional content into three-dimensions. Some of these approaches address real-time conversion, e.g. those implemented in an autostereoscopic display, whereas others address off-line conversion, e.g. in case of high-end 2D-to-3D movie conversions.
A known approach for converting 2D to 3D information is presented in International Patent Application WO200213141. According to this approach a neural network is trained using a manually annotated key image in order to learn the relationship between image characteristics and depth characteristics. The trained neural network is subsequently used to generate depth information for key-frames. During a second phase depth maps of one or more key-frames are used to generate depth maps for non key-frames, using information such as the relative location, image characteristics and distance to the respective key-frame(s).
A problem with the above approach lies in the manner in which it handles scenes wherein objects and/or regions with similar color and/or image characteristics are located at different depths.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide an alternative solution to generate depth maps that takes movement of objects and/or regions in a scene into account in the process of propagating depth-related information.
This object is realized in a method according to the method of claim 1 and an apparatus and a system according to the claims 14 and 18 respectively and a computer program according to claim 23.
The present invention proposes to evolve depth-related information and segmentation-related information in an image sequence using a probabilistic network. In a first step depth-related information, such as depth-information or disparity information, and segmentation-related information is annotated to at least one image of the image sequence. This information is used as initialization for the probabilistic network. The present invention proposes to pose the depth-related information and segmentation-related information propagation problem as a Bayesian inference problem, wherein the solution is defined as being the maximum a posteriori (MAP) probability estimate of the true labeling. A Markov Random Field (MRF) network is used to evolve a labeling comprising the depth-related information as well as a mapping indicative of the relationship between respective (groups of) pixels in the current image and those of the previously processed image. As a result the method of the present invention accurately captures movement and depth-related information. The MRF network evolves the labeling and mapping based on observations in the form of image characteristics such as e.g. color, intensity, texture and/or curvature of edges in the current image. The MRF network can incorporate a variety of prior contextual information, such as the labeling and the mapping, in a quantitative manner. During optimization the MRF network can achieve an optimal solution within the limitations of the probabilistic network by minimizing the posterior energy, which comprises contributions from both the labeling and the mapping. In this manner the present invention can generate depth-related information for images while simultaneously taking depth-related information, segmentation-related information and motion of objects in the image into account.
A further advantage of the use of an MRF network is that it is well suited for further extensions. Yet another advantage of using an MRF network is that it is substantially parallel and as a result is well suited for parallel processing.
The Bayesian inference problem is extended to also comprise segmentation of the image. A segment here is understood to be a region with some common properties, such as chrominance, luminance, and/or texture, that typically moves as a whole and has depth-related information associated with it. The labeling comprises both segmentation information, such as e.g. a segment index, as well as the depth-related information. By incorporating segmentation of the current image in the Bayesian labeling problem the present invention moreover provides a compact representation of the image in the form of the segments. These segments allow segment based operations and in addition enable an efficient solution for handling occlusion and deocclusion.
The MRF network may be further enhanced to capitalize on the fact that in general segment shapes do not exhibit major variations between consecutive images. The MRF network may be enhanced to encode a curvature smoothness model which allows enforcement of constraints on the shape of the respective segments, thereby warranting temporal segment shape continuity.
In another embodiment of the above embodiment that incorporates segmentation, the MRF network is arranged to take account of time-invariant segment-specific color models. Such a color model can be generated based on image characteristics of the segment from the image for which the depth-related information was annotated.
In a further preferred embodiment the nodes within the MRF are organized in a pair-wise manner. Consequently the MRF is capable of encoding a spatial smoothness constraint and enables the use of a fast inference algorithm, such as e.g. Graph Cut or Belief Propagation, to estimate the MAP solution. In a further embodiment the neighborhood of a node is defined by its 8-neighborhood. This allows enforcement of more complicated constraints such as curvature smoothness.
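As an illustration of the neighborhood systems mentioned above, the 8-neighborhood of a node on an image grid could be enumerated as follows. This is a sketch under the assumption that nodes are arranged on a regular pixel grid; the function name is ours.

```python
# Sketch of a grid neighborhood system; whether the pair-wise cliques are
# built from a 4- or 8-neighborhood is a modeling choice.

def neighbors_8(x, y, width, height):
    """Return the 8-neighborhood of node (x, y) on a width x height grid."""
    offsets = [(-1, -1), (0, -1), (1, -1),
               (-1,  0),          (1,  0),
               (-1,  1), (0,  1), (1,  1)]
    return [(x + dx, y + dy) for dx, dy in offsets
            if 0 <= x + dx < width and 0 <= y + dy < height]
```

An interior node has eight neighbors, while border and corner nodes have fewer, which the bounds check above accounts for.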
In a further embodiment the image sequence corresponds to a shot, and one image within the shot is annotated, preferably the image with the largest number of visible objects and/or segments. Depth-related information is provided for the image, as well as segment information as required by the embodiment in question. After annotation the labeling can be evolved both forward and backward in time towards the shot boundaries. Preferably the image with the largest number of visible objects is annotated in order to provide an efficient initialization. In a further embodiment of the present invention at least two images within an image sequence are annotated and the MRF network is arranged to apply bi-directional propagation of the depth-related information. To this end the MRF network needs to estimate the labeling and mapping for both the forward and backward propagation at the same time. In addition further constraints, such as a constraint with regard to the similarity between forward and backward mapping, could be encoded into the MRF network in order to improve temporal consistency of the mapping.
A further embodiment of the present invention comprises an apparatus according to claim 14 that is arranged to propagate depth-related information and segmentation-related information in a manner that takes movement of objects in a scene into account in a more direct manner.
A further embodiment of the present invention comprises a system according to claim 18 that is arranged to propagate depth-related information and segmentation-related information in a manner that takes movement of objects in a scene into account in a more direct manner.
A further embodiment of the present invention comprises a computer program product according to claim 23.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other advantageous aspects of the invention will be described in more detail using the following Figures in which
Fig. 1 shows a schematic overview of the composition of an image as an overlay of segments;
Fig. 2 shows a graphical model of a single node in an MRF according to the present invention;
Fig. 3 shows a graphical model of an MRF according to the present invention;
Fig. 4 shows a flow-chart of a method according to the present invention;
Fig. 5 shows several depth maps propagated using the present invention;
Fig. 6 shows an apparatus according to the present invention; and
Fig. 7 shows a system according to the present invention.
The Figures are not drawn to scale. Generally, identical components are denoted by the same reference numerals in the Figures.
DETAILED DESCRIPTION OF EMBODIMENTS
The present invention proposes to evolve depth-related information and segmentation-related information in an image sequence using a probabilistic network. The present invention can be applied to images without the need for segmentation. However the present invention may advantageously incorporate segmentation of images. In the latter case the probabilistic network is arranged to evolve both the depth-related information as well as the segments.
Hereafter the invention will be explained by way of example using a probabilistic network that is arranged to evolve both the depth-related information as well as the segments. The color models presented below are primarily used to provide additional differentiation between segments; the use of color models can be discarded when not using segmentation.
A segment within the context of this specification is understood to be a region with some common properties, such as chrominance, luminance, and/or texture, that typically moves as a whole and has depth-related information associated with it. It will be clear to the skilled person that such image characteristics can be defined regardless of the color space representation, such as e.g. RGB, or YUV. The depth-related information discussed here may be information such as, but not limited to, depth-information, disparity information and/or (de-)occlusion information. The method according to the invention utilizes annotated depth-related information and segmentation information of at least one image of the image sequence. This information may comprise e.g. a segment index and depth-values associated with the respective segments. Although the method according to this embodiment comprises a step for annotating the image sequence, this is not always required. Instead an annotated image sequence, which was previously annotated, can be used to equal effect. The annotation step in turn might be a manual, semi-automatic, or fully automatic annotation process.
The annotated information provides an initial labeling for at least one image of the image sequence. Although it is not mandatory to provide an initial segment labeling for an image, it will be clear to the skilled person that by providing a reliable initial labeling the evolution process can be made much more efficient.
The segments are evolved across subsequent images of the image sequence together with their depth-related information using a probabilistic network. The problem of evolving the segments can be posed as a Bayesian inference problem wherein the solution is defined as the Maximum A Posteriori (MAP) labeling. The MAP labeling is obtained by minimizing the a posteriori energy. The MAP labeling is an estimate of the true optimum, but is the best possible based on random observations within the probabilistic network.
Most vision problems can be posed as a Bayesian inference problem taking into account constraints resulting from prior knowledge and observations. A labeling problem can be specified in terms of a set of sites and a set of labels. Let d be a set of N discrete sites such that d = {1, ..., N}.
Sites may represent e.g. pixels, or multiple pixels, but may equally well represent more complex objects such as lines. In the context of the present application we restrict sites to represent either pixels or multiple pixels that are substantially spatially homogeneous. The actual relationship between sites can be determined by a so-called neighborhood system, which will be discussed later. Let D be a set of labels. Labeling is to assign a label from the set of labels D to each of the sites in d.
Let F be a set of random variables defined on d such that F = {F_1, ..., F_N}, i.e. F comprises a random variable for each site. A joint event {F_1 = f_1, ..., F_N = f_N} is called a realization of F and f = (f_1, ..., f_N) is called a configuration of F.
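A tiny concrete instance of this notation, with invented values: for N = 3 sites and a binary label set D, the set of all configurations f contains |D|^N = 8 elements.

```python
# Tiny concrete instance of the labeling notation: N = 3 sites and a
# binary label set D give |D|**N = 8 possible configurations f.
from itertools import product

sites = [1, 2, 3]        # d = {1, ..., N} with N = 3
labels = ["fg", "bg"]    # the label set D
configurations = list(product(labels, repeat=len(sites)))
```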
By way of example, segmentation of an image can be posed as a labeling problem. In this case the set of sites d comprises sites corresponding with the pixels of the image being segmented. Segmentation now corresponds to labeling each of the sites/pixels with a segment index such that all pixels belonging to a segment have one and the same segment index.
In accordance with the present invention Bayesian statistics are subsequently used to define an optimization problem that aims to find an optimal configuration/labeling based on quantitative criteria that incorporate observations in the form of image characteristics. In case of the above segmentation problem such observations may relate e.g. to the color, intensity, or texture distribution within respective segments. Bayesian statistics can be used to incorporate such information in the optimization process.
For example, consider that we know the a priori probabilities P(f) of configurations f and the likelihood densities p(r | f) of the observation r. Within this framework the best configuration possible is that configuration that maximizes the a posteriori probability (MAP). The posterior probability can be computed using the Bayesian rule:

P(f | r) = P(r | f) P(f) / P(r)    (eq. 1)

It should be noted that the density function of r does not affect the MAP solution. Using Bayesian statistics the labeling problem can be written as finding the MAP configuration f* wherein:

f* = argmax_{f ∈ D^N} P(F = f | r)    (eq. 2)
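The MAP selection of eq. 1 and eq. 2 can be illustrated numerically for a single site. All probabilities below are invented for illustration; the evidence P(r) is omitted since it does not affect the maximizer.

```python
# Single-site illustration of eq. 1 / eq. 2: the MAP label maximizes
# P(r | f) * P(f); dividing by P(r) changes nothing.

prior = {"foreground": 0.3, "background": 0.7}       # P(f), invented
likelihood = {"foreground": 0.8, "background": 0.1}  # p(r | f) for observed r

def map_label(prior, likelihood):
    return max(prior, key=lambda f: likelihood[f] * prior[f])

# 0.8 * 0.3 = 0.24 beats 0.1 * 0.7 = 0.07, so the MAP label is
# "foreground" despite the prior favoring "background".
best = map_label(prior, likelihood)
```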
The a priori probability P(f), also referred to as the prior, comprises probability distributions that express uncertainty before evidence is taken into account. The prior depends on how various prior constraints are expressed. The likelihood function p(r | f) in turn relates to how data is observed and is problem domain dependent.
Unfortunately incorporating prior constraints in the prior probability is not that simple. However Markov Random Field (MRF) theory provides tools to encode such contextual information and/or constraints in the prior. MRF based methods incorporate contextual information and constraints in a quantitative manner to obtain the MAP solution in an efficient manner. It is typically difficult to find the global optimal solution since mostly the exact inference is intractable in the problem domain. However optimization methods such as Belief Propagation can achieve good results in practice.
Belief Propagation techniques can estimate the MAP solution of MRF networks through independent local (message-passing) operations. Consequently, they are particularly well suited to parallel processing. More background on the use of MRF modeling can be found in "Markov Random Field modeling in image analysis" by S.Z. Li, Springer-Verlag, 2001, hereby incorporated by reference.
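The message-passing idea can be sketched on a toy example. On a chain-structured MRF, min-sum belief propagation reduces to a Viterbi-style forward pass and is exact; the sketch below, with invented unary costs and a Potts pairwise cost playing the roles of -log(evidence) and -log(compatibility), finds the MAP labeling of a three-node binary chain. It is an illustrative toy, not the grid network of the patent.

```python
# Toy min-sum inference on a 3-node binary chain; all cost values invented.

unary = [[0.0, 2.0], [1.5, 0.2], [2.0, 0.0]]  # unary[i][label]
PAIR = 1.0                                    # Potts cost for differing labels

def pair_cost(a, b):
    return 0.0 if a == b else PAIR

def min_sum_chain(unary):
    """Exact MAP labeling of a binary chain via a forward pass + backtrack."""
    n = len(unary)
    # fwd[i][l]: cost of the best labeling of nodes 0..i with node i = l
    fwd = [list(unary[0])]
    for i in range(1, n):
        fwd.append([unary[i][l] + min(fwd[i - 1][p] + pair_cost(p, l)
                                      for p in (0, 1)) for l in (0, 1)])
    # backtrack the MAP labeling from the last node
    labels = [min((0, 1), key=lambda l: fwd[-1][l])]
    for i in range(n - 2, -1, -1):
        labels.append(min((0, 1),
                          key=lambda p: fwd[i][p] + pair_cost(p, labels[-1])))
    return labels[::-1]
```

On grids with loops, as used in the present invention, the same local message updates are applied iteratively and yield an approximate MAP estimate rather than an exact one.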
The present invention can be applied in the generation of depth-related information from 2D content. The present invention will typically be used to propagate manually or automatically generated depth-related information, available for images in the sequence, to further images in the image sequence. The propagation of depth information throughout the image sequence can be posed as a labeling problem. The present invention capitalizes on the fact that there is a strong correlation between the labeling and the mapping in the images. Moreover there is a strong correlation between the mapping and segmentation. By evolving all of these together the present invention effectively improves the depth map propagation compared to the prior art, in that it takes into account both the labeling and the mapping simultaneously, whereas the prior art does not. The use of segments moreover provides a further advantage in that the segments provide a compact abstraction of the image contents. The detection and tracking of segments enables segment based operations. This can be useful e.g. when an object is moving from the background to the foreground, or is appearing from/disappearing behind another segment. Through the use of segments the present invention effectively simplifies handling of occlusion and de-occlusion in an elegant manner.
Segments here are understood to be regions with some common properties, such as chrominance, luminance, texture and/or curvature of the edge. Typically segments move as a whole and have depth-related information associated with them. Fig. 1 shows an image sequence comprising images 100, 101, 102, 103, and 104. Each image of the image sequence can be interpreted as a combination of several segments. Fig. 1 illustrates image 100, which shows a triangular object 121 and a circular object 131 over a grey background. The image 100 can be interpreted as three overlaid segments; a first segment 110 comprising the grey background, a second segment 120 comprising the triangular object and a third segment 130 comprising the circular object.
As indicated above the present invention is preferably implemented using an MRF network. An MRF network is a model of a joint probability distribution of a set of random variables. An MRF network comprises multiple nodes. The nodes are the basic units of the network. Fig. 2 presents a graphical model of a single node in an MRF network according to the present invention. A node may represent a single pixel, or could alternatively represent multiple pixels, such as a region of an image. An example of a scenario wherein a node represents multiple pixels would be a scenario wherein depth-related information is calculated at a resolution lower than that of the 2D image. In this manner calculating and propagating the depth-related information will be more computationally efficient. For the sake of simplicity we here consider the scenario wherein a node corresponds to a single pixel.
Fig. 2 presents a graphical model of a single node in an MRF network according to the present invention. In this graphical model the circles represent hidden state of the node, whereas the boxes represent the observations. In this model:
l : represents the depth-related information and segment labeling of the node of the current image;
m : represents the mapping of the current image with respect to the previous image;
c : represents the set of observed image characteristics such as color characteristics derived from the current image; and
θ : represents a set of color models θ_i, wherein i is a segment index and θ_i represents the color model conditioned on segment i.
In a situation wherein a node refers to a single pixel, the hidden state information l will typically comprise the depth-related information and segment label of that pixel. The state information m represents the mapping of the pixel in the current image with respect to the previous image. In case a node refers to a single pixel the state information m can comprise e.g. a 2D vector (x', y') which indicates the corresponding pixel in the previous image. Alternatively the 2D vector may encode an offset (dx, dy) in the previous image. The actual format of the information is not relevant as long as it provides information with regard to the mapping.
Although in the example shown in Fig. 2 the observed set of image characteristics c represents color characteristics, it will be clear to the skilled person that these can be replaced by and/or augmented with intensity characteristics, texture characteristics, or other image characteristics known to those skilled in the art. The set of segment-specific color models is typically time-invariant. Segment-specific color models are typically constructed using the color information from the respective segment in an annotated image in the image sequence. Based on the initial segment labels color models can be generated for each segment. The color model can be parametric, such as when using Gaussian mixtures, or can be non-parametric, such as when using histograms. Alternatively they may be discriminative; i.e. differentiating between foreground and background.
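A non-parametric (histogram) segment-specific color model as mentioned above could, for instance, be built as follows. This is a sketch; the bin count, the smoothing floor and the function names are our assumptions.

```python
# Sketch of a non-parametric (histogram) color model theta_i for one
# segment; bin count and smoothing floor are invented choices.

BINS = 4  # bins per channel, i.e. 4**3 color cells

def quantize(rgb):
    """Map an 8-bit (r, g, b) triple to a coarse histogram cell."""
    return tuple(min(c * BINS // 256, BINS - 1) for c in rgb)

def build_color_model(pixels):
    """pixels: iterable of (r, g, b) tuples from one annotated segment."""
    counts = {}
    for p in pixels:
        q = quantize(p)
        counts[q] = counts.get(q, 0) + 1
    total = sum(counts.values())
    # small floor so unseen colors get a non-zero likelihood
    return lambda rgb: counts.get(quantize(rgb), 0.25) / (total + 1.0)

# A model built from reddish pixels scores red input above blue input:
red_model = build_color_model([(250, 10, 10), (240, 20, 5), (255, 0, 0)])
```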
Although generally the segment-specific color models are time-invariant, there are situations where time-variant color models may be beneficial, for example in a scenario where the appearance of the segment in the image sequence changes over time. Another scenario where the use of time-variant color models may be beneficial is the situation where the image sequence comprises multiple annotated images. When evolving the labeling and mapping for images in between the annotated images the two color models for one and the same segment may be mixed. Preferably the contribution of the color models is weighted based on the distance to the annotated image. The joint probability corresponding to the graphical model presented in Fig. 2 can be factorized as shown in eq. 3.
p(l, m, c | θ) = p(l) p(m | l) p(c | m, l, θ)    (eq. 3)

p(c | m, l, θ) as such is typically intractable due to the large number of combinations of labeling and mapping in practical applications. However, p(c | m, l, θ) is preferably approximated using p(c | m) p(c | l, θ). It should be noted that alternative approximations can be used, such as approximations based on the principle of structured variational approximation. The variational technique approximates the intractable probability distribution with another, tractable probability distribution by minimizing the Kullback-Leibler divergence between them. For more information with regard to the variational technique refer to the paper "On structured variational approximations" by Z. Ghahramani (1997), Technical Report CRG-TR-97-1, Department of Computer Science, University of Toronto, hereby incorporated by reference.
The MRF network described above comprises multiple nodes which connect to their respective neighbors. The connectivity of the nodes, or neighborhood, is shown in Fig. 3. The connectivity of the MRF network is pair-wise. It will be clear to the skilled person that the present invention is not limited to MRF networks with two-node cliques, but the example is restricted thereto for the sake of simplicity.
In Fig. 3 the nodes i and j represent neighboring nodes in the MRF. In this figure l_i and l_j represent the labeling of the respective nodes. The labeling of all nodes in the MRF network forms a segment labeling field L. Analogously m_i and m_j represent the mapping of the nodes i and j; the set of all mappings in turn defines a mapping field M. Finally c_i and c_j represent the observations for the respective nodes. These could correspond to e.g. the color value(s) of the corresponding (plurality of) pixel(s) or other image characteristics. The set of all observations defines an observation field C. Note that θ corresponds to the set of color models as defined before. Based on the above the joint probability distribution function can be defined as being:
p(L, M, C | θ) = (1/Z) ∏_i Φ_i(l_i, m_i, c_i) ∏_{(i,j) ∈ N} Ψ(l_i, m_i, l_j, m_j)    (eq. 4)

where
Z is a normalization constant;
Φ_i is defined as the evidence of the hidden state, or the compatibility function of the hidden state and the observations;
Ψ is defined as the compatibility function between the neighboring nodes;
N is the neighborhood system defined on the network, here the set of all possible node pairs.
The solution of the segment evolution problem is defined as the MAP probability estimate of the labeling and the mapping. The computation is preferably performed on negative log probability as shown below, which corresponds to the energy function definitions. Here the definition of the energy functions is given for this particular case by way of example.
The energy function E(L, M) based on the above joint probability distribution function can be written as:

E(L, M) = Σ_i E_1(c_i, l_i, m_i) + λ Σ_{(i,j) ∈ N} E_2(l_i, m_i, l_j, m_j)    (eq. 5)
Here the energy function E(L, M) comprises the contextual information of a first node i, as well as the contextual information encoded in the links between neighboring nodes i and j. The energy of the contextual information of the first node i can be written as:
E_1(c_i, l_i, m_i) = E_1^l(l_i, c_i) + E_1^m(m_i, c_i)    (eq. 6)

Here E_1^l(l_i, c_i) is the color likelihood in terms of the color model and E_1^m(m_i, c_i) is the color likelihood in terms of the mapping. Under the assumption that the color model comprises a Gaussian mixture, E_1^l(l_i, c_i) can be expressed as being:

E_1^l(l_i, c_i) = -log Σ_{k=1}^{K} w_k · N(c_i; μ_k, σ_k^2)    (eq. 7)

wherein μ_k and σ_k are the mean and standard deviation of the k-th component of the Gaussian mixture of the color model belonging to l_i, wherein w_k is the corresponding weight of the component k, and wherein N stands for the evaluation of the variable c_i against a Gaussian distribution with mean value μ_k and standard deviation σ_k. E_1^m(m_i, c_i) can be expressed as being:

E_1^m(m_i, c_i) = α · ||c_i^t − c_{m_i}^{t−1}||    (eq. 8)

wherein c_i^t is the observed color at node i in the current image and c_{m_i}^{t−1} is the observed color at the location in the previous image that the mapping m_i points at.
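A possible reading of eq. 7 and eq. 8 in code form; the scalar color values, the single-component mixture and the absolute-difference norm are simplifying assumptions of ours.

```python
# Simplified transcription of eq. 7 (Gaussian-mixture color likelihood)
# and eq. 8 (mapping color-mismatch cost); all parameters are invented.
import math

def gaussian(c, mu, sigma):
    """Evaluate a 1-D Gaussian density at c."""
    return math.exp(-((c - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def E1_label(c, model):
    """model: list of (w_k, mu_k, sigma_k); -log mixture likelihood (eq. 7)."""
    return -math.log(sum(w * gaussian(c, mu, sigma) for w, mu, sigma in model))

def E1_mapping(c_current, c_previous, alpha=1.0):
    """Cost of the color mismatch with the mapped predecessor pixel (eq. 8)."""
    return alpha * abs(c_current - c_previous)

model = [(1.0, 100.0, 10.0)]  # one Gaussian component, invented parameters
# Colors near the model mean yield a low energy; distant colors a high one:
cheap, expensive = E1_label(100.0, model), E1_label(150.0, model)
```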
The second energy component E_2(l_i, m_i, l_j, m_j) is the interaction term between the two neighboring nodes i and j. The second energy component can be expressed as:

E_2(l_i, m_i, l_j, m_j) = E_2^m(m_i, m_j) if l_i = l_j, and T_0 if l_i ≠ l_j    (eq. 9)

wherein T_0 is a predefined cost and E_2^m(m_i, m_j) is defined as:

E_2^m(m_i, m_j) = 0 if m_i = m_j, and T_1 otherwise    (eq. 10)
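Eq. 9 and eq. 10 can be transcribed almost directly; the values of T_0 and T_1 are invented and would be tuned in practice.

```python
# Near-direct transcription of eq. 9 and eq. 10 with invented costs.

T0 = 4.0  # cost when neighboring nodes carry different segment labels
T1 = 1.0  # cost when the labels agree but the mappings differ

def E2_mapping(m_i, m_j):
    return 0.0 if m_i == m_j else T1  # eq. 10

def E2(l_i, m_i, l_j, m_j):
    return E2_mapping(m_i, m_j) if l_i == l_j else T0  # eq. 9
```

Intuitively, a disagreement in segment labels between neighbors costs T_0 regardless of the mappings, while a mapping disagreement within one segment costs the (typically smaller) T_1, encouraging neighbors in the same segment to move coherently.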
It will be clear to the skilled person that the above energy function definition is merely exemplary and should not be construed as limiting the scope of the present invention. Fig. 4 shows a flowchart of a method according to the present invention.
During step 410 the image sequence 405 is processed and depth-related information 415 is generated for at least one image of the image sequence 405. A wide variety of techniques can be applied to generate depth-related information 415 for an image sequence, such as those disclosed in International Patent Applications WO2005/013623 and WO2005/083630 by the same applicant, hereby incorporated by reference. In step 420 the generated depth-related information is combined with manually entered depth-related information 425 and segmentation-related information, resulting in annotation information 435 for the at least one image. In other embodiments of the present invention the annotation information 435 may comprise other information that can be annotated to the at least one image. The annotation information 435 is subsequently annotated to the image sequence 405 in step 430. The annotated image sequence 445 is subsequently used to determine the MAP solution for both the labeling and the mapping for at least one further image in the image sequence 405, in step 440, in the process taking into account the evidence in the form of the image sequence 405. The MAP solution in turn comprises the propagated depth-related information. The process of determining a MAP solution can be subsequently repeated for further images in the sequence 405, based on the MAP solution that was just established, until all images within the image sequence are annotated.
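The overall flow of steps 410-440 can be sketched at a high level as follows. All names are ours, and the per-image MAP step is passed in as a placeholder callable, since its internals are the subject of the embodiments described above.

```python
# High-level sketch of the flow of Fig. 4: an annotation for the first
# image is propagated image by image through the sequence.

def propagate_sequence(images, annotation, propagate_once):
    """Propagate a (labeling, mapping) annotation through an image sequence.

    images: the image sequence (the first image carries the annotation);
    annotation: the initial labeling/mapping for images[0] (steps 410-430);
    propagate_once: performs the MAP step 440 for one consecutive image pair.
    """
    results = [annotation]
    for prev_img, cur_img in zip(images, images[1:]):
        # each image is solved using the previous solution as the prior
        # and the new image as evidence
        results.append(propagate_once(results[-1], prev_img, cur_img))
    return results

# With an identity "propagation" the annotation is simply carried forward:
carried = propagate_sequence(["img0", "img1", "img2"],
                             ("labels0", "map0"),
                             lambda prev, a, b: prev)
```

The same driver could be run over the reversed sequence to propagate the annotation backwards in time, as discussed below.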
It will be clear to the skilled person that the steps 410, 420, and 430 as shown in the flowchart are part of the preparation for step 440. The present invention is primarily embodied in step 440, which is reflected through the use of dashed boxes in the flowchart.
In this manner the present invention can be used to propagate depth-related information and segmentation-related information for consecutive images in an image sequence. Although typically depth-related information and segmentation-related information would be propagated in a forward direction, i.e. from the current image to another image forward in time, the present invention may also be applied to images in an image sequence in reverse order, thereby effectively propagating annotated depth-related information and segmentation-related information backwards in time.

Fig. 5 shows an example of three input images for which a depth map and segmentation were generated according to the present invention. The images 501, 502 and 503 represent the original images in sequence. The images 504, 505 and 506 represent the corresponding propagated depth-related information. Finally, the images 507, 508, and 509 represent the corresponding propagated segmentation.

Fig. 6 shows an apparatus 600 according to the present invention arranged to propagate depth-related information and segmentation-related information in an image sequence. The apparatus comprises two input connectors 605 and 615. The input connectors 615 and 605 are used to receive annotation information 435 for at least one image of the image sequence 405 and the image sequence 405, respectively. The annotating means 610 is arranged to annotate at least one image of the image sequence 405 using annotation information 435. The annotated image sequence 445 is subsequently processed in accordance with the present invention by processing means 620 in order to establish the MAP solution for both the labeling and the mapping using a probabilistic network for at least one further image in the image sequence. The resulting propagated depth-related information is subsequently output on output connector 635.
The processing means 620 may also provide the image sequence 405 on optional output connector 625 and segmentation information on optional output connector 645. As such the apparatus 600 is well-suited for embedding within more complex devices, such as set-top boxes and/or autostereoscopic displays.

Also shown in Fig. 6 is a further apparatus 650 in accordance with the present invention. This device further comprises generation means 630 which is arranged to generate annotation information 435 based on an input sequence 405. The apparatus 650 comprises an input connector 605 for receiving an image sequence 405. The image sequence 405 is subsequently presented to the generation means 630 for generating annotation information 435. In this embodiment the apparatus 650 autonomously generates the annotation information 435 for use in the processing means 620.
It will be clear to those skilled in the art that, as the present invention relates to image (sequence) processing, the present invention may be implemented on a variety of processing platforms. These may range from dedicated hardware platforms that comprise a plurality of massively parallel processor arrays to general purpose processing on single processor platforms. Moreover, the generation means 630, the annotation means 610 and the processing means 620 may be implemented on one and the same processing platform in a substantially sequential or parallel manner, i.e. as far as algorithmic constraints allow parallelism. Finally, the present invention may be implemented primarily in software, e.g. on a programmable computing platform, or alternatively may be mapped primarily onto hardware, e.g. a dedicated Application Specific Integrated Circuit (ASIC). It will be clear to the skilled person that hybrid solutions of the above implementations are also within the scope of the present invention.

Fig. 7 shows a system 700 according to the present invention. The system 700 comprises several devices according to the present invention. The system comprises a storage server 755, which might be local or remote, and/or a network server 760. Each of these servers 755, 760 can provide image sequence data 405 over network 750. Optionally they may be further arranged to also provide annotation information 435 over network 750. This information may be provided e.g. to an apparatus 600 according to the present invention for further processing. Alternatively the image sequence 405 can be provided to a Set Top Box (STB) 707 comprising an apparatus 650 according to the present invention that is connected to an autostereoscopic display 705. More alternatively, the image sequence data 405 can be provided to an apparatus 710 that comprises the functionality of the above mentioned STB 707 and the autostereoscopic display 705.
Finally the image sequence 405 may also be provided to a compute server 720 that is arranged to execute instructions stored on a data carrier 730, which instructions when executed by the compute server 720 perform the steps of a method in accordance with the present invention.

Although throughout the text of the present application the focus is on propagation of depth-related information, it will be clear to the skilled person that the MRF network presented herein may be further enhanced to import depth-related information from other sources. These sources can be substantially similar in nature to those used for generating annotation information 435 prior to the MAP-optimization.
The MRF network can furthermore be enhanced to constrain the effects that such imported depth-related information estimations may have on the labeling and mapping. In this manner temporal stability can be substantially preserved and erratic behavior resulting from external depth-related information can be prevented.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims.
In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. It will be clear that within the framework of the invention many variations are possible. It will be appreciated by persons skilled in the art that the present invention is not limited by what has been particularly shown and described hereinabove. The invention resides in each and every novel characteristic feature and each and every combination of characteristic features. Reference numerals in the claims do not limit their protective scope. Use of the verb "to comprise" and its conjugations does not exclude the presence of elements other than those stated in the claims. Use of the article "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.

Claims

CLAIMS:
1. A method of propagating depth-related information and segmentation-related information associated with at least one image from an image sequence to a consecutive image in the image sequence using a probabilistic network to solve a Bayesian inference problem wherein the hidden nodes' states to estimate comprise depth-related information and segmentation-related information and wherein the links between the nodes of the probabilistic network are configured to simultaneously account for constraints imposed by the depth-related information and segmentation-related information of the consecutive image and mapping information for the respective node from the at least one image to the consecutive image, the nodes taking into account evidence in the form of image characteristics from the consecutive image, such that propagated depth-related information and segmentation-related information is established for the consecutive image by establishing a Maximum A Posteriori solution for both the labeling and the mapping, whereby the Bayesian problem comprises segmentation of the consecutive image, and propagation of depth information for the consecutive image, based on segmentation and mapping information stored in the probabilistic network constructed from the at least one image and consecutive image.
2. A method according to claim 1, whereby the method comprises the step of: processing an image sequence to generate depth-related information, such as depth information or disparity information, from at least one image of the image sequence.
3. A method according to claim 1 or 2, wherein the method comprises the step of: combining generated depth-related information with segmentation-related information, such as a segment index, from the annotation information for the at least one image or generating depth-related information and segmentation-related information, such as a segment index, from the annotation information for the at least one image.
4. A method according to claim 3, wherein the method comprises the step of: annotating the image sequence with the annotation information.
5. A method according to any of the preceding claims, wherein the probabilistic network uses a set of segment specific color models when establishing the maximum a posteriori (MAP) solution.
6. A method according to any of the preceding claims, wherein the probabilistic network is a Markov Random Field (MRF) network.
7. A method according to claim 6, wherein the MRF network is arranged to take account of a time-invariant segment specific color model or a time-variant segment specific color model adapted temporally.
8. A method according to claim 6 or 7, wherein the nodes within the MRF network are organized in a pair-wise manner, whereby the MRF network is capable of encoding a spatial smoothness constraint and enables the use of a fast inference algorithm to establish the MAP solution.
9. A method according to any of claims 6-8, wherein the neighborhood of a node is defined by its 8-neighborhood to allow enforcement of more complicated constraints such as curvature smoothness.
10. A method according to any of claims 6-9, wherein the MRF network is arranged to apply bi-directional propagation of the depth-related information.
11. A method according to any of the preceding claims, wherein said image sequence corresponds to a shot and at least one image within the shot is annotated.
12. A method according to claim 11, wherein the image within the shot with the largest number of visible objects and/or segments is annotated.
13. A method according to any of the preceding claims, wherein the process of determining a MAP solution is subsequently repeated for further images in the image sequence based on the MAP solution established for said at least one image from the image sequence until all images of the image sequence are annotated.
14. An apparatus for propagating depth-related information and segmentation-related information associated with at least one image from an image sequence to a further consecutive image in the image sequence using a probabilistic network, the apparatus comprising processing means that is arranged to use the probabilistic network to solve a Bayesian inference problem wherein the hidden nodes' states to estimate comprise the depth-related information and segmentation-related information and wherein the node links of the probabilistic network are configured to simultaneously account for constraints imposed by the depth-related information and segmentation-related information of the consecutive image and mapping information for the respective node from the at least one image to the consecutive image, the nodes taking into account evidence in the form of image characteristics from the consecutive image, such that propagated depth-related information and segmentation-related information is established for the consecutive image by establishing a Maximum A Posteriori solution for both the labeling and the mapping, whereby the Bayesian problem further comprises segmentation of the consecutive image, and propagation of depth information for the consecutive image, based on segmentation and mapping information stored in the probabilistic network constructed from the at least one image and consecutive image.
15. An apparatus according to claim 14, the apparatus also comprising: input connectors that are arranged to receive at least one image of an image sequence and annotation information, such as depth-related information and segmentation-related information for at least one image of the image sequence, and annotating means that is arranged to annotate at least one image of the image sequence using said annotation information.
16. An apparatus according to claim 14 or 15, the apparatus also comprising: at least one output connector for outputting propagated depth-related information, and segmentation-related information of the image sequence.
17. An apparatus according to any of claims 14-16, the apparatus also comprising: generation means that is arranged to generate depth and segmentation information based on an input image sequence using the annotation information.
18. A system for propagating depth-related information and segmentation-related information associated with at least one image from an image sequence to a consecutive image in the image sequence using a probabilistic network, the system comprising an apparatus according to any of claims 14-17.
19. A system according to claim 18, wherein the system comprises a storage server and a network server, whereby each of these servers is arranged to provide image sequence data over a network.
20. A system according to claim 19, wherein the storage server and the network server are arranged to provide annotation information over a network, for example to an apparatus according to any of claims 14-17 for further processing.
21. A system according to claim 18, wherein the image sequence is arranged to be provided to a Set Top Box (STB) comprising an apparatus according to any of claims 14-17 that is connected to an autostereoscopic display, or to an apparatus that comprises the functionality of an STB and an autostereoscopic display.
22. A system according to claim 18, wherein the image sequence is arranged to be provided to a compute server that is arranged to execute instructions stored on a data carrier, which instructions when executed by the compute server perform the steps of a method according to any of claims 1-13.
23. A computer program, distributable by electronic data transmission, comprising computer program code means adapted, when said program is loaded onto a computer, to make the computer execute the steps of a method according to any of the claims 1-13.
24. A computer program according to claim 23, wherein the computer program code means are stored on a data carrier.
PCT/IB2008/052340 2007-06-15 2008-06-13 Method, apparatus, system and computer program product for depth-related information propagation WO2008152607A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
EP07110360 2007-06-15
EP07110360.0 2007-06-15
EP08151409 2008-02-14
EP08151409.3 2008-02-14

Publications (1)

Publication Number Publication Date
WO2008152607A1 true WO2008152607A1 (en) 2008-12-18

Family

ID=39811683

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2008/052340 WO2008152607A1 (en) 2007-06-15 2008-06-13 Method, apparatus, system and computer program product for depth-related information propagation

Country Status (2)

Country Link
TW (1) TW200907859A (en)
WO (1) WO2008152607A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103679717A (en) * 2013-12-05 2014-03-26 河海大学 Method for splitting image based on Markov random field
US9338424B2 (en) 2011-02-23 2016-05-10 Koninklijlke Philips N.V. Processing depth data of a three-dimensional scene
EP2656315B1 (en) * 2010-12-22 2016-10-05 Legend3D, Inc. System and method for minimal iteration workflow for image sequence depth enhancement
WO2016170330A1 (en) * 2015-04-24 2016-10-27 Oxford University Innovation Limited Processing a series of images to identify at least a portion of an object
CN106570880A (en) * 2016-10-28 2017-04-19 中国人民解放军第三军医大学 Brain tissue MRI image segmentation method based on fuzzy clustering and Markov random field
CN111951282A (en) * 2020-08-12 2020-11-17 辽宁石油化工大学 An Improved Image Segmentation Algorithm Based on Markov Random Field and Region Merging

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CIPOLLA R ET AL: "Layered motion segmentation and depth ordering by tracking edges", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, IEEE SERVICE CENTER, LOS ALAMITOS, CA, US, vol. 26, no. 4, 1 April 2004 (2004-04-01), pages 479 - 494, XP011107234, ISSN: 0162-8828 *
JIE-YU ZHAO: "Dynamic background discrimination with belief propagation", MACHINE LEARNING AND CYBERNETICS, 2004. PROCEEDINGS OF 2004 INTERNATIONAL CONFERENCE ON, SHANGHAI, CHINA, AUG. 26-29, 2004, PISCATAWAY, NJ, USA, IEEE, vol. 7, 26 August 2004 (2004-08-26), pages 4342 - 4346, XP010763211, ISBN: 978-0-7803-8403-3 *
SEBASTIAN KNORR ET AL: "A Modular Scheme for 2D/3D Conversion of TV Broadcast", 3D DATA PROCESSING, VISUALIZATION, AND TRANSMISSION, THIRD INTERNATIONAL SYMPOSIUM ON, IEEE, PI, 1 June 2006 (2006-06-01), pages 703 - 710, XP031079006, ISBN: 978-0-7695-2825-0 *
YAN LI ET AL: "Object Detection Using 2D Spatial Ordering Constraints", COMPUTER VISION AND PATTERN RECOGNITION, 2005. CVPR 2005. IEEE COMPUTER SOCIETY CONFERENCE ON, SAN DIEGO, CA, USA, 20-26 JUNE 2005, PISCATAWAY, NJ, USA, IEEE, vol. 2, 20 June 2005 (2005-06-20), pages 711 - 718, XP010817983, ISBN: 978-0-7695-2372-9 *


Also Published As

Publication number Publication date
TW200907859A (en) 2009-02-16

Similar Documents

Publication Publication Date Title
CA2668941C (en) System and method for model fitting and registration of objects for 2d-to-3d conversion
Feng et al. Object-based 2D-to-3D video conversion for effective stereoscopic content generation in 3D-TV applications
Jang et al. Efficient disparity map estimation using occlusion handling for various 3D multimedia applications
KR20110090958A (en) Generation of occlusion data for image attributes
CN102047288A (en) System and method for depth extraction of images with forward and backward depth prediction
US9661307B1 (en) Depth map generation using motion cues for conversion of monoscopic visual content to stereoscopic 3D
JP4892113B2 (en) Image processing method and apparatus
CN102368826A (en) Real time adaptive generation method from double-viewpoint video to multi-viewpoint video
EP2462536A1 (en) Systems and methods for three-dimensional video generation
WO2008152607A1 (en) Method, apparatus, system and computer program product for depth-related information propagation
US10218956B2 (en) Method and apparatus for generating a depth cue
Zhang et al. Interactive stereoscopic video conversion
KR100560464B1 (en) How to configure a multiview image display system adaptive to the observer's point of view
Yang et al. Depth map generation using local depth hypothesis for 2D-to-3D conversion
Lee et al. Segment-based multi-view depth map estimation using belief propagation from dense multi-view video
Zhang et al. Stereoscopic learning for disparity estimation
Orozco et al. HDR multiview image sequence generation: Toward 3D HDR video
Wildeboer et al. A semi-automatic multi-view depth estimation method
KR20220071935A (en) Method and Apparatus for Deriving High-Resolution Depth Video Using Optical Flow
Caviedes et al. Real time 2D to 3D conversion: Technical and visual quality requirements
Lin et al. A 2D to 3D conversion scheme based on depth cues analysis for MPEG videos
Priya et al. 3d Image Generation from Single 2d Image using Monocular Depth Cues
Tsubaki et al. 2D to 3D conversion based on tracking both vanishing point and objects
Raviya et al. Depth and Disparity Extraction Structure for Multi View Images-Video Frame-A Review
Cai et al. Image-guided depth propagation for 2-D-to-3-D video conversion using superpixel matching and adaptive autoregressive model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08763327

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08763327

Country of ref document: EP

Kind code of ref document: A1
