US20130321566A1 - Audio source positioning using a camera - Google Patents
Audio source positioning using a camera
- Publication number
- US20130321566A1 (application US 13/599,678)
- Authority
- US
- United States
- Prior art keywords
- participant
- site
- remote
- location
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/04—Texture mapping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/08—Volume rendering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/10—Geometric effects
- G06T15/20—Perspective computation
- G06T15/205—Image-based rendering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/111—Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
- H04N13/117—Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation the virtual viewpoint locations being selected by the viewers or determined by viewer tracking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/194—Transmission of image signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
- H04N13/239—Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
- H04N13/243—Image signal generators using stereoscopic image cameras using three or more 2D image sensors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
- H04N13/246—Calibration of cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/257—Colour aspects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/142—Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
- H04N7/157—Conference systems defining a virtual conference space and using avatars or agents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/56—Particle system, point based geometry or rendering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2227/00—Details of public address [PA] systems covered by H04R27/00 but not provided for in any of its subgroups
- H04R2227/005—Audio distribution systems for home, i.e. multi-room use
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
Definitions
- the 3D point representing the location of the participant in the remote site can be the person's head or mouth. More particularly, in one embodiment, the 3D point representing the location of a participant in a remote site is a 3D point representing the location of the participant's mouth in the remote site when the mouth of that participant is visible in the last-rendered frame of the virtual scene. In another embodiment, the 3D point representing the location of a participant in a remote site is a 3D point representing the location of the participant's head in the remote site when the mouth of that participant is not visible by the sensor used to determine that 3D point.
- a first transform is computed that converts 3D locations in the remote site to points in the frame of the virtual scene.
- the action of displaying a rendered frame to the local site participant involves the use of a second transform that converts points in a frame of the virtual scene to screen coordinates on the local site's display device.
- the first transform is used to convert the 3D point representing the location of the remote participant in the remote site to a point in the last-rendered frame of the virtual scene (block 300 ), and the second transform is employed to convert the point in the last-rendered frame of the virtual scene representing the remote participant location to screen coordinates on the local site's display device (block 302 ).
- a third transform is also computed that converts screen coordinates in the display device to 3D points in the local site (block 304 ). It is noted that this transform need only be computed once, unless the display device is moved—at which point it would be re-computed.
- the third transform is used to compute the 3D point in the local site of the screen coordinates representing the location of the remote participant depicted on the display device (block 306 ).
- the spatial audio technique and a plurality of audio speakers resident in the local site are then used to make it seem to the local site participant that the voice of each remote site participant is emanating from the computed 3D point in the local site corresponding to the screen coordinates where that remote participant is depicted on the display device (block 308 ).
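- The patent does not give concrete formulas for these three transforms. The following Python sketch (all matrix values and names are hypothetical placeholders) only illustrates how the chain of blocks 300-306 can be composed as 4x4 homogeneous transforms applied to a 3D point; because each step works in homogeneous coordinates, the second transform can include the perspective projection onto the display and the divide by the fourth coordinate handles it.

```python
import numpy as np

def apply_transform(T, p):
    """Apply a 4x4 homogeneous transform T to a 3D point p."""
    q = T @ np.append(np.asarray(p, dtype=float), 1.0)
    return q[:3] / q[3]

# Placeholder transforms; in a real system these would come from camera/display
# calibration and from how the virtual scene is assembled.
T_remote_to_scene = np.eye(4)   # first transform:  remote-site 3D point -> virtual-scene point
T_scene_to_screen = np.eye(4)   # second transform: virtual-scene point  -> display screen coordinates
T_screen_to_local = np.eye(4)   # third transform:  screen coordinates   -> local-site 3D point

def remote_point_to_local_source(p_remote):
    """Chain the three conversions of blocks 300, 302 and 306."""
    p_scene = apply_transform(T_remote_to_scene, p_remote)    # block 300
    p_screen = apply_transform(T_scene_to_screen, p_scene)    # block 302
    return apply_transform(T_screen_to_local, p_screen)       # block 306

# The resulting local-site 3D point is handed to the spatial audio renderer
# (block 308) as the apparent position of the remote participant's voice.
print(remote_point_to_local_source([0.1, 1.6, 2.0]))
```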
- the location of the local participant within the local site has an effect on how audio source positioning is accomplished.
- a parallax effect results when the local site participant moves, and in one embodiment the spatial audio technique compensates based on the current location of the local participant.
- the head of the local site participant is tracked and periodically a 3D point representative of the location of the local site participant's head in the local site is computed. The point is then used in the audio source positioning.
- the spatial audio technique is used to make it seem to the local site participant that the voice of the remote site participant is emanating from a location on the display device where the remote participant is depicted taking into consideration the last-computed 3D point representative of the location of the local site participant's head.
- the rate at which 3D points representative of the location of the local site participant's head in the local site are computed should be high. In one embodiment, this rate exceeds the rate at which frames of the virtual scene are calculated.
- a typical virtual scene frame rate is 30 frames per second (fps).
- the rate at which 3D points representative of the location of the local site participant's head are computed is four times the virtual scene frame rate, namely 120 times per second.
- the depiction of the scene from the point of view of the local participant is updated at 120 fps. In other words, the scene is calculated at 30 fps but rendered at 120 fps.
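- As a rough illustration of this 30 fps / 120 Hz split, the sketch below (the rates and the tracker stub are assumptions made for illustration, not taken from the patent) recomputes the listener's head position and re-renders four times for every newly calculated scene frame.

```python
SCENE_FPS = 30      # rate at which virtual-scene frames are calculated
VIEW_FPS = 120      # rate at which the view and the audio positions are updated

def get_tracked_head_position(t):
    """Placeholder for the head tracker; returns a 3D point for time t."""
    return (0.05 * t, 1.6, 2.0)   # hypothetical slow drift of the listener

def render(scene_frame, head):
    pass  # placeholder: draw the frame and update the spatial audio panner

def run_one_second():
    scene_frame = None
    for tick in range(VIEW_FPS):                     # 120 updates per second
        t = tick / VIEW_FPS
        if tick % (VIEW_FPS // SCENE_FPS) == 0:      # every 4th tick: a new 30 fps scene frame
            scene_frame = f"scene frame computed at t={t:.3f}s"
        head = get_tracked_head_position(t)          # 120 Hz head position
        # Re-render the scene and re-spatialize the audio for the current head
        # position, even when the underlying scene frame is unchanged.
        render(scene_frame, head)

run_one_second()
```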
- the audio source positioning technique embodiments described so far make it seem to a participant viewing a rendering of the virtual scene that the voice of another participant is emanating from a location on the display device where the remote participant is depicted.
- This enhancement involves simulating the reverberations a participant's voice would create in the virtual scene (e.g., reverberations of the sound against the virtual walls or other virtual objects in the scene) and playing these reverberations in the participant's site.
- This reverberation enhancement can be accomplished at the local site given, from each remote site, the 3D point representing the location of the remote participant in the remote site and a modified version of the audio data representing the remote site participant's voice.
- the modification to the audio data involves suppressing reverberations and noise in the audio captured at the remote site. While this modification can be performed at the local site given certain information about the remote site, it is more efficient to suppress the reverberations and noise at the remote site before the audio data is sent to the local site. In either case, conventional suppression techniques are employed to accomplish the modification.
- one general embodiment of the audio source positioning technique that adds reverberation on a frame-by-frame basis involves, from the viewpoint of the local site, using the local site computing device to perform the following process actions.
- the previously-described first transform computed to convert 3D locations in the remote site to points in the last-rendered frame of the virtual scene is employed to convert the 3D point representing the location of the remote participant in the remote site to a point in the last-rendered frame of the virtual scene (block 400 ).
- the 3D point representing the location of the remote participant in the remote site corresponds to a 3D point representing the location of the remote participant's mouth in the remote site.
- the orientation of the remote site participant's face in the virtual scene, as depicted in the last-rendered virtual scene frame is identified (block 402 ). Conventional methods are employed to accomplish this task.
- the direction that the remote participant's voice projects in the virtual space from the point in the last-rendered frame of the virtual scene that corresponds to the 3D point representing the location of the remote participant's mouth is then computed based on the orientation of the remote site participant's face in the virtual scene (block 404 ).
- the reverberation characteristics of the virtual scene, as depicted in the last-rendered virtual scene frame are estimated (block 406 ).
- reverberation audio data is then computed that when added to the received audio data simulates the reverberations of the remote participant's voice in the virtual space for the current frame (block 408 ).
- This computed reverberation audio data is then added into the audio played in the local site in conjunction with the display of the current virtual scene frame (block 410 ).
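- The patent leaves the reverberation model unspecified. A minimal sketch, assuming the estimated reverberation characteristics of the virtual scene (block 406) are reduced to a single RT60-style decay time and ignoring the directional projection of block 404, is to convolve the dry (reverberation-suppressed) remote voice with a synthetic decaying-noise impulse response and mix the result into the playback (blocks 408-410). All parameter values here are illustrative.

```python
import numpy as np

SAMPLE_RATE = 16000

def synthetic_impulse_response(rt60_s, length_s=1.0, rate=SAMPLE_RATE, seed=0):
    """Exponentially decaying noise: a crude stand-in for the virtual room's
    estimated reverberation characteristics (block 406)."""
    n = int(length_s * rate)
    t = np.arange(n) / rate
    decay = np.exp(-6.91 * t / rt60_s)          # roughly -60 dB at t = rt60
    rng = np.random.default_rng(seed)
    return rng.standard_normal(n) * decay

def add_virtual_reverb(dry_voice, rt60_s, wet_gain=0.3):
    """Compute reverb audio for the dry remote voice and mix it in (blocks 408-410)."""
    ir = synthetic_impulse_response(rt60_s)
    wet = np.convolve(dry_voice, ir)[: len(dry_voice)]
    wet /= (np.max(np.abs(wet)) + 1e-9)          # normalize the wet path
    return dry_voice + wet_gain * wet

# Hypothetical one-frame chunk of received (reverb-suppressed) voice audio.
dry = np.random.default_rng(1).standard_normal(SAMPLE_RATE // 30) * 0.1
out = add_virtual_reverb(dry, rt60_s=0.4)
```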
- the audio source positioning technique embodiments described herein can be employed in a variety of video conferencing or telepresence applications. Generally, any video conferencing or telepresence application that involves the generation and display of a virtual scene for each participant can be enhanced using the audio source positioning technique embodiments described herein.
- One exemplary video conferencing or telepresence application supports the generation, storage, distribution, and presentation of a virtual scene (such as a virtual conference room).
- the exemplary video conferencing or telepresence application can support various types of traditional, single viewpoint virtual scene presentations in which the viewpoint of the scene is fixed when the video is recorded/captured and this viewpoint cannot be controlled or changed by a participant while they are viewing the virtual scene.
- the viewpoint of the scene is fixed and cannot be modified when the scene is being rendered and displayed to a participant.
- the exemplary video conferencing or telepresence application can support various types of free viewpoint video in which the viewpoint of the virtual scene can be interactively controlled and changed by a participant at will while they are viewing the scene.
- a participant can interactively generate different viewpoints of the scene on-the-fly when the virtual scene is being rendered and displayed.
- FIG. 5 illustrates an exemplary video conferencing or telepresence application processing pipeline in which the audio source positioning technique embodiments described herein can be implemented.
- the exemplary processing pipeline 500 starts with a generation stage 502 during which, and generally speaking, the aforementioned scene proxies of a site are generated.
- the generation stage 502 includes a capture sub-stage 504 and a processing sub-stage 506 whose operations will now be described in more detail.
- the capture sub-stage 504 of the processing pipeline 500 generally captures the scene in a site including the participant 508 and generates one or more streams of sensor data that represent the scene. More particularly, during the capture sub-stage 504 , an arrangement of sensors is used to capture the scene, where the arrangement includes a plurality of video capture devices 510 (as will be described shortly) and one or more audio capture devices 512 (such as a microphone or microphone array). The arrangement of sensors generates a plurality of streams of sensor data each of which represents the scene from a different geometric perspective. These streams of sensor data are input from the sensors and calibrated, and then output to the processing sub-stage 506 .
- the processing sub-stage 506 inputs the streams of sensor data from the capture sub-stage 504 , and then generates scene proxies which geometrically describe the captured scene as a function of time from the streams of sensor data. These scene proxies also include texture data for rendering the virtual scene.
- the scene proxies are output to a storage and distribution stage 514 , which stores them, along with the aforementioned audio data captured using the audio capture devices 512 .
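- A compact way to picture the data flow through the generation stage 502 and the storage and distribution stage 514 is sketched below; the class, field, and method names are illustrative assumptions and are not taken from the patent, and the fusion step is deliberately stubbed out.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SensorFrame:
    """One calibrated capture from one sensor (capture sub-stage 504)."""
    sensor_id: int
    color_image: bytes
    depth_image: bytes

@dataclass
class SceneProxyFrame:
    """Geometry plus texture needed to render one virtual-scene frame
    (produced by the processing sub-stage 506)."""
    frame_index: int
    geometry: bytes                          # e.g. point cloud or mesh data
    texture: bytes
    audio_chunk: bytes                       # voice captured until the next frame
    participant_point_3d: Tuple[float, float, float]

def processing_sub_stage(frame_index, sensor_frames, audio_chunk, point_3d):
    """Sub-stage 506, stubbed: fuse the calibrated sensor frames into one proxy.
    A real implementation would build depth maps, point clouds or meshes here."""
    geometry = b"".join(f.depth_image for f in sensor_frames)   # placeholder fusion
    texture = b"".join(f.color_image for f in sensor_frames)    # placeholder textures
    return SceneProxyFrame(frame_index, geometry, texture, audio_chunk, point_3d)

class StorageAndDistributionStage:
    """Stage 514: store each proxy frame and push it to the other sites."""
    def __init__(self):
        self.stored: List[SceneProxyFrame] = []

    def submit(self, frame: SceneProxyFrame, remote_sites):
        self.stored.append(frame)
        for site in remote_sites:
            site.send(frame)                 # the transport itself is abstracted away
```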
- the generation stage 502 is implemented on one or a collection of computing devices at a participant site (such as the local site shown) and a presentation stage 516 of the pipeline 500 is implemented on one or more computing devices resident at the other participant sites (such as the exemplary remote site shown in FIG. 5 ).
- the storage and distribution stage 514 distributes the scene proxies and audio data to the other participating sites by transmitting them over the one or more data communication networks 518 to which the participant site computing devices are connected. It is noted that each participant site has a generation stage 502 and a storage and distribution stage 514 (although only those associated with the aforementioned local site are shown in FIG. 5 ).
- a presentation stage 516 of the processing pipeline 500 is resident at each of the other participating sites (one of which is shown).
- the presentation stage 516 inputs the scene proxies and audio data that were transmitted from the storage and distribution stage 514 resident at each of the other sites (again one of which is shown in FIG. 5 ), and presents the participant at the receiving site with a rendering of the scene proxies in the form of the previously described virtual scene frames.
- the presentation stage 516 includes a rendering sub-stage 520 and a participant viewing experience sub-stage 522 whose operations will now be described in more detail.
- the rendering sub-stage 520 of the processing pipeline 500 inputs the scene proxies from the storage and distribution stage 514 , and then generates successive frames of the virtual scene (one of which 524 is shown in FIG. 5 ). If more than one other participant site is involved, then generating successive frames of the virtual scene entails the rendering sub-stage 520 inputting the scene proxies from the storage and distribution stage 514 operating at each of the other sites, and combining the proxy data using conventional methods to create an aggregate virtual scene (such as 524 ). Each virtual scene frame generated is then output to the participant viewing experience sub-stage 522 of the pipeline 500 .
- the participant viewing experience sub-stage 522 inputs each frame from the rendering sub-stage 520 , and then displays it on a display device 526 for viewing by the participant.
- the audio source positioning technique embodiments described herein are implemented as described previously to provide spatialized audio in association with each frame displayed, using two or more audio speakers 528 located in the receiving site.
- the rendering sub-stage 520 inputs the scene proxies output from the storage and distribution stage 514 (or stages if multiple other sites are involved), and then generates a frame exhibiting a current synthetic viewpoint.
- the current synthetic viewpoint is either a default viewpoint, or if the participant has specified a viewpoint, is the last-specified viewpoint.
- the participant-specified viewpoint comes from the participant viewing experience sub-stage 522 , which inputs it from the participant via a user interface.
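- The interplay between the rendering sub-stage 520 and the participant-specified viewpoint can be sketched as follows; the class, method names and default viewpoint are assumptions made for illustration, and the actual aggregation and drawing are left as placeholders.

```python
class RenderingSubStage:
    """Sketch of sub-stage 520: combine the latest proxies from each site and
    render them from the current synthetic viewpoint."""

    DEFAULT_VIEWPOINT = (0.0, 1.6, 2.5)   # hypothetical default camera position

    def __init__(self):
        self.current_viewpoint = self.DEFAULT_VIEWPOINT

    def set_viewpoint(self, viewpoint):
        """Called by the viewing-experience sub-stage 522 when the participant
        picks a new free viewpoint via the user interface."""
        self.current_viewpoint = viewpoint

    def render_frame(self, proxies_by_site):
        """proxies_by_site: {site_id: latest proxy frame from that site}."""
        combined = self.combine(proxies_by_site.values())
        return self.draw(combined, self.current_viewpoint)

    def combine(self, proxy_frames):
        return list(proxy_frames)      # placeholder aggregation into one scene

    def draw(self, scene, viewpoint):
        return f"frame rendered from viewpoint {viewpoint}"
```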
- the video capture devices 510 include a circular arrangement of eight genlocked sensors used to capture a site which includes the participant, where each of the sensors has a combination of one infrared structured-light projector, two infrared video cameras, and one color camera. Accordingly, the sensors each generate a different stream of video data which includes both a stereo pair of infrared image streams and a color image stream. The pair of infrared image streams and the color image stream generated by each sensor are used to generate different depth map image streams. The different depth map image streams are then merged into a stream of calibrated point cloud reconstructions of the scene. These point cloud reconstructions can then be used to generate a stream of mesh models of the scene.
- a conventional view-dependent texture mapping method which accurately represents specular textures such as skin is then used to extract texture data from the color image stream generated by each sensor and map this texture data to the stream of mesh models of the scene.
- these sensors and their data streams are also used in a face tracking process to identify the 3D location of the participant (which as described above can be the location of the participant's head or mouth).
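- As an illustration of turning the per-sensor depth map image streams into a calibrated point cloud reconstruction, a minimal sketch is shown below. It assumes pinhole intrinsics and known sensor-to-site extrinsics, and it omits the infrared stereo matching that produces the depth maps in the first place; the function names are hypothetical.

```python
import numpy as np

def depth_map_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (in meters) into camera-space 3D points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[z.reshape(-1) > 0]                # keep pixels with valid depth

def merge_point_clouds(depth_maps, intrinsics, extrinsics):
    """Merge per-sensor depth maps into one calibrated point cloud by mapping
    each camera-space cloud into a shared site coordinate frame."""
    clouds = []
    for depth, K, T in zip(depth_maps, intrinsics, extrinsics):
        pts = depth_map_to_points(depth, K["fx"], K["fy"], K["cx"], K["cy"])
        pts_h = np.c_[pts, np.ones(len(pts))]
        clouds.append((pts_h @ T.T)[:, :3])      # camera frame -> site frame
    return np.concatenate(clouds, axis=0)
```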
- the video capture devices 510 include four genlocked visible light video cameras used to capture a site which includes the participant, where the cameras are evenly placed around the site. Accordingly, the cameras each generate a different stream of video data which includes a color image stream.
- An existing 3D geometric model of a human body can be used in the scene proxies as follows. Conventional methods can be used to kinematically articulate the model over time in order to fit (i.e., match) the model to the streams of video data generated by the cameras. The kinematically articulated model can then be colored as follows. A conventional view-dependent texture mapping method can be used to extract texture data from the color image stream generated by each camera and map this texture data to the kinematically articulated model.
- the cameras and their video data streams are also used in a face tracking process to identify the 3D location of the participant (which can be the location of the participant's head or mouth).
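- The view-dependent texture mapping step can be pictured, very roughly, as blending the color samples from the cameras according to how well each camera's viewing direction agrees with the current virtual viewing direction. The sketch below is a generic illustration of that idea (the weighting exponent, the back-face test, and all names are assumptions), not the specific method referenced above.

```python
import numpy as np

def view_dependent_color(point, normal, virtual_view_dir, camera_samples):
    """Blend per-camera color samples for one surface point, favoring cameras
    whose view of the point is closest to the current virtual view direction.
    `camera_samples` is a list of (camera_position, sampled_rgb) pairs; looking
    up the RGB value of the point in each color stream is assumed already done."""
    point = np.asarray(point, float)
    view_dir = np.asarray(virtual_view_dir, float)
    view_dir /= np.linalg.norm(view_dir)
    weights, colors = [], []
    for cam_pos, rgb in camera_samples:
        cam_dir = point - np.asarray(cam_pos, float)
        cam_dir /= np.linalg.norm(cam_dir)
        w = max(float(np.dot(cam_dir, view_dir)), 0.0) ** 4   # sharper = more view-dependent
        if np.dot(-cam_dir, np.asarray(normal, float)) <= 0:  # camera behind the surface
            w = 0.0
        weights.append(w)
        colors.append(np.asarray(rgb, float))
    weights, colors = np.asarray(weights), np.asarray(colors)
    if weights.sum() == 0.0:
        return colors.mean(axis=0)            # fall back to a plain average
    return (weights[:, None] * colors).sum(axis=0) / weights.sum()
```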
- FIG. 6 illustrates a simplified example of a general-purpose computer system on which various embodiments and elements of the audio source positioning technique embodiments, as described herein, may be implemented. It should be noted that any boxes that are represented by broken or dashed lines in FIG. 6 represent alternate embodiments of the simplified computing device, and that any or all of these alternate embodiments, as described below, may be used in combination with other alternate embodiments that are described throughout this document.
- FIG. 6 shows a general system diagram showing a simplified computing device 10 .
- Such computing devices can typically be found in devices having at least some minimum computational capability, including, but not limited to, personal computers, server computers, hand-held computing devices, laptop or mobile computers, communications devices such as cell phones and PDA's, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, audio or video media players, etc.
- the device should have a sufficient computational capability and system memory to enable basic computational operations.
- the computational capability is generally illustrated by one or more processing unit(s) 12 , and may also include one or more GPUs 14 , either or both in communication with system memory 16 .
- the processing unit(s) 12 of the general computing device may be specialized microprocessors, such as a DSP, a VLIW, or other micro-controller, or can be conventional CPUs having one or more processing cores, including specialized GPU-based cores in a multi-core CPU.
- the simplified computing device of FIG. 6 may also include other components, such as, for example, a communications interface 18 .
- the simplified computing device of FIG. 6 may also include one or more conventional computer input devices 20 (e.g., pointing devices, keyboards, audio input devices, video input devices, haptic input devices, devices for receiving wired or wireless data transmissions, etc.).
- the simplified computing device of FIG. 6 may also include other optional components, such as, for example, one or more conventional display device(s) 24 and other computer output devices 22 (e.g., audio output devices, video output devices, devices for transmitting wired or wireless data transmissions, etc.).
- typical communications interfaces 18 , input devices 20 , output devices 22 , and storage devices 26 for general-purpose computers are well known to those skilled in the art, and will not be described in detail herein.
- the simplified computing device of FIG. 6 may also include a variety of computer readable media.
- Computer readable media can be any available media that can be accessed by computer 10 via storage devices 26 and includes both volatile and nonvolatile media that is either removable 28 and/or non-removable 30 , for storage of information such as computer-readable or computer-executable instructions, data structures, program modules, or other data.
- Computer readable media may comprise computer storage media and communication media.
- Computer storage media includes, but is not limited to, computer or machine readable media or storage devices such as DVD's, CD's, floppy disks, tape drives, hard drives, optical drives, solid state memory devices, RAM, ROM, EEPROM, flash memory or other memory technology, magnetic cassettes, magnetic tapes, magnetic disk storage, or other magnetic storage devices, or any other device which can be used to store the desired information and which can be accessed by one or more computing devices.
- Retention of information such as computer-readable or computer-executable instructions, data structures, program modules, etc. can also be accomplished by using any of a variety of the aforementioned communication media to encode one or more modulated data signals or carrier waves, or other transport mechanisms or communications protocols, and includes any wired or wireless information delivery mechanism.
- The terms “modulated data signal” and “carrier wave” generally refer to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
- communication media includes wired media such as a wired network or direct-wired connection carrying one or more modulated data signals, and wireless media such as acoustic, RF, infrared, laser, and other wireless media for transmitting and/or receiving one or more modulated data signals or carrier waves. Combinations of any of the above should also be included within the scope of communication media.
- software, programs, and/or computer program products embodying some or all of the various audio source positioning technique embodiments described herein, or portions thereof, may be stored, received, transmitted, or read from any desired combination of computer or machine readable media or storage devices and communication media in the form of computer executable instructions or other data structures.
- audio source positioning technique embodiments described herein may be further described in the general context of computer-executable instructions, such as program modules, being executed by a computing device.
- program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types.
- the embodiments described herein may also be practiced in distributed computing environments where tasks are performed by one or more remote processing devices, or within a cloud of one or more devices, that are linked through one or more communications networks.
- program modules may be located in both local and remote computer storage media including media storage devices.
- the aforementioned instructions may be implemented, in part or in whole, as hardware logic circuits, which may or may not include a processor.
- If the receiving site has more than one participant, the sound is separately spatialized as described previously for each participant. This can be easily accomplished if the participants each wear audio earphones (i.e., the plurality of audio speakers at the site are sets of headphones) and a spatial audio technique designed for earphones is employed.
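- For this multi-participant case, one simple way to organize the per-listener processing is sketched below; the `spatialize` callable stands in for whatever headphone-oriented spatial audio technique (for example an HRTF renderer) is actually used, and all names are illustrative.

```python
def mix_for_each_listener(voice_chunks, source_points, listener_heads, spatialize):
    """Build an independent headphone mix for every participant in the receiving
    site. `spatialize(mono_chunk, source_xyz, listener_head_xyz)` is any
    per-listener binaural rendering routine and is left abstract here.

    voice_chunks:   {remote_participant_id: mono audio chunk}
    source_points:  {remote_participant_id: 3D point where that participant is depicted}
    listener_heads: {local_participant_id: tracked 3D head position}
    """
    mixes = {}
    for listener_id, head_xyz in listener_heads.items():
        mix = None
        for source_id, mono in voice_chunks.items():
            rendered = spatialize(mono, source_points[source_id], head_xyz)
            mix = rendered if mix is None else mix + rendered
        mixes[listener_id] = mix
    return mixes
```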
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Graphics (AREA)
- General Physics & Mathematics (AREA)
- Geometry (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Processing Or Creating Images (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Image Processing (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
- Image Generation (AREA)
- Telephonic Communication Services (AREA)
- Length Measuring Devices By Optical Means (AREA)
- Studio Devices (AREA)
Abstract
Audio source positioning technique embodiments are presented that are employed in a video teleconference or telepresence session between a local site and one or more remote sites. Each of these sites has one participant, and a virtual scene is constructed and displayed at each site that depicts each of the participants from the other sites in the constructed scene. However, rather than simply playing audio captured at the other site or sites in the viewing participant's site, audio source positioning is used to make it seem to a participant viewing a rendering of the virtual scene that the voice of another participant is emanating from a location on the display device where the remote participant is depicted.
Description
- This application claims the benefit of and priority to provisional U.S. patent application Ser. No. 61/653,983 filed May 31, 2012.
- A spatial audio teleconference between two or more geographically distant sites is typically achieved by processing audio signals captured with microphones at one site to produce spatial audio data. This spatial audio data is then transmitted to the other sites and processed at each of these sites to generate a plurality of output audio signals that are played through multiple audio speakers in a manner that spatializes the sound from a sending site to a distinct location in the receiving site. This process is repeated at all the sites resulting in the voices of participants at other sites seeming to a participant at the receiving site as if they are emanating from different locations in the receiving site. This spatializing of the voices of other participants in the receiving site is typically accomplished using only the spatial audio data received from the other sites.
- This Summary is provided to introduce a selection of concepts, in a simplified form, that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
- Audio source positioning technique embodiments described herein are generally employed in a video teleconference or telepresence session between a local site and one or more remote sites. In one embodiment, each of these sites has one participant, and a virtual scene is constructed and displayed at each site that depicts each of the participants from the other sites. However, rather than simply playing audio captured at the other site or sites in the viewing participant's site, audio source positioning technique embodiments described herein are used to make it seem to a participant viewing a rendering of the virtual scene that the voice of each depicted participant is emanating from a location on the display device where that participant is depicted.
- In general, this audio source positioning is accomplished at a site (referred to as the local site for convenience) by transmitting data to the other site or sites (referred to as remote sites for convenience), which is then used at those sites to construct the aforementioned virtual scene with spatialized audio. In addition, similar data is received from the other site or sites to construct a virtual scene with spatialized audio at the local site.
- More particularly, in one general embodiment, streams of sensor data generated from an arrangement of sensors that capture participant data are input into a computing device or devices resident at the local site. This arrangement of sensors includes a plurality of video and audio devices. Each video capture device captures the participant from a different geometric perspective, and each audio capture device captures the voice of the participant. Scene proxies, which geometrically describe the local site including the participant on a frame by frame basis, are generated from the streams of sensor data. In addition, the streams of video sensor data and a face tracking technique are employed to identify a 3D point representing the location of the participant in the local site for each frame of the scene proxies. The scene proxies representing each frame are transmitted in the order generated over a data communication network to each remote site, along with two additional items: namely, audio data representing the local site participant's voice captured, if any, during the time period between the frame currently being transmitted and the next frame of scene proxies to be transmitted, and the 3D point coordinates representing the location of the participant in the local site for the frame currently being transmitted.
- Meanwhile, the local site's computing device or devices receive scene proxies representing successive scene proxy frames from each remote site. In addition, audio data representing the remote site participant's voice captured, if any, during the time period between the currently received frame and the next frame of scene proxies to be received from the remote site, and a 3D point representing the location of the participant in the remote site, are received from each remote site that is facilitating audio source positioning at the local site. For each frame of scene proxies received from a remote site if there is only one remote site sending frames, or for each group of frames of scene proxies contemporaneously received from remote sites if there are multiple remote sites sending frames, a frame of a virtual scene is rendered from the last-received frame or frames of scene proxies that includes a depiction of each of the remote site participants. The rendered frame is then displayed to the local site participant via a display device. In addition, for each remote site participant depicted in the last-rendered frame of the virtual scene that is resident at a remote site that sent the aforementioned audio data representing the remote site participant's voice and the 3D point representing the location of the participant in the remote site, a spatial audio technique is employed to make it seem to the local site participant that the voice of the remote site participant is emanating from a location on the display device where the remote participant is depicted.
- The specific features, aspects, and advantages of the disclosure will become better understood with regard to the following description, appended claims, and accompanying drawings where:
-
FIG. 1 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for a local site to facilitate audio source positioning at a remote site. -
FIG. 2 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for audio source positioning in a local site. -
FIG. 3 is a flow diagram illustrating an exemplary embodiment, in simplified form, of an implementation of the part of the process of FIG. 2 involving rendering and displaying the frames of a virtual scene. -
FIG. 4 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for audio source positioning in a local site that adds simulated reverberation. -
FIG. 5 is a diagram illustrating an exemplary video conferencing or telepresence application that supports the generation, storage, distribution, and presentation of a virtual scene in which audio source positioning technique embodiments described herein can be implemented. -
FIG. 6 is a diagram depicting a general purpose computing device constituting an exemplary system for implementing audio source positioning technique embodiments described herein. - In the following description of audio source positioning technique embodiments reference is made to the accompanying drawings which form a part hereof, and in which are shown, by way of illustration, specific embodiments in which the technique may be practiced. It is understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the technique.
- It is also noted that for the sake of clarity specific terminology will be resorted to in describing the audio source positioning technique embodiments described herein and it is not intended for these embodiments to be limited to the specific terms so chosen. Furthermore, it is to be understood that each specific term includes all its technical equivalents that operate in a broadly similar manner to achieve a similar purpose. Reference herein to “one embodiment”, or “another embodiment”, or an “exemplary embodiment”, or an “alternate embodiment”, or “one implementation”, or “another implementation”, or an “exemplary implementation”, or an “alternate implementation” means that a particular feature, a particular structure, or particular characteristics described in connection with the embodiment or implementation can be included in at least one embodiment of the audio source positioning technique. The appearances of the phrases “in one embodiment”, “in another embodiment”, “in an exemplary embodiment”, “in an alternate embodiment”, “in one implementation”, “in another implementation”, “in an exemplary implementation”, “in an alternate implementation” in various places in the specification are not necessarily all referring to the same embodiment or implementation, nor are separate or alternative embodiments/implementations mutually exclusive of other embodiments/implementations. Yet furthermore, the order of process flow representing one or more embodiments or implementations of the audio source positioning technique does not inherently indicate any particular order nor imply any limitations of the audio source positioning technique.
- The term “sensor” is used herein to refer to any one of a variety of scene-sensing devices which can be used to generate a stream of sensor data that represents a given scene. Generally speaking and as will be described in more detail hereafter, the audio source positioning technique embodiments described herein employ one or more sensors which can be configured in various arrangements to capture a scene, thus allowing one or more streams of sensor data to be generated each of which represents the scene from a different geometric perspective. Each of the sensors can be any type of video capture device (e.g., any type of video camera), or any type of audio capture device, or any combination thereof. Each of the sensors can also be either static (i.e., the sensor has a fixed spatial location and a fixed rotational orientation which do not change over time), or moving (i.e., the spatial location and/or rotational orientation of the sensor change over time). The audio source positioning technique embodiments described herein can employ a combination of different types of sensors to capture a given scene.
- Audio source positioning technique embodiments described herein are generally employed in a video teleconference or telepresence session between a local site and one or more remote sites. In one embodiment, each of these sites has one participant, and a virtual scene is constructed and displayed at each site that depicts each of the participants from the other sites in the constructed scene. Thus, it appears to a participant who is viewing the virtual scene that he or she is in a space with the participant or participants from the other site or sites. The construction of such a virtual scene is accomplished using conventional methods, with an exception. Rather than simply playing audio captured at the other site(s) in the viewing participant's site, audio source positioning technique embodiments described herein are used to co-locate the voice of each of the other participant(s) with the depiction of that person on a display. In other words, audio source positioning technique embodiments described herein make it seem to a participant viewing a rendering of the virtual scene that the voice of another participant is emanating from a location on the display device where the remote participant is depicted. This audio illusion enhances the video teleconference or telepresence session experience and makes it seem more like the viewing participant is actually present with the other participant(s) in the virtual scene.
- It is noted that for convenience, the participant who is viewing the rendered virtual scene will be referred to as a local or first participant, and the site that this participant is viewing from will be referred to as the local or first site. Each of the other participants involved will be referred to as a remote or other participant, and the site associated with a remote participant will be referred to as a remote or other site. Given this, it will be evident that any of the sites participating in a video teleconference or telepresence session can be considered the local site with the others being the remote sites.
- Referring to
FIG. 1 , one general embodiment of the audio source positioning technique involves, from the viewpoint of the local site, using a computing device to perform the following process actions. Streams of sensor data generated from an arrangement of sensors that capture participant data are input (block 100). This arrangement includes a plurality of video and audio devices which generate a plurality of streams of sensor data. Each video device captures the site participant from a different geometric perspective, and each audio device captures the voice of the participant at the site. Scene proxies are then generated from the streams of sensor data (block 102). In general, a scene proxy geometrically describes the local site including the participant on a frame by frame basis. A frame of scene proxies refers to the geometric and texture data needed to render a frame of the aforementioned virtual scene. Examples of a scene proxy include a stream of depth map images of the captured scene. A scene proxy can also include a stream of calibrated point cloud reconstructions of the captured scene. A scene proxy can further include one or more types of high order geometric models such as planes, billboards, and existing (i.e., previously created) generic object models (e.g., human body models) which can be either modified, or animated, or both. A scene proxy can also include other high fidelity proxies such as a stream of mesh models of the captured scene, and the like. Further, more than one type of scene proxy can be employed in a frame of scene proxies. - In addition to generating scene proxies, the streams of sensor data are used, along with a face tracking technique, to identify a 3D point representing the location of the participant in the local site for each frame of the scene proxies (block 104). In one embodiment, this 3D point representing the location of the participant in the local site is a 3D point representing the location of the participant's head in the local site. In another embodiment, the 3D point representing the location of the participant in the local site is a 3D point representing the location of the participant's mouth in the local site.
- The scene proxies representing each frame are transmitted in the order generated over a data communication network to the remote site or sites, along with audio data representing the local site participant's voice captured, if any, during the time period between the frame currently being transmitted and the next frame of scene proxies to be transmitted, and the 3D point coordinates representing the location of the participant in the local site identified for the frame of scene proxies currently being transmitted (block 106). It is noted that the “if any” caveat refers to the fact that the local participant may not speak during the frame time period alluded to above.
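- Conceptually, the per-frame payload described above bundles three items. The record below is only a sketch of that bundle; the field names and types are assumptions made for illustration and are not dictated by this description.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class ProxyFramePacket:
    """Illustrative per-frame payload sent from a capturing site."""
    frame_index: int                   # order in which the frame was generated
    scene_proxy: bytes                 # serialized geometry and texture data for this frame
    audio_chunk: Optional[np.ndarray]  # participant's voice for this frame period, or None if silent
    participant_location: np.ndarray   # 3D point (head or mouth) in site coordinates, shape (3,)
```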
- The foregoing process actions provide the data used at a remote site to perform audio source positioning. Thus, the foregoing actions can be said to facilitate audio source positioning at a remote site. If audio source positioning is to be implemented at the local site as well, then the same type of data is provided from a remote site or sites. Referring to
FIG. 2 , the process actions generally employed for audio source positioning in a video teleconference or telepresence session at the local site will now be described. First, scene proxies representing successive scene proxy frames are received from each remote site participating in the conference over a data communication network (block 200). In addition, other data is received from at least one remote site. It is noted that while it will be assumed in the following description that each remote site sends this other data (which is used to implement the audio source positioning), that may not be the case. If a remote site does not send the other data, then any audio received from that site is played in the normal manner in the local site (in one embodiment, this may be using monophonic playback). For each remote site sending the other data, this data includes audio data representing the remote site participant's voice captured, if any, during the time period between the currently received frame and the next frame of scene proxies to be received from the remote site, and a 3D point representing the location of the participant in the remote site (block 202). - For each frame of scene proxies received from a remote site if there is only one remote site, or for each group of frames of scene proxies contemporaneously received from remote sites if there are multiple remote sites, a frame of a virtual scene is rendered (block 204). As indicated previously, the virtual scene frame includes a depiction of each of the remote site participants from the last-received frame or frames of scene proxies. The rendered virtual frame is then displayed to the local site participant via a display device (block 206). It is noted that the term contemporaneously used above is not to be taken literally. For example, in one implementation, the frames of scene proxies coming from multiple remote sites are considered contemporaneous if they arrive before the next frame from any of the sites.
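- The loose notion of contemporaneous frames described above (frames from multiple remote sites belong to the same group as long as no site has yet delivered its next frame) could be realized with grouping logic along the following lines; packet_stream and render_group are hypothetical stand-ins for the receive and render machinery.

```python
def group_contemporaneous_frames(packet_stream, render_group):
    """Group per-site frames and render one virtual scene frame per group.

    A group is flushed as soon as some site delivers a second frame before the
    others have delivered their first, matching the informal definition of
    contemporaneous frames given above.  `packet_stream` yields (site_id, packet)
    pairs and `render_group` renders one frame from a {site_id: packet} dict;
    both are illustrative placeholders.
    """
    pending = {}
    for site_id, packet in packet_stream:
        if site_id in pending:            # this site got ahead: flush the current group
            render_group(dict(pending))
            pending.clear()
        pending[site_id] = packet
    if pending:                           # flush whatever remains when the stream ends
        render_group(dict(pending))
```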
- In addition, for each remote site participant depicted in the last-rendered frame of the virtual scene that is resident at a remote site that sent audio data representing the remote site participant's voice and the 3D point representing the location of the participant in the remote site, a spatial audio technique is employed to make it seem to the local site participant that the voice of the remote site participant is emanating from a location on the display device where the remote participant is depicted (block 208). This is accomplished using conventional methods given the received audio data and 3D point representing the location of the participant in the remote site.
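- The spatial audio technique itself is left to conventional methods. Purely as a placeholder for any such method, the sketch below applies constant-power stereo panning from the azimuth of the perceived source (for example, the point on the display computed for the remote talker, as derived via the transforms discussed below) relative to the listener; a deployed system might instead use HRTF filtering, wave field synthesis, or another spatialization approach. A Y-up coordinate frame and unit direction vectors are assumed.

```python
import numpy as np

def constant_power_pan(mono_audio, source_pos, listener_pos, listener_forward):
    """Pan a mono voice signal so it appears to come from `source_pos`.

    `source_pos` and `listener_pos` are 3D points in local-site coordinates and
    `listener_forward` is the unit vector the listener faces (Y assumed up).
    This is a minimal stand-in for the spatial audio technique referred to above.
    """
    to_source = source_pos - listener_pos
    right = np.cross(listener_forward, np.array([0.0, 1.0, 0.0]))   # listener's right-hand direction
    az = np.arctan2(np.dot(to_source, right), np.dot(to_source, listener_forward))
    az = np.clip(az, -np.pi / 2, np.pi / 2)      # limit to the frontal half-plane
    theta = (az + np.pi / 2) / 2                 # map [-90 deg, +90 deg] to [0, 90 deg]
    left_gain, right_gain = np.cos(theta), np.sin(theta)   # constant power: L^2 + R^2 = 1
    return np.stack([mono_audio * left_gain, mono_audio * right_gain], axis=-1)
```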
- As mentioned previously, the 3D point representing the location of the participant in the remote site can be the person's head or mouth. More particularly, in one embodiment, the 3D point representing the location of a participant in a remote site is a 3D point representing the location of the participant's mouth in the remote site when the mouth of that participant is visible in the last-rendered frame of the virtual scene. In another embodiment, the 3D point representing the location of a participant in a remote site is a 3D point representing the location of the participant's head in the remote site when the mouth of that participant is not visible by the sensor used to determine that 3D point.
- With regard to the foregoing action of rendering the frames of the virtual scene, it is noted that as part of this process, for each remote site, a first transform is computed that converts 3D locations in the remote site to points in the frame of the virtual scene. In addition, the action of displaying a rendered frame to the local site participant involves the use of a second transform that converts points in a frame of the virtual scene to screen coordinates on the local site's display device. These transforms are used in the aforementioned spatial audio technique. More particularly, referring to
FIG. 3 , in one embodiment, the first transform is used to convert the 3D point representing the location of the remote participant in the remote site to a point in the last-rendered frame of the virtual scene (block 300), and the second transform is employed to convert the point in the last-rendered frame of the virtual scene representing the remote participant location to screen coordinates on the local site's display device (block 302). A third transform is also computed that converts screen coordinates in the display device to 3D points in the local site (block 304). It is noted that this transform need only be computed once, unless the display device is moved, at which point it would be re-computed. The third transform is used to compute the 3D point in the local site of the screen coordinates representing the location of the remote participant depicted on the display device (block 306). The spatial audio technique and a plurality of audio speakers resident in the local site are then used to make it seem to the local site participant that the voices of the remote site participants are respectively emanating from the computed 3D points in the local site of the screen coordinates representing the locations of the remote participants depicted on the display device (block 308). - It is noted that the location of the local participant within the local site has an effect on how audio source positioning is accomplished. In general, a parallax effect results when the local participant moves, and in one embodiment the spatial audio technique compensates based on the current location of the local participant. Generally, the head of the local site participant is tracked and a 3D point representative of the location of the local site participant's head in the local site is periodically computed. This point is then used in the audio source positioning. More particularly, in one embodiment, each time a 3D point representative of the location of the local site participant's head in the local site is computed, the spatial audio technique is used to make it seem to the local site participant that the voice of the remote site participant is emanating from a location on the display device where the remote participant is depicted, taking into consideration the last-computed 3D point representative of the location of the local site participant's head.
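- Under the assumption that the first transform can be expressed as a 4x4 homogeneous matrix and that the second and third transforms are available as calibration-supplied mappings, the chain of FIG. 3 (blocks 300 through 306) might be sketched as follows; every name here is a hypothetical stand-in rather than part of the described system.

```python
import numpy as np

def locate_voice_in_local_site(remote_point, remote_to_scene, scene_to_screen, screen_to_local):
    """Chain the three transforms described for FIG. 3.

    remote_to_scene : assumed 4x4 matrix taking a remote-site 3D point to a virtual-scene point (first transform)
    scene_to_screen : assumed callable taking a virtual-scene point to (x, y) display screen coordinates (second transform)
    screen_to_local : assumed callable taking (x, y) screen coordinates to a 3D point in the local site (third transform)
    """
    scene_pt = (remote_to_scene @ np.append(remote_point, 1.0))[:3]   # block 300
    screen_xy = scene_to_screen(scene_pt)                             # block 302
    local_pt = screen_to_local(screen_xy)                             # blocks 304 and 306
    return local_pt   # fed to the spatial audio technique of block 308
```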
- It is noted that to provide a more realistic experience for the local participant, the rate at which 3D points representative of the location of the local site participant's head in the local site are computed should be high. In one embodiment, this rate exceeds the rate at which frames of the virtual scene are calculated. For example, a typical virtual scene frame rate is 30 frames per second (fps). In one implementation, the rate at which 3D points representative of the location of the local site participant's head are computed is four times the virtual frame rate, namely 120 times per second. Thus, while the content of the scene may only be updated at 30 fps, the depiction of the scene from the point of view of the local participant is updated at 120 fps. In other words, the scene is calculated at 30 fps, but rendered at 120 fps.
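- A minimal sketch of that decoupling, assuming hypothetical callbacks for head tracking and rendering, is shown below; the 30 fps scene content and the 120 Hz view update run at independent rates.

```python
import time

def presentation_loop(get_latest_scene_frame, get_head_position, render_view, view_rate_hz=120.0):
    """Redraw the latest scene frame from the viewer's current head position at a high rate.

    `get_latest_scene_frame` returns the most recent virtual scene frame (updated at ~30 fps),
    `get_head_position` returns the latest tracked 3D head point, and `render_view` redraws the
    scene (and re-spatializes the audio) for that head position.  All three are assumed callbacks.
    """
    period = 1.0 / view_rate_hz
    while True:
        start = time.monotonic()
        frame = get_latest_scene_frame()
        head = get_head_position()
        render_view(frame, head)
        elapsed = time.monotonic() - start
        if elapsed < period:
            time.sleep(period - elapsed)
```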
- The audio source positioning technique embodiments described so far make it seem to a participant viewing a rendering of the virtual scene that the voice of another participant is emanating from a location on the display device where the remote participant is depicted. However, there is another enhancement that can make the video teleconference or telepresence session experience even more like the viewing participant is actually present with the other participant(s) in the virtual scene. This enhancement involves simulating the reverberations a participant's voice would create in the virtual scene (e.g., reverberations of the sound against the virtual walls or other virtual objects in the scene) and playing these reverberations in the participant's site.
- This reverberation enhancement can be accomplished at the local site, given, from each remote site, the 3D point representing the location of the remote participant in the remote site and a modified version of the audio data representing the remote site participant's voice. The modification to the audio data involves suppressing reverberations and noise in the audio captured at the remote site. While this modification can be performed at the local site given certain information about the remote site, the more efficient approach is to suppress the reverberations and noise at the remote site before the audio data is sent to the local site. In either case, conventional suppression techniques are employed to accomplish the modification.
- Assuming the above-described modified audio data and the 3D point representing the location of the remote participant have been received from a remote site, one general embodiment of the audio source positioning technique that adds reverberation on a frame-by-frame basis involves, from the viewpoint of the local site, using the local site computing device to perform the following process actions. First, the previously-described first transform computed to convert 3D locations in the remote site to points in the last-rendered frame of the virtual scene is employed to convert the 3D point representing the location of the remote participant in the remote site to a point in the last-rendered frame of the virtual scene (block 400). In this embodiment, the 3D point representing the location of the remote participant in the remote site corresponds to a 3D point representing the location of the remote participant's mouth in the remote site. Next, the orientation of the remote site participant's face in the virtual scene, as depicted in the last-rendered virtual scene frame, is identified (block 402). Conventional methods are employed to accomplish this task. The direction that the remote participant's voice projects in the virtual space from the point in the last-rendered frame of the virtual scene that corresponds to the 3D point representing the location of the remote participant's mouth is then computed based on the orientation of the remote site participant's face in the virtual scene (block 404). In addition, the reverberation characteristics of the virtual scene, as depicted in the last-rendered virtual scene frame, are estimated (block 406).
- Given the point representing the location of the remote participant's mouth in the virtual scene and the computed direction, reverberation audio data is then computed that when added to the received audio data simulates the reverberations of the remote participant's voice in the virtual space for the current frame (block 408). This computed reverberation audio data is then added into the audio played in the local site in conjunction with the display of the current virtual scene frame (block 410).
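- The reverberation model itself is again left to conventional techniques. Purely for illustration, the sketch below scales a bank of feedback comb filters (a Schroeder-style reverberator) by a simple directivity term derived from the computed facing direction and by an assumed reverberation time for the virtual room; every parameter here is an assumption, not something prescribed above.

```python
import numpy as np

def simulate_reverb(dry, sample_rate, mouth_dir, to_listener_dir, rt60=0.5,
                    delays_ms=(29.7, 37.1, 41.1, 43.7)):
    """Return reverberation audio to mix with the dry (reverb-suppressed) voice.

    `mouth_dir` is the unit vector along which the remote talker's voice projects in the
    virtual scene, `to_listener_dir` the unit vector from the talker toward the viewer's
    position in that scene, and `rt60` an assumed reverberation time for the virtual room.
    The comb-filter bank is a generic stand-in for whatever reverberation model is used.
    """
    directivity = 0.5 * (1.0 + float(np.dot(mouth_dir, to_listener_dir)))  # voices radiate mostly forward
    wet = np.zeros_like(dry, dtype=np.float64)
    for d_ms in delays_ms:
        delay = int(sample_rate * d_ms / 1000.0)
        gain = 10.0 ** (-3.0 * (d_ms / 1000.0) / rt60)   # feedback giving a 60 dB decay over rt60 seconds
        buf = np.zeros_like(wet)
        for n in range(delay, len(dry)):                  # feedback comb filter
            buf[n] = dry[n - delay] + gain * buf[n - delay]
        wet += buf
    return directivity * wet / len(delays_ms)
```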
- The audio source positioning technique embodiments described herein can be employed in a variety of video conferencing or telepresence applications. Generally, any video conferencing or telepresence application that involves the generation and display of a virtual scene for each participant can be enhanced using the audio source positioning technique embodiments described herein.
- One exemplary video conferencing or telepresence application supports the generation, storage, distribution, and presentation of a virtual scene (such as a virtual conference room). The exemplary video conferencing or telepresence application can support various types of traditional, single viewpoint virtual scene presentations in which the viewpoint of the scene is fixed when the video is recorded/captured and this viewpoint cannot be controlled or changed by a participant while they are viewing the virtual scene. In other words, in a single viewpoint virtual scene the viewpoint of the scene is fixed and cannot be modified when the scene is being rendered and displayed to a participant. However, the exemplary video conferencing or telepresence application can also support various types of free viewpoint video in which the viewpoint of the virtual scene can be interactively controlled and changed by a participant at will while they are viewing the scene. In other words, in a free viewpoint video a participant can interactively generate different viewpoints of the scene on-the-fly when the virtual scene is being rendered and displayed.
-
FIG. 5 illustrates an exemplary video conferencing or telepresence application processing pipeline in which the audio source positioning technique embodiments described herein can be implemented. As exemplified in FIG. 5 , the exemplary processing pipeline 500 starts with a generation stage 502 during which, generally speaking, the aforementioned scene proxies of a site are generated. The generation stage 502 includes a capture sub-stage 504 and a processing sub-stage 506 whose operations will now be described in more detail. - Referring again to
FIG. 5 , the capture sub-stage 504 of the processing pipeline 500 generally captures the scene in a site including the participant 508 and generates one or more streams of sensor data that represent the scene. More particularly, during the capture sub-stage 504, an arrangement of sensors is used to capture the scene, where the arrangement includes a plurality of video capture devices 510 (as will be described shortly) and one or more audio capture devices 512 (such as a microphone or microphone array). The arrangement of sensors generates a plurality of streams of sensor data, each of which represents the scene from a different geometric perspective. These streams of sensor data are input from the sensors and calibrated, and then output to the processing sub-stage 506. - Referring again to
FIG. 5 , the processing sub-stage 506 inputs the streams of sensor data from the capture sub-stage 504, and then generates scene proxies which geometrically describe the captured scene as a function of time from the streams of sensor data. These scene proxies also include texture data for rendering the virtual scene. The scene proxies are output to a storage and distribution stage 514, which stores them, along with the aforementioned audio data captured using the audio capture devices 512. Typically, the generation stage 502 is implemented on one or a collection of computing devices at a participant site (such as the local site shown), and a presentation stage 516 of the pipeline 500 is implemented on one or more computing devices resident at the other participant sites (such as the exemplary remote site shown in FIG. 5 ). The storage and distribution stage 514 distributes the scene proxies and audio data to the other participating sites by transmitting them over the one or more data communication networks 518 to which the participant site computing devices are connected. It is noted that each participant site has a generation stage 502 and a storage and distribution stage 514 (although only those associated with the aforementioned local site are shown in FIG. 5 ). - Referring again to
FIG. 5 and generally speaking, a presentation stage 516 of the processing pipeline 500 is resident at each of the other participating sites (one of which is shown). The presentation stage 516 inputs the scene proxies and audio data that were transmitted from the storage and distribution stage 514 resident at each of the other sites (again, one of which is shown in FIG. 5 ), and presents the participant at the receiving site with a rendering of the scene proxies in the form of the previously described virtual scene frames. The presentation stage 516 includes a rendering sub-stage 520 and a participant viewing experience sub-stage 522 whose operations will now be described in more detail. - The
rendering sub-stage 520 of the processing pipeline 500 inputs the scene proxies from the storage and distribution stage 514, and then generates successive frames of the virtual scene (one of which 524 is shown in FIG. 5 ). If more than one other participant site is involved, then generating successive frames of the virtual scene entails the rendering sub-stage 520 inputting the scene proxies from the storage and distribution stage 514 operating at each of the other sites, and combining the proxy data using conventional methods to create an aggregate virtual scene (such as 524). Each virtual scene frame generated is then output to the participant viewing experience sub-stage 522 of the pipeline 500. The participant viewing experience sub-stage 522 inputs each frame from the rendering sub-stage 520, and then displays it on a display device 526 for viewing by the participant. In addition, the audio source positioning technique embodiments described herein are implemented as described previously to provide spatialized audio in association with each frame displayed, using two or more audio speakers 528 located in the receiving site. - It is noted that in a video conferencing or telepresence application that supports free viewpoint video, in which the viewpoint of the virtual scene can be interactively controlled and changed by a participant at will while they are viewing the scene, in addition to the foregoing, the
rendering sub-stage 520 inputs the scene proxies output from the storage and distribution stage 514 (or stages, if multiple other sites are involved), and then generates a frame exhibiting a current synthetic viewpoint. The current synthetic viewpoint is either a default viewpoint or, if the participant has specified a viewpoint, the last-specified viewpoint. The participant-specified viewpoint comes from the participant viewing experience sub-stage 522, which inputs it from the participant via a user interface.
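- The stage structure of FIG. 5 can be pictured, very roughly, as two loops: one running the generation and storage-and-distribution stages at a capturing site, and one running the presentation stage at a receiving site. The sketch below only wires hypothetical callbacks together in that order; it is not an implementation of any particular stage.

```python
def run_capture_side(sensors, calibrate, make_proxies, distribute):
    """Generation plus storage-and-distribution stages for one capturing site (illustrative wiring only)."""
    while True:
        raw_streams = [sensor.read_frame() for sensor in sensors]        # capture sub-stage
        calibrated = calibrate(raw_streams)
        proxy_frame, audio, location = make_proxies(calibrated)          # processing sub-stage
        distribute(proxy_frame, audio, location)                         # storage and distribution stage

def run_presentation_side(receive, render_frame, display, spatialize_audio):
    """Presentation stage at a receiving site (illustrative wiring only)."""
    while True:
        proxies_by_site, audio_by_site, locations_by_site = receive()
        frame = render_frame(proxies_by_site)                            # rendering sub-stage
        display(frame)                                                   # participant viewing experience sub-stage
        spatialize_audio(audio_by_site, locations_by_site, frame)        # audio source positioning
```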
- Referring again to FIG. 5 , this section provides an overview description, in simplified form, of two implementations of the video capture devices 510 of the capture sub-stage 504. It will be appreciated that the implementations described in this section are merely exemplary. Many other implementations are also possible which use other types of sensor arrangements. - In one implementation, the
video capture devices 510 include a circular arrangement of eight genlocked sensors used to capture a site which includes the participant, where each of the sensors has a combination of one infrared structured-light projector, two infrared video cameras, and one color camera. Accordingly, the sensors each generate a different stream of video data which includes both a stereo pair of infrared image streams and a color image stream. The pair of infrared image streams and the color image stream generated by each sensor are used to generate different depth map image streams. The different depth map image streams are then merged into a stream of calibrated point cloud reconstructions of the scene. These point cloud reconstructions can then be used to generate a stream of mesh models of the scene. A conventional view-dependent texture mapping method which accurately represents specular textures such as skin is then used to extract texture data from the color image stream generated by each sensor and map this texture data to the stream of mesh models of the scene. The combination of the mesh models and texture data, among other information, forms the scene proxies. Finally, these sensors and their data streams are also used in a face tracking process to identify the 3D location of the participant (which, as described above, can be the location of the participant's head or mouth).
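- The view-dependent texture mapping step mentioned above blends color from the cameras whose viewing directions best match the direction from which the virtual scene is being rendered. A minimal sketch of such angle-based blending weights is given below; the inputs and the weighting exponent are assumptions for illustration.

```python
import numpy as np

def view_dependent_weights(camera_dirs, render_dir, power=4.0):
    """Blend weights for view-dependent texture mapping (illustrative only).

    `camera_dirs` is an (N, 3) array of unit vectors from a surface point toward each color
    camera, and `render_dir` the unit vector toward the virtual viewpoint.  Cameras closest in
    angle to the rendering direction receive most of the weight, which helps preserve
    view-dependent (specular) detail such as skin highlights.
    """
    cosines = np.clip(camera_dirs @ render_dir, 0.0, 1.0)   # back-facing cameras get zero weight
    weights = cosines ** power
    total = weights.sum()
    return weights / total if total > 0 else np.full(len(camera_dirs), 1.0 / len(camera_dirs))
```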
- In another implementation, the video capture devices 510 include four genlocked visible light video cameras used to capture a site which includes the participant, where the cameras are evenly placed around the site. Accordingly, the cameras each generate a different stream of video data which includes a color image stream. An existing 3D geometric model of a human body can be used in the scene proxies as follows. Conventional methods can be used to kinematically articulate the model over time in order to fit (i.e., match) the model to the streams of video data generated by the cameras. The kinematically articulated model can then be colored as follows. A conventional view-dependent texture mapping method can be used to extract texture data from the color image stream generated by each camera and map this texture data to the kinematically articulated model. The combination of the kinematically articulated model and texture data, among other information, forms the scene proxies. Here again, the cameras and their video data streams are also used in a face tracking process to identify the 3D location of the participant (which can be the location of the participant's head or mouth). - The audio source positioning technique embodiments described herein are operational within numerous types of general purpose or special purpose computing system environments or configurations.
FIG. 6 illustrates a simplified example of a general-purpose computer system on which various embodiments and elements of the audio source positioning technique, as described herein, may be implemented. It should be noted that any boxes that are represented by broken or dashed lines in FIG. 6 represent alternate embodiments of the simplified computing device, and that any or all of these alternate embodiments, as described below, may be used in combination with other alternate embodiments that are described throughout this document. - For example,
FIG. 6 shows a general system diagram showing a simplified computing device 10. Such computing devices can typically be found in devices having at least some minimum computational capability, including, but not limited to, personal computers, server computers, hand-held computing devices, laptop or mobile computers, communications devices such as cell phones and PDAs, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, audio or video media players, etc. - To allow a device to implement the audio source positioning technique embodiments described herein, the device should have a sufficient computational capability and system memory to enable basic computational operations. In particular, as illustrated by
FIG. 6 , the computational capability is generally illustrated by one or more processing unit(s) 12, and may also include one or more GPUs 14, either or both in communication with system memory 16. Note that the processing unit(s) 12 of the general computing device may be specialized microprocessors, such as a DSP, a VLIW, or other micro-controller, or can be conventional CPUs having one or more processing cores, including specialized GPU-based cores in a multi-core CPU. - In addition, the simplified computing device of
FIG. 6 may also include other components, such as, for example, a communications interface 18. The simplified computing device of FIG. 6 may also include one or more conventional computer input devices 20 (e.g., pointing devices, keyboards, audio input devices, video input devices, haptic input devices, devices for receiving wired or wireless data transmissions, etc.). The simplified computing device of FIG. 6 may also include other optional components, such as, for example, one or more conventional display device(s) 24 and other computer output devices 22 (e.g., audio output devices, video output devices, devices for transmitting wired or wireless data transmissions, etc.). Note that typical communications interfaces 18, input devices 20, output devices 22, and storage devices 26 for general-purpose computers are well known to those skilled in the art, and will not be described in detail herein. - The simplified computing device of
FIG. 6 may also include a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 10 via storage devices 26 and includes both volatile and nonvolatile media that is either removable 28 or non-removable 30, for storage of information such as computer-readable or computer-executable instructions, data structures, program modules, or other data. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes, but is not limited to, computer or machine readable media or storage devices such as DVDs, CDs, floppy disks, tape drives, hard drives, optical drives, solid state memory devices, RAM, ROM, EEPROM, flash memory or other memory technology, magnetic cassettes, magnetic tapes, magnetic disk storage, or other magnetic storage devices, or any other device which can be used to store the desired information and which can be accessed by one or more computing devices. - Retention of information such as computer-readable or computer-executable instructions, data structures, program modules, etc., can also be accomplished by using any of a variety of the aforementioned communication media to encode one or more modulated data signals or carrier waves, or other transport mechanisms or communications protocols, and includes any wired or wireless information delivery mechanism. Note that the terms “modulated data signal” or “carrier wave” generally refer to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. For example, communication media includes wired media such as a wired network or direct-wired connection carrying one or more modulated data signals, and wireless media such as acoustic, RF, infrared, laser, and other wireless media for transmitting and/or receiving one or more modulated data signals or carrier waves. Combinations of any of the above should also be included within the scope of communication media.
- Further, software, programs, and/or computer program products embodying some or all of the various audio source positioning technique embodiments described herein, or portions thereof, may be stored, received, transmitted, or read from any desired combination of computer or machine readable media or storage devices and communication media in the form of computer executable instructions or other data structures.
- Finally, the audio source positioning technique embodiments described herein may be further described in the general context of computer-executable instructions, such as program modules, being executed by a computing device. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. The embodiments described herein may also be practiced in distributed computing environments where tasks are performed by one or more remote processing devices, or within a cloud of one or more devices, that are linked through one or more communications networks. In a distributed computing environment, program modules may be located in both local and remote computer storage media including media storage devices. Still further, the aforementioned instructions may be implemented, in part or in whole, as hardware logic circuits, which may or may not include a processor.
- While the audio source positioning technique embodiments described so far involve only one participant at each site, in one embodiment it is possible to have any number of participants at a site, as long as a separate audio stream and separate location information are sent for each participant. In general, the operation is the same as described for a site sending audio source positioning data to another site or sites, except that audio data representing a remote site participant's voice and the 3D point representing the location of that participant in the remote site are sent for each participant at the site. At a site receiving this data, the virtual scene is rendered so as to include all the remote site participants as before (including each participant at a site having multiple participants). If the receiving site has one participant, then the audio is spatialized in the same manner as described previously. However, if the receiving site has more than one participant, the sound is separately spatialized as described previously for each participant. This can be easily accomplished if the participants each wear audio earphones (i.e., the plurality of audio speakers at the site are sets of headphones) and a spatial audio technique designed for earphones is employed.
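- For the multi-participant case just described, the per-listener spatialization could be organized along the following lines, with each local listener receiving an individually spatialized headphone mix. The function names are hypothetical, and the spatialize callback stands in for any earphone-oriented spatial audio technique.

```python
def spatialize_for_each_listener(remote_voices, listeners, spatialize):
    """Build an individually spatialized headphone mix for each local listener (illustrative).

    `remote_voices` maps a remote participant id to (mono_audio, local_3d_point), `listeners`
    maps a local participant id to that listener's tracked pose, and `spatialize` is any
    per-listener spatial audio routine for earphones returning equal-length stereo arrays.
    """
    mixes = {}
    for listener_id, pose in listeners.items():
        mix = None
        for voice, source_point in remote_voices.values():
            rendered = spatialize(voice, source_point, pose)
            mix = rendered if mix is None else mix + rendered
        mixes[listener_id] = mix
    return mixes
```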
- It is noted that any or all of the aforementioned embodiments throughout the description may be used in any combination desired to form additional hybrid embodiments. In addition, although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
Claims (20)
1. A computer-implemented process for audio source positioning in a video teleconference or telepresence session between a local site and one or more remote sites, each of said sites having one or more participants, comprising for the local site:
using a computing device to perform the following process actions:
receiving from each remote site, scene proxies representing successive scene proxy frames transmitted by a remote site over a data communication network;
receiving from at least one remote site, along with each frame of scene proxies received from the site,
audio data representing each remote site participant's voice captured, if any, during the time period between the currently received frame and the next frame of scene proxies to be received from the remote site, and
a 3D point representing the location of each participant in the remote site;
for each frame of scene proxies received from a remote site if there is only one remote site sending frames, or for each group of frames of scene proxies contemporaneously received from remote sites if there are multiple remote sites sending frames,
rendering a frame of a virtual scene comprising a depiction of each of the remote site participants from the last-received frame or frames of scene proxies, and
displaying the rendered frame to the local site participant or participants via a display device;
for each remote site participant depicted in the last-rendered frame of the virtual scene that is resident at a remote site that sent audio data representing the remote site participant's voice and the 3D point representing the location of the participant in the remote site, employing a spatial audio technique to make it seem to each local site participant that the voice of the remote site participant is emanating from a location on the display device where the remote participant is depicted using the audio data and the 3D point representing the location of the participant in the remote site that was received from the remote site in conjunction with the last-received frame of scene proxies.
2. The process of claim 1 , wherein said 3D point representing the location of a participant in a remote site is a 3D point representing the location of the participant's mouth in that remote site for each remote site participant depicted in the last-rendered frame of a virtual scene whose mouth is visible.
3. The process of claim 1 , wherein said 3D point representing the location of a participant in a remote site is a 3D point representing the location of the participant's head in that remote site for each remote site participant depicted in the last-rendered frame of a virtual scene whose mouth is not visible.
4. The process of claim 1 , wherein the process action of rendering a frame of the virtual scene, comprises, for each remote site, computing a first transform that converts 3D locations in the remote site to points in the frame of the virtual scene, and wherein the process action of displaying the rendered frame to the local site participant via a display device, comprises computing a second transform that converts points in a frame of the virtual scene to screen coordinates on the display device.
5. The process of claim 4 , wherein the process action of employing a spatial audio technique to make it seem to the local site participant that the voice of a remote site participant is emanating from a location on the display device where the remote participant is depicted using the audio data and the 3D point representing the location of a remote participant in the remote site that was received in conjunction with the last-received frame of scene proxies from the remote site, comprises the actions of:
employing the first transform computed to convert 3D locations in the remote site to points in the last-rendered frame of the virtual scene, to convert the 3D point representing the location of the remote participant in the remote site to a point in the last-rendered frame of the virtual scene;
employing the second transform computed to convert points in a frame of the virtual scene to screen coordinates on the display device, to convert the point in the last-rendered frame of the virtual scene representing the remote participant location to screen coordinates on the display device;
employing a third transform that converts screen coordinates in the display device to 3D points in the local site to compute the 3D point in the local site of the screen coordinates representing the location of the remote participant depicted on the display device; and
employing said spatial audio technique and a plurality of audio speakers resident in the local site to make it seem to the local site participant that the voice of the remote site participant is emanating from the computed 3D point in the local site of the screen coordinates representing the location of the remote participant depicted on the display device.
6. The process of claim 1 , wherein the process action of employing a spatial audio technique to make it seem to a local site participant that the voice of the remote site participant is emanating from a location on the display device where the remote participant is depicted, further comprises the actions of:
tracking the head of the local site participant and periodically computing a 3D point representative of the location of the local site participant's head in the local site; and
each time a 3D point representative of the location of the local site participant's head in the local site is computed, employing the spatial audio technique to make it seem to the local site participant that the voice of the remote site participant is emanating from a location on the display device where the remote participant is depicted taking into consideration the last-computed 3D point representative of the location of the local site participant's head.
7. The process of claim 6 , wherein the process action of periodically computing a 3D point representative of the location of a local site participant's head in the local site, comprises computing a 3D point representative of the location of the local site participant's head in the local site at a rate that exceeds the rate at which frames of the virtual scene are computed.
8. The process of claim 1 , wherein said audio data representing a remote site participant's voice received from a remote site has been modified so as to suppress reverberations and noise in the audio captured at that remote site.
9. The process of claim 8 , wherein the process action of rendering a frame of the virtual scene, comprises for each remote site, computing a first transform that converts 3D locations in the remote site to points in the frame of the virtual scene.
10. The process of claim 9 , further comprising the process actions of:
for each remote site participant depicted in the last-rendered frame of the virtual scene that is resident at a remote site that sent audio data representing the remote site participant's voice and the 3D point representing the location of the participant in the remote site,
employing the first transform computed to convert 3D locations in the remote site to points in the last-rendered frame of the virtual scene, to convert the 3D point representing the location of the remote participant in the remote site to a point in the last-rendered frame of the virtual scene, wherein said 3D point representing the location of the remote participant in the remote site corresponds to a 3D point representing the location of the remote participant's mouth in the remote site,
identifying the orientation of the remote site participant's face in the virtual scene as depicted in the last-rendered virtual scene frame,
computing the direction from the point in the last-rendered frame of the virtual scene that corresponds to the 3D point representing the location of the remote participant's mouth in the remote site that the remote participant's voice projects in the virtual space based on the orientation of the remote site participant's face in the virtual scene,
estimating the reverberation characteristics of the virtual scene as depicted in the last-rendered virtual scene frame,
computing reverberation audio data that when added to the received audio data simulates the reverberations of the remote participant's voice in the virtual scene as spoken from the point representing the location of the remote participant's mouth in the virtual scene in the computed direction, and
adding the computed reverberation audio data into audio played in the local site in conjunction with the display of the virtual scene frame.
11. A computer-implemented process for facilitating audio source positioning at a remote site in a video teleconference or telepresence session between a local site and the remote site, each of said sites having one or more participants, comprising for the local site:
using a computing device to perform the following process actions:
inputting streams of sensor data generated from an arrangement of sensors that capture participant data, said arrangement comprising a plurality of video and audio devices which generate a plurality of streams of sensor data, each video capture device of which captures the participant from a different geometric perspective, and each audio capture device of which captures the voice of the participant at the local site;
generating scene proxies from the streams of sensor data which geometrically describe the local site including the participant on a frame by frame basis;
employing the streams of sensor data and a face tracking technique to identify a 3D point representing the location of the participant in the local site for each frame of the scene proxies; and
transmitting the scene proxies representing each frame in the order generated over a data communication network to the remote site, along with,
audio data representing each local site participant's voice captured, if any, during the time period between the frame currently being transmitted and the next frame of scene proxies to be transmitted, and
the 3D point coordinates representing the location of each participant in the local site identified for the frame currently being transmitted.
12. The process of claim 11 , wherein said 3D point representing the location of a participant in the local site is a 3D point representing the location of the participant's head in the local site.
13. The process of claim 11 , wherein said 3D point representing the location of a participant in the local site is a 3D point representing the location of the participant's mouth in the local site.
14. The process of claim 11 , wherein prior to performing the process action of transmitting audio data representing a local site participant's voice, performing an action of suppressing reverberations and noise in the audio data.
15. A computer-implemented process for audio source positioning in a video teleconference or telepresence session between two non co-located sites, each of said sites having one participant, comprising for a first of the two sites:
using a computing device to perform the following process actions:
receiving from the other site, scene proxies representing successive scene proxy frames transmitted by the other site over a data communication network, along with for each scene proxy frame received,
audio data representing the other site participant's voice captured, if any, during the time period between the currently received frame and the next frame of scene proxies to be received from the other site, and
a 3D point representing the location of the participant in the other site;
for each frame of scene proxies received from the other site, rendering a frame of a virtual scene comprising a depiction of the other site's participant from the last-received frame of scene proxies and displaying the rendered frame to the first site participant via a display device; and
whenever audio data representing the other site participant's voice is received, employing a spatial audio technique to make it seem to the first site participant that the voice of the other site participant is emanating from a location on the display device where the other site participant is depicted using the audio data and the 3D point representing the location of the participant in the other site that was received from the other site in conjunction with the last-received frame of scene proxies.
16. The process of claim 15 , wherein said 3D point representing the location of the participant in the other site is a 3D point representing the location of the participant's mouth in the other site whenever the other site participant's mouth is visible in the last-rendered frame of a virtual scene.
17. The process of claim 15 , wherein said 3D point representing the location of the participant in the other site is a 3D point representing the location of the participant's head in the other site whenever the other site participant's mouth is not visible in the last-rendered frame of a virtual scene.
18. The process of claim 15 , wherein the process action of rendering a frame of the virtual scene, comprises, computing a first transform that converts 3D locations in the other site to points in the frame of the virtual scene, and wherein the process action of displaying the rendered frame to the first site participant via a display device, comprises computing a second transform that converts points in a frame of the virtual scene to screen coordinates on the display device.
19. The process of claim 18 , wherein the process action of employing a spatial audio technique to make it seem to the first site participant that the voice of the other site participant is emanating from a location on the display device where the other site participant is depicted using the audio data and the 3D point representing the location of the other participant in the other site that was received in conjunction with the last-received frame of scene proxies, comprises the actions of:
employing the first transform computed to convert 3D locations in the other site to points in the last-rendered frame of the virtual scene, to convert the 3D point representing the location of the other site participant in the other site to a point in the last-rendered frame of the virtual scene;
employing the second transform computed to convert points in a frame of the virtual scene to screen coordinates on the display device, to convert the point in the last-rendered frame of the virtual scene representing the other participant's location to screen coordinates on the display device;
employing a third transform that converts screen coordinates in the display device to 3D points in the first site to compute the 3D point in the first site of the screen coordinates representing the location of the other site participant depicted on the display device; and
employing said spatial audio technique and a plurality of audio speakers resident in the first site to make it seem to the first site participant that the voice of the other site participant is emanating from the computed 3D point in the first site of the screen coordinates representing the location of the other participant depicted on the display device.
20. The process of claim 15 , wherein the process action of employing a spatial audio technique to make it seem to the first site participant that the voice of the other site participant is emanating from a location on the display device where the other participant is depicted, further comprises the actions of:
tracking the head of the first site participant and periodically computing a 3D point representative of the location of the first site participant's head in the first site; and
each time a 3D point representative of the location of the first site participant's head in the first site is computed, employing the spatial audio technique to make it seem to the first site participant that the voice of the other site participant is emanating from a location on the display device where the other participant is depicted taking into consideration the last-computed 3D point representative of the location of the first site participant's head.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/599,678 US20130321566A1 (en) | 2012-05-31 | 2012-08-30 | Audio source positioning using a camera |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261653983P | 2012-05-31 | 2012-05-31 | |
US13/599,678 US20130321566A1 (en) | 2012-05-31 | 2012-08-30 | Audio source positioning using a camera |
Publications (1)
Publication Number | Publication Date |
---|---|
US20130321566A1 true US20130321566A1 (en) | 2013-12-05 |
Family
ID=49669652
Family Applications (10)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/566,877 Active 2034-02-16 US9846960B2 (en) | 2012-05-31 | 2012-08-03 | Automated camera array calibration |
US13/588,917 Abandoned US20130321586A1 (en) | 2012-05-31 | 2012-08-17 | Cloud based free viewpoint video streaming |
US13/598,536 Abandoned US20130321593A1 (en) | 2012-05-31 | 2012-08-29 | View frustum culling for free viewpoint video (fvv) |
US13/599,170 Abandoned US20130321396A1 (en) | 2012-05-31 | 2012-08-30 | Multi-input free viewpoint video processing pipeline |
US13/599,263 Active 2033-02-25 US8917270B2 (en) | 2012-05-31 | 2012-08-30 | Video generation using three-dimensional hulls |
US13/598,747 Abandoned US20130321575A1 (en) | 2012-05-31 | 2012-08-30 | High definition bubbles for rendering free viewpoint video |
US13/599,436 Active 2034-05-03 US9251623B2 (en) | 2012-05-31 | 2012-08-30 | Glancing angle exclusion |
US13/599,678 Abandoned US20130321566A1 (en) | 2012-05-31 | 2012-08-30 | Audio source positioning using a camera |
US13/614,852 Active 2033-10-29 US9256980B2 (en) | 2012-05-31 | 2012-09-13 | Interpolating oriented disks in 3D space for constructing high fidelity geometric proxies from point clouds |
US13/790,158 Abandoned US20130321413A1 (en) | 2012-05-31 | 2013-03-08 | Video generation using convict hulls |
Family Applications Before (7)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/566,877 Active 2034-02-16 US9846960B2 (en) | 2012-05-31 | 2012-08-03 | Automated camera array calibration |
US13/588,917 Abandoned US20130321586A1 (en) | 2012-05-31 | 2012-08-17 | Cloud based free viewpoint video streaming |
US13/598,536 Abandoned US20130321593A1 (en) | 2012-05-31 | 2012-08-29 | View frustum culling for free viewpoint video (fvv) |
US13/599,170 Abandoned US20130321396A1 (en) | 2012-05-31 | 2012-08-30 | Multi-input free viewpoint video processing pipeline |
US13/599,263 Active 2033-02-25 US8917270B2 (en) | 2012-05-31 | 2012-08-30 | Video generation using three-dimensional hulls |
US13/598,747 Abandoned US20130321575A1 (en) | 2012-05-31 | 2012-08-30 | High definition bubbles for rendering free viewpoint video |
US13/599,436 Active 2034-05-03 US9251623B2 (en) | 2012-05-31 | 2012-08-30 | Glancing angle exclusion |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/614,852 Active 2033-10-29 US9256980B2 (en) | 2012-05-31 | 2012-09-13 | Interpolating oriented disks in 3D space for constructing high fidelity geometric proxies from point clouds |
US13/790,158 Abandoned US20130321413A1 (en) | 2012-05-31 | 2013-03-08 | Video generation using convict hulls |
Country Status (1)
Country | Link |
---|---|
US (10) | US9846960B2 (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9888333B2 (en) * | 2013-11-11 | 2018-02-06 | Google Technology Holdings LLC | Three-dimensional audio rendering techniques |
CN108881784A (en) * | 2017-05-12 | 2018-11-23 | 腾讯科技(深圳)有限公司 | Virtual scene implementation method, device, terminal and server |
CN109618122A (en) * | 2018-12-07 | 2019-04-12 | 合肥万户网络技术有限公司 | A kind of virtual office conference system |
US10567185B2 (en) * | 2015-02-03 | 2020-02-18 | Dolby Laboratories Licensing Corporation | Post-conference playback system having higher perceived quality than originally heard in the conference |
US20200145753A1 (en) * | 2018-11-01 | 2020-05-07 | Sennheiser Electronic Gmbh & Co. Kg | Conference System with a Microphone Array System and a Method of Speech Acquisition In a Conference System |
WO2021077090A1 (en) * | 2019-10-18 | 2021-04-22 | Msg Entertainment Group, Llc. | Modifying audio according to visual images of a remote venue |
US11081127B2 (en) | 2018-01-18 | 2021-08-03 | Samsung Electronics Co., Ltd. | Electronic device and control method therefor |
US11202162B2 (en) | 2019-10-18 | 2021-12-14 | Msg Entertainment Group, Llc | Synthesizing audio of a venue |
US20240187553A1 (en) * | 2020-04-06 | 2024-06-06 | Eingot Llc | Integration of remote audio into a performance venue |
Families Citing this family (251)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8400494B2 (en) * | 2005-10-11 | 2013-03-19 | Primesense Ltd. | Method and system for object reconstruction |
US8866920B2 (en) | 2008-05-20 | 2014-10-21 | Pelican Imaging Corporation | Capturing and processing of images using monolithic camera array with heterogeneous imagers |
US11792538B2 (en) | 2008-05-20 | 2023-10-17 | Adeia Imaging Llc | Capturing and processing of images including occlusions focused on an image sensor by a lens stack array |
US9892546B2 (en) * | 2010-06-30 | 2018-02-13 | Primal Space Systems, Inc. | Pursuit path camera model method and system |
US20150373153A1 (en) * | 2010-06-30 | 2015-12-24 | Primal Space Systems, Inc. | System and method to reduce bandwidth requirement for visibility event packet streaming using a predicted maximal view frustum and predicted maximal viewpoint extent, each computed at runtime |
US8878950B2 (en) | 2010-12-14 | 2014-11-04 | Pelican Imaging Corporation | Systems and methods for synthesizing high resolution images using super-resolution processes |
EP2761534B1 (en) | 2011-09-28 | 2020-11-18 | FotoNation Limited | Systems for encoding light field image files |
US9001960B2 (en) * | 2012-01-04 | 2015-04-07 | General Electric Company | Method and apparatus for reducing noise-related imaging artifacts |
US9300841B2 (en) * | 2012-06-25 | 2016-03-29 | Yoldas Askan | Method of generating a smooth image from point cloud data |
DK4296963T3 (en) | 2012-08-21 | 2025-03-03 | Adeia Imaging Llc | METHOD FOR DEPTH DETECTION IN IMAGES TAKEN WITH ARRAY CAMERAS |
US10079968B2 (en) | 2012-12-01 | 2018-09-18 | Qualcomm Incorporated | Camera having additional functionality based on connectivity with a host device |
US9519968B2 (en) * | 2012-12-13 | 2016-12-13 | Hewlett-Packard Development Company, L.P. | Calibrating visual sensors using homography operators |
US9224227B2 (en) * | 2012-12-21 | 2015-12-29 | Nvidia Corporation | Tile shader for screen space, a method of rendering and a graphics processing unit employing the tile shader |
US8866912B2 (en) | 2013-03-10 | 2014-10-21 | Pelican Imaging Corporation | System and methods for calibration of an array camera using a single captured image |
US9144905B1 (en) * | 2013-03-13 | 2015-09-29 | Hrl Laboratories, Llc | Device and method to identify functional parts of tools for robotic manipulation |
US9578259B2 (en) | 2013-03-14 | 2017-02-21 | Fotonation Cayman Limited | Systems and methods for reducing motion blur in images or video in ultra low light with array cameras |
US9445003B1 (en) * | 2013-03-15 | 2016-09-13 | Pelican Imaging Corporation | Systems and methods for synthesizing high resolution images using image deconvolution based on motion and depth information |
EP2983140A4 (en) * | 2013-04-04 | 2016-11-09 | Sony Corp | Display control device, display control method and program |
US9191643B2 (en) | 2013-04-15 | 2015-11-17 | Microsoft Technology Licensing, Llc | Mixing infrared and color component data point clouds |
US10262462B2 (en) | 2014-04-18 | 2019-04-16 | Magic Leap, Inc. | Systems and methods for augmented and virtual reality |
US9208609B2 (en) * | 2013-07-01 | 2015-12-08 | Mitsubishi Electric Research Laboratories, Inc. | Method for fitting primitive shapes to 3D point clouds using distance fields |
WO2015010098A1 (en) * | 2013-07-19 | 2015-01-22 | Google Inc. | Asymmetric sensor array for capturing images |
US10140751B2 (en) * | 2013-08-08 | 2018-11-27 | Imagination Technologies Limited | Normal offset smoothing |
CN104424655A (en) * | 2013-09-10 | 2015-03-18 | 鸿富锦精密工业(深圳)有限公司 | System and method for reconstructing point cloud curved surface |
JP6476658B2 (en) * | 2013-09-11 | 2019-03-06 | ソニー株式会社 | Image processing apparatus and method |
US9286718B2 (en) * | 2013-09-27 | 2016-03-15 | Ortery Technologies, Inc. | Method using 3D geometry data for virtual reality image presentation and control in 3D space |
US10242400B1 (en) | 2013-10-25 | 2019-03-26 | Appliance Computing III, Inc. | User interface for image-based rendering of virtual tours |
US10591969B2 (en) | 2013-10-25 | 2020-03-17 | Google Technology Holdings LLC | Sensor-based near-field communication authentication |
US10119808B2 (en) | 2013-11-18 | 2018-11-06 | Fotonation Limited | Systems and methods for estimating depth from projected texture using camera arrays |
US9426361B2 (en) | 2013-11-26 | 2016-08-23 | Pelican Imaging Corporation | Array camera configurations incorporating multiple constituent array cameras |
EP2881918B1 (en) * | 2013-12-06 | 2018-02-07 | My Virtual Reality Software AS | Method for visualizing three-dimensional data |
US9233469B2 (en) * | 2014-02-13 | 2016-01-12 | GM Global Technology Operations LLC | Robotic system with 3D box location functionality |
US9530226B2 (en) * | 2014-02-18 | 2016-12-27 | Par Technology Corporation | Systems and methods for optimizing N dimensional volume data for transmission |
WO2015130320A1 (en) | 2014-02-28 | 2015-09-03 | Hewlett-Packard Development Company, L.P. | Calibration of sensors and projector |
US9396586B2 (en) | 2014-03-14 | 2016-07-19 | Matterport, Inc. | Processing and/or transmitting 3D data |
US10600245B1 (en) * | 2014-05-28 | 2020-03-24 | Lucasfilm Entertainment Company Ltd. | Navigating a virtual environment of a media content item |
CN104089628B (en) * | 2014-06-30 | 2017-02-08 | 中国科学院光电研究院 | Self-adaption geometric calibration method of light field camera |
US11051000B2 (en) | 2014-07-14 | 2021-06-29 | Mitsubishi Electric Research Laboratories, Inc. | Method for calibrating cameras with non-overlapping views |
US10169909B2 (en) * | 2014-08-07 | 2019-01-01 | Pixar | Generating a volumetric projection for an object |
WO2016037014A1 (en) * | 2014-09-03 | 2016-03-10 | Nextvr Inc. | Methods and apparatus for capturing, streaming and/or playing back content |
US11205305B2 (en) | 2014-09-22 | 2021-12-21 | Samsung Electronics Company, Ltd. | Presentation of three-dimensional video |
US10547825B2 (en) * | 2014-09-22 | 2020-01-28 | Samsung Electronics Company, Ltd. | Transmission of three-dimensional video |
WO2016054089A1 (en) | 2014-09-29 | 2016-04-07 | Pelican Imaging Corporation | Systems and methods for dynamic calibration of array cameras |
US9600892B2 (en) * | 2014-11-06 | 2017-03-21 | Symbol Technologies, Llc | Non-parametric method of and system for estimating dimensions of objects of arbitrary shape |
US10154246B2 (en) * | 2014-11-20 | 2018-12-11 | Cappasity Inc. | Systems and methods for 3D capturing of objects and motion sequences using multiple range and RGB cameras |
US9396554B2 (en) | 2014-12-05 | 2016-07-19 | Symbol Technologies, Llc | Apparatus for and method of estimating dimensions of an object associated with a code in automatic response to reading the code |
DE102014118989A1 (en) * | 2014-12-18 | 2016-06-23 | Connaught Electronics Ltd. | Method for calibrating a camera system, camera system and motor vehicle |
US11019330B2 (en) * | 2015-01-19 | 2021-05-25 | Aquifi, Inc. | Multiple camera system with auto recalibration |
US9686520B2 (en) | 2015-01-22 | 2017-06-20 | Microsoft Technology Licensing, Llc | Reconstructing viewport upon user viewpoint misprediction |
US9661312B2 (en) * | 2015-01-22 | 2017-05-23 | Microsoft Technology Licensing, Llc | Synthesizing second eye viewport using interleaving |
KR20170127505A (en) | 2015-03-01 | 2017-11-21 | NextVR Inc. | Methods and apparatus for performing environmental measurements and/or using these measurements in 3D image rendering |
EP3070942B1 (en) * | 2015-03-17 | 2023-11-22 | InterDigital CE Patent Holdings | Method and apparatus for displaying light field video data |
US10878278B1 (en) * | 2015-05-16 | 2020-12-29 | Sturfee, Inc. | Geo-localization based on remotely sensed visual features |
WO2016198059A1 (en) * | 2015-06-11 | 2016-12-15 | Conti Temic Microelectronic Gmbh | Method for generating a virtual image of vehicle surroundings |
US9460513B1 (en) | 2015-06-17 | 2016-10-04 | Mitsubishi Electric Research Laboratories, Inc. | Method for reconstructing a 3D scene as a 3D model using images acquired by 3D sensors and omnidirectional cameras |
US10554713B2 (en) | 2015-06-19 | 2020-02-04 | Microsoft Technology Licensing, Llc | Low latency application streaming using temporal frame transformation |
KR101835434B1 (en) * | 2015-07-08 | 2018-03-09 | Korea University Research and Business Foundation | Method and apparatus for generating a protection image, and method for mapping between image pixel and depth value |
US9848212B2 (en) * | 2015-07-10 | 2017-12-19 | Futurewei Technologies, Inc. | Multi-view video streaming with fast and smooth view switch |
US10701318B2 (en) | 2015-08-14 | 2020-06-30 | Pcms Holdings, Inc. | System and method for augmented reality multi-view telepresence |
GB2543776B (en) * | 2015-10-27 | 2019-02-06 | Imagination Tech Ltd | Systems and methods for processing images of objects |
US10757394B1 (en) * | 2015-11-09 | 2020-08-25 | Cognex Corporation | System and method for calibrating a plurality of 3D sensors with respect to a motion conveyance |
US11562502B2 (en) * | 2015-11-09 | 2023-01-24 | Cognex Corporation | System and method for calibrating a plurality of 3D sensors with respect to a motion conveyance |
US10812778B1 (en) | 2015-11-09 | 2020-10-20 | Cognex Corporation | System and method for calibrating one or more 3D sensors mounted on a moving manipulator |
US20180374239A1 (en) * | 2015-11-09 | 2018-12-27 | Cognex Corporation | System and method for field calibration of a vision system imaging two opposite sides of a calibration object |
WO2017100487A1 (en) * | 2015-12-11 | 2017-06-15 | Jingyi Yu | Method and system for image-based image rendering using a multi-camera and depth camera array |
US10352689B2 (en) | 2016-01-28 | 2019-07-16 | Symbol Technologies, Llc | Methods and systems for high precision locationing with depth values |
US10145955B2 (en) | 2016-02-04 | 2018-12-04 | Symbol Technologies, Llc | Methods and systems for processing point-cloud data with a line scanner |
KR20170095030A (en) * | 2016-02-12 | 2017-08-22 | Samsung Electronics Co., Ltd. | Scheme for supporting virtual reality content display in communication system |
CN107097698B (en) * | 2016-02-22 | 2021-10-01 | Ford Global Technologies, LLC | Inflatable airbag system for vehicle seat, seat assembly and adjustment method thereof |
US11567201B2 (en) | 2016-03-11 | 2023-01-31 | Kaarta, Inc. | Laser scanner with real-time, online ego-motion estimation |
US11573325B2 (en) | 2016-03-11 | 2023-02-07 | Kaarta, Inc. | Systems and methods for improvements in scanning and mapping |
US10989542B2 (en) | 2016-03-11 | 2021-04-27 | Kaarta, Inc. | Aligning measured signal data with slam localization data and uses thereof |
JP6987797B2 (en) | 2016-03-11 | 2022-01-05 | Kaarta, Inc. | Laser scanner with real-time online ego-motion estimation |
US10721451B2 (en) | 2016-03-23 | 2020-07-21 | Symbol Technologies, Llc | Arrangement for, and method of, loading freight into a shipping container |
US9965870B2 (en) | 2016-03-29 | 2018-05-08 | Institut National D'optique | Camera calibration method using a calibration target |
WO2017172528A1 (en) | 2016-04-01 | 2017-10-05 | Pcms Holdings, Inc. | Apparatus and method for supporting interactive augmented reality functionalities |
US9805240B1 (en) | 2016-04-18 | 2017-10-31 | Symbol Technologies, Llc | Barcode scanning and dimensioning |
CN107341768B (en) * | 2016-04-29 | 2022-03-11 | Microsoft Technology Licensing, LLC | Grid noise reduction |
WO2017197114A1 (en) | 2016-05-11 | 2017-11-16 | Affera, Inc. | Anatomical model generation |
US11728026B2 (en) | 2016-05-12 | 2023-08-15 | Affera, Inc. | Three-dimensional cardiac representation |
EP3264759A1 (en) | 2016-06-30 | 2018-01-03 | Thomson Licensing | An apparatus and a method for generating data representative of a pixel beam |
US10192345B2 (en) * | 2016-07-19 | 2019-01-29 | Qualcomm Incorporated | Systems and methods for improved surface normal estimation |
WO2018022882A1 (en) * | 2016-07-27 | 2018-02-01 | R-Stor Inc. | Method and apparatus for bonding communication technologies |
US10574909B2 (en) | 2016-08-08 | 2020-02-25 | Microsoft Technology Licensing, Llc | Hybrid imaging sensor for structured light object capture |
US10776661B2 (en) | 2016-08-19 | 2020-09-15 | Symbol Technologies, Llc | Methods, systems and apparatus for segmenting and dimensioning objects |
US9980078B2 (en) | 2016-10-14 | 2018-05-22 | Nokia Technologies Oy | Audio object modification in free-viewpoint rendering |
US10229533B2 (en) * | 2016-11-03 | 2019-03-12 | Mitsubishi Electric Research Laboratories, Inc. | Methods and systems for fast resampling method and apparatus for point cloud data |
US11042161B2 (en) | 2016-11-16 | 2021-06-22 | Symbol Technologies, Llc | Navigation control method and apparatus in a mobile automation system |
US10451405B2 (en) | 2016-11-22 | 2019-10-22 | Symbol Technologies, Llc | Dimensioning system for, and method of, dimensioning freight in motion along an unconstrained path in a venue |
JP6948171B2 (en) * | 2016-11-30 | 2021-10-13 | Canon Kabushiki Kaisha | Image processing apparatus, image processing method, and program |
WO2018100928A1 (en) | 2016-11-30 | 2018-06-07 | Canon Kabushiki Kaisha | Image processing device and method |
EP3336801A1 (en) * | 2016-12-19 | 2018-06-20 | Thomson Licensing | Method and apparatus for constructing lighting environment representations of 3d scenes |
US10354411B2 (en) | 2016-12-20 | 2019-07-16 | Symbol Technologies, Llc | Methods, systems and apparatus for segmenting objects |
WO2018123801A1 (en) * | 2016-12-28 | 2018-07-05 | Panasonic Intellectual Property Corporation of America | Three-dimensional model distribution method, three-dimensional model receiving method, three-dimensional model distribution device, and three-dimensional model receiving device |
US11096004B2 (en) * | 2017-01-23 | 2021-08-17 | Nokia Technologies Oy | Spatial audio rendering point extension |
US11665308B2 (en) | 2017-01-31 | 2023-05-30 | Tetavi, Ltd. | System and method for rendering free viewpoint video for sport applications |
WO2018147329A1 (en) * | 2017-02-10 | 2018-08-16 | Panasonic Intellectual Property Corporation of America | Free-viewpoint image generation method and free-viewpoint image generation system |
JP7086522B2 (en) * | 2017-02-28 | 2022-06-20 | Canon Kabushiki Kaisha | Image processing apparatus, information processing method, and program |
US10531219B2 (en) | 2017-03-20 | 2020-01-07 | Nokia Technologies Oy | Smooth rendering of overlapping audio-object interactions |
WO2018172614A1 (en) | 2017-03-22 | 2018-09-27 | Nokia Technologies Oy | A method and an apparatus and a computer program product for adaptive streaming |
US10726574B2 (en) | 2017-04-11 | 2020-07-28 | Dolby Laboratories Licensing Corporation | Passive multi-wearable-devices tracking |
JP6922369B2 (en) * | 2017-04-14 | 2021-08-18 | Fujitsu Limited | Viewpoint selection support program, viewpoint selection support method and viewpoint selection support device |
US10939038B2 (en) * | 2017-04-24 | 2021-03-02 | Intel Corporation | Object pre-encoding for 360-degree view for optimal quality and latency |
US11367092B2 (en) | 2017-05-01 | 2022-06-21 | Symbol Technologies, Llc | Method and apparatus for extracting and processing price text from an image set |
US11978011B2 (en) | 2017-05-01 | 2024-05-07 | Symbol Technologies, Llc | Method and apparatus for object status detection |
US10591918B2 (en) | 2017-05-01 | 2020-03-17 | Symbol Technologies, Llc | Fixed segmented lattice planning for a mobile automation apparatus |
US10663590B2 (en) | 2017-05-01 | 2020-05-26 | Symbol Technologies, Llc | Device and method for merging lidar data |
US11449059B2 (en) | 2017-05-01 | 2022-09-20 | Symbol Technologies, Llc | Obstacle detection for a mobile automation apparatus |
WO2018204342A1 (en) | 2017-05-01 | 2018-11-08 | Symbol Technologies, Llc | Product status detection system |
US10726273B2 (en) | 2017-05-01 | 2020-07-28 | Symbol Technologies, Llc | Method and apparatus for shelf feature and object placement detection from shelf images |
US10949798B2 (en) | 2017-05-01 | 2021-03-16 | Symbol Technologies, Llc | Multimodal localization and mapping for a mobile automation apparatus |
US11074036B2 (en) | 2017-05-05 | 2021-07-27 | Nokia Technologies Oy | Metadata-free audio-object interactions |
WO2018201423A1 (en) | 2017-05-05 | 2018-11-08 | Symbol Technologies, Llc | Method and apparatus for detecting and interpreting price label text |
US10165386B2 (en) | 2017-05-16 | 2018-12-25 | Nokia Technologies Oy | VR audio superzoom |
US10154176B1 (en) * | 2017-05-30 | 2018-12-11 | Intel Corporation | Calibrating depth cameras using natural objects with expected shapes |
EP3593323B1 (en) * | 2017-06-07 | 2020-08-05 | Google LLC | High speed, high-fidelity face tracking |
US10841537B2 (en) | 2017-06-09 | 2020-11-17 | Pcms Holdings, Inc. | Spatially faithful telepresence supporting varying geometries and moving users |
BR102017012517A2 (en) * | 2017-06-12 | 2018-12-26 | Samsung Eletrônica da Amazônia Ltda. | Method for 360° media display or bubble interface |
WO2019003953A1 (en) | 2017-06-29 | 2019-01-03 | Sony Corporation | Image processing apparatus and image processing method |
JP6948175B2 (en) * | 2017-07-06 | 2021-10-13 | Canon Kabushiki Kaisha | Image processing device and its control method |
US11049218B2 (en) | 2017-08-11 | 2021-06-29 | Samsung Electronics Company, Ltd. | Seamless image stitching |
WO2019034808A1 (en) | 2017-08-15 | 2019-02-21 | Nokia Technologies Oy | Encoding and decoding of volumetric video |
EP3669333B1 (en) * | 2017-08-15 | 2024-05-01 | Nokia Technologies Oy | Sequential encoding and decoding of volumetric video |
US11290758B2 (en) * | 2017-08-30 | 2022-03-29 | Samsung Electronics Co., Ltd. | Method and apparatus of point-cloud streaming |
JP6409107B1 (en) * | 2017-09-06 | 2018-10-17 | Canon Kabushiki Kaisha | Information processing apparatus, information processing method, and program |
US10521914B2 (en) | 2017-09-07 | 2019-12-31 | Symbol Technologies, Llc | Multi-sensor object recognition system and method |
US10572763B2 (en) | 2017-09-07 | 2020-02-25 | Symbol Technologies, Llc | Method and apparatus for support surface edge detection |
US10897269B2 (en) | 2017-09-14 | 2021-01-19 | Apple Inc. | Hierarchical point cloud compression |
US10861196B2 (en) | 2017-09-14 | 2020-12-08 | Apple Inc. | Point cloud compression |
US11818401B2 (en) | 2017-09-14 | 2023-11-14 | Apple Inc. | Point cloud geometry compression using octrees and binary arithmetic encoding with adaptive look-up tables |
US11113845B2 (en) | 2017-09-18 | 2021-09-07 | Apple Inc. | Point cloud compression using non-cubic projections and masks |
US10909725B2 (en) | 2017-09-18 | 2021-02-02 | Apple Inc. | Point cloud compression |
JP6433559B1 (en) | 2017-09-19 | 2018-12-05 | Canon Kabushiki Kaisha | Providing device, providing method, and program |
CN107610182B (en) * | 2017-09-22 | 2018-09-11 | Harbin Institute of Technology | Calibration method for the center of a light-field camera microlens array |
JP6425780B1 (en) | 2017-09-22 | 2018-11-21 | Canon Kabushiki Kaisha | Image processing system, image processing apparatus, image processing method and program |
US11395087B2 (en) | 2017-09-29 | 2022-07-19 | Nokia Technologies Oy | Level-based audio-object interactions |
EP3467777A1 (en) * | 2017-10-06 | 2019-04-10 | Thomson Licensing | A method and apparatus for encoding/decoding the colors of a point cloud representing a 3d object |
WO2019099605A1 (en) | 2017-11-17 | 2019-05-23 | Kaarta, Inc. | Methods and systems for geo-referencing mapping systems |
US10607373B2 (en) | 2017-11-22 | 2020-03-31 | Apple Inc. | Point cloud compression with closed-loop color conversion |
US10951879B2 (en) | 2017-12-04 | 2021-03-16 | Canon Kabushiki Kaisha | Method, system and apparatus for capture of image data for free viewpoint video |
WO2019123547A1 (en) * | 2017-12-19 | 2019-06-27 | Sony Interactive Entertainment Inc. | Image generator, reference image data generator, image generation method, and reference image data generation method |
WO2019151569A1 (en) | 2018-01-30 | 2019-08-08 | Gaia3D, Inc. | Method for providing three-dimensional geographic information system web service |
US10417806B2 (en) * | 2018-02-15 | 2019-09-17 | JJK Holdings, LLC | Dynamic local temporal-consistent textured mesh compression |
JP2019144958A (en) * | 2018-02-22 | 2019-08-29 | Canon Kabushiki Kaisha | Image processing device, image processing method, and program |
WO2019165194A1 (en) | 2018-02-23 | 2019-08-29 | Kaarta, Inc. | Methods and systems for processing and colorizing point clouds and meshes |
US10542368B2 (en) | 2018-03-27 | 2020-01-21 | Nokia Technologies Oy | Audio content modification for playback audio |
WO2019195270A1 (en) | 2018-04-03 | 2019-10-10 | Kaarta, Inc. | Methods and systems for real or near real-time point cloud map data confidence evaluation |
JP6965439B2 (en) | 2018-04-04 | 2021-11-10 | Sony Interactive Entertainment Inc. | Reference image generator, display image generator, reference image generation method, and display image generation method |
US10832436B2 (en) | 2018-04-05 | 2020-11-10 | Symbol Technologies, Llc | Method, system and apparatus for recovering label positions |
US11327504B2 (en) | 2018-04-05 | 2022-05-10 | Symbol Technologies, Llc | Method, system and apparatus for mobile automation apparatus localization |
US10809078B2 (en) | 2018-04-05 | 2020-10-20 | Symbol Technologies, Llc | Method, system and apparatus for dynamic path generation |
US10823572B2 (en) | 2018-04-05 | 2020-11-03 | Symbol Technologies, Llc | Method, system and apparatus for generating navigational data |
US10740911B2 (en) | 2018-04-05 | 2020-08-11 | Symbol Technologies, Llc | Method, system and apparatus for correcting translucency artifacts in data representing a support structure |
US11010928B2 (en) | 2018-04-10 | 2021-05-18 | Apple Inc. | Adaptive distance based point cloud compression |
US10909726B2 (en) | 2018-04-10 | 2021-02-02 | Apple Inc. | Point cloud compression |
US10867414B2 (en) | 2018-04-10 | 2020-12-15 | Apple Inc. | Point cloud attribute transfer algorithm |
US10939129B2 (en) | 2018-04-10 | 2021-03-02 | Apple Inc. | Point cloud compression |
US10909727B2 (en) | 2018-04-10 | 2021-02-02 | Apple Inc. | Hierarchical point cloud compression with smoothing |
US11017566B1 (en) | 2018-07-02 | 2021-05-25 | Apple Inc. | Point cloud compression with adaptive filtering |
US11202098B2 (en) | 2018-07-05 | 2021-12-14 | Apple Inc. | Point cloud compression with multi-resolution video encoding |
WO2020009826A1 (en) | 2018-07-05 | 2020-01-09 | Kaarta, Inc. | Methods and systems for auto-leveling of point clouds and 3d models |
US11012713B2 (en) | 2018-07-12 | 2021-05-18 | Apple Inc. | Bit stream structure for compressed point cloud data |
US11367224B2 (en) | 2018-10-02 | 2022-06-21 | Apple Inc. | Occupancy map block-to-patch information compression |
US11010920B2 (en) | 2018-10-05 | 2021-05-18 | Zebra Technologies Corporation | Method, system and apparatus for object detection in point clouds |
US11506483B2 (en) | 2018-10-05 | 2022-11-22 | Zebra Technologies Corporation | Method, system and apparatus for support structure depth determination |
US11430155B2 (en) | 2018-10-05 | 2022-08-30 | Apple Inc. | Quantized depths for projection point cloud compression |
US11003188B2 (en) | 2018-11-13 | 2021-05-11 | Zebra Technologies Corporation | Method, system and apparatus for obstacle handling in navigational path generation |
US11090811B2 (en) | 2018-11-13 | 2021-08-17 | Zebra Technologies Corporation | Method and apparatus for labeling of support structures |
WO2020103040A1 (en) * | 2018-11-21 | 2020-05-28 | Boe Technology Group Co., Ltd. | A method for generating and displaying panorama images based on rendering engine and a display apparatus |
US11079240B2 (en) | 2018-12-07 | 2021-08-03 | Zebra Technologies Corporation | Method, system and apparatus for adaptive particle filter localization |
US11416000B2 (en) | 2018-12-07 | 2022-08-16 | Zebra Technologies Corporation | Method and apparatus for navigational ray tracing |
US11100303B2 (en) | 2018-12-10 | 2021-08-24 | Zebra Technologies Corporation | Method, system and apparatus for auxiliary label detection and association |
US11423572B2 (en) | 2018-12-12 | 2022-08-23 | Analog Devices, Inc. | Built-in calibration of time-of-flight depth imaging systems |
US11015938B2 (en) | 2018-12-12 | 2021-05-25 | Zebra Technologies Corporation | Method, system and apparatus for navigational assistance |
US10731970B2 (en) | 2018-12-13 | 2020-08-04 | Zebra Technologies Corporation | Method, system and apparatus for support structure detection |
WO2020122675A1 (en) * | 2018-12-13 | 2020-06-18 | Samsung Electronics Co., Ltd. | Method, device, and computer-readable recording medium for compressing 3D mesh content |
US10818077B2 (en) | 2018-12-14 | 2020-10-27 | Canon Kabushiki Kaisha | Method, system and apparatus for controlling a virtual camera |
CA3028708A1 (en) | 2018-12-28 | 2020-06-28 | Zih Corp. | Method, system and apparatus for dynamic loop closure in mapping trajectories |
WO2020154543A1 (en) | 2019-01-23 | 2020-07-30 | Affera, Inc. | Systems and methods for therapy annotation |
JP7211835B2 (en) * | 2019-02-04 | 2023-01-24 | i-PRO Co., Ltd. | Imaging system and synchronization control method |
WO2020164044A1 (en) * | 2019-02-14 | 2020-08-20 | Peking University Shenzhen Graduate School | Free-viewpoint image synthesis method, device, and apparatus |
JP6647433B1 (en) * | 2019-02-19 | 2020-02-14 | Media Kobo, Inc. | Point cloud data communication system, point cloud data transmission device, and point cloud data transmission method |
US10797090B2 (en) | 2019-02-27 | 2020-10-06 | Semiconductor Components Industries, Llc | Image sensor with near-infrared and visible light phase detection pixels |
US11257283B2 (en) | 2019-03-07 | 2022-02-22 | Alibaba Group Holding Limited | Image reconstruction method, system, device and computer-readable storage medium |
US11057564B2 (en) | 2019-03-28 | 2021-07-06 | Apple Inc. | Multiple layer flexure for supporting a moving image sensor |
JP7479793B2 (en) * | 2019-04-11 | 2024-05-09 | Canon Kabushiki Kaisha | Image processing device, system for generating virtual viewpoint video, and method and program for controlling the image processing device |
US11200677B2 (en) | 2019-06-03 | 2021-12-14 | Zebra Technologies Corporation | Method, system and apparatus for shelf edge detection |
US11151743B2 (en) | 2019-06-03 | 2021-10-19 | Zebra Technologies Corporation | Method, system and apparatus for end of aisle detection |
US11341663B2 (en) | 2019-06-03 | 2022-05-24 | Zebra Technologies Corporation | Method, system and apparatus for detecting support structure obstructions |
US11402846B2 (en) | 2019-06-03 | 2022-08-02 | Zebra Technologies Corporation | Method, system and apparatus for mitigating data capture light leakage |
US11662739B2 (en) | 2019-06-03 | 2023-05-30 | Zebra Technologies Corporation | Method, system and apparatus for adaptive ceiling-based localization |
US11960286B2 (en) | 2019-06-03 | 2024-04-16 | Zebra Technologies Corporation | Method, system and apparatus for dynamic task sequencing |
US11080566B2 (en) | 2019-06-03 | 2021-08-03 | Zebra Technologies Corporation | Method, system and apparatus for gap detection in support structures with peg regions |
US11711544B2 (en) | 2019-07-02 | 2023-07-25 | Apple Inc. | Point cloud compression with supplemental information messages |
CN110624220B (en) * | 2019-09-04 | 2021-05-04 | Fujian Normal University | Method for obtaining an optimal standing long jump technique template |
EP3821267A4 (en) | 2019-09-17 | 2022-04-13 | Boston Polarimetrics, Inc. | Systems and methods for surface modeling using polarization cues |
US11562507B2 (en) | 2019-09-27 | 2023-01-24 | Apple Inc. | Point cloud compression using video encoding with time consistent patches |
US11627314B2 (en) | 2019-09-27 | 2023-04-11 | Apple Inc. | Video-based point cloud compression with non-normative smoothing |
WO2021063271A1 (en) * | 2019-09-30 | 2021-04-08 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Human body model reconstruction method and reconstruction system, and storage medium |
US11538196B2 (en) | 2019-10-02 | 2022-12-27 | Apple Inc. | Predictive coding for point cloud compression |
US11895307B2 (en) | 2019-10-04 | 2024-02-06 | Apple Inc. | Block-based predictive coding for point cloud compression |
EP4042366A4 (en) | 2019-10-07 | 2023-11-15 | Boston Polarimetrics, Inc. | Systems and methods for augmentation of sensor systems and imaging systems with polarization |
US11315326B2 (en) * | 2019-10-15 | 2022-04-26 | At&T Intellectual Property I, L.P. | Extended reality anchor caching based on viewport prediction |
CN110769241B (en) * | 2019-11-05 | 2022-02-01 | Guangzhou Huya Technology Co., Ltd. | Video frame processing method and device, client, and storage medium |
WO2021108002A1 (en) | 2019-11-30 | 2021-06-03 | Boston Polarimetrics, Inc. | Systems and methods for transparent object segmentation using polarization cues |
US11507103B2 (en) | 2019-12-04 | 2022-11-22 | Zebra Technologies Corporation | Method, system and apparatus for localization-based historical obstacle handling |
US11107238B2 (en) | 2019-12-13 | 2021-08-31 | Zebra Technologies Corporation | Method, system and apparatus for detecting item facings |
US11734873B2 (en) | 2019-12-13 | 2023-08-22 | Sony Group Corporation | Real-time volumetric visualization of 2-D images |
US11798196B2 (en) | 2020-01-08 | 2023-10-24 | Apple Inc. | Video-based point cloud compression with predicted patches |
US11625866B2 (en) | 2020-01-09 | 2023-04-11 | Apple Inc. | Geometry encoding using octrees and predictive trees |
US11195303B2 (en) | 2020-01-29 | 2021-12-07 | Boston Polarimetrics, Inc. | Systems and methods for characterizing object pose detection and measurement systems |
CN115428028A (en) | 2020-01-30 | 2022-12-02 | Intrinsic Innovation LLC | System and method for synthesizing data for training statistical models in different imaging modalities including polarized images |
US11240465B2 (en) | 2020-02-21 | 2022-02-01 | Alibaba Group Holding Limited | System and method to use decoder information in video super resolution |
US11430179B2 (en) * | 2020-02-24 | 2022-08-30 | Microsoft Technology Licensing, Llc | Depth buffer dilation for remote rendering |
US11822333B2 (en) | 2020-03-30 | 2023-11-21 | Zebra Technologies Corporation | Method, system and apparatus for data capture illumination control |
US11953700B2 (en) | 2020-05-27 | 2024-04-09 | Intrinsic Innovation Llc | Multi-aperture polarization optical systems using beam splitters |
US11776205B2 (en) * | 2020-06-09 | 2023-10-03 | Ptc Inc. | Determination of interactions with predefined volumes of space based on automated analysis of volumetric video |
US11620768B2 (en) | 2020-06-24 | 2023-04-04 | Apple Inc. | Point cloud geometry compression using octrees with multiple scan orders |
US11615557B2 (en) | 2020-06-24 | 2023-03-28 | Apple Inc. | Point cloud compression using octrees with slicing |
US11450024B2 (en) | 2020-07-17 | 2022-09-20 | Zebra Technologies Corporation | Mixed depth object detection |
US11875452B2 (en) * | 2020-08-18 | 2024-01-16 | Qualcomm Incorporated | Billboard layers in object-space rendering |
US11748918B1 (en) * | 2020-09-25 | 2023-09-05 | Apple Inc. | Synthesized camera arrays for rendering novel viewpoints |
WO2022076020A1 (en) * | 2020-10-08 | 2022-04-14 | Google Llc | Few-shot synthesis of talking heads |
US11593915B2 (en) | 2020-10-21 | 2023-02-28 | Zebra Technologies Corporation | Parallax-tolerant panoramic image generation |
US11392891B2 (en) | 2020-11-03 | 2022-07-19 | Zebra Technologies Corporation | Item placement detection and optimization in material handling systems |
US11847832B2 (en) | 2020-11-11 | 2023-12-19 | Zebra Technologies Corporation | Object classification for autonomous navigation systems |
US11527014B2 (en) * | 2020-11-24 | 2022-12-13 | Verizon Patent And Licensing Inc. | Methods and systems for calibrating surface data capture devices |
US11874415B2 (en) * | 2020-12-22 | 2024-01-16 | International Business Machines Corporation | Earthquake detection and response via distributed visual input |
US11703457B2 (en) * | 2020-12-29 | 2023-07-18 | Industrial Technology Research Institute | Structure diagnosis system and structure diagnosis method |
US12020455B2 (en) | 2021-03-10 | 2024-06-25 | Intrinsic Innovation Llc | Systems and methods for high dynamic range image reconstruction |
US12069227B2 (en) | 2021-03-10 | 2024-08-20 | Intrinsic Innovation Llc | Multi-modal and multi-spectral stereo camera arrays |
US11651538B2 (en) * | 2021-03-17 | 2023-05-16 | International Business Machines Corporation | Generating 3D videos from 2D models |
US11948338B1 (en) | 2021-03-29 | 2024-04-02 | Apple Inc. | 3D volumetric content encoding using 2D videos and simplified 3D meshes |
US11290658B1 (en) | 2021-04-15 | 2022-03-29 | Boston Polarimetrics, Inc. | Systems and methods for camera exposure control |
US11954886B2 (en) | 2021-04-15 | 2024-04-09 | Intrinsic Innovation Llc | Systems and methods for six-degree of freedom pose estimation of deformable objects |
US12067746B2 (en) | 2021-05-07 | 2024-08-20 | Intrinsic Innovation Llc | Systems and methods for using computer vision to pick up small objects |
US11954882B2 (en) | 2021-06-17 | 2024-04-09 | Zebra Technologies Corporation | Feature-based georegistration for mobile computing devices |
US12175741B2 (en) | 2021-06-22 | 2024-12-24 | Intrinsic Innovation Llc | Systems and methods for a vision guided end effector |
US12172310B2 (en) | 2021-06-29 | 2024-12-24 | Intrinsic Innovation Llc | Systems and methods for picking objects using 3-D geometry and segmentation |
US11689813B2 (en) | 2021-07-01 | 2023-06-27 | Intrinsic Innovation Llc | Systems and methods for high dynamic range imaging using crossed polarizers |
US12293535B2 (en) | 2021-08-03 | 2025-05-06 | Intrinsic Innovation Llc | Systems and methods for training pose estimators in computer vision |
CN113761238B (en) * | 2021-08-27 | 2022-08-23 | Guangzhou WeRide Technology Co., Ltd. | Point cloud storage method, apparatus, device, and storage medium |
US12254556B2 (en) | 2021-09-02 | 2025-03-18 | Nvidia Corporation | Techniques for rendering signed distance functions |
US11887245B2 (en) * | 2021-09-02 | 2024-01-30 | Nvidia Corporation | Techniques for rendering signed distance functions |
CN113905221B (en) * | 2021-09-30 | 2024-01-16 | Fuzhou University | Asymmetric transport stream adaptation method and system for stereoscopic panoramic video |
CN114355287B (en) * | 2022-01-04 | 2023-08-15 | Hunan University | Ultra-short baseline underwater acoustic ranging method and system |
WO2023159180A1 (en) * | 2022-02-17 | 2023-08-24 | Nutech Ventures | Single-pass 3d reconstruction of internal surface of pipelines using depth camera array |
CN116800947A (en) * | 2022-03-16 | 2023-09-22 | Ambarella International LP | Rapid RGB-IR calibration verification for mass production process |
KR20250030499A (en) * | 2022-07-01 | 2025-03-05 | Google LLC | 3D video highlights from camera sources |
US12277733B2 (en) * | 2022-12-05 | 2025-04-15 | Verizon Patent And Licensing Inc. | Calibration methods and systems for an under-calibrated camera capturing a scene |
WO2024144805A1 (en) * | 2022-12-29 | 2024-07-04 | Innopeak Technology, Inc. | Methods and systems for image processing with eye gaze redirection |
Family Cites Families (106)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5602903A (en) | 1994-09-28 | 1997-02-11 | Us West Technologies, Inc. | Positioning system and method |
US6327381B1 (en) | 1994-12-29 | 2001-12-04 | Worldscape, Llc | Image transformation and synthesis methods |
US5850352A (en) | 1995-03-31 | 1998-12-15 | The Regents Of The University Of California | Immersive video, including video hypermosaicing to generate from multiple video views of a scene a three-dimensional video mosaic from which diverse virtual video scene images are synthesized, including panoramic, scene interactive and stereoscopic images |
JP3461980B2 (en) | 1995-08-25 | 2003-10-27 | Toshiba Corporation | High-speed drawing method and apparatus |
US6163337A (en) | 1996-04-05 | 2000-12-19 | Matsushita Electric Industrial Co., Ltd. | Multi-view point image transmission method and multi-view point image display method |
US6064771A (en) | 1997-06-23 | 2000-05-16 | Real-Time Geometry Corp. | System and method for asynchronous, adaptive moving picture compression, and decompression |
US6072496A (en) | 1998-06-08 | 2000-06-06 | Microsoft Corporation | Method and system for capturing and representing 3D geometry, color and shading of facial expressions and other animated objects |
US6226003B1 (en) | 1998-08-11 | 2001-05-01 | Silicon Graphics, Inc. | Method for rendering silhouette and true edges of 3-D line drawings with occlusion |
US6556199B1 (en) | 1999-08-11 | 2003-04-29 | Advanced Research And Technology Institute | Method and apparatus for fast voxelization of volumetric models |
US6509902B1 (en) | 2000-02-28 | 2003-01-21 | Mitsubishi Electric Research Laboratories, Inc. | Texture filtering for surface elements |
US7522186B2 (en) | 2000-03-07 | 2009-04-21 | L-3 Communications Corporation | Method and apparatus for providing immersive surveillance |
US6968299B1 (en) | 2000-04-14 | 2005-11-22 | International Business Machines Corporation | Method and apparatus for reconstructing a surface using a ball-pivoting algorithm |
US6750873B1 (en) | 2000-06-27 | 2004-06-15 | International Business Machines Corporation | High quality texture reconstruction from multiple scans |
US7538764B2 (en) | 2001-01-05 | 2009-05-26 | Interuniversitair Micro-Elektronica Centrum (Imec) | System and method to obtain surface structures of multi-dimensional objects, and to represent those surface structures for animation, transmission and display |
US6919906B2 (en) | 2001-05-08 | 2005-07-19 | Microsoft Corporation | Discontinuity edge overdraw |
GB2378337B (en) | 2001-06-11 | 2005-04-13 | Canon Kk | 3D Computer modelling apparatus |
US7909696B2 (en) | 2001-08-09 | 2011-03-22 | Igt | Game interaction in 3-D gaming environments |
US6990681B2 (en) | 2001-08-09 | 2006-01-24 | Sony Corporation | Enhancing broadcast of an event with synthetic scene using a depth map |
US6781591B2 (en) | 2001-08-15 | 2004-08-24 | Mitsubishi Electric Research Laboratories, Inc. | Blending multiple images using local and global information |
US7023432B2 (en) | 2001-09-24 | 2006-04-04 | Geomagic, Inc. | Methods, apparatus and computer program products that reconstruct surfaces from data point sets |
US7096428B2 (en) | 2001-09-28 | 2006-08-22 | Fuji Xerox Co., Ltd. | Systems and methods for providing a spatially indexed panoramic video |
EP1473678A4 (en) | 2002-02-06 | 2008-02-13 | Digital Process Ltd | Three-dimensional shape displaying program, three-dimensional shape displaying method, and three-dimensional shape displaying device |
US20040217956A1 (en) | 2002-02-28 | 2004-11-04 | Paul Besl | Method and system for processing, compressing, streaming, and interactive rendering of 3D color image data |
US7515173B2 (en) | 2002-05-23 | 2009-04-07 | Microsoft Corporation | Head pose tracking system |
US7030875B2 (en) | 2002-09-04 | 2006-04-18 | Honda Motor Company Ltd. | Environmental reasoning using geometric data structure |
US7106358B2 (en) | 2002-12-30 | 2006-09-12 | Motorola, Inc. | Method, system and apparatus for telepresence communications |
US20050017969A1 (en) | 2003-05-27 | 2005-01-27 | Pradeep Sen | Computer graphics rendering using boundary information |
US7480401B2 (en) | 2003-06-23 | 2009-01-20 | Siemens Medical Solutions Usa, Inc. | Method for local surface smoothing with application to chest wall nodule segmentation in lung CT data |
US7321669B2 (en) * | 2003-07-10 | 2008-01-22 | Sarnoff Corporation | Method and apparatus for refining target position and size estimates using image and depth data |
GB2405775B (en) | 2003-09-05 | 2008-04-02 | Canon Europa Nv | 3D computer surface model generation |
US7184052B2 (en) | 2004-06-18 | 2007-02-27 | Microsoft Corporation | Real-time texture rendering using generalized displacement maps |
US7292257B2 (en) | 2004-06-28 | 2007-11-06 | Microsoft Corporation | Interactive viewpoint video system and process |
US20060023782A1 (en) | 2004-07-27 | 2006-02-02 | Microsoft Corporation | System and method for off-line multi-view video compression |
US7671893B2 (en) | 2004-07-27 | 2010-03-02 | Microsoft Corp. | System and method for interactive multi-view video |
US7142209B2 (en) | 2004-08-03 | 2006-11-28 | Microsoft Corporation | Real-time rendering system and process for interactive viewpoint video that was generated using overlapping images of a scene captured from viewpoints forming a grid |
US7561620B2 (en) | 2004-08-03 | 2009-07-14 | Microsoft Corporation | System and process for compressing and decompressing multiple, layered, video streams employing spatial and temporal encoding |
US7221366B2 (en) | 2004-08-03 | 2007-05-22 | Microsoft Corporation | Real-time rendering system and process for interactive viewpoint video |
US8477173B2 (en) | 2004-10-15 | 2013-07-02 | Lifesize Communications, Inc. | High definition videoconferencing system |
JPWO2006062199A1 (en) | 2004-12-10 | 2008-06-12 | Kyoto University | Three-dimensional image data compression apparatus, method, program, and recording medium |
WO2006084385A1 (en) | 2005-02-11 | 2006-08-17 | Macdonald Dettwiler & Associates Inc. | 3d imaging system |
DE102005023195A1 (en) | 2005-05-19 | 2006-11-23 | Siemens Ag | Method for expanding the display area of a volume recording of an object area |
US8228994B2 (en) | 2005-05-20 | 2012-07-24 | Microsoft Corporation | Multi-view video coding based on temporal and view decomposition |
WO2007005752A2 (en) | 2005-07-01 | 2007-01-11 | Dennis Christensen | Visual and aural perspective management for enhanced interactive video telepresence |
JP4595733B2 (en) | 2005-08-02 | 2010-12-08 | Casio Computer Co., Ltd. | Image processing device |
US7551232B2 (en) | 2005-11-14 | 2009-06-23 | Lsi Corporation | Noise adaptive 3D composite noise reduction |
US7623127B2 (en) | 2005-11-29 | 2009-11-24 | Siemens Medical Solutions Usa, Inc. | Method and apparatus for discrete mesh filleting and rounding through ball pivoting |
US7577491B2 (en) | 2005-11-30 | 2009-08-18 | General Electric Company | System and method for extracting parameters of a cutting tool |
KR100810268B1 (en) | 2006-04-06 | 2008-03-06 | Samsung Electronics Co., Ltd. | Implementation method for color weaknesses in mobile display devices |
US7778491B2 (en) | 2006-04-10 | 2010-08-17 | Microsoft Corporation | Oblique image stitching |
US7679639B2 (en) | 2006-04-20 | 2010-03-16 | Cisco Technology, Inc. | System and method for enhancing eye gaze in a telepresence system |
EP1862969A1 (en) | 2006-06-02 | 2007-12-05 | Eidgenössische Technische Hochschule Zürich | Method and system for generating a representation of a dynamically changing 3D scene |
US20080043024A1 (en) | 2006-06-26 | 2008-02-21 | Siemens Corporate Research, Inc. | Method for reconstructing an object subject to a cone beam using a graphic processor unit (gpu) |
USD610105S1 (en) | 2006-07-10 | 2010-02-16 | Cisco Technology, Inc. | Telepresence system |
US20080095465A1 (en) | 2006-10-18 | 2008-04-24 | General Electric Company | Image registration system and method |
US8213711B2 (en) | 2007-04-03 | 2012-07-03 | Her Majesty The Queen In Right Of Canada As Represented By The Minister Of Industry, Through The Communications Research Centre Canada | Method and graphical user interface for modifying depth maps |
GB0708676D0 (en) | 2007-05-04 | 2007-06-13 | Imec Inter Uni Micro Electr | A Method for real-time/on-line performing of multi view multimedia applications |
US8253770B2 (en) | 2007-05-31 | 2012-08-28 | Eastman Kodak Company | Residential video communication system |
US8063901B2 (en) | 2007-06-19 | 2011-11-22 | Siemens Aktiengesellschaft | Method and apparatus for efficient client-server visualization of multi-dimensional data |
JP4947593B2 (en) | 2007-07-31 | 2012-06-06 | KDDI Corporation | Apparatus and program for generating free viewpoint image by local region segmentation |
US8223192B2 (en) | 2007-10-31 | 2012-07-17 | Technion Research And Development Foundation Ltd. | Free viewpoint video |
US8451265B2 (en) | 2007-11-16 | 2013-05-28 | Sportvision, Inc. | Virtual viewpoint animation |
US8160345B2 (en) | 2008-04-30 | 2012-04-17 | Otismed Corporation | System and method for image segmentation in generating computer models of a joint to undergo arthroplasty |
CN102016877B (en) * | 2008-02-27 | 2014-12-10 | Sony Computer Entertainment America LLC | Methods for capturing depth data of a scene and applying computer actions |
TWI357582B (en) | 2008-04-18 | 2012-02-01 | Univ Nat Taiwan | Image tracking system and method thereof |
US8442355B2 (en) | 2008-05-23 | 2013-05-14 | Samsung Electronics Co., Ltd. | System and method for generating a multi-dimensional image |
US7840638B2 (en) | 2008-06-27 | 2010-11-23 | Microsoft Corporation | Participant positioning in multimedia conferencing |
US8106924B2 (en) | 2008-07-31 | 2012-01-31 | Stmicroelectronics S.R.L. | Method and system for video rendering, computer program product therefor |
WO2010023580A1 (en) | 2008-08-29 | 2010-03-04 | Koninklijke Philips Electronics, N.V. | Dynamic transfer of three-dimensional image data |
US20110169824A1 (en) | 2008-09-29 | 2011-07-14 | Nobutoshi Fujinami | 3d image processing device and method for reducing noise in 3d image processing device |
EP2327059B1 (en) | 2008-10-02 | 2014-08-27 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Intermediate view synthesis and multi-view data signal extraction |
US8200041B2 (en) | 2008-12-18 | 2012-06-12 | Intel Corporation | Hardware accelerated silhouette detection |
US8436852B2 (en) | 2009-02-09 | 2013-05-07 | Microsoft Corporation | Image editing consistent with scene geometry |
US8477175B2 (en) | 2009-03-09 | 2013-07-02 | Cisco Technology, Inc. | System and method for providing three dimensional imaging in a network environment |
JP5222205B2 (en) | 2009-04-03 | 2013-06-26 | KDDI Corporation | Image processing apparatus, method, and program |
US20100259595A1 (en) | 2009-04-10 | 2010-10-14 | Nokia Corporation | Methods and Apparatuses for Efficient Streaming of Free View Point Video |
US8719309B2 (en) | 2009-04-14 | 2014-05-06 | Apple Inc. | Method and apparatus for media data transmission |
US8665259B2 (en) | 2009-04-16 | 2014-03-04 | Autodesk, Inc. | Multiscale three-dimensional navigation |
US8755569B2 (en) | 2009-05-29 | 2014-06-17 | University Of Central Florida Research Foundation, Inc. | Methods for recognizing pose and action of articulated objects with collection of planes in motion |
US8629866B2 (en) | 2009-06-18 | 2014-01-14 | International Business Machines Corporation | Computer method and apparatus providing interactive control and remote identity through in-world proxy |
US9648346B2 (en) | 2009-06-25 | 2017-05-09 | Microsoft Technology Licensing, Llc | Multi-view video compression and streaming based on viewpoints of remote viewer |
KR101070591B1 (en) * | 2009-06-25 | 2011-10-06 | SiliconFile Technologies Inc. | Distance measuring apparatus having dual stereo camera |
US8194149B2 (en) | 2009-06-30 | 2012-06-05 | Cisco Technology, Inc. | Infrared-aided depth estimation |
US8633940B2 (en) | 2009-08-04 | 2014-01-21 | Broadcom Corporation | Method and system for texture compression in a system having an AVC decoder and a 3D engine |
US8908958B2 (en) | 2009-09-03 | 2014-12-09 | Ron Kimmel | Devices and methods of generating three dimensional (3D) colored models |
US8284237B2 (en) | 2009-09-09 | 2012-10-09 | Nokia Corporation | Rendering multiview content in a 3D video system |
US8441482B2 (en) | 2009-09-21 | 2013-05-14 | Caustic Graphics, Inc. | Systems and methods for self-intersection avoidance in ray tracing |
US20110084983A1 (en) | 2009-09-29 | 2011-04-14 | Wavelength & Resonance LLC | Systems and Methods for Interaction With a Virtual Environment |
US9154730B2 (en) | 2009-10-16 | 2015-10-06 | Hewlett-Packard Development Company, L.P. | System and method for determining the active talkers in a video conference |
US8537200B2 (en) | 2009-10-23 | 2013-09-17 | Qualcomm Incorporated | Depth map generation techniques for conversion of 2D video data to 3D video data |
CN102792699A (en) | 2009-11-23 | 2012-11-21 | General Instrument Corporation | Depth coding as an additional channel to video sequence |
US8487977B2 (en) | 2010-01-26 | 2013-07-16 | Polycom, Inc. | Method and apparatus to virtualize people with 3D effect into a remote room on a telepresence call for true in person experience |
US20110211749A1 (en) | 2010-02-28 | 2011-09-01 | Kar Han Tan | System And Method For Processing Video Using Depth Sensor Information |
US8898567B2 (en) | 2010-04-09 | 2014-11-25 | Nokia Corporation | Method and apparatus for generating a virtual interactive workspace |
EP2383696A1 (en) | 2010-04-30 | 2011-11-02 | LiberoVision AG | Method for estimating a pose of an articulated object model |
US20110304619A1 (en) | 2010-06-10 | 2011-12-15 | Autodesk, Inc. | Primitive quadric surface extraction from unorganized point cloud data |
KR20120011653A (en) * | 2010-07-29 | 2012-02-08 | Samsung Electronics Co., Ltd. | Image processing apparatus and method |
US8659597B2 (en) | 2010-09-27 | 2014-02-25 | Intel Corporation | Multi-view ray tracing using edge detection and shader reuse |
US8787459B2 (en) | 2010-11-09 | 2014-07-22 | Sony Computer Entertainment Inc. | Video coding methods and apparatus |
US9123115B2 (en) * | 2010-11-23 | 2015-09-01 | Qualcomm Incorporated | Depth estimation based on global motion and optical flow |
US8867823B2 (en) * | 2010-12-03 | 2014-10-21 | National University Corporation Nagoya University | Virtual viewpoint image synthesizing method and virtual viewpoint image synthesizing system |
US8156239B1 (en) | 2011-03-09 | 2012-04-10 | Metropcs Wireless, Inc. | Adaptive multimedia renderer |
EP2707834B1 (en) | 2011-05-13 | 2020-06-24 | Vizrt Ag | Silhouette-based pose estimation |
US8867886B2 (en) | 2011-08-08 | 2014-10-21 | Roy Feinson | Surround video playback |
WO2013049388A1 (en) | 2011-09-29 | 2013-04-04 | Dolby Laboratories Licensing Corporation | Representation and coding of multi-view images using tapestry encoding |
US9830743B2 (en) | 2012-04-03 | 2017-11-28 | Autodesk, Inc. | Volume-preserving smoothing brush |
US9058706B2 (en) | 2012-04-30 | 2015-06-16 | Convoy Technologies Llc | Motor vehicle camera and monitoring system |
2012
- 2012-08-03 US US13/566,877 patent/US9846960B2/en active Active
- 2012-08-17 US US13/588,917 patent/US20130321586A1/en not_active Abandoned
- 2012-08-29 US US13/598,536 patent/US20130321593A1/en not_active Abandoned
- 2012-08-30 US US13/599,170 patent/US20130321396A1/en not_active Abandoned
- 2012-08-30 US US13/599,263 patent/US8917270B2/en active Active
- 2012-08-30 US US13/598,747 patent/US20130321575A1/en not_active Abandoned
- 2012-08-30 US US13/599,436 patent/US9251623B2/en active Active
- 2012-08-30 US US13/599,678 patent/US20130321566A1/en not_active Abandoned
- 2012-09-13 US US13/614,852 patent/US9256980B2/en active Active
2013
- 2013-03-08 US US13/790,158 patent/US20130321413A1/en not_active Abandoned
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5926400A (en) * | 1996-11-21 | 1999-07-20 | Intel Corporation | Apparatus and method for determining the intensity of a sound in a virtual world |
US8411126B2 (en) * | 2010-06-24 | 2013-04-02 | Hewlett-Packard Development Company, L.P. | Methods and systems for close proximity spatial audio rendering |
US20120155680A1 (en) * | 2010-12-17 | 2012-06-21 | Microsoft Corporation | Virtual audio environment for multidimensional conferencing |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9888333B2 (en) * | 2013-11-11 | 2018-02-06 | Google Technology Holdings LLC | Three-dimensional audio rendering techniques |
US10567185B2 (en) * | 2015-02-03 | 2020-02-18 | Dolby Laboratories Licensing Corporation | Post-conference playback system having higher perceived quality than originally heard in the conference |
CN108881784A (en) * | 2017-05-12 | 2018-11-23 | Tencent Technology (Shenzhen) Co., Ltd. | Virtual scene implementation method, device, terminal and server |
US11081127B2 (en) | 2018-01-18 | 2021-08-03 | Samsung Electronics Co., Ltd. | Electronic device and control method therefor |
US20200145753A1 (en) * | 2018-11-01 | 2020-05-07 | Sennheiser Electronic Gmbh & Co. Kg | Conference System with a Microphone Array System and a Method of Speech Acquisition In a Conference System |
US10972835B2 (en) * | 2018-11-01 | 2021-04-06 | Sennheiser Electronic Gmbh & Co. Kg | Conference system with a microphone array system and a method of speech acquisition in a conference system |
CN109618122A (en) * | 2018-12-07 | 2019-04-12 | Hefei Wanhu Network Technology Co., Ltd. | Virtual office conference system |
WO2021077090A1 (en) * | 2019-10-18 | 2021-04-22 | Msg Entertainment Group, Llc. | Modifying audio according to visual images of a remote venue |
US11202162B2 (en) | 2019-10-18 | 2021-12-14 | Msg Entertainment Group, Llc | Synthesizing audio of a venue |
US11812251B2 (en) | 2019-10-18 | 2023-11-07 | Msg Entertainment Group, Llc | Synthesizing audio of a venue |
US12058510B2 (en) * | 2019-10-18 | 2024-08-06 | Sphere Entertainment Group, Llc | Mapping audio to visual images on a display device having a curved screen |
US12101623B2 (en) | 2019-10-18 | 2024-09-24 | Sphere Entertainment Group, Llc | Synthesizing audio of a venue |
US20240187553A1 (en) * | 2020-04-06 | 2024-06-06 | Eingot Llc | Integration of remote audio into a performance venue |
US12262145B2 (en) * | 2020-04-06 | 2025-03-25 | Eingot Llc | Integration of remote audio into a performance venue |
Also Published As
Publication number | Publication date |
---|---|
US20130321418A1 (en) | 2013-12-05 |
US8917270B2 (en) | 2014-12-23 |
US20130321589A1 (en) | 2013-12-05 |
US20130321396A1 (en) | 2013-12-05 |
US9846960B2 (en) | 2017-12-19 |
US20130321586A1 (en) | 2013-12-05 |
US20130321590A1 (en) | 2013-12-05 |
US20130321410A1 (en) | 2013-12-05 |
US9256980B2 (en) | 2016-02-09 |
US20130321575A1 (en) | 2013-12-05 |
US20130321593A1 (en) | 2013-12-05 |
US20130321413A1 (en) | 2013-12-05 |
US9251623B2 (en) | 2016-02-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20130321566A1 (en) | Audio source positioning using a camera | |
US12205228B2 (en) | Re-creation of virtual environment through a video call | |
AU2022204210B2 (en) | Virtual and real object recording in mixed reality device | |
CN112205005B (en) | Adapting acoustic rendering to image-based objects | |
US10535181B2 (en) | Virtual viewpoint for a participant in an online communication | |
JP4059513B2 (en) | Method and system for communicating gaze in an immersive virtual environment | |
JP6285941B2 (en) | Controlled 3D communication endpoint | |
US20190026945A1 (en) | Real-time immersive mediated reality experiences | |
GB2543913A (en) | Virtual conference room | |
CA2924156A1 (en) | Method, system and apparatus for capture-based immersive telepresence in virtual environment | |
US12118667B2 (en) | Methods and systems for unified rendering of light and sound content for a simulated 3D environment | |
US11776227B1 (en) | Avatar background alteration | |
KR20210056414A (en) | System for controlling audio-enabled connected devices in mixed reality environments | |
CN112162638B (en) | Information processing method and server in Virtual Reality (VR) viewing | |
Thery et al. | Impact of the visual rendering system on subjective auralization assessment in VR | |
Wu et al. | Immersive 3D communication | |
Courgeon et al. | Life-Sized Audiovisual Spatial Social Scenes with Multiple Characters: MARC & SMART-I² | |
Ohkawara | Relightable and Interactive Portraits: Toward Communication Systems for Coexistence in Remote Society | |
WO2024009653A1 (en) | Information processing device, information processing method, and information processing system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: MICROSOFT CORPORATION, WASHINGTON; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: SIMONNET, GUILLAUME; REEL/FRAME: 028878/0294; Effective date: 20120827 |
| AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: MICROSOFT CORPORATION; REEL/FRAME: 034544/0541; Effective date: 20141014 |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE |