This is another proposal for ways to import video frames into WebGPU. With the various methods outlined in #1380 and #1154, that's quite a lot of different proposals covering a lot of the design space.
The difficulty with video frames is that inside browsers they come in many forms: RGB8 data, YUV 420, YUV 422, RGB16Float and many more formats. There's also per-frame metadata like the colorspace, clipping rectangle, space transform, etc. The other challenge is that operations on video happen on every frame of the video, so small inefficiencies like additional copies and conversions can add up.
This proposal is similar in its goal to #1380 in that it theoretically allows videos in whatever format to be used with zero copies in WebGPU. Instead of exposing all of the complexity of video frames to the application, it encapsulates it all in an opaque GPUVideoTexture object. Internally in browsers the GPUVideoTexture will most likely be a collection of GPUTextureViews, one per plane of the source texture, and a uniform GPUBuffer that contains all the metadata for the frame. This is the same concept as GL_OES_EGL_image_external, in which an external image in any format is exposed as an "opaque sampler".
Show me the WebIDL!
interface GPUVideoTexture : GPUObjectBase {
    undefined destroy();
};
// A GPUVideoTexture is imported by calling GPUDevice.importVideoFrame
// with either an HTMLVideoElement or a VideoFrame in the descriptor
// (or some other video-like object, like a WebRTC stream?)
dictionary GPUVideoTextureDescriptor : GPUObjectDescriptorBase {
    required (HTMLVideoElement or VideoFrame) source;
};
partial interface GPUDevice {
    GPUVideoTexture importVideoFrame(GPUVideoTextureDescriptor descriptor);
};
// BindGroupLayouts have a new "video texture" binding type.
dictionary GPUVideoTextureBindingLayout {
    // nothing for now.
};
partial dictionary GPUBindGroupLayoutEntry {
    GPUVideoTextureBindingLayout videoTexture;
};
// A GPUVideoTexture is a new type of GPUBindingResource.
typedef (... or GPUVideoTexture) GPUBindingResource;
Using a GPUVideoTexture on the JS side works as you'd expect:
const videoTexture = device.importVideoFrame({
    source: myVideoElement
});
const layout = device.createBindGroupLayout({
    entries: [{
        binding: 0,
        visibility: GPUShaderStage.FRAGMENT,
        videoTexture: {}
    }]
});
const bg = device.createBindGroup({
    layout,
    entries: [{
        binding: 0,
        resource: videoTexture,
    }]
});
// Create a pipeline using `layout`, render using it and `bg`.
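Since imports are expected to be cheap, a new GPUVideoTexture (and hence a new bind group) would presumably be created every frame, while the layout and pipeline are reused. A minimal sketch of that loop, assuming HTMLVideoElement.requestVideoFrameCallback is available; the render-pass encoding is elided:

function onFrame() {
    // Re-import the current frame; this is expected to be zero-copy.
    const videoTexture = device.importVideoFrame({ source: myVideoElement });
    // The bind group must be recreated since it references this frame's texture.
    const bg = device.createBindGroup({
        layout,
        entries: [{ binding: 0, resource: videoTexture }],
    });
    // ... encode a render pass with the reusable pipeline and this frame's `bg`,
    // then submit it ...
    myVideoElement.requestVideoFrameCallback(onFrame);
}
myVideoElement.requestVideoFrameCallback(onFrame);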
Show me the WGSL!
// Video textures are a new interface type, they act a bit like texture_2d<f32>:
// - You define them as an interface variable
[[group(0), binding(0)]] var<uniform> video : video_texture;
[[group(0), binding(1)]] var mySampler : sampler;
// - You sample them with a sampler, but that's the only builtin that accepts them!
fn loadVideo(coords : vec2<f32>) -> vec4<f32> {
    return textureSample(video, mySampler, coords);
}
How does it work?
Internally a GPUVideoTexture is a collection of 2-3 texture views, one for each plane, with unused views set to a dummy texture filled with zeroes (for example if the frame is already RGB), and one uniform buffer that contains all the metadata needed to select code paths in the shader (see below). Hence when used in a GPUBindGroupLayout, a video texture binding uses 3 slots for sampled textures and 1 slot for a uniform buffer.
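In other words, the browser could lower a video texture entry into ordinary bindings. Expressed with the public API for illustration (the exact expansion is an internal detail, not something this proposal exposes):

// Roughly what the browser might synthesize internally for one
// `videoTexture: {}` entry at binding 0 (illustrative only):
const expandedLayout = device.createBindGroupLayout({
    entries: [
        { binding: 0, visibility: GPUShaderStage.FRAGMENT, texture: {} }, // plane0
        { binding: 1, visibility: GPUShaderStage.FRAGMENT, texture: {} }, // plane1 (dummy if unused)
        { binding: 2, visibility: GPUShaderStage.FRAGMENT, texture: {} }, // plane2 (dummy if unused)
        { binding: 3, visibility: GPUShaderStage.FRAGMENT, buffer: { type: 'uniform' } }, // VideoFrameParams
    ]
});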
On the shader side, the following type of code expansion is performed:
// [[group(0), binding(0)]] var<uniform> video : video_texture;
// becomes something like:
[[block]] struct VideoFrameParams {
    // stuff
};
// The fractional binding numbers below are illustrative; the browser
// allocates real slots internally.
[[group(0), binding(0.1)]] var plane0 : texture_2d<f32>;
[[group(0), binding(0.2)]] var plane1 : texture_2d<f32>;
[[group(0), binding(0.3)]] var plane2 : texture_2d<f32>;
[[group(0), binding(0.4)]] var<uniform> frameParams : VideoFrameParams;

// textureSample of video_texture gets shimmed:
fn videoTextureSample(
    t0 : texture_2d<f32>, t1 : texture_2d<f32>, t2 : texture_2d<f32>,
    params : VideoFrameParams, s : sampler, coords : vec2<f32>) -> vec4<f32> {
    var transformedCoords = performTransformAndClipping(coords, params);
    var rawColor = loadPerPlaneDataFromCorrectPlanes(t0, t1, t2, s, params, transformedCoords);
    return doColorTransform(rawColor, params);
}
fn loadVideo(coords : vec2<f32>) -> vec4<f32> {
    // return textureSample(video, mySampler, coords);
    // becomes something like:
    return videoTextureSample(plane0, plane1, plane2, frameParams, mySampler, coords);
}
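The browser would also fill the VideoFrameParams uniform buffer at import time. Here's a hedged sketch of what that could look like, with a completely made-up params layout (a YUV-to-RGB matrix plus a normalized clip rectangle; real browsers would pick their own layout):

// Illustrative only: the actual VideoFrameParams layout is up to the browser.
const paramsData = new Float32Array([
    // yuvToRgb: 3 columns of 4 floats (BT.709 full-range values shown)
    1.0,     1.0,     1.0,    0.0, // multiplies Y
    0.0,    -0.1873,  1.8556, 0.0, // multiplies U
    1.5748, -0.4681,  0.0,    0.0, // multiplies V
    // clipRect: x, y, width, height in normalized coordinates
    0.0, 0.0, 1.0, 1.0,
]);
const paramsBuffer = device.createBuffer({
    size: paramsData.byteLength,
    usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
});
device.queue.writeBuffer(paramsBuffer, 0, paramsData);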
Open questions
Should the application be able to create a GPUVideoTexture from a GPUTexture? This would allow them to test the code paths using GPUVideoTexture without having to instantiate an HTMLVideoElement with an actual video file. If we decide the answer is yes, we could have a GPUDevice.createVideoTexture that takes an rgba8unorm or rgba16float texture and makes a single-planar GPUVideoTexture.
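If the answer is yes, testing could look something like this (createVideoTexture is the hypothetical method described above):

// Hypothetical API for testing, per the open question above:
const fakeFrame = device.createTexture({
    size: [1920, 1080],
    format: 'rgba8unorm',
    usage: GPUTextureUsage.TEXTURE_BINDING | GPUTextureUsage.COPY_DST,
});
// ... upload test pixels into fakeFrame ...
const videoTexture = device.createVideoTexture({ source: fakeFrame }); // hypothetical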
What's the lifetime of imported GPUVideoTextures? Accelerated video decoding often likes to reuse the same decode buffers for different frames. This means that if we do nothing, we could have a race between the application reading the GPUVideoTexture and the decoder. A solution would be to prevent the decoder from making progress, which could work for VideoFrame but would be really weird for an HTMLVideoElement displayed on the page. Instead @kainino0x suggested that GPUVideoTextures imported from HTMLVideoElements could only be valid until the end of the microtask / end of the rAF / end of the requestVideoFrameCallback callback. This is an appealing solution.
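Under that suggestion, the validity window would look something like this (a sketch; encodeFrameWith is a placeholder, and the expiry rule is just the idea above, not specified anywhere):

myVideoElement.requestVideoFrameCallback((now, metadata) => {
    const videoTexture = device.importVideoFrame({ source: myVideoElement });
    // OK: videoTexture is valid for the duration of this callback.
    device.queue.submit([encodeFrameWith(videoTexture)]);
    setTimeout(() => {
        // Too late under the suggested rule: videoTexture has expired so the
        // decoder can reuse its buffers; using it here would be a validation error.
    }, 0);
});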
Your question here.