This is another proposal for ways to import video frames into WebGPU. With the various methods outlined in #1380 and #1154, that's quite a lot of different proposals covering a lot of the design space.
The difficulty with video frames is that inside browsers they come in many forms: RGB8 data, YUV 420, YUV 422, RGB16Float and many more formats. There's also per-frame metadata like the colorspace, clipping rectangle, space transform, etc. The other challenge is that operations on video happen on every frame of the video, so small inefficiencies like additional copies and conversions can add up.
This proposal is similar in its goal to #1380 in that it theoretically allows videos in whatever format to be used with zero copies in WebGPU. Instead of exposing all of the complexity of video frames to the application, it encapsulates it all in an opaque GPUVideoTexture object. Internally in browsers the GPUVideoTexture will most likely be a collection of GPUTextureViews, one per plane of the source texture, and a uniform GPUBuffer that contains all the metadata for the frame. This is the same concept as GL_OES_EGL_image_external, in which an external image in any format is exposed as an "opaque sampler".
Show me the WebIDL!
interface GPUVideoTexture : GPUObjectBase {
    undefined destroy();
};
// A GPUVideoTexture is imported by calling GPUDevice.importVideoFrame
// with either an HTMLVideoElement or a VideoFrame in the descriptor
// (or some other video-like object, like a WebRTC stream?)
dictionary GPUVideoTextureDescriptor : GPUObjectDescriptorBase {
    required (HTMLVideoElement or VideoFrame) source;
};
partial interface GPUDevice {
    GPUVideoTexture importVideoFrame(GPUVideoTextureDescriptor descriptor);
};
// BindGroupLayouts have a new "video texture" binding type.
dictionary GPUVideoTextureBindingLayout {
    // nothing for now.
};
partial dictionary GPUBindGroupLayoutEntry {
    GPUVideoTextureBindingLayout videoTexture;
};
// A GPUVideoTexture is a new type of GPUBindingResource.
typedef (... or GPUVideoTexture) GPUBindingResource;
Using a GPUVideoTexture on the JS side works as you'd expect:
const videoTexture = device.importVideoFrame({
    source: myVideoElement
});
const layout = device.createBindGroupLayout({
    entries: [{
        binding: 0,
        visibility: GPUShaderStage.FRAGMENT,
        videoTexture: {}
    }]
});
const bg = device.createBindGroup({
    layout,
    entries: [{
        binding: 0,
        resource: videoTexture,
    }]
});
// Create a pipeline using `layout`, render using it and `bg`.
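Since imports are expected to be cheap, a new GPUVideoTexture (and hence a new bind group) would presumably be created every frame, while the layout and pipeline are reused. A minimal sketch of that loop, assuming HTMLVideoElement.requestVideoFrameCallback is available; the render-pass encoding is elided:

function onFrame() {
    // Re-import the current frame; this is expected to be zero-copy.
    const videoTexture = device.importVideoFrame({ source: myVideoElement });
    // The bind group must be recreated since it references this frame's texture.
    const bg = device.createBindGroup({
        layout,
        entries: [{ binding: 0, resource: videoTexture }],
    });
    // ... encode a render pass with the reusable pipeline and this frame's `bg`,
    // then submit it ...
    myVideoElement.requestVideoFrameCallback(onFrame);
}
myVideoElement.requestVideoFrameCallback(onFrame);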
Show me the WGSL!
// Video textures are a new interface type, they act a bit like texture_2d<f32>:
// - You define them as an interface variable
[[group(0), binding(0)]] var<uniform> video : video_texture;
[[group(0), binding(1)]] var mySampler : sampler;
// - You sample them with a sampler, but that's the only builtin that accepts them!
fn loadVideo(coords : vec2<f32>) -> vec4<f32> {
    return textureSample(video, mySampler, coords);
}
How does it work?
Internally a GPUVideoTexture is a collection of 2-3 texture views, one for each plane, with unused views set to a dummy texture filled with zeroes (for example if the frame is already RGB), and one uniform buffer that contains all the metadata needed to select code paths in the shader (see below). Hence when used in a GPUBindGroupLayout, a video texture binding uses 3 slots for sampled textures and 1 slot for a uniform buffer.
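In other words, the browser could lower a video texture entry into ordinary bindings. Expressed with the public API for illustration (the exact expansion is an internal detail, not something this proposal exposes):

// Roughly what the browser might synthesize internally for one
// `videoTexture: {}` entry at binding 0 (illustrative only):
const expandedLayout = device.createBindGroupLayout({
    entries: [
        { binding: 0, visibility: GPUShaderStage.FRAGMENT, texture: {} }, // plane0
        { binding: 1, visibility: GPUShaderStage.FRAGMENT, texture: {} }, // plane1 (dummy if unused)
        { binding: 2, visibility: GPUShaderStage.FRAGMENT, texture: {} }, // plane2 (dummy if unused)
        { binding: 3, visibility: GPUShaderStage.FRAGMENT, buffer: { type: 'uniform' } }, // VideoFrameParams
    ]
});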
On the shader side, the following type of code expansion is performed:
// [[group(0), binding(0)]] var<uniform> video : video_texture;
// becomes something like:
[[block]] struct VideoFrameParams {
    // stuff
};
// The fractional binding numbers below are illustrative; the browser
// allocates real slots internally.
[[group(0), binding(0.1)]] var plane0 : texture_2d<f32>;
[[group(0), binding(0.2)]] var plane1 : texture_2d<f32>;
[[group(0), binding(0.3)]] var plane2 : texture_2d<f32>;
[[group(0), binding(0.4)]] var<uniform> frameParams : VideoFrameParams;

// textureSample of video_texture gets shimmed:
fn videoTextureSample(
    t0 : texture_2d<f32>, t1 : texture_2d<f32>, t2 : texture_2d<f32>,
    params : VideoFrameParams, s : sampler, coords : vec2<f32>) -> vec4<f32> {
    var transformedCoords = performTransformAndClipping(coords, params);
    var rawColor = loadPerPlaneDataFromCorrectPlanes(t0, t1, t2, s, params, transformedCoords);
    return doColorTransform(rawColor, params);
}
fn loadVideo(coords : vec2<f32>) -> vec4<f32> {
    // return textureSample(video, mySampler, coords);
    // becomes something like:
    return videoTextureSample(plane0, plane1, plane2, frameParams, mySampler, coords);
}
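The browser would also fill the VideoFrameParams uniform buffer at import time. Here's a hedged sketch of what that could look like, with a completely made-up params layout (a YUV-to-RGB matrix plus a normalized clip rectangle; real browsers would pick their own layout):

// Illustrative only: the actual VideoFrameParams layout is up to the browser.
const paramsData = new Float32Array([
    // yuvToRgb: 3 columns of 4 floats (BT.709 full-range values shown)
    1.0,     1.0,     1.0,    0.0, // multiplies Y
    0.0,    -0.1873,  1.8556, 0.0, // multiplies U
    1.5748, -0.4681,  0.0,    0.0, // multiplies V
    // clipRect: x, y, width, height in normalized coordinates
    0.0, 0.0, 1.0, 1.0,
]);
const paramsBuffer = device.createBuffer({
    size: paramsData.byteLength,
    usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
});
device.queue.writeBuffer(paramsBuffer, 0, paramsData);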
Open questions
Should the application be able to create a GPUVideoTexture from a GPUTexture? This would allow them to test the code paths using GPUVideoTexture without having to instantiate an HTMLVideoElement with an actual video file. If we decide the answer is yes, we could have a GPUDevice.createVideoTexture that takes an rgba8unorm or rgba16float texture and makes a single-planar GPUVideoTexture.
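If the answer is yes, testing could look something like this (createVideoTexture is the hypothetical method described above):

// Hypothetical API for testing, per the open question above:
const fakeFrame = device.createTexture({
    size: [1920, 1080],
    format: 'rgba8unorm',
    usage: GPUTextureUsage.TEXTURE_BINDING | GPUTextureUsage.COPY_DST,
});
// ... upload test pixels into fakeFrame ...
const videoTexture = device.createVideoTexture({ source: fakeFrame }); // hypothetical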
What's the lifetime of imported GPUVideoTextures? Accelerated video decoding often likes to reuse the same decode buffers for different frames. This means that if we do nothing, we could have a race between the application reading the GPUVideoTexture and the decoder. A solution would be to prevent the decoder from making progress, which could work for VideoFrame but would be really weird for an HTMLVideoElement displayed on the page. Instead @kainino0x suggested that GPUVideoTextures imported from HTMLVideoElements could only be valid until the end of the microtask / end of the rAF / end of the requestVideoFrameCallback callback. This is an appealing solution.
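Under that suggestion, the validity window would look something like this (a sketch; encodeFrameWith is a placeholder, and the expiry rule is just the idea above, not specified anywhere):

myVideoElement.requestVideoFrameCallback((now, metadata) => {
    const videoTexture = device.importVideoFrame({ source: myVideoElement });
    // OK: videoTexture is valid for the duration of this callback.
    device.queue.submit([encodeFrameWith(videoTexture)]);
    setTimeout(() => {
        // Too late under the suggested rule: videoTexture has expired so the
        // decoder can reuse its buffers; using it here would be a validation error.
    }, 0);
});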
Your question here.