This is based on #1154 and Kangz@'s inputs, and focuses on uploading VideoFrame from WebCodecs.
Rationale
An important class of applications that could adopt WebGPU when it ships is Web applications that handle video. These applications increasingly need to manipulate video, both to add effects and to extract data from it through machine learning. An example is background replacement in Zoom video calls, which uses image processing to detect the background and then composites the video with a replacement image.
Unlike HTMLVideoElement, the upcoming WebCodecs API allows applications to open a video stream and manipulate it at a very fine-grained level. WebCodecs exposes the exact format, colorspace, transform, and, more importantly, the list of planes of a VideoFrame. One can imagine that WebGPU combined with WebCodecs could enable amazing video applications.
Currently, WebCodecs can only interact with WebGPU through CopyImageBitmapToTexture, which uploads video contents to a GPUTexture. But the upload performance is poor because extra copies/transforms (e.g. ImageBitmap creation, plus at least one copy into the destination texture) are needed along the way.
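For reference, the status-quo path looks roughly like the sketch below. It is only an illustration: it relies on VideoFrame.createImageBitmap() from the IDL further down and on GPUQueue.copyImageBitmapToTexture() (whose exact name and signature have varied across spec drafts), and the destination texture format is an arbitrary choice.

// Sketch of the current copy-based path: upload a WebCodecs VideoFrame
// into a GPUTexture by going through an ImageBitmap.
async function uploadViaImageBitmap(device, videoFrame) {
    // Extra work: allocate and convert an ImageBitmap from the decoded frame.
    const bitmap = await videoFrame.createImageBitmap();

    // Destination texture; 'rgba8unorm' is just an illustrative choice.
    const dstTexture = device.createTexture({
        size: [videoFrame.codedWidth, videoFrame.codedHeight, 1],
        format: 'rgba8unorm',
        usage: GPUTextureUsage.COPY_DST | GPUTextureUsage.SAMPLED,
    });

    // At least one more copy to get the pixels into the destination texture.
    device.queue.copyImageBitmapToTexture(
        { imageBitmap: bitmap },
        { texture: dstTexture },
        [videoFrame.codedWidth, videoFrame.codedHeight, 1]
    );
    return dstTexture;
}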
In WebGL, the WEBGL_webcodecs_video_frame extension, which introduces a zero-copy upload path (in the hardware-decoder case) through the WebCodecs VideoFrame, shows better performance than "direct uploading" (the one-copy path) in some cases (e.g. on bandwidth-constrained platforms).
So it is reasonable for WebGPU to offer a similar import API to achieve an efficient upload path when interacting with WebCodecs.
Proposals for Import API
The goals of this API are:
- Provide an upload path into WebGPU for a VideoFrame from WebCodecs with a minimum of copies/transforms.
- Expose the planes in a VideoFrame so that developers are able to handle a single plane (see feedback here and here).
The current IDL of VideoFrame exposes plenty of metadata about the frame content for JS developers:
[Exposed=(Window,Worker)]
interface VideoFrame {
    constructor(ImageBitmap imageBitmap, VideoFrameInit frameInit);
    constructor(PixelFormat pixelFormat, sequence<(Plane or PlaneInit)> planes,
                VideoFrameInit frameInit);

    readonly attribute PixelFormat format;
    readonly attribute FrozenArray<Plane> planes;
    readonly attribute unsigned long codedWidth;
    readonly attribute unsigned long codedHeight;
    readonly attribute unsigned long cropLeft;
    readonly attribute unsigned long cropTop;
    readonly attribute unsigned long cropWidth;
    readonly attribute unsigned long cropHeight;
    readonly attribute unsigned long displayWidth;
    readonly attribute unsigned long displayHeight;
    readonly attribute unsigned long long? duration;
    readonly attribute unsigned long long? timestamp;

    undefined destroy();
    VideoFrame clone();

    Promise<ImageBitmap> createImageBitmap(
        optional ImageBitmapOptions options = {});
};
There are several basic ideas:
Per-Plane Per-Readonly-Texture
This API imports each plane of a VideoFrame as a separate GPUTexture object. The user needs to provide the correct plane size, a compatible texture format, and the expected (read-only) usage to import a plane as a GPUTexture.
webidl:
interface GPUTextureImportOptions {
    // Read-only usages only
    GPUTextureUsage usage;
    // Needs to be compatible with the VideoFrame plane format
    GPUTextureFormat format;
};
interface GPUTextureImportDescriptor : GPUTextureImportOptions {
    (VideoFrame.plane as GPUTextureSource) source;
    GPUExtent3D size; // Can always be known from the VideoFrame
};
interface GPUDevice {
    GPUTexture importTexture(GPUTextureImportDescriptor desc);
};
Using this API to import a VideoFrame into WebGPU could look like:
function frame() {
    const videoFrame = getSomeWebCodecVideoFrame();
    // videoFrame.format tells us this frame is 'I420'.
    // Users can get the default formats by calling getOptimalImportOptions.
    const plane0 = device.importTexture({
        source: videoFrame.planes[0],
        size: [videoFrame.codedWidth, videoFrame.codedHeight],
        usage: GPUTextureUsage.SAMPLED,
        format: "r8unorm" // compatible plane format for "I420"
    });
    const plane1 = device.importTexture({
        source: videoFrame.planes[1],
        size: [videoFrame.codedWidth / 2, videoFrame.codedHeight / 2],
        usage: GPUTextureUsage.SAMPLED,
        format: "rg8unorm" // compatible plane format for "I420"
    });
    const plane0View = plane0.createView();
    const plane1View = plane1.createView();
    // Use shaders to access each plane and do some ops (see the sketch below).
    // Destroy the plane textures to release the VideoFrame.
    plane0.destroy();
    plane1.destroy();
}
The user needs to provide:
- The VideoFrame object
- The correct plane size (can be obtained from the VideoFrame)
- A compatible texture format
- The expected (read-only) usage
The VideoFrame is 'locked' once any of its planes has been imported, and it is 'released' by calling GPUTexture.destroy() on all of the imported plane textures.
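As a rough illustration (not part of the proposal), the imported plane textures could then be consumed like any other sampled textures. The bind group layout, the WGSL-style shader string, and the BT.601 limited-range conversion coefficients below are illustrative assumptions; the actual shading language syntax and colorspace handling depend on the frame's metadata. plane0View and plane1View are the views created in the example above.

// Hypothetical continuation of the example above: sample the imported planes
// and convert to RGB in a fragment shader.
const sampler = device.createSampler({ magFilter: 'linear', minFilter: 'linear' });

// Illustrative shader; assumes plane0 is the luma plane (r8unorm) and plane1
// holds interleaved chroma (rg8unorm), as in the example above.
const yuvToRgbWgsl = `
@group(0) @binding(0) var s : sampler;
@group(0) @binding(1) var planeY : texture_2d<f32>;
@group(0) @binding(2) var planeUV : texture_2d<f32>;

@fragment
fn main(@location(0) uv : vec2<f32>) -> @location(0) vec4<f32> {
    // BT.601 limited-range YUV -> RGB; an assumption for illustration only.
    let y = textureSample(planeY, s, uv).r - 0.0625;
    let c = textureSample(planeUV, s, uv).rg - vec2<f32>(0.5, 0.5);
    let r = 1.164 * y + 1.596 * c.y;
    let g = 1.164 * y - 0.391 * c.x - 0.813 * c.y;
    let b = 1.164 * y + 2.018 * c.x;
    return vec4<f32>(r, g, b, 1.0);
}`;

const bindGroupLayout = device.createBindGroupLayout({
    entries: [
        { binding: 0, visibility: GPUShaderStage.FRAGMENT, sampler: {} },
        { binding: 1, visibility: GPUShaderStage.FRAGMENT, texture: {} },
        { binding: 2, visibility: GPUShaderStage.FRAGMENT, texture: {} },
    ],
});
const bindGroup = device.createBindGroup({
    layout: bindGroupLayout,
    entries: [
        { binding: 0, resource: sampler },
        { binding: 1, resource: plane0View },
        { binding: 2, resource: plane1View },
    ],
});
// ... build a render pipeline from yuvToRgbWgsl, draw a full-screen quad with
// this bind group, submit, and only then destroy plane0/plane1 to release the
// VideoFrame.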
Pros:
- Users have GPUTexture objects that can be used in copy operations and to create texture views with format reinterpretation.
- It is possible to call GPUTexture.destroy() eagerly.
Challenges:
- It introduces implementation complexity to create multiple GPUTexture objects wrapping individual planes of a native multi-planar texture (e.g. subresource state transitions).
- Texture format reinterpretation may need some special restrictions.
Readonly Multiplanar Texture
This is very similar to the per-plane import but instead introduces new multi-planar WebGPU texture formats for the most common video formats (a concept that already exists in native GPU APIs). Users can then create a texture view of a single plane by specifying an aspect for these new formats.
webidl:
enum GPUTextureAspect {
    "all",
    "stencil-only",
    "depth-only",
    "plane0",
    "plane1",
    "plane2",
    "plane3" // The maximum number of planes in a multi-planar format is 4
};
enum GPUTextureFormat {
    // Multi-planar texture formats
    "I420",
    ...
};
interface GPUTextureImportOptions {
    // Read-only usages only
    GPUTextureUsage usage;
    // Needs to be compatible with the VideoPixelFormat, which can be known from VideoFrame.format
    GPUTextureFormat format;
};
interface GPUTextureImportDescriptor : GPUTextureImportOptions {
    (VideoFrame as GPUTextureSource) source;
    GPUExtent3D size; // Can always be known from the VideoFrame
};
interface GPUDevice {
    GPUTexture importTexture(GPUTextureImportDescriptor desc);
};
Using this API to import a VideoFrame into WebGPU could look like:
function frame() {
    const videoFrame = getSomeWebCodecVideoFrame();
    // videoFrame.format tells us this frame is 'I420'.
    const frameTexture = device.importTexture({
        source: videoFrame,
        size: [videoFrame.codedWidth, videoFrame.codedHeight],
        usage: GPUTextureUsage.SAMPLED,
        format: videoFrame.format // assume it is 'I420'
    });
    const plane0View = frameTexture.createView({aspect: 'plane0'});
    const plane1View = frameTexture.createView({aspect: 'plane1'});
    // Use shaders to access each plane view and do some ops (see the sketch below).
    // Destroy the texture to release the VideoFrame.
    frameTexture.destroy();
}
This API imports the VideoFrame as a GPUTexture object with a multi-planar texture format. The user needs to provide:
- The VideoFrame object
- The correct VideoFrame size (can be obtained from the VideoFrame)
- A compatible multi-planar texture format
- The expected (read-only) usage
The VideoFrame is 'locked' once it has been imported, and it is 'released' by calling GPUTexture.destroy() on the imported GPUTexture object.
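As a sketch (with the same assumptions as the per-plane sketch earlier), the per-aspect views plug into the same bind group layout; the only differences are how the views are created and that a single destroy() releases the whole frame. sampler and bindGroupLayout are the illustrative objects from that earlier sketch, and frameTexture is the texture imported in the example above.

// Hypothetical continuation of the example above: bind per-aspect views of the
// single multi-planar texture, then release everything with one destroy().
const bindGroup = device.createBindGroup({
    layout: bindGroupLayout, // same illustrative layout as in the per-plane sketch
    entries: [
        { binding: 0, resource: sampler },
        { binding: 1, resource: frameTexture.createView({ aspect: 'plane0' }) },
        { binding: 2, resource: frameTexture.createView({ aspect: 'plane1' }) },
    ],
});
// ... draw and submit, then:
frameTexture.destroy(); // releases the whole VideoFrame at once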
Pros:
- It is possible to call GPUTexture.destroy() eagerly.
- The multi-planar formats would match the VideoFrame pixel format, which is a bit less confusing than the other alternatives.
- Users have GPUTexture objects that can be used in copy operations and to create texture views with format reinterpretation.
Challenges:
- The concept of multi-planar textures is exposed through the WebGPU API, so they need to be well-specified and all operations made secure. This is a lot of work because the texture would have per-aspect mip sizes, weird copy restrictions, etc.
- Users would likely not be able to create a multi-planar texture themselves (because the formats aren't supported universally), so they wouldn't be able to test their code with fake multi-planar textures.
- Users need to be quite familiar with video pixel formats.
Per-Plane Per-Readonly-Texture-Views
This introduces new APIs, based on #1154, that import the GPUTextureSource but return multiple GPUTextureView objects. An API is also introduced to release the imported resource explicitly.
webidl:
interface GPUTextureViewImportDescriptor {
    (VideoFrame as GPUTextureSource) source;
    sequence<GPUTextureFormat>? formats;
};
interface GPUDevice {
    GPUTextureViewImportDescriptor
        getOptGPUTextureViewImportDescriptor((VideoFrame as GPUTextureSource) source);
    FrozenArray<GPUTextureView> importTextureView(GPUTextureViewImportDescriptor desc);
    undefined releaseTextureSource((VideoFrame as GPUTextureSource) source);
};
Using this API to import a VideoFrame into WebGPU could look like:
function frame() {
    const videoFrame = getSomeWebCodecVideoFrame();
    // videoFrame.format tells us this frame is 'I420', so users know the compatible plane formats.
    // Alternatively, users could call getOptGPUTextureViewImportDescriptor(videoFrame)
    // to get the default view formats and choose shaders (see the sketch below).
    const [plane0View, plane1View] = device.importTextureView({ source: videoFrame });
    // Use shaders to access each plane view and do some ops.
    // Release the VideoFrame.
    device.releaseTextureSource(videoFrame);
}
This API imports the VideoFrame as several GPUTextureView objects. The user needs to provide:
- The VideoFrame object.
The VideoFrame is 'locked' once it has been imported, and it is 'released' by calling releaseTextureSource with the VideoFrame object as the parameter.
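A sketch of how this could be used per frame: getOptGPUTextureViewImportDescriptor, importTextureView, and releaseTextureSource are the APIs proposed above, while the bind group layout, the sampler, and the example format list are illustrative assumptions carried over from the earlier sketches.

// Hypothetical per-frame usage: query the default view formats, bind the
// returned views directly, then release the frame explicitly.
function processFrame(device, videoFrame, bindGroupLayout, sampler) {
    // Tells the app which plane formats (and therefore which shader) to use.
    const importDesc = device.getOptGPUTextureViewImportDescriptor(videoFrame);
    console.log(importDesc.formats); // e.g. ['r8unorm', 'rg8unorm'] for a frame like the one above

    const [plane0View, plane1View] = device.importTextureView({ source: videoFrame });
    const bindGroup = device.createBindGroup({
        layout: bindGroupLayout, // same illustrative layout as in the earlier sketches
        entries: [
            { binding: 0, resource: sampler },
            { binding: 1, resource: plane0View },
            { binding: 2, resource: plane1View },
        ],
    });
    // ... encode and submit GPU work that samples the plane views, then unlock the frame:
    device.releaseTextureSource(videoFrame);
}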
Pros:
- We have GPUTextureView objects which can be set in a bind group directly.
- The concept of multi-planar textures isn't exposed to WebGPU.
- Validation and implementation are not complex.
Challenges:
- No support for copying from the texture views, no format reinterpretation, and no GPUTexture.destroy() to release.
- The browser still needs to implement multi-planar format support internally.