Investigation: Import VideoFrame from WebCodec to WebGPU #1380

@shaoboyan

This is based on #1154 and on input from Kangz@, and focuses on uploading VideoFrame objects from WebCodecs.

Rationale

An important class of applications that could use WebGPU once it is released is applications that handle video on the Web. These applications increasingly need to manipulate video to add effects, but also to extract data from it through machine learning. An example is background replacement in Zoom video calls, which uses image processing to detect the background and then composites the video with a replacement image.

Unlike HTMLVideoElement, the upcoming WebCodecs API allows applications to open a video stream and manipulate it at a very fine-grained level. WebCodecs exposes the exact format, colorspace, transform and, more importantly, the list of planes for a VideoFrame. We could imagine that WebGPU combined with WebCodecs could enable amazing video applications.

Currently, WebCodecs can only interact with WebGPU through CopyImageBitmapToTexture, which uploads video contents to a GPUTexture. The upload performance is poor because the process needs extra copies and transforms (e.g. ImageBitmap creation, plus at least one copy to upload into the destination texture).

In WebGL, the WEBGL_webcodecs_video_frame extension introduces a 0-copy upload path (in the hardware decoder case) for VideoFrame objects from WebCodecs. It shows better performance than "direct uploading" (the 1-copy path) in some cases (e.g. on bandwidth-constrained platforms).

So it is reasonable for WebGPU to offer a similar import API to achieve an efficient upload path for interacting with WebCodecs.

Proposals for Import API

The purposes of this API are:

  • Providing an upload path from WebCodecs VideoFrame to WebGPU with minimal copies and transforms.
  • Exposing the planes of a VideoFrame so developers are able to handle a single plane (see feedback here and here).

The current IDL of VideoFrame contains a lot of metadata about the frame content that JS developers can query:

[Exposed=(Window,Worker)]
interface VideoFrame {
  constructor(ImageBitmap imageBitmap, VideoFrameInit frameInit);
  constructor(PixelFormat pixelFormat, sequence<(Plane or PlaneInit)> planes,
              VideoFrameInit frameInit);

  readonly attribute PixelFormat format;
  readonly attribute FrozenArray<Plane> planes;
  readonly attribute unsigned long codedWidth;
  readonly attribute unsigned long codedHeight;
  readonly attribute unsigned long cropLeft;
  readonly attribute unsigned long cropTop;
  readonly attribute unsigned long cropWidth;
  readonly attribute unsigned long cropHeight;
  readonly attribute unsigned long displayWidth;
  readonly attribute unsigned long displayHeight;
  readonly attribute unsigned long long? duration;
  readonly attribute unsigned long long? timestamp;

  undefined destroy();
  VideoFrame clone();

  Promise<ImageBitmap> createImageBitmap(
    optional ImageBitmapOptions options = {});

};

There are several basic ideas:

Per-Plane Per-Readonly-Texture

This API imports each plane of a VideoFrame into a separate GPUTexture object. The user needs to provide the correct plane size, a compatible texture format and the expected (read-only) usages to import a plane as a GPUTexture.
webidl:

interface GPUTextureImportOptions {
    // Read-only usages
    GPUTextureUsage usage;
    // Must be compatible with the VideoFrame plane format
    GPUTextureFormat format;
};

interface GPUTextureImportDescriptor : GPUTextureImportOptions {
    (VideoFrame.plane as GPUTextureSource) source;
    GPUExtent3D size; // Can always be known from VideoFrame
};

interface GPUDevice {
    GPUTexture importTexture(GPUTextureImportDescriptor desc);
};

Using this API to import a VideoFrame into WebGPU could look like:

function frame() {
    const videoFrame = getSomeWebCodecVideoFrame();
    // videoFrame.format tells us that this frame is 'I420'.
    // The user can get the default formats by calling getOptimalImportOptions.

    const plane0 = device.importTexture({
        source: videoFrame.planes[0],
        size: [videoFrame.codedWidth, videoFrame.codedHeight],
        usage: GPUTextureUsage.SAMPLED,
        format: "r8unorm" // compatible plane format for "I420"
    });
    const plane1 = device.importTexture({
        source: videoFrame.planes[1],
        size: [videoFrame.codedWidth / 2, videoFrame.codedHeight / 2],
        usage: GPUTextureUsage.SAMPLED,
        format: "rg8unorm" // compatible plane format for "I420"
    });

    const plane0View = plane0.createView();
    const plane1View = plane1.createView();

    // Use shaders to access each plane and do some ops.

    // Destroy the plane textures to release the VideoFrame.
    plane0.destroy();
    plane1.destroy();
}

User needs to provide:

  • VideoFrame object
  • Correct plane size (Could get this from VideoFrame)
  • Compatible texture format
  • Expected usage (Readonly one)
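
The plane size and compatible format in this list can be derived from the frame's pixel format and coded size. The following is a minimal sketch of such a helper, not part of the proposal: `PLANE_LAYOUTS` and `planeImportOptions` are hypothetical names, and the table assumes 8-bit 4:2:0 layouts only. Note that the table lists I420 as three single-channel planes; a layout with one two-channel chroma plane, like the "rg8unorm" plane in the example above, is what is usually called NV12.

```javascript
// Hypothetical lookup table: per-plane subsampling and a compatible
// GPUTextureFormat for two common 8-bit 4:2:0 pixel formats.
const PLANE_LAYOUTS = {
  // Fully planar: Y, U and V each live in their own single-channel plane.
  I420: [
    { subX: 1, subY: 1, format: "r8unorm" },
    { subX: 2, subY: 2, format: "r8unorm" },
    { subX: 2, subY: 2, format: "r8unorm" },
  ],
  // Semi-planar: Y plane plus one plane of interleaved U/V samples.
  NV12: [
    { subX: 1, subY: 1, format: "r8unorm" },
    { subX: 2, subY: 2, format: "rg8unorm" },
  ],
};

// Returns one { size, format } entry per plane. Chroma dimensions are
// rounded up so that odd coded sizes do not produce fractional extents.
function planeImportOptions(pixelFormat, codedWidth, codedHeight) {
  const layout = PLANE_LAYOUTS[pixelFormat];
  if (!layout) {
    throw new TypeError(`Unsupported pixel format: ${pixelFormat}`);
  }
  return layout.map(({ subX, subY, format }) => ({
    size: [Math.ceil(codedWidth / subX), Math.ceil(codedHeight / subY)],
    format,
  }));
}

console.log(planeImportOptions("NV12", 1280, 720));
```

Each returned entry could then be spread into the descriptor passed to importTexture, together with the plane source and a read-only usage.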

The VideoFrame is 'locked' as long as any of its planes is imported. It is 'released' once GPUTexture.destroy() has been called for all imported planes.
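
This locking rule can be read as a reference count on the frame. The sketch below models that reading in plain JavaScript. `FrameLock` and its members are hypothetical names used for illustration only, and a real implementation would track this state inside the browser rather than in script.

```javascript
// Hypothetical model of the 'locked'/'released' rule: a frame stays locked
// while at least one imported plane texture has not been destroyed.
class FrameLock {
  constructor() {
    this.liveImports = new Set();
  }

  // Called when a plane is imported; returns a stand-in for the GPUTexture.
  importPlane(planeIndex) {
    const texture = {
      planeIndex,
      destroy: () => {
        // Calling destroy() a second time has no further effect.
        this.liveImports.delete(texture);
      },
    };
    this.liveImports.add(texture);
    return texture;
  }

  get locked() {
    return this.liveImports.size > 0;
  }
}

const lock = new FrameLock();
const lumaTexture = lock.importPlane(0);
const chromaTexture = lock.importPlane(1);
lumaTexture.destroy();
console.log(lock.locked); // still locked: the chroma plane is alive
chromaTexture.destroy();
console.log(lock.locked); // released: every imported plane was destroyed
```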

Pros:

  • Users have GPUTexture objects that can be used in copy operations and to create texture views with format reinterpretation.
  • It is possible to call GPUTexture.destroy() eagerly.

Challenges:

  • It introduces implementation complexity for creating multiple GPUTexture objects that wrap individual planes of a native multi-planar texture (e.g. subresource state transitions).
  • Texture format reinterpretation may need some special restrictions.

Readonly Multiplanar Texture

This is very similar to per-plane video importing but instead introduces new multi-planar WebGPU texture formats for the most common video formats (a concept that already exists in native GPU APIs). Users can create a texture view of a single plane by using an aspect for the new format.
webidl:

enum GPUTextureAspect {
    "all",
    "stencil-only",
    "depth-only",
    "plane0",
    "plane1",
    "plane2",
    "plane3" // Maximum plane number of multi-planar format is 4
};

enum GPUTextureFormat {
// Multi-planar texture formats
"I420",
...
};

interface GPUTextureImportOptions {
    // Read-only usages
    GPUTextureUsage usage;
    // Must be compatible with the VideoPixelFormat, which can be known from VideoFrame.format
    GPUTextureFormat format;
};

interface GPUTextureImportDescriptor : GPUTextureImportOptions {
    (VideoFrame as GPUTextureSource) source;
    GPUExtent3D size; // Can always be known from VideoFrame
};

interface GPUDevice {
    GPUTexture importTexture(GPUTextureImportDescriptor desc);
};

Using this API to import a VideoFrame into WebGPU could look like:

function frame() {
    const videoFrame = getSomeWebCodecVideoFrame();
    // videoFrame.format tells us that this frame is 'I420'.

    const frame = device.importTexture({
        source: videoFrame,
        size: [videoFrame.codedWidth, videoFrame.codedHeight],
        usage: GPUTextureUsage.SAMPLED,
        format: videoFrame.format // assume it is 'I420'
    });

    const plane0View = frame.createView({aspect: 'plane0'});
    const plane1View = frame.createView({aspect: 'plane1'});

    // Use shaders to access each plane view and do some ops.

    // Destroy the imported texture to release the VideoFrame.
    frame.destroy();
}

The API imports a VideoFrame into a GPUTexture object with a multi-planar texture format. The user needs to provide:

  • VideoFrame object.
  • Correct VideoFrame size (Could get this from VideoFrame)
  • Compatible multi-planar texture format
  • Expected usage (Readonly one)

The VideoFrame is 'locked' once it has been imported. It is 'released' by calling GPUTexture.destroy() on the imported GPUTexture object.
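
Creating a view with a plane aspect implies a validation step: the requested plane must exist in the texture's multi-planar format. A minimal sketch of that check follows. `PLANE_COUNTS` and `planeIndexForAspect` are hypothetical names, and the plane counts listed are assumptions for illustration rather than part of the proposed IDL.

```javascript
// Hypothetical plane counts for multi-planar formats (illustrative only).
const PLANE_COUNTS = { I420: 3, NV12: 2 };

// Maps an aspect such as "plane1" to its plane index, or throws if the
// aspect is not a plane aspect or the format has no such plane.
function planeIndexForAspect(format, aspect) {
  const planeCount = PLANE_COUNTS[format];
  if (planeCount === undefined) {
    throw new TypeError(`${format} is not a multi-planar format`);
  }
  // The proposed enum defines plane0 through plane3 (at most 4 planes).
  const match = /^plane([0-3])$/.exec(aspect);
  if (!match) {
    throw new TypeError(`${aspect} is not a plane aspect`);
  }
  const index = Number(match[1]);
  if (index >= planeCount) {
    throw new RangeError(`${format} has only ${planeCount} planes`);
  }
  return index;
}

console.log(planeIndexForAspect("NV12", "plane1"));
```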

Pros:

  • It is possible to call GPUTexture.destroy() eagerly.
  • The multi-planar formats would match the VideoFrame pixel format, which is a bit less confusing than the other alternatives.
  • Users have GPUTexture objects that can be used in copy operations and to create texture views with format reinterpretation.

Challenges:

  • The concept of multiplanar textures is exposed through the WebGPU API so they need to be well-specified and all operations made secure. This is a lot of work because the texture would have per-aspect mip size, weird copy restrictions, etc.
  • Users would likely not be able to create a multi-planar texture themselves (because the formats aren't supported universally) so they wouldn't be able to test their code with fake multiplanar textures.
  • Users need to be quite familiar with video pixel formats.

Per-Plane Per-Readonly-Texture-Views

This introduces some new APIs based on #1154 that import the GPUTextureSource but return multiple GPUTextureView objects. An API is introduced to release the imported resource explicitly.

interface GPUTextureViewImportDescriptor {
    (VideoFrame as GPUTextureSource) source;
    sequence<GPUTextureFormat>? formats;
};

interface GPUDevice {
    GPUTextureViewImportDescriptor
        getOptGPUTextureViewImportDescriptor((VideoFrame as GPUTextureSource) source);
    FrozenArray<GPUTextureView> importTextureView(GPUTextureViewImportDescriptor desc);
    undefined releaseTextureSource((VideoFrame as GPUTextureSource) source);
};

Using this API to import a VideoFrame into WebGPU could look like:

function frame() {
    const videoFrame = getSomeWebCodecVideoFrame();
    // videoFrame.format tells us that this frame is 'I420', from which users can derive the compatible plane formats.
    // Alternatively, users could call device.getOptGPUTextureViewImportDescriptor(videoFrame)
    // to get the default view formats and choose shaders.

    const [plane0View, plane1View] = device.importTextureView({ source: videoFrame });

    // Use shaders to access each plane view and do some ops.

    // Release the VideoFrame (method name as declared in the IDL above).
    device.releaseTextureSource(videoFrame);
}

The API imports a VideoFrame into several GPUTextureView objects. The user needs to provide:

  • VideoFrame object.

The VideoFrame is 'locked' once it has been imported. It is 'released' by calling releaseTextureSource with the VideoFrame object as its parameter.
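
Because release is explicit in this variant, an exception thrown between import and release would leave the VideoFrame locked. One way to guard against that is to scope the import with try/finally. The sketch below assumes the `importTextureView` and `releaseTextureSource` methods as declared in the IDL above, which are proposals in this issue and not shipped WebGPU APIs. `withImportedViews` and the mock device are hypothetical and exist only to make the example runnable.

```javascript
// Runs `use(views)` with the imported plane views and always releases the
// source afterwards, even if `use` throws.
function withImportedViews(device, videoFrame, use) {
  const views = device.importTextureView({ source: videoFrame });
  try {
    return use(views);
  } finally {
    device.releaseTextureSource(videoFrame);
  }
}

// Mock device standing in for a GPUDevice that implements the proposal.
function createMockDevice() {
  const lockedSources = new Set();
  return {
    lockedSources,
    importTextureView({ source }) {
      lockedSources.add(source);
      return Object.freeze([{ label: "plane0" }, { label: "plane1" }]);
    },
    releaseTextureSource(source) {
      lockedSources.delete(source);
    },
  };
}

const mockDevice = createMockDevice();
const mockFrame = { format: "I420" };
const viewCount = withImportedViews(mockDevice, mockFrame, (views) => views.length);
console.log(viewCount, mockDevice.lockedSources.has(mockFrame));
```

In real code the callback would encode and submit GPU work. Whether the release has to wait for that work to complete is not specified in the text above, so this sketch only covers the synchronous part.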

Pros:

  • We have GPUTextureView which can be set in the bind group directly.
  • The concept of multi-planar textures isn't exposed to WebGPU.
  • Validation and implementation are not complex.

Challenges:

  • No support for copying from the texture views, no format reinterpretation and no GPUTexture.destroy() to release.
  • The browser still needs to implement multi-planar format support internally.
