This is based on #1154 and Kangz@'s inputs, and focuses on uploading VideoFrame from WebCodecs.
Rationale
An important class of applications that could adopt WebGPU when it ships is Web applications that handle video. These applications increasingly need to manipulate video, both to add effects and to extract data from it through machine learning. An example is background replacement in Zoom video calls, which uses image processing to detect the background and then composites the video with a replacement image.
Unlike HTMLVideoElement, the upcoming WebCodecs API allows applications to open a video stream and manipulate it at a very fine-grained level. WebCodecs exposes the exact format, colorspace, transform, and, more importantly, the list of planes of a VideoFrame. One can imagine that WebGPU combined with WebCodecs could enable amazing video applications.
Currently, WebCodecs can only interact with WebGPU through CopyImageBitmapToTexture, which uploads video contents to a GPUTexture. But the upload performance is poor because extra copies/transforms (e.g. ImageBitmap creation, plus at least one copy into the destination texture) are needed along the way.
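For reference, the status-quo path looks roughly like the sketch below. It is only an illustration: it relies on VideoFrame.createImageBitmap() from the IDL further down and on GPUQueue.copyImageBitmapToTexture() (whose exact name and signature have varied across spec drafts), and the destination texture format is an arbitrary choice.

// Sketch of the current copy-based path: upload a WebCodecs VideoFrame
// into a GPUTexture by going through an ImageBitmap.
async function uploadViaImageBitmap(device, videoFrame) {
    // Extra work: allocate and convert an ImageBitmap from the decoded frame.
    const bitmap = await videoFrame.createImageBitmap();

    // Destination texture; 'rgba8unorm' is just an illustrative choice.
    const dstTexture = device.createTexture({
        size: [videoFrame.codedWidth, videoFrame.codedHeight, 1],
        format: 'rgba8unorm',
        usage: GPUTextureUsage.COPY_DST | GPUTextureUsage.SAMPLED,
    });

    // At least one more copy to get the pixels into the destination texture.
    device.queue.copyImageBitmapToTexture(
        { imageBitmap: bitmap },
        { texture: dstTexture },
        [videoFrame.codedWidth, videoFrame.codedHeight, 1]
    );
    return dstTexture;
}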
In WebGL, the WEBGL_webcodecs_video_frame extension, which introduces a zero-copy upload path (in the hardware-decoder case) through the WebCodecs VideoFrame, shows better performance than "direct uploading" (the one-copy path) in some cases (e.g. on bandwidth-constrained platforms).
So it is reasonable for WebGPU to offer a similar import API to achieve an efficient upload path when interacting with WebCodecs.
Proposals for Import API
The goals of this API are:
- Provide an upload path into WebGPU for a VideoFrame from WebCodecs with a minimum of copies/transforms.
- Expose the planes in a VideoFrame so that developers are able to handle a single plane (see feedback here and here).
The current IDL of VideoFrame exposes plenty of metadata about the frame content for JS developers:
[Exposed=(Window,Worker)]
interface VideoFrame {
    constructor(ImageBitmap imageBitmap, VideoFrameInit frameInit);
    constructor(PixelFormat pixelFormat, sequence<(Plane or PlaneInit)> planes,
                VideoFrameInit frameInit);

    readonly attribute PixelFormat format;
    readonly attribute FrozenArray<Plane> planes;
    readonly attribute unsigned long codedWidth;
    readonly attribute unsigned long codedHeight;
    readonly attribute unsigned long cropLeft;
    readonly attribute unsigned long cropTop;
    readonly attribute unsigned long cropWidth;
    readonly attribute unsigned long cropHeight;
    readonly attribute unsigned long displayWidth;
    readonly attribute unsigned long displayHeight;
    readonly attribute unsigned long long? duration;
    readonly attribute unsigned long long? timestamp;

    undefined destroy();
    VideoFrame clone();

    Promise<ImageBitmap> createImageBitmap(
        optional ImageBitmapOptions options = {});
};
There are several basic ideas:
Per-Plane Per-Readonly-Texture
This API imports each plane of a VideoFrame as a separate GPUTexture object. The user needs to provide the correct plane size, a compatible texture format, and the expected (read-only) usage to import a plane as a GPUTexture.
webidl:
interface GPUTextureImportOptions {
    // Read-only usages only
    GPUTextureUsage usage;
    // Needs to be compatible with the VideoFrame plane format
    GPUTextureFormat format;
};
interface GPUTextureImportDescriptor : GPUTextureImportOptions {
    (VideoFrame.plane as GPUTextureSource) source;
    GPUExtent3D size; // Can always be known from the VideoFrame
};
interface GPUDevice {
    GPUTexture importTexture(GPUTextureImportDescriptor desc);
};
Using this API to import a VideoFrame into WebGPU could look like:
function frame() {
    const videoFrame = getSomeWebCodecVideoFrame();
    // videoFrame.format tells us this frame is 'I420'.
    // Users can get the default formats by calling getOptimalImportOptions.
    const plane0 = device.importTexture({
        source: videoFrame.planes[0],
        size: [videoFrame.codedWidth, videoFrame.codedHeight],
        usage: GPUTextureUsage.SAMPLED,
        format: "r8unorm" // compatible plane format for "I420"
    });
    const plane1 = device.importTexture({
        source: videoFrame.planes[1],
        size: [videoFrame.codedWidth / 2, videoFrame.codedHeight / 2],
        usage: GPUTextureUsage.SAMPLED,
        format: "rg8unorm" // compatible plane format for "I420"
    });
    const plane0View = plane0.createView();
    const plane1View = plane1.createView();
    // Use shaders to access each plane and do some ops (see the sketch below).
    // Destroy the plane textures to release the VideoFrame.
    plane0.destroy();
    plane1.destroy();
}
The user needs to provide:
- The VideoFrame object
- The correct plane size (can be obtained from the VideoFrame)
- A compatible texture format
- The expected (read-only) usage
The VideoFrame is 'locked' once any of its planes has been imported, and it is 'released' by calling GPUTexture.destroy() on all of the imported plane textures.
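As a rough illustration (not part of the proposal), the imported plane textures could then be consumed like any other sampled textures. The bind group layout, the WGSL-style shader string, and the BT.601 limited-range conversion coefficients below are illustrative assumptions; the actual shading language syntax and colorspace handling depend on the frame's metadata. plane0View and plane1View are the views created in the example above.

// Hypothetical continuation of the example above: sample the imported planes
// and convert to RGB in a fragment shader.
const sampler = device.createSampler({ magFilter: 'linear', minFilter: 'linear' });

// Illustrative shader; assumes plane0 is the luma plane (r8unorm) and plane1
// holds interleaved chroma (rg8unorm), as in the example above.
const yuvToRgbWgsl = `
@group(0) @binding(0) var s : sampler;
@group(0) @binding(1) var planeY : texture_2d<f32>;
@group(0) @binding(2) var planeUV : texture_2d<f32>;

@fragment
fn main(@location(0) uv : vec2<f32>) -> @location(0) vec4<f32> {
    // BT.601 limited-range YUV -> RGB; an assumption for illustration only.
    let y = textureSample(planeY, s, uv).r - 0.0625;
    let c = textureSample(planeUV, s, uv).rg - vec2<f32>(0.5, 0.5);
    let r = 1.164 * y + 1.596 * c.y;
    let g = 1.164 * y - 0.391 * c.x - 0.813 * c.y;
    let b = 1.164 * y + 2.018 * c.x;
    return vec4<f32>(r, g, b, 1.0);
}`;

const bindGroupLayout = device.createBindGroupLayout({
    entries: [
        { binding: 0, visibility: GPUShaderStage.FRAGMENT, sampler: {} },
        { binding: 1, visibility: GPUShaderStage.FRAGMENT, texture: {} },
        { binding: 2, visibility: GPUShaderStage.FRAGMENT, texture: {} },
    ],
});
const bindGroup = device.createBindGroup({
    layout: bindGroupLayout,
    entries: [
        { binding: 0, resource: sampler },
        { binding: 1, resource: plane0View },
        { binding: 2, resource: plane1View },
    ],
});
// ... build a render pipeline from yuvToRgbWgsl, draw a full-screen quad with
// this bind group, submit, and only then destroy plane0/plane1 to release the
// VideoFrame.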
Pros:
- Users have GPUTexture objects that can be used in copy operations and to create texture views with format reinterpretation.
- It is possible to call GPUTexture.destroy() eagerly.
Challenges:
- It introduces implementation complexity to create multiple GPUTexture objects wrapping individual planes of a native multi-planar texture (e.g. subresource state transitions).
- Texture format reinterpretation may need some special restrictions.
Readonly Multiplanar Texture
This is very similar to the per-plane import but instead introduces new multi-planar WebGPU texture formats for the most common video formats (a concept that already exists in native GPU APIs). Users can then create a texture view of a single plane by specifying an aspect for these new formats.
webidl:
enum GPUTextureAspect {
    "all",
    "stencil-only",
    "depth-only",
    "plane0",
    "plane1",
    "plane2",
    "plane3" // The maximum number of planes in a multi-planar format is 4
};
enum GPUTextureFormat {
    // Multi-planar texture formats
    "I420",
    ...
};
interface GPUTextureImportOptions {
    // Read-only usages only
    GPUTextureUsage usage;
    // Needs to be compatible with the VideoPixelFormat, which can be known from VideoFrame.format
    GPUTextureFormat format;
};
interface GPUTextureImportDescriptor : GPUTextureImportOptions {
    (VideoFrame as GPUTextureSource) source;
    GPUExtent3D size; // Can always be known from the VideoFrame
};
interface GPUDevice {
    GPUTexture importTexture(GPUTextureImportDescriptor desc);
};
Using this API to import a VideoFrame into WebGPU could look like:
function frame() {
    const videoFrame = getSomeWebCodecVideoFrame();
    // videoFrame.format tells us this frame is 'I420'.
    const frameTexture = device.importTexture({
        source: videoFrame,
        size: [videoFrame.codedWidth, videoFrame.codedHeight],
        usage: GPUTextureUsage.SAMPLED,
        format: videoFrame.format // assume it is 'I420'
    });
    const plane0View = frameTexture.createView({aspect: 'plane0'});
    const plane1View = frameTexture.createView({aspect: 'plane1'});
    // Use shaders to access each plane view and do some ops (see the sketch below).
    // Destroy the texture to release the VideoFrame.
    frameTexture.destroy();
}
This API imports the VideoFrame as a GPUTexture object with a multi-planar texture format. The user needs to provide:
- The VideoFrame object
- The correct VideoFrame size (can be obtained from the VideoFrame)
- A compatible multi-planar texture format
- The expected (read-only) usage
The VideoFrame is 'locked' once it has been imported, and it is 'released' by calling GPUTexture.destroy() on the imported GPUTexture object.
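As a sketch (with the same assumptions as the per-plane sketch earlier), the per-aspect views plug into the same bind group layout; the only differences are how the views are created and that a single destroy() releases the whole frame. sampler and bindGroupLayout are the illustrative objects from that earlier sketch, and frameTexture is the texture imported in the example above.

// Hypothetical continuation of the example above: bind per-aspect views of the
// single multi-planar texture, then release everything with one destroy().
const bindGroup = device.createBindGroup({
    layout: bindGroupLayout, // same illustrative layout as in the per-plane sketch
    entries: [
        { binding: 0, resource: sampler },
        { binding: 1, resource: frameTexture.createView({ aspect: 'plane0' }) },
        { binding: 2, resource: frameTexture.createView({ aspect: 'plane1' }) },
    ],
});
// ... draw and submit, then:
frameTexture.destroy(); // releases the whole VideoFrame at once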
Pros:
- It is possible to call GPUTexture.destroy() eagerly.
- The multi-planar formats would match the VideoFrame pixel format, which is a bit less confusing than the other alternatives.
- Users have GPUTexture objects that can be used in copy operations and to create texture views with format reinterpretation.
Challenges:
- The concept of multi-planar textures is exposed through the WebGPU API, so they need to be well-specified and all operations made secure. This is a lot of work because the texture would have per-aspect mip sizes, weird copy restrictions, etc.
- Users would likely not be able to create a multi-planar texture themselves (because the formats aren't supported universally), so they wouldn't be able to test their code with fake multi-planar textures.
- Users need to be quite familiar with video pixel formats.
Per-Plane Per-Readonly-Texture-Views
This introduces new APIs, based on #1154, that import the GPUTextureSource but return multiple GPUTextureView objects. An API is also introduced to release the imported resource explicitly.
webidl:
interface GPUTextureViewImportDescriptor {
    (VideoFrame as GPUTextureSource) source;
    sequence<GPUTextureFormat>? formats;
};
interface GPUDevice {
    GPUTextureViewImportDescriptor
        getOptGPUTextureViewImportDescriptor((VideoFrame as GPUTextureSource) source);
    FrozenArray<GPUTextureView> importTextureView(GPUTextureViewImportDescriptor desc);
    undefined releaseTextureSource((VideoFrame as GPUTextureSource) source);
};
Using this API to import a VideoFrame into WebGPU could look like:
function frame() {
    const videoFrame = getSomeWebCodecVideoFrame();
    // videoFrame.format tells us this frame is 'I420', so users know the compatible plane formats.
    // Alternatively, users could call getOptGPUTextureViewImportDescriptor(videoFrame)
    // to get the default view formats and choose shaders (see the sketch below).
    const [plane0View, plane1View] = device.importTextureView({ source: videoFrame });
    // Use shaders to access each plane view and do some ops.
    // Release the VideoFrame.
    device.releaseTextureSource(videoFrame);
}
This API imports the VideoFrame as several GPUTextureView objects. The user needs to provide:
- The VideoFrame object.
The VideoFrame is 'locked' once it has been imported, and it is 'released' by calling releaseTextureSource with the VideoFrame object as the parameter.
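A sketch of how this could be used per frame: getOptGPUTextureViewImportDescriptor, importTextureView, and releaseTextureSource are the APIs proposed above, while the bind group layout, the sampler, and the example format list are illustrative assumptions carried over from the earlier sketches.

// Hypothetical per-frame usage: query the default view formats, bind the
// returned views directly, then release the frame explicitly.
function processFrame(device, videoFrame, bindGroupLayout, sampler) {
    // Tells the app which plane formats (and therefore which shader) to use.
    const importDesc = device.getOptGPUTextureViewImportDescriptor(videoFrame);
    console.log(importDesc.formats); // e.g. ['r8unorm', 'rg8unorm'] for a frame like the one above

    const [plane0View, plane1View] = device.importTextureView({ source: videoFrame });
    const bindGroup = device.createBindGroup({
        layout: bindGroupLayout, // same illustrative layout as in the earlier sketches
        entries: [
            { binding: 0, resource: sampler },
            { binding: 1, resource: plane0View },
            { binding: 2, resource: plane1View },
        ],
    });
    // ... encode and submit GPU work that samples the plane views, then unlock the frame:
    device.releaseTextureSource(videoFrame);
}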
Pros:
- We have GPUTextureView objects which can be set in a bind group directly.
- The concept of multi-planar textures isn't exposed to WebGPU.
- Validation and implementation are not complex.
Challenges:
- No support for copying from the texture views, no format reinterpretation, and no GPUTexture.destroy() to release.
- The browser still needs to implement multi-planar format support internally.