
Proposal for importing Web platform images in WebGPU #1154

@Kangz

Description

This is based on #700 and additional insights that @shaoboyan provided offline but also on conversations with many other participants in the group (@jdashg @kenrussell @kvark, @austinEng, etc). It is also related to #625.

Rationale

It is important that WebGPU interoperates nicely with the rest of the Web platform, which in many cases means sharing image data with other Web platform objects. Copying image data is expensive, especially if done every frame, so we should have an efficient way to get image data into (and maybe out of) WebGPU. There are many use cases:

  • Compositing a video stream from an HTMLVideoElement (or WebRTC?) with additional effects using WebGPU.
  • Running compute-shader based machine learning pipelines on a video stream.
  • Rendering the UI of your WebGPU application using 2D canvas and compositing it with the rest of the 3D scene.
  • Importing data from HTMLImageElement to load the textures for your application (although that's not done every frame).
  • Interoperating between WebGL and WebGPU inside the same application by sharing images (via HTMLCanvasElement or a new mechanism to be invented).

A long time ago the group agreed that a path forward could be GPUQueue.copyImageBitmapToTexture, which is like a copyTextureToTexture except that one of the textures is actually an ImageBitmap. Part of the idea was that ImageBitmap would eventually represent resources that are most probably resident on the GPU. This isn't really the case in Firefox, which doesn't have any ImageBitmap backed by GPU memory at this time, nor in Chromium, where many fallbacks cause the ImageBitmap to be CPU-backed (and when it is on the GPU it might be on a different device or a different API). I don't know how ImageBitmaps work in Safari.

So if ImageBitmaps were always backed by GPU memory that can be used for an efficient texture-to-texture copy to a GPUTexture (on its specific GPU device in multi-GPU systems), and if producing ImageBitmaps didn't involve copies (because the source was already on the GPU, so it would just take a reference), then copyImageBitmapToTexture would always be a one-copy path that sometimes copies through a shader to take care of y-flip or un/premultiplying the alpha channel.
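For reference, a minimal sketch of that one-copy path as it exists today (assuming the queue is reached through device.defaultQueue and the GPUImageBitmapCopyView takes an imageBitmap member, as in the API at the time of writing):

// Snapshot the source into an ImageBitmap, then copy it into a
// GPUTexture on the queue.
const bitmap = await createImageBitmap(video);

const texture = device.createTexture({
    size: [bitmap.width, bitmap.height, 1],
    format: 'rgba8unorm',
    usage: GPUTextureUsage.COPY_DST | GPUTextureUsage.SAMPLED,
});

device.defaultQueue.copyImageBitmapToTexture(
    { imageBitmap: bitmap },
    { texture: texture },
    [bitmap.width, bitmap.height, 1]);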

@shaoboyan spent a lot of time studying how to implement an efficient copyImageBitmapToTexture and ran into many blockers:

  • ImageBitmap is supposed to take a snapshot of its source "for free", but this isn't always the case and can actually be very difficult for things like video decoder outputs.
  • ImageBitmap might not be allocated in a way that makes it possible for the WebGPU implementation to copy from it directly (e.g. on the CPU, on a different GPU, or on the same GPU with an incompatible driver).
  • The content of the ImageBitmap is immutable, but it can be hard to ensure it is readable by multiple users at once (for example the DOM and copyImageBitmapToTexture).

Allowing more than ImageBitmap

WebGL can do an operation similar to copyImageBitmapToTexture by passing various types of objects to the gl.texImage2D family of operations. It uses the concept of a TexImageSource that's defined like this:

typedef (ImageBitmap or
         ImageData or
         HTMLImageElement or
         HTMLCanvasElement or
         HTMLVideoElement or
         OffscreenCanvas) TexImageSource;
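For comparison, this is how such an upload looks in WebGL today, where texImage2D accepts any TexImageSource directly (a plain standards-defined call, nothing specific to this proposal):

// Upload the current frame of an HTMLVideoElement into a WebGL texture.
gl.bindTexture(gl.TEXTURE_2D, tex);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, video);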

As discussed above, forcing data to go through ImageBitmap can often add an extra copy, so we could add direct copies from other texture sources. WebGPU likely doesn't need to support the full breadth of these types, for example ImageBitmap, ImageData, and HTMLImageElement could be collapsed into just ImageBitmap, but it seems important to have direct support for objects producing data every frame. This means that the prototype for copyImageBitmapToTexture becomes:

typedef (ImageBitmap or
         HTMLCanvasElement or
         HTMLVideoElement or
         OffscreenCanvas) GPUTextureSource;

dictionary GPUTextureSourceView {
    required GPUTextureSource source;
    GPUOrigin2D origin = {};
    // Other arguments like Y-flip, un/premultiply alpha?
};

partial interface GPUQueue {
    void copyFromTextureSource(
        GPUTextureSourceView source,
        GPUTextureCopyView destination,
        GPUExtent3D copySize);
};
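A hypothetical per-frame use of this copy path could look like the following, where videoTexture is assumed to be a pre-allocated GPUTexture with the COPY_DST usage and drawScene stands for the application's rendering code:

function frame() {
    // Copy the current video frame directly into the destination
    // texture, with no intermediate ImageBitmap.
    device.defaultQueue.copyFromTextureSource(
        { source: video, origin: [0, 0] },
        { texture: videoTexture },
        [video.videoWidth, video.videoHeight, 1]);

    drawScene(videoTexture);
}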

Importing textures instead of copying them

Another problem with copyImageBitmapToTexture is that it always forces at least one copy, as the name implies. Often developers want to have all of the texture data from the texture source, for example when compositing a canvas / video, or when using HTMLImageElement to decode image assets. In the initial discussions about copyImageBitmapToTexture we agreed to come back and find a solution that could potentially avoid that copy.

The biggest problem with importing a GPUTexture directly from the texture source is that, depending on the environment (browser, OS, but also video or image file encoding), the optimal GPUTextureDescriptor to use isn't obvious. For example an image or a video can use 8 bits per pixel, or more. Maybe requesting the OUTPUT_ATTACHMENT usage is free, but maybe it will require a copy because the texture source wasn't previously allocated with that usage, etc.

Here's a proposal for what a GPUDevice.importTexture could look like:

// Options that can be used when importing a texture. It is separated
// from GPUTextureImportDescriptor so that it can be returned by
// GPUDevice.getOptimalImportOptions.
dictionary GPUTextureImportOptions {
    // SAMPLED and COPY_SRC are always guaranteed.
    GPUTextureUsage usage;
    GPUTextureFormat format;
    // Other things like Y-flip, un/premultiply alpha.
};

dictionary GPUTextureImportDescriptor : GPUTextureImportOptions {
    required GPUTextureSource source;
    GPUExtent3D size; // Can always be known for all GPUTextureSource types.
};

partial interface GPUDevice {
    // Might need to be asynchronous?
    GPUTextureImportOptions getOptimalImportOptions(GPUTextureSource source);

    // Will work even when `desc` doesn't match the result of
    // `getOptimalImportOptions` by performing implicit conversions etc.
    GPUTexture importTexture(GPUTextureImportDescriptor desc);
};

Using it to import canvas data during a frame of a WebGPU application could look like this:

function frame() {
    drawUI(canvas);

    const texture = device.importTexture({
        source: canvas,
        size: [canvas.width, canvas.height],
        usage: GPUTextureUsage.SAMPLED,
        format: 'rgba8unorm',
    });
    
    compositeUI(texture);
    texture.destroy();
}

If the developer wants to ensure a more optimal path portably, they could do the following:

function frame() {
    drawUI(canvas);

    const bestOptions = device.getOptimalImportOptions(canvas);
    if (!canSupportCanvasFormat(bestOptions.format)) {
        bestOptions.format = 'rgba8unorm';
    }
    assert(bestOptions.usage & GPUTextureUsage.SAMPLED);

    const texture = device.importTexture({
        source: canvas,
        size: [canvas.width, canvas.height],
        usage: GPUTextureUsage.SAMPLED,
        format: bestOptions.format,
    });
    
    compositeUI(texture);
    texture.destroy();
}

There are many open questions:

  • What happens to mutable images (like an HTMLCanvasElement's) once they are imported: are they detached from their canvas and replaced with an empty canvas image, or are they tagged for copy-on-write?
  • Video decoders can have a fixed number of output buffers; what happens if WebGPU holds references to all of them? Is the video decoder blocked from making progress? Maybe WebGPU is only allowed a single frame, and the previous frame gets detached (like GPUTexture.destroy()) when the next one is queried?
  • Can we guarantee textures will have textureComponentType: 'float', or should we plan for other types of textures to come through, like depth textures?
  • How long do the results of getOptimalImportOptions stay valid? Can they stay valid forever for a canvas as long as you don't touch it? What about a video that might change codecs/encodings in the middle?

Hints for image producers

Another problem discussed above is that texture sources can produce data that can't be used efficiently by WebGPU. For copies that happen once, like for HTMLImageElement, an extra copy to make the data visible to WebGPU might be okay, but for copies happening every frame, like for HTMLCanvasElement or HTMLVideoElement, there is a large performance and power-consumption cost.

There should be a mechanism, either visible to developers or as an implementation detail, that helps texture sources produce data directly usable by WebGPU. Several potential solutions have been discussed in the past:

  • Adding a forGPUDevice attribute in the ImageBitmap creation options that hints it will be used with a specific GPUDevice (sketched after this list).
  • Similarly, adding a property on HTMLCanvasElement, HTMLVideoElement, and other sources to tell them they will be used on a GPUDevice.
  • Having a feedback mechanism in browsers that tells a texture source to change how it produces textures after it has been used for imports on a GPUDevice multiple times.
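As a sketch of the first option (forGPUDevice is only a name taken from the bullet above, not an existing createImageBitmap option):

// Hypothetical hint at ImageBitmap creation time: the browser could use
// it to allocate the bitmap in memory directly accessible to `device`.
const bitmap = await createImageBitmap(video, {
    forGPUDevice: device,
});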

Future: exporting WebGPU textures?

This proposal only discusses how to efficiently import textures into WebGPU, but in the future we can imagine that other APIs will want to efficiently import WebGPU textures. How would that work?

An idea could be to add an exportable boolean to GPUTextureDescriptor that adds extra restrictions but allows calling a GPUTexture.export() method. From the point of view of the WebGPU API, export() is the same as GPUTexture.destroy(), except that it produces an ImageBitmap with the content of that texture.
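A sketch of what this could look like (the exportable flag and export() method are hypothetical names from the paragraph above):

const texture = device.createTexture({
    size: [256, 256, 1],
    format: 'rgba8unorm',
    usage: GPUTextureUsage.OUTPUT_ATTACHMENT,
    exportable: true, // hypothetical: adds restrictions, allows export()
});

// ... render into `texture` ...

// Detaches the texture (as if destroy() was called) and hands its
// content over as an ImageBitmap.
const bitmap = texture.export();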
