
Proposal of FP16 WGSL extension #2512

@jiangzhaoming

Description


1. Introduction

In this issue we propose an extension that allows FP16, the 16-bit floating point types, to be used in WGSL for the usages proposed below. To keep this issue at a reasonable length, detailed information about each backend's (D3D12, Metal, Vulkan, and OpenGL) support for each usage is omitted; it can be found in the more detailed proposal design doc for this extension.

Although other low-bit-width data types, e.g. INT16 and INT8, may also be supported by backends, we focus on FP16 in this extension.

2. Proposal usages

This proposed extension, tentatively named fp16-in-shaders-and-storage, enables certain usages of FP16 in WGSL.

The considered usages of FP16 in WGSL, and their support on different backends, are summarized in the table below. As a tradeoff between functionality and capability, we propose to enable all usages in this extension except atomic FP16, use in the uniform storage class, and pipeline input/output (some Vulkan devices support FP16 types within shaders and storage buffers, but do not support them in uniform buffers). The uniform storage class and pipeline input/output usages can also be emulated on devices that do not support them natively. If such emulation is enabled, we can enable all usages except atomic FP16 in this extension, making it more general. Otherwise, the uniform storage class and pipeline input/output usages may be enabled by other extensions.

| WGSL usage | Vulkan (SPIR-V) | D3D12 (HLSL) | Metal (MSL) | OpenGL (GLSL) |
| --- | --- | --- | --- | --- |
| Defining FP16 scalar, vector and matrix types | Requires at least one of the Vulkan features listed in this column | Requires SM6.2 or higher, using DXIL, and Native16BitShaderOpsSupported == true | Supported | Requires the AMD_gpu_shader_half_float extension |
| Defining array and structure types with FP16 | | | | |
| Conversion and bit-casting between FP16 and other types | | | | |
| Atomic type/operation of FP16 | No support | | | |
| Using variables of FP16 types in private, function and workgroup storage classes | Requires the Vulkan feature shaderFloat16 (overall support rate 39.6%, reported by gpuinfo.org; the same source applies to the other rates in this column) | Requires SM6.2 or higher, using DXC, and Native16BitShaderOpsSupported == true | Supported | Requires the AMD_gpu_shader_half_float extension |
| Using FP16 as parameters and results of user-defined functions | | | | |
| Using FP16 with built-in functions (arithmetic) | | | | |
| Using FP16 types in uniform storage class | Requires the Vulkan feature uniformAndStorageBuffer16BitAccess (overall support rate 53.1%) | | | |
| Using FP16 types in storage storage class | Requires the Vulkan feature storageBuffer16BitAccess (overall support rate 58.6%) | | | |
| Using FP16 types in pipeline input/output user-defined variables | Requires the Vulkan feature storageInputOutput16 (overall support rate 24.4%) | | | |

Please see the capability chapter in the proposed design doc for a detailed investigation of each backend's support for each usage.

The following sections will describe these usages.

3. FP16 data types

FP16 data types are supported with different requirements on each backend. If a backend supports the features above, we will be able to enable FP16 types in WGSL. We also propose adding value conversion and bit-cast expressions for FP16.

Atomic types or operations of FP16 are not allowed.

3.1. Scalar, vector and matrix types of FP16

Currently WGSL only has the 32-bit floating point scalar type f32, and floating point vectors and matrices all have component type f32. Here we propose the 16-bit floating point scalar type f16 and the corresponding vector and matrix types. If the backend meets the requirements for supporting FP16, the proposed WGSL types may be used in WGSL code.

A brief list of the WGSL f16 types enabled by this extension is shown below. A detailed table of WGSL FP16 types and the corresponding backend types is listed in the appendix of the design doc.

| WGSL Type | Description |
| --- | --- |
| `f16` | The f16 type is the set of 16-bit floating point values of the IEEE-754 binary16 (half precision) format. |
| `vecN<f16>` | Vector of N elements of type f16 |
| `matNxM<f16>` | Matrix of N columns and M rows |
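A minimal sketch of declarations using the proposed types is shown below, assuming the extension is enabled; the f16(...) constructors used here are the value conversions proposed in section 3.3, and the names are illustrative only.

```wgsl
// Sketch only: assumes the proposed extension is enabled.
fn demo_fp16_types() {
  var a : f16 = f16(1.0);                  // 16-bit scalar
  var v : vec3<f16> = vec3<f16>(a, a, a);  // 3-component FP16 vector
  var m : mat2x3<f16>;                     // 2 columns x 3 rows, zero-initialized
  m[0] = v;                                // a column has type vec3<f16>
}
```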

3.2. Array and structure with FP16 types

HLSL, MSL, SPIR-V, and GLSL all support defining FP16 members in arrays and structures when FP16 is supported. Therefore, we can enable FP16 types in arrays and structures whenever FP16 scalar, vector and matrix types are enabled.

Using FP16 types in uniform buffers and storage buffers may impose additional memory layout restrictions; please see the corresponding chapters in the proposed design doc (uniform, storage).
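A short sketch of FP16 inside composite types, assuming the extension is enabled; the struct and member names are hypothetical.

```wgsl
// Sketch: FP16 members in arrays and structures.
struct Particle {
  position : vec3<f16>,
  mass : f16,
}

fn demo_fp16_composites() {
  var history : array<vec2<f16>, 4>;   // fixed-size array of FP16 vectors
  var p : Particle;
  p.mass = f16(1.0);
  history[0] = vec2<f16>(p.mass, p.mass);
}
```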

3.3. Type constructor expressions and conversion expressions

This extension proposes scalar, vector and matrix constructor expressions for f16 in WGSL, in the same way as for f32.

We also propose value conversion expressions between f16 and the other scalar types (f32, u32, i32 and bool). Value conversion expressions between matrix types (f16 and f32 matrices) are also introduced by this extension; they did not exist before because f32 was the only matrix component type in WGSL. Please see the appendix for a detailed table.

Bit-casting between f16 and u32 can be done with the packing and unpacking WGSL built-in functions. The most natural bit-cast, between f16 and u16, is not proposed in this extension because we do not have u16 types yet. Bit-casting between f16 and types other than u16 or u32 shall not be allowed in WGSL.
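A sketch of the proposed conversion expressions is shown below. This is hedged: the exact overload set is as proposed above and is not part of current WGSL; the function and parameter names are illustrative.

```wgsl
// Sketch: proposed value conversions to and from f16.
fn demo_fp16_conversions(x : f32, n : i32, m : mat3x3<f32>) {
  let h : f16 = f16(x);                            // f32 -> f16 (may lose precision)
  let g : f16 = f16(n);                            // i32 -> f16
  let back : f32 = f32(h);                         // f16 -> f32
  let hv : vec4<f16> = vec4<f16>(vec4<f32>(x));    // component-wise vector conversion
  let hm : mat3x3<f16> = mat3x3<f16>(m);           // matrix conversion, new in this proposal
}
```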

3.4. Atomic type of FP16

No backend has useful support for atomic FP16 types and/or operations. Therefore, we are unlikely to support atomic types for FP16. The details for each backend are summarized in the table below.

| Backend | Details |
| --- | --- |
| D3D12 | HLSL does not support atomic operations for float16. |
| Vulkan | SPIR-V only supports OpAtomicFAddEXT for 32-bit and 64-bit floating point, requiring the AtomicFloat32AddEXT and AtomicFloat64AddEXT capabilities. The extension SPV_EXT_shader_atomic_float_min_max adds atomic min and max instructions on floating-point numbers, and the extension SPV_EXT_shader_atomic_float16_add proposes atomically adding to 16-bit floating-point numbers in memory; however, these are still internal. The Vulkan extension VK_EXT_shader_atomic_float2 (features) offers 16-bit floating-point atomic operations on buffer and workgroup memory, as well as floating-point atomic minimum and maximum operations on buffer, workgroup, and image memory. However, such features are generally unsupported (reported unsupported on 99.8% of devices). |
| Metal | MSL only supports atomic types of int, uint and bool. |
| OpenGL | OpenGL only supports atomic types of int, uint and bool. |

In addition, OpenGL only provides atomic counters of integer type.

3.5. FP16 literal

In WGSL we have the floating point suffix f for f32. Following the GLSL extension, we propose the suffix hf for f16. The forms of the numeric literals are shown below.

decimal_float_literal: `/((-?[0-9]*\.[0-9]+|-?[0-9]+\.[0-9]*)((e|E)(\+|-)?[0-9]+)?(h?f)?)|(-?[0-9]+(e|E)(\+|-)?[0-9]+(h?f)?)/`

hex_float_literal: `/-?0[xX]((([0-9a-fA-F]*\.[0-9a-fA-F]+|[0-9a-fA-F]+\.[0-9a-fA-F]*)((p|P)(\+|-)?[0-9]+(h?f)?)?)|([0-9a-fA-F]+(p|P)(\+|-)?[0-9]+(h?f)?))/`

Some examples of the proposed FP16 literals are -.23E+8hf, 123e-4hf, 0xABC.DEFhf and 0x1A2BP-3hf.

Note

HLSL and MSL use the suffix h instead of hf for FP16.
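A short sketch of the proposed hf suffix in use; hf is the WGSL-level suffix of this proposal, while the generated HLSL/MSL would use h.

```wgsl
// Sketch: FP16 literals with the proposed `hf` suffix.
fn demo_fp16_literals() {
  let a = 1.5hf;          // f16 literal
  let b = 123e-4hf;       // decimal form with exponent
  let c : f16 = 0.25hf;
}
```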

4. Defining FP16 variables in private, function and workgroup storage class

This extension will enable using FP16 types when defining variables in the private, function and workgroup storage classes. If a backend device is eligible for this extension, we can directly translate variable definitions in these storage classes into the backend's shader code.
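A sketch of FP16 variables in these storage classes, assuming the extension is enabled; the names, binding of the entry point, and workgroup size are illustrative.

```wgsl
// Sketch: FP16 variables in private, workgroup and function storage classes.
var<private> accumulator : f16;
var<workgroup> tile : array<vec4<f16>, 64>;

@compute @workgroup_size(64)
fn demo_fp16_storage_classes(@builtin(local_invocation_index) i : u32) {
  var local : vec4<f16> = tile[i];      // function storage class
  accumulator = accumulator + local.x;
}
```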

5. Using FP16 types as function parameter and result

This extension enables using FP16 types as parameters and results of built-in and user-defined functions, in the same way as f32 types. Some FP16-specific built-in functions should also be added to WGSL. In this section we discuss this usage.

5.1. User defined function

All backends support defining functions with 16-bit floating point parameters and/or return values, as long as they support using FP16 in the function storage class. Therefore, WGSL functions using FP16 as input/output map to backends in the same way as other functions.
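A minimal sketch of a user-defined function taking and returning FP16 values, assuming the extension is enabled; the function name is hypothetical.

```wgsl
// Sketch: f16 as user-defined function parameters and result.
fn half_lerp(a : vec4<f16>, b : vec4<f16>, t : f16) -> vec4<f16> {
  return a + (b - a) * t;   // vector * scalar works as for f32 types
}
```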

5.2. Built-in functions

All WGSL built-in functions that take f32 should support f16 as parameters and results, since most floating point built-in functions have half-precision overloads on all backends.

All data packing and unpacking built-in functions should have overloads that take FP16 types as parameters and results wherever f32 originally appears. Bit-casting between f16 and u32 can be done by these packing and unpacking functions. HLSL appears to have no packing/unpacking functions for FP16, nor a bit-cast between FP16 and uint32, so we may have to implement them manually.
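A sketch of the proposed f16 overloads of existing float built-ins. These overloads are assumptions of this proposal, not current WGSL; the function name is illustrative.

```wgsl
// Sketch: calling existing float built-ins with the proposed f16 overloads.
fn demo_fp16_builtins(v : vec3<f16>) -> f16 {
  let len = length(v);                     // proposed f16 overload of length()
  let c = clamp(len, f16(0.0), f16(1.0));  // proposed f16 overload of clamp()
  return c * dot(v, v);                    // proposed f16 overload of dot()
}
```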

6. Using FP16 types in uniform and storage storage class

Variables in the uniform and storage storage classes serve as the interface between GPU and host memory, and therefore must meet additional requirements. We propose the memory layout requirements for FP16 in this chapter.

6.1. Alignment and Size

The following table should be merged into the WGSL spec table "Alignment and size for host-shareable types".

| WGSL Type | Alignment (bytes) | Size (bytes) |
| --- | --- | --- |
| `f16` | 2 | 2 |
| `vec2<f16>` | 4 | 4 |
| `vec3<f16>` | 8 | 6 |
| `vec4<f16>` | 8 | 8 |
| `mat2x2<f16>` | 4 | 8 |
| `mat2x3<f16>` | 8 | 16 |
| `mat2x4<f16>` | 8 | 16 |
| `mat3x2<f16>` | 4 | 12 |
| `mat3x3<f16>` | 8 | 24 |
| `mat3x4<f16>` | 8 | 24 |
| `mat4x2<f16>` | 4 | 16 |
| `mat4x3<f16>` | 8 | 32 |
| `mat4x4<f16>` | 8 | 32 |

These values match the MSL and Vulkan specs. If the size or alignment of a structure member is not explicitly specified in WGSL code, the listed values should be used. If explicitly specified, they must satisfy the layout constraints for the corresponding storage classes shown below.
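To illustrate the rules above, here is a hypothetical structure and the offsets that would follow from the table when no explicit size or alignment attributes are given; the struct and member names are illustrative.

```wgsl
// Offsets below follow from the alignment/size table above.
struct Material {
  tint : vec3<f16>,       // align 8, size 6 -> offset 0
  roughness : f16,        // align 2, size 2 -> offset 6
  uv_scale : vec2<f16>,   // align 4, size 4 -> offset 8
}
// Struct alignment is 8 (the largest member alignment);
// its size rounds up from 12 to 16.
```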

Note

The Internal Layout of Values in WGSL Spec should be also updated.

6.2. Using FP16 types in uniform storage class

When using FP16 types in the uniform storage class, in addition to meeting the alignment requirements above, the extra buffer layout constraints for uniform buffers must also be satisfied, as shown below. These requirements are identical to the current WGSL spec.

| Host-shareable type S | RequiredAlignOf(S, uniform) |
| --- | --- |
| `i32`, `u32`, `f32`, or `f16` | AlignOf(S) |
| `atomic<T>`, T is `i32` or `u32` | AlignOf(S) |
| `vecN<T>` | AlignOf(S) |
| `matNxM<T>`, T is `f32` or `f16` | AlignOf(S) |
| `array<T, N>` | round(16, AlignOf(S)) |
| `array<T>` | round(16, AlignOf(S)) |
| `struct S` | round(16, AlignOf(S)) |

The matrix matNxM<f16> types should be stored as N distinct vecM<f16> vectors (not an array of vectors) and reconstructed into a matrix by generated code. In this way we work around the differing matrix stride requirements of the backends.

We can emulate this usage on Vulkan devices without native support (uniformAndStorageBuffer16BitAccess) by packing FP16 values into UINT32 in the backend buffer structure and unpacking them in generated backend code whenever they are loaded in WGSL.
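The emulation idea can be sketched in today's WGSL (without f16) using the existing unpack2x16float built-in; the struct, field, and binding names below are hypothetical and only illustrate the packing scheme.

```wgsl
// Sketch: two 16-bit floats stored packed into one u32 in the backend buffer,
// widened to f32 on load with the existing unpack2x16float built-in.
struct PackedPair {
  bits : u32,   // two binary16 values in the low and high 16 bits
}

@group(0) @binding(0) var<uniform> packed_data : PackedPair;

fn load_pair() -> vec2<f32> {
  return unpack2x16float(packed_data.bits);
}
```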

6.3. Using FP16 types in storage storage class

When using FP16 types in the storage storage class, the buffer layout constraints for storage buffers are validated, as shown below.

| Host-shareable type S | RequiredAlignOf(S, storage) |
| --- | --- |
| `i32`, `u32`, `f32`, or `f16` | AlignOf(S) |
| `atomic<T>`, T is `i32` or `u32` | AlignOf(S) |
| `vecN<T>` | AlignOf(S) |
| `matNxM<T>`, T is `f32` or `f16` | AlignOf(S) |
| `array<T, N>` | AlignOf(S) |
| `array<T>` | AlignOf(S) |
| `struct S` | AlignOf(S) |

In effect, this simply requires every value to be aligned to its type's alignment.
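A sketch of a runtime-sized FP16 array in the storage storage class, assuming the extension is enabled; the struct, entry point, and binding numbers are illustrative.

```wgsl
// Sketch: runtime-sized array of f16 in a storage buffer.
struct HalfBuffer {
  values : array<f16>,   // element alignment 2, stride 2
}

@group(0) @binding(0) var<storage, read_write> data : HalfBuffer;

@compute @workgroup_size(64)
fn scale(@builtin(global_invocation_id) gid : vec3<u32>) {
  data.values[gid.x] = data.values[gid.x] * f16(2.0);
}
```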

7. Using FP16 as pipeline input and output

It is possible to use FP16 in user-defined pipeline input/output variables. However, some Vulkan devices that support the other usages in this extension, such as FP16 built-in functions and storage buffers, do not support this usage. For capability reasons, we may suggest not enabling FP16 as pipeline input or output with this extension. However, this usage can also be emulated on Vulkan devices without native support (storageInputOutput16).

7.1. Built-in input/output variable

Built-in floating-point pipeline input and output variables (e.g. position and frag_depth) should be kept as f32.

7.2. User-defined input/output variable

Using user-defined variables of FP16 types as shader inputs and outputs (and intra-shader variables) is only partially supported across backends. MSL, HLSL (when native 16-bit mode is supported) and GLSL (with the AMD_gpu_shader_half_float extension) support this usage, but support among Vulkan devices diverges. For example, Nvidia Vulkan Windows drivers, Intel Linux Mesa drivers, and Google Pixel devices (before Pixel 6) all lack the required Vulkan feature storageInputOutput16. It is possible to emulate FP16 intra-shader variables on devices that do not support storageInputOutput16.
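A sketch of user-defined FP16 input/output variables, assuming the backend supports storageInputOutput16 or the emulation described above is used; the struct, entry point, and location assignments are illustrative. Built-in values such as position stay f32.

```wgsl
// Sketch: FP16 user-defined inter-stage variables.
struct VertexOutput {
  @builtin(position) pos : vec4<f32>,   // built-in outputs remain f32
  @location(0) tint : vec4<f16>,        // user-defined FP16 output
}

@fragment
fn fs_main(@location(0) tint : vec4<f16>) -> @location(0) vec4<f32> {
  return vec4<f32>(tint);               // widen to f32 for the render target
}
```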

8. Capability

Different vendors have different levels of support for FP16-related features, and different backend drivers from the same vendor also differ in their support.

A detailed investigation of device support is presented in the proposed design doc.
