1. Introduction
In this issue we propose an extension that enables FP16, the 16-bit floating point type, in WGSL for the usages proposed below. To keep the issue to a reasonable length, detailed information about each backend's (D3D12, Metal, Vulkan, and OpenGL) support for each usage is omitted; it can be found in the more detailed design doc for this extension.
Although other low-bit-width data types, e.g. INT16 and INT8, may also be supported by backends, we focus on FP16 in this extension.
2. Proposal usages
This proposed extension, tentatively named `fp16-in-shaders-and-storage`, enables certain usages of FP16 in WGSL.
The considered usages of FP16 in WGSL, and each backend's support for them, are summarized in the table below. As a tradeoff between functionality and capability, we propose to enable all usages except atomic FP16, the `uniform` storage class, and pipeline input/output (some Vulkan devices support FP16 types within shaders and storage buffers, but don't support using them in uniform buffers). Usages of the `uniform` storage class and pipeline input/output can also be emulated on devices that don't support them natively. If such emulation is enabled, we can enable all usages except atomic FP16 in this extension, making this extension more general. Otherwise, the `uniform` storage class or pipeline input/output usages may be enabled by separate extensions.
| WGSL usage | Vulkan (SPIR-V) | D3D12 (HLSL) | Metal (MSL) | OpenGL (GLSL) |
| --- | --- | --- | --- | --- |
| Defining FP16 scalar, vector and matrix types | Requires at least one of the Vulkan features listed in this column | Requires SM6.2 or higher, using DXIL, and `Native16BitShaderOpsSupported == true` | Supported | Requires extension `AMD_gpu_shader_half_float` |
| Defining array and structure types with FP16 | Same as above | Same as above | Same as above | Same as above |
| Conversion and bit-casting between FP16 and other types | Same as above | Same as above | Same as above | Same as above |
| Atomic type/operation of FP16 | No support | No support | No support | No support |
| Using variables of FP16 types in `private`, `function` and `workgroup` storage classes | Requires Vulkan feature `shaderFloat16` (overall support rate 39.6%, reported by gpuinfo.org; the same source is used for the other rates in this column) | Requires SM6.2 or higher, using DXC, and `Native16BitShaderOpsSupported == true` | Supported | Requires extension `AMD_gpu_shader_half_float` |
| Using FP16 as parameters and results of user-defined functions | Same as above | Same as above | Same as above | Same as above |
| Using FP16 with built-in functions (arithmetical) | Same as above | Same as above | Same as above | Same as above |
| Using FP16 types in `uniform` storage class | Requires Vulkan feature `uniformAndStorageBuffer16BitAccess` (overall support rate 53.1%) | Same as above | Same as above | Same as above |
| Using FP16 types in `storage` storage class | Requires Vulkan feature `storageBuffer16BitAccess` (overall support rate 58.6%) | Same as above | Same as above | Same as above |
| Using FP16 types in pipeline input/output user-defined variables | Requires Vulkan feature `storageInputOutput16` (overall support rate 24.4%) | Same as above | Same as above | Same as above |
Please see the capability chapter in the proposed design doc for a detailed investigation of each backend's support for these usages.
The following sections will describe these usages.
3. FP16 data types
FP16 data types are supported with different requirements on each backend. If the backend supports the features listed above, we will be able to enable FP16 types in WGSL. We also propose to add value conversion and bit-cast expressions for FP16.
Atomic types and operations for FP16 are not allowed.
3.1. Scalar, vector and matrix types of FP16
Currently WGSL only has a 32-bit floating point scalar type, `f32`; floating point vectors and matrices all have component type `f32`. Here we propose the 16-bit floating point scalar type, `f16`, and corresponding vector and matrix types. If the backend meets the requirements for FP16 support, the proposed WGSL types may be used in WGSL code.
A brief list of the WGSL `f16` types enabled by this extension is shown below. A detailed table of WGSL FP16 types and the corresponding backend types is listed in the appendix of the design doc.
| WGSL Type | Description |
| --- | --- |
| `f16` | The set of 16-bit floating point values of the IEEE-754 binary16 (half precision) format |
| `vecN<f16>` | Vector of N elements of type `f16` |
| `matNxM<f16>` | Matrix of N columns and M rows with component type `f16` |
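As an illustration, a minimal sketch of the proposed types (the `enable` directive name here is hypothetical, not part of this proposal's final naming):

```wgsl
// Hypothetical sketch; the directive name is illustrative only.
enable fp16_in_shaders_and_storage;

fn demo() {
  let s : f16 = 1.5hf;                                // scalar, proposed `hf` suffix
  let v : vec3<f16> = vec3<f16>(0.0hf, 1.0hf, 2.0hf); // 3-component f16 vector
  let m : mat2x2<f16> = mat2x2<f16>(v.xy, v.yz);      // 2x2 f16 matrix from columns
}
```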
3.2. Array and structure with FP16 types
HLSL, MSL, SPIR-V, and GLSL all support defining FP16 members in arrays and structures, provided FP16 is supported. Therefore, we may enable FP16 types in arrays and structures whenever FP16 scalar, vector and matrix types are enabled.
Using FP16 types in uniform buffers and storage buffers may impose additional memory layout restrictions; please see the corresponding chapters in the proposed design doc (uniform, storage).
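For illustration, a sketch of composite types with `f16` members (type and member names are hypothetical; the syntax follows current WGSL struct declarations):

```wgsl
// Hypothetical sketch: f16 inside arrays and structures.
struct Particle {
  position : vec3<f16>,  // f16 vector member
  mass     : f16,        // f16 scalar member
};

var<private> weights : array<f16, 4>;          // fixed-size f16 array
var<private> particles : array<Particle, 16>;  // array of structs with f16 members
```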
3.3. Type constructor expressions and conversion expressions
This extension proposes scalar, vector and matrix constructor expressions for `f16` in WGSL, in the same way as for `f32`.
We also propose value conversion expressions between `f16` and the other scalar types (`f32`, `u32`, `i32` and `bool`). Value conversion expressions between matrix types (`f16` and `f32` matrices) are introduced by this extension; these did not exist before, since `f32` was previously the only matrix component type in WGSL. Please see the appendix for a detailed table.
Bit-casting between `f16` and `u32` can be done via the WGSL packing and unpacking built-in functions. The most natural bit-cast, between `f16` and `u16`, is not proposed in this extension because WGSL does not have `u16` types yet. Bit-casting between `f16` and types other than `u16` or `u32` shall not be allowed in WGSL.
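A sketch of the proposed conversions, assuming the `f16` overloads of `pack2x16float`/`unpack2x16float` discussed in section 5.2 (function names and overloads here are assumptions of this sketch):

```wgsl
// Hypothetical sketch: value conversions and f16<->u32 bit-casting.
fn convert(x : f32, n : i32) -> f16 {
  let a = f16(x);   // f32 -> f16 value conversion
  let b = f16(n);   // i32 -> f16 value conversion
  return a + b;
}

fn bitcast_roundtrip(v : vec2<f16>) -> vec2<f16> {
  // Today these built-ins take/return vec2<f32>; f16 overloads are assumed.
  let bits : u32 = pack2x16float(vec2<f32>(v)); // two f16 values in one u32
  return vec2<f16>(unpack2x16float(bits));
}
```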
3.4. Atomic type of FP16
No backend has useful support for atomic FP16 types and/or operations. Therefore, we are unlikely to support atomic types for FP16. The details for each backend are summarized in the table below.
| Backend | Details |
| --- | --- |
| D3D12 | HLSL does not support atomic operations for float16. |
| Vulkan | SPIR-V only supports `OpAtomicFAddEXT` for 32-bit and 64-bit floating point, requiring the `AtomicFloat32AddEXT` and `AtomicFloat64AddEXT` capabilities. The extension `SPV_EXT_shader_atomic_float_min_max` adds atomic min and max instructions on floating-point numbers, and the extension `SPV_EXT_shader_atomic_float16_add` proposes atomically adding to 16-bit floating-point numbers in memory; however, these are still in internal status. The Vulkan extension `VK_EXT_shader_atomic_float2` (features) suggests 16-bit floating-point atomic operations on buffer and workgroup memory, as well as floating-point atomic minimum and maximum operations on buffer, workgroup, and image memory; however, such features are generally unsupported (reported 99.8% unsupported). |
| Metal | MSL only supports atomic types of `int`, `uint` and `bool`. |
| OpenGL | GLSL only supports atomic types of `int`, `uint` and `bool`, and OpenGL only has atomic counters of integer type. |
3.5. FP16 literal
In WGSL the floating point suffix `f` denotes `f32`. Following the GLSL extension, we propose the suffix `hf` for `f16`. The forms of the numeric literals would be as shown below.
| Type | Form |
| --- | --- |
| `decimal_float_literal` | `/((-?[0-9]*\.[0-9]+\|-?[0-9]+\.[0-9]*)((e\|E)(\+\|-)?[0-9]+)?(h?f)?)\|(-?[0-9]+(e\|E)(\+\|-)?[0-9]+(h?f)?)/` |
| `hex_float_literal` | `/-?0[xX]((([0-9a-fA-F]*\.[0-9a-fA-F]+\|[0-9a-fA-F]+\.[0-9a-fA-F]*)((p\|P)(\+\|-)?[0-9]+(h?f)?)?)\|([0-9a-fA-F]+(p\|P)(\+\|-)?[0-9]+(h?f)?))/` |
Some examples of the proposed FP16 literals are `-.23E+8hf`, `123e-4hf`, `0xABC.DEFhf` and `0x1A2BP-3hf`.
Note
HLSL and MSL use the suffix `h` instead of `hf` for FP16.
4. Defining FP16 variables in `private`, `function` and `workgroup` storage classes
This extension will enable using FP16 types when defining variables in the `private`, `function` and `workgroup` storage classes. If a backend device is eligible for this extension, we can directly translate variable definitions in these storage classes into the backend's shader code.
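A minimal sketch of such declarations (all names are hypothetical):

```wgsl
// Hypothetical sketch: f16 variables in the three storage classes.
var<private> accumulator : f16;              // private storage class
var<workgroup> tile : array<vec4<f16>, 64>;  // workgroup storage class

fn scale(x : f16) -> f16 {
  var local : f16 = x;                       // function storage class
  local = local * 2.0hf;
  return local;
}
```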
5. Using FP16 types as function parameter and result
This extension enables using FP16 types as parameters and results of built-in and user-defined functions, in the same way as FP32 types. Some FP16-specific built-in functions should also be added to WGSL. In this section we discuss this usage.
5.1. User-defined functions
All backends support defining functions with 16-bit floating point parameters and/or return values, as long as they support FP16 in the `function` storage class. Therefore, WGSL functions using `f16` as input/output can map to backends the same as any other functions.
5.2. Built-in functions
All WGSL built-in functions that take `f32` should support `f16` as parameters and results, as most floating point built-in functions have half-precision overloads on all backends.
All data packing and unpacking built-in functions should have overloads that take FP16 types as parameters and results where `f32` originally appears. Bit-casting between `f16` and `u32` can be done with these packing and unpacking functions. HLSL appears to have no packing/unpacking functions for FP16, nor bit-casts between FP16 and uint32, so we may need to implement them manually.
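A sketch of a user-defined function combined with assumed `f16` overloads of existing built-ins (the overloads of `dot` and `clamp` are part of the proposal, not current WGSL):

```wgsl
// Hypothetical sketch: f16 overloads of arithmetic built-ins.
fn lambert(n : vec3<f16>, l : vec3<f16>) -> f16 {
  // dot and clamp are assumed to gain f16 overloads alongside their f32 forms.
  return clamp(dot(n, l), 0.0hf, 1.0hf);
}
```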
6. Using FP16 types in `uniform` and `storage` storage classes
Variables in the `uniform` and `storage` storage classes serve as the interface between GPU and host memory, and therefore have to meet additional requirements. We propose the memory layout requirements for FP16 in this chapter.
6.1. Alignment and Size
The following table should be merged into the WGSL spec table "Alignment and size for host-shareable types".
| WGSL Type | Alignment (bytes) | Size (bytes) |
| --- | --- | --- |
| `f16` | 2 | 2 |
| `vec2<f16>` | 4 | 4 |
| `vec3<f16>` | 8 | 6 |
| `vec4<f16>` | 8 | 8 |
| `mat2x2<f16>` | 4 | 8 |
| `mat2x3<f16>` | 8 | 16 |
| `mat2x4<f16>` | 8 | 16 |
| `mat3x2<f16>` | 4 | 12 |
| `mat3x3<f16>` | 8 | 24 |
| `mat3x4<f16>` | 8 | 24 |
| `mat4x2<f16>` | 4 | 16 |
| `mat4x3<f16>` | 8 | 32 |
| `mat4x4<f16>` | 8 | 32 |
These values match the MSL and Vulkan specs. If the size or alignment of a structure member is not explicitly specified in WGSL code, the listed value is used. If explicitly specified, it must satisfy the layout constraints for the corresponding storage class shown below.
Note
The "Internal Layout of Values" section of the WGSL spec should also be updated.
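To illustrate the table, a host-shareable struct whose implicit member offsets follow the listed alignments and sizes (struct and member names are hypothetical):

```wgsl
// Hypothetical sketch: implicit layout of f16 members per the table above.
struct Params {
  scale : f16,          // offset 0,  size 2,  align 2
  bias  : f16,          // offset 2,  size 2,  align 2
  color : vec4<f16>,    // offset 8,  size 8,  align 8
  m     : mat3x3<f16>,  // offset 16, size 24, align 8
};                      // total size rounds up per struct layout rules
```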
6.2. Using FP16 types in `uniform` storage class
When using FP16 types in the `uniform` storage class, besides meeting the alignment requirements above, the extra buffer layout constraints for uniform buffers must also be satisfied, as shown below. These requirements are identical to the current WGSL spec.
| Host-shareable type S | RequiredAlignOf(S, `uniform`) |
| --- | --- |
| `i32`, `u32`, `f32`, or `f16` | AlignOf(S) |
| `atomic<T>`, T is `i32`, `u32`, or `f32` | AlignOf(S) |
| `vecN<T>` | AlignOf(S) |
| `matNxM<T>`, T is `f32` or `f16` | AlignOf(S) |
| `array<T, N>` | round(16, AlignOf(S)) |
| `array<T>` | round(16, AlignOf(S)) |
| `struct S` | round(16, AlignOf(S)) |
The `matNxM<f16>` matrix types should be stored as N distinct `vecM<f16>` column vectors (not an array of vectors) and reconstructed into a matrix by generated code. In this way we work around the matrix stride requirements of the different backends.
We can emulate this usage on Vulkan devices with no native support for it (`uniformAndStorageBuffer16BitAccess`) by packing FP16 values into UINT32 in the backend buffer structure and unpacking them in the backend when loading in WGSL.
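A sketch of what the u32-packing emulation could look like, expressed as the WGSL equivalent of the generated backend code (all names are illustrative; in practice the packing would be emitted by the WGSL-to-backend translator):

```wgsl
// Hypothetical sketch: emulating a vec4<f16> uniform member via u32 packing.
struct PackedParams {
  color_packed : vec2<u32>,  // stores a logical vec4<f16> as two u32 words
};
@group(0) @binding(0) var<uniform> params : PackedParams;

fn load_color() -> vec4<f16> {
  let lo = unpack2x16float(params.color_packed.x); // first two components
  let hi = unpack2x16float(params.color_packed.y); // last two components
  return vec4<f16>(vec4<f32>(lo, hi));
}
```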
6.3. Using FP16 types in `storage` storage class
When using FP16 types in the `storage` storage class, the buffer layout constraints for storage buffers are validated, as shown below.
| Host-shareable type S | RequiredAlignOf(S, `storage`) |
| --- | --- |
| `i32`, `u32`, `f32`, or `f16` | AlignOf(S) |
| `atomic<T>`, T is `i32`, `u32`, or `f32` | AlignOf(S) |
| `vecN<T>` | AlignOf(S) |
| `matNxM<T>`, T is `f32` or `f16` | AlignOf(S) |
| `array<T, N>` | AlignOf(S) |
| `array<T>` | AlignOf(S) |
| `struct S` | AlignOf(S) |
In fact, this just requires everything to be aligned to its type's alignment.
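A sketch of an f16 storage buffer that satisfies these constraints (binding numbers and names are hypothetical):

```wgsl
// Hypothetical sketch: f16 data in a storage buffer.
struct Samples {
  count : u32,         // offset 0, align 4
  data  : array<f16>,  // runtime-sized; element align/size 2
};
@group(0) @binding(0) var<storage, read_write> samples : Samples;

fn halve(i : u32) {
  samples.data[i] = samples.data[i] * 0.5hf;
}
```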
7. Using FP16 as pipeline input and output
It is possible to use FP16 in user-defined pipeline input/output variables. However, some Vulkan devices that support the other usages in this extension, such as FP16 built-in functions and storage buffers, don't support this usage. For capability reasons, we may suggest not enabling FP16 as pipeline input or output with this extension. However, this usage can also be emulated on Vulkan devices with no native support for it (`storageInputOutput16`).
7.1. Built-in input/output variable
Built-in floating-point pipeline input and output variables (e.g. `position` and `frag_depth`) should be kept as `f32`.
7.2. User-defined input/output variable
Using user-defined variables of FP16 types as shader input and output (and intra-shader variables) is partially supported by backends. MSL, HLSL (when supporting native 16-bit mode) and GLSL (with the `AMD_gpu_shader_half_float` extension) support this usage, but Vulkan devices diverge. E.g., Nvidia Vulkan Windows drivers, Intel Linux Mesa drivers, and Google Pixel (before Pixel 6) all lack the required Vulkan feature `storageInputOutput16`. It is possible to emulate FP16 types as intra-shader variables for devices not supporting `storageInputOutput16`.
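A sketch of f16 user-defined inter-stage variables, with built-in outputs kept as `f32` per section 7.1 (locations and names are illustrative):

```wgsl
// Hypothetical sketch: f16 user-defined pipeline input/output.
struct VSOut {
  @builtin(position) pos : vec4<f32>,  // built-ins stay f32
  @location(0) uv : vec2<f16>,         // user-defined f16 inter-stage variable
};

@vertex
fn vs_main(@location(0) in_pos : vec3<f32>) -> VSOut {
  var out : VSOut;
  out.pos = vec4<f32>(in_pos, 1.0);
  out.uv = vec2<f16>(in_pos.xy);
  return out;
}
```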
8. Capability
Different vendors have different support for FP16-related features, and different backend drivers from the same vendor also differ in their level of support.
A detailed investigation of device support is given in the proposed design doc.