Investigation: Querying Subgroup Support #78

@mehmetoguzderin

Modern SIMT hardware performs better with divergent kernels and offers more localized data sharing than earlier generations, which makes massively parallel algorithms viable in more fields than before. This level of thread grouping used to be mostly implicit, but recent iterations of GPGPU APIs have introduced explicit control over it. Hardware architects call these groups of threads waves, wavefronts, or warps; this article calls them subgroups. The grouping mostly takes place at the hardware level, so a fixed number of threads can coordinate data and execution faster than they could through shared or global coordination. Explicit handling of this fourth grouping level (system, device global, device local, device subgroup) lets algorithm designers fit their algorithms into tighter runtime budgets by applying a deeper, up-front understanding of the problem than implicit handling allows, which fits the aims of modern GPGPU APIs.

Subgroup implementations differ in their capabilities. These capabilities can nevertheless be categorized so that they overlap across implementations, and a query interface for subgroup capabilities lets the developer decide how best to proceed on the given hardware. This article presents the query interfaces of different APIs, a proposed query pseudointerface, and how each API maps to it.

Existing Query Interfaces

DirectX 12 Feature Level 12.0

ID3D12Device has a member function, CheckFeatureSupport, which fills in a D3D12_FEATURE_DATA_D3D12_OPTIONS1 structure when the D3D12_FEATURE Feature argument is set to D3D12_FEATURE_D3D12_OPTIONS1 [0] [1]. The structure has a BOOL WaveOps member that is set to true if the device supports all subgroup operations (this statement needs clarification and/or confirmation). The subgroup size range is provided by the UINT WaveLaneCountMin and UINT WaveLaneCountMax members, which are the inclusive minimum and maximum of the range [2]. Subgroup operations, including quad operations, are supported only in the fragment and compute stages [3].

HRESULT CheckFeatureSupport(
  D3D12_FEATURE Feature,
  void          *pFeatureSupportData,
  UINT          FeatureSupportDataSize
);
typedef struct D3D12_FEATURE_DATA_D3D12_OPTIONS1 {
  BOOL WaveOps;
  UINT WaveLaneCountMin;
  UINT WaveLaneCountMax;
  UINT TotalLaneCount;
  BOOL ExpandedComputeResourceStates;
  BOOL Int64ShaderOps;
} D3D12_FEATURE_DATA_D3D12_OPTIONS1;
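
As a rough illustration of how the fields above could feed a backend-neutral description, here is a minimal C++ sketch. WaveOptions is a hypothetical stand-in for the three relevant D3D12_FEATURE_DATA_D3D12_OPTIONS1 members so that the code compiles without the Windows SDK; SubgroupRange and subgroupRangeFromD3D12 are illustrative names and not part of any API. It also assumes that WaveOps being false means no subgroup support at all, which is the unconfirmed reading noted above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the WaveOps, WaveLaneCountMin and
// WaveLaneCountMax members of D3D12_FEATURE_DATA_D3D12_OPTIONS1.
// With the real headers the values would come from something like:
//   D3D12_FEATURE_DATA_D3D12_OPTIONS1 opts = {};
//   HRESULT hr = device->CheckFeatureSupport(
//       D3D12_FEATURE_D3D12_OPTIONS1, &opts, sizeof(opts));
struct WaveOptions {
    bool     waveOps;
    uint32_t waveLaneCountMin;
    uint32_t waveLaneCountMax;
};

// Illustrative result type: whether subgroups are usable and the
// inclusive size range reported by the device.
struct SubgroupRange {
    bool     supported;
    uint32_t minSize;
    uint32_t maxSize;
};

// Assumption: WaveOps == false is treated as "no subgroup support".
SubgroupRange subgroupRangeFromD3D12(const WaveOptions& o) {
    if (!o.waveOps) {
        return {false, 0, 0};
    }
    // The range is inclusive, so min must not exceed max.
    assert(o.waveLaneCountMin <= o.waveLaneCountMax);
    return {true, o.waveLaneCountMin, o.waveLaneCountMax};
}
```

Keeping the minimum and maximum separate matters here because, unlike the other two APIs, D3D12 can report a range rather than a single size.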

Metal 2.1

MTLDevice has a supportsFeatureSet member function that takes a feature set and returns true if it is supported [4] [5]. Which subgroup operations a feature set supports can be derived directly from the feature set tables [6], or indirectly from the documentation by extracting the OS information from the feature set [7]. A temporary MTLComputePipelineState must be created to get the subgroup size range, which threadExecutionWidth reports as a single value that serves as both the inclusive minimum and maximum of the range [8] [9]. Subgroup operations, including quad operations, are supported only in the fragment and compute stages.

func supportsFeatureSet(_ featureSet: MTLFeatureSet) -> Bool
var threadExecutionWidth: Int { get }

Vulkan 1.1

VkPhysicalDevice is queried with vkGetPhysicalDeviceProperties2, which fills in a VkPhysicalDeviceProperties2 structure. The application chains a VkPhysicalDeviceSubgroupProperties structure into the void* pNext pointer chain of that structure, and the implementation fills it in if implementation details are provided [10] [11] [12]. If implementation details are not provided, subgroup operations are not supported (this statement needs clarification and/or confirmation) [13]. Supported subgroup operations are stated in the VkSubgroupFeatureFlags supportedOperations member of VkPhysicalDeviceSubgroupProperties. The subgroup size range is provided by uint32_t subgroupSize, a single value that serves as both the inclusive minimum and maximum of the range. Subgroup operation support in specific stages is exposed through VkShaderStageFlags supportedStages, with an extra VkBool32 quadOperationsInAllStages member that, if set to false, limits quad operations to the fragment and compute stages.

void vkGetPhysicalDeviceProperties2(
    VkPhysicalDevice                            physicalDevice,
    VkPhysicalDeviceProperties2*                pProperties);
typedef struct VkPhysicalDeviceProperties2 {
    VkStructureType               sType;
    void*                         pNext;
    VkPhysicalDeviceProperties    properties;
} VkPhysicalDeviceProperties2;
typedef struct VkPhysicalDeviceSubgroupProperties {
    VkStructureType           sType;
    void*                     pNext;
    uint32_t                  subgroupSize;
    VkShaderStageFlags        supportedStages;
    VkSubgroupFeatureFlags    supportedOperations;
    VkBool32                  quadOperationsInAllStages;
} VkPhysicalDeviceSubgroupProperties;
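
To make the query and the later mapping more concrete, here is a hedged C++ sketch of translating supportedOperations into the flag layout proposed in the next section. The Vulkan bit values are copied from the Vulkan 1.1 VkSubgroupFeatureFlagBits enum so that the code compiles without the Vulkan headers. The proposed-side values, the namespaces, and the names kFlagMap and toProposedFlags are assumptions made for illustration, including the assumption that QUAD occupies its own bit (0x10).

```cpp
#include <cstdint>

// With the real headers the query would look roughly like:
//   VkPhysicalDeviceSubgroupProperties subgroup = {};
//   subgroup.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SUBGROUP_PROPERTIES;
//   VkPhysicalDeviceProperties2 props2 = {};
//   props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
//   props2.pNext = &subgroup;
//   vkGetPhysicalDeviceProperties2(physicalDevice, &props2);
//   uint32_t vkFlags = subgroup.supportedOperations;

namespace vk_bits {  // values of VkSubgroupFeatureFlagBits
constexpr uint32_t kBasic      = 0x01;
constexpr uint32_t kVote       = 0x02;
constexpr uint32_t kArithmetic = 0x04;
constexpr uint32_t kBallot     = 0x08;
constexpr uint32_t kQuad       = 0x80;
}  // namespace vk_bits

namespace proposed {  // assumed layout of the proposed flags
constexpr uint32_t kBasic      = 0x01;
constexpr uint32_t kVote       = 0x02;
constexpr uint32_t kBallot     = 0x04;
constexpr uint32_t kArithmetic = 0x08;
constexpr uint32_t kQuad       = 0x10;  // assumption: a distinct bit
}  // namespace proposed

struct FlagPair {
    uint32_t vk;
    uint32_t out;
};

constexpr FlagPair kFlagMap[] = {
    {vk_bits::kBasic,      proposed::kBasic},
    {vk_bits::kVote,       proposed::kVote},
    {vk_bits::kArithmetic, proposed::kArithmetic},
    {vk_bits::kBallot,     proposed::kBallot},
    {vk_bits::kQuad,       proposed::kQuad},
};

// Translates Vulkan subgroup feature bits into the proposed layout.
// Vulkan bits without a proposed counterpart (shuffle, shuffle
// relative, clustered) are dropped.
uint32_t toProposedFlags(uint32_t vkFlags) {
    uint32_t out = 0;
    for (const FlagPair& p : kFlagMap) {
        if (vkFlags & p.vk) {
            out |= p.out;
        }
    }
    return out;
}
```

A table is used instead of copying the bits directly because the two layouts order BALLOT and ARITHMETIC differently, so the raw values cannot simply be passed through.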

Proposed Pseudointerface

The proposed pseudointerface most closely resembles the Vulkan API but replaces the single subgroupSize member with an inclusive range. It would be better to bake this structure into WebGPUDevice at WebGPU API initialization, since some backends must create and delete a temporary object to get the required information, which can hurt overall runtime if repeated at every query [13]. All APIs listed above can be mapped to this interface using only documentation and tables. API versions that have no query or implementation support for subgroups can be detected, and their support flags can be set to false.

typedef [EnforceRange] unsigned long GPUSubgroupFeatureFlags;
// Constants for GPUSubgroupFeatureFlags. Each flag occupies a distinct bit
// so that flags can be combined and tested individually.
interface GPUSubgroupFeature {
    const GPUSubgroupFeatureFlags BASIC      = 0x1;
    const GPUSubgroupFeatureFlags VOTE       = 0x2;
    const GPUSubgroupFeatureFlags BALLOT     = 0x4;
    const GPUSubgroupFeatureFlags ARITHMETIC = 0x8;
    const GPUSubgroupFeatureFlags QUAD       = 0x10;
};
interface GPUSubgroupProperties {
    readonly attribute GPUSize32 subgroupSizeInclusiveMinimum;
    readonly attribute GPUSize32 subgroupSizeInclusiveMaximum;
    readonly attribute GPUShaderStageFlags supportedStages;
    readonly attribute GPUSubgroupFeatureFlags supportedOperations;
    readonly attribute boolean quadOperationsInAllStages;
};
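
The suggestion to compute these properties once, rather than on every query, could look something like the following C++ sketch. It is only one possible shape: SubgroupProperties mirrors the proposed interface, while SubgroupPropertyCache, supportsAll, and the callback-based design are hypothetical names and choices made for illustration. The callback stands in for whatever backend-specific work is needed, such as creating and destroying a temporary pipeline state.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Plain C++ mirror of the proposed GPUSubgroupProperties interface.
struct SubgroupProperties {
    uint32_t subgroupSizeInclusiveMinimum;
    uint32_t subgroupSizeInclusiveMaximum;
    uint32_t supportedStages;
    uint32_t supportedOperations;
    bool     quadOperationsInAllStages;
};

// Hypothetical helper: runs the (possibly expensive) backend query at
// most once and returns the stored result on every later call.
class SubgroupPropertyCache {
public:
    explicit SubgroupPropertyCache(std::function<SubgroupProperties()> query)
        : query_(std::move(query)) {}

    const SubgroupProperties& get() {
        if (!cached_) {
            value_ = query_();
            cached_ = true;
        }
        return value_;
    }

private:
    std::function<SubgroupProperties()> query_;
    SubgroupProperties value_{};
    bool cached_ = false;
};

// True only if every bit in `flags` is set in supportedOperations.
// This test is meaningful only when each flag is a distinct bit.
inline bool supportsAll(const SubgroupProperties& p, uint32_t flags) {
    return (p.supportedOperations & flags) == flags;
}
```

A device object could own one such cache and fill it during initialization, so that shader authors checking for, say, BASIC and VOTE support never trigger backend work.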

Coarse Analysis of Device Support

DirectX 12 Feature Level 12.0 Devices

I was not able to find a database with subgroup capability information for devices.

Metal 2.1 Devices

macOS hardware already has extensive support for subgroup features. iOS hardware has only recently gained limited support for subgroup features, but it is enough to improve the performance of applications [7].

Vulkan 1.1 Devices

Desktop Vulkan hardware already has extensive support for subgroup features. Mobile Vulkan hardware does not yet have mainstream Vulkan 1.1 support, but beta information indicates that some subgroup features will be exposed [14].

WHLSL Support

See [15].

References
