Memory barriers investigations #27

@kvark

A memory barrier is an abstraction provided to the graphics API user that allows controlling the internal mutable state of otherwise immutable objects. This state is device/driver dependent and may include:

  • cache flushes
  • memory layout/access changes
  • compression states

Two failure cases (from the AMD GDC 2016 presentation):

  • too many or too broad barriers: bad performance
  • missing barriers: corruption (*)

General information

Metal

Memory barriers are inserted automatically by the runtime/driver.

Direct3D 12

Quote from MSDN:

In Direct3D 11, drivers were required to track this state in the background. This is expensive from a CPU perspective and significantly complicates any sort of multi-threaded design.

Direct3D 12 has 3 kinds of barriers:

  1. State barrier: declares that a resource needs to transition into a different state.
  2. Alias barrier: declares that one alias of a resource is going to be used instead of another.
  3. UAV barrier: waits for all operations on a UAV to finish before another operation on the same UAV starts.
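
The three kinds above can be modelled as a plain enum. This is a minimal sketch, not the D3D12 API itself: `ResourceId`, `Barrier` and `touched` are hypothetical names, and states are carried as raw `u32` bit masks for brevity.

```rust
/// Hypothetical resource handle; a real backend would wrap an ID3D12Resource.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ResourceId(u32);

/// Simplified mirror of the three barrier kinds described above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Barrier {
    /// The resource moves from state `before` to state `after`.
    Transition { resource: ResourceId, before: u32, after: u32 },
    /// `after` is about to be used in place of `before`, which shares its memory.
    Aliasing { before: ResourceId, after: ResourceId },
    /// Earlier UAV accesses to `resource` must finish before later ones start.
    Uav { resource: ResourceId },
}

/// Returns the resources a barrier touches, e.g. for dependency tracking.
fn touched(barrier: &Barrier) -> Vec<ResourceId> {
    match *barrier {
        Barrier::Transition { resource, .. } => vec![resource],
        Barrier::Aliasing { before, after } => vec![before, after],
        Barrier::Uav { resource } => vec![resource],
    }
}
```

Only the alias barrier names two resources; the other two kinds always refer to a single one.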

Resource states

A (sub-)resource can be either in a single read-write state, or in a combination of read-only states. Read-write states are:

  • D3D12_RESOURCE_STATE_RENDER_TARGET
  • D3D12_RESOURCE_STATE_STREAM_OUT
  • D3D12_RESOURCE_STATE_COPY_DEST
  • D3D12_RESOURCE_STATE_UNORDERED_ACCESS

For presentation, a resource must be in D3D12_RESOURCE_STATE_PRESENT state, which is equal to D3D12_RESOURCE_STATE_COMMON.
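
The "one read-write state or a combination of read-only states" rule can be checked with a few bit operations. The sketch below covers only the four read-write states listed above (the real enumeration has more states); the constant values are believed to match `d3d12.h` but should be treated as illustrative, and `is_valid_state` is a hypothetical helper.

```rust
// Bit values believed to match d3d12.h; treat them as illustrative.
const COMMON: u32 = 0x0;
const PRESENT: u32 = 0x0; // same value as COMMON
const RENDER_TARGET: u32 = 0x4;
const UNORDERED_ACCESS: u32 = 0x8;
const PIXEL_SHADER_RESOURCE: u32 = 0x80;
const STREAM_OUT: u32 = 0x100;
const COPY_DEST: u32 = 0x400;
const COPY_SOURCE: u32 = 0x800;

/// The read-write states listed above.
const WRITE_STATES: u32 = RENDER_TARGET | STREAM_OUT | COPY_DEST | UNORDERED_ACCESS;

/// A state is valid if it has no write bit, or consists of exactly one write bit.
fn is_valid_state(state: u32) -> bool {
    let writes = state & WRITE_STATES;
    writes == 0 || (writes.count_ones() == 1 && state == writes)
}
```

For instance, `PIXEL_SHADER_RESOURCE | COPY_SOURCE` passes the check, while `RENDER_TARGET | PIXEL_SHADER_RESOURCE` does not, because it mixes a write state with a read state.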

There are special rules for resource state promotion from the COMMON state and decay into COMMON. These transitions are implicit and specified to incur no GPU cost.

A barrier can also span multiple draw calls (a split barrier):

Split barriers provide hints to the GPU that a resource in state A will next be used in state B sometime later. This gives the GPU the option to optimize the transition workload, possibly reducing or eliminating execution stalls.
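
To illustrate where the two halves of a split barrier would go, here is a small sketch over a simplified command list. The `Cmd` enum and `split_barrier` function are hypothetical; in D3D12 the two halves correspond to barriers recorded with the `D3D12_RESOURCE_BARRIER_FLAG_BEGIN_ONLY` and `D3D12_RESOURCE_BARRIER_FLAG_END_ONLY` flags.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Cmd {
    /// Work that uses the resource in state A.
    UseInA,
    /// Work that does not touch the resource at all.
    Unrelated,
    /// Work that uses the resource in state B.
    UseInB,
    /// First half of a split barrier.
    BeginTransition,
    /// Second half of a split barrier.
    EndTransition,
}

/// Inserts the begin half right after the last `UseInA` and the end half
/// right before the first `UseInB`. Returns the input unchanged if either
/// use is missing or if they are out of order.
fn split_barrier(cmds: &[Cmd]) -> Vec<Cmd> {
    let last_a = cmds.iter().rposition(|c| *c == Cmd::UseInA);
    let first_b = cmds.iter().position(|c| *c == Cmd::UseInB);
    let (a, b) = match (last_a, first_b) {
        (Some(a), Some(b)) if a < b => (a, b),
        _ => return cmds.to_vec(),
    };
    let mut out = Vec::with_capacity(cmds.len() + 2);
    for (i, cmd) in cmds.iter().enumerate() {
        if i == b {
            out.push(Cmd::EndTransition);
        }
        out.push(cmd.clone());
        if i == a {
            out.push(Cmd::BeginTransition);
        }
    }
    out
}
```

The more unrelated work sits between the two halves, the more room the driver has to hide the transition.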

Vulkan

Typical synchronization use-cases

Pipeline barriers

Vulkan has a lot of knobs for configuring barriers in fine detail. For example, the user provides separate masks for the source and destination pipeline stages. By spreading out the source and destination stages, we can give the GPU/driver more time to do the actual transition and minimize stalls.

There are 3 types of barriers:

  1. Global memory barrier: specifies access flags for all memory objects that exist at the time of its execution.
  2. Buffer memory barrier: similar to a global barrier, but limited to a specified sub-range of buffer memory.
  3. Image memory barrier: similar to a global barrier, but limited to a sub-range of image memory. In addition to changing the access flags, image barrier also includes the transition between image layouts.
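
As a rough data model of the above, assuming simplified stand-ins for the Vulkan structures: all type and field names below are hypothetical, masks are raw `u32` values, and `layout_transitions` is an illustrative helper rather than anything Vulkan provides.

```rust
/// Hypothetical sub-ranges; real Vulkan uses VkBuffer/VkImage handles.
#[derive(Clone, Copy, Debug, PartialEq)]
struct BufferRange { buffer: u32, offset: u64, size: u64 }

#[derive(Clone, Copy, Debug, PartialEq)]
struct ImageRange { image: u32, base_mip: u32, mip_count: u32 }

#[derive(Clone, Copy, Debug, PartialEq)]
enum Layout { Undefined, ColorAttachment, ShaderReadOnly, TransferDst }

/// The three barrier types; only the image variant carries layouts.
#[derive(Clone, Copy, Debug, PartialEq)]
enum MemoryBarrier {
    Global { src_access: u32, dst_access: u32 },
    Buffer { src_access: u32, dst_access: u32, range: BufferRange },
    Image {
        src_access: u32,
        dst_access: u32,
        range: ImageRange,
        old_layout: Layout,
        new_layout: Layout,
    },
}

/// One barrier command: separate source/destination stage masks plus barriers.
struct PipelineBarrier {
    src_stages: u32,
    dst_stages: u32,
    barriers: Vec<MemoryBarrier>,
}

/// Counts the barriers that actually change an image layout.
fn layout_transitions(pb: &PipelineBarrier) -> usize {
    pb.barriers
        .iter()
        .filter(|b| match b {
            MemoryBarrier::Image { old_layout, new_layout, .. } => old_layout != new_layout,
            _ => false,
        })
        .count()
}
```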

Similarities with D3D12:

  • explicit barriers
  • both source and destination layout/states are requested, i.e. the driver doesn't track the current layout and expects/trusts the user to insert optimal barriers/transitions
  • image sub-resources carry independent layouts that can be changed individually or in bulk

Vulkan can transition to any layout if the current contents are discarded.

Note: barriers also allow resource transitions between queue families.

Implicit barriers

Barriers are inserted automatically between sub-passes of a render pass, based on the following information:

  • initial and final layouts provided for each attachment
  • a layout provided for each attachment for each sub-pass
  • set of sub-pass dependencies, each specifying what parts of what destination sub-pass stages depend on some results of some stages of a source sub-pass
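
The layout part of that information is enough to derive the transitions for one attachment. The sketch below is a simplification: it ignores sub-pass dependencies (stage and access masks) entirely, and `Attachment` and `implicit_transitions` are hypothetical names.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Layout { Undefined, ColorAttachment, ShaderReadOnly, PresentSrc }

/// Layout information for one attachment (simplified).
struct Attachment {
    initial: Layout,
    /// Layout used in each sub-pass; `None` if the sub-pass does not use it.
    per_subpass: Vec<Option<Layout>>,
    final_layout: Layout,
}

/// Returns the (from, to) layout changes that would have to happen
/// across the render pass, in order.
fn implicit_transitions(att: &Attachment) -> Vec<(Layout, Layout)> {
    let mut current = att.initial;
    let mut out = Vec::new();
    let steps = att
        .per_subpass
        .iter()
        .flatten()
        .copied()
        .chain(std::iter::once(att.final_layout));
    for next in steps {
        if next != current {
            out.push((current, next));
            current = next;
        }
    }
    out
}
```

An attachment rendered to in the first sub-pass, sampled in the third and then presented would go through three transitions, none of which the user records explicitly.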

The Vulkan implementation also automatically inserts layout transitions for read-only layouts of a resource used in multiple sub-passes.

Events

A Vulkan event is a synchronization primitive that can be used to define memory dependencies within a command queue. The arguments of vkCmdWaitEvents are almost identical to those of vkCmdPipelineBarrier. The difference is the ability to move the start of the transition earlier in the queue, similar in concept to D3D12 split barriers.

Analysis

Tips for best performance (for AMD):

  • combine transitions
  • use the most specific state, but also combine states
  • give driver time to handle the transition
    • D3D12: split barriers
    • Vulkan: vkCmdSetEvent + vkCmdWaitEvents

Nitrous engine (Oxide Games, GDC 2017 presentation slide 36) approach:

  • engine is auto-tracking the current state, the user requests new state only
  • extended (from D3D12) resource state range that maps to Vulkan barriers
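
A tracker in that spirit might look like the sketch below. It is not the Nitrous implementation, only an assumption-laden illustration: the user requests a new state, the tracker supplies the old one, redundant requests are dropped, and pending transitions are handed out as one batch (which also follows the "combine transitions" tip). Untracked resources are assumed to start in `Common`.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum State { Common, RenderTarget, ShaderRead, CopyDest }

#[derive(Debug, PartialEq, Eq)]
struct Transition { resource: u32, before: State, after: State }

#[derive(Default)]
struct Tracker {
    /// Last requested state per resource id.
    current: HashMap<u32, State>,
    /// Transitions recorded since the last flush.
    pending: Vec<Transition>,
}

impl Tracker {
    /// The user only names the new state; the old one comes from the tracker.
    fn request(&mut self, resource: u32, after: State) {
        let before = self.current.insert(resource, after).unwrap_or(State::Common);
        if before != after {
            self.pending.push(Transition { resource, before, after });
        }
    }

    /// Hands out all pending transitions so they can be submitted in one call.
    fn flush(&mut self) -> Vec<Transition> {
        std::mem::take(&mut self.pending)
    }
}
```

This version does not merge two pending transitions of the same resource, and it does not look ahead, so it cannot place split barriers on its own.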

Overall, in terms of flexibility/configuration, Vulkan barriers >> D3D12 barriers >> Metal. Developers seem to prefer D3D12 style (TODO: confirm with more developers!).

Translation between APIs

Metal API running on D3D12/Vulkan

We'd have to replicate the analysis already done by D3D11 and Metal drivers, but without low-level access to the command buffer structure.

D3D12/Vulkan API running on Metal

All barriers become no-ops.

D3D12 API running on Vulkan

Given that D3D12 appears to have a smaller API surface and a stricter set of allowed resource states (e.g. no multiple read/write states allowed), it seems possible to emulate (conservatively) D3D12 states on top of Vulkan. Prototyping would probably help here to narrow down the fine details.
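
One conservative mapping could look like the sketch below. The table is an untested guess for illustration, not a verified translation: enum names are hypothetical, the access bit values are arbitrary (only the names follow `VkAccessFlagBits`), and a combination of read states with different preferred layouts simply falls back to the general layout.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum D3dState { Common, RenderTarget, UnorderedAccess, CopyDest, CopySource, PixelShaderResource }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum VkLayout { General, ColorAttachmentOptimal, ShaderReadOnlyOptimal, TransferSrcOptimal, TransferDstOptimal }

// Access bits: arbitrary values, not the real Vulkan ones.
const SHADER_READ: u32 = 1 << 0;
const SHADER_WRITE: u32 = 1 << 1;
const COLOR_READ: u32 = 1 << 2;
const COLOR_WRITE: u32 = 1 << 3;
const TRANSFER_READ: u32 = 1 << 4;
const TRANSFER_WRITE: u32 = 1 << 5;

/// Guessed mapping of a single D3D12 state to (layout, access mask).
fn map_one(state: D3dState) -> (VkLayout, u32) {
    match state {
        D3dState::Common => (VkLayout::General, 0),
        D3dState::RenderTarget => (VkLayout::ColorAttachmentOptimal, COLOR_READ | COLOR_WRITE),
        D3dState::UnorderedAccess => (VkLayout::General, SHADER_READ | SHADER_WRITE),
        D3dState::CopyDest => (VkLayout::TransferDstOptimal, TRANSFER_WRITE),
        D3dState::CopySource => (VkLayout::TransferSrcOptimal, TRANSFER_READ),
        D3dState::PixelShaderResource => (VkLayout::ShaderReadOnlyOptimal, SHADER_READ),
    }
}

/// Maps a combination of states; disagreeing layouts degrade to `General`.
fn to_vulkan(states: &[D3dState]) -> (VkLayout, u32) {
    let mut layout = None;
    let mut access = 0;
    for &s in states {
        let (l, a) = map_one(s);
        access |= a;
        layout = match layout {
            None => Some(l),
            Some(prev) if prev == l => Some(prev),
            Some(_) => Some(VkLayout::General),
        };
    }
    (layout.unwrap_or(VkLayout::General), access)
}
```

Falling back to the general layout is always allowed but may cost performance, which is the price of emulating conservatively.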

Vulkan API on D3D12

This largely comes down to the following aspects:

  • ignoring the given pipeline stages
  • translating (image layout, access mask) -> D3D12 resource state
  • vkCmdWaitEvents should be possible to translate to a D3D12 split barrier, but more experiments are needed to confirm
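
The first two aspects could be sketched as a single function. Everything here is an assumption made for illustration: the mapping table is a guess, `to_d3d12_state` is a hypothetical name, the access bit value is arbitrary, and the D3D12 state values are believed to match `d3d12.h` but should not be relied on. The event-to-split-barrier translation is left out because the text above treats it as unconfirmed.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum VkLayout {
    General,
    ColorAttachmentOptimal,
    ShaderReadOnlyOptimal,
    TransferSrcOptimal,
    TransferDstOptimal,
    PresentSrc,
}

// Arbitrary bit for illustration; not the real VkAccessFlagBits value.
const ACCESS_SHADER_WRITE: u32 = 1 << 0;

// D3D12 state bits, believed to match d3d12.h.
const COMMON: u32 = 0x0;
const RENDER_TARGET: u32 = 0x4;
const UNORDERED_ACCESS: u32 = 0x8;
const NON_PIXEL_SHADER_RESOURCE: u32 = 0x40;
const PIXEL_SHADER_RESOURCE: u32 = 0x80;
const COPY_DEST: u32 = 0x400;
const COPY_SOURCE: u32 = 0x800;

/// Translates (image layout, access mask) to a D3D12 state.
/// The pipeline stage mask is accepted but deliberately ignored.
fn to_d3d12_state(layout: VkLayout, access: u32, _stages: u32) -> u32 {
    match layout {
        VkLayout::ColorAttachmentOptimal => RENDER_TARGET,
        // Without stage information we cannot tell which shader stages
        // read the image, so both read states are requested.
        VkLayout::ShaderReadOnlyOptimal => NON_PIXEL_SHADER_RESOURCE | PIXEL_SHADER_RESOURCE,
        VkLayout::TransferSrcOptimal => COPY_SOURCE,
        VkLayout::TransferDstOptimal => COPY_DEST,
        VkLayout::PresentSrc => COMMON,
        VkLayout::General if access & ACCESS_SHADER_WRITE != 0 => UNORDERED_ACCESS,
        VkLayout::General => COMMON,
    }
}
```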

Security/corruption issues

We've done some research with IHVs on how the hardware behaves when resources are used with a mismatched resource layout/state. For example, an operation expects an image to be in a shader-readable state while the image is not.

The conclusion we got is that in most situations this workload will end up in either a GPU page fault (crash) or visual corruption with user data. It's relatively straightforward for Vulkan to add an extension, and for IHVs to implement it, that would guarantee security of such mismatched layout access. The extension would be defined similarly to robustBufferAccess and specify the exact behavior of the hardware and the lack of access to uninitialized memory not owned by the current instance.

Automation versus Validation

Inserting optimal Vulkan/D3D12 barriers at the right times appears to be a complex task, especially when taking multiple independent queues into consideration. It requires knowing ahead of time how a resource is going to be used in the future, and thus would require us to defer actual command buffer recording until we get more data on how resources are used. This would add more CPU overhead to command recording.

Simply validating that current transitions are sufficient appears to be more feasible, since it doesn't require patching command buffers and that logic can be moved completely into the validation layer.
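
A validation pass of this kind only has to replay the user's transitions and compare them with each use. The sketch below assumes a single queue, whole-resource states, and that every resource starts in `Common`; `Cmd` and `validate` are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum State { Common, RenderTarget, ShaderRead }

enum Cmd {
    /// A user-provided barrier.
    Transition { resource: u32, before: State, after: State },
    /// Any operation that expects the resource to be in a given state.
    Use { resource: u32, expected: State },
}

/// Replays the commands; returns the index of the first command whose
/// assumed state does not match the tracked one.
fn validate(cmds: &[Cmd]) -> Result<(), usize> {
    let mut current: HashMap<u32, State> = HashMap::new();
    for (i, cmd) in cmds.iter().enumerate() {
        match *cmd {
            Cmd::Transition { resource, before, after } => {
                let tracked = current.get(&resource).copied().unwrap_or(State::Common);
                if tracked != before {
                    return Err(i);
                }
                current.insert(resource, after);
            }
            Cmd::Use { resource, expected } => {
                let tracked = current.get(&resource).copied().unwrap_or(State::Common);
                if tracked != expected {
                    return Err(i);
                }
            }
        }
    }
    Ok(())
}
```

Nothing is patched or reordered here: the command stream is read once, which is why this fits into a validation layer.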

Concrete proposals

TODO
