
Create timestamp-query2 for tiled architectures #4995

@greggman


The 'timestamp-query' feature seems unportable (unless this is just a bug).

For example, this example runs 9 render passes, each timed with timestamps. Each pass draws a number of cube instances with no depth test, no culling, and blending enabled, so effectively every pixel has to be rendered. The number of instances per pass varies over time from 1 to 3000.
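A minimal sketch of the bookkeeping behind this kind of per-pass timing, assuming a timestamp query set with 18 slots (2 per pass) whose resolved values have already been read back and converted to plain numbers in nanoseconds. The helper names `timestampWritesForPass` and `passDurationsMicros` are made up for illustration; only the `timestampWrites` member names come from the WebGPU API. This is not the code from the linked example.

```typescript
// Local stand-in for the `timestampWrites` member of a WebGPU
// render/compute pass descriptor (kept generic so the sketch
// compiles without WebGPU type definitions).
interface PassTimestampWrites<Q> {
  querySet: Q;
  beginningOfPassWriteIndex: number;
  endOfPassWriteIndex: number;
}

// Hypothetical helper: pass i writes its begin timestamp to slot 2*i
// and its end timestamp to slot 2*i + 1 of the query set.
function timestampWritesForPass<Q>(
  querySet: Q,
  passIndex: number,
): PassTimestampWrites<Q> {
  return {
    querySet,
    beginningOfPassWriteIndex: passIndex * 2,
    endOfPassWriteIndex: passIndex * 2 + 1,
  };
}

// Hypothetical helper: turn resolved timestamps (nanoseconds, laid out
// as begin/end pairs) into one duration per pass, in microseconds.
// Assumption: values were already converted from the resolved
// BigUint64Array to numbers; for very large timestamps, subtract
// before converting to avoid losing precision.
function passDurationsMicros(timestampsNs: ArrayLike<number>): number[] {
  const durations: number[] = [];
  for (let i = 0; i + 1 < timestampsNs.length; i += 2) {
    durations.push((timestampsNs[i + 1] - timestampsNs[i]) / 1000);
  }
  return durations;
}
```

Each pass descriptor would carry `timestampWrites: timestampWritesForPass(querySet, i)`, and the listings in this issue correspond to printing one such duration per pass next to its instance count.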

The expectation is that fewer instances take less time and more instances take more time. On Windows 11 with NVIDIA, that's what I see:

0:   3015µs numInst:  1646
1:    655µs numInst:   323
2:     66µs numInst:    82
3:   1376µs numInst:  1146
4:   3146µs numInst:  2535
5:   3867µs numInst:  2973
6:   3736µs numInst:  2057
7:   1376µs numInst:   629
8:     66µs numInst:     2

But on an M1 Mac that's not what I see. Instead, the reported times keep climbing, and the last passes take the most time regardless of their instance count:

0:   1835µs numInst:  1689
1:   2294µs numInst:   350
2:   2294µs numInst:    69
3:   3211µs numInst:  1104
4:   5439µs numInst:  2503
5:   8258µs numInst:  2981
6:  10420µs numInst:  2097
7:  11207µs numInst:   665
8:  11010µs numInst:     1

I'm guessing the issue is that on M1 (and other tiled hardware?) the passes are being combined into one pass, so each pass's timing has the same start time but a different end time, or something along those lines.
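One way to probe this guess is to look at the raw begin/end timestamps rather than only the durations. The sketch below assumes the timestamps have been read back as plain numbers in nanoseconds; `PassInterval`, `countOverlaps`, and `shareStart` are hypothetical names, and nothing here is confirmed behavior of any particular GPU or driver.

```typescript
// Begin/end timestamps of one pass, in nanoseconds (assumed to be
// already converted to plain numbers).
interface PassInterval {
  startNs: number;
  endNs: number;
}

// Counts passes whose begin timestamp is earlier than the previous
// pass's end timestamp. Passes that execute strictly one after
// another should give 0.
function countOverlaps(passes: PassInterval[]): number {
  let overlaps = 0;
  for (let i = 1; i < passes.length; i++) {
    if (passes[i].startNs < passes[i - 1].endNs) overlaps++;
  }
  return overlaps;
}

// True when there are at least two passes and all of them report the
// same begin timestamp, within `toleranceNs`.
function shareStart(passes: PassInterval[], toleranceNs: number = 0): boolean {
  if (passes.length < 2) return false;
  for (let i = 1; i < passes.length; i++) {
    if (Math.abs(passes[i].startNs - passes[0].startNs) > toleranceNs) {
      return false;
    }
  }
  return true;
}
```

If the passes really were merged as guessed, `shareStart` would be expected to return true and `countOverlaps` to equal the number of passes minus one; on hardware that times each pass separately, both checks should come back negative.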

This makes 'timestamp-query' problematic on these devices. You need to somehow guess what kind of hardware you're on and then adjust your code accordingly. Should 'timestamp-query' work the same everywhere it's available? Assuming it cannot work the same everywhere, should it be unavailable on devices where it behaves like this? And, if possible, should a different 'timestamp-query-xxx' feature be specified for these devices? That way, the developer can at least know how to interpret the results.

A confounding issue is compute. At least in my tests, 'timestamp-query' with compute passes behaves as expected on these devices (a pass doing more work takes more time), example. So maybe we'd need to split the feature into 'timestamp-query-compute' and 'timestamp-query-renderpass-combined' for devices that can't report separate timings for render passes?

In other words, devices where each compute and render pass gets its own timestamps advertise 'timestamp-query'. Devices where render passes are combined and compute passes are not advertise 'timestamp-query-renderpass-combined' and 'timestamp-query-compute'.
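As a sketch of how an application might branch on such a split: only 'timestamp-query' exists in WebGPU today, while 'timestamp-query-renderpass-combined' and 'timestamp-query-compute' are the names proposed in this issue and are purely hypothetical. The `timingSupport` function and `TimingSupport` type are likewise made up for illustration.

```typescript
// What the application can rely on when interpreting timestamps.
interface TimingSupport {
  renderPerPass: boolean;  // each render pass gets its own begin/end time
  renderCombined: boolean; // render pass timings may be merged
  compute: boolean;        // compute passes are timed individually
}

// `features` is anything with a set-like `has`, such as
// `adapter.features` in WebGPU. Only 'timestamp-query' is a real
// feature name; the other two are the hypothetical names proposed in
// this issue and are not part of any specification.
function timingSupport(features: { has(name: string): boolean }): TimingSupport {
  if (features.has("timestamp-query")) {
    return { renderPerPass: true, renderCombined: false, compute: true };
  }
  return {
    renderPerPass: false,
    renderCombined: features.has("timestamp-query-renderpass-combined"),
    compute: features.has("timestamp-query-compute"),
  };
}
```

With this shape, code that profiles individual render passes would check `renderPerPass` and fall back to timing the whole frame when only `renderCombined` is set.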

Or maybe there are other solutions.

I think some might suggest two other solutions:

  • Ignore this issue: timing isn't important.

  • Update the spec to say that how this works is even more undefined.

    Note: the spec already says the values are implementation-defined. Maybe that's enough. It seems strange to have such a feature, though. If it's going to stay that way, maybe it should be moved to a developer-only feature, as it's not portable.
