Add the code for vello_api #827

Merged: 11 commits merged into main from the cpu_sparse branch, Mar 3, 2025

Conversation

@LaurenzV (Contributor) commented Mar 3, 2025:

It's possible I'll later realize that something else is missing, but I think for now this should cover most of what is needed for that crate, based on my current version. One thing that will need to be added later is the render context trait.

@tomcur (Member) left a comment:

These types make sense as a first merge to get some code in, but I'd expect most of these types to go into sparse_strips/vello_common. I've added two notes inline, based on the assumption that sparse_strips/vello_api should only contain types intended to be used by consumers of the renderer.

Comment on lines +6 to +26
#[derive(Copy, Clone, Debug)]
/// The execution mode used for the rendering process.
pub enum ExecutionMode {
/// Only use scalar execution. This is recommended if you want to have
/// consistent results across different platforms and want to avoid unsafe code,
/// and is the only option if you disabled the `simd` feature. Performance will be
/// worse, though.
Scalar,
/// Select the best execution mode according to what is available on the host system.
/// This is the recommended option for highest performance.
#[cfg(feature = "simd")]
Auto,
/// Force the usage of neon SIMD instructions. This will lead to panics in case
/// the CPU doesn't support the target feature `neon`.
#[cfg(all(target_arch = "aarch64", feature = "simd"))]
Neon,
/// Force the usage of AVX2 SIMD instructions. This will lead to panics in case
/// the CPU doesn't support the target features `avx2` and `fma`.
#[cfg(all(target_arch = "x86_64", feature = "simd"))]
Avx2,
}
Member:

One alternative would be to lift this to the type-level to allow for free branching due to monomorphization, but perhaps that should be an implementation detail and not be exposed on the API. On the other hand, if we ever get something like struct target features, the consumer could guide autovectorization through monomorphization.

That said, I'd be happy to land this as-is and revisit when we have a clearer picture.
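
A minimal sketch of the type-level idea described above, with hypothetical trait and kernel names (not code from the PR): each mode becomes a zero-sized marker type, kernels are generic over it, and monomorphization removes the mode branch from the hot loop.

```rust
// Hypothetical sketch: each execution mode is a zero-sized marker type and
// kernels are generic over it, so the compiler emits one monomorphized copy
// per mode and no mode branch remains inside the hot loop.
pub trait Execution {
    fn fill_span(dst: &mut [u8], value: u8);
}

pub struct Scalar;

impl Execution for Scalar {
    fn fill_span(dst: &mut [u8], value: u8) {
        for px in dst.iter_mut() {
            *px = value;
        }
    }
}

#[cfg(all(target_arch = "aarch64", feature = "simd"))]
pub struct Neon;
// An `impl Execution for Neon` would use `core::arch::aarch64` intrinsics.

// Callers pick the specialization statically:
pub fn render<E: Execution>(dst: &mut [u8]) {
    E::fill_span(dst, 0xFF);
}
```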

@LaurenzV (Contributor, Author):

If I understand you correctly, that's what already happens in my code (https://github.com/LaurenzV/cpu-sparse-experiments/blob/main/crates/sparse_primitives/src/lib.rs); the main reason for additionally having it as an enum is that you can select a mode at runtime, which is a nice feature to have IMO. But yes, I think those details can be discussed at a later point. :D
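
For the runtime half of this, a sketch of how an Auto-style choice might resolve through feature detection; the helper name is hypothetical, and the enum mirrors the one quoted above.

```rust
#[derive(Copy, Clone, Debug)]
pub enum ExecutionMode {
    Scalar,
    #[cfg(all(target_arch = "aarch64", feature = "simd"))]
    Neon,
    #[cfg(all(target_arch = "x86_64", feature = "simd"))]
    Avx2,
}

// Hypothetical helper: resolve the best mode available on the host CPU.
#[cfg(feature = "simd")]
pub fn detect_best_mode() -> ExecutionMode {
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return ExecutionMode::Neon;
        }
    }
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2")
            && std::arch::is_x86_feature_detected!("fma")
        {
            return ExecutionMode::Avx2;
        }
    }
    ExecutionMode::Scalar
}
```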

@tomcur (Member) left a comment:

One bikeshed regarding the peniko import.

@tomcur (Member) left a comment:

LGTM, the exact module structure may be something to be revisited when we have a clearer picture, but perhaps concrete renderer implementations just re-export the types they need.

@LaurenzV added this pull request to the merge queue Mar 3, 2025
Merged via the queue into main with commit 730b4f1 Mar 3, 2025
17 checks passed
@LaurenzV deleted the cpu_sparse branch March 3, 2025 16:14
@@ -86,6 +86,7 @@ clippy.wildcard_dependencies = "warn"

[workspace.dependencies]
vello = { version = "0.4.0", path = "vello" }
vello_api = { path = "sparse_strips/vello_api" }
Contributor:

[nit]: Maybe I’m wrong, but I see this change as a sign that vello_api is now considered stable or at least working well enough to be fully integrated into the workspace, rather than remaining a separate experimental component. It suggests that vello_api is now a fundamental part of the project and can be used consistently across different crates.

@LaurenzV (Contributor, Author):

Hmm, I wouldn't see it that way; the only reason I added it here is so that everywhere else we use it, we can just write vello_api = { workspace = true } and avoid duplicating the path to the crate. I think everyone who works with this repo is aware that this is in-progress and experimental, so I wouldn't worry about it too much, especially since we aren't going to publish it.
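
For reference, the pattern being described: with the path declared once under [workspace.dependencies], a member crate's manifest can inherit it. The member path below is illustrative.

```toml
# e.g. in sparse_strips/vello_cpu/Cargo.toml
[dependencies]
vello_api = { workspace = true }
```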

Member:

I don't think adding a crate to the workspace signals readiness, but an argument could be made for ordering the sparse strip crates after the main Vello crates. A note could then be added stating they're the experimental sparse strip crates.

// a `Pattern` type.
/// A paint used for filling or stroking paths.
#[derive(Debug, Clone)]
pub enum Paint {
Contributor:

Can this be used only internally, or is it also available for external consumers?

My initial thought about vello_api was that it would act as an entry point for external usage, kind of like how wgpu is structured. As I understand it, the public API is the wgpu crate itself, but the real core is in wgpu-core, which in turn uses wgpu-hal; wgpu-hal abstracts over different graphics APIs with backends for Vulkan, Metal, Direct3D, and OpenGL.

I was imagining something similar here — where consumers of vello_api could use a single API and choose the rendering path dynamically, depending on the device. For example, they could switch between a CPU renderer, a hybrid CPU/GPU mode, or maybe even enable compute if the device supports it.

With that in mind, I think it’d be great to add an example showcasing how to use it later. Also, adding some common types and structures to vello_common for now seems like a safe move — we can always upstream them into vello_api if they prove useful for consumers.

That said, at this stage, vello_api doesn’t seem critical yet. We can probably build and refine it after merging the two implementations we have in progress.

Curious to hear your thoughts — maybe we can chat about this during Office Hours?

Contributor:

> For example, they could switch between a CPU renderer, a hybrid CPU/GPU mode, or maybe even enable compute if the device supports it.

Would this be a runtime switch or a build-time feature? Runtime determination could increase bundle sizes.

Contributor:

For the web target, I believe it would be better to introduce a build-time feature, such as cpu, hybrid, etc., to prevent unnecessary increases in bundle size. This is something we’ll need to work out.

@LaurenzV (Contributor, Author):

> I was imagining something similar here — where consumers of vello_api could use a single API and choose the rendering path dynamically, depending on the device. For example, they could switch between a CPU renderer, a hybrid CPU/GPU mode, or maybe even enable compute if the device supports it.

I think that will already be possible with the RenderContext trait, no? Since both renderers will implement that trait, you can dynamically choose whether you want to use the CPU-only version or the hybrid version.

> That said, at this stage, vello_api doesn't seem critical yet. We can probably build and refine it after merging the two implementations we have in progress.

Yeah, as I mentioned, to me it's still not 100% clear what the purpose of vello_api will be (I've never used wgpu, so unfortunately I can't really comment on that either^^), but I think we can figure it out as we go.

> Would this be a runtime switch or a build-time feature? Runtime determination could increase bundle sizes.

So, for vello_cpu I would definitely like to have support for runtime detection, so that you can have a portable binary with the best possible performance regardless of which CPU you run on. With that said, to address your concerns:

  1. I would be open to adding additional feature flags so that you can disable specific instruction sets you don't want to use at build time, if you think that's better. Then you can have the best of both worlds.
  2. My understanding was that for vello_hybrid you are mostly targeting WASM? In which case this wouldn't be relevant for you anyway, because the other SIMD kernels will only get compiled for x86/aarch64. Or am I missing something?
  3. Since in vello_common all of the SIMD stuff is behind feature flags, it is possible to build vello_cpu with runtime detection in mind, while vello_hybrid can combine the code in a way so that everything happens at build time, so I don't think there should be any big issues. :)
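
A hedged sketch of the dynamic choice described above, assuming a hypothetical shape for the not-yet-added RenderContext trait (method and type names are illustrative, not from the PR):

```rust
// Both renderers implement one trait, so a consumer can pick the backend
// at runtime behind a trait object.
pub trait RenderContext {
    fn fill_rect(&mut self, x: u16, y: u16, width: u16, height: u16);
    fn flush(&mut self);
}

pub struct CpuRenderer;
pub struct HybridRenderer;

impl RenderContext for CpuRenderer {
    fn fill_rect(&mut self, _x: u16, _y: u16, _width: u16, _height: u16) {
        // CPU-only rasterization would happen here.
    }
    fn flush(&mut self) {}
}

impl RenderContext for HybridRenderer {
    fn fill_rect(&mut self, _x: u16, _y: u16, _width: u16, _height: u16) {
        // CPU strip generation plus GPU compositing would happen here.
    }
    fn flush(&mut self) {}
}

// Chosen dynamically, e.g. depending on whether a GPU device is available.
pub fn make_renderer(gpu_available: bool) -> Box<dyn RenderContext> {
    if gpu_available {
        Box::new(HybridRenderer)
    } else {
        Box::new(CpuRenderer)
    }
}
```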

Member:

> So, for vello_cpu I would definitely like to have support for runtime detection, so that you can have a portable binary with the best possible performance regardless of which CPU you run on.

By lifting the CPU feature sets to the type level, the compiler is able to remove unused specializations after monomorphization, even for explicit branches like checking TypeId::of. That could help reduce the burden of feature-flag sprinkling a bit, by not requiring every individual specialization to be feature-flagged.
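
A small sketch of the TypeId pattern being referred to (names illustrative): per monomorphization the comparison is a constant, so the untaken branch is removed without feature-flagging the specialization itself.

```rust
use std::any::TypeId;

struct Scalar;
#[cfg(all(target_arch = "aarch64", feature = "simd"))]
struct Neon;

fn fill<S: 'static>(dst: &mut [u8], value: u8) {
    #[cfg(all(target_arch = "aarch64", feature = "simd"))]
    {
        // Constant-folded per monomorphization: dead code for `S = Scalar`.
        if TypeId::of::<S>() == TypeId::of::<Neon>() {
            // A Neon-specialized loop would go here.
            dst.fill(value);
            return;
        }
    }
    for px in dst.iter_mut() {
        *px = value;
    }
}
```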

Member:

I'm not actually sure what this merged content meaningfully represents as a grouping. Almost all of this should probably be in Vello Common?

Certainly the SIMD stuff seems a bit out-of-place in this crate.

Member:

There seemingly were some different ideas on what exactly vello_api should be. If it's types that are public and common to both vello_cpu and vello_hybrid, then this makes sense, but if vello_api does e.g. dynamic dispatch to CPU or GPU, then the relationship flips, and all of this should indeed be in vello_common.

@DJMcNab (Member) commented Mar 4, 2025:

In my mind, the crate which does the dynamic dispatch would be Vello, probably.
Vello would probably only use the CPU mode if:

  1. You ask it to directly; OR
  2. The adapter available is detected to be a CPU fallback and you don't ask to use it anyway; OR
  3. You don't provide a GPU (which is equivalent to 1, really). That is, if there's no wgpu adapter available, I guess
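
A hedged sketch of check 2, assuming wgpu's adapter info is used to spot a software fallback (software rasterizers such as llvmpipe or WARP report a CPU device type):

```rust
// Hypothetical helper: true when the adapter is a software rasterizer
// rather than a real GPU, in which case the CPU renderer may be preferable.
fn is_cpu_fallback(adapter: &wgpu::Adapter) -> bool {
    adapter.get_info().device_type == wgpu::DeviceType::Cpu
}
```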

@LaurenzV (Contributor, Author):

> In my mind, the crate which does the dynamic dispatch would be Vello, probably.

How would that work if you use vello_cpu directly, though? I was under the assumption that those crates can be used independently, no? Or do you think in this case it's the responsibility of the user to do the dispatching themselves?

Contributor:

I agree with @DJMcNab on deferring the use of vello_api.

> My baseline assumption is that Vello API would be the equivalent of Piet, and is something we've decided to defer entirely:

It looks like there are different understandings of that crate. How about we discuss it later — either in Zulip or during Office Hours? For now, what do you think about using vello_common instead of vello_api?

/// Select the best execution mode according to what is available on the host system.
/// This is the recommended option for highest performance.
#[cfg(feature = "simd")]
Auto,
Contributor:

Do I understand correctly that WebAssembly has its own wasm32-simd128 instruction set (currently in beta), and that it is not yet supported in the current implementation?

@LaurenzV (Contributor, Author) commented Mar 4, 2025:

Yes, I haven't written any SIMD code for that yet, but it's definitely planned. It probably won't be a focus of my thesis, though (unless I implement everything else and still have time).
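
For reference, a sketch of how a wasm kernel could be gated once it exists, assuming the same cfg scheme as the other SIMD paths; wasm builds would need RUSTFLAGS="-C target-feature=+simd128".

```rust
// Illustrative only; the intrinsic below is stable since Rust 1.54.
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
fn splat_ff() -> core::arch::wasm32::v128 {
    core::arch::wasm32::u8x16_splat(0xFF)
}
```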

Member:

Can you clarify which part of this is in beta? caniuse indicates that it's baseline, and the instructions have a stabilisation Rustc version of 1.54. E.g. v128.

@LaurenzV (Contributor, Author):

I don’t think it’s beta, at least tiny-skia uses WASM intrinsics and works fine on stable, AFAIK?

Contributor:

Oh, yeah, sorry! I shouldn't have written "beta". What I meant is that there's still a large group of users on Safari < 16.4, which doesn't support WASM SIMD.


#[derive(Copy, Clone, Debug)]
/// The execution mode used for the rendering process.
pub enum ExecutionMode {
Member:

This definitely needs to be non_exhaustive
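
That would look like the sketch below; downstream crates then have to match with a wildcard arm, so adding SIMD variants later is not a breaking change.

```rust
#[non_exhaustive]
#[derive(Copy, Clone, Debug)]
pub enum ExecutionMode {
    Scalar,
    // ... SIMD variants as above ...
}
```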

Scalar,
/// Select the best execution mode according to what is available on the host system.
/// This is the recommended option for highest performance.
#[cfg(feature = "simd")]
Member:

I don't think this should be feature flagged; we should just always have this be the default

Comment on lines +28 to +40
#[cfg(feature = "simd")]
impl Default for ExecutionMode {
fn default() -> Self {
Self::Auto
}
}

#[cfg(not(feature = "simd"))]
impl Default for ExecutionMode {
fn default() -> Self {
Self::Scalar
}
}
Member:

I'm 90% sure that these could use the #[default] attribute with cfg_attr
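
A sketch of what that could look like: derive Default once and select the default variant with cfg_attr (cfg_attr expands before the derive runs, so exactly one variant carries #[default] in each configuration).

```rust
#[derive(Copy, Clone, Debug, Default)]
pub enum ExecutionMode {
    // Scalar is the default when the `simd` feature is disabled.
    #[cfg_attr(not(feature = "simd"), default)]
    Scalar,
    // Auto exists (and is the default) only with the `simd` feature.
    #[cfg(feature = "simd")]
    #[default]
    Auto,
}
```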
