Add the code for vello_api #827

Merged: 11 commits merged into main from the cpu_sparse branch, Mar 3, 2025

Conversation

@LaurenzV (Contributor) commented Mar 3, 2025:

It's possible I'll later realize that something else is missing, but I think for now this should cover most of what is needed for that crate, based on my current version. One thing that will need to be added later is the render context trait.

@tomcur (Member) left a comment:

These types make sense as a first merge to get some code in, but I'd expect most of these types to go into sparse_strips/vello_common. I've added two notes inline, based on the assumption that sparse_strips/vello_api should only contain types intended to be used by consumers of the renderer.

Comment on lines +6 to +26
#[derive(Copy, Clone, Debug)]
/// The execution mode used for the rendering process.
pub enum ExecutionMode {
/// Only use scalar execution. This is recommended if you want to have
/// consistent results across different platforms and want to avoid unsafe code,
/// and is the only option if you disabled the `simd` feature. Performance will be
/// worse, though.
Scalar,
/// Select the best execution mode according to what is available on the host system.
/// This is the recommended option for highest performance.
#[cfg(feature = "simd")]
Auto,
/// Force the usage of neon SIMD instructions. This will lead to panics in case
/// the CPU doesn't support the target feature `neon`.
#[cfg(all(target_arch = "aarch64", feature = "simd"))]
Neon,
/// Force the usage of AVX2 SIMD instructions. This will lead to panics in case
/// the CPU doesn't support the target features `avx2` and `fma`.
#[cfg(all(target_arch = "x86_64", feature = "simd"))]
Avx2,
}
Member:

One alternative would be to lift this to the type-level to allow for free branching due to monomorphization, but perhaps that should be an implementation detail and not be exposed on the API. On the other hand, if we ever get something like struct target features, the consumer could guide autovectorization through monomorphization.

That said, I'd be happy to land this as-is and revisit when we have a clearer picture.
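
A minimal sketch of the type-level idea described above, with hypothetical trait and kernel names (not code from the PR): each mode becomes a zero-sized marker type, kernels are generic over it, and monomorphization removes the mode branch from the hot loop.

```rust
// Hypothetical sketch: each execution mode is a zero-sized marker type and
// kernels are generic over it, so the compiler emits one monomorphized copy
// per mode and no mode branch remains inside the hot loop.
pub trait Execution {
    fn fill_span(dst: &mut [u8], value: u8);
}

pub struct Scalar;

impl Execution for Scalar {
    fn fill_span(dst: &mut [u8], value: u8) {
        for px in dst.iter_mut() {
            *px = value;
        }
    }
}

#[cfg(all(target_arch = "aarch64", feature = "simd"))]
pub struct Neon;
// An `impl Execution for Neon` would use `core::arch::aarch64` intrinsics.

// Callers pick the specialization statically:
pub fn render<E: Execution>(dst: &mut [u8]) {
    E::fill_span(dst, 0xFF);
}
```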

@LaurenzV (Contributor, Author):

If I understand you correctly, that's what already happens in my code (https://github.com/LaurenzV/cpu-sparse-experiments/blob/main/crates/sparse_primitives/src/lib.rs); the main reason for additionally having it as an enum is that you can select a mode at runtime, which is a nice feature to have IMO. But yes, I think those details can be discussed at a later point. :D
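
For the runtime half of this, a sketch of how an Auto-style choice might resolve through feature detection; the helper name is hypothetical, and the enum mirrors the one quoted above.

```rust
#[derive(Copy, Clone, Debug)]
pub enum ExecutionMode {
    Scalar,
    #[cfg(all(target_arch = "aarch64", feature = "simd"))]
    Neon,
    #[cfg(all(target_arch = "x86_64", feature = "simd"))]
    Avx2,
}

// Hypothetical helper: resolve the best mode available on the host CPU.
#[cfg(feature = "simd")]
pub fn detect_best_mode() -> ExecutionMode {
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return ExecutionMode::Neon;
        }
    }
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2")
            && std::arch::is_x86_feature_detected!("fma")
        {
            return ExecutionMode::Avx2;
        }
    }
    ExecutionMode::Scalar
}
```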

@tomcur (Member) left a comment:

One bikeshed regarding the peniko import.

@tomcur (Member) left a comment:

LGTM, the exact module structure may be something to be revisited when we have a clearer picture, but perhaps concrete renderer implementations just re-export the types they need.

@LaurenzV added this pull request to the merge queue Mar 3, 2025
Merged via the queue into main with commit 730b4f1 Mar 3, 2025
17 checks passed
@LaurenzV deleted the cpu_sparse branch March 3, 2025 16:14
@@ -86,6 +86,7 @@ clippy.wildcard_dependencies = "warn"

[workspace.dependencies]
vello = { version = "0.4.0", path = "vello" }
vello_api = { path = "sparse_strips/vello_api" }
Contributor:

[nit]: Maybe I’m wrong, but I see this change as a sign that vello_api is now considered stable or at least working well enough to be fully integrated into the workspace, rather than remaining a separate experimental component. It suggests that vello_api is now a fundamental part of the project and can be used consistently across different crates.

@LaurenzV (Contributor, Author):

Hmm, I wouldn't see it that way; the only reason I added it here is so that everywhere else we use it, we can just write vello_api = { workspace = true } and avoid duplicating the path to the crate. I think everyone who works with this repo is aware that this is in-progress and experimental, so I wouldn't worry about it too much, especially since we aren't going to publish it.
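
For reference, the pattern being described: with the path declared once under [workspace.dependencies], a member crate's manifest can inherit it. The member path below is illustrative.

```toml
# e.g. in sparse_strips/vello_cpu/Cargo.toml
[dependencies]
vello_api = { workspace = true }
```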

Member:

I don't think adding a crate to the workspace signals readiness, but an argument could be made for ordering the sparse strip crates after the main Vello crates. A note could then be added stating they're the experimental sparse strip crates.

// a `Pattern` type.
/// A paint used for filling or stroking paths.
#[derive(Debug, Clone)]
pub enum Paint {
Contributor:

Can this be used only internally, or is it also available for external consumers?

My initial thought about vello_api was that it would act as an entry point for external usage, kind of like how wgpu is structured. As I understand it, the public API is the wgpu crate itself, but the real core is in wgpu-core, which in turn uses wgpu-hal; wgpu-hal abstracts over different graphics APIs with backends for Vulkan, Metal, Direct3D, and OpenGL.

I was imagining something similar here — where consumers of vello_api could use a single API and choose the rendering path dynamically, depending on the device. For example, they could switch between a CPU renderer, a hybrid CPU/GPU mode, or maybe even enable compute if the device supports it.

With that in mind, I think it’d be great to add an example showcasing how to use it later. Also, adding some common types and structures to vello_common for now seems like a safe move — we can always upstream them into vello_api if they prove useful for consumers.

That said, at this stage, vello_api doesn’t seem critical yet. We can probably build and refine it after merging the two implementations we have in progress.

Curious to hear your thoughts — maybe we can chat about this during Office Hours?

Contributor:

> For example, they could switch between a CPU renderer, a hybrid CPU/GPU mode, or maybe even enable compute if the device supports it.

Would this be a runtime switch or a build-time feature? Runtime determination could increase bundle sizes.

Contributor:

For the web target, I believe it would be better to introduce a build-time feature, such as cpu, hybrid, etc., to prevent unnecessary increases in bundle size. This is something we’ll need to work out.

@LaurenzV (Contributor, Author):

> I was imagining something similar here — where consumers of vello_api could use a single API and choose the rendering path dynamically, depending on the device. For example, they could switch between a CPU renderer, a hybrid CPU/GPU mode, or maybe even enable compute if the device supports it.

I think that will already be possible with the RenderContext trait, no? Since both renderers will implement that trait, you can dynamically choose whether you want to use the CPU-only version or the hybrid version.

> That said, at this stage, vello_api doesn't seem critical yet. We can probably build and refine it after merging the two implementations we have in progress.

Yeah, as I mentioned, to me it's still not 100% clear what the purpose of vello_api will be (I've never used wgpu, so unfortunately I can't really comment on that either^^), but I think we can figure it out as we go.

> Would this be a runtime switch or a build-time feature? Runtime determination could increase bundle sizes.

So, for vello_cpu I would definitely like to have support for runtime detection, so that you can have a portable binary with the best possible performance regardless of which CPU you run on. With that said, to address your concerns:

  1. I would be open to adding additional feature flags so that you can disable specific instruction sets you don't want to use at build time, if you think that's better. Then you can have the best of both worlds.
  2. My understanding was that for vello_hybrid you are mostly targeting WASM? In which case this wouldn't be relevant for you anyway, because the other SIMD kernels will only get compiled for x86/aarch64. Or am I missing something?
  3. Since in vello_common all of the SIMD stuff is behind feature flags, it is possible to build vello_cpu with runtime detection in mind, while vello_hybrid can combine the code in a way so that everything happens at build time, so I don't think there should be any big issues. :)
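
A hedged sketch of the dynamic choice described above, assuming a hypothetical shape for the not-yet-added RenderContext trait (method and type names are illustrative, not from the PR):

```rust
// Both renderers implement one trait, so a consumer can pick the backend
// at runtime behind a trait object.
pub trait RenderContext {
    fn fill_rect(&mut self, x: u16, y: u16, width: u16, height: u16);
    fn flush(&mut self);
}

pub struct CpuRenderer;
pub struct HybridRenderer;

impl RenderContext for CpuRenderer {
    fn fill_rect(&mut self, _x: u16, _y: u16, _width: u16, _height: u16) {
        // CPU-only rasterization would happen here.
    }
    fn flush(&mut self) {}
}

impl RenderContext for HybridRenderer {
    fn fill_rect(&mut self, _x: u16, _y: u16, _width: u16, _height: u16) {
        // CPU strip generation plus GPU compositing would happen here.
    }
    fn flush(&mut self) {}
}

// Chosen dynamically, e.g. depending on whether a GPU device is available.
pub fn make_renderer(gpu_available: bool) -> Box<dyn RenderContext> {
    if gpu_available {
        Box::new(HybridRenderer)
    } else {
        Box::new(CpuRenderer)
    }
}
```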

Member:

> So, for vello_cpu I would definitely like to have support for runtime detection, so that you can have a portable binary with the best possible performance regardless of which CPU you run on.

By lifting the CPU feature sets to the type level, the compiler is able to remove unused specializations after monomorphization, even for explicit branches like checking TypeId::of. That could help reduce the burden of feature-flag sprinkling a bit, by not requiring every individual specialization to be feature-flagged.
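
A small sketch of the TypeId pattern being referred to (names illustrative): per monomorphization the comparison is a constant, so the untaken branch is removed without feature-flagging the specialization itself.

```rust
use std::any::TypeId;

struct Scalar;
#[cfg(all(target_arch = "aarch64", feature = "simd"))]
struct Neon;

fn fill<S: 'static>(dst: &mut [u8], value: u8) {
    #[cfg(all(target_arch = "aarch64", feature = "simd"))]
    {
        // Constant-folded per monomorphization: dead code for `S = Scalar`.
        if TypeId::of::<S>() == TypeId::of::<Neon>() {
            // A Neon-specialized loop would go here.
            dst.fill(value);
            return;
        }
    }
    for px in dst.iter_mut() {
        *px = value;
    }
}
```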

Member:

I'm not actually sure what this merged content meaningfully represents as a grouping. Almost all of this should probably be in Vello Common?

Certainly the SIMD stuff seems a bit out-of-place in this crate.

Member:

There seemingly were some different ideas on what exactly vello_api should be. If it's types that are public and common to both vello_cpu and vello_hybrid, then this makes sense, but if vello_api does e.g. dynamic dispatch to CPU or GPU, then the relationship flips, and all of this should indeed be in vello_common.

@DJMcNab (Member) commented Mar 4, 2025:

In my mind, the crate which does the dynamic dispatch would be Vello, probably.
Vello would probably only use the CPU mode if:

  1. You ask it to directly; OR
  2. The adapter available is detected to be a CPU fallback and you don't ask to use it anyway; OR
  3. You don't provide a GPU (which is equivalent to 1, really). That is, if there's no wgpu adapter available, I guess
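
A hedged sketch of check 2, assuming wgpu's adapter info is used to spot a software fallback (software rasterizers such as llvmpipe or WARP report a CPU device type):

```rust
// Hypothetical helper: true when the adapter is a software rasterizer
// rather than a real GPU, in which case the CPU renderer may be preferable.
fn is_cpu_fallback(adapter: &wgpu::Adapter) -> bool {
    adapter.get_info().device_type == wgpu::DeviceType::Cpu
}
```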

@LaurenzV (Contributor, Author):

> In my mind, the crate which does the dynamic dispatch would be Vello, probably.

How would that work if you use vello_cpu directly, though? I was under the assumption that those crates can be used independently, no? Or do you think in this case it's the responsibility of the user to do the dispatching themselves?

Contributor:

I agree with @DJMcNab on deferring the use of vello_api.

> My baseline assumption is that Vello API would be the equivalent of Piet, and is something we've decided to defer entirely:

It looks like there are different understandings of that crate. How about we discuss it later — either in Zulip or during Office Hours? For now, what do you think about using vello_common instead of vello_api?

/// Select the best execution mode according to what is available on the host system.
/// This is the recommended option for highest performance.
#[cfg(feature = "simd")]
Auto,
Contributor:

Do I understand correctly that WebAssembly has its own wasm32-simd128 instruction set (currently in beta), and that it is not yet supported in the current implementation?

@LaurenzV (Contributor, Author) commented Mar 4, 2025:

Yes, I haven't written any SIMD code for that yet, but it's definitely planned. It probably won't be a focus of my thesis, though (unless I implement everything else and still have time).
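
For reference, a sketch of how a wasm kernel could be gated once it exists, assuming the same cfg scheme as the other SIMD paths; wasm builds would need RUSTFLAGS="-C target-feature=+simd128".

```rust
// Illustrative only; the intrinsic below is stable since Rust 1.54.
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
fn splat_ff() -> core::arch::wasm32::v128 {
    core::arch::wasm32::u8x16_splat(0xFF)
}
```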

Member:

Can you clarify which part of this is in beta? caniuse indicates that it's baseline, and the instructions have a stabilisation Rustc version of 1.54. E.g. v128.

@LaurenzV (Contributor, Author):

I don’t think it’s beta, at least tiny-skia uses WASM intrinsics and works fine on stable, AFAIK?

Contributor:

Oh, yeah, sorry! I shouldn't have written "beta". What I meant is that there's still a large group of users on Safari < 16.4, which doesn't support WASM SIMD.


#[derive(Copy, Clone, Debug)]
/// The execution mode used for the rendering process.
pub enum ExecutionMode {
Member:

This definitely needs to be non_exhaustive
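
That would look like the sketch below; downstream crates then have to match with a wildcard arm, so adding SIMD variants later is not a breaking change.

```rust
#[non_exhaustive]
#[derive(Copy, Clone, Debug)]
pub enum ExecutionMode {
    Scalar,
    // ... SIMD variants as above ...
}
```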

Scalar,
/// Select the best execution mode according to what is available on the host system.
/// This is the recommended option for highest performance.
#[cfg(feature = "simd")]
Member:

I don't think this should be feature flagged; we should just always have this be the default

Comment on lines +28 to +40
#[cfg(feature = "simd")]
impl Default for ExecutionMode {
fn default() -> Self {
Self::Auto
}
}

#[cfg(not(feature = "simd"))]
impl Default for ExecutionMode {
fn default() -> Self {
Self::Scalar
}
}
Member:

I'm 90% sure that these could use the #[default] attribute with cfg_attr
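
A sketch of what that could look like: derive Default once and select the default variant with cfg_attr (cfg_attr expands before the derive runs, so exactly one variant carries #[default] in each configuration).

```rust
#[derive(Copy, Clone, Debug, Default)]
pub enum ExecutionMode {
    // Scalar is the default when the `simd` feature is disabled.
    #[cfg_attr(not(feature = "simd"), default)]
    Scalar,
    // Auto exists (and is the default) only with the `simd` feature.
    #[cfg(feature = "simd")]
    #[default]
    Auto,
}
```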
