Add the code for vello_api
#827
Conversation
These types make sense as a first merge to get some code in, but I'd expect most of these types to go into `sparse_strips/vello_common`. I've added two notes inline, based on the assumption that `sparse_strips/vello_api` should only contain types intended to be used by consumers of the renderer.
```rust
#[derive(Copy, Clone, Debug)]
/// The execution mode used for the rendering process.
pub enum ExecutionMode {
    /// Only use scalar execution. This is recommended if you want to have
    /// consistent results across different platforms and want to avoid unsafe code,
    /// and is the only option if you disabled the `simd` feature. Performance will be
    /// worse, though.
    Scalar,
    /// Select the best execution mode according to what is available on the host system.
    /// This is the recommended option for highest performance.
    #[cfg(feature = "simd")]
    Auto,
    /// Force the usage of neon SIMD instructions. This will lead to panics in case
    /// the CPU doesn't support the target feature `neon`.
    #[cfg(all(target_arch = "aarch64", feature = "simd"))]
    Neon,
    /// Force the usage of AVX2 SIMD instructions. This will lead to panics in case
    /// the CPU doesn't support the target features `avx2` and `fma`.
    #[cfg(all(target_arch = "x86_64", feature = "simd"))]
    Avx2,
}
```
One alternative would be to lift this to the type-level to allow for free branching due to monomorphization, but perhaps that should be an implementation detail and not be exposed on the API. On the other hand, if we ever get something like struct target features, the consumer could guide autovectorization through monomorphization.
That said, I'd be happy to land this as-is and revisit when we have a clearer picture.
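To make the type-level idea concrete, here is a minimal, hypothetical sketch; none of these names exist in the PR, and `Backend`, `Scalar`, and `Renderer` are made up for illustration. Each execution mode becomes a zero-sized marker type, and the hot loops are monomorphized per backend instead of branching on an enum at runtime:

```rust
/// Hypothetical marker-type approach: each backend is a zero-sized type and
/// kernels are selected through monomorphization rather than runtime branches.
pub trait Backend: Copy {
    /// Add `src` into `dst` in place; each backend supplies its own kernel.
    fn add_in_place(self, dst: &mut [u8], src: &[u8]);
}

#[derive(Copy, Clone)]
pub struct Scalar;

impl Backend for Scalar {
    fn add_in_place(self, dst: &mut [u8], src: &[u8]) {
        // Plain scalar loop; the compiler is still free to autovectorize it.
        for (d, s) in dst.iter_mut().zip(src) {
            *d = d.saturating_add(*s);
        }
    }
}

/// A renderer generic over the backend gets a fully specialized copy of every
/// kernel for each backend it is instantiated with.
pub struct Renderer<B: Backend> {
    backend: B,
}

impl<B: Backend> Renderer<B> {
    pub fn fill_row(&self, dst: &mut [u8], src: &[u8]) {
        self.backend.add_in_place(dst, src);
    }
}
```

A `Neon` or `Avx2` marker would implement the same trait with `unsafe` SIMD kernels, and the `ExecutionMode` enum could remain the runtime-facing switch that decides which instantiation to construct.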
If I understand you correctly, that's what already happens in my code (https://github.com/LaurenzV/cpu-sparse-experiments/blob/main/crates/sparse_primitives/src/lib.rs). The main reason for additionally having it as an enum is that you can select a mode at runtime, which is a nice feature to have IMO. But yes, I think those details can be discussed at a later point. :D
One bikeshed regarding the peniko import.
Co-authored-by: Tom Churchman <thomas@kepow.org>
LGTM. The exact module structure may be something to revisit when we have a clearer picture, but perhaps concrete renderer implementations just re-export the types they need.
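As a tiny, purely hypothetical illustration of the re-export idea (the exact item paths are assumed, not taken from the PR):

```rust
// In a concrete renderer crate such as vello_cpu, the shared API types could
// simply be re-exported so consumers only need to depend on the renderer.
pub use vello_api::{ExecutionMode, Paint};
```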
```diff
@@ -86,6 +86,7 @@ clippy.wildcard_dependencies = "warn"

 [workspace.dependencies]
 vello = { version = "0.4.0", path = "vello" }
+vello_api = { path = "sparse_strips/vello_api" }
```
[nit]: Maybe I'm wrong, but I see this change as a sign that `vello_api` is now considered stable, or at least working well enough to be fully integrated into the workspace, rather than remaining a separate experimental component. It suggests that `vello_api` is now a fundamental part of the project and can be used consistently across different crates.
Hmm, I wouldn't see it this way. The only reason I added it here is so that everywhere else we use it, we can just write `vello_api = { workspace = true }` and avoid duplicating the path to the crate. I think everyone who works with this repo is aware that this is in-progress and experimental, so I wouldn't worry about it too much. Especially since we aren't gonna publish it.
I don't think adding a crate to the workspace signals readiness, but an argument could be made for ordering the sparse strip crates after the main Vello crates. A note could then be added stating they're the experimental sparse strip crates.
```rust
// a `Pattern` type.
/// A paint used for filling or stroking paths.
#[derive(Debug, Clone)]
pub enum Paint {
```
Can this be used only internally, or is it also available for external consumers?

My initial thought about `vello_api` was that it would act as an entry point for external usage, kind of like how wgpu is structured. As I understand it, the wgpu API itself is implemented in the wgpu crate, but the real core is in wgpu-core, which in turn uses wgpu-hal; wgpu-hal abstracts over different graphics APIs with backends for Vulkan, Metal, Direct3D, and OpenGL.

I was imagining something similar here: consumers of `vello_api` could use a single API and choose the rendering path dynamically, depending on the device. For example, they could switch between a CPU renderer, a hybrid CPU/GPU mode, or maybe even enable compute if the device supports it.

With that in mind, I think it'd be great to add an example later showcasing how to use it. Also, adding some common types and structures to `vello_common` for now seems like a safe move; we can always upstream them into `vello_api` if they prove useful for consumers.

That said, at this stage, `vello_api` doesn't seem critical yet. We can probably build and refine it after merging the two implementations we have in progress.

Curious to hear your thoughts; maybe we can chat about this during Office Hours?
> For example, they could switch between a CPU renderer, a hybrid CPU/GPU mode, or maybe even enable compute if the device supports it.

Would this be a runtime switch or a build-time feature? Runtime determination could see increased bundle sizes.
For the web target, I believe it would be better to introduce a build-time feature, such as cpu, hybrid, etc., to prevent unnecessary increases in bundle size. This is something we’ll need to work out.
> I was imagining something similar here: consumers of `vello_api` could use a single API and choose the rendering path dynamically, depending on the device. For example, they could switch between a CPU renderer, a hybrid CPU/GPU mode, or maybe even enable compute if the device supports it.

I think that will already be possible with the `RenderContext` trait, no? Since both renderers will implement that trait, you can dynamically choose whether you want to use the CPU-only version or the hybrid version.
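For illustration, a minimal sketch of what that runtime choice could look like through a shared trait. The `RenderContext` trait isn't part of this PR yet, so the trait shape, `CpuContext`, `HybridContext`, and `make_context` below are all made up; only `Paint` (assumed to be in scope) comes from this PR:

```rust
/// Hypothetical stand-in for the planned RenderContext trait.
pub trait RenderContext {
    fn fill_rect(&mut self, x: u16, y: u16, width: u16, height: u16, paint: &Paint);
}

// Imaginary context types standing in for the vello_cpu / vello_hybrid renderers.
struct CpuContext;
struct HybridContext;

impl RenderContext for CpuContext {
    fn fill_rect(&mut self, _x: u16, _y: u16, _w: u16, _h: u16, _paint: &Paint) {
        // CPU-only rasterization would go here.
    }
}

impl RenderContext for HybridContext {
    fn fill_rect(&mut self, _x: u16, _y: u16, _w: u16, _h: u16, _paint: &Paint) {
        // CPU strip generation plus GPU compositing would go here.
    }
}

/// Pick a renderer at runtime, e.g. depending on whether a usable GPU exists.
fn make_context(gpu_available: bool) -> Box<dyn RenderContext> {
    if gpu_available {
        Box::new(HybridContext)
    } else {
        Box::new(CpuContext)
    }
}
```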
> That said, at this stage, `vello_api` doesn't seem critical yet. We can probably build and refine it after merging the two implementations we have in progress.

Yeah, as I mentioned, to me it's still not 100% clear what the purpose of `vello_api` will be (I've never used wgpu, so unfortunately I can't really comment on that either^^), but I think we can figure it out as we go.
> Would this be a runtime switch or a build-time feature? Runtime determination could see increased bundle sizes.

So, for `vello_cpu` I would definitely like to have support for runtime detection, so that you can have a portable binary with the best possible performance regardless of which CPU you run on (see the sketch below). With that said, to address your concerns:

- I would be open to adding additional feature flags so that you can disable specific instruction sets you don't want to use at build time, if you think that's better. Then you can have the best of both worlds.
- My understanding was that for `vello_hybrid` you are mostly targeting WASM? In which case this wouldn't be relevant for you anyway, because the other SIMD kernels will only get compiled for x86/aarch64. Or am I missing something?
- Since in `vello_common` all of the SIMD stuff is behind feature flags, it is possible to build `vello_cpu` with runtime detection in mind, while `vello_hybrid` can combine the code in a way so that everything happens at build time, so I don't think there should be any big issues. :)
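Here is the sketch referenced above: a hypothetical helper showing how `ExecutionMode::Auto` could resolve via runtime feature detection. The `resolve_auto` function is invented for illustration; only the detection macros are standard library:

```rust
#[cfg(feature = "simd")]
fn resolve_auto() -> ExecutionMode {
    // On x86_64, prefer AVX2 + FMA if the running CPU supports them.
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2")
            && std::arch::is_x86_feature_detected!("fma")
        {
            return ExecutionMode::Avx2;
        }
    }
    // On aarch64, prefer NEON if available (it almost always is).
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return ExecutionMode::Neon;
        }
    }
    // Fall back to the portable scalar path everywhere else.
    ExecutionMode::Scalar
}
```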
> So, for `vello_cpu` I would definitely like to have support for runtime detection, so that you can have a portable binary with the best possible performance regardless of which CPU you run on.

By lifting the CPU feature sets to the type level, the compiler is able to remove unused specializations after monomorphization, also for explicit branches like checking `TypeId::of`. That could help reduce the burden of feature-flag sprinkling a bit, by not requiring every individual specialization to be feature flagged.
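A small, hypothetical example of the `TypeId` pattern being described (the `Neon` marker and the `fill_span*` functions are invented): inside a function that is generic over a marker type, the comparison is a constant after monomorphization, so the untaken branch and everything it calls can be dropped by the compiler without any feature flags:

```rust
use std::any::TypeId;

/// Invented marker type standing in for a NEON-capable level.
struct Neon;

fn fill_span<B: 'static>(dst: &mut [u8], value: u8) {
    // After monomorphization this condition is `true` or `false` at compile
    // time, so only one branch survives in each instantiation.
    if TypeId::of::<B>() == TypeId::of::<Neon>() {
        fill_span_neon(dst, value);
    } else {
        dst.fill(value);
    }
}

fn fill_span_neon(dst: &mut [u8], value: u8) {
    // Stand-in for a NEON-accelerated kernel.
    dst.fill(value);
}
```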
I'm not actually sure what this merged content meaningfully represents as a grouping. Almost all of this should probably be in Vello Common?
Certainly the SIMD stuff seems a bit out-of-place in this crate.
There seemingly were some different ideas on what exactly `vello_api` should be. If it's types that are public and common to both `vello_cpu` and `vello_hybrid`, then this makes sense, but if `vello_api` does e.g. dynamic dispatch to CPU or GPU, then the relationship flips, and all of this should indeed be in `vello_common`.
In my mind, the crate which does the dynamic dispatch would be Vello, probably.
Vello would probably only use the CPU mode if (a rough sketch of this logic follows the list):
1. You ask it to directly; OR
2. The adapter available is detected to be a CPU fallback and you don't ask to use it anyway; OR
3. You don't provide a GPU (which is equivalent to 1, really). That is, if there's no wgpu adapter available, I guess.
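A rough sketch of that decision logic, using wgpu's `AdapterInfo::device_type` to spot software adapters. This is hypothetical: `RendererKind`, `pick_renderer`, and the `force_cpu` flag are made up, and point 2 is simplified:

```rust
enum RendererKind {
    Cpu,
    Gpu,
}

fn pick_renderer(adapter: Option<&wgpu::Adapter>, force_cpu: bool) -> RendererKind {
    match adapter {
        // 1. The caller asked for CPU rendering directly.
        _ if force_cpu => RendererKind::Cpu,
        // 3. No wgpu adapter is available at all.
        None => RendererKind::Cpu,
        // 2. The only adapter is a software (CPU fallback) implementation.
        Some(a) if a.get_info().device_type == wgpu::DeviceType::Cpu => RendererKind::Cpu,
        Some(_) => RendererKind::Gpu,
    }
}
```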
> In my mind, the crate which does the dynamic dispatch would be Vello, probably.

How would that work if you use `vello_cpu` directly, though? I was under the assumption that those crates can be used independently, no? Or do you think in this case it's the responsibility of the user to do the dispatching themselves?
I agree with @DJMcNab on deferring the use of `vello_api`.
My baseline assumption is that Vello API would be the equivalent of Piet, and is something we've decided to defer entirely:
It looks like there are different understandings of that crate. How about we discuss it later, either in Zulip or during Office Hours? For now, what do you think about using `vello_common` instead of `vello_api`?
```rust
    /// Select the best execution mode according to what is available on the host system.
    /// This is the recommended option for highest performance.
    #[cfg(feature = "simd")]
    Auto,
```
Do I understand correctly that WebAssembly has its own wasm32-simd128 instruction set (currently in beta), and that it is not supported in the current implementation?
Yes, I haven't written any SIMD code for that yet, but it's definitely planned. It probably won't be a focus of my thesis, though (unless I implement everything else and still have time).
I don’t think it’s beta; at least tiny-skia uses WASM intrinsics and works fine on stable, AFAIK?
Oh, yeah, sorry! I shouldn’t have written “beta”. What I meant is that there’s still a large group of users on Safari < 16.4, which doesn’t support WASM SIMD.
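For context, a hypothetical sketch of how a wasm32 `simd128` kernel could eventually be gated; nothing like this exists in the PR, and the `add_lanes` function is invented. Since `simd128` is a build-time target feature (e.g. `-C target-feature=+simd128`), supporting older Safari versions would mean shipping a separate scalar build rather than detecting support at runtime:

```rust
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
fn add_lanes(dst: &mut [u8; 16], src: &[u8; 16]) {
    use std::arch::wasm32::{u8x16_add, v128};
    // SAFETY: v128 and [u8; 16] are both exactly 16 bytes with no invalid
    // bit patterns, so transmuting between them is sound.
    unsafe {
        let a: v128 = core::mem::transmute(*dst);
        let b: v128 = core::mem::transmute(*src);
        *dst = core::mem::transmute(u8x16_add(a, b));
    }
}

#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
fn add_lanes(dst: &mut [u8; 16], src: &[u8; 16]) {
    // Portable fallback with the same (wrapping) semantics.
    for (d, s) in dst.iter_mut().zip(src) {
        *d = d.wrapping_add(*s);
    }
}
```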
```rust
#[derive(Copy, Clone, Debug)]
/// The execution mode used for the rendering process.
pub enum ExecutionMode {
```
This definitely needs to be `#[non_exhaustive]`.
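Sketch of what the suggestion means in practice (the `describe` helper is just for illustration): with `#[non_exhaustive]`, new execution modes can be added later without a semver-breaking change, because downstream matches must carry a wildcard arm:

```rust
#[derive(Copy, Clone, Debug)]
#[non_exhaustive]
pub enum ExecutionMode {
    Scalar,
    Auto,
    // ... SIMD variants as in the PR ...
}

// In a downstream crate, a match is forced to include a catch-all arm,
// so adding a variant later does not break compilation.
fn describe(mode: ExecutionMode) -> &'static str {
    match mode {
        ExecutionMode::Scalar => "scalar",
        _ => "simd",
    }
}
```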
```rust
    Scalar,
    /// Select the best execution mode according to what is available on the host system.
    /// This is the recommended option for highest performance.
    #[cfg(feature = "simd")]
```
I don't think this should be feature flagged; we should just always have this be the default
```rust
#[cfg(feature = "simd")]
impl Default for ExecutionMode {
    fn default() -> Self {
        Self::Auto
    }
}

#[cfg(not(feature = "simd"))]
impl Default for ExecutionMode {
    fn default() -> Self {
        Self::Scalar
    }
}
```
I'm 90% sure that these could use the `#[default]` attribute with `cfg_attr`.
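A sketch of what that could look like (untested, but `cfg_attr` is expanded before the derive sees the attributes, so the `#[default]` marker should land on exactly one variant in either configuration):

```rust
#[derive(Copy, Clone, Debug, Default)]
pub enum ExecutionMode {
    /// Becomes the default when the `simd` feature is disabled.
    #[cfg_attr(not(feature = "simd"), default)]
    Scalar,
    /// The default whenever the `simd` feature is enabled.
    #[cfg(feature = "simd")]
    #[cfg_attr(feature = "simd", default)]
    Auto,
    // ... remaining SIMD variants unchanged ...
}
```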
```rust
// a `Pattern` type.
/// A paint used for filling or stroking paths.
#[derive(Debug, Clone)]
pub enum Paint {
```
> I'm not actually sure what this merged content meaningfully represents as a grouping. Almost all of this should probably be in Vello Common?
> Certainly the SIMD stuff seems a bit out-of-place in this crate.
It's possible I'll later realize there is something else missing, but I think for now this should cover most of what is needed for that crate, based on my current version. One thing that will need to be added later on is the `RenderContext` trait.