Description
Is your feature request related to a problem? Please describe.
Currently, workers that run inference must load the ML model into the module's memory and pass it to the host. This limits the size and number of models we can run, as 32-bit Wasm has a 4 GiB memory limit.
Describe the solution you'd like
The WASI-NN proposal recently introduced the "Named Models" feature. This feature allows the host to preload a set of models and expose them to the modules using a label. The module no longer needs to load the model into memory; it only needs to reference the model by the right label.
wws can take advantage of this feature by adding new configuration parameters to the feature.wasi_nn object. Then, wws will preload those models and expose them to the workers.
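As a rough illustration only, a worker's configuration file could declare the preloaded models along these lines. The `preload_models` table, its field names, and the `mobilenet` label are hypothetical, not existing wws options:

```toml
name = "inference"
version = "1"

[features.wasi_nn]
# Backend selection (assuming the existing `allowed_backends` option)
allowed_backends = ["openvino"]

# Hypothetical new parameters: each entry would be preloaded by wws at startup
# and exposed to the worker under its `label`.
[[features.wasi_nn.preload_models]]
label = "mobilenet"
backend = "openvino"
path = "models/mobilenet"
```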
The wasmtime-wasi-nn and wasi-nn (see example) crates already support this feature.
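On the guest side, a worker could then reference the preloaded model by its label instead of shipping the weights through its own memory. Below is a minimal sketch with the wasi-nn crate, assuming the named-model lookup is exposed as `GraphBuilder::build_from_cache` (the exact method name may differ between crate versions); the "mobilenet" label, tensor dimensions, and output size are placeholders:

```rust
use wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

fn classify(input: &[f32]) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    // Look the model up by the label configured on the host ("mobilenet" is
    // a placeholder); no model bytes go through the module's linear memory.
    let graph = GraphBuilder::new(GraphEncoding::Openvino, ExecutionTarget::CPU)
        .build_from_cache("mobilenet")?;

    // Run a single inference as usual once the graph handle is available.
    let mut ctx = graph.init_execution_context()?;
    ctx.set_input(0, TensorType::F32, &[1, 3, 224, 224], input)?;
    ctx.compute()?;

    // Placeholder output size; a real worker would size this for its model.
    let mut output = vec![0f32; 1001];
    ctx.get_output(0, &mut output)?;
    Ok(output)
}
```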
Describe alternatives you've considered
No response
Additional context
No response