Open
Labels
ep:WebGPU (ort-web webgpu provider), feature request (request for unsupported feature or enhancement), platform:web (issues related to ONNX Runtime web; typically submitted using template)
Description
Describe the feature request
Assess performance capability without downloading the full model.
Describe scenario use case
For some models, performance may be a blocker. Since model downloads can be quite large, I wonder if there should be a way for web developers to determine a machine's performance class for running a model without first downloading it completely.
I believe this would involve running the model code with zeroed-out weights, which would still require buffer allocations but would allow the web app to catch out-of-memory errors and similar failures. The model architecture would still be needed to generate shaders, but it should be much smaller than the model weights.
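To make the idea concrete, here is a rough sketch of what such a probe might look like. It is not an existing ONNX Runtime Web API. `TensorSpec`, `totalWeightBytes`, `probeAllocation`, and `DeviceLike` are hypothetical names, and `DeviceLike` is only a minimal structural stand-in for a WebGPU `GPUDevice`. The sketch assumes the weight shapes and dtypes can be read from the model architecture alone. It then asks WebGPU to allocate buffers of the same sizes (new WebGPU buffers are zero-initialized) inside an `out-of-memory` error scope.

```typescript
// Hypothetical description of one weight tensor, read from the model graph only.
interface TensorSpec {
  name: string;
  shape: number[];
  dtype: "float32" | "float16" | "int8";
}

const BYTES_PER_ELEMENT: Record<TensorSpec["dtype"], number> = {
  float32: 4,
  float16: 2,
  int8: 1,
};

// Bytes needed to hold one tensor.
function tensorBytes(spec: TensorSpec): number {
  const elements = spec.shape.reduce((acc, dim) => acc * dim, 1);
  return elements * BYTES_PER_ELEMENT[spec.dtype];
}

// Total bytes for all weights; a cheap first check before touching the GPU.
function totalWeightBytes(specs: TensorSpec[]): number {
  return specs.reduce((sum, spec) => sum + tensorBytes(spec), 0);
}

// Assumption: minimal structural subset of GPUDevice, declared here so the
// sketch compiles without WebGPU type definitions. A real GPUDevice fits it.
interface DeviceLike {
  pushErrorScope(filter: "out-of-memory"): void;
  popErrorScope(): Promise<unknown>;
  createBuffer(descriptor: { size: number; usage: number }): { destroy(): void };
}

const STORAGE_USAGE = 0x80; // numeric value of GPUBufferUsage.STORAGE

// Tries to allocate one zero-initialized buffer per weight tensor.
// Resolves to true if no out-of-memory error was reported.
// Caveats: sizes above the device's maxBufferSize raise a validation error,
// which this scope does not capture, and drivers may allocate lazily, so a
// "true" result is only a rough signal.
async function probeAllocation(
  device: DeviceLike,
  specs: TensorSpec[],
): Promise<boolean> {
  device.pushErrorScope("out-of-memory");
  const buffers = specs.map((spec) =>
    device.createBuffer({
      size: Math.ceil(tensorBytes(spec) / 4) * 4, // round up to 4-byte multiple
      usage: STORAGE_USAGE,
    }),
  );
  const error = await device.popErrorScope();
  buffers.forEach((buffer) => buffer.destroy());
  return error === null;
}
```

A web app could call `totalWeightBytes` first to reject obviously oversized models, then run `probeAllocation` with the device returned by `adapter.requestDevice()`. Neither step measures inference speed; that would still require compiling and running the shaders against the dummy buffers.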
Originally posted at huggingface/transformers.js#545 (comment)
xenova, andreban and maudnals