Open
Description
Function calling
- Integrate with XGrammar's structural tag ([Feature] Structural tag, xgrammar#162) to enable reliable tool use with small models in WebLLM
- Add an E2E MCP-like example, using the structural tag / tool-use stated above
Models
- Support Phi-4 (ongoing)
- Support Gemma3
- Support Gemma3n
Modalities
- Make image input reliable (initial support exists; performance and correctness need further investigation)
- Add other modality input (e.g. audio)
Performance
- Profile existing performance, identify bottlenecks, and address any that exist
- Move some existing CPU workloads (e.g. sampling) to the GPU where performance improves (ongoing: Replace CPU Function Calls with GPU Kernel Invocations #697)
- Subgroup operation support: Use subgroup operations when possible #553
Others
- Better WASM conversion experience (e.g. hosting a Hugging Face Space, so users do not need to set up an environment)
- Fix the WebGPU-only correctness issue with Qwen2.5 1.5B (very obvious with DeepSeek-R1-Distill)
- Parse thinking/non-thinking tokens when returning a completion response
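
To illustrate the last item, here is a minimal sketch of splitting a completion into its thinking and non-thinking parts. It assumes the model wraps its reasoning in `<think>` ... `</think>` delimiters, as DeepSeek-R1-Distill-style models typically do. `splitThinking` and `ParsedCompletion` are hypothetical names used for illustration only; they are not part of the WebLLM API, and the eventual implementation may well operate on tokens rather than strings.

```typescript
// Hypothetical helper for illustration; not part of the WebLLM API.
interface ParsedCompletion {
  reasoning: string; // text found inside the thinking delimiters
  content: string;   // everything outside the thinking block
}

// Assumption: reasoning is delimited by <think> ... </think>.
function splitThinking(
  text: string,
  open: string = "<think>",
  close: string = "</think>",
): ParsedCompletion {
  const start = text.indexOf(open);
  if (start === -1) {
    // No thinking block: the whole output is the answer.
    return { reasoning: "", content: text.trim() };
  }
  const bodyStart = start + open.length;
  const end = text.indexOf(close, bodyStart);
  if (end === -1) {
    // Unterminated block (e.g. generation stopped at the token limit):
    // treat everything after the opening delimiter as reasoning.
    return {
      reasoning: text.slice(bodyStart).trim(),
      content: text.slice(0, start).trim(),
    };
  }
  return {
    reasoning: text.slice(bodyStart, end).trim(),
    content: (text.slice(0, start) + text.slice(end + close.length)).trim(),
  };
}
```

A caller could then return `content` as the message text and expose `reasoning` in a separate field, so applications can hide or display the reasoning as they see fit. Only the first thinking block is handled here; streaming responses would need an incremental variant.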