Nexa SDK is an on-device inference framework that runs any model on any device, across any backend. It runs on CPUs and GPUs with backend support for CUDA, Metal, and Vulkan. It handles multiple input modalities including text 📝, image 🖼️, and audio 🎧. The SDK includes an OpenAI-compatible API server with support for JSON schema-based function calling and streaming. It supports model formats such as GGUF and MLX, enabling efficient quantized inference across diverse platforms.
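For example, once a server is running (see `nexa serve` in the command table below), a streaming chat completion can be requested with a standard OpenAI-style call. This is a minimal sketch: the host and port match the `nexa serve` example later in this document, the `/v1/chat/completions` route is assumed from OpenAI compatibility, and the model name is simply one of the models used in the examples below.

```bash
# Assumes a server started with: nexa serve --host 127.0.0.1:8080
# Route and payload follow the standard OpenAI chat completions format;
# the model name is illustrative, use any model you have pulled.
curl -N http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ggml-org/Qwen3-1.7B-GGUF",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```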
- MLX and GGUF support
- LLM and VLM support
`curl -fsSL https://raw.githubusercontent.com/NexaAI/nexa-sdk/main/release/linux/install.sh -o install.sh && chmod +x install.sh && ./install.sh`
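Once the script finishes, a quick sanity check (using the `-h` flag from the command table below) confirms the CLI is on your `PATH`:

```bash
nexa -h   # should print the list of available CLI commands
```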
You can run any compatible GGUF or MLX model from 🤗 Hugging Face by using the `<full repo name>`.
> [!TIP]
> GGUF runs on macOS, Linux, and Windows.
📝 Run and chat with LLMs, e.g. Qwen3:
`nexa infer ggml-org/Qwen3-1.7B-GGUF`
🖼️ Run and chat with Multimodal models, e.g. Qwen2.5-Omni:
`nexa infer NexaAI/Qwen2.5-Omni-3B-GGUF`
> [!TIP]
> MLX is macOS-only (Apple Silicon). Many MLX models in the Hugging Face mlx-community organization have quality issues and may not run reliably. We recommend starting with models from our curated NexaAI Collection for best results. For example:
📝 Run and chat with LLMs, e.g. Qwen3:
`nexa infer NexaAI/Qwen3-4B-4bit-MLX`
🖼️ Run and chat with Multimodal models, e.g. Gemma3n:
`nexa infer NexaAI/gemma-3n-E4B-it-4bit-MLX`
| Essential Command | What it does |
| --- | --- |
| `nexa -h` | Show all CLI commands |
| `nexa pull <repo>` | Interactive download & cache of a model |
| `nexa infer <repo>` | Local inference |
| `nexa list` | Show all cached models with sizes |
| `nexa remove <repo>` / `nexa clean` | Delete one / all cached models |
| `nexa serve --host 127.0.0.1:8080` | Launch OpenAI-compatible REST server |
| `nexa run <repo>` | Chat with a model via an existing server |
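Because the server speaks the OpenAI API, JSON schema-based function calling should follow the standard `tools` field of the chat completions format. A minimal sketch, assuming the server and route from the examples above; the `get_weather` function and the model name are purely illustrative:

```bash
# Function calling via the OpenAI-style "tools" field.
# The server address, model name, and get_weather tool are assumptions
# for illustration; the tool's parameters are a plain JSON schema.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ggml-org/Qwen3-1.7B-GGUF",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }]
  }'
```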
👉 To interact with multimodal models, you can drag photos or audio clips directly into the CLI — you can even drop multiple images at once!
See CLI Reference for full commands.
We would like to thank the following projects: