NexaSDK is an easy-to-use developer toolkit for running any AI model locally — across NPUs, GPUs, and CPUs — powered by our NexaML engine, built entirely from scratch for peak performance on every hardware stack. Unlike wrappers that depend on existing runtimes, NexaML is a unified inference engine built at the kernel level. That is what lets NexaSDK achieve Day-0 support for new model architectures (LLM, VLM, CV, Embedding, Rerank, ASR, TTS). NexaML supports three model formats: GGUF, MLX, and Nexa AI’s own .nexa format.

Why NexaSDK

| Feature | NexaSDK | Ollama | llama.cpp | LM Studio |
| --- | --- | --- | --- | --- |
| NPU support | 🟢 NPU-first | 🟡 | 🟡 | 🔴 |
| Android/iOS SDK support | 🟢 NPU/GPU/CPU | 🟡 | 🟡 | 🔴 |
| Linux support (Docker image) | 🟢 | 🟢 | 🟢 | 🔴 |
| Support any model in GGUF, MLX, NEXA format | 🟢 Low-level control | 🔴 | 🟡 | 🔴 |
| Full multimodality support | 🟢 Image, Audio, Text, Embedding, Rerank, ASR, TTS | 🟡 | 🟡 | 🟡 |
| Cross-platform support | 🟢 Desktop, Mobile (Android, iOS), Automotive, IoT (Linux) | 🟡 | 🟡 | 🟡 |
| One line of code to run | 🟢 | 🟢 | 🟡 | 🟢 |
| OpenAI-compatible API + Function calling | 🟢 | 🟢 | 🟢 | 🟢 |
Legend: 🟢 Supported  |  🟡 Partial or limited support  |  🔴 No
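Because the server speaks the OpenAI-compatible protocol, any standard OpenAI-style client or plain HTTP request works against it. The sketch below builds a `/v1/chat/completions` request body; the base URL, port, and model identifier are illustrative assumptions, not documented NexaSDK defaults.

```python
import json

# Assumed local server address for illustration only — check your
# NexaSDK server output for the actual host and port.
BASE_URL = "http://localhost:8080/v1"

def chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Hypothetical model id used for the example payload.
payload = chat_request("Qwen3-0.6B-GGUF", "Hello!")
print(json.dumps(payload, indent=2))
```

POSTing this body to `BASE_URL + "/chat/completions"` with any HTTP client (or pointing an OpenAI SDK's `base_url` at the local server) should return a standard chat-completion response.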

Get Started

Quickstart

Easily install NexaSDK and run your first model in minutes.

API Reference

In-depth documentation for all NexaSDK APIs.

Python Library

Learn how to use NexaSDK from Python with full code examples.

Android

Get started with NexaSDK for Android and embedded platforms.

Community

Nexa Wishlist

Vote for the next models we bring on-device. Request and vote for GGUF, MLX, or NPU models!

Builder Bounty Program

Earn up to 1,500 USD for building open-source projects with NexaSDK. Get rewarded and featured!

Join Discord

Connect with the Nexa community, get support, and stay up to date on Discord.

Join Slack

Join our Slack workspace to collaborate, ask questions, and connect with the Nexa team.