Installation
Pull Docker Image
Pull the latest NexaSDK Docker image from Docker Hub:
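The original command was not preserved here; a minimal sketch, assuming the image is published as `nexaai/nexasdk` on Docker Hub (check the official documentation for the exact repository name):

```shell
# Pull the latest NexaSDK image.
# The image name "nexaai/nexasdk" is an assumption, not confirmed by this guide.
docker pull nexaai/nexasdk:latest
```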
Usage Modes
NexaSDK Docker supports two usage modes: server mode for REST API access and interactive CLI mode for direct model inference.
Server Mode (Detached)
Run NexaSDK in server mode to expose a REST API endpoint. This mode runs in the background:
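A sketch of the server-mode invocation, assuming the image name `nexaai/nexasdk` and a `NEXA_TOKEN` environment variable for passing the token (both names are assumptions; consult the official image documentation for the exact forms):

```shell
# Detached server mode: exposes the REST API on localhost:18181.
# --privileged is needed for NPU access on ARM64; it may be omitted on x64 without NPU.
# The image name and the NEXA_TOKEN variable name are assumptions.
docker run -d --privileged \
  -p 18181:18181 \
  -e NEXA_TOKEN=YOUR_LONG_TOKEN_HERE \
  nexaai/nexasdk:latest
```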
The `--privileged` flag is required for NPU access on ARM64 systems. For x64 systems, you may omit this flag if not using NPU. Replace `YOUR_LONG_TOKEN_HERE` with your actual Nexa token; you can obtain one by creating an account at sdk.nexa.ai and generating a token under Deployment → Create Token. Once running, the server is available at http://localhost:18181. For detailed API documentation, see the REST API page.
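Once the server is running, you can exercise it with curl. The route and payload below assume an OpenAI-compatible `/v1/chat/completions` endpoint, which this guide does not confirm; see the REST API page for the actual routes:

```shell
# Hypothetical chat-completions request; endpoint path and JSON shape are assumptions.
curl http://localhost:18181/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "NexaAI/Granite-4.0-h-350M-NPU",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```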
CLI Mode (Interactive)
Run NexaSDK in interactive CLI mode for direct model inference:
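A sketch of the interactive invocation, again assuming the image name `nexaai/nexasdk` and a `nexa infer` entrypoint command (both are assumptions; the model name is the one used in this guide):

```shell
# Interactive CLI mode: -i keeps stdin open, -t allocates a pseudo-TTY.
# Image name and the "nexa infer" subcommand are assumptions.
docker run -it --privileged nexaai/nexasdk:latest \
  nexa infer NexaAI/Granite-4.0-h-350M-NPU
```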
The `-it` flags enable interactive mode. Replace `NexaAI/Granite-4.0-h-350M-NPU` with any supported model name.
Supported Models
Linux ARM64 (NPU Acceleration)
The following models are supported on Linux ARM64 with NPU acceleration (Dragonwing IQ9):
Language Models (LLM)
Vision-Language Models (VLM)
Embedding Models
Reranking Models
Computer Vision (CV)
Automatic Speech Recognition (ASR)
Linux x64
For Linux x64 systems, you can use GGUF-format models. Recommended models include:
Language Models (LLM)
Vision-Language Models (VLM)
For more information about NPU models and access tokens, see the NPU Models Guide. For GGUF models, see the GGUF Models Guide.
Next Steps
REST API
Learn how to use the REST API endpoints for chat completions, embeddings, reranking, and more.
NPU Models
Explore the full collection of NPU-optimized models available for Qualcomm devices.