
Installation

Pull Docker Image

Pull the latest NexaSDK Docker image from Docker Hub:
```bash
docker pull nexa4ai/nexasdk:latest
```
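To confirm the pull succeeded, you can list the image locally:

```bash
# Shows the nexa4ai/nexasdk image if the pull completed successfully.
docker images nexa4ai/nexasdk
```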

Usage Modes

NexaSDK Docker supports two usage modes: server mode for REST API access and interactive CLI mode for direct model inference.

Server Mode (Detached)

Run NexaSDK in server mode to expose a REST API endpoint. This mode runs in the background:
```bash
docker run --rm -dp 18181:18181 --privileged \
  -e NEXA_TOKEN="YOUR_LONG_TOKEN_HERE" \
  nexa4ai/nexasdk serve
```
The --privileged flag is required for NPU access on ARM64 systems; on x64 systems, you can omit it if you are not using the NPU.

Replace YOUR_LONG_TOKEN_HERE with your actual Nexa token. To obtain one, create an account at sdk.nexa.ai and generate a token under Deployment → Create Token.
The server will be accessible at http://localhost:18181. For detailed API documentation, see the REST API page.
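Once the container is up, you can send a test request. The snippet below is a sketch that assumes an OpenAI-compatible chat completions route; the actual endpoints are documented on the REST API page:

```bash
# Hypothetical request; verify the route against the REST API page.
curl http://localhost:18181/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "NexaAI/Granite-4.0-h-350M-NPU", "messages": [{"role": "user", "content": "Hello"}]}'
```

Because the container runs detached, use standard Docker commands to inspect or stop it; the --rm flag removes the container automatically once it stops:

```bash
docker ps --filter ancestor=nexa4ai/nexasdk   # find the container ID
docker stop <container-id>                    # stop (and, via --rm, remove) it
```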

CLI Mode (Interactive)

Run NexaSDK in interactive CLI mode for direct model inference:
```bash
docker run --rm -it --privileged \
  -e NEXA_TOKEN="YOUR_LONG_TOKEN_HERE" \
  nexa4ai/nexasdk infer NexaAI/Granite-4.0-h-350M-NPU
```
The -it flags enable interactive mode (-i keeps STDIN open, -t allocates a pseudo-TTY). Replace NexaAI/Granite-4.0-h-350M-NPU with any supported model name.
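Rather than repeating the token in every command, you can export it once and let Docker forward the value from the host environment (standard Docker behavior when -e is given a variable name without a value):

```bash
# Docker copies NEXA_TOKEN's value from the host when no "=value" is supplied.
export NEXA_TOKEN="YOUR_LONG_TOKEN_HERE"
docker run --rm -it --privileged \
  -e NEXA_TOKEN \
  nexa4ai/nexasdk infer NexaAI/Granite-4.0-h-350M-NPU
```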

Supported Models

Linux ARM64 (NPU Acceleration)

The following models are supported on Linux ARM64 with NPU acceleration (Dragonwing IQ9):

- Language Models (LLM)
- Vision-Language Models (VLM)
- Embedding Models
- Reranking Models
- Computer Vision (CV)
- Automatic Speech Recognition (ASR)

Linux x64

For Linux x64 systems, you can use models in GGUF format. Recommended models fall into the following categories; a sample invocation follows the list.

- Language Models (LLM)
- Vision-Language Models (VLM)
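The same CLI mode shown above works on x64; per the note in Server Mode, --privileged can be omitted when the NPU is not used. The model name below is a placeholder, not a confirmed identifier; see the GGUF Models Guide for supported models:

```bash
# <gguf-model> is a placeholder; pick a supported name from the GGUF Models Guide.
docker run --rm -it \
  -e NEXA_TOKEN="YOUR_LONG_TOKEN_HERE" \
  nexa4ai/nexasdk infer <gguf-model>
```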

For more information about NPU models and access tokens, see the NPU Models Guide. For GGUF models, see the GGUF Models Guide.

Next Steps