Quickstart

Installation

Pull Docker Image

Pull the latest NexaSDK Docker image from Docker Hub:

bash

docker pull nexa4ai/nexasdk:latest

Usage Modes

NexaSDK Docker supports two usage modes: server mode for REST API access and interactive CLI mode for direct model inference.

Server Mode (Detached)

Run NexaSDK in server mode to expose a REST API endpoint. This mode runs in the background:

bash

docker run --rm -dp 18181:18181 --privileged \
  -e NEXA_TOKEN="YOUR_LONG_TOKEN_HERE" \
  nexa4ai/nexasdk serve

The --privileged flag is required for NPU access on ARM64 systems. For x64 systems, you may omit this flag if not using NPU. Replace YOUR_LONG_TOKEN_HERE with your actual Nexa token. You can obtain a token by creating an account at sdk.nexa.ai and generating one in Deployment → Create Token.

The server will be accessible at http://localhost:18181. For detailed API documentation, see the REST API page.

CLI Mode (Interactive)

Run NexaSDK in interactive CLI mode for direct model inference:

bash

docker run --rm -it --privileged \
  -e NEXA_TOKEN="YOUR_LONG_TOKEN_HERE" \
  nexa4ai/nexasdk infer NexaAI/Granite-4.0-h-350M-NPU

The -it flags enable interactive mode. Replace NexaAI/Granite-4.0-h-350M-NPU with any supported model name.

Supported Models

Linux ARM64 (NPU Acceleration)

The following models are supported on Linux ARM64 with NPU acceleration (Dragonwing IQ9):

Language Models (LLM)

Vision-Language Models (VLM)

NexaAI/OmniNeural-4B

Embedding Models

Reranking Models

NexaAI/jina-v2-rerank-npu

Computer Vision (CV)

Automatic Speech Recognition (ASR)

NexaAI/parakeet-tdt-0.6b-v3-npu

Linux x64

For Linux x64 systems, you can use GGUF format models. Recommended models include:

Language Models (LLM)

Vision-Language Models (VLM)

For more information about NPU models and access tokens, see the NPU Models Guide. For GGUF models, see the GGUF Models Guide.

Next Steps

REST API

Learn how to use the REST API endpoints for chat completions, embeddings, reranking, and more.

NPU Models

Explore the full collection of NPU-optimized models available for Qualcomm devices.

Was this page helpful?

Yes

Get Started

Nexa CLI Usage

Android SDK

Linux Docker

Python Library

Community

Installation

Pull Docker Image

Usage Modes

Server Mode (Detached)

CLI Mode (Interactive)

Supported Models

Linux ARM64 (NPU Acceleration)

Language Models (LLM)

Vision-Language Models (VLM)

Embedding Models

Reranking Models

Computer Vision (CV)

Automatic Speech Recognition (ASR)

Linux x64

Language Models (LLM)

Vision-Language Models (VLM)

Next Steps

REST API

NPU Models

Get Started

Nexa CLI Usage

Android SDK

Linux Docker

Python Library

Community

​Installation

​Pull Docker Image

​Usage Modes

​Server Mode (Detached)

​CLI Mode (Interactive)

​Supported Models

​Linux ARM64 (NPU Acceleration)

​Language Models (LLM)

​Vision-Language Models (VLM)

​Embedding Models

​Reranking Models

​Computer Vision (CV)

​Automatic Speech Recognition (ASR)

​Linux x64

​Language Models (LLM)

​Vision-Language Models (VLM)

​Next Steps

REST API

NPU Models

Installation

Pull Docker Image

Usage Modes

Server Mode (Detached)

CLI Mode (Interactive)

Supported Models

Linux ARM64 (NPU Acceleration)

Language Models (LLM)

Vision-Language Models (VLM)

Embedding Models

Reranking Models

Computer Vision (CV)

Automatic Speech Recognition (ASR)

Linux x64

Language Models (LLM)

Vision-Language Models (VLM)

Next Steps