
Installation

Pull Docker Image

Pull the latest NexaSDK Docker image from Docker Hub:
bash
docker pull nexa4ai/nexasdk:latest
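To confirm the pull succeeded, you can inspect the local image (an optional sanity check):

```shell
# Prints the image ID if nexa4ai/nexasdk:latest is present locally;
# exits non-zero if the pull did not complete.
docker image inspect nexa4ai/nexasdk:latest --format '{{.Id}}'
```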

Usage Modes

NexaSDK Docker supports two usage modes: server mode for REST API access and interactive CLI mode for direct model inference.

CLI Mode (Interactive)

Run NexaSDK in interactive CLI mode for direct model inference:
bash
export NEXA_TOKEN="YOUR_LONG_TOKEN_HERE"
docker run --rm -it --privileged \
  -v /path/to/data:/data \
  -v /etc/machine-id:/etc/machine-id:ro \
  -e NEXA_TOKEN \
  nexa4ai/nexasdk:latest infer NexaAI/Granite-4.0-h-350M-NPU
The -it flags enable interactive mode. Replace NexaAI/Granite-4.0-h-350M-NPU with any supported model name. For a complete list of supported models, see the Overview page.

Server Mode (Detached)

Run NexaSDK in server mode to expose a REST API endpoint. First pull the model you need, then start the server detached so it runs in the background:
bash
export NEXA_TOKEN="YOUR_LONG_TOKEN_HERE"
docker run --rm -it --privileged \
  -v /path/to/data:/data \
  -v /etc/machine-id:/etc/machine-id:ro \
  -e NEXA_TOKEN \
  nexa4ai/nexasdk:latest pull [MODEL_NAME]
docker run --rm -d -p 18181:18181 --privileged \
  -v /path/to/data:/data \
  -v /etc/machine-id:/etc/machine-id:ro \
  -e NEXA_TOKEN \
  nexa4ai/nexasdk:latest serve
The --privileged flag is required for NPU access. Replace YOUR_LONG_TOKEN_HERE with your actual Nexa token. You can obtain a token by creating an account at sdk.nexa.ai and generating one in Deployment → Create Token.
The serve command does not download models automatically, so pre-download any model you intend to use with the pull command first. Once running, the server is accessible at http://localhost:18181. For detailed API documentation, see the REST API page.
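Once the server is up, you can exercise it from the shell. The endpoint path and payload below are assumptions based on common OpenAI-style chat APIs, not taken from this page; check the REST API page for the actual routes and fields before relying on them.

```shell
# Hypothetical request shape -- verify the endpoint path and payload
# fields against the REST API documentation.
curl -s http://localhost:18181/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "NexaAI/Granite-4.0-h-350M-NPU",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```

If the server is not running or the model was not pre-downloaded, the request will fail, so run the pull and serve commands above first.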

Next Steps

REST API

Learn how to use the REST API endpoints for chat completions, embeddings, reranking, and more.

NPU Models

Explore the full collection of NPU-optimized models available for Qualcomm devices.