Installation

Pull Docker Image

Pull the latest NexaSDK Docker image from Docker Hub:
```bash
docker pull nexa4ai/nexasdk:latest
```

Usage Modes

NexaSDK Docker supports two usage modes: an interactive CLI mode for direct model inference and a server mode that exposes a REST API.

CLI Mode (Interactive)

Run NexaSDK in interactive CLI mode for direct model inference:
```bash
export NEXA_TOKEN="YOUR_LONG_TOKEN_HERE"
docker run --rm -it --privileged \
  -v /path/to/data:/data \
  -v /etc/machine-id:/etc/machine-id:ro \
  -e NEXA_TOKEN \
  nexa4ai/nexasdk:latest infer NexaAI/Granite-4.0-h-350M-NPU
```
The `-it` flags enable interactive mode. Replace `NexaAI/Granite-4.0-h-350M-NPU` with any supported model name; for a complete list of supported models, see the Overview page.

Server Mode (Detached)

Run NexaSDK in server mode to expose a REST API endpoint. This mode runs in the background:
```bash
export NEXA_TOKEN="YOUR_LONG_TOKEN_HERE"
docker run --rm -it --privileged \
  -v /path/to/data:/data \
  -v /etc/machine-id:/etc/machine-id:ro \
  -e NEXA_TOKEN \
  nexa4ai/nexasdk:latest pull [MODEL_NAME]
docker run --rm -d -p 18181:18181 --privileged \
  -v /path/to/data:/data \
  -v /etc/machine-id:/etc/machine-id:ro \
  -e NEXA_TOKEN \
  nexa4ai/nexasdk:latest serve
```
The `--privileged` flag is required for NPU access. Replace `YOUR_LONG_TOKEN_HERE` with your actual Nexa token; you can obtain one by creating an account at sdk.nexa.ai and generating a token under Deployment → Create Token.

Note that `nexa serve` does not download models automatically, so pre-download any models you intend to use with the `pull` command first. Once running, the server is accessible at http://localhost:18181. For detailed API documentation, see the REST API page.
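With the server running, you can call it over HTTP from any client. The sketch below is a minimal, hypothetical Python client: it assumes the server exposes an OpenAI-style `/v1/chat/completions` route on port 18181 and accepts a standard chat payload. The actual routes and payload schema are documented on the REST API page, so adjust the URL and fields accordingly.

```python
import json
import urllib.request

# Assumed base URL from the docker run command above (port 18181).
BASE_URL = "http://localhost:18181"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload (assumed schema)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def send_chat_request(payload: dict) -> dict:
    """POST the payload to the (assumed) /v1/chat/completions route."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example usage (requires the server started above to be running):
#   payload = build_chat_request("NexaAI/Granite-4.0-h-350M-NPU", "Hello!")
#   reply = send_chat_request(payload)
```

Because the request body is plain JSON over HTTP, the same call works equally well from curl or any other HTTP client.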

Next Steps