Installation
Pull Docker Image
Pull the latest NexaSDK Docker image from Docker Hub.
Usage Modes
NexaSDK Docker supports two usage modes: server mode for REST API access and interactive CLI mode for direct model inference.
CLI Mode (Interactive)
Run NexaSDK in interactive CLI mode for direct model inference.
The -it flags enable interactive mode. Replace NexaAI/Granite-4.0-h-350M-NPU with any supported model name. For a complete list of supported models, see the Overview page.
Server Mode (Detached)
Run NexaSDK in server mode to expose a REST API endpoint. This mode runs in the background.
The --privileged flag is required for NPU access. Replace YOUR_LONG_TOKEN_HERE with your actual Nexa token. You can obtain a token by creating an account at sdk.nexa.ai and generating one in Deployment → Create Token.
Note: nexa serve does not download models automatically. Pre-download the models you intend to use before starting the server.
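The server-mode invocation described above can be sketched as a small shell function. This is an illustration under stated assumptions, not the official command: the image name (your-nexasdk-image:latest), the NEXA_TOKEN environment variable name, and the trailing serve argument are hypothetical placeholders that this page does not confirm. Only the --privileged flag, detached operation, and port 18181 come from the text above.

```bash
# Sketch only. Assumptions (not confirmed by this page):
#   - NEXA_IMAGE is a placeholder; use the image name published on Docker Hub.
#   - NEXA_TOKEN is a hypothetical name for the token environment variable.
#   - "serve" as the container argument is assumed from the mention of nexa serve.
NEXA_IMAGE="${NEXA_IMAGE:-your-nexasdk-image:latest}"

start_nexa_server() {
  token="${1:-}"
  if [ -z "$token" ]; then
    echo "usage: start_nexa_server <nexa-token>" >&2
    return 1
  fi

  # Build the docker command as positional parameters so quoting is preserved.
  set -- docker run \
    --detach \
    --privileged \
    --publish 18181:18181 \
    --env "NEXA_TOKEN=${token}" \
    "$NEXA_IMAGE" \
    serve

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Setting DRY_RUN=1 before calling start_nexa_server prints the command rather than executing it, which allows a review before a privileged container is started.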
The server will be accessible at http://localhost:18181. For detailed API documentation, see the REST API page.
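Because server mode runs in the background, it can help to wait until the port responds before sending requests. The following is a minimal sketch that assumes curl is installed; the function name wait_for_server is hypothetical, and the check deliberately targets only the base URL given above, since this page does not list specific endpoint paths.

```bash
# Sketch only: poll a URL until something answers over HTTP.
# curl exits 0 on any HTTP response (including 404), so this confirms
# that a server is listening, not that a particular endpoint exists.
wait_for_server() {
  url="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl --silent --output /dev/null --max-time 2 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, wait_for_server http://localhost:18181 60 polls about once per second for up to 60 attempts and returns a non-zero status if the server never responds.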
Next Steps
REST API
Learn how to use the REST API endpoints for chat completions, embeddings, reranking, and more.
NPU Models
Explore the full collection of NPU-optimized models available for Qualcomm devices.