
Getting Started

To use the API, first start the NexaSDK Docker container in server mode:
```bash
docker run --rm -dp 18181:18181 --privileged \
  -e NEXA_TOKEN="YOUR_LONG_TOKEN_HERE" \
  nexa4ai/nexasdk serve
```
The server listens on http://127.0.0.1:18181 by default. Keep the container running and send your requests from another terminal or application.
To see a full list of configurable options for the server, you can check the container logs or refer to the Quickstart guide.
Replace YOUR_LONG_TOKEN_HERE with your actual Nexa token. You can obtain a token by creating an account at sdk.nexa.ai and generating one in Deployment → Create Token.
The --privileged flag is required for NPU access on ARM64 systems. For x64 systems, you may omit this flag if not using NPU.
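Once the container is up, a quick way to confirm the server is reachable is to open a TCP connection to the published port. This is only a connectivity sketch — it checks that something is listening on 18181, not that the API itself is healthy:

```python
import socket

def server_is_up(host="127.0.0.1", port=18181, timeout=2.0):
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Prints True once the Docker container is running and the port is published
print(server_is_up())
```

If this returns False, check `docker ps` to confirm the container is still running and that port 18181 was published.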

Model Choice

NexaSDK Docker supports both Linux ARM64 and x64 architectures. For a complete list of supported models and their Hugging Face links, see the Quickstart guide.

API Endpoints

The NexaSDK REST API provides OpenAI-compatible endpoints for various AI tasks. For detailed API documentation including request/response formats, examples, and all available endpoints, please refer to the CLI REST API documentation.

Available Endpoints

  • /v1/chat/completions - Creates model responses for conversations (LLM and VLM)
  • /v1/embeddings - Creates embeddings for text input
  • /v1/reranking - Reranks documents based on query relevance
The API interface is identical whether the server is started via the CLI or Docker; only the startup method differs. Full request/response formats and usage examples for every endpoint are documented on the CLI REST API page.
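The endpoints above can be exercised from any HTTP client. Below is a minimal Python sketch using only the standard library. The model names and the reranking field names are assumptions made for illustration — take the authoritative request/response schemas from the CLI REST API documentation:

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:18181"  # default NexaSDK server address

def post(path, payload):
    """POST a JSON payload to the local server and return the parsed JSON response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# OpenAI-style chat completion request ("Qwen3-0.6B" is a placeholder model name)
chat = {
    "model": "Qwen3-0.6B",
    "messages": [{"role": "user", "content": "Hello!"}],
}

# OpenAI-style embeddings request (model name is again a placeholder)
embeddings = {
    "model": "jina-embeddings-v2",
    "input": ["first document", "second document"],
}

# Reranking request; these field names are assumed, check the CLI REST API page
rerank = {
    "model": "jina-reranker-v1",
    "query": "What is the capital of France?",
    "documents": ["Paris is the capital of France.", "Berlin is in Germany."],
}

# With the Docker container from Getting Started running, you could then call:
# reply = post("/v1/chat/completions", chat)
# print(reply["choices"][0]["message"]["content"])
```

Because the endpoints follow the OpenAI schema, existing OpenAI client libraries pointed at http://127.0.0.1:18181/v1 should also work for chat completions and embeddings.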