Getting Started

To use the API, first start the NexaSDK Docker container in server mode:
```bash
export NEXA_TOKEN="YOUR_LONG_TOKEN_HERE"
docker run --rm -d -p 18181:18181 --privileged \
  -v /path/to/data:/data \
  -v /etc/machine-id:/etc/machine-id:ro \
  -e NEXA_TOKEN \
  nexa4ai/nexasdk serve
```
The server runs on http://127.0.0.1:18181 by default.
Keep the container running and send your requests from another terminal or application.
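Because the container is started detached (`-d`), you can confirm it is up with standard Docker commands, for example:

```bash
# List running containers to confirm the server started.
docker ps

# Inspect the server logs (replace <container-id> with the ID shown by docker ps).
docker logs <container-id>
```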
You can also access the interactive Swagger UI documentation at http://127.0.0.1:18181/docs/ui to explore and test the API endpoints directly from your browser.
Replace YOUR_LONG_TOKEN_HERE with your actual Nexa token. You can obtain a token by creating an account at sdk.nexa.ai and generating one in Deployment → Create Token.
The --privileged flag is required for NPU access.
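With the server running, you can sanity-check it with a quick chat completion request. This is a minimal sketch using the OpenAI-compatible request format; the model identifier below is a placeholder, so substitute a model supported by your deployment (see the Overview page).

```bash
# Minimal smoke test against the local server.
# NOTE: "your-model-name" is a placeholder; replace it with a model
# identifier that your NexaSDK deployment actually serves.
curl http://127.0.0.1:18181/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model-name",
    "messages": [
      {"role": "user", "content": "Hello! Can you hear me?"}
    ]
  }'
```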

Model Choice

NexaSDK Docker supports the Linux ARM64 architecture. For a complete list of supported models and their Hugging Face links, see the Overview page.

API Endpoints

The NexaSDK REST API provides OpenAI-compatible endpoints for various AI tasks. For detailed API documentation including request/response formats, examples, and all available endpoints, please refer to the CLI REST API documentation.

Available Endpoints

  • /v1/chat/completions - Creates model responses for conversations (LLM and VLM)
  • /v1/embeddings - Creates embeddings for text input
  • /v1/reranking - Reranks documents based on query relevance
All API endpoints, request/response formats, and usage examples are documented in the CLI REST API page. The API interface is identical whether running via CLI or Docker; only the server startup method differs.
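As a rough illustration of the remaining endpoints, the sketches below show what embeddings and reranking requests might look like. The embeddings body follows the OpenAI format; the reranking body uses a common query/documents convention that is an assumption here, so confirm the exact field names against the CLI REST API documentation or the Swagger UI. Model names are placeholders.

```bash
# Embeddings: OpenAI-style request body (model name is a placeholder).
curl http://127.0.0.1:18181/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-embedding-model",
    "input": "The quick brown fox jumps over the lazy dog."
  }'

# Reranking: the query/documents fields follow a common reranking
# convention and are assumptions; verify them in the Swagger UI.
curl http://127.0.0.1:18181/v1/reranking \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-reranking-model",
    "query": "What is the capital of France?",
    "documents": [
      "Paris is the capital of France.",
      "Berlin is the capital of Germany."
    ]
  }'
```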