
Getting Started

If you haven’t already, pull a model before making requests. For example, pull Qwen3:

```bash
nexa pull NexaAI/Qwen3-0.6B-GGUF
```
To use the API, first open a terminal at the project root and start the Nexa server:

```bash
nexa serve
```

The server runs on http://127.0.0.1:18181 by default. Keep this terminal open, since it hosts the server. Then open a second terminal (also at the project root) and make your API requests from there.

/v1/chat/completions

Creates a model response for a given conversation. Supports LLMs (text-only) and VLMs (image + text).

Use LLM

Request body

Example Value
```json
{
  "model": "NexaAI/Qwen3-0.6B-GGUF",
  "messages": [
    {"role": "user", "content": "Hello! Briefly introduce yourself."}
  ],
  "max_tokens": 256,
  "temperature": 0.7,
  "stream": false
}
```

Usage Example

```bash
curl -X POST http://127.0.0.1:18181/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "NexaAI/Qwen3-0.6B-GGUF", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 64}'
```
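The same request can be made from Python with only the standard library. A minimal sketch, assuming the server started by `nexa serve` is running locally; the `build_chat_payload` and `chat` helpers are ours for illustration, not part of Nexa:

```python
import json
import urllib.request

NEXA_URL = "http://127.0.0.1:18181/v1/chat/completions"

def build_chat_payload(model, prompt, max_tokens=64, temperature=0.7, stream=False):
    """Assemble a chat-completions request body for a text-only (LLM) call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": stream,
    }

def chat(model, prompt, **kwargs):
    """POST the payload to the local Nexa server and return the parsed JSON response."""
    body = json.dumps(build_chat_payload(model, prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        NEXA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# With the server running, e.g.:
# response = chat("NexaAI/Qwen3-0.6B-GGUF", "Hello! Briefly introduce yourself.")
```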

Use VLM

Request body

Example Value
```json
{
  "model": "NexaAI/qwen3vl-GGUF",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image succinctly."},
        {"type": "image_url", "image_url": {"url": "</path/to/image>"}}
      ]
    }
  ]
}
```

Usage Example

```bash
curl -X POST http://127.0.0.1:18181/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "NexaAI/qwen3vl-GGUF", "messages": [{"role": "user", "content": [{"type": "text", "text": "what is main color of the picture"}, {"type": "image_url", "image_url": {"url": "</path/to/image>"}}]}], "stream": false}'
```
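In Python, the only change from the text-only call is that `content` becomes a list mixing `text` and `image_url` parts. A sketch of that request shape; `build_vlm_payload` is an illustrative helper, not part of Nexa:

```python
def build_vlm_payload(model, text, image_url, stream=False):
    """Assemble a chat-completions body with mixed text + image content,
    mirroring the VLM request shape shown above."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "stream": stream,
    }

# POST this dict to /v1/chat/completions exactly as in the text-only case.
```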

/v1/images/generations

Creates an image from a given prompt.
The example below uses NexaAI/Prefect-illustrious-XL-v2.0p-fp16-cuda as the model, which is recommended for most CUDA (NVIDIA GPU) environments. If you are running on Apple Silicon, use an MLX-compatible model (e.g., nexaml/sdxl-turbo-ryzen-ai). Always make sure the model you select matches your hardware capabilities.

Request body

Example Value
```json
{
  "model": "NexaAI/Prefect-illustrious-XL-v2.0p-fp16-cuda",
  "prompt": "A white cat with blue eyes",
  "n": 1,
  "size": "512x512",
  "response_format": "url"
}
```

Usage Example

```bash
curl -X POST http://127.0.0.1:18181/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "NexaAI/Prefect-illustrious-XL-v2.0p-fp16-cuda", "prompt": "A white cat with blue eyes", "n": 1, "size": "512x512", "response_format": "url"}'
```
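The request body can be assembled the same way in Python. A minimal sketch; `build_image_payload` is our helper (not part of Nexa) and also sanity-checks the "WIDTHxHEIGHT" form of `size` before sending:

```python
def build_image_payload(model, prompt, n=1, size="512x512", response_format="url"):
    """Assemble a request body for /v1/images/generations."""
    # Validate the "WxH" form early so a malformed size fails locally,
    # not as an opaque server error.
    width, height = (int(part) for part in size.split("x"))
    if width <= 0 or height <= 0:
        raise ValueError(f"invalid size: {size!r}")
    return {
        "model": model,
        "prompt": prompt,
        "n": n,
        "size": size,
        "response_format": response_format,
    }

# POST this dict to /v1/images/generations with a JSON body,
# as in the curl example above.
```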

/v1/embeddings

Creates an embedding for the given input. Use this when you need to convert text or document chunks into vectors for indexing in a retrieval system.
Make sure you select a model that supports embeddings (e.g., djuna/jina-embeddings-*); calling this API with a non-embedding model will result in an error.

Minimal request body

Example Value
```json
{
  "model": "djuna/jina-embeddings-v2-small-en-Q5_K_M-GGUF",
  "input": "Hello, world!"
}
```

Usage Example

```bash
curl -X POST http://127.0.0.1:18181/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "djuna/jina-embeddings-v2-small-en-Q5_K_M-GGUF", "input": "Hello, world!"}'
```
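Once you have vectors back, the usual next step is comparing them. A self-contained sketch of cosine similarity; note that in OpenAI-compatible responses the vector typically sits at `data[0].embedding`, but treat that exact path as an assumption for Nexa:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors:
    dot(a, b) / (|a| * |b|). Ranges from -1.0 to 1.0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0; orthogonal directions score 0.0.
```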
