Getting Started
If you haven’t already, pull a model before making requests. For example, pull Qwen3. Then start the local server; it is hosted at http://127.0.0.1:18181 by default. Keep this terminal open and running, since it hosts the server.
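A minimal terminal sketch, assuming the Nexa CLI exposes `nexa pull` and `nexa serve` (the exact command names and model identifier may differ on your install; check the CLI help):

```bash
# Pull a model first (identifier is illustrative; use the model you actually need)
nexa pull Qwen3

# Start the local server; by default it listens on http://127.0.0.1:18181
nexa serve
```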
Then, open another terminal (also in the project root directory) and make your API requests.
/v1/chat/completions
Creates a model response for a given conversation. Supports LLMs (text-only) and VLMs (image + text).

Use LLM
Request body and usage example
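A minimal sketch of a text-only request, assuming the server follows the OpenAI-compatible chat schema; the model name is a placeholder for whichever LLM you pulled:

```bash
curl -X POST http://127.0.0.1:18181/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3",
    "messages": [
      {"role": "user", "content": "Hello! What can you do?"}
    ],
    "stream": false
  }'
```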
Use VLM
Request body and usage example
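A minimal sketch of an image + text request, assuming the server accepts OpenAI-style `image_url` content parts; the model name and image URL are placeholders:

```bash
curl -X POST http://127.0.0.1:18181/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<your-vlm-model>",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image."},
          {"type": "image_url", "image_url": {"url": "https://example.com/sample.png"}}
        ]
      }
    ]
  }'
```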
/v1/images/generations
Creates an image based on a given prompt. The example below uses NexaAI/Prefect-illustrious-XL-v2.0p-fp16-cuda as the model, which is recommended for most CUDA (NVIDIA GPU) environments. If you are running on Apple Silicon, use an MLX-compatible model (e.g., nexaml/sdxl-turbo-ryzen-ai). Always make sure the model you select matches your hardware capabilities.

Request body and usage example
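A minimal sketch of an image-generation request, assuming an OpenAI-style body for /v1/images/generations; the prompt, n, and size values are illustrative:

```bash
curl -X POST http://127.0.0.1:18181/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "NexaAI/Prefect-illustrious-XL-v2.0p-fp16-cuda",
    "prompt": "A watercolor painting of a lighthouse at sunset",
    "n": 1,
    "size": "1024x1024"
  }'
```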
/v1/embeddings
Creates an embedding for the given input. Use /v1/embeddings when you need to convert text or document chunks into vectors for indexing in a retrieval system. Make sure you select a model that supports embeddings (e.g., djuna/jina-embeddings-*); calling this API with a non-embedding model will result in an error.

Minimal request body and usage example
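A minimal sketch, assuming the OpenAI-compatible embeddings schema; the model placeholder stands in for an embedding-capable model you have pulled:

```bash
# Replace <your-embedding-model> with an embedding model
# (e.g., one matching djuna/jina-embeddings-*).
curl -X POST http://127.0.0.1:18181/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<your-embedding-model>",
    "input": ["First chunk of text.", "Second chunk of text."]
  }'
```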