Getting Started
If you haven’t already, pull a model before making requests. For example, pull Qwen3:bash
bash
http://127.0.0.1:18181
by default.
API Endpoints
/v1/completions
Generate a single-turn text completion from a prompt.endpoint
: type of interaction, for example,http://127.0.0.1:18181/v1/completions
/completions
/chat/completions
/embeddings
/reranking
model-name
: for example,NexaAI/Qwen3-0.6B
/v1/chat/completions
Perform multi-turn conversation with role-based messages. Supportsstream
, multimodal input, and function tools.
Example:
json
Function Calling
Enable models to call functions by providing tool definitions in your request. First, pull a model that supports function calling:bash
Using Image Models
For vision language models (VLMs), first pull a model with image support:bash
Using Audio Models
For audio language models, first pull a model with audio support:bash
/v1/embeddings
Get vector representations of input texts using an embedding model. Prerequisites: Download an embedding model first:bash
/v1/reranking
Given a query and candidate documents, return a relevance score for each. Prerequisites: Download a reranker model first:bash
Was this page helpful?