Local OpenAI-compatible API for text generation, embeddings, and more
llm
http://127.0.0.1:18181
by default.
endpoint
: type of interaction, for example, http://127.0.0.1:18181/v1/completions
/completions
/chat/completions
/embeddings
/reranking
model-name
: for example, ggml-org/Qwen3-0.6B-GGUF
stream
, multimodal input, and function tools.
Example:
llm
vlm
vlm
embedder
reranker