⚙️Local server
Start local server running local model
This document outlines the NexaAI server commands and API endpoints for running local models as OpenAI-compatible APIs. The FastAPI-based server supports various operations including text generation, chat completions, function calling, image generation, and audio processing.
Key Features
Multiple Endpoints: Supports text generation, chat completions, function calling, image generation, and audio processing.
Streaming Support: Enables real-time text generation for interactive experiences.
GPU Acceleration: Utilizes GPU for improved performance.
Customizable Parameters: Allows fine-tuning of generation parameters.
Server Command
You can start a local server using models on your local computer with the nexa server
command. Here's the usage syntax:
Options:
--host
: Host to bind the server to--port
: Port to bind the server to--reload
: Enable automatic reloading on code changes--nctx
: Length of context window
Example Command:
By default
nexa server
will rungguf
models.To run
onnx
models, simply addonnx
afternexa server
API Endpoints
Text Generation: /v1/completions
/v1/completions
Generates text based on a single prompt.
Request body:
Example Response:
Chat Completions: /v1/chat/completions
/v1/chat/completions
Handles chat completions with support for conversation history.
Request body:
Example Response:
Function Calling: /v1/function-calling
/v1/function-calling
Call the most appropriate function based on user's prompt
Request body:
Function format:
Example Response:
Text-to-Image: /v1/txt2img
/v1/txt2img
Generates images based on a single prompt.
Request body:
Example Response:
Image-to-Image: /v1/img2img
/v1/img2img
Modifies existing images based on a single prompt.
Request body:
Example Response:
Audio Transcriptions: /v1/audio/transcriptions
/v1/audio/transcriptions
Transcribes audio files to text.
Parameters:
beam_size
(integer): Beam size for transcription (default: 5)language
(string): Language code (e.g., 'en', 'fr')temperature
(number): Temperature for sampling (default: 0)
Request body:
Example Response:
Audio Translations: /v1/audio/translations
/v1/audio/translations
Translates audio files to text in English.
Parameters:
beam_size
(integer): Beam size for transcription (default: 5)temperature
(number): Temperature for sampling (default: 0)
Request body:
Example Response:
Last updated