This document outlines the NexaAI server commands and API endpoints for running local models as OpenAI-compatible APIs. The FastAPI-based server supports various operations including text generation, chat completions, function calling, image generation, and audio processing.
Key Features
Multiple Endpoints: Supports text generation, chat completions, function calling, image generation, and audio processing.
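Streaming Support: Streams generated text as it is produced (controlled by the stream field), for interactive experiences.
GPU Acceleration: Uses the GPU to speed up inference.
Customizable Parameters: Lets you adjust generation parameters such as temperature, top_k, top_p, and the maximum number of tokens on each request.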
Server Command
You can start a local server that serves models stored on your computer with the nexa server command. The usage syntax is:
To run ONNX models, add onnx after nexa server.
API Endpoints
Text Generation: /v1/completions
Generates text based on a single prompt.
Request body:
{"prompt":"Tell me a story","temperature":1,"max_new_tokens":128,"top_k":50,"top_p":1,"stop_words": ["string" ]}
Example Response:
{"result":"Once upon a time, in a small village nestled among rolling hills..."}
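A minimal client sketch for /v1/completions using only the Python standard library. It assumes the server accepts a JSON POST at http://localhost:8000; the host, port, and HTTP method are assumptions, not taken from this page, and the helper names build_completion_payload and post_json are hypothetical.

```python
import json
import urllib.request

# Assumption: adjust to the host and port your nexa server is listening on.
BASE_URL = "http://localhost:8000"


def build_completion_payload(prompt, temperature=1.0, max_new_tokens=128,
                             top_k=50, top_p=1.0, stop_words=None):
    """Build a /v1/completions body with the field names from the example request."""
    return {
        "prompt": prompt,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
        "top_k": top_k,
        "top_p": top_p,
        # Copy so the caller's list is never shared with the payload.
        "stop_words": list(stop_words) if stop_words else [],
    }


def post_json(path, payload, base_url=BASE_URL, timeout=120):
    """POST a JSON body (assumed HTTP method) and decode the JSON response."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, post_json("/v1/completions", build_completion_payload("Tell me a story"))["result"] would return the generated text, assuming the response carries the "result" field shown in the example.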
Chat Completions: /v1/chat/completions
Handles chat completions with support for conversation history.
Request body:
Multimodal models (VLM):
{
  "model": "anything",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
      ]
    }
  ],
  "max_tokens": 300,
  "temperature": 0.7,
  "top_p": 0.95,
  "top_k": 40,
  "stream": false
}
Traditional NLP models:
{"messages": [{"role": "user", "content": "Tell me a story"}], "max_tokens": 128, "temperature": 0.1, "stream": false, "stop_words": []}
Example Response:
{"id":"f83502df-7f5a-4825-a922-f5cece4081de","object":"chat.completion","created":1723441724.914671,"choices": [ {"message": {"role":"assistant","content":"In the heart of a mystical forest..." } } ]}
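The sketch below assembles chat request bodies in both shapes shown above and reads the assistant reply out of a response shaped like the example. Carrying conversation history by appending each assistant reply to messages follows the OpenAI convention and is an assumption here; the helper names and the placeholder image URL are hypothetical, and the body would be sent with any JSON HTTP client.

```python
import json


def text_message(role, text):
    """A plain chat message, as used with traditional (text-only) models."""
    return {"role": role, "content": text}


def image_question(text, image_url):
    """A user message with a text part and an image_url part, as in the VLM example."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


def build_chat_payload(messages, max_tokens=128, temperature=0.1, stream=False, **extra):
    """Assemble a /v1/chat/completions body; extra fields (top_p, top_k, model) pass through."""
    payload = {"messages": messages, "max_tokens": max_tokens,
               "temperature": temperature, "stream": stream}
    payload.update(extra)
    return payload


def first_reply(response):
    """Return the assistant text from a non-streaming response like the example."""
    return response["choices"][0]["message"]["content"]


# The example response from above, parsed as a client would receive it.
example_response = json.loads(
    '{"id": "f83502df-7f5a-4825-a922-f5cece4081de", "object": "chat.completion",'
    ' "created": 1723441724.914671, "choices": [{"message": {"role": "assistant",'
    ' "content": "In the heart of a mystical forest..."}}]}'
)

# Append each reply to the history before sending the next user turn.
history = [text_message("user", "Tell me a story")]
history.append(text_message("assistant", first_reply(example_response)))
history.append(text_message("user", "Make it shorter."))
payload = build_chat_payload(history)

# Multimodal request mirroring the VLM example (placeholder image URL).
vlm_payload = build_chat_payload(
    [image_question("What's in this image?", "https://example.com/photo.jpg")],
    max_tokens=300, temperature=0.7, top_p=0.95, top_k=40, model="anything",
)
```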
Function Calling: /v1/function-calling
Calls the most appropriate function based on the user's prompt.
Request body:
{
  "messages": [{"role": "user", "content": "Extract Jason is 25 years old"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "UserDetail",
        "parameters": {
          "properties": {
            "name": {"description": "The user's name", "type": "string"},
            "age": {"description": "The user's age", "type": "integer"}
          },
          "required": ["name", "age"],
          "type": "object"
        }
      }
    }
  ],
  "tool_choice": "auto"
}
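A sketch of building the tools entry above and dispatching the chosen function locally. The response format for /v1/function-calling is not shown on this page, so the code assumes you have already extracted the function name and its arguments (as a dict or a JSON string) from the reply. function_tool, dispatch, HANDLERS, and the user_detail handler are hypothetical.

```python
import json


def function_tool(name, properties, required):
    """Wrap a JSON Schema object in the {"type": "function", ...} shape used above."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(required),
            },
        },
    }


def user_detail(name, age):
    """Hypothetical local handler for the UserDetail tool."""
    return f"{name} is {age} years old"


HANDLERS = {"UserDetail": user_detail}

payload = {
    "messages": [{"role": "user", "content": "Extract Jason is 25 years old"}],
    "tools": [
        function_tool(
            "UserDetail",
            {
                "name": {"description": "The user's name", "type": "string"},
                "age": {"description": "The user's age", "type": "integer"},
            },
            required=["name", "age"],
        )
    ],
    "tool_choice": "auto",
}


def dispatch(name, arguments, tools):
    """Check required arguments against the tool schema, then call the local handler."""
    if isinstance(arguments, str):
        # Assumption: arguments may arrive JSON-encoded, as in OpenAI-style replies.
        arguments = json.loads(arguments)
    for tool in tools:
        if tool["function"]["name"] == name:
            schema = tool["function"]["parameters"]
            break
    else:
        raise KeyError(f"unknown tool: {name}")
    missing = [field for field in schema["required"] if field not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return HANDLERS[name](**arguments)
```

Checking the schema's required list before calling the handler turns a malformed model reply into a clear ValueError instead of a TypeError deep inside the handler.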
Text-to-Image
Generates images based on a single prompt.
Request body:
{"prompt":"A girl, standing in a field of flowers, vivid","image_path":"","cfg_scale":7,"width":256,"height":256,"sample_steps":20,"seed":0,"negative_prompt":""}
Image-to-Image
Modifies existing images based on a single prompt.
Request body:
{"prompt":"A girl, standing in a field of flowers, vivid","image_path":"path/to/image","cfg_scale":7,"width":256,"height":256,"sample_steps":20,"seed":0,"negative_prompt":""}
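A sketch that builds both image request bodies with the fields and defaults from the examples above. The two bodies differ only in image_path, which is empty when generating from a prompt alone and points at an existing image when modifying one. Rejecting an empty image_path for image-to-image is an assumption of this sketch, as are the helper names.

```python
def build_image_payload(prompt, image_path="", cfg_scale=7, width=256, height=256,
                        sample_steps=20, seed=0, negative_prompt=""):
    """Build an image request body with the fields and defaults from the examples."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if sample_steps < 1:
        raise ValueError("sample_steps must be at least 1")
    return {
        "prompt": prompt,
        "image_path": image_path,
        "cfg_scale": cfg_scale,
        "width": width,
        "height": height,
        "sample_steps": sample_steps,
        "seed": seed,
        "negative_prompt": negative_prompt,
    }


def build_img2img_payload(prompt, image_path, **options):
    """Image-to-image variant: require a source image (an assumption of this sketch)."""
    if not image_path:
        raise ValueError("image-to-image needs a non-empty image_path")
    return build_image_payload(prompt, image_path=image_path, **options)
```

The resulting body would be POSTed to the matching image endpoint with any JSON HTTP client; since the server runs locally, image_path is presumably a path the server process can read.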