This document outlines the NexaAI server commands and API endpoints for running local models as OpenAI-compatible APIs. The FastAPI-based server supports various operations including text generation, chat completions, function calling, image generation, and audio processing.
Key Features
Multiple Endpoints: Supports text generation, chat completions, function calling, image generation, and audio processing.
Streaming Support: Enables real-time text generation for interactive experiences.
GPU Acceleration: Utilizes GPU for improved performance.
Customizable Parameters: Allows fine-tuning of generation parameters.
Server Command
You can start a local server that serves models stored on your own computer with the nexa server command.
To run ONNX models, add onnx after nexa server.
API Endpoints
Text Generation: /v1/completions
Generates text based on a single prompt.
Request body:
{
  "prompt": "Tell me a story",
  "temperature": 1,
  "max_new_tokens": 128,
  "top_k": 50,
  "top_p": 1,
  "stop_words": ["string"]
}
Example Response:
{"result":"Once upon a time, in a small village nestled among rolling hills..."}
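As a minimal sketch of how a client might call this endpoint, the Python below builds the request body shown above and reads the result field from the response. The base URL http://localhost:8000 and the helper names build_completion_request and generate_text are assumptions made for illustration; use the host and port your server was actually started with.

```python
import json
import urllib.request

# Assumption: the server listens here. Adjust to your own host and port.
BASE_URL = "http://localhost:8000"


def build_completion_request(prompt, temperature=1.0, max_new_tokens=128,
                             top_k=50, top_p=1.0, stop_words=None):
    """Build (but do not send) a POST request for /v1/completions."""
    payload = {
        "prompt": prompt,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
        "top_k": top_k,
        "top_p": top_p,
        "stop_words": stop_words or [],
    }
    return urllib.request.Request(
        BASE_URL + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_text(prompt, **params):
    """Send the request and return the "result" field of the JSON response."""
    request = build_completion_request(prompt, **params)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["result"]
```

With a server running, generate_text("Tell me a story", max_new_tokens=64) would return the generated string. Keeping request construction separate from sending makes the payload easy to inspect without a live server.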
Chat Completions: /v1/chat/completions
Handles chat completions with support for conversation history.
Request body:
Multimodal models. The image can be given as either a url or a local path in the request body:
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"text": "What’s in this image?", "type": "text"},
        {"image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}, "type": "image_url"}
      ]
    }
  ],
  "max_tokens": 128,
  "temperature": 0.2,
  "stream": false,
  "stop_words": [],
  "top_k": 40,
  "top_p": 0.95
}
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"text": "What’s in this image?", "type": "text"},
        {"image_url": {"path": "/path/to/local/image.jpg"}, "type": "image_url"}
      ]
    }
  ],
  "max_tokens": 128,
  "temperature": 0.2,
  "stream": false,
  "stop_words": [],
  "top_k": 40,
  "top_p": 0.95
}
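The two multimodal bodies above differ only in whether the image is given under url or path. A small sketch of a helper that picks the key automatically might look like this. The function names image_message and multimodal_request_body are hypothetical, and treating any string that starts with http:// or https:// as a URL is an assumption of this example, not documented server behavior.

```python
def image_message(question, image, role="user"):
    """Return one chat message that pairs a text part with an image part.

    Strings starting with http:// or https:// are sent under "url";
    anything else is treated as a local file and sent under "path".
    """
    key = "url" if image.startswith(("http://", "https://")) else "path"
    return {
        "role": role,
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {key: image}},
        ],
    }


def multimodal_request_body(question, image, max_tokens=128, temperature=0.2):
    """Build a /v1/chat/completions body using the parameter values shown above."""
    return {
        "messages": [image_message(question, image)],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": False,
        "stop_words": [],
        "top_k": 40,
        "top_p": 0.95,
    }
```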
Traditional NLP models:
{
  "messages": [
    {"role": "user", "content": "Tell me a story"}
  ],
  "max_tokens": 128,
  "temperature": 0.1,
  "stream": false,
  "stop_words": []
}
Example Response:
{
  "id": "f83502df-7f5a-4825-a922-f5cece4081de",
  "object": "chat.completion",
  "created": 1723441724.914671,
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "In the heart of a mystical forest..."
      }
    }
  ]
}
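To illustrate conversation history, the sketch below accumulates turns in a messages list and pulls the assistant's reply out of a response shaped like the example above. It assumes that earlier turns are passed back as additional entries in messages with the roles user and assistant; the helper names append_turn, chat_request_body, and extract_reply are hypothetical.

```python
def append_turn(history, role, content):
    """Return a new history list with one more message appended."""
    return history + [{"role": role, "content": content}]


def chat_request_body(history, max_tokens=128, temperature=0.1):
    """Build a /v1/chat/completions body from the accumulated history."""
    return {
        "messages": history,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": False,
        "stop_words": [],
    }


def extract_reply(response):
    """Return the assistant text from a parsed (non-streaming) response."""
    return response["choices"][0]["message"]["content"]
```

After each response, calling append_turn(history, "assistant", extract_reply(response)) before adding the next user message keeps the whole exchange in the next request.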
Function Calling: /v1/function-calling
Calls the most appropriate function based on the user's prompt.
Request body:
{
  "messages": [
    {"role": "user", "content": "Extract Jason is 25 years old"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "UserDetail",
        "parameters": {
          "properties": {
            "name": {"description": "The user's name", "type": "string"},
            "age": {"description": "The user's age", "type": "integer"}
          },
          "required": ["name", "age"],
          "type": "object"
        }
      }
    }
  ],
  "tool_choice": "auto"
}
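The tools entry in the function-calling request is verbose to write by hand. As a rough sketch, the helpers below assemble the same structure from a function name and its parameter descriptions. The names make_tool and function_calling_body are hypothetical, and the code only constructs the body; it makes no assumption about the shape of the server's response.

```python
def make_tool(name, properties, required):
    """Wrap a JSON-schema style parameter description as one "tools" entry."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "parameters": {
                "properties": properties,
                "required": list(required),
                "type": "object",
            },
        },
    }


def function_calling_body(user_prompt, tools, tool_choice="auto"):
    """Build a /v1/function-calling body for a single user prompt."""
    return {
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": list(tools),
        "tool_choice": tool_choice,
    }


# Rebuilds the UserDetail request from the function-calling example.
user_detail = make_tool(
    "UserDetail",
    {
        "name": {"description": "The user's name", "type": "string"},
        "age": {"description": "The user's age", "type": "integer"},
    },
    ["name", "age"],
)
body = function_calling_body("Extract Jason is 25 years old", [user_detail])
```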
Image generation from a text prompt (image_path is left empty). Request body:
{
  "prompt": "A girl, standing in a field of flowers, vivid",
  "image_path": "",
  "cfg_scale": 7,
  "width": 256,
  "height": 256,
  "sample_steps": 20,
  "seed": 0,
  "negative_prompt": ""
}
Modifies existing images based on a single prompt.
Request body:
{
  "prompt": "A girl, standing in a field of flowers, vivid",
  "image_path": "path/to/image",
  "cfg_scale": 7,
  "width": 256,
  "height": 256,
  "sample_steps": 20,
  "seed": 0,
  "negative_prompt": ""
}
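The image request bodies in this document share the same fields and differ only in image_path: empty when no source image is supplied, and set to a file path when an existing image is to be modified. A minimal sketch of one builder covering both cases follows. The name image_request_body is hypothetical, and the endpoint paths are deliberately left out because this section does not state them.

```python
def image_request_body(prompt, image_path="", cfg_scale=7, width=256,
                       height=256, sample_steps=20, seed=0,
                       negative_prompt=""):
    """Build an image request body.

    Leave image_path empty to generate from the prompt alone, or pass
    the path of an existing image to have it modified by the prompt.
    """
    return {
        "prompt": prompt,
        "image_path": image_path,
        "cfg_scale": cfg_scale,
        "width": width,
        "height": height,
        "sample_steps": sample_steps,
        "seed": seed,
        "negative_prompt": negative_prompt,
    }
```

For example, image_request_body("A girl, standing in a field of flowers, vivid", image_path="path/to/image") produces a body with the same fields and values as the one above.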