⚙️ Local server

Start a local server that runs a local model

This document outlines the NexaAI server commands and API endpoints for running local models as OpenAI-compatible APIs. The FastAPI-based server supports various operations including text generation, chat completions, function calling, image generation, and audio processing.

Key Features

  • Multiple Endpoints: Supports text generation, chat completions, function calling, image generation, and audio processing.

  • Streaming Support: Enables real-time text generation for interactive experiences.

  • GPU Acceleration: Utilizes GPU for improved performance.

  • Customizable Parameters: Allows fine-tuning of generation parameters.

Server Command

You can start a local server that serves models on your computer with the nexa server command. Here's the usage syntax:

usage: nexa server [-h] [--host HOST] [--port PORT] [--reload] [--nctx NCTX] [-lp] [-mt MODEL_TYPE] [-hf] model_path

Options:

  • --host: Host to bind the server to

  • --port: Port to bind the server to

  • --reload: Enable automatic reloading on code changes

  • --nctx: Length of context window

  • -lp, --local_path: Indicates that the model path is a local path; must be used with -mt

  • -mt, --model_type: Indicates the model type; must be used with -lp or -hf; choose from [NLP, COMPUTER_VISION, MULTIMODAL, AUDIO]

  • -hf, --huggingface: Loads the model from the Hugging Face Hub; must be used with -mt

Example Commands:

nexa server gemma
nexa server llama2-function-calling
nexa server sd1-5
nexa server faster-whisper-large

By default, nexa server runs GGUF models.

To run ONNX models, simply add onnx after nexa server.
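
For example, serving a model from a local path or directly from the Hugging Face Hub might look like this (the model path and repo name below are hypothetical placeholders):

nexa server -lp -mt NLP /path/to/your/model.gguf
nexa server -hf -mt NLP <huggingface-repo-id>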

API Endpoints

Text Generation: /v1/completions

Generates text based on a single prompt.

Request body:

{
  "prompt": "Tell me a story",
  "temperature": 1,
  "max_new_tokens": 128,
  "top_k": 50,
  "top_p": 1,
  "stop_words": [
    "string"
  ]
}

Example Response:

{
  "result": "Once upon a time, in a small village nestled among rolling hills..."
}
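
As a minimal sketch, you can call this endpoint with any HTTP client. The Python snippet below uses the requests library and assumes the server listens at http://localhost:8000 (substitute whatever --host and --port you started the server with):

import requests

BASE_URL = "http://localhost:8000"  # assumed default; adjust to your --host/--port

payload = {
    "prompt": "Tell me a story",
    "temperature": 1,
    "max_new_tokens": 128,
    "top_k": 50,
    "top_p": 1,
    "stop_words": [],
}
resp = requests.post(f"{BASE_URL}/v1/completions", json=payload)
resp.raise_for_status()
print(resp.json()["result"])  # "Once upon a time, ..."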

Chat Completions: /v1/chat/completions

Handles chat completions with support for conversation history.

Request body:

Multimodal models (VLM):

{
  "model": "anything",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What’s in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
      ]
    }
  ],
  "max_tokens": 300,
  "temperature": 0.7,
  "top_p": 0.95,
  "top_k": 40,
  "stream": false
}

Traditional NLP models:

{
  "messages": [
    {
      "role": "user",
      "content": "Tell me a story"
    }
  ],
  "max_tokens": 128,
  "temperature": 0.1,
  "stream": false,
  "stop_words": []
}

Example Response:

{
  "id": "f83502df-7f5a-4825-a922-f5cece4081de",
  "object": "chat.completion",
  "created": 1723441724.914671,
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "In the heart of a mystical forest..."
      }
    }
  ]
}
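
A Python sketch of the same call for a traditional NLP model, again assuming http://localhost:8000 as the base URL; the reply text sits at choices[0].message.content as shown above:

import requests

BASE_URL = "http://localhost:8000"  # assumed default; adjust to your --host/--port

payload = {
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "max_tokens": 128,
    "temperature": 0.1,
    "stream": False,
    "stop_words": [],
}
resp = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])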

Function Calling: /v1/function-calling

Calls the most appropriate function based on the user's prompt.

Request body:

{
  "messages": [
    {
      "role": "user",
      "content": "Extract Jason is 25 years old"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "UserDetail",
        "parameters": {
          "properties": {
            "name": {
              "description": "The user's name",
              "type": "string"
            },
            "age": {
              "description": "The user's age",
              "type": "integer"
            }
          },
          "required": [
            "name",
            "age"
          ],
          "type": "object"
        }
      }
    }
  ],
  "tool_choice": "auto"
}

Function format:

{
  "type": "function",
  "function": {
    "name": "function_name",
    "description": "function_description",
    "parameters": {
      "type": "object",
      "properties": {
        "property_name": {
          "type": "string | number | boolean | object | array",
          "description": "string"
        }
      },
      "required": ["array_of_required_property_names"]
    }
  }
}

Example Response:

{
  "id": "chatcmpl-7a9b0dfb-878f-4f75-8dc7-24177081c1d0",
  "object": "chat.completion",
  "created": 1724186442,
  "model": "/home/ubuntu/.cache/nexa/hub/official/Llama2-7b-function-calling/q3_K_M.gguf",
  "choices": [
    {
      "finish_reason": "tool_calls",
      "index": 0,
      "logprobs": null,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call__0_UserDetail_cmpl-8d5cf645-7f35-4af2-a554-2ccea1a67bdd",
            "type": "function",
            "function": {
              "name": "UserDetail",
              "arguments": "{ \"name\": \"Jason\", \"age\": 25 }"
            }
          }
        ],
        "function_call": {
          "name": "",
          "arguments": "{ \"name\": \"Jason\", \"age\": 25 }"
        }
      }
    }
  ],
  "usage": {
    "completion_tokens": 15,
    "prompt_tokens": 316,
    "total_tokens": 331
  }
}
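
To act on the model's decision, parse the tool_calls array and decode each arguments string as JSON. A sketch, under the same base-URL assumption as the earlier snippets:

import json
import requests

BASE_URL = "http://localhost:8000"  # assumed default; adjust to your --host/--port

payload = {
    "messages": [{"role": "user", "content": "Extract Jason is 25 years old"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "UserDetail",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string", "description": "The user's name"},
                        "age": {"type": "integer", "description": "The user's age"},
                    },
                    "required": ["name", "age"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}
resp = requests.post(f"{BASE_URL}/v1/function-calling", json=payload)
resp.raise_for_status()
for call in resp.json()["choices"][0]["message"]["tool_calls"]:
    args = json.loads(call["function"]["arguments"])  # e.g. {"name": "Jason", "age": 25}
    print(call["function"]["name"], args)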

Text-to-Image: /v1/txt2img

Generates images based on a single prompt.

Request body:

{
  "prompt": "A girl, standing in a field of flowers, vivid",
  "image_path": "",
  "cfg_scale": 7,
  "width": 256,
  "height": 256,
  "sample_steps": 20,
  "seed": 0,
  "negative_prompt": ""
}

Example Response:

{
  "created": 1724186615.5426757,
  "data": [
    {
      "base64": "base64_of_generated_image",
      "url": "path/to/generated_image"
    }
  ]
}
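
Since the response carries the image as base64, a typical client decodes it and writes it to disk. A minimal Python sketch (the output filename is arbitrary), assuming the usual http://localhost:8000 base URL:

import base64
import requests

BASE_URL = "http://localhost:8000"  # assumed default; adjust to your --host/--port

payload = {
    "prompt": "A girl, standing in a field of flowers, vivid",
    "image_path": "",
    "cfg_scale": 7,
    "width": 256,
    "height": 256,
    "sample_steps": 20,
    "seed": 0,
    "negative_prompt": "",
}
resp = requests.post(f"{BASE_URL}/v1/txt2img", json=payload)
resp.raise_for_status()
with open("generated.png", "wb") as f:  # arbitrary output filename
    f.write(base64.b64decode(resp.json()["data"][0]["base64"]))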

Image-to-Image: /v1/img2img

Modifies existing images based on a single prompt.

Request body:

{
  "prompt": "A girl, standing in a field of flowers, vivid",
  "image_path": "path/to/image",
  "cfg_scale": 7,
  "width": 256,
  "height": 256,
  "sample_steps": 20,
  "seed": 0,
  "negative_prompt": ""
}

Example Response:

{
  "created": 1724186615.5426757,
  "data": [
    {
      "base64": "base64_of_generated_image",
      "url": "path/to/generated_image"
    }
  ]
}
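
The client-side pattern matches txt2img; only image_path changes, pointing at the source image to modify. A short sketch (the input path is a placeholder):

import base64
import requests

BASE_URL = "http://localhost:8000"  # assumed default; adjust to your --host/--port

payload = {
    "prompt": "A girl, standing in a field of flowers, vivid",
    "image_path": "path/to/input.png",  # placeholder: the image to modify
    "cfg_scale": 7,
    "width": 256,
    "height": 256,
    "sample_steps": 20,
    "seed": 0,
    "negative_prompt": "",
}
resp = requests.post(f"{BASE_URL}/v1/img2img", json=payload)
resp.raise_for_status()
with open("modified.png", "wb") as f:  # arbitrary output filename
    f.write(base64.b64decode(resp.json()["data"][0]["base64"]))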

Audio Transcriptions: /v1/audio/transcriptions

Transcribes audio files to text.

Parameters:

  • beam_size (integer): Beam size for transcription (default: 5)

  • language (string): Language code (e.g., 'en', 'fr')

  • temperature (number): Temperature for sampling (default: 0)

Request body (multipart/form-data):

  • file: The audio file to transcribe (required)

Example Response:

{
  "text": " And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country."
}
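
Because the file travels as multipart form data, upload it with your HTTP client's file support. A Python sketch, assuming http://localhost:8000 and sending beam_size, language, and temperature as query parameters (if your server version expects them as form fields instead, move them into data=):

import requests

BASE_URL = "http://localhost:8000"  # assumed default; adjust to your --host/--port

with open("speech.wav", "rb") as f:  # placeholder: your audio file
    resp = requests.post(
        f"{BASE_URL}/v1/audio/transcriptions",
        files={"file": f},
        params={"beam_size": 5, "language": "en", "temperature": 0},
    )
resp.raise_for_status()
print(resp.json()["text"])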

Audio Translations: /v1/audio/translations

Translates audio files to text in English.

Parameters:

  • beam_size (integer): Beam size for transcription (default: 5)

  • temperature (number): Temperature for sampling (default: 0)

Request body (multipart/form-data):

  • file: The audio file to translate (required)

Example Response:

{
  "text": " Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday"
}
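
Calling it mirrors the transcription sketch above, minus the language parameter:

import requests

BASE_URL = "http://localhost:8000"  # assumed default; adjust to your --host/--port

with open("speech_fr.wav", "rb") as f:  # placeholder: non-English audio
    resp = requests.post(
        f"{BASE_URL}/v1/audio/translations",
        files={"file": f},
        params={"beam_size": 5, "temperature": 0},
    )
resp.raise_for_status()
print(resp.json()["text"])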

Generate Embeddings: /v1/embeddings

Generate embeddings for a given text.

Request body:

{
  "input": "I love Nexa AI.",
  "normalize": false,
  "truncate": true
}

Example Response:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.006929283495992422,
        -0.005336422007530928,
        ... (omitted for brevity)
        -4.547132266452536e-05,
        -0.024047505110502243
      ]
    }
  ],
  "model": "/home/ubuntu/models/embedding_models/mxbai-embed-large-q4_0.gguf",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
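
To use the vector, read data[0].embedding from the response. A brief Python sketch under the same base-URL assumption as the earlier snippets:

import requests

BASE_URL = "http://localhost:8000"  # assumed default; adjust to your --host/--port

payload = {"input": "I love Nexa AI.", "normalize": False, "truncate": True}
resp = requests.post(f"{BASE_URL}/v1/embeddings", json=payload)
resp.raise_for_status()
vector = resp.json()["data"][0]["embedding"]
print(len(vector), vector[:3])  # dimensionality and the first few values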
