Local server

Start a local server running a local model

This document outlines the NexaAI server commands and API endpoints for running local models as OpenAI-compatible APIs. The FastAPI-based server supports various operations including text generation, chat completions, function calling, image generation, and audio processing.

Key Features

  • Multiple Endpoints: Supports text generation, chat completions, function calling, image generation, and audio processing.

  • Streaming Support: Enables real-time text generation for interactive experiences.

  • GPU Acceleration: Utilizes GPU for improved performance.

  • Customizable Parameters: Allows fine-tuning of generation parameters.

Server Command

You can start a local server that serves models from your local computer with the nexa server command. Here's the usage syntax:

usage: nexa server [-h] [--host HOST] [--port PORT] [--reload] [--nctx NCTX] [-lp] [-mt MODEL_TYPE] [-hf] model_path

Options:

  • --host: Host to bind the server to

  • --port: Port to bind the server to

  • --reload: Enable automatic reloading on code changes

  • --nctx: Length of the context window

  • -lp, --local_path: Indicate that the model path is a local path; must be used with -mt

  • -mt, --model_type: Indicate the model type; must be used with -lp or -hf. Choose from [NLP, COMPUTER_VISION, MULTIMODAL, AUDIO]

  • -hf, --huggingface: Load the model from the Hugging Face Hub; must be used with -mt

Example Commands:

nexa server gemma
nexa server llama2-function-calling
nexa server sd1-5
nexa server faster-whisper-large

By default, nexa server runs GGUF models. To run ONNX models, add onnx after nexa server.
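
A few invocation sketches that combine the options above (model names come from the examples; the local path is a placeholder):

# Run an ONNX model
nexa server onnx gemma

# Bind the server to a specific host and port
nexa server --host 0.0.0.0 --port 8000 gemma

# Serve a GGUF model from a local path (-lp must be paired with -mt)
nexa server -lp -mt NLP /path/to/model.gguf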

API Endpoints

Text Generation: /v1/completions

Generates text based on a single prompt.

Request body:

{
  "prompt": "Tell me a story",
  "temperature": 1,
  "max_new_tokens": 128,
  "top_k": 50,
  "top_p": 1,
  "stop_words": [
    "string"
  ]
}

Example Response:

{
  "result": "Once upon a time, in a small village nestled among rolling hills..."
}
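
You can exercise this endpoint with curl. A minimal sketch, assuming the server is reachable at http://localhost:8000 (substitute whatever you passed to --host and --port):

# Request a completion and print the raw JSON response
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "Tell me a story",
        "temperature": 1,
        "max_new_tokens": 128,
        "top_k": 50,
        "top_p": 1,
        "stop_words": []
      }'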

Chat Completions: /v1/chat/completions

Handles chat completions with support for conversation history.

Request body:

Multimodal models can take an image as either a remote URL or a local path in the request body:

{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "What’s in this image?",
          "type": "text"
        },
        {
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          },
          "type": "image_url"
        }
      ]
    }
  ],
  "max_tokens": 128,
  "temperature": 0.2,
  "stream": false,
  "stop_words": [],
  "top_k": 40,
  "top_p": 0.95
}

Or, with a local image path:

{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "What’s in this image?",
          "type": "text"
        },
        {
          "image_url": {
            "path": "/path/to/local/image.jpg"
          },
          "type": "image_url"
        }
      ]
    }
  ],
  "max_tokens": 128,
  "temperature": 0.2,
  "stream": false,
  "stop_words": [],
  "top_k": 40,
  "top_p": 0.95
}

Traditional NLP (text-only) models:

{
  "messages": [
    {
      "role": "user",
      "content": "Tell me a story"
    }
  ],
  "max_tokens": 128,
  "temperature": 0.1,
  "stream": false,
  "stop_words": []
}

Example Response:

{
  "id": "f83502df-7f5a-4825-a922-f5cece4081de",
  "object": "chat.completion",
  "created": 1723441724.914671,
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "In the heart of a mystical forest..."
      }
    }
  ]
}
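
A minimal curl sketch for the text-only case, again assuming the server is at http://localhost:8000:

# Send a single-turn chat request; append earlier turns to
# "messages" to carry conversation history across calls
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Tell me a story"}],
        "max_tokens": 128,
        "temperature": 0.1,
        "stream": false,
        "stop_words": []
      }'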

Function Calling: /v1/function-calling

Calls the most appropriate function based on the user's prompt.

Request body:

{
  "messages": [
    {
      "role": "user",
      "content": "Extract Jason is 25 years old"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "UserDetail",
        "parameters": {
          "properties": {
            "name": {
              "description": "The user's name",
              "type": "string"
            },
            "age": {
              "description": "The user's age",
              "type": "integer"
            }
          },
          "required": [
            "name",
            "age"
          ],
          "type": "object"
        }
      }
    }
  ],
  "tool_choice": "auto"
}

Function format:

{
  "type": "function",
  "function": {
    "name": "function_name",
    "description": "function_description",
    "parameters": {
      "type": "object",
      "properties": {
        "property_name": {
          "type": "string | number | boolean | object | array",
          "description": "string"
        }
      },
      "required": ["array_of_required_property_names"]
    }
  }
}

Example Response:

{
  "id": "chatcmpl-7a9b0dfb-878f-4f75-8dc7-24177081c1d0",
  "object": "chat.completion",
  "created": 1724186442,
  "model": "/home/ubuntu/.cache/nexa/hub/official/Llama2-7b-function-calling/q3_K_M.gguf",
  "choices": [
    {
      "finish_reason": "tool_calls",
      "index": 0,
      "logprobs": null,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call__0_UserDetail_cmpl-8d5cf645-7f35-4af2-a554-2ccea1a67bdd",
            "type": "function",
            "function": {
              "name": "UserDetail",
              "arguments": "{ \"name\": \"Jason\", \"age\": 25 }"
            }
          }
        ],
        "function_call": {
          "name": "",
          "arguments": "{ \"name\": \"Jason\", \"age\": 25 }"
        }
      }
    }
  ],
  "usage": {
    "completion_tokens": 15,
    "prompt_tokens": 316,
    "total_tokens": 331
  }
}
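
Note that the tool call's arguments field is itself a JSON-encoded string, so clients should parse it a second time. A sketch using curl and jq (jq is assumed to be installed, and function_call_request.json is a placeholder file containing the request body shown above):

# Extract the arguments of the first tool call
curl -s http://localhost:8000/v1/function-calling \
  -H "Content-Type: application/json" \
  -d @function_call_request.json \
  | jq -r '.choices[0].message.tool_calls[0].function.arguments'
# Prints: { "name": "Jason", "age": 25 }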

Text-to-Image: /v1/txt2img

Generates images based on a single prompt.

Request body:

{
  "prompt": "A girl, standing in a field of flowers, vivid",
  "image_path": "",
  "cfg_scale": 7,
  "width": 256,
  "height": 256,
  "sample_steps": 20,
  "seed": 0,
  "negative_prompt": ""
}

Example Response:

{
  "created": 1724186615.5426757,
  "data": [
    {
      "base64": "base64_of_generated_image",
      "url": "path/to/generated_image"
    }
  ]
}
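
Since the response carries the image as base64, you can decode it straight to a file. A sketch assuming http://localhost:8000 and that jq and base64 are available:

# Generate an image and decode the base64 payload to a file
curl -s http://localhost:8000/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "A girl, standing in a field of flowers, vivid",
        "image_path": "",
        "cfg_scale": 7,
        "width": 256,
        "height": 256,
        "sample_steps": 20,
        "seed": 0,
        "negative_prompt": ""
      }' \
  | jq -r '.data[0].base64' | base64 -d > generated.png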

Image-to-Image: /v1/img2img

Modifies existing images based on a single prompt.

Request body:

{
  "prompt": "A girl, standing in a field of flowers, vivid",
  "image_path": "path/to/image",
  "cfg_scale": 7,
  "width": 256,
  "height": 256,
  "sample_steps": 20,
  "seed": 0,
  "negative_prompt": ""
}

Example Response:

{
  "created": 1724186615.5426757,
  "data": [
    {
      "base64": "base64_of_generated_image",
      "url": "path/to/generated_image"
    }
  ]
}
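
The call mirrors /v1/txt2img; the only difference is that image_path points at the source image. A sketch (the image path is a placeholder):

# Modify an existing image guided by the prompt
curl -s http://localhost:8000/v1/img2img \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "A girl, standing in a field of flowers, vivid",
        "image_path": "/path/to/source.jpg",
        "cfg_scale": 7,
        "width": 256,
        "height": 256,
        "sample_steps": 20,
        "seed": 0,
        "negative_prompt": ""
      }' \
  | jq -r '.data[0].base64' | base64 -d > modified.png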

Audio Transcriptions: /v1/audio/transcriptions

Transcribes audio files to text.

Parameters:

  • beam_size (integer): Beam size for transcription (default: 5)

  • language (string): Language code (e.g., 'en', 'fr')

  • temperature (number): Temperature for sampling (default: 0)

Request body (multipart/form-data):

  • file: The audio file to transcribe (required)

Example Response:

{
  "text": " And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country."
}
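
Because the file travels as form-data, send a multipart request. A minimal sketch, assuming http://localhost:8000 and a local sample.wav (the documented parameter defaults apply when omitted):

# Upload a local audio file for transcription
curl -s http://localhost:8000/v1/audio/transcriptions \
  -F "file=@sample.wav"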

Audio Translations: /v1/audio/translations

Translates audio files to text in English.

Parameters:

  • beam_size (integer): Beam size for transcription (default: 5)

  • temperature (number): Temperature for sampling (default: 0)

Request body (multipart/form-data):

  • file: The audio file to translate (required)

Example Response:

{
  "text": " Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday"
}
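
The call has the same shape as transcription; only the endpoint changes. A sketch:

# Upload a local audio file to be translated into English text
curl -s http://localhost:8000/v1/audio/translations \
  -F "file=@sample.wav"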

Generate Embeddings: /v1/embeddings

Generates embeddings for a given text.

Request body:

{
  "input": "I love Nexa AI.",
  "normalize": false,
  "truncate": true
}

Example Response:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.006929283495992422,
        -0.005336422007530928,
        ... (omitted for spacing)
        -4.547132266452536e-05,
        -0.024047505110502243
      ]
    }
  ],
  "model": "/home/ubuntu/models/embedding_models/mxbai-embed-large-q4_0.gguf",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
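
A minimal curl sketch, again assuming the server is at http://localhost:8000:

# Request an embedding vector for a short input string
curl -s http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "I love Nexa AI.", "normalize": false, "truncate": true}'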