Documentation Index

Fetch the complete documentation index at: https://docs.nexa.ai/llms.txt

Use this file to discover all available pages before exploring further.

NexaAI Python SDK API Reference

This document provides comprehensive API documentation for all modules and classes in the NexaAI Python SDK. For platform-specific setup and complete examples, please refer to the platform guides.

Core Modules

LLM (Large Language Model)

The LLM class provides text generation and conversation capabilities.

Initialization

from nexaai import LLM, ModelConfig

# Initialize LLM
config = ModelConfig()
llm = LLM.from_(model="model_name_or_path", config=config, plugin_id="cpu_gpu", device_id="cpu")

Core Methods

generate_stream(prompt, config=None)
  • Generates text tokens in streaming fashion
  • Parameters:
    • prompt (str): Input prompt
    • config (GenerationConfig, optional): Generation configuration
  • Returns: Generator yielding text tokens; returns a GenerateResult when exhausted
generate(prompt, config=None, on_token=None)
  • Generates text with optional streaming token callback
  • Parameters:
    • prompt (str): Input prompt
    • config (GenerationConfig, optional): Generation configuration
    • on_token (Callable[[str], bool], optional): Callback function for streaming tokens
  • Returns: GenerateResult containing the generated text and profile data
apply_chat_template(messages, tools=None, enable_thinking=False)
  • Applies chat template to conversation
  • Parameters:
    • messages (List[LlmChatMessage]): Conversation history
    • tools (str, optional): Optional tool JSON string
    • enable_thinking (bool): Enable thinking mode
  • Returns: Formatted prompt string
reset()
  • Resets conversation state and clears KV cache
save_kv_cache(path)
  • Saves key-value cache to file
  • Parameters:
    • path (str): File path to save cache
load_kv_cache(path)
  • Loads key-value cache from file
  • Parameters:
    • path (str): File path to load cache
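The on_token callback of generate() can both stream output and stop generation early, since returning False from the callback requests a stop. A minimal sketch of that contract; the model path and generation settings are placeholders, and the demo function requires the nexaai SDK plus a local model:

```python
# Sketch of the generate() on_token contract: the callback receives each
# streamed token and returns True to continue or False to stop early.

class TokenCollector:
    """Accumulates streamed tokens and asks for an early stop after a budget."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.tokens = []

    def __call__(self, token: str) -> bool:
        self.tokens.append(token)
        return len(self.tokens) < self.max_tokens  # False => stop streaming

    @property
    def text(self) -> str:
        return "".join(self.tokens)


def generate_demo(model="model_name_or_path"):
    """Hedged usage sketch; requires the nexaai SDK and a local model."""
    from nexaai import LLM, ModelConfig, GenerationConfig

    llm = LLM.from_(model=model, config=ModelConfig(),
                    plugin_id="cpu_gpu", device_id="cpu")
    collector = TokenCollector(max_tokens=32)
    llm.generate("Write a haiku about the sea.",
                 config=GenerationConfig(max_tokens=100),
                 on_token=collector)
    return collector.text
```

The same collector object works for both generate(on_token=...) and a manual loop over generate_stream().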

VLM (Vision Language Model)

The VLM class provides multimodal understanding and generation capabilities.

Initialization

from nexaai import VLM, ModelConfig

# Initialize VLM
config = ModelConfig()
vlm = VLM.from_(model="model_name_or_path", config=config, plugin_id="cpu_gpu", device_id="")

Core Methods

generate_stream(prompt, config=None)
  • Generates text tokens in streaming fashion
  • Parameters:
    • prompt (str): Input prompt
    • config (GenerationConfig, optional): Generation configuration with image_paths/audio_paths
  • Returns: Generator yielding text tokens; returns a GenerateResult when exhausted
generate(prompt, config=None, on_token=None)
  • Generates text with optional streaming token callback
  • Parameters:
    • prompt (str): Input prompt
    • config (GenerationConfig, optional): Generation configuration
    • on_token (Callable[[str], bool], optional): Callback function for streaming tokens
  • Returns: GenerateResult containing the generated text and profile data
apply_chat_template(messages, tools=None, enable_thinking=False)
  • Applies chat template to multimodal conversation
  • Parameters:
    • messages (List[VlmChatMessage]): Multimodal conversation history
    • tools (str, optional): Optional tool JSON string
    • enable_thinking (bool): Enable thinking mode
  • Returns: Formatted prompt string
reset()
  • Resets conversation state and clears KV cache
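The methods above combine naturally for an image question: build the multimodal messages, format them with apply_chat_template, then stream the answer. The as_content_specs helper is illustrative plain Python, not an SDK function, and the model/image paths are placeholders:

```python
# Builds (type, value) pairs for a text prompt plus image attachments; only
# the demo function below touches the SDK.

def as_content_specs(text, image_paths):
    """One 'text' part followed by one 'image' part per path."""
    return [("text", text)] + [("image", p) for p in image_paths]


def vlm_demo(model="model_name_or_path", image="path/to/image.jpg"):
    """Hedged usage sketch; requires the nexaai SDK and a local model."""
    from nexaai import VLM, ModelConfig, GenerationConfig
    from nexaai import VlmChatMessage, VlmContent

    vlm = VLM.from_(model=model, config=ModelConfig(),
                    plugin_id="cpu_gpu", device_id="")
    contents = [VlmContent(type=t, text=v)
                for t, v in as_content_specs("Describe this image", [image])]
    prompt = vlm.apply_chat_template([VlmChatMessage(role="user", contents=contents)])
    config = GenerationConfig(max_tokens=100, image_paths=[image])
    return "".join(vlm.generate_stream(prompt, config=config))
```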

ASR (Automatic Speech Recognition)

The ASR class provides speech-to-text transcription capabilities.

Initialization

from nexaai import ASR

# Initialize ASR
asr = ASR.from_(model="model_name_or_path", plugin_id="cpu_gpu", device_id="cpu")

Core Methods

transcribe(audio_path, language=None, timestamps=None, beam_size=5)
  • Transcribes audio file to text
  • Parameters:
    • audio_path (str): Path to audio file
    • language (str, optional): Language code ("en", "zh", or None for auto-detect)
    • timestamps (str, optional): Timestamp format ("none", "segment", "word")
    • beam_size (int): Beam size for decoding (default: 5)
  • Returns: TranscribeResult containing transcript, confidence scores, timestamps, and profile data
list_supported_languages()
  • Lists supported languages
  • Returns: List of language codes
stream_begin(language=None, on_transcription=None, chunk_duration=0.5, overlap_duration=0.1, sample_rate=16000, max_queue_size=10, buffer_size=4096, timestamps=None, beam_size=5)
  • Begins streaming ASR transcription
  • Parameters:
    • language (str, optional): Language code
    • on_transcription (Callable[[str], None], optional): Callback function for transcription results
    • chunk_duration (float): Audio chunk duration in seconds
    • overlap_duration (float): Overlap duration in seconds
    • sample_rate (int): Audio sample rate
    • max_queue_size (int): Maximum queue size
    • buffer_size (int): Buffer size
    • timestamps (str, optional): Timestamp format
    • beam_size (int): Beam size for decoding
stream_push_audio(audio_data)
  • Pushes audio data for streaming transcription
  • Parameters:
    • audio_data (List[float]): List of audio samples (float values)
stream_stop(graceful=True)
  • Stops streaming transcription
  • Parameters:
    • graceful (bool): If True, wait for current processing to complete
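stream_push_audio expects raw float samples, so a caller typically slices its audio into overlapping windows that match chunk_duration and overlap_duration. iter_chunks below is an illustrative helper, not an SDK function, and the demo sketch assumes the nexaai SDK plus a local model:

```python
def iter_chunks(samples, sample_rate=16000, chunk_duration=0.5, overlap_duration=0.1):
    """Yields overlapping windows of float samples for streaming ASR."""
    chunk = int(chunk_duration * sample_rate)
    step = chunk - int(overlap_duration * sample_rate)
    if step <= 0:
        raise ValueError("overlap_duration must be smaller than chunk_duration")
    for start in range(0, len(samples), step):
        yield samples[start:start + chunk]
        if start + chunk >= len(samples):  # final (possibly partial) window
            break


def streaming_demo(samples, model="model_name_or_path"):
    """Hedged usage sketch; requires the nexaai SDK and a local model."""
    from nexaai import ASR

    asr = ASR.from_(model=model, plugin_id="cpu_gpu", device_id="cpu")
    asr.stream_begin(language="en", on_transcription=print)
    for window in iter_chunks(samples):
        asr.stream_push_audio(window)
    asr.stream_stop(graceful=True)
```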

Embedder

The Embedder class provides text vectorization and similarity computation.

Initialization

from nexaai import Embedder

# Initialize Embedder
embedder = Embedder.from_(model="model_name_or_path", plugin_id="cpu_gpu")

Core Methods

embed(texts=None, input_ids=None, image_paths=None, task_type=None, batch_size=32, normalize=False, normalize_method=None)
  • Generates embeddings for texts, tokens, or images
  • Parameters:
    • texts (List[str], optional): List of text strings to embed
    • input_ids (List[List[int]], optional): List of token ID sequences (alternative to texts)
    • image_paths (List[str], optional): List of image file paths to embed
    • task_type (str, optional): Task type for embedding (e.g., "classification", "retrieval")
    • batch_size (int): Batch size for processing (default: 32)
    • normalize (bool): Whether to normalize embeddings (default: False)
    • normalize_method (str, optional): Normalization method (e.g., "l2")
  • Returns: EmbedResult containing embeddings and profile data
embedding_dim()
  • Returns the dimension of embeddings
  • Returns: Integer dimension size
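A common use of embed() is comparing a query against candidate texts by cosine similarity. The cosine helper is plain Python; the demo is a hedged sketch that requires the nexaai SDK and a local model, and the embeddings attribute on EmbedResult is an assumed field name:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def similarity_demo(model="model_name_or_path"):
    """Hedged usage sketch; requires the nexaai SDK and a local model."""
    from nexaai import Embedder

    embedder = Embedder.from_(model=model, plugin_id="cpu_gpu")
    result = embedder.embed(texts=["a cat", "a kitten"],
                            normalize=True, normalize_method="l2")
    query_vec, doc_vec = result.embeddings  # assumed EmbedResult field
    return cosine(query_vec, doc_vec)
```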

Reranker

The Reranker class provides document reranking capabilities.

Initialization

from nexaai import Reranker

# Initialize Reranker
reranker = Reranker.from_(model="model_name_or_path", plugin_id="cpu_gpu")

Core Methods

rerank(query, documents, batch_size=32, normalize=False, normalize_method=None)
  • Reranks documents based on query relevance
  • Parameters:
    • query (str): Search query
    • documents (List[str]): List of documents to rank
    • batch_size (int): Batch size for processing (default: 32)
    • normalize (bool): Whether to normalize scores (default: False)
    • normalize_method (str, optional): Normalization method
  • Returns: RerankResult containing scores and profile data
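Ordering documents by their rerank scores is a simple zip-and-sort, assuming the scores in RerankResult align with the input document order. sort_by_score is an illustrative helper, and the scores field name is an assumption:

```python
def sort_by_score(documents, scores):
    """Pairs each document with its score, highest first."""
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)


def rerank_demo(model="model_name_or_path"):
    """Hedged usage sketch; requires the nexaai SDK and a local model."""
    from nexaai import Reranker

    reranker = Reranker.from_(model=model, plugin_id="cpu_gpu")
    docs = ["Paris is in France.", "Cats are mammals."]
    result = reranker.rerank("Where is Paris?", docs, normalize=True)
    return sort_by_score(docs, result.scores)  # .scores is an assumed field
```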

CV (Computer Vision)

The CV class provides computer vision capabilities including OCR.

Initialization

from nexaai import CV

# Initialize CV Model
cv = CV.from_(model="model_name_or_path", capabilities=0, plugin_id="cpu_gpu")

Core Methods

infer(input_image_path)
  • Performs inference on image
  • Parameters:
    • input_image_path (str): Path to input image
  • Returns: CVResult containing detection/classification results

Diarize

The Diarize class provides speaker diarization capabilities.

Initialization

from nexaai import Diarize

# Initialize Diarize
diarize = Diarize.from_(model="model_name_or_path", plugin_id="cpu_gpu", device_id="cpu")

Core Methods

infer(audio_path, min_speakers=None, max_speakers=None)
  • Performs speaker diarization on audio file
  • Parameters:
    • audio_path (str): Path to the audio file
    • min_speakers (int, optional): Minimum number of speakers
    • max_speakers (int, optional): Maximum number of speakers
  • Returns: DiarizeResult containing speech segments and metadata
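A typical post-processing step is totaling speech time per speaker. speaking_time works on plain (speaker, start, end) tuples; mapping DiarizeResult segments onto those tuples is an assumption about the result's shape, so check the fields on your SDK version:

```python
def speaking_time(segments):
    """Sums seconds of speech per speaker from (speaker, start, end) tuples."""
    totals = {}
    for speaker, start, end in segments:
        totals[speaker] = totals.get(speaker, 0.0) + (end - start)
    return totals


def diarize_demo(audio="path/to/audio.wav", model="model_name_or_path"):
    """Hedged usage sketch; requires the nexaai SDK and a local model."""
    from nexaai import Diarize

    diarize = Diarize.from_(model=model, plugin_id="cpu_gpu", device_id="cpu")
    result = diarize.infer(audio, min_speakers=1, max_speakers=4)
    # Assumed segment fields; verify against DiarizeResult in your SDK version.
    return speaking_time((s.speaker, s.start, s.end) for s in result.segments)
```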

TTS (Text-to-Speech)

The TTS class provides text-to-speech synthesis capabilities.

Initialization

from nexaai import TTS

# Initialize TTS
tts = TTS.from_(model="model_name_or_path", plugin_id="cpu_gpu", device_id="cpu")

Core Methods

synthesize(text, output_path, voice=None, speed=1.0, seed=-1, sample_rate=22050)
  • Synthesizes text to speech
  • Parameters:
    • text (str): Text to synthesize
    • output_path (str): Path to save the audio file
    • voice (str, optional): Voice identifier. If None, uses default
    • speed (float): Speech speed multiplier (default: 1.0)
    • seed (int): Random seed. -1 for random (default: -1)
    • sample_rate (int): Audio sample rate (default: 22050)
  • Returns: SynthesizeResult containing audio file path and metadata
list_available_voices()
  • Lists available voices
  • Returns: List of voice identifiers
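list_available_voices pairs naturally with synthesize for a safe voice fallback. pick_voice is an illustrative helper, and the model name and output path in the demo are placeholders:

```python
def pick_voice(available, preferred=None):
    """Returns preferred if offered, else the first available voice, else None."""
    if preferred is not None and preferred in available:
        return preferred
    return available[0] if available else None


def tts_demo(text="Hello!", model="model_name_or_path"):
    """Hedged usage sketch; requires the nexaai SDK and a local model."""
    from nexaai import TTS

    tts = TTS.from_(model=model, plugin_id="cpu_gpu", device_id="cpu")
    voice = pick_voice(tts.list_available_voices())
    return tts.synthesize(text, "output.wav", voice=voice,
                          speed=1.0, sample_rate=22050)
```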

ImageGen

The ImageGen class provides image generation capabilities.

Initialization

from nexaai import ImageGen, ModelConfig

# Initialize ImageGen
config = ModelConfig()
imagegen = ImageGen.from_(model="model_name_or_path", config=config, plugin_id="cpu_gpu", device_id="cpu")

Core Methods

txt2img(prompt, output_path, negative_prompts=None, height=512, width=512, method='ddpm', steps=50, guidance_scale=7.5, eta=0.0, seed=-1, strength=1.0)
  • Generates image from text prompt
  • Parameters:
    • prompt (str): Text prompt for image generation
    • output_path (str): Path to save the generated image
    • negative_prompts (List[str], optional): List of negative prompts
    • height (int): Image height in pixels (default: 512)
    • width (int): Image width in pixels (default: 512)
    • method (str): Sampling method (e.g., 'ddpm', 'ddim') (default: 'ddpm')
    • steps (int): Number of diffusion steps (default: 50)
    • guidance_scale (float): Guidance scale for classifier-free guidance (default: 7.5)
    • eta (float): Eta parameter for DDIM (default: 0.0)
    • seed (int): Random seed. -1 for random (default: -1)
    • strength (float): Strength parameter (default: 1.0)
  • Returns: ImageGenResult containing output image path
img2img(init_image_path, prompt, output_path, negative_prompts=None, height=512, width=512, method='ddpm', steps=50, guidance_scale=7.5, eta=0.0, seed=-1, strength=0.8)
  • Generates image from existing image and text prompt
  • Parameters:
    • init_image_path (str): Path to the initial image
    • prompt (str): Text prompt for image generation
    • output_path (str): Path to save the generated image
    • negative_prompts (List[str], optional): List of negative prompts
    • height (int): Image height in pixels (default: 512)
    • width (int): Image width in pixels (default: 512)
    • method (str): Sampling method (e.g., 'ddpm', 'ddim') (default: 'ddpm')
    • steps (int): Number of diffusion steps (default: 50)
    • guidance_scale (float): Guidance scale for classifier-free guidance (default: 7.5)
    • eta (float): Eta parameter for DDIM (default: 0.0)
    • seed (int): Random seed. -1 for random (default: -1)
    • strength (float): Strength parameter (0.0-1.0) (default: 0.8)
  • Returns: ImageGenResult containing output image path
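Many diffusion backends expect height and width divisible by 8, so a defensive dimension helper can be useful; that requirement is an assumption about this SDK, not a documented one. snap_to_multiple is plain Python, and the img2img call is a placeholder sketch requiring the nexaai SDK and a local model:

```python
def snap_to_multiple(value, base=8):
    """Rounds a dimension down to the nearest multiple of base (minimum base)."""
    return max(base, (value // base) * base)


def img2img_demo(model="model_name_or_path"):
    """Hedged usage sketch; requires the nexaai SDK and a local model."""
    from nexaai import ImageGen, ModelConfig

    imagegen = ImageGen.from_(model=model, config=ModelConfig(),
                              plugin_id="cpu_gpu", device_id="cpu")
    return imagegen.img2img(
        "path/to/init.png", "a watercolor landscape", "out.png",
        height=snap_to_multiple(600), width=snap_to_multiple(800),
        method="ddim", steps=50, guidance_scale=7.5,
        seed=42,       # fixed seed for reproducible output
        strength=0.8,  # higher strength departs further from the init image
    )
```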

Configuration Classes

ModelConfig

General model configuration.
from nexaai import ModelConfig

config = ModelConfig()
# Default configuration is usually sufficient for most use cases

GenerationConfig

Text generation configuration.
from nexaai import GenerationConfig

config = GenerationConfig(
    max_tokens=100,           # Maximum tokens to generate
    image_paths=["path/to/image.jpg"],  # Image paths for VLM (optional)
    audio_paths=["path/to/audio.wav"],  # Audio paths for VLM (optional)
)

Message Classes

LlmChatMessage

Represents a single message in a conversation for LLM.
from nexaai import LlmChatMessage

message = LlmChatMessage(role="user", content="Hello, how are you?")

VlmChatMessage

Represents a multimodal message with multiple content types for VLM.
from nexaai import VlmChatMessage, VlmContent

contents = [
    VlmContent(type="text", text="Describe this image"),
    VlmContent(type="image", text="path/to/image.jpg")
]
message = VlmChatMessage(role="user", contents=contents)

VlmContent

Represents individual content within a multimodal message. The type field selects the modality (e.g., "text" or "image") and the text field carries either the string payload or a file path, as in the VlmChatMessage example above.
from nexaai import VlmContent