NexaAI Python SDK API Reference

This document provides comprehensive API documentation for all modules and classes in the NexaAI Python SDK. For platform-specific setup and complete examples, refer to the platform guides.

Core Modules

LLM (Large Language Model)

The LLM class provides text generation and conversation capabilities.

Initialization

from nexaai import LLM, ModelConfig

# Initialize LLM
config = ModelConfig()
llm = LLM.from_(model="model_name_or_path", config=config, plugin_id="cpu_gpu", device_id="cpu")

Core Methods

generate_stream(prompt, config=None)
  • Generates text tokens in streaming fashion
  • Parameters:
    • prompt (str): Input prompt
    • config (GenerationConfig, optional): Generation configuration
  • Returns: Generator that yields text tokens and returns a GenerateResult when exhausted
generate(prompt, config=None, on_token=None)
  • Generates text with optional streaming token callback
  • Parameters:
    • prompt (str): Input prompt
    • config (GenerationConfig, optional): Generation configuration
    • on_token (Callable[[str], bool], optional): Callback function for streaming tokens
  • Returns: GenerateResult containing the generated text and profile data
apply_chat_template(messages, tools=None, enable_thinking=False)
  • Applies chat template to conversation
  • Parameters:
    • messages (List[LlmChatMessage]): Conversation history
    • tools (str, optional): Optional tool JSON string
    • enable_thinking (bool): Enable thinking mode
  • Returns: Formatted prompt string
reset()
  • Resets conversation state and clears KV cache
save_kv_cache(path)
  • Saves key-value cache to file
  • Parameters:
    • path (str): File path to save cache
load_kv_cache(path)
  • Loads key-value cache from file
  • Parameters:
    • path (str): File path to load cache
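
A minimal end-to-end sketch based on the signatures above. The model name is a placeholder, and the idea that returning False from on_token stops generation early is inferred from the Callable[[str], bool] signature rather than stated in this reference:

from nexaai import LLM, ModelConfig, GenerationConfig, LlmChatMessage

llm = LLM.from_(model="model_name_or_path", config=ModelConfig(), plugin_id="cpu_gpu", device_id="cpu")

# Build a prompt from the conversation history, then stream tokens via callback
messages = [LlmChatMessage(role="user", content="Explain KV caching in one sentence.")]
prompt = llm.apply_chat_template(messages)

def on_token(token: str) -> bool:
    print(token, end="", flush=True)
    return True  # assumed: returning False would stop generation early

result = llm.generate(prompt, config=GenerationConfig(max_tokens=100), on_token=on_token)
print(result.profile_data)  # performance profiling data

llm.reset()  # clear conversation state and KV cache before a new conversation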

VLM (Vision Language Model)

The VLM class provides multimodal understanding and generation capabilities.

Initialization

from nexaai import VLM, ModelConfig

# Initialize VLM
config = ModelConfig()
vlm = VLM.from_(model="model_name_or_path", config=config, plugin_id="cpu_gpu", device_id="")

Core Methods

generate_stream(prompt, config=None)
  • Generates text tokens in streaming fashion
  • Parameters:
    • prompt (str): Input prompt
    • config (GenerationConfig, optional): Generation configuration with image_paths/audio_paths
  • Returns: Generator that yields text tokens and returns a GenerateResult when exhausted
generate(prompt, config=None, on_token=None)
  • Generates text with optional streaming token callback
  • Parameters:
    • prompt (str): Input prompt
    • config (GenerationConfig, optional): Generation configuration
    • on_token (Callable[[str], bool], optional): Callback function for streaming tokens
  • Returns: GenerateResult containing the generated text and profile data
apply_chat_template(messages, tools=None, enable_thinking=False)
  • Applies chat template to multimodal conversation
  • Parameters:
    • messages (List[VlmChatMessage]): Multimodal conversation history
    • tools (str, optional): Optional tool JSON string
    • enable_thinking (bool): Enable thinking mode
  • Returns: Formatted prompt string
reset()
  • Resets conversation state and clears KV cache
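
A minimal sketch that formats a multimodal conversation and streams the reply. The image path is a placeholder; passing it through both VlmContent and GenerationConfig.image_paths follows the configuration example later in this document:

from nexaai import VLM, ModelConfig, GenerationConfig, VlmChatMessage, VlmContent

vlm = VLM.from_(model="model_name_or_path", config=ModelConfig(), plugin_id="cpu_gpu", device_id="")

contents = [
    VlmContent(type="text", text="Describe this image"),
    VlmContent(type="image", text="path/to/image.jpg"),
]
prompt = vlm.apply_chat_template([VlmChatMessage(role="user", contents=contents)])

# The image path also goes into GenerationConfig so the backend can load it
config = GenerationConfig(max_tokens=100, image_paths=["path/to/image.jpg"])
for token in vlm.generate_stream(prompt, config=config):
    print(token, end="", flush=True)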

ASR (Automatic Speech Recognition)

The ASR class provides speech-to-text transcription capabilities.

Initialization

from nexaai import ASR

# Initialize ASR
asr = ASR.from_(model="model_name_or_path", plugin_id="cpu_gpu", device_id="cpu")

Core Methods

transcribe(audio_path, language=None, timestamps=None, beam_size=5)
  • Transcribes audio file to text
  • Parameters:
    • audio_path (str): Path to audio file
    • language (str, optional): Language code ("en", "zh", or None for auto-detect)
    • timestamps (str, optional): Timestamp format ("none", "segment", "word")
    • beam_size (int): Beam size for decoding (default: 5)
  • Returns: TranscribeResult containing transcript, confidence scores, timestamps, and profile data
list_supported_languages()
  • Lists supported languages
  • Returns: List of language codes
stream_begin(language=None, on_transcription=None, chunk_duration=0.5, overlap_duration=0.1, sample_rate=16000, max_queue_size=10, buffer_size=4096, timestamps=None, beam_size=5)
  • Begins streaming ASR transcription
  • Parameters:
    • language (str, optional): Language code
    • on_transcription (Callable[[str], None], optional): Callback function for transcription results
    • chunk_duration (float): Audio chunk duration in seconds
    • overlap_duration (float): Overlap duration in seconds
    • sample_rate (int): Audio sample rate
    • max_queue_size (int): Maximum queue size
    • buffer_size (int): Buffer size
    • timestamps (str, optional): Timestamp format
    • beam_size (int): Beam size for decoding
stream_push_audio(audio_data)
  • Pushes audio data for streaming transcription
  • Parameters:
    • audio_data (List[float]): List of audio samples (float values)
stream_stop(graceful=True)
  • Stops streaming transcription
  • Parameters:
    • graceful (bool): If True, wait for current processing to complete
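
A sketch of both one-shot and streaming transcription, assuming 16 kHz mono float samples; the silence buffer stands in for real captured audio:

from nexaai import ASR

asr = ASR.from_(model="model_name_or_path", plugin_id="cpu_gpu", device_id="cpu")

# One-shot transcription with word-level timestamps
result = asr.transcribe(audio_path="audio.wav", language="en", timestamps="word")
print(result.transcript)

# Streaming: push raw float samples, receive partial transcripts via callback
asr.stream_begin(language="en", on_transcription=lambda text: print(text))
samples = [0.0] * 16000  # one second of silence as a stand-in for microphone input
asr.stream_push_audio(samples)
asr.stream_stop(graceful=True)  # wait for in-flight audio to finish processing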

Embedder

The Embedder class provides text vectorization and similarity computation.

Initialization

from nexaai import Embedder

# Initialize Embedder
embedder = Embedder.from_(model="model_name_or_path", plugin_id="cpu_gpu")

Core Methods

embed(texts=None, input_ids=None, image_paths=None, task_type=None, batch_size=32, normalize=False, normalize_method=None)
  • Generates embeddings for texts, tokens, or images
  • Parameters:
    • texts (List[str], optional): List of text strings to embed
    • input_ids (List[List[int]], optional): List of token ID sequences (alternative to texts)
    • image_paths (List[str], optional): List of image file paths to embed
    • task_type (str, optional): Task type for embedding (e.g., "classification", "retrieval")
    • batch_size (int): Batch size for processing (default: 32)
    • normalize (bool): Whether to normalize embeddings (default: False)
    • normalize_method (str, optional): Normalization method (e.g., "l2")
  • Returns: EmbedResult containing embeddings and profile data
embedding_dim()
  • Returns the dimension of embeddings
  • Returns: Integer dimension size
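
A sketch that embeds two texts and compares them; the cosine computation is plain Python, not part of the SDK, and reduces to a dot product because the vectors are L2-normalized:

from nexaai import Embedder

embedder = Embedder.from_(model="model_name_or_path", plugin_id="cpu_gpu")

result = embedder.embed(texts=["Hello", "World"], normalize=True, normalize_method="l2")
a, b = result.embeddings

# Cosine similarity of L2-normalized vectors is their dot product
similarity = sum(x * y for x, y in zip(a, b))
print(f"dim={embedder.embedding_dim()}, cosine={similarity:.4f}")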

Reranker

The Reranker class provides document reranking capabilities.

Initialization

from nexaai import Reranker

# Initialize Reranker
reranker = Reranker.from_(model="model_name_or_path", plugin_id="cpu_gpu")

Core Methods

rerank(query, documents, batch_size=32, normalize=False, normalize_method=None)
  • Reranks documents based on query relevance
  • Parameters:
    • query (str): Search query
    • documents (List[str]): List of documents to rank
    • batch_size (int): Batch size for processing (default: 32)
    • normalize (bool): Whether to normalize scores (default: False)
    • normalize_method (str, optional): Normalization method
  • Returns: RerankResult containing scores and profile data
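
A sketch that scores two documents against a query and prints them best-first; scores are assumed to align index-for-index with the input documents:

from nexaai import Reranker

reranker = Reranker.from_(model="model_name_or_path", plugin_id="cpu_gpu")

docs = ["Paris is the capital of France.", "The Nile flows through Egypt."]
result = reranker.rerank(query="capital of France", documents=docs)

# Pair each document with its relevance score, highest first
for score, doc in sorted(zip(result.scores, docs), reverse=True):
    print(f"{score:.3f}  {doc}")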

CV (Computer Vision)

The CV class provides computer vision capabilities including OCR.

Initialization

from nexaai import CV

# Initialize CV Model
cv = CV.from_(model="model_name_or_path", capabilities=0, plugin_id="cpu_gpu")

Core Methods

infer(input_image_path)
  • Performs inference on image
  • Parameters:
    • input_image_path (str): Path to input image
  • Returns: CVResult containing detection/classification results
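
A sketch of OCR-style inference; note that the keyword is input_image_path, and the capabilities value is copied from the initialization example above:

from nexaai import CV

cv = CV.from_(model="model_name_or_path", capabilities=0, plugin_id="cpu_gpu")

result = cv.infer(input_image_path="image.jpg")
for item in result.results:
    print(item.text, item.confidence)  # detected text and its confidence score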

Diarize

The Diarize class provides speaker diarization capabilities.

Initialization

from nexaai import Diarize

# Initialize Diarize
diarize = Diarize.from_(model="model_name_or_path", plugin_id="cpu_gpu", device_id="cpu")

Core Methods

infer(audio_path, min_speakers=None, max_speakers=None)
  • Performs speaker diarization on audio file
  • Parameters:
    • audio_path (str): Path to the audio file
    • min_speakers (int, optional): Minimum number of speakers
    • max_speakers (int, optional): Maximum number of speakers
  • Returns: DiarizeResult containing speech segments and metadata
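
A sketch that bounds the expected speaker count and walks the resulting segments:

from nexaai import Diarize

diarize = Diarize.from_(model="model_name_or_path", plugin_id="cpu_gpu", device_id="cpu")

result = diarize.infer(audio_path="audio.wav", min_speakers=1, max_speakers=4)
print(result.num_speakers, result.duration)
for segment in result.segments:
    print(segment.start_time, segment.end_time, segment.speaker_label)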

TTS (Text-to-Speech)

The TTS class provides text-to-speech synthesis capabilities.

Initialization

from nexaai import TTS

# Initialize TTS
tts = TTS.from_(model="model_name_or_path", plugin_id="cpu_gpu", device_id="cpu")

Core Methods

synthesize(text, output_path, voice=None, speed=1.0, seed=-1, sample_rate=22050)
  • Synthesizes text to speech
  • Parameters:
    • text (str): Text to synthesize
    • output_path (str): Path to save the audio file
    • voice (str, optional): Voice identifier. If None, uses default
    • speed (float): Speech speed multiplier (default: 1.0)
    • seed (int): Random seed. -1 for random (default: -1)
    • sample_rate (int): Audio sample rate (default: 22050)
  • Returns: SynthesizeResult containing audio file path and metadata
list_available_voices()
  • Lists available voices
  • Returns: List of voice identifiers
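
A sketch that prefers the first listed voice and falls back to the default when none are reported:

from nexaai import TTS

tts = TTS.from_(model="model_name_or_path", plugin_id="cpu_gpu", device_id="cpu")

voices = tts.list_available_voices()
voice = voices[0] if voices else None  # None selects the default voice

result = tts.synthesize(text="Hello from NexaAI", output_path="hello.wav", voice=voice, speed=1.0)
print(result.audio_path, result.duration_seconds)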

ImageGen

The ImageGen class provides image generation capabilities.

Initialization

from nexaai import ImageGen, ModelConfig

# Initialize ImageGen
config = ModelConfig()
imagegen = ImageGen.from_(model="model_name_or_path", config=config, plugin_id="cpu_gpu", device_id="cpu")

Core Methods

txt2img(prompt, output_path, negative_prompts=None, height=512, width=512, method='ddpm', steps=50, guidance_scale=7.5, eta=0.0, seed=-1, strength=1.0)
  • Generates image from text prompt
  • Parameters:
    • prompt (str): Text prompt for image generation
    • output_path (str): Path to save the generated image
    • negative_prompts (List[str], optional): List of negative prompts
    • height (int): Image height in pixels (default: 512)
    • width (int): Image width in pixels (default: 512)
    • method (str): Sampling method (e.g., 'ddpm', 'ddim') (default: 'ddpm')
    • steps (int): Number of diffusion steps (default: 50)
    • guidance_scale (float): Guidance scale for classifier-free guidance (default: 7.5)
    • eta (float): Eta parameter for DDIM (default: 0.0)
    • seed (int): Random seed. -1 for random (default: -1)
    • strength (float): Strength parameter (default: 1.0)
  • Returns: ImageGenResult containing output image path
img2img(init_image_path, prompt, output_path, negative_prompts=None, height=512, width=512, method='ddpm', steps=50, guidance_scale=7.5, eta=0.0, seed=-1, strength=0.8)
  • Generates image from existing image and text prompt
  • Parameters:
    • init_image_path (str): Path to the initial image
    • prompt (str): Text prompt for image generation
    • output_path (str): Path to save the generated image
    • negative_prompts (List[str], optional): List of negative prompts
    • height (int): Image height in pixels (default: 512)
    • width (int): Image width in pixels (default: 512)
    • method (str): Sampling method (e.g., 'ddpm', 'ddim') (default: 'ddpm')
    • steps (int): Number of diffusion steps (default: 50)
    • guidance_scale (float): Guidance scale for classifier-free guidance (default: 7.5)
    • eta (float): Eta parameter for DDIM (default: 0.0)
    • seed (int): Random seed. -1 for random (default: -1)
    • strength (float): Strength parameter (0.0-1.0) (default: 0.8)
  • Returns: ImageGenResult containing output image path
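
A sketch chaining txt2img into an img2img refinement pass; the prompts, seed, and strength values are illustrative only:

from nexaai import ImageGen, ModelConfig

imagegen = ImageGen.from_(model="model_name_or_path", config=ModelConfig(), plugin_id="cpu_gpu", device_id="cpu")

# A fixed seed makes the first pass reproducible
first = imagegen.txt2img(prompt="A beautiful sunset", output_path="sunset.png", steps=50, seed=42)

# Refine the output; lower strength preserves more of the initial image
refined = imagegen.img2img(init_image_path=first.output_image_path,
                           prompt="A beautiful sunset, oil painting",
                           output_path="sunset_oil.png", strength=0.6)
print(refined.output_image_path)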

Configuration Classes

ModelConfig

General model configuration.
from nexaai import ModelConfig

config = ModelConfig()
# The default configuration is sufficient for most use cases

GenerationConfig

Text generation configuration.
from nexaai import GenerationConfig

config = GenerationConfig(
    max_tokens=100,           # Maximum tokens to generate
    image_paths=["path/to/image.jpg"],  # Image paths for VLM (optional)
    audio_paths=["path/to/audio.wav"],  # Audio paths for VLM (optional)
)

Message Classes

LlmChatMessage

Represents a single message in a conversation for LLM.
from nexaai import LlmChatMessage

message = LlmChatMessage(role="user", content="Hello, how are you?")

VlmChatMessage

Represents a multimodal message with multiple content types for VLM.
from nexaai import VlmChatMessage, VlmContent

contents = [
    VlmContent(type="text", text="Describe this image"),
    VlmContent(type="image", text="path/to/image.jpg")
]
message = VlmChatMessage(role="user", contents=contents)

VlmContent

Represents individual content within a multimodal message.
from nexaai import VlmContent

# Text content
text_content = VlmContent(type="text", text="Hello")

# Image content
image_content = VlmContent(type="image", text="path/to/image.jpg")

# Audio content
audio_content = VlmContent(type="audio", text="path/to/audio.wav")

Result Classes

GenerateResult

Result of text generation from LLM or VLM.
result = llm.generate(prompt, GenerationConfig(max_tokens=100))
print(result.full_text)  # Generated text
print(result.profile_data)  # Performance profiling data

TranscribeResult

Result of ASR transcription.
result = asr.transcribe(audio_path="audio.wav")
print(result.transcript)  # Transcription text
print(result.confidence_scores)  # Confidence scores
print(result.timestamps)  # Timestamps
print(result.profile_data)  # Performance profiling data

EmbedResult

Result of embedding generation.
result = embedder.embed(texts=["Hello", "World"])
print(result.embeddings)  # List of embedding vectors
print(result.profile_data)  # Performance profiling data

RerankResult

Result of document reranking.
result = reranker.rerank(query="query", documents=["doc1", "doc2"])
print(result.scores)  # List of relevance scores
print(result.profile_data)  # Performance profiling data

CVResult

Result of CV inference.
result = cv.infer(input_image_path="image.jpg")
print(result.results)  # List of CVResultItem objects
for item in result.results:
    print(item.text)  # Detected text
    print(item.confidence)  # Confidence score
    print(item.bbox)  # Bounding box (if available)

DiarizeResult

Result of speaker diarization.
result = diarize.infer(audio_path="audio.wav")
print(result.num_speakers)  # Number of speakers
print(result.duration)  # Audio duration
print(result.segments)  # List of SpeechSegment objects
for segment in result.segments:
    print(segment.start_time)  # Start time
    print(segment.end_time)  # End time
    print(segment.speaker_label)  # Speaker label

SynthesizeResult

Result of TTS synthesis.
result = tts.synthesize(text="Hello", output_path="output.wav")
print(result.audio_path)  # Path to generated audio file
print(result.duration_seconds)  # Audio duration
print(result.sample_rate)  # Sample rate
print(result.channels)  # Number of channels
print(result.num_samples)  # Number of samples

ImageGenResult

Result of image generation.
result = imagegen.txt2img(prompt="A beautiful sunset", output_path="output.png")
print(result.output_image_path)  # Path to generated image

Plugin ID Options

The plugin_id parameter supports different backends:
  • cpu_gpu: Default, supports both CPU and GPU
  • metal: Apple Silicon optimized (macOS)
  • mlx: MLX backend (macOS)
  • npu: NPU acceleration (Windows ARM64, Snapdragon X Elite)
  • qnn: QNN backend
  • nexaml: NexaML optimized backend
  • llama_cpp: For GGUF format models
  • onnx: ONNX runtime backend
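
A sketch of picking a backend at runtime; this mapping is illustrative only, so check the platform guides for the combinations your build actually supports:

import platform
from nexaai import LLM, ModelConfig

if platform.system() == "Darwin" and platform.machine() == "arm64":
    plugin_id = "mlx"  # or "metal" on Apple Silicon
elif platform.system() == "Windows" and platform.machine().lower() in ("arm64", "aarch64"):
    plugin_id = "npu"  # Snapdragon X Elite
else:
    plugin_id = "cpu_gpu"  # portable default

# device_id requirements may differ per backend; see the platform guides
llm = LLM.from_(model="model_name_or_path", config=ModelConfig(), plugin_id=plugin_id, device_id="cpu")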

Error Handling

The SDK provides comprehensive error handling with descriptive error messages. Common error scenarios include:
  • Invalid model paths
  • Unsupported plugin/device combinations
  • Authentication token issues
  • Memory allocation failures
  • Model loading errors
For detailed troubleshooting, refer to the platform-specific guides.

Performance Tips

  1. Model Selection: Choose models optimized for your platform and use case
  2. Batch Processing: Use appropriate batch sizes for embedding and reranking tasks
  3. Memory Management: Monitor memory usage, especially for large models
  4. Caching: Use KV cache for conversational applications (see the sketch after this list)
  5. Profiling: Access profile_data from result objects to monitor performance
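
For tip 4, a sketch of persisting the KV cache across sessions with the LLM methods documented above; the cache file name is a placeholder:

import os
from nexaai import LLM, ModelConfig, GenerationConfig

llm = LLM.from_(model="model_name_or_path", config=ModelConfig(), plugin_id="cpu_gpu", device_id="cpu")

cache_path = "session.kvcache"
if os.path.exists(cache_path):
    llm.load_kv_cache(cache_path)  # resume the previous conversation's state

result = llm.generate("Continue our discussion.", config=GenerationConfig(max_tokens=100))
print(result.full_text)

llm.save_kv_cache(cache_path)  # persist state for the next session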
