NexaAI Python SDK API Reference
This document provides comprehensive API documentation for all modules and classes in the NexaAI Python SDK. For platform-specific setup and complete examples, please refer to the platform guides:

- macOS Guide - Apple Silicon optimization
- Windows x64 Guide - CPU/GPU acceleration
- Windows ARM64 Guide - NPU acceleration
Core Modules
LLM (Large Language Model)
The LLM class provides text generation and conversation capabilities.
Initialization
Core Methods
generate_stream(prompt, config=None)
- Generates text tokens in streaming fashion
- Parameters:
  - prompt (str): Input prompt
  - config (GenerationConfig, optional): Generation configuration
- Returns: Generator yielding text tokens; returns GenerateResult when exhausted
generate(prompt, config=None, on_token=None)
- Generates text with optional streaming token callback
- Parameters:
  - prompt (str): Input prompt
  - config (GenerationConfig, optional): Generation configuration
  - on_token (Callable[[str], bool], optional): Callback function for streaming tokens
- Returns: GenerateResult containing the generated text and profile data
apply_chat_template(messages, tools=None, enable_thinking=False)
- Applies chat template to conversation
- Parameters:
  - messages (List[LlmChatMessage]): Conversation history
  - tools (str, optional): Optional tool JSON string
  - enable_thinking (bool): Enable thinking mode
- Returns: Formatted prompt string
reset()
- Resets conversation state and clears KV cache
save_kv_cache(path)
- Saves key-value cache to file
- Parameters:
  - path (str): File path to save cache
load_kv_cache(path)
- Loads key-value cache from file
- Parameters:
  - path (str): File path to load cache
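A minimal usage sketch tying the LLM methods above together. The nexaai.llm import path, the constructor arguments, and the LlmChatMessage fields are assumptions (model loading is covered in the Initialization section and the platform guides); the method calls follow the signatures documented here.

```python
# NOTE: import path, constructor, and message fields are assumptions;
# adjust them to match your installed SDK version.
from nexaai.llm import LLM, GenerationConfig, LlmChatMessage

llm = LLM("path/to/llm-model")  # assumed constructor; see Initialization above

# Format a conversation into a prompt, then stream tokens as they are produced.
messages = [LlmChatMessage(role="user", content="Summarize KV caching in one sentence.")]
prompt = llm.apply_chat_template(messages)

for token in llm.generate_stream(prompt):
    print(token, end="", flush=True)

# Persist the conversation state, clear it, and restore it later.
llm.save_kv_cache("session.kv")
llm.reset()
llm.load_kv_cache("session.kv")
```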
VLM (Vision Language Model)
The VLM class provides multimodal understanding and generation capabilities.
Initialization
Core Methods
generate_stream(prompt, config=None)
- Generates text tokens in streaming fashion
- Parameters:
  - prompt (str): Input prompt
  - config (GenerationConfig, optional): Generation configuration with image_paths/audio_paths
- Returns: Generator yielding text tokens; returns GenerateResult when exhausted
generate(prompt, config=None, on_token=None)
- Generates text with optional streaming token callback
- Parameters:
  - prompt (str): Input prompt
  - config (GenerationConfig, optional): Generation configuration
  - on_token (Callable[[str], bool], optional): Callback function for streaming tokens
- Returns: GenerateResult containing the generated text and profile data
apply_chat_template(messages, tools=None, enable_thinking=False)
- Applies chat template to multimodal conversation
- Parameters:
  - messages (List[VlmChatMessage]): Multimodal conversation history
  - tools (str, optional): Optional tool JSON string
  - enable_thinking (bool): Enable thinking mode
- Returns: Formatted prompt string
reset()
- Resets conversation state and clears KV cache
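A hedged sketch of one VLM round trip. The import path, constructor, VlmChatMessage fields, and the way image_paths is passed to GenerationConfig are assumptions; only the method signatures come from the reference above.

```python
# NOTE: import path, constructor, message fields, and GenerationConfig kwargs
# are assumptions; adjust to your installed SDK version.
from nexaai.vlm import VLM, GenerationConfig, VlmChatMessage

vlm = VLM("path/to/vlm-model")  # assumed constructor; see Initialization above

messages = [VlmChatMessage(role="user", content="Describe this image.")]  # assumed fields
prompt = vlm.apply_chat_template(messages)

def on_token(token: str) -> bool:
    print(token, end="", flush=True)
    return True  # returning False is assumed to stop generation early

# image_paths/audio_paths ride along on the GenerationConfig (see generate_stream above).
config = GenerationConfig(image_paths=["photo.jpg"])
result = vlm.generate(prompt, config=config, on_token=on_token)

vlm.reset()  # clear conversation state and KV cache before the next exchange
```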
ASR (Automatic Speech Recognition)
The ASR class provides speech-to-text transcription capabilities.
Initialization
Core Methods
transcribe(audio_path, language=None, timestamps=None, beam_size=5)
- Transcribes audio file to text
- Parameters:
  - audio_path (str): Path to audio file
  - language (str, optional): Language code ("en", "zh", or None for auto-detect)
  - timestamps (str, optional): Timestamp format ("none", "segment", "word")
  - beam_size (int): Beam size for decoding (default: 5)
- Returns: TranscribeResult containing transcript, confidence scores, timestamps, and profile data
list_supported_languages()
- Lists supported languages
- Returns: List of language codes
stream_begin(language=None, on_transcription=None, chunk_duration=0.5, overlap_duration=0.1, sample_rate=16000, max_queue_size=10, buffer_size=4096, timestamps=None, beam_size=5)
- Begins streaming ASR transcription
- Parameters:
  - language (str, optional): Language code
  - on_transcription (Callable[[str], None], optional): Callback function for transcription results
  - chunk_duration (float): Audio chunk duration in seconds
  - overlap_duration (float): Overlap duration in seconds
  - sample_rate (int): Audio sample rate
  - max_queue_size (int): Maximum queue size
  - buffer_size (int): Buffer size
  - timestamps (str, optional): Timestamp format
  - beam_size (int): Beam size for decoding
stream_push_audio(audio_data)
- Pushes audio data for streaming transcription
- Parameters:
  - audio_data (List[float]): List of audio samples (float values)
stream_stop(graceful=True)
- Stops streaming transcription
- Parameters:
  - graceful (bool): If True, wait for current processing to complete
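A short sketch of both transcription modes. The import path and constructor are assumptions; transcribe, list_supported_languages, and the stream_* calls follow the signatures above, and the one-second silence buffer is only a stand-in for real microphone samples.

```python
# NOTE: import path and constructor are assumptions; adjust to your install.
from nexaai.asr import ASR

asr = ASR("path/to/asr-model")  # assumed constructor; see Initialization above

# One-shot file transcription with segment-level timestamps.
result = asr.transcribe("meeting.wav", language="en", timestamps="segment", beam_size=5)
print(result)
print(asr.list_supported_languages())

# Streaming transcription: start a session, push float samples, then stop.
asr.stream_begin(language="en", on_transcription=lambda text: print(text), sample_rate=16000)
asr.stream_push_audio([0.0] * 16000)  # one second of silence as placeholder audio
asr.stream_stop(graceful=True)        # wait for in-flight chunks to finish
```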
Embedder
The Embedder class provides text vectorization and similarity computation.
Initialization
Core Methods
embed(texts=None, input_ids=None, image_paths=None, task_type=None, batch_size=32, normalize=False, normalize_method=None)
- Generates embeddings for texts, tokens, or images
- Parameters:
  - texts (List[str], optional): List of text strings to embed
  - input_ids (List[List[int]], optional): List of token ID sequences (alternative to texts)
  - image_paths (List[str], optional): List of image file paths to embed
  - task_type (str, optional): Task type for embedding (e.g., "classification", "retrieval")
  - batch_size (int): Batch size for processing (default: 32)
  - normalize (bool): Whether to normalize embeddings (default: False)
  - normalize_method (str, optional): Normalization method (e.g., "l2")
- Returns: EmbedResult containing embeddings and profile data
embedding_dim()
- Returns the dimension of embeddings
- Returns: Integer dimension size
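A sketch of embedding a small batch of texts. The import path and constructor are assumptions; the embed arguments and the embedding_dim call match the reference above.

```python
# NOTE: import path and constructor are assumptions; adjust to your install.
from nexaai.embedder import Embedder

embedder = Embedder("path/to/embedding-model")  # assumed constructor

result = embedder.embed(
    texts=["on-device inference", "cloud inference"],
    task_type="retrieval",
    normalize=True,
    normalize_method="l2",
)
print(embedder.embedding_dim(), "dimensions")  # vector size of each embedding
```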
Reranker
The Reranker class provides document reranking capabilities.
Initialization
Core Methods
rerank(query, documents, batch_size=32, normalize=False, normalize_method=None)
- Reranks documents based on query relevance
- Parameters:
  - query (str): Search query
  - documents (List[str]): List of documents to rank
  - batch_size (int): Batch size for processing (default: 32)
  - normalize (bool): Whether to normalize scores (default: False)
  - normalize_method (str, optional): Normalization method
- Returns: RerankResult containing scores and profile data
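A sketch of reranking a small candidate list against a query. The import path and constructor are assumptions; the rerank arguments match the signature above.

```python
# NOTE: import path and constructor are assumptions; adjust to your install.
from nexaai.rerank import Reranker

reranker = Reranker("path/to/reranker-model")  # assumed constructor

result = reranker.rerank(
    query="how do I enable NPU acceleration?",
    documents=["Windows ARM64 guide", "macOS guide", "Release notes"],
    normalize=True,
)
print(result)  # RerankResult with one relevance score per document
```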
CV (Computer Vision)
The CV class provides computer vision capabilities including OCR.
Initialization
Core Methods
infer(input_image_path)
- Performs inference on image
- Parameters:
  - input_image_path (str): Path to input image
- Returns: CVResult containing detection/classification results
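A minimal sketch of running inference on a single image; the import path and constructor are assumptions, while infer follows the signature above.

```python
# NOTE: import path and constructor are assumptions; adjust to your install.
from nexaai.cv import CV

cv = CV("path/to/cv-model")  # assumed constructor; see Initialization above

result = cv.infer("receipt.png")
print(result)  # CVResult with detection/classification output
```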
Diarize
The Diarize class provides speaker diarization capabilities.
Initialization
Core Methods
infer(audio_path, min_speakers=None, max_speakers=None)
- Performs speaker diarization on audio file
- Parameters:
  - audio_path (str): Path to the audio file
  - min_speakers (int, optional): Minimum number of speakers
  - max_speakers (int, optional): Maximum number of speakers
- Returns: DiarizeResult containing speech segments and metadata
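A sketch of diarizing one recording with a bounded speaker count; the import path and constructor are assumptions, and infer follows the signature above.

```python
# NOTE: import path and constructor are assumptions; adjust to your install.
from nexaai.diarize import Diarize

diarizer = Diarize("path/to/diarization-model")  # assumed constructor

result = diarizer.infer("interview.wav", min_speakers=2, max_speakers=4)
print(result)  # DiarizeResult with per-speaker speech segments
```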
TTS (Text-to-Speech)
The TTS class provides text-to-speech synthesis capabilities.
Initialization
Core Methods
synthesize(text, output_path, voice=None, speed=1.0, seed=-1, sample_rate=22050)
- Synthesizes text to speech
- Parameters:
  - text (str): Text to synthesize
  - output_path (str): Path to save the audio file
  - voice (str, optional): Voice identifier. If None, uses default
  - speed (float): Speech speed multiplier (default: 1.0)
  - seed (int): Random seed. -1 for random (default: -1)
  - sample_rate (int): Audio sample rate (default: 22050)
- Returns: SynthesizeResult containing audio file path and metadata
list_available_voices()
- Lists available voices
- Returns: List of voice identifiers
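A sketch of synthesizing a short clip. The import path and constructor are assumptions; list_available_voices and the synthesize arguments match the signatures above.

```python
# NOTE: import path and constructor are assumptions; adjust to your install.
from nexaai.tts import TTS

tts = TTS("path/to/tts-model")  # assumed constructor; see Initialization above

print(tts.list_available_voices())  # pick a voice identifier, or pass None for the default

result = tts.synthesize(
    text="On-device synthesis keeps audio local.",
    output_path="hello.wav",
    speed=1.0,
    sample_rate=22050,
)
print(result)  # SynthesizeResult with the output file path and metadata
```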
ImageGen
The ImageGen class provides image generation capabilities.
Initialization
Core Methods
txt2img(prompt, output_path, negative_prompts=None, height=512, width=512, method='ddpm', steps=50, guidance_scale=7.5, eta=0.0, seed=-1, strength=1.0)
- Generates image from text prompt
- Parameters:
  - prompt (str): Text prompt for image generation
  - output_path (str): Path to save the generated image
  - negative_prompts (List[str], optional): List of negative prompts
  - height (int): Image height in pixels (default: 512)
  - width (int): Image width in pixels (default: 512)
  - method (str): Sampling method (e.g., 'ddpm', 'ddim') (default: 'ddpm')
  - steps (int): Number of diffusion steps (default: 50)
  - guidance_scale (float): Guidance scale for classifier-free guidance (default: 7.5)
  - eta (float): Eta parameter for DDIM (default: 0.0)
  - seed (int): Random seed. -1 for random (default: -1)
  - strength (float): Strength parameter (default: 1.0)
- Returns: ImageGenResult containing output image path
img2img(init_image_path, prompt, output_path, negative_prompts=None, height=512, width=512, method='ddpm', steps=50, guidance_scale=7.5, eta=0.0, seed=-1, strength=0.8)
- Generates image from existing image and text prompt
- Parameters:
  - init_image_path (str): Path to the initial image
  - prompt (str): Text prompt for image generation
  - output_path (str): Path to save the generated image
  - negative_prompts (List[str], optional): List of negative prompts
  - height (int): Image height in pixels (default: 512)
  - width (int): Image width in pixels (default: 512)
  - method (str): Sampling method (e.g., 'ddpm', 'ddim') (default: 'ddpm')
  - steps (int): Number of diffusion steps (default: 50)
  - guidance_scale (float): Guidance scale for classifier-free guidance (default: 7.5)
  - eta (float): Eta parameter for DDIM (default: 0.0)
  - seed (int): Random seed. -1 for random (default: -1)
  - strength (float): Strength parameter (0.0-1.0) (default: 0.8)
- Returns: ImageGenResult containing output image path
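A sketch chaining txt2img and img2img. The import path and constructor are assumptions; the keyword arguments follow the signatures above.

```python
# NOTE: import path and constructor are assumptions; adjust to your install.
from nexaai.image_gen import ImageGen

generator = ImageGen("path/to/diffusion-model")  # assumed constructor

# Text-to-image with a fixed seed for reproducibility.
generator.txt2img(
    prompt="a lighthouse at dusk, watercolor",
    output_path="lighthouse.png",
    steps=30,
    guidance_scale=7.5,
    seed=42,
)

# Image-to-image: refine the first result with a new prompt; lower strength
# keeps more of the initial image.
generator.img2img(
    init_image_path="lighthouse.png",
    prompt="the same lighthouse at dawn",
    output_path="lighthouse_dawn.png",
    strength=0.8,
)
```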
Configuration Classes
ModelConfig
General model configuration.
GenerationConfig
Text generation configuration.
Message Classes
LlmChatMessage
Represents a single message in a conversation for LLM.
VlmChatMessage
Represents a multimodal message with multiple content types for VLM.
VlmContent
Represents individual content within a multimodal message.