NexaAI Python SDK API Reference
This document provides comprehensive API documentation for all modules and classes in the NexaAI Python SDK. For platform-specific setup and complete examples, please refer to the platform guides:

- macOS Guide - Apple Silicon optimization
- Windows x64 Guide - CPU/GPU acceleration
- Windows ARM64 Guide - NPU acceleration
Core Modules
LLM (Large Language Model)
The LLM class provides text generation and conversation capabilities.
Initialization
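A minimal initialization and usage sketch. This is not an official snippet: the import path, constructor signature, `GenerationConfig` fields, and model path below are assumptions; check the platform guides for the exact code for your install.

```python
# Sketch only: import path, constructor arguments, GenerationConfig fields,
# and the model path are assumptions, not confirmed by this reference.
from nexaai import LLM, GenerationConfig

llm = LLM(
    "models/llama-3.2-1b-instruct",   # placeholder local model path
    plugin_id="cpu_gpu",              # backend; see "Plugin ID Options" below
)

config = GenerationConfig(max_tokens=128, temperature=0.7)  # assumed fields
result = llm.generate("Summarize what a KV cache does.", config=config)
print(result)               # GenerateResult: generated text + profile data
print(result.profile_data)  # performance metrics (see "Performance Tips")
```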
Core Methods
generate_stream(prompt, config=None)
- Generates text tokens in streaming fashion
- Parameters:
  - prompt (str): Input prompt
  - config (GenerationConfig, optional): Generation configuration
- Returns: Generator yielding text tokens; returns GenerateResult when exhausted
generate(prompt, config=None, on_token=None)
- Generates text with optional streaming token callback
- Parameters:
  - prompt (str): Input prompt
  - config (GenerationConfig, optional): Generation configuration
  - on_token (Callable[[str], bool], optional): Callback function for streaming tokens
- Returns: GenerateResult containing the generated text and profile data
apply_chat_template(messages, tools=None, enable_thinking=False)
- Applies chat template to conversation
- Parameters:
  - messages (List[LlmChatMessage]): Conversation history
  - tools (str, optional): Optional tool JSON string
  - enable_thinking (bool): Enable thinking mode
- Returns: Formatted prompt string (see the chat-template example below)
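A hedged sketch of the chat-template plus streaming flow, reusing the `llm` instance from the initialization sketch above. The `LlmChatMessage` constructor arguments (`role`, `content`) follow common chat-message conventions and are assumptions, not confirmed by this reference.

```python
# Sketch only: LlmChatMessage fields (role/content) are assumed.
from nexaai import LlmChatMessage

messages = [
    LlmChatMessage(role="system", content="You are a concise assistant."),
    LlmChatMessage(role="user", content="What is speculative decoding?"),
]

prompt = llm.apply_chat_template(messages)   # formatted prompt string

# Stream tokens as they are produced; the generator returns a GenerateResult
# when exhausted.
for token in llm.generate_stream(prompt):
    print(token, end="", flush=True)
print()
```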
reset()
- Resets conversation state and clears KV cache
save_kv_cache(path)
- Saves key-value cache to file
- Parameters:
  - path (str): File path to save cache
load_kv_cache(path)
- Loads key-value cache from file (a combined save/load example follows below)
- Parameters:
  - path (str): File path to load cache
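A sketch of the KV-cache round trip, under the same assumptions as the initialization sketch above; the cache file path is a placeholder.

```python
# Persist the conversation state, then restore it later (paths are placeholders).
llm.generate("Remember that my favorite color is teal.")
llm.save_kv_cache("cache/session.kv")

llm.reset()                           # clears conversation state and KV cache
llm.load_kv_cache("cache/session.kv")
result = llm.generate("What is my favorite color?")
print(result)
```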
VLM (Vision Language Model)
The VLM class provides multimodal understanding and generation capabilities.
Initialization
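A minimal sketch, assuming the same construction pattern as LLM and that multimodal inputs are passed via `image_paths` on `GenerationConfig` as described below; the import path, model path, and exact config fields are assumptions.

```python
# Sketch only: import path, constructor, model path, and GenerationConfig
# fields other than image_paths/audio_paths are assumptions.
from nexaai import VLM, GenerationConfig

vlm = VLM("models/qwen2-vl-2b-instruct", plugin_id="cpu_gpu")

config = GenerationConfig(
    max_tokens=128,                      # assumed field
    image_paths=["photos/receipt.jpg"],  # multimodal inputs, per the docs below
)
result = vlm.generate("What is the total on this receipt?", config=config)
print(result)
```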
Core Methods
generate_stream(prompt, config=None)
- Generates text tokens in streaming fashion
- Parameters:
  - prompt (str): Input prompt
  - config (GenerationConfig, optional): Generation configuration with image_paths/audio_paths
- Returns: Generator yielding text tokens; returns GenerateResult when exhausted
generate(prompt, config=None, on_token=None)
- Generates text with optional streaming token callback
- Parameters:
  - prompt (str): Input prompt
  - config (GenerationConfig, optional): Generation configuration
  - on_token (Callable[[str], bool], optional): Callback function for streaming tokens
- Returns: GenerateResult containing the generated text and profile data
apply_chat_template(messages, tools=None, enable_thinking=False)
- Applies chat template to multimodal conversation
- Parameters:
  - messages (List[VlmChatMessage]): Multimodal conversation history
  - tools (str, optional): Optional tool JSON string
  - enable_thinking (bool): Enable thinking mode
- Returns: Formatted prompt string
reset()
- Resets conversation state and clears KV cache
ASR (Automatic Speech Recognition)
The ASR class provides speech-to-text transcription capabilities.
Initialization
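A minimal sketch of initialization and file transcription; the import path, constructor signature, model path, and audio path are assumptions.

```python
# Sketch only: import path, constructor, and file paths are assumptions.
from nexaai import ASR

asr = ASR("models/whisper-small", plugin_id="cpu_gpu")

result = asr.transcribe(
    "audio/meeting.wav",
    language=None,          # None -> auto-detect
    timestamps="segment",   # "none", "segment", or "word"
    beam_size=5,
)
print(result)               # TranscribeResult: transcript, confidences, timestamps
print(asr.list_supported_languages())
```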
Core Methods
transcribe(audio_path, language=None, timestamps=None, beam_size=5)
- Transcribes audio file to text
- Parameters:
  - audio_path (str): Path to audio file
  - language (str, optional): Language code ("en", "zh", or None for auto-detect)
  - timestamps (str, optional): Timestamp format ("none", "segment", "word")
  - beam_size (int): Beam size for decoding (default: 5)
- Returns: TranscribeResult containing transcript, confidence scores, timestamps, and profile data
list_supported_languages()
- Lists supported languages
- Returns: List of language codes
stream_begin(language=None, on_transcription=None, chunk_duration=0.5, overlap_duration=0.1, sample_rate=16000, max_queue_size=10, buffer_size=4096, timestamps=None, beam_size=5)
- Begins streaming ASR transcription
- Parameters:
  - language (str, optional): Language code
  - on_transcription (Callable[[str], None], optional): Callback function for transcription results
  - chunk_duration (float): Audio chunk duration in seconds
  - overlap_duration (float): Overlap duration in seconds
  - sample_rate (int): Audio sample rate
  - max_queue_size (int): Maximum queue size
  - buffer_size (int): Buffer size
  - timestamps (str, optional): Timestamp format
  - beam_size (int): Beam size for decoding
stream_push_audio(audio_data)
- Pushes audio data for streaming transcription
- Parameters:
  - audio_data (List[float]): List of audio samples (float values)
stream_stop(graceful=True)
- Stops streaming transcription (an end-to-end streaming example follows below)
- Parameters:
  - graceful (bool): If True, wait for current processing to complete
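A hedged end-to-end streaming sketch, reusing the `asr` instance from the initialization sketch above. The audio source is simulated with silence; real code would push microphone samples as `List[float]` at the configured sample rate.

```python
# Sketch only: the dummy silence below stands in for real microphone capture.
import time

def on_transcription(text: str) -> None:
    print("partial:", text)

asr.stream_begin(
    language="en",
    on_transcription=on_transcription,
    chunk_duration=0.5,
    sample_rate=16000,
)

for _ in range(10):                      # ~5 seconds of dummy audio
    asr.stream_push_audio([0.0] * 8000)  # 0.5 s of silence at 16 kHz
    time.sleep(0.5)

asr.stream_stop(graceful=True)           # wait for pending chunks to finish
```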
Embedder
The Embedder class provides text vectorization and similarity computation.
Initialization
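A minimal sketch of initialization and text embedding; the import path, constructor signature, and model path are assumptions.

```python
# Sketch only: import path, constructor, and model path are assumptions.
from nexaai import Embedder

embedder = Embedder("models/jina-embeddings-v2-small", plugin_id="cpu_gpu")

result = embedder.embed(
    texts=["on-device inference", "cloud inference"],
    batch_size=32,
    normalize=True,
    normalize_method="l2",
)
print(embedder.embedding_dim())   # embedding dimension, model-dependent
print(result)                     # EmbedResult: embeddings + profile data
```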
Core Methods
embed(texts=None, input_ids=None, image_paths=None, task_type=None, batch_size=32, normalize=False, normalize_method=None)
- Generates embeddings for texts, tokens, or images
- Parameters:
  - texts (List[str], optional): List of text strings to embed
  - input_ids (List[List[int]], optional): List of token ID sequences (alternative to texts)
  - image_paths (List[str], optional): List of image file paths to embed
  - task_type (str, optional): Task type for embedding (e.g., "classification", "retrieval")
  - batch_size (int): Batch size for processing (default: 32)
  - normalize (bool): Whether to normalize embeddings (default: False)
  - normalize_method (str, optional): Normalization method (e.g., "l2")
- Returns: EmbedResult containing embeddings and profile data
embedding_dim()
- Returns the dimension of embeddings
- Returns: Integer dimension size
Reranker
The Reranker class provides document reranking capabilities.
Initialization
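A minimal sketch of initialization and reranking; the import path, constructor signature, and model path are assumptions.

```python
# Sketch only: import path, constructor, and model path are assumptions.
from nexaai import Reranker

reranker = Reranker("models/bge-reranker-base", plugin_id="cpu_gpu")

result = reranker.rerank(
    query="How do I enable NPU acceleration?",
    documents=[
        "NPU acceleration is available on Windows ARM64 devices.",
        "The metal plugin targets Apple Silicon GPUs.",
    ],
    normalize=True,
)
print(result)   # RerankResult: one relevance score per document + profile data
```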
Core Methods
rerank(query, documents, batch_size=32, normalize=False, normalize_method=None)
- Reranks documents based on query relevance
- Parameters:
  - query (str): Search query
  - documents (List[str]): List of documents to rank
  - batch_size (int): Batch size for processing (default: 32)
  - normalize (bool): Whether to normalize scores (default: False)
  - normalize_method (str, optional): Normalization method
- Returns: RerankResult containing scores and profile data
CV (Computer Vision)
The CV class provides computer vision capabilities including OCR.
Initialization
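A minimal sketch of initialization and inference; the import path, constructor signature, model path, and image path are assumptions.

```python
# Sketch only: import path, constructor, and paths are assumptions.
from nexaai import CV

cv = CV("models/paddleocr", plugin_id="cpu_gpu")

result = cv.infer("images/invoice.png")
print(result)   # CVResult: detection/classification (e.g. OCR) results
```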
Core Methods
infer(input_image_path)
- Performs inference on image
- Parameters:
  - input_image_path (str): Path to input image
- Returns: CVResult containing detection/classification results
Diarize
The Diarize class provides speaker diarization capabilities.
Initialization
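A minimal sketch of initialization and diarization; the import path, constructor signature, model path, and audio path are assumptions.

```python
# Sketch only: import path, constructor, and paths are assumptions.
from nexaai import Diarize

diarizer = Diarize("models/pyannote-diarization", plugin_id="cpu_gpu")

result = diarizer.infer(
    "audio/interview.wav",
    min_speakers=2,
    max_speakers=4,
)
print(result)   # DiarizeResult: speaker-labeled speech segments + metadata
```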
Core Methods
infer(audio_path, min_speakers=None, max_speakers=None)
- Performs speaker diarization on audio file
- Parameters:
  - audio_path (str): Path to the audio file
  - min_speakers (int, optional): Minimum number of speakers
  - max_speakers (int, optional): Maximum number of speakers
- Returns: DiarizeResult containing speech segments and metadata
TTS (Text-to-Speech)
The TTS class provides text-to-speech synthesis capabilities.
Initialization
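A minimal sketch of initialization and synthesis; the import path, constructor signature, model path, and output path are assumptions.

```python
# Sketch only: import path, constructor, and paths are assumptions.
from nexaai import TTS

tts = TTS("models/kokoro-tts", plugin_id="cpu_gpu")

print(tts.list_available_voices())   # pick a voice identifier from this list

result = tts.synthesize(
    text="Welcome to the NexaAI Python SDK.",
    output_path="out/welcome.wav",
    voice=None,          # None -> default voice
    speed=1.0,
    sample_rate=22050,
)
print(result)            # SynthesizeResult: audio file path + metadata
```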
Core Methods
synthesize(text, output_path, voice=None, speed=1.0, seed=-1, sample_rate=22050)
- Synthesizes text to speech
- Parameters:
  - text (str): Text to synthesize
  - output_path (str): Path to save the audio file
  - voice (str, optional): Voice identifier. If None, uses default
  - speed (float): Speech speed multiplier (default: 1.0)
  - seed (int): Random seed. -1 for random (default: -1)
  - sample_rate (int): Audio sample rate (default: 22050)
- Returns: SynthesizeResult containing audio file path and metadata
list_available_voices()
- Lists available voices
- Returns: List of voice identifiers
ImageGen
The ImageGen class provides image generation capabilities.
Initialization
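A minimal sketch of initialization and text-to-image generation; the import path, constructor signature, model path, and output path are assumptions.

```python
# Sketch only: import path, constructor, and paths are assumptions.
from nexaai import ImageGen

imagegen = ImageGen("models/sd-turbo", plugin_id="cpu_gpu")

result = imagegen.txt2img(
    prompt="a watercolor painting of a lighthouse at dusk",
    output_path="out/lighthouse.png",
    negative_prompts=["blurry", "low quality"],
    height=512,
    width=512,
    method="ddim",
    steps=30,
    guidance_scale=7.5,
    seed=42,             # fixed seed for reproducibility; -1 for random
)
print(result)            # ImageGenResult: output image path
```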
Core Methods
txt2img(prompt, output_path, negative_prompts=None, height=512, width=512, method='ddpm', steps=50, guidance_scale=7.5, eta=0.0, seed=-1, strength=1.0)
- Generates image from text prompt
- Parameters:
  - prompt (str): Text prompt for image generation
  - output_path (str): Path to save the generated image
  - negative_prompts (List[str], optional): List of negative prompts
  - height (int): Image height in pixels (default: 512)
  - width (int): Image width in pixels (default: 512)
  - method (str): Sampling method (e.g., 'ddpm', 'ddim') (default: 'ddpm')
  - steps (int): Number of diffusion steps (default: 50)
  - guidance_scale (float): Guidance scale for classifier-free guidance (default: 7.5)
  - eta (float): Eta parameter for DDIM (default: 0.0)
  - seed (int): Random seed. -1 for random (default: -1)
  - strength (float): Strength parameter (default: 1.0)
- Returns: ImageGenResult containing output image path
img2img(init_image_path, prompt, output_path, negative_prompts=None, height=512, width=512, method='ddpm', steps=50, guidance_scale=7.5, eta=0.0, seed=-1, strength=0.8)
- Generates image from existing image and text prompt
- Parameters:
  - init_image_path (str): Path to the initial image
  - prompt (str): Text prompt for image generation
  - output_path (str): Path to save the generated image
  - negative_prompts (List[str], optional): List of negative prompts
  - height (int): Image height in pixels (default: 512)
  - width (int): Image width in pixels (default: 512)
  - method (str): Sampling method (e.g., 'ddpm', 'ddim') (default: 'ddpm')
  - steps (int): Number of diffusion steps (default: 50)
  - guidance_scale (float): Guidance scale for classifier-free guidance (default: 7.5)
  - eta (float): Eta parameter for DDIM (default: 0.0)
  - seed (int): Random seed. -1 for random (default: -1)
  - strength (float): Strength parameter (0.0-1.0) (default: 0.8)
- Returns: ImageGenResult containing output image path
Configuration Classes
ModelConfig
General model configuration.
GenerationConfig
Text generation configuration.
Message Classes
LlmChatMessage
Represents a single message in a conversation for LLM.
VlmChatMessage
Represents a multimodal message with multiple content types for VLM.
VlmContent
Represents individual content within a multimodal message.
Result Classes
GenerateResult
Result of text generation from LLM or VLM.
TranscribeResult
Result of ASR transcription.
EmbedResult
Result of embedding generation.
RerankResult
Result of document reranking.
CVResult
Result of CV inference.
DiarizeResult
Result of speaker diarization.
SynthesizeResult
Result of TTS synthesis.
ImageGenResult
Result of image generation.
Plugin ID Options
The plugin_id parameter supports different backends (see the selection example after this list):

- cpu_gpu: Default, supports both CPU and GPU
- metal: Apple Silicon optimized (macOS)
- mlx: MLX backend (macOS)
- npu: NPU acceleration (Windows ARM64, Snapdragon X Elite)
- qnn: QNN backend
- nexaml: NexaML optimized backend
- llama_cpp: For GGUF format models
- onnx: ONNX runtime backend
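A hedged sketch of choosing a plugin_id at runtime. The platform-to-backend mapping below is only an illustration based on the list above, not an official capability matrix; adjust it to the backends actually installed on your machine.

```python
# Illustrative platform -> plugin_id mapping; not an official capability matrix.
import platform

def pick_plugin_id() -> str:
    system = platform.system()
    machine = platform.machine().lower()
    if system == "Darwin" and machine in ("arm64", "aarch64"):
        return "metal"                   # Apple Silicon optimized
    if system == "Windows" and machine in ("arm64", "aarch64"):
        return "npu"                     # e.g. Snapdragon X Elite NPU
    return "cpu_gpu"                     # portable default

print(pick_plugin_id())
```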
Error Handling
The SDK provides comprehensive error handling with descriptive error messages. Common error scenarios include:

- Invalid model paths
- Unsupported plugin/device combinations
- Authentication token issues
- Memory allocation failures
- Model loading errors
Performance Tips
- Model Selection: Choose models optimized for your platform and use case
- Batch Processing: Use appropriate batch sizes for embedding and reranking tasks
- Memory Management: Monitor memory usage, especially for large models
- Caching: Use KV cache for conversational applications
- Profiling: Access profile_data from result objects to monitor performance (see the example below)
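A hedged sketch of reading profile_data from a result object, reusing the `llm` instance from the LLM sketch above; the exact structure of profile_data (dict vs. object, field names) is not specified in this reference, so the example just prints whatever is exposed.

```python
# Sketch only: the structure of profile_data is not specified here.
result = llm.generate("Benchmark this call.")
print(result.profile_data)   # e.g. token counts, latency, tokens/second (fields may vary)
```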
Next Steps
- Explore platform-specific examples in the macOS Guide
- Check out the Windows x64 Guide for CPU/GPU optimization
- Visit the Windows ARM64 Guide for NPU acceleration