NexaAI Python SDK API Reference

This document provides comprehensive API documentation for all modules and classes in the NexaAI Python SDK. For platform-specific setup and complete examples, refer to the platform-specific guides.

Core Modules

LLM (Large Language Model)

The LLM class provides text generation and conversation capabilities.

Initialization

from nexaai.llm import LLM
from nexaai.common import ModelConfig

# Initialize LLM
m_cfg = ModelConfig()
llm = LLM.from_(name_or_path="model_name_or_path", m_cfg=m_cfg, plugin_id="cpu_gpu", device_id="cpu")

Core Methods

generate_stream(prompt, g_cfg)
  • Generates text tokens in streaming fashion
  • Parameters:
    • prompt (str): Input prompt
    • g_cfg (GenerationConfig): Generation configuration
  • Returns: Generator yielding text tokens
apply_chat_template(conversation)
  • Applies chat template to conversation
  • Parameters:
    • conversation (List[ChatMessage]): Conversation history
  • Returns: Formatted prompt string
get_profiling_data()
  • Returns performance profiling information
  • Returns: Dict with profiling metrics or None
save_kv_cache(path)
  • Saves key-value cache to file
  • Parameters:
    • path (str): File path to save cache
load_kv_cache(path)
  • Loads key-value cache from file
  • Parameters:
    • path (str): File path to load cache
reset()
  • Resets conversation state and clears cache
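
The listed methods compose into a simple streaming chat loop. The following is a minimal sketch based on the signatures above; the model name is a placeholder, so substitute one available for your platform.

from nexaai.llm import LLM
from nexaai.common import ModelConfig, GenerationConfig, ChatMessage

# Load the model (placeholder name)
llm = LLM.from_(name_or_path="model_name_or_path", m_cfg=ModelConfig(), plugin_id="cpu_gpu", device_id="cpu")

# Format a conversation with the model's chat template
conversation = [ChatMessage(role="user", content="Explain KV caching in one sentence.")]
prompt = llm.apply_chat_template(conversation)

# Stream tokens as they are generated
for token in llm.generate_stream(prompt, g_cfg=GenerationConfig(max_tokens=100)):
    print(token, end="", flush=True)

# Clear conversation state before starting a new chat
llm.reset()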

VLM (Vision Language Model)

The VLM class provides multimodal understanding and generation capabilities.

Initialization

from nexaai.vlm import VLM
from nexaai.common import ModelConfig

# Initialize VLM
m_cfg = ModelConfig()
vlm = VLM.from_(name_or_path="model_name_or_path", m_cfg=m_cfg, plugin_id="cpu_gpu", device_id="")

Core Methods

generate_stream(prompt, g_cfg)
  • Generates text tokens in streaming fashion
  • Parameters:
    • prompt (str): Input prompt
    • g_cfg (GenerationConfig): Generation configuration with image_paths/audio_paths
  • Returns: Generator yielding text tokens
apply_chat_template(conversation)
  • Applies chat template to multimodal conversation
  • Parameters:
    • conversation (List[MultiModalMessage]): Multimodal conversation history
  • Returns: Formatted prompt string
get_profiling_data()
  • Returns performance profiling information
  • Returns: Dict with profiling metrics or None
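
The same pattern extends to multimodal input. The sketch below is illustrative; the model name and image path are placeholders, and the image is referenced both in the message content and in GenerationConfig.image_paths as described above.

from nexaai.vlm import VLM
from nexaai.common import ModelConfig, GenerationConfig, MultiModalMessage, MultiModalMessageContent

# Load the model (placeholder name)
vlm = VLM.from_(name_or_path="model_name_or_path", m_cfg=ModelConfig(), plugin_id="cpu_gpu", device_id="")

# Compose a text + image message and format it
contents = [
    MultiModalMessageContent(type="text", text="Describe this image"),
    MultiModalMessageContent(type="image", path="path/to/image.jpg"),
]
prompt = vlm.apply_chat_template([MultiModalMessage(role="user", content=contents)])

# Pass the image paths through the generation config and stream the reply
g_cfg = GenerationConfig(max_tokens=100, image_paths=["path/to/image.jpg"])
for token in vlm.generate_stream(prompt, g_cfg=g_cfg):
    print(token, end="", flush=True)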

Embedder

The Embedder class provides text vectorization and similarity computation.

Initialization

from nexaai.embedder import Embedder

# Initialize Embedder
embedder = Embedder.from_(name_or_path="model_name_or_path", plugin_id="cpu_gpu")

Core Methods

generate(texts, config)
  • Generates embeddings for input texts
  • Parameters:
    • texts (List[str]): List of input texts
    • config (EmbeddingConfig): Embedding configuration
  • Returns: List of embedding vectors
get_embedding_dim()
  • Returns the dimension of embeddings
  • Returns: Integer dimension size
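
A short sketch of embedding a pair of texts (the model name is a placeholder):

from nexaai.embedder import Embedder, EmbeddingConfig

embedder = Embedder.from_(name_or_path="model_name_or_path", plugin_id="cpu_gpu")

texts = ["What is retrieval-augmented generation?", "RAG combines search with text generation."]
embeddings = embedder.generate(texts, config=EmbeddingConfig(batch_size=4))

print(embedder.get_embedding_dim())  # dimension of each vector
print(len(embeddings))               # one embedding per input text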

Reranker

The Reranker class provides document reranking capabilities.

Initialization

from nexaai.rerank import Reranker

# Initialize Reranker
reranker = Reranker.from_(name_or_path="model_name_or_path", plugin_id="cpu_gpu")

Core Methods

rerank(query, documents, config)
  • Reranks documents based on query relevance
  • Parameters:
    • query (str): Search query
    • documents (List[str]): List of documents to rank
    • config (RerankConfig): Reranking configuration
  • Returns: List of relevance scores
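
A minimal reranking sketch (the model name is a placeholder); the result is one relevance score per input document.

from nexaai.rerank import Reranker, RerankConfig

reranker = Reranker.from_(name_or_path="model_name_or_path", plugin_id="cpu_gpu")

query = "What is the capital of France?"
documents = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
]

scores = reranker.rerank(query, documents, config=RerankConfig(batch_size=4))
print(scores)  # one relevance score per document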

ASR (Automatic Speech Recognition)

The ASR class provides speech-to-text transcription capabilities.

Initialization

from nexaai.asr import ASR

# Initialize ASR
asr = ASR.from_(name_or_path="model_name_or_path", plugin_id="cpu_gpu", device_id="cpu")

Core Methods

transcribe(audio_path, language, config)
  • Transcribes audio file to text
  • Parameters:
    • audio_path (str): Path to audio file
    • language (str): Language code ("en", "zh", or "" for auto-detect)
    • config (ASRConfig): ASR configuration
  • Returns: Transcription result object
get_profiling_data()
  • Returns performance profiling information
  • Returns: Dict with profiling metrics or None
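
A minimal transcription sketch (the model name and audio path are placeholders):

from nexaai.asr import ASR, ASRConfig

asr = ASR.from_(name_or_path="model_name_or_path", plugin_id="cpu_gpu", device_id="cpu")

config = ASRConfig(timestamps="segment", beam_size=5, stream=False)
result = asr.transcribe(audio_path="path/to/audio.wav", language="en", config=config)
print(result)  # transcription result object

print(asr.get_profiling_data())  # profiling metrics, or None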

CV (Computer Vision)

The CVModel class provides computer vision capabilities including OCR.

Initialization

from nexaai.cv import CVModel, CVModelConfig, CVCapabilities

# Initialize CV Model
config = CVModelConfig(capabilities=CVCapabilities.OCR)
cv = CVModel.from_(name_or_path="model_name_or_path", config=config, plugin_id="cpu_gpu")

Core Methods

infer(image_path)
  • Performs inference on image
  • Parameters:
    • image_path (str): Path to input image
  • Returns: CVResults object with detection results
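
A short OCR sketch (the model name and image path are placeholders):

from nexaai.cv import CVModel, CVModelConfig, CVCapabilities

config = CVModelConfig(capabilities=CVCapabilities.OCR)
cv = CVModel.from_(name_or_path="model_name_or_path", config=config, plugin_id="cpu_gpu")

results = cv.infer("path/to/image.jpg")  # CVResults object with detection results
print(results)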

Configuration Classes

ModelConfig

General model configuration.
from nexaai.common import ModelConfig

config = ModelConfig()
# The default configuration is sufficient for most use cases

GenerationConfig

Text generation configuration.
from nexaai.common import GenerationConfig

config = GenerationConfig(
    max_tokens=100,           # Maximum tokens to generate
    image_paths=["path/to/image.jpg"],  # Image paths for VLM (optional)
    audio_paths=["path/to/audio.wav"],  # Audio paths for VLM (optional)
)

EmbeddingConfig

Text embedding configuration.
from nexaai.embedder import EmbeddingConfig

config = EmbeddingConfig(
    batch_size=4,             # Batch size for processing
)

RerankConfig

Document reranking configuration.
from nexaai.rerank import RerankConfig

config = RerankConfig(
    batch_size=4,             # Batch size for processing
)

ASRConfig

Speech recognition configuration.
from nexaai.asr import ASRConfig

config = ASRConfig(
    timestamps="segment",     # Timestamp granularity: none|segment|word
    beam_size=5,             # Beam size for decoding
    stream=False             # Whether to use streaming mode
)

CVModelConfig

Computer vision model configuration.
from nexaai.cv import CVModelConfig, CVCapabilities

config = CVModelConfig(
    capabilities=CVCapabilities.OCR,  # CV capability type
    det_model_path="path/to/det_model",  # Detection model path (for OCR)
    rec_model_path="path/to/rec_model",  # Recognition model path (for OCR)
)

Message Classes

ChatMessage

Represents a single message in a conversation.
from nexaai.common import ChatMessage

message = ChatMessage(role="user", content="Hello, how are you?")

MultiModalMessage

Represents a multimodal message with multiple content types.
from nexaai.common import MultiModalMessage, MultiModalMessageContent

contents = [
    MultiModalMessageContent(type="text", text="Describe this image"),
    MultiModalMessageContent(type="image", path="path/to/image.jpg")
]
message = MultiModalMessage(role="user", content=contents)

MultiModalMessageContent

Represents individual content within a multimodal message.
from nexaai.common import MultiModalMessageContent

# Text content
text_content = MultiModalMessageContent(type="text", text="Hello")

# Image content
image_content = MultiModalMessageContent(type="image", path="path/to/image.jpg")

# Audio content
audio_content = MultiModalMessageContent(type="audio", path="path/to/audio.wav")

Plugin ID Options

The plugin_id parameter selects the inference backend:
  • cpu_gpu: Default, supports both CPU and GPU
  • metal: Apple Silicon optimized (macOS)
  • mlx: MLX backend (macOS)
  • npu: NPU acceleration (Windows ARM64, Snapdragon X Elite)
  • nexaml: NexaML optimized backend
  • llama_cpp: For GGUF format models
  • onnx: ONNX runtime backend
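
For example, a GGUF-format model would typically be loaded through the llama_cpp backend. The path below is a placeholder, and the sketch assumes the same LLM.from_ signature shown earlier.

from nexaai.llm import LLM
from nexaai.common import ModelConfig

# GGUF-format models are served through the llama_cpp backend
llm = LLM.from_(name_or_path="path/to/model.gguf", m_cfg=ModelConfig(), plugin_id="llama_cpp", device_id="cpu")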

Error Handling

The SDK provides comprehensive error handling with descriptive error messages. Common error scenarios include:
  • Invalid model paths
  • Unsupported plugin/device combinations
  • Authentication token issues
  • Memory allocation failures
  • Model loading errors
For detailed troubleshooting, refer to the platform-specific guides.
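
A common pattern is to wrap model loading and inference in try/except and surface the SDK's message. The sketch below uses the built-in Exception class because the SDK's specific exception types are not listed in this reference.

from nexaai.llm import LLM
from nexaai.common import ModelConfig

try:
    llm = LLM.from_(name_or_path="model_name_or_path", m_cfg=ModelConfig(), plugin_id="cpu_gpu", device_id="cpu")
except Exception as e:
    # Covers invalid model paths, unsupported plugin/device combinations, auth issues, etc.
    print(f"Failed to load model: {e}")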

Performance Tips

  1. Model Selection: Choose models optimized for your platform and use case
  2. Batch Processing: Use appropriate batch sizes for embedding and reranking tasks
  3. Memory Management: Monitor memory usage, especially for large models
  4. Caching: Use KV cache for conversational applications
  5. Profiling: Use get_profiling_data() to monitor performance
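
As a sketch of tips 4 and 5, the calls below reuse the llm instance from the LLM section above; the cache path is a placeholder.

# Persist the current conversation's KV cache, then restore it in a later session
llm.save_kv_cache("path/to/cache.bin")
llm.load_kv_cache("path/to/cache.bin")

# Inspect profiling metrics after generation (may be None)
metrics = llm.get_profiling_data()
if metrics:
    print(metrics)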
