Prerequisites

Before you begin, make sure you have:
  • Python 3.10
    • If you are using conda, you can create a new environment via:
      conda create -n nexaai python=3.10
      conda activate nexaai
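
You can confirm the environment is active and running the expected interpreter with:
  python --version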
      

Installation

Install the latest NexaAI Python SDK from PyPI. Use the command for your operating system; a quick verification step follows the commands:
  • Windows and Linux:
    pip install nexaai
    
  • macOS:
    pip install 'nexaai[mlx]'
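
To verify the install succeeded, import the package; if this command exits without an error, the SDK is available:
  python -c "import nexaai"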
    

Authentication Setup

Before running any examples, you need to set up your NexaAI authentication token.

Set Token in Environment

Replace "YOUR_NEXA_TOKEN_HERE" with your actual NexaAI token from https://sdk.nexa.ai/:
  • Linux/macOS:
    export NEXA_TOKEN="YOUR_NEXA_TOKEN_HERE"
    
  • Windows (PowerShell):
    $env:NEXA_TOKEN="YOUR_NEXA_TOKEN_HERE"
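
If you want to confirm the token is visible to Python before running the examples, a minimal check using only the standard library looks like this:

Python
import os

# Fail fast if the token was not exported in this shell session
token = os.environ.get("NEXA_TOKEN")
if not token:
    raise RuntimeError("NEXA_TOKEN is not set; export it as shown above.")
print("NEXA_TOKEN is set.")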
    

Running Your First Model

Language Model (LLM)

Python
from nexaai.llm import LLM, GenerationConfig
from nexaai.common import ModelConfig, ChatMessage

# Initialize model (the path assumes the model is already present in the local Nexa cache)
model_path = "~/.cache/nexa.ai/nexa_sdk/models/Qwen/Qwen3-0.6B-GGUF/Qwen3-0.6B-Q8_0.gguf"
m_cfg = ModelConfig()
llm = LLM.from_(model_path, plugin_id="cpu_gpu", device_id="cpu", m_cfg=m_cfg)

# Create conversation
conversation = [ChatMessage(role="system", content="You are a helpful assistant.")]
conversation.append(ChatMessage(role="user", content="Hello, how are you?"))

# Apply chat template and generate
prompt = llm.apply_chat_template(conversation)
for token in llm.generate_stream(prompt, g_cfg=GenerationConfig(max_tokens=100)):
    print(token, end="", flush=True)
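
Streaming yields the reply one text fragment at a time. To keep a multi-turn chat going, you can collect the fragments and append them to the history as an assistant message before the next user turn. A minimal sketch, reusing only the classes above and assuming generate_stream yields plain strings:

Python
# Stream the reply while collecting it for the conversation history
reply = ""
for token in llm.generate_stream(prompt, g_cfg=GenerationConfig(max_tokens=100)):
    print(token, end="", flush=True)
    reply += token
conversation.append(ChatMessage(role="assistant", content=reply))

# A follow-up turn then reuses the accumulated history
conversation.append(ChatMessage(role="user", content="Can you summarize what we discussed?"))
prompt = llm.apply_chat_template(conversation)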

Multimodal Model (VLM)

Python
from nexaai.vlm import VLM, GenerationConfig
from nexaai.common import ModelConfig, MultiModalMessage, MultiModalMessageContent

# Initialize model
model_path = "~/.cache/nexa.ai/nexa_sdk/models/NexaAI/gemma-3n-E4B-it-4bit-MLX/model-00001-of-00002.safetensors"
m_cfg = ModelConfig()
vlm = VLM.from_(name_or_path=model_path, m_cfg=m_cfg, plugin_id="cpu_gpu", device_id="")

# Create multimodal conversation
conversation = [MultiModalMessage(role="system",
                                  content=[MultiModalMessageContent(type="text", text="You are a helpful assistant.")])]

# Add user message with image
contents = [
    MultiModalMessageContent(type="text", text="Describe this image"),
    MultiModalMessageContent(type="image", text="path/to/image.jpg")
]
conversation.append(MultiModalMessage(role="user", content=contents))

# Apply chat template and generate
prompt = vlm.apply_chat_template(conversation)
for token in vlm.generate_stream(prompt, g_cfg=GenerationConfig(max_tokens=100, image_paths=["path/to/image.jpg"])):
    print(token, end="", flush=True)
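
If you want the full description as a single string rather than streamed output, you can join the fragments (again assuming generate_stream yields plain strings):

Python
# Collect the whole streamed reply into one string
config = GenerationConfig(max_tokens=100, image_paths=["path/to/image.jpg"])
description = "".join(vlm.generate_stream(prompt, g_cfg=config))
print(description)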

Embedder

Python
from nexaai.embedder import Embedder, EmbeddingConfig

# Initialize embedder
model_path = "~/.cache/nexa.ai/nexa_sdk/models/NexaAI/jina-v2-fp16-mlx/model.safetensors"
embedder = Embedder.from_(name_or_path=model_path, plugin_id="cpu_gpu")

# Generate embeddings
texts = ["Hello world", "How are you?"]
config = EmbeddingConfig(batch_size=2)
embeddings = embedder.generate(texts=texts, config=config)

for text, embedding in zip(texts, embeddings):
    print(f"Text: {text}")
    print(f"Embedding dimension: {len(embedding)}")

Reranker

Python
from nexaai.rerank import Reranker, RerankConfig

# Initialize reranker
model_path = "~/.cache/nexa.ai/nexa_sdk/models/NexaAI/jina-v2-rerank-mlx/jina-reranker-v2-base-multilingual-f16.safetensors"
reranker = Reranker.from_(name_or_path=model_path, plugin_id="cpu_gpu")

# Rerank documents
query = "What is machine learning?"
documents = ["Machine learning is a subset of AI", "Python is a programming language"]
config = RerankConfig(batch_size=2)
scores = reranker.rerank(query=query, documents=documents, config=config)

for doc, score in zip(documents, scores):
    print(f"[{score:.4f}] {doc}")

Next Steps

