Documentation Index
Fetch the complete documentation index at: https://docs.nexa.ai/llms.txt
Use this file to discover all available pages before exploring further.
Installation & Running Your First Model
macOS
Windows x64
Windows ARM64
Linux
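Before jumping to a section, it can help to confirm which platform you are on. A minimal sketch (our own helper, not part of the Nexa tooling) that maps `uname` output to the sections below; Windows users can simply pick x64 vs ARM64 directly:

```shell
# Hypothetical helper: map OS/architecture (as reported by uname)
# to the matching installation section of this guide.
pick_section() {
  case "$1-$2" in
    Darwin-*)          echo "macOS" ;;
    Linux-x86_64)      echo "Linux" ;;
    *-aarch64|*-arm64) echo "Windows ARM64 (or ARM Linux)" ;;
    *)                 echo "Windows x64" ;;
  esac
}
pick_section "$(uname -s)" "$(uname -m)"
```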
macOS Installation
Download the appropriate installer for your Mac, then run the downloaded .pkg file and follow the installation wizard.
Running Your First Model
macOS supports both MLX (Apple Silicon optimized) and GGUF models.
Language Model (LLM):
nexa infer NexaAI/Qwen3-4B-4bit-MLX
Multimodal Model:
nexa infer NexaAI/gemma-3n-E4B-it-4bit-MLX
Many MLX models in the Hugging Face mlx-community have quality issues and may not run locally. We recommend using models from our collection for best results.
Language Model (LLM):
nexa infer NexaAI/Qwen3-0.6B
Multimodal Model:
nexa infer NexaAI/Qwen2.5-Omni-3B-GGUF
To try other GGUF models, visit Hugging Face, copy the path of any compatible GGUF model (e.g., unsloth/Qwen2.5-VL-3B-Instruct-GGUF), and replace the model path in the command above.
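The substitution step above can be sketched as a one-line variable swap. The repo path is the example from the docs; `nexa` itself must already be installed for the command to actually run, so the sketch only prints it:

```shell
# Swap in any compatible GGUF repo path copied from Hugging Face
MODEL="unsloth/Qwen2.5-VL-3B-Instruct-GGUF"
# Print the command we would run (drop the `echo` to execute it for real)
echo nexa infer "$MODEL"
```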
Windows x64 Installation
Download the installer, then run the downloaded .exe file and follow the installation wizard.
Running Your First Model
Currently, we support LLM and multimodal models. Support for more model types is coming soon!
Language Model (LLM):
nexa infer NexaAI/Qwen3-0.6B
Multimodal Model:
nexa infer NexaAI/Qwen2.5-Omni-3B-GGUF
To try other GGUF models, visit Hugging Face, copy the path of any compatible GGUF model (e.g., unsloth/Qwen2.5-VL-3B-Instruct-GGUF), and replace the model path in the command above.
Currently, LLM (Large Language Model) and VLM (Vision Language Model) support is in testing; more modalities are coming soon!
Windows ARM64 Installation
Download the installer, then run the downloaded .exe file and follow the installation wizard.
Running Your First Model
Currently, we support LLM and multimodal models. Support for more model types is coming soon!
Language Model (LLM):
nexa infer NexaAI/Qwen3-0.6B
Multimodal Model:
nexa infer NexaAI/Qwen2.5-Omni-3B-GGUF
To try other GGUF models, visit Hugging Face, copy the path of any compatible GGUF model (e.g., unsloth/Qwen2.5-VL-3B-Instruct-GGUF), and replace the model path in the command above.
Currently, LLM (Large Language Model) and VLM (Vision Language Model) support is in testing; more modalities are coming soon!
NPU Acceleration (Snapdragon X Elite)
Hardware requirement: The following NPU-accelerated model currently runs only on Qualcomm Snapdragon X Elite laptops.
If you have a Snapdragon X Elite PC, you can run the flagship OmniNeural-4B model with NPU acceleration.
OmniNeural-4B (Multimodal NPU Model)
Voice Input Mode: Once the model is running, record your voice directly in the terminal. Press CTRL + C to stop recording, then hit Enter to send.
File Input: Drag image/audio files into the command line:
> describe this image '/path/to/image.jpg' '/path/to/audio.wav'
For detailed NPU setup instructions and advanced features, see the NPU Guide.
Linux Installation
Run the following command to download and install:
curl -fsSL /path/to/install.sh -o install.sh && chmod +x install.sh && ./install.sh
Running Your First Model
Currently, we support LLM and multimodal models. Support for more model types is coming soon!
Language Model (LLM):
nexa infer NexaAI/Qwen3-0.6B
Multimodal Model:
nexa infer NexaAI/Qwen2.5-Omni-3B-GGUF
To try other GGUF models, visit Hugging Face, copy the path of any compatible GGUF model (e.g., unsloth/Qwen2.5-VL-3B-Instruct-GGUF), and replace the model path in the command above.
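If you copied a full model URL from your browser rather than the bare repo path, a small hypothetical helper (ours, not part of the Nexa CLI) can strip the host, assuming the standard huggingface.co URL layout:

```shell
# Hypothetical helper: reduce a Hugging Face model URL to the
# owner/repo path that `nexa infer` expects. A bare path passes through unchanged.
hf_path() {
  echo "$1" | sed -E 's#^https?://huggingface\.co/##'
}
hf_path "https://huggingface.co/unsloth/Qwen2.5-VL-3B-Instruct-GGUF"
```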
We currently support LLM (Large Language Model) and VLM (Vision Language Model). More modalities are coming soon!
Explore CLI Commands
To see a list of all available CLI commands, run: