Documentation Index
Fetch the complete documentation index at: https://docs.nexa.ai/llms.txt
Use this file to discover all available pages before exploring further.
Installation & Running Your First Model
macOS
Windows x64
Windows ARM64
Linux
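Before jumping to a section, it can help to confirm which platform you are on. A minimal sketch (our own helper, not part of the Nexa tooling) that maps `uname` output to the sections below; Windows users can simply pick x64 vs ARM64 directly:

```shell
# Hypothetical helper: map OS/architecture (as reported by uname)
# to the matching installation section of this guide.
pick_section() {
  case "$1-$2" in
    Darwin-*)          echo "macOS" ;;
    Linux-x86_64)      echo "Linux" ;;
    *-aarch64|*-arm64) echo "Windows ARM64 (or ARM Linux)" ;;
    *)                 echo "Windows x64" ;;
  esac
}
pick_section "$(uname -s)" "$(uname -m)"
```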
macOS Installation
Download the appropriate installer for your Mac, then run the downloaded .pkg file and follow the installation wizard.
Running Your First Model
macOS supports both MLX (Apple Silicon optimized) and GGUF models.
Language Model (LLM):
nexa infer NexaAI/Qwen3-4B-4bit-MLX
Multimodal Model:
nexa infer NexaAI/gemma-3n-E4B-it-4bit-MLX
Many MLX models in the Hugging Face mlx-community have quality issues and may not run locally. We recommend using models from our collection for best results.
Language Model (LLM):
nexa infer NexaAI/Qwen3-0.6B
Multimodal Model:
nexa infer NexaAI/Qwen2.5-Omni-3B-GGUF
To try other GGUF models, visit Hugging Face, copy the path of any compatible GGUF model (e.g., unsloth/Qwen2.5-VL-3B-Instruct-GGUF), and replace the model path in the command above.
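The substitution step above can be sketched as a one-line variable swap. The repo path is the example from the docs; `nexa` itself must already be installed for the command to actually run, so the sketch only prints it:

```shell
# Swap in any compatible GGUF repo path copied from Hugging Face
MODEL="unsloth/Qwen2.5-VL-3B-Instruct-GGUF"
# Print the command we would run (drop the `echo` to execute it for real)
echo nexa infer "$MODEL"
```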
Windows x64 Installation
Download the installer, then run the downloaded .exe file and follow the installation wizard.
Running Your First Model
Currently, we support LLM and multimodal models. Support for more model types is coming soon!
Language Model (LLM):
nexa infer NexaAI/Qwen3-0.6B
Multimodal Model:
nexa infer NexaAI/Qwen2.5-Omni-3B-GGUF
To try other GGUF models, visit Hugging Face, copy the path of any compatible GGUF model (e.g., unsloth/Qwen2.5-VL-3B-Instruct-GGUF), and replace the model path in the command above.
Currently, LLM (Large Language Model) and VLM (Vision Language Model) support is in testing; more modalities are coming soon!
Windows ARM64 Installation
Download the installer, then run the downloaded .exe file and follow the installation wizard.
Running Your First Model
Currently, we support LLM and multimodal models. Support for more model types is coming soon!
Language Model (LLM):
nexa infer NexaAI/Qwen3-0.6B
Multimodal Model:
nexa infer NexaAI/Qwen2.5-Omni-3B-GGUF
To try other GGUF models, visit Hugging Face, copy the path of any compatible GGUF model (e.g., unsloth/Qwen2.5-VL-3B-Instruct-GGUF), and replace the model path in the command above.
Currently, LLM (Large Language Model) and VLM (Vision Language Model) support is in testing; more modalities are coming soon!
NPU Acceleration (Snapdragon X Elite)
Hardware requirement: The following NPU-accelerated model currently runs only on Qualcomm Snapdragon X Elite laptops.
If you have a Snapdragon X Elite PC, you can run the flagship OmniNeural-4B model with NPU acceleration.
OmniNeural-4B (Multimodal NPU Model)
Voice Input Mode: Once the model is running, record your voice directly in the terminal. Press CTRL + C to stop recording, then hit Enter to send.
File Input: Drag image/audio files into the command line:
> describe this image '/path/to/image.jpg' '/path/to/audio.wav'
For detailed NPU setup instructions and advanced features, see the NPU Guide.
Linux Installation
Run the following command to download and install:
curl -fsSL /path/to/install.sh -o install.sh && chmod +x install.sh && ./install.sh
Running Your First Model
Currently, we support LLM and multimodal models. Support for more model types is coming soon!
Language Model (LLM):
nexa infer NexaAI/Qwen3-0.6B
Multimodal Model:
nexa infer NexaAI/Qwen2.5-Omni-3B-GGUF
To try other GGUF models, visit Hugging Face, copy the path of any compatible GGUF model (e.g., unsloth/Qwen2.5-VL-3B-Instruct-GGUF), and replace the model path in the command above.
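If you copied a full model URL from your browser rather than the bare repo path, a small hypothetical helper (ours, not part of the Nexa CLI) can strip the host, assuming the standard huggingface.co URL layout:

```shell
# Hypothetical helper: reduce a Hugging Face model URL to the
# owner/repo path that `nexa infer` expects. A bare path passes through unchanged.
hf_path() {
  echo "$1" | sed -E 's#^https?://huggingface\.co/##'
}
hf_path "https://huggingface.co/unsloth/Qwen2.5-VL-3B-Instruct-GGUF"
```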
We currently support LLM (Large Language Model) and VLM (Vision Language Model). More modalities are coming soon!
Explore CLI Commands
To see a list of all available CLI commands, run: