πŸ–₯️ Supported Devices

NPU models run on Qualcomm Neural Processing Units (NPUs):
  • Nexa CLI: PCs with a Qualcomm Snapdragon NPU
  • NexaML: any Qualcomm NPU device (contact us to request access)

βš™οΈ Prerequisites

  • If you haven’t already, install the Nexa SDK.
  • All NPU models require an access token before use:
    • Create an account at sdk.nexa.ai
    • Generate a token: go to Deployment → Create Token
    • Activate your SDK: run the following command in your terminal (replace the placeholder with your token):
```bash
nexa config set license '<your_token_here>'
```
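
Putting the prerequisites together, a minimal first-run flow might look like the sketch below. The token value is a placeholder, and the model name is the LLM example used later on this page.

```bash
# Activate the SDK with the token generated at sdk.nexa.ai (placeholder shown).
nexa config set license '<your_token_here>'

# Then run any NPU model from the hub, e.g. the LLM example below.
nexa infer NexaAI/qwen3-4B-npu
```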

LLM - Language Models

πŸ“ Language models in NPU format. Try out this quick example: Try it out:
```bash
nexa infer NexaAI/qwen3-4B-npu
```
⌨️ Once the model loads, type or paste multi-line text directly into the CLI to chat with the model.
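
For instance, once the prompt appears you might paste a multi-line request like the one below (illustrative input only; phrase your prompt however you like):

```
Summarize these notes in two sentences:
- NPU models require an access token from sdk.nexa.ai
- Tokens are activated with `nexa config set license`
```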

LMM - Multimodal Models

πŸ–ΌοΈ Language models that also accept vision and/or audio inputs. LMM in NPU formats. Try out this quick example:
```bash
nexa infer NexaAI/OmniNeural-4B
```
⌨️ Drag photos or audio clips directly into the CLI β€” you can even drop multiple images at once!
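
For example, a turn might pair a dropped file with a question, roughly like the line below (illustrative only: the path is hypothetical, and the exact layout the CLI uses when you drop a file may differ).

```
/path/to/photo.jpg Describe what is in this image.
```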

Supported Model List

We curate a list of top, high-quality models in NPU format.

Models for Qualcomm NPU

For more advanced models, visit the Nexa Model Hub. An access token is required to download and use these models; see the Prerequisites section of this page to get one.

πŸ™‹ Request New Models

Want a specific model? Submit an issue on the nexa-sdk GitHub or request it in our Discord/Slack community!