Model Name Mapping
NPU models use an internal name mapping: fill in the model name and plugin ID according to the table below. GGUF-format models run on CPU/GPU and do not need a model name.

| Model Name | Plugin ID | Hugging Face repository name |
|---|---|---|
| omni-neural | npu | NexaAI/OmniNeural-4B-mobile |
| embedneural | npu | NexaAI/embedneural-npu-mobile |
| phi3.5 | npu | NexaAI/phi3.5-mini-npu-mobile |
| phi4 | npu | NexaAI/phi4-mini-npu-mobile |
| granite-nano | npu | NexaAI/Granite-4.0-h-350M-NPU-mobile |
| granite4 | npu | NexaAI/Granite-4-Micro-NPU-mobile |
| embed-gemma | npu | NexaAI/embeddinggemma-300m-npu-mobile |
| qwen3-4b | npu | NexaAI/Qwen3-4B-Instruct-2507-npu-mobile |
| llama3-3b | npu | NexaAI/Llama3.2-3B-NPU-Turbo-NPU-mobile |
| parakeet | npu | NexaAI/parakeet-tdt-0.6b-v3-npu-mobile |
| liquid-v2 | npu | NexaAI/LFM2-1.2B-npu-mobile |
| jina-rerank | npu | NexaAI/jina-v2-rerank-npu-mobile |
| paddleocr | npu | NexaAI/paddleocr-npu-mobile |
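
The table above amounts to a lookup from repository name to internal model name. The sketch below expresses it as a plain Python dictionary for illustration only. It is not part of the Nexa SDK, and the names `NPU_MODEL_NAMES`, `NPU_PLUGIN_ID`, and `model_name_for` are hypothetical. Returning `None` for an unlisted repository reflects the note that GGUF models do not need a model name.

```python
from typing import Optional

# Illustrative lookup copied from the table above; not part of the Nexa SDK.
# Keys are Hugging Face repository names, values are internal model names.
NPU_MODEL_NAMES = {
    "NexaAI/OmniNeural-4B-mobile": "omni-neural",
    "NexaAI/embedneural-npu-mobile": "embedneural",
    "NexaAI/phi3.5-mini-npu-mobile": "phi3.5",
    "NexaAI/phi4-mini-npu-mobile": "phi4",
    "NexaAI/Granite-4.0-h-350M-NPU-mobile": "granite-nano",
    "NexaAI/Granite-4-Micro-NPU-mobile": "granite4",
    "NexaAI/embeddinggemma-300m-npu-mobile": "embed-gemma",
    "NexaAI/Qwen3-4B-Instruct-2507-npu-mobile": "qwen3-4b",
    "NexaAI/Llama3.2-3B-NPU-Turbo-NPU-mobile": "llama3-3b",
    "NexaAI/parakeet-tdt-0.6b-v3-npu-mobile": "parakeet",
    "NexaAI/LFM2-1.2B-npu-mobile": "liquid-v2",
    "NexaAI/jina-v2-rerank-npu-mobile": "jina-rerank",
    "NexaAI/paddleocr-npu-mobile": "paddleocr",
}

# Every row in the table lists the same plugin ID.
NPU_PLUGIN_ID = "npu"


def model_name_for(repo: str) -> Optional[str]:
    """Return the internal model name for a listed NPU repository.

    Returns None for repositories not in the table (for example GGUF
    models, which do not need a model name).
    """
    return NPU_MODEL_NAMES.get(repo)
```

For example, `model_name_for("NexaAI/phi4-mini-npu-mobile")` returns `"phi4"`, while a repository that is not in the table returns `None`.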
LLM Usage
Large Language Models for text generation and chat applications.
Streaming Conversation - NPU
We support NPU inference for NEXA-format models.
Streaming Conversation - CPU
We support CPU inference for GGUF-format models.
Multimodal Usage
Vision-Language Models for image understanding and multimodal applications.
Streaming Conversation - NPU
We support NPU inference for NEXA-format models.
Streaming Conversation - CPU
We support CPU inference for GGUF-format models.
API Reference
VlmCreateInput
VlmChatMessage
VlmContent
Embeddings Usage
Generate vector embeddings for semantic search and RAG applications.
Basic Usage
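Once embeddings have been generated, semantic search usually comes down to comparing a query vector against document vectors. The following is a minimal, SDK-independent sketch of that comparison using cosine similarity. The function names are hypothetical, and the three-dimensional vectors are toy values standing in for real embedder output.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors (assumed values, not real model output).
query = [1.0, 0.0, 0.0]
docs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.5, 0.5, 0.0],  # in between
]
order = rank_by_similarity(query, docs)
```

Here `order` lists the second document first, since it points in almost the same direction as the query.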
API Reference
EmbedderCreateInput
EmbeddingConfig
ASR Usage
Automatic Speech Recognition for audio transcription.
Basic Usage
API Reference
AsrCreateInput
AsrTranscribeInput
AsrTranscriptionResult
Rerank Usage
Improve search relevance by reranking documents based on query relevance.
Basic Usage
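Conceptually, a reranker assigns each candidate document a relevance score for the query, and the caller reorders the candidates by that score. The sketch below shows only that final reordering step in plain Python. It does not use the Nexa SDK, the `rerank` helper is hypothetical, and the scores are made-up values standing in for real reranker output.

```python
def rerank(documents, scores, top_k=None):
    """Pair each document with its score and sort by descending score.

    `scores` is assumed to hold one relevance score per document,
    in the same order as `documents`.
    """
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Made-up scores for illustration only.
candidates = ["doc a", "doc b", "doc c"]
relevance = [0.1, 0.9, 0.5]
top_two = rerank(candidates, relevance, top_k=2)
```

With these values, `top_two` keeps `"doc b"` and `"doc c"`, in that order.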
API Reference
RerankerCreateInput
RerankConfig
RerankerResult
Methods
CV Usage
Computer Vision models for OCR, object detection, and image classification.
Basic Usage
API Reference
CVCreateInput
CVModelConfig
CVCapability
CVResult
How to use CPU, GPU, NPU
Switch between different hardware acceleration modes.
CPU/GPU Mode
NPU Mode (Qualcomm)
Need Help?
Join our community to get support, share your projects, and connect with other developers.
Discord Community
Get real-time support and chat with the Nexa AI community.
Slack Community
Collaborate with developers and access community resources.