Model Name Mapping
For all CoreML (ANE) models, we use an internal name mapping, and you should fill in the plugin ID accordingly. For GGUF-format models (running on CPU/GPU), you do not need to provide the plugin ID or model name; the plugin parameter is not required.

| Model Name | Plugin ID | Huggingface repository name |
|---|---|---|
| NexaAI/EmbedNeural-ANE | ane | NexaAI/EmbedNeural-ANE |
| NexaAI/parakeet-tdt-0.6b-v3-ane | ane | NexaAI/parakeet-tdt-0.6b-v3-ane |
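As a rough illustration of how the plugin ID maps onto the Swift API: ANE models are loaded with the `ane` plugin, while GGUF models use the default CPU/GPU backend and need no plugin argument. The `Embedder` type name and the `.ane` case of `Plugin` are assumptions based on the table above and the API reference below; check the SDK headers for the exact names.

```swift
import Foundation

// ANE (CoreML) model: pass the plugin ID from the table above.
// `.ane` is assumed to be the Plugin case matching the "ane" plugin ID.
let aneRepo = URL(fileURLWithPath: "/path/to/NexaAI/EmbedNeural-ANE")
let aneEmbedder = try Embedder(from: aneRepo, plugin: .ane)

// GGUF model: runs on CPU/GPU, so the default plugin is used and
// no plugin ID or model name needs to be supplied.
let ggufRepo = URL(fileURLWithPath: "/path/to/some-gguf-embedding-model")
let ggufEmbedder = try Embedder(from: ggufRepo) // plugin defaults to .cpu_gpu
```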
ASR Usage
Automatic Speech Recognition for audio transcription.
Basic Usage
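A minimal sketch of loading an ASR model from a local HuggingFace-format folder, based on the `load(from:)` method documented below. The `AsrModel` type name is an assumption; substitute the ASR class exposed by the SDK.

```swift
import Foundation

// Assumed ASR type name; the load(from:) API is documented below.
func loadASR() async throws -> AsrModel {
    let asr = AsrModel()
    // Folder containing the HuggingFace-format model files,
    // e.g. NexaAI/parakeet-tdt-0.6b-v3-ane downloaded locally.
    let repoFolder = URL(fileURLWithPath: "/path/to/parakeet-tdt-0.6b-v3-ane")
    // load(from:) is async and throwing, so it must be awaited.
    try await asr.load(from: repoFolder)
    return asr
}
```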
ASR stream mode
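A sketch of microphone streaming. It assumes that the audio captured by `startRecordingStream` is routed into the transcription stream returned by `startStream`; the `AsrModel` name and the exact wiring between recording and streaming are assumptions, so consult the SDK sample code for the precise combination in your version.

```swift
import AVFoundation

func transcribeFromMicrophone(asr: AsrModel) async throws {
    // Start the streaming session; the returned AsyncThrowingStream yields
    // partial and final transcription text as audio is processed.
    let stream = try asr.startStream(config: .init())

    // Consume transcription results concurrently.
    let consumer = Task {
        for try await text in stream {
            print("Transcription:", text)
        }
    }

    // Begin recording; this sketch assumes the captured microphone audio
    // is fed into the active ASR stream.
    asr.startRecordingStream()

    // ... record for as long as needed ...
    try await Task.sleep(nanoseconds: 10_000_000_000) // 10 seconds

    // Stop recording, then drain any buffered audio before ending the stream.
    asr.stopRecordingStream()
    asr.stopStream(graceful: true)
    try await consumer.value
}
```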
API Reference
Core Methods
func load(from repoFolder: URL) async throws
- Loads an ASR model from a HuggingFace-format local repository folder
- Parameters:
- repoFolder: The folder containing the HuggingFace model files
- Returns: None
- Throws: Error if the model fails to load
- Note: This is an async function and must be awaited
func startRecordingStream(config: ASRStreamConfig = .init(), block tapBlock: AVAudioNodeTapBlock? = nil)
- Starts audio recording and ASR streaming simultaneously.
- Parameters:
- config: Streaming configuration.
- tapBlock: Optional tap block to inspect or process audio samples.
- Returns: None
func stopRecordingStream()
- Stops both audio recording and ASR streaming.
- Returns: None
func startRecording(block tapBlock: AVAudioNodeTapBlock? = nil)
- Starts audio recording only.
- Parameters:
- tapBlock: Optional tap block to inspect or process audio.
- Throws: Error if audio session or engine fails to start.
- Returns: None
func stopRecording()
- Stops the current audio recording session.
- Returns: None
func startStream(config: ASRStreamConfig = .init()) throws -> AsyncThrowingStream<String, Error>
- Starts ASR streaming mode.
- Parameters:
- config: Streaming configuration.
- Returns: A stream that yields partial or final transcription text.
func stopStream(graceful: Bool = true)
- Stops ASR streaming.
- Parameters:
- graceful:
- true: Process remaining buffered audio before stopping (default).
- false: Stop immediately.
- Returns: None
func streamPushSamples(samples: [Float]) throws
- Pushes raw audio samples into the streaming ASR pipeline for processing
- Parameters:
- samples: An array of PCM audio samples
- Returns: None
- Throws: Error if the streaming session is not active or the audio buffer cannot be processed
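For audio that does not come from the device microphone (for example, PCM decoded from a file or received over the network), samples can be pushed manually. A minimal sketch, assuming an `AsrModel` instance and Float PCM chunks at the sample rate the model expects (check the model card):

```swift
func transcribePushedSamples(asr: AsrModel, pcmChunks: [[Float]]) async throws {
    // Open the streaming session and consume results concurrently.
    let stream = try asr.startStream(config: .init())
    let consumer = Task {
        for try await text in stream {
            print("Partial/final text:", text)
        }
    }

    // Push raw PCM samples chunk by chunk into the streaming pipeline.
    for chunk in pcmChunks {
        try asr.streamPushSamples(samples: chunk)
    }

    // Process any remaining buffered audio before stopping.
    asr.stopStream(graceful: true)
    try await consumer.value
}
```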
AsrResult
AsrResponse
AsrOptions
ASRStreamConfig: ASR streaming configuration
Embeddings Usage
Generate vector embeddings for semantic search and RAG applications.
Basic Usage
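A minimal sketch using the methods documented below. The `Embedder` type name is an assumption (the docs refer to the "Embedder modality"), and `EmbeddingConfig` is assumed to provide a default initializer.

```swift
import Foundation

func embedSentences() throws {
    // Load an embedding model from a local HuggingFace-style folder.
    // The plugin defaults to .cpu_gpu.
    let repoFolder = URL(fileURLWithPath: "/path/to/embedding-model")
    let embedder = try Embedder(from: repoFolder)

    // Embedding dimension reported by the model.
    let dim = try embedder.dim()
    print("Embedding dimension:", dim)

    // Generate embeddings for a batch of texts.
    let result = try embedder.embed(
        texts: ["What is RAG?", "Retrieval-augmented generation combines search with LLMs."],
        config: EmbeddingConfig()
    )
    // EmbedResult contains the embeddings and profiling data; the exact
    // property names depend on the SDK.
    print(result)
}
```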
API Reference
Core Methods
convenience init(from repoFolder: URL, plugin: Plugin = .cpu_gpu)
- Initializes an instance using a model stored in a local repository folder
- Parameters:
- repoFolder: Path to the local model repository folder
- plugin: Backend plugin to use (cpu_gpu by default)
- Returns: An initialized instance
- Throws: Error if model loading or initialization fails
func embed(inputIds: [[Int32]], config: EmbeddingConfig) throws -> EmbedResult
- Generates embeddings from pre-tokenized input IDs
- Parameters:
- inputIds: Array of tokenized sequences, each inner array is the token IDs for one sample
- config: Embedding configuration
- Returns: EmbedResult
- Note: Supported only on the cpu_gpu plugin
func embed(texts: [String], config: EmbeddingConfig) throws -> EmbedResult
- Generates embeddings for input text strings
- Parameters:
- texts: Array of input texts to embed
- config: Embedding process configuration (batch size, normalization, etc.)
- Returns: EmbedResult containing embeddings and profiling data
func embed(imagePaths: [String], config: EmbeddingConfig) throws -> EmbedResult
- Generates embeddings for input images
- Parameters:
- imagePaths: Paths to input images
- config: Embedding configuration
- Returns: EmbedResult
func dim() throws -> Int32
- Returns the embedding dimension for the model
- Parameters: None
- Returns: Int32 representing the embedding dimension
EmbeddingConfig
EmbedResult
LLM Usage
Large Language Models for text generation and chat applications.
Streaming Conversation - CPU/GPU
We support CPU/GPU inference for GGUF-format models.
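A sketch of a streaming chat turn based on the API reference below. The `LLM` type name, the `ChatMessage` initializer, and `GenerationOptions()` defaults are assumptions; adjust them to the types exposed by the SDK.

```swift
import Foundation

func streamChat() async throws {
    // Assumed LLM type name; load(from:) is documented in the API reference below.
    let llm = LLM()
    let repoFolder = URL(fileURLWithPath: "/path/to/gguf-model-repo")
    try llm.load(from: repoFolder) // modelFileName defaults to the repository's default file

    // ChatMessage's initializer is assumed to take a role and content.
    let messages: [ChatMessage] = [
        ChatMessage(role: .user, content: "Explain the ANE in one paragraph.")
    ]

    // Stream tokens as they are generated; GenerationOptions() is assumed
    // to provide reasonable defaults.
    let stream = try await llm.generateAsyncStream(messages: messages, options: GenerationOptions())
    for try await token in stream {
        print(token, terminator: "")
    }
    print()
}
```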
Multimodal Usage
Vision-Language Models for image understanding and multimodal applications.
Streaming Conversation - CPU/GPU
We support CPU/GPU inference for GGUF-format models.
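A sketch of multimodal streaming under the same assumptions as the LLM example above. The `VLM` type name, the file names passed to `load(from:modelFileName:mmprojFileName:)`, and the way an image is attached to a `ChatMessage` are all assumptions; the SDK's actual `ChatMessage` API may differ.

```swift
import Foundation

func streamVisionChat() async throws {
    // Assumed VLM type name; for multimodal GGUF models both the model file
    // and the mmproj file are loaded from the repository folder.
    let vlm = VLM()
    let repoFolder = URL(fileURLWithPath: "/path/to/vlm-gguf-repo")
    try vlm.load(
        from: repoFolder,
        modelFileName: "model.gguf",   // hypothetical file names; empty strings
        mmprojFileName: "mmproj.gguf"  // fall back to the repository defaults
    )

    // How images are attached depends on the SDK's ChatMessage API;
    // the `images` parameter shown here is an assumption.
    let messages: [ChatMessage] = [
        ChatMessage(role: .user,
                    content: "Describe this photo.",
                    images: ["/path/to/photo.jpg"])
    ]

    let stream = try await vlm.generateAsyncStream(messages: messages, options: GenerationOptions())
    for try await token in stream {
        print(token, terminator: "")
    }
    print()
}
```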
API Reference
Core Methods
func load(_ options: ModelOptions) async throws
- Loads the model with the specified configuration
- Parameters:
- options: Model loading options
- Returns: None
- Throws: Error if the model fails to load
func load(from repoFolder: URL, modelFileName: String = "", mmprojFileName: String = "") throws
- Loads the model from a local HuggingFace repository directory
- Parameters:
- repoFolder: Local HuggingFace repository directory
- modelFileName: Model file name; if empty, the default is used
- mmprojFileName: mmproj file name; if empty, the default is used
- Returns: None
- Throws: Error if model loading fails
func applyChatTemplate(messages: [ChatMessage], options: ChatTemplateOptions) async throws -> String
- Applies the model’s chat template and formats messages accordingly
- Parameters:
- messages: Messages to format
- options: Template configuration options
- Returns: String representing the formatted prompt
- Throws: Error if the template cannot be applied
func generateAsyncStream(messages: [ChatMessage], options: GenerationOptions) async throws -> AsyncThrowingStream<String, any Error>
- Generates text in streaming fashion from chat messages
- Parameters:
- messages: Chat history used for generation
- options: Generation configuration
- Returns: AsyncThrowingStream yielding generated text tokens
- Throws: Error if generation fails to start
func generate(prompt: String, config: GenerationConfig) async throws -> GenerateResult
- Generates text for a single prompt
- Parameters:
- prompt: Input prompt
- config: Generation configuration
- Returns: GenerateResult
- Throws: Error if generation fails
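applyChatTemplate and generate(prompt:config:) can be combined when you want to control the prompt string yourself. A sketch, assuming `ChatTemplateOptions` and `GenerationConfig` expose default initializers:

```swift
func generateWithTemplate(llm: LLM, messages: [ChatMessage]) async throws {
    // Format the chat history with the model's own chat template.
    let prompt = try await llm.applyChatTemplate(messages: messages,
                                                 options: ChatTemplateOptions())

    // Run single-shot generation on the formatted prompt.
    let result = try await llm.generate(prompt: prompt, config: GenerationConfig())
    print(result) // GenerateResult; exact fields depend on the SDK
}
```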
func reset()
- Resets the model’s internal state
- Parameters: None
- Returns: None
func stopStream()
- Stops the current streaming generation session
- Parameters: None
- Returns: None
func saveKVCache(to path: String)
- Saves the current KV cache to the specified file path
- Parameters:
- path (String): Destination file path
- Returns: None
- Notes: Only available for LLM models
func loadKVCache(from path: String)
- Loads KV cache from the specified file path
- Parameters:
- path (String): Source file path
- Returns: None
- Notes: Only available for LLM models
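Saving and restoring the KV cache lets you avoid re-processing a long prompt across sessions (LLM models only). A minimal sketch, assuming an already-loaded LLM instance; the cache path is hypothetical.

```swift
func persistKVCache(llm: LLM) {
    let cachePath = "/tmp/chat-session.kvcache" // hypothetical path

    // Persist the current conversation state after some generation has run.
    llm.saveKVCache(to: cachePath)

    // Later (or in a new session with the same model), restore it so the
    // previously processed context does not need to be re-computed.
    llm.loadKVCache(from: cachePath)
}
```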
GenerationConfig
SamplerConfig
ProfileData
Rerank Usage
Improve search results by reranking documents based on their relevance to the query.
Basic Usage
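A minimal sketch using the initializer and rerank method documented below. The `Reranker` type name is an assumption; `RerankConfig` defaults to `.init()`, and the fields of `RerankerResult` depend on the SDK.

```swift
import Foundation

func rerankDocuments() async throws {
    // Load a reranker from a HuggingFace-style local folder
    // with the default CPU/GPU plugin.
    let repoFolder = URL(fileURLWithPath: "/path/to/reranker-model")
    let reranker = try Reranker(from: repoFolder)

    let query = "How do I run GGUF models on iOS?"
    let documents = [
        "GGUF models run on the CPU/GPU backend of the Nexa SDK.",
        "The ASR module runs exclusively on the ANE.",
        "Embedding models can use either CPU/GPU or ANE."
    ]

    // Rank the documents against the query; inspect RerankerResult for scores.
    let result = try await reranker.rerank(query, documents: documents)
    print(result)
}
```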
API Reference
Core Methods
init(modelPath: String, tokenizerPath: String? = nil, deviceId: String? = nil, plugin: Plugin = .cpu_gpu) throws
- Initializes a reranker model from local file paths
- Parameters:
- modelPath: Path to the reranker model
- tokenizerPath: Optional path to the tokenizer
- deviceId: Device identifier; if nil, uses default backend setup
- plugin: Backend plugin (default: cpu_gpu)
- Returns: Instance of the reranker
- Throws: Error if initialization fails
func rerank(_ query: String, documents: [String], config: RerankConfig = .init()) async throws -> RerankerResult
- Performs document ranking given a query and a list of documents
- Parameters:
- query: Query string to evaluate
- documents: List of documents to rank
- config: Reranking configuration
- Returns: RerankerResult
- Throws: Error during reranking execution
convenience init(from repoFolder: URL, plugin: Plugin = .cpu_gpu) throws
- Convenience initializer that loads a reranker from a HuggingFace-style local repository
- Parameters:
- repoFolder: Local repository directory
- plugin: Backend plugin (default: cpu_gpu)
- Returns: Instance of the reranker
- Throws: Error if loading fails
RerankerResult
RerankConfig
How to use CPU/GPU, ANE
Currently, the Nexa iOS/macOS SDK offers two hardware acceleration modes: CPU/GPU and ANE (NPU). A given model can run on only one of these modes, so please read its model card on Hugging Face carefully to use it correctly.
CPU/GPU Mode
ANE Mode
The Embedder modality supports both CPU/GPU and ANE execution.
LLM, VLM, and Reranker modules support CPU/GPU only.
The ASR module runs exclusively on ANE and does not support CPU/GPU execution.
Need Help?
Join our community to get support, share your projects, and connect with other developers.
Discord Community
Get real-time support and chat with the Nexa AI community
Slack Community
Collaborate with developers and access community resources