CPU / GPU

LLM Usage

Large language models for text generation and chat applications.

Streaming Conversation

Supports CPU/GPU inference for models in GGUF format. You can pick any GGUF model from the community and run it with the cpu_gpu plugin.
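Before passing a downloaded file as model_path, it can help to confirm that it really is a GGUF file. Every GGUF file begins with the 4-byte ASCII magic `GGUF`. The sketch below checks only that magic; `hasGgufMagic` and `isGgufFile` are hypothetical helpers, not part of the Nexa SDK, and a passing check does not guarantee that the cpu_gpu plugin can load the model.

```kotlin
import java.io.File

// Every GGUF file starts with the ASCII magic bytes "GGUF" (0x47 0x47 0x55 0x46)
val GGUF_MAGIC = byteArrayOf(0x47, 0x47, 0x55, 0x46)

// Hypothetical helper: true if the first four bytes match the GGUF magic
fun hasGgufMagic(header: ByteArray): Boolean =
    header.size >= GGUF_MAGIC.size &&
        GGUF_MAGIC.indices.all { header[it] == GGUF_MAGIC[it] }

// Hypothetical helper: reads only the first four bytes of the file at `path`
fun isGgufFile(path: String): Boolean {
    val file = File(path)
    if (!file.isFile) return false
    val header = ByteArray(GGUF_MAGIC.size)
    val read = file.inputStream().use { it.read(header) }
    return read == header.size && hasGgufMagic(header)
}
```

For example, you might call `isGgufFile("<your-model-path>")` and show an error to the user instead of building the wrapper when it returns false.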

// Assumes a property such as `lateinit var llmWrapper: LlmWrapper` is declared in the enclosing scope
LlmWrapper.builder()
    .llmCreateInput(
        LlmCreateInput(
            model_name = "", // GGUF CPU/GPU model: leave model_name empty
            model_path = "<your-model-path>",
            config = ModelConfig(
                nCtx = 4096,
                nGpuLayers = 0  // 0 = CPU; > 0 offloads layers to the GPU (also requires device_id = "gpu")
            ),
            plugin_id = "cpu_gpu",
            device_id = null  // null = CPU, "gpu" = GPU
        )
    )
    .build()
    .onSuccess { llmWrapper = it }
    .onFailure { error -> println("Error: ${error.message}") }

val chatList = arrayListOf(ChatMessage("user", "What is AI?"))

// The stream is consumed with collect, which for a Kotlin Flow is a suspend function,
// so run this block inside a coroutine (for example, lifecycleScope.launch { ... })
llmWrapper.applyChatTemplate(chatList.toTypedArray(), null, false).onSuccess { template ->
    val genConfig = GenerationConfig(maxTokens = 2048)

    llmWrapper.generateStreamFlow(template.formattedText, genConfig).collect { result ->
        when (result) {
            is LlmStreamResult.Token -> print(result.text)  // print tokens inline as they arrive
            is LlmStreamResult.Completed -> println("\nDone!")
            is LlmStreamResult.Error -> println("Error: ${result.throwable}")
        }
    }
}
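If you also need the complete reply as a single string (for example, to keep a conversation history), you can accumulate the Token texts while printing them. The sketch below uses a hypothetical stand-in sealed class `StreamEvent` that mirrors the three LlmStreamResult cases shown above, and a plain Kotlin Sequence instead of the SDK's stream, so it runs without the SDK. `collectReply` is likewise hypothetical; in real code you would apply the same `when` logic to LlmStreamResult inside collect.

```kotlin
// Hypothetical stand-in for the SDK's LlmStreamResult (Token / Completed / Error)
sealed class StreamEvent {
    data class Token(val text: String) : StreamEvent()
    object Completed : StreamEvent()
    data class Error(val throwable: Throwable) : StreamEvent()
}

// Passes each token to onToken as it arrives and returns the full reply, or the first error
fun collectReply(
    events: Sequence<StreamEvent>,
    onToken: (String) -> Unit = { print(it) }
): Result<String> {
    val reply = StringBuilder()
    for (event in events) {
        when (event) {
            is StreamEvent.Token -> {
                onToken(event.text)
                reply.append(event.text)
            }
            is StreamEvent.Completed -> return Result.success(reply.toString())
            is StreamEvent.Error -> return Result.failure(event.throwable)
        }
    }
    // Stream ended without an explicit Completed event: return what was received
    return Result.success(reply.toString())
}
```

Returning a Result keeps error handling consistent with the onSuccess / onFailure style used by the wrappers.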

CPU/GPU Configuration

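Use device_id and nGpuLayers together to control whether the model runs on the CPU or the GPU.

GPU execution requires both of the following:

device_id must be set to "gpu"
nGpuLayers must be greater than 0 (typically 999, to offload all layers)

CPU execution is used when either of the following holds:

device_id is null (the default)
nGpuLayers is 0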
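The rules above reduce to one condition: the model runs on the GPU only when device_id is "gpu" and nGpuLayers is greater than 0. The hypothetical helpers below encode that condition together with the two settings used in the examples; `DeviceSettings`, `gpuSettings`, `cpuSettings`, and `runsOnGpu` are illustrative names, not part of the Nexa SDK. Treating every other combination as CPU is an assumption that goes slightly beyond the two documented cases.

```kotlin
// Hypothetical holder for the two values passed to LlmCreateInput (device_id) and ModelConfig (nGpuLayers)
data class DeviceSettings(val deviceId: String?, val nGpuLayers: Int)

// Settings used in the examples: offload all layers to the GPU, or keep everything on the CPU
fun gpuSettings(layers: Int = 999) = DeviceSettings(deviceId = "gpu", nGpuLayers = layers)
fun cpuSettings() = DeviceSettings(deviceId = null, nGpuLayers = 0)

// GPU needs both device_id == "gpu" and nGpuLayers > 0; any other combination is
// treated as CPU here (an assumption beyond the two documented CPU cases)
fun DeviceSettings.runsOnGpu(): Boolean = deviceId == "gpu" && nGpuLayers > 0
```

You would then pass `settings.deviceId` as device_id and `settings.nGpuLayers` as nGpuLayers, which keeps the two values from drifting out of sync.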

Example: Running on the GPU

LlmWrapper.builder()
    .llmCreateInput(
        LlmCreateInput(
            model_name = "",
            model_path = "<your-model-path>",
            config = ModelConfig(
                nCtx = 4096,
                nGpuLayers = 999  // offload all layers to the GPU
            ),
            plugin_id = "cpu_gpu",
            device_id = "gpu"  // use the GPU device
        )
    )
    .build()
    .onSuccess { llmWrapper = it }
    .onFailure { error -> println("Error: ${error.message}") }

Example: Running on the CPU (Default)

LlmWrapper.builder()
    .llmCreateInput(
        LlmCreateInput(
            model_name = "",
            model_path = "<your-model-path>",
            config = ModelConfig(
                nCtx = 4096,
                nGpuLayers = 0  // everything runs on the CPU
            ),
            plugin_id = "cpu_gpu",
            device_id = null  // defaults to the CPU
        )
    )
    .build()
    .onSuccess { llmWrapper = it }
    .onFailure { error -> println("Error: ${error.message}") }

Multimodal Usage

Vision-language models (VLMs) for image understanding and multimodal applications.

Streaming Conversation

Supports CPU/GPU inference for models in GGUF format.

// Assumes a property such as `lateinit var vlmWrapper: VlmWrapper` is declared in the enclosing scope
VlmWrapper.builder()
    .vlmCreateInput(
        VlmCreateInput(
            model_name = "",  // GGUF CPU/GPU: leave empty (no model name needed)
            model_path = "<your-model-path>",
            mmproj_path = "<your-mmproj-path>",  // vision projector (mmproj) weights
            config = ModelConfig(
                nCtx = 4096,
                nGpuLayers = 0  // 0 = CPU; > 0 offloads layers to the GPU (also requires device_id = "gpu")
            ),
            plugin_id = "cpu_gpu",
            device_id = null  // null = CPU, "gpu" = GPU
        )
    )
    .build()
    .onSuccess { vlmWrapper = it }
    .onFailure { error ->
        println("Error: ${error.message}")
    }

// Run image + text inference with the loaded VLM
val contents = listOf(
    VlmContent("image", "<your-image-path>"),
    VlmContent("text", "<your-text>")
)

val chatList = arrayListOf(VlmChatMessage("user", contents))

// collect is a suspend call: run this block inside a coroutine
vlmWrapper.applyChatTemplate(chatList.toTypedArray(), null, false).onSuccess { template ->
    // Create a base GenerationConfig and set maxTokens
    val baseConfig = GenerationConfig(maxTokens = 2048)

    // Inject the media paths from chatList into the config
    val configWithMedia = vlmWrapper.injectMediaPathsToConfig(
        chatList.toTypedArray(),
        baseConfig
    )

    vlmWrapper.generateStreamFlow(template.formattedText, configWithMedia).collect { result ->
        when (result) {
            is LlmStreamResult.Token -> print(result.text)  // print tokens inline as they arrive
            is LlmStreamResult.Completed -> println("\nDone!")
            is LlmStreamResult.Error -> println("Error: ${result.throwable}")
        }
    }
}

ASR Usage

Automatic speech recognition (ASR) for audio transcription.

Basic Usage

Supports CPU inference for whisper.cpp models.

// Load an ASR model for whisper.cpp inference
// Assumes a property such as `lateinit var asrWrapper: AsrWrapper` is declared in the enclosing scope
AsrWrapper.builder()
    .asrCreateInput(
        AsrCreateInput(
            model_name = "",  // leave empty for whisper.cpp
            model_path = "<your-model-path>",  // e.g. "ggml-base-q8_0.bin"
            config = ModelConfig(
                nCtx = 4096  // context size (use nCtx, not max_tokens)
            ),
            plugin_id = "whisper_cpp"  // use the whisper.cpp backend
        )
    )
    .build()
    .onSuccess { asrWrapper = it }
    .onFailure { error ->
        println("Error: ${error.message}")
    }

// Transcribe an audio file
asrWrapper.transcribe(
    AsrTranscribeInput(
        audioPath = "<your-audio-path>",  // path to a .wav file (16 kHz recommended)
        language = "en",                  // language code: "en", "zh", "es", etc.
        timestamps = null                 // optional timestamp format
    )
).onSuccess { result ->
    println("Transcription: ${result.result.transcript}")
}.onFailure { error ->
    println("Error during transcription: ${error.message}")
}
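The audio path should point to a .wav file, and 16 kHz is recommended. Before transcribing, you can read the sample rate from the file header. This sketch assumes a canonical PCM WAV layout, in which the "fmt " chunk immediately follows the RIFF header and the sample rate is the little-endian 32-bit value at byte offset 24; files with extra chunks before "fmt " would need a real chunk parser. `wavSampleRate` is a hypothetical helper, not an SDK function.

```kotlin
import java.io.File
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper: sample rate from a canonical PCM WAV header, or null if the
// bytes do not start with RIFF/WAVE followed directly by a "fmt " chunk
fun wavSampleRate(header: ByteArray): Int? {
    if (header.size < 28) return null
    val ascii = { offset: Int -> String(header, offset, 4, Charsets.US_ASCII) }
    if (ascii(0) != "RIFF" || ascii(8) != "WAVE" || ascii(12) != "fmt ") return null
    // Sample rate: little-endian 32-bit integer at byte offset 24
    return ByteBuffer.wrap(header, 24, 4).order(ByteOrder.LITTLE_ENDIAN).int
}

// Reads only the first 44 bytes of the file (the canonical header size)
fun wavSampleRate(file: File): Int? = file.inputStream().use { input ->
    val header = ByteArray(44)
    val read = input.read(header)
    if (read < 28) null else wavSampleRate(header.copyOf(read))
}
```

For example, you might log a warning when `wavSampleRate(File("<your-audio-path>"))` returns something other than 16000, and resample the audio before calling transcribe.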

TTS Usage

Text-to-speech (TTS) synthesis that converts text into natural-sounding speech.

Basic Usage

Supports CPU inference for TTS models in GGUF format.

// Load a TTS model for CPU inference
// Assumes a property such as `lateinit var ttsWrapper: TtsWrapper` is declared in the enclosing scope
TtsWrapper.builder()
    .ttsCreateInput(
        TtsCreateInput(
            model_name = "",  // leave empty for CPU/GPU models
            model_path = "<your-model-path>",  // path to the TTS model (e.g. a Kokoro GGUF model)
            config = ModelConfig(
                nCtx = 4096  // context size
            ),
            plugin_id = "tts_cpp"  // use the TTS backend
        )
    )
    .build()
    .onSuccess { ttsWrapper = it }
    .onFailure { error ->
        println("Error: ${error.message}")
    }

// Synthesize speech from text
ttsWrapper.synthesize(
    TtsSynthesizeInput(
        textUtf8 = "Hello, this is a text to speech demo using Nexa SDK.",
        outputPath = "<your-output-audio-path>"  // where to save the audio (e.g. "/path/to/output.wav")
    )
).onSuccess { result ->
    println("Speech synthesized successfully!")
    println("Audio saved to: ${result.outputPath}")
}.onFailure { error ->
    println("Error during synthesis: ${error.message}")
}
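To sanity-check the synthesized file, you can compute its duration from the header as data size / (sample rate * channels * bytes per sample). This sketch assumes the output is a PCM WAV with the canonical 44-byte header (channels at offset 22, sample rate at 24, bits per sample at 34, "data" chunk size at 40, all little-endian); that is an assumption, so check which format your model and plugin actually write. `wavDurationSeconds` is a hypothetical helper, not an SDK function.

```kotlin
import java.io.File
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper: duration in seconds of a canonical PCM WAV (44-byte header), or null
fun wavDurationSeconds(bytes: ByteArray): Double? {
    if (bytes.size < 44) return null
    val ascii = { offset: Int -> String(bytes, offset, 4, Charsets.US_ASCII) }
    if (ascii(0) != "RIFF" || ascii(8) != "WAVE" || ascii(12) != "fmt " || ascii(36) != "data") return null
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val channels = buf.getShort(22).toInt()
    val sampleRate = buf.getInt(24)
    val bitsPerSample = buf.getShort(34).toInt()
    // The data chunk size is an unsigned 32-bit value
    val dataSize = buf.getInt(40).toLong() and 0xFFFFFFFFL
    val bytesPerSecond = sampleRate.toLong() * channels * (bitsPerSample / 8)
    return if (bytesPerSecond > 0) dataSize.toDouble() / bytesPerSecond else null
}

// Reads only the 44-byte header of the file at the given location
fun wavDurationSeconds(file: File): Double? = file.inputStream().use { input ->
    val header = ByteArray(44)
    if (input.read(header) == header.size) wavDurationSeconds(header) else null
}
```

A duration of zero or null right after a successful synthesize call would suggest that the output path or format is not what you expected.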

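Need Help?

Join our community to get support, share your projects, and connect with other developers.

Discord Community

Get real-time support and connect with the Nexa AI community

Slack Community

Collaborate with developers and access community resources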
