Documentation Index
Fetch the complete documentation index at: https://docs.nexa.ai/llms.txt
Use this file to discover all available pages before exploring further.
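As an illustration, a minimal Kotlin sketch that downloads the index (our own snippet, not part of the Nexa SDK; on Android, run it off the main thread):
import java.net.URL

// Download the documentation index and print the available pages.
fun fetchDocIndex(): String =
    URL("https://docs.nexa.ai/llms.txt").readText()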
LLM Usage
Large Language Models for text generation and chat applications.
Streaming Conversation
We support CPU/GPU inference for GGUF format models. You can pick any GGUF model from the community and run it with the cpu_gpu plugin.
LlmWrapper.builder()
    .llmCreateInput(
        LlmCreateInput(
            model_name = "", // For GGUF CPU/GPU models, leave model_name empty.
            model_path = "<your-model-path>",
            config = ModelConfig(
                nCtx = 4096,
                nGpuLayers = 0 // 0 for CPU, > 0 for GPU
            ),
            plugin_id = "cpu_gpu",
            device_id = null // null for CPU, "gpu" for GPU
        )
    )
    .build()
    .onSuccess { llmWrapper = it }
    .onFailure { error -> println("Error: ${error.message}") }
val chatList = arrayListOf(ChatMessage("user", "What is AI?"))
llmWrapper.applyChatTemplate(chatList.toTypedArray(), null, false).onSuccess { template ->
    val genConfig = GenerationConfig(maxTokens = 2048)
    llmWrapper.generateStreamFlow(template.formattedText, genConfig).collect { result ->
        when (result) {
            is LlmStreamResult.Token -> println(result.text)
            is LlmStreamResult.Completed -> println("Done!")
            is LlmStreamResult.Error -> println("Error: ${result.throwable}")
        }
    }
}
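For a multi-turn conversation, one approach is to accumulate the streamed tokens and append the finished reply to the history before templating the next turn. A minimal sketch, assuming ChatMessage also accepts an "assistant" role:
val reply = StringBuilder()
llmWrapper.generateStreamFlow(template.formattedText, genConfig).collect { result ->
    when (result) {
        is LlmStreamResult.Token -> reply.append(result.text)
        is LlmStreamResult.Completed -> {
            // Append the reply so the next applyChatTemplate call sees the full history
            chatList.add(ChatMessage("assistant", reply.toString()))
            chatList.add(ChatMessage("user", "Can you give an example?"))
        }
        is LlmStreamResult.Error -> println("Error: ${result.throwable}")
    }
}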
CPU/GPU Configuration
Control whether your model runs on CPU or GPU using a combination of device_id and nGpuLayers:
GPU Execution Requirements:
- device_id must be set to "gpu"
- nGpuLayers must be greater than 0 (typically 999 to offload all layers)
CPU Execution (either condition is enough):
- device_id is null (default), OR
- nGpuLayers is 0
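Since the two settings travel together, it can help to derive both from a single flag. A small sketch (the helper is ours, not part of the SDK):
// Hypothetical helper: map one flag to the (device_id, nGpuLayers) pair.
fun deviceSettings(useGpu: Boolean): Pair<String?, Int> =
    if (useGpu) "gpu" to 999 // offload all layers
    else null to 0 // stay on CPU

val (deviceId, gpuLayers) = deviceSettings(useGpu = true)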
Example: Running on GPU
LlmWrapper.builder()
    .llmCreateInput(
        LlmCreateInput(
            model_name = "",
            model_path = "<your-model-path>",
            config = ModelConfig(
                nCtx = 4096,
                nGpuLayers = 999 // Offload all layers to GPU
            ),
            plugin_id = "cpu_gpu",
            device_id = "gpu" // Use GPU device
        )
    )
    .build()
    .onSuccess { llmWrapper = it }
    .onFailure { error -> println("Error: ${error.message}") }
Example: Running on CPU (Default)
LlmWrapper.builder()
    .llmCreateInput(
        LlmCreateInput(
            model_name = "",
            model_path = "<your-model-path>",
            config = ModelConfig(
                nCtx = 4096,
                nGpuLayers = 0 // All on CPU
            ),
            plugin_id = "cpu_gpu",
            device_id = null // Default to CPU
        )
    )
    .build()
    .onSuccess { llmWrapper = it }
    .onFailure { error -> println("Error: ${error.message}") }
Multimodal Usage
Vision-Language Models for image understanding and multimodal applications.
Streaming Conversation
We support CPU/GPU inference for GGUF format models.
VlmWrapper.builder()
    .vlmCreateInput(
        VlmCreateInput(
            model_name = "", // For GGUF on CPU/GPU, leave empty (no model name needed)
            model_path = "<your-model-path>",
            mmproj_path = "<your-mmproj-path>", // vision projection weights
            config = ModelConfig(
                nCtx = 4096,
                nGpuLayers = 0 // 0 for CPU, > 0 for GPU
            ),
            plugin_id = "cpu_gpu",
            device_id = null // null for CPU, "gpu" for GPU
        )
    )
    .build()
    .onSuccess { vlmWrapper = it }
    .onFailure { error ->
        println("Error: ${error.message}")
    }
// Use the loaded VLM with image and text
val contents = listOf(
    VlmContent("image", "<your-image-path>"),
    VlmContent("text", "<your-text>")
)
val chatList = arrayListOf(VlmChatMessage("user", contents))
vlmWrapper.applyChatTemplate(chatList.toTypedArray(), null, false).onSuccess { template ->
    // Create base GenerationConfig with maxTokens
    val baseConfig = GenerationConfig(maxTokens = 2048)
    // Inject media paths from chatList into config
    val configWithMedia = vlmWrapper.injectMediaPathsToConfig(
        chatList.toTypedArray(),
        baseConfig
    )
    vlmWrapper.generateStreamFlow(template.formattedText, configWithMedia).collect { result ->
        when (result) {
            is LlmStreamResult.Token -> println(result.text)
            is LlmStreamResult.Completed -> println("Done!")
            is LlmStreamResult.Error -> println("Error: ${result.throwable}")
        }
    }
}
ASR Usage
Automatic Speech Recognition for audio transcription.
Basic Usage
We support CPU inference for whisper.cpp models.
// Load ASR model for whisper.cpp inference
AsrWrapper.builder()
    .asrCreateInput(
        AsrCreateInput(
            model_name = "", // Empty for whisper.cpp
            model_path = "<your-model-path>", // e.g., "ggml-base-q8_0.bin"
            config = ModelConfig(
                nCtx = 4096 // Context size (use nCtx instead of max_tokens)
            ),
            plugin_id = "whisper_cpp" // Use whisper.cpp backend
        )
    )
    .build()
    .onSuccess { asrWrapper = it }
    .onFailure { error ->
        println("Error: ${error.message}")
    }
// Transcribe audio file
asrWrapper.transcribe(
    AsrTranscribeInput(
        audioPath = "<your-audio-path>", // Path to .wav file (16kHz recommended)
        language = "en", // Language code: "en", "zh", "es", etc.
        timestamps = null // Optional timestamp format
    )
).onSuccess { result ->
    println("Transcription: ${result.result.transcript}")
}.onFailure { error ->
    println("Error during transcription: ${error.message}")
}
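Since 16 kHz input is recommended, it can be worth checking the sample rate before transcribing. A minimal pre-flight sketch (our own snippet, assuming a canonical 44-byte PCM WAV header, where the sample rate is a little-endian UInt32 at byte offset 24):
import java.io.File

// Read the sample rate from a canonical RIFF/WAVE header.
fun wavSampleRate(path: String): Int {
    val header = ByteArray(44)
    File(path).inputStream().use { it.read(header) }
    return (header[24].toInt() and 0xFF) or
        ((header[25].toInt() and 0xFF) shl 8) or
        ((header[26].toInt() and 0xFF) shl 16) or
        ((header[27].toInt() and 0xFF) shl 24)
}

if (wavSampleRate("<your-audio-path>") != 16000) println("Consider resampling to 16 kHz")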
TTS Usage
Text-to-Speech synthesis for converting text into natural-sounding speech.
Basic Usage
We support CPU inference for TTS models in GGUF format.
// Load TTS model for CPU inference
TtsWrapper.builder()
    .ttsCreateInput(
        TtsCreateInput(
            model_name = "", // Empty for CPU/GPU models
            model_path = "<your-model-path>", // Path to TTS model (e.g., Kokoro GGUF model)
            config = ModelConfig(
                nCtx = 4096 // Context size
            ),
            plugin_id = "tts_cpp" // Use TTS backend
        )
    )
    .build()
    .onSuccess { ttsWrapper = it }
    .onFailure { error ->
        println("Error: ${error.message}")
    }
// Synthesize speech from text
ttsWrapper.synthesize(
    TtsSynthesizeInput(
        textUtf8 = "Hello, this is a text to speech demo using Nexa SDK.",
        outputPath = "<your-output-audio-path>" // Path where audio will be saved (e.g., "/path/to/output.wav")
    )
).onSuccess { result ->
    println("Speech synthesized successfully!")
    println("Audio saved to: ${result.outputPath}")
}.onFailure { error ->
    println("Error during synthesis: ${error.message}")
}
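To play the synthesized file on Android, one option is the platform MediaPlayer. A minimal sketch (our own snippet; assumes the output path points to a readable local file):
import android.media.MediaPlayer

val player = MediaPlayer().apply {
    setDataSource("<your-output-audio-path>")
    prepare() // synchronous prepare is acceptable for small local files
    start()
}
// Call player.release() once playback is no longer needed.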
Need Help?
Join our community to get support, share your projects, and connect with other developers.
Discord Community
Get real-time support and chat with the Nexa AI community
Slack Community
Collaborate with developers and access community resources