Overview
usage: nexa [-h] [-V] {run,onnx,embed,server,pull,remove,clean,list,login,whoami,logout} ...
Nexa CLI tool for handling various model operations.
positional arguments:
{run,onnx,embed,server,pull,remove,clean,list,login,whoami,logout}
sub-command help
run Run inference for various tasks using GGUF models.
onnx Run inference for various tasks using ONNX models.
embed Generate embeddings for text.
server Run the Nexa AI Text Generation Service
pull Pull a model from official or hub.
remove Remove a model from local machine.
clean Clean up all model files.
list List all models in the local machine.
login Login to Nexa API.
whoami Show current user information.
logout Logout from Nexa API.
options:
-h, --help show this help message and exit
-V, --version Show the version of the Nexa SDK.
List local models
List all models on your local computer.
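To list models, run:

```shell
nexa list
```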
Download a model
Download a model file to your local computer from the Nexa Model Hub.
nexa pull MODEL_PATH
usage: nexa pull [-h] model_path
positional arguments:
model_path Path or identifier for the model in Nexa Model Hub
options:
-h, --help show this help message and exit
-hf, --huggingface Pull model from Hugging Face Hub
Example Command:
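A sketch of a pull command (llama2 here is an assumed model identifier; substitute any model from the Nexa Model Hub):

```shell
nexa pull llama2
```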
Remove a model
Remove a model from your local computer.
nexa remove MODEL_PATH
usage: nexa remove [-h] model_path
positional arguments:
model_path Path or identifier for the model in Nexa Model Hub
options:
-h, --help show this help message and exit
Example Command:
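For instance, assuming a model named llama2 was previously downloaded:

```shell
nexa remove llama2
```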
Remove all downloaded models
Remove all downloaded models on your local computer.
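To clear every downloaded model at once:

```shell
nexa clean
```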
Run a model
Run a model on your local computer. If the model file is not yet downloaded, it will be automatically fetched first. For more details, please refer to the Inference page.
By default, nexa run runs GGUF models; use nexa onnx to run ONNX models. Remember to install the ONNX version of the Nexa SDK in advance.
Run text-generation model
Run text-generation models on your local computer.
nexa run MODEL_PATH
usage: nexa run [-h] [-t TEMPERATURE] [-m MAX_NEW_TOKENS] [-k TOP_K] [-p TOP_P] [-sw [STOP_WORDS ...]] [-pf] [-st] model_path
positional arguments:
model_path Path or identifier for the model in Nexa Model Hub
options:
-h, --help show this help message and exit
-pf, --profiling Enable profiling logs for the inference process
-st, --streamlit Run the inference in Streamlit UI
-lp, --local_path Indicate that the model path provided is the local path, must be used with -mt
-mt, --model_type Indicate the model running type, must be used with -lp or -hf, choose from [NLP, COMPUTER_VISION, MULTIMODAL, AUDIO]
-hf, --huggingface Load model from Hugging Face Hub, must be used with -mt
Text generation options:
-t, --temperature TEMPERATURE
Temperature for sampling
-m, --max_new_tokens MAX_NEW_TOKENS
Maximum number of new tokens to generate
-k, --top_k TOP_K Top-k sampling parameter
-p, --top_p TOP_P Top-p sampling parameter
-sw, --stop_words [STOP_WORDS ...]
List of stop words for early stopping
--lora_path Path to a LoRA file to apply to the model
--nctx Maximum context length of the model you're using
Example Command:
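A sketch of a text-generation run with sampling options (llama2 is an assumed model identifier; the -t and -m flags are described above):

```shell
nexa run -t 0.7 -m 512 llama2
```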
Run image-generation model
Run image-generation models on your local computer.
nexa run MODEL_PATH
usage: nexa run [-h] [-i2i] [-ns NUM_INFERENCE_STEPS] [-np NUM_IMAGES_PER_PROMPT] [-H HEIGHT] [-W WIDTH] [-g GUIDANCE_SCALE] [-o OUTPUT] [-s RANDOM_SEED] [-st] model_path
positional arguments:
model_path Path or identifier for the model in Nexa Model Hub
options:
-h, --help show this help message and exit
-st, --streamlit Run the inference in Streamlit UI, can be used with -lp or -hf
-lp, --local_path Indicate that the model path provided is the local path, must be used with -mt
-mt, --model_type Indicate the model running type, must be used with -lp or -hf, choose from [NLP, COMPUTER_VISION, MULTIMODAL, AUDIO]
-hf, --huggingface Load model from Hugging Face Hub, must be used with -mt
Image generation options:
-i2i, --img2img Whether to run image-to-image generation
-ns, --num_inference_steps NUM_INFERENCE_STEPS
Number of inference steps
-np, --num_images_per_prompt NUM_IMAGES_PER_PROMPT
Number of images to generate per prompt
-H, --height HEIGHT Height of the output image
-W, --width WIDTH Width of the output image
-g, --guidance_scale GUIDANCE_SCALE
Guidance scale for diffusion
-o, --output OUTPUT Output path for the generated image
-s, --random_seed RANDOM_SEED
Random seed for image generation
--lora_dir LORA_DIR Path to directory containing LoRA files
--wtype WTYPE Weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)
--control_net_path CONTROL_NET_PATH
Path to control net model
--control_image_path CONTROL_IMAGE_PATH
Path to image condition for Control Net
--control_strength CONTROL_STRENGTH
Strength to apply Control Net
Example Command:
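A sketch of an image-generation run (sd1-4 is an assumed identifier for a Stable Diffusion model in the hub; the flags are described above):

```shell
nexa run -ns 20 -H 512 -W 512 sd1-4
```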
Run vision-language model
Run vision-language models on your local computer.
nexa run MODEL_PATH
usage: nexa run [-h] [-t TEMPERATURE] [-m MAX_NEW_TOKENS] [-k TOP_K] [-p TOP_P] [-sw [STOP_WORDS ...]] [-pf] [-st] model_path
positional arguments:
model_path Path or identifier for the model in Nexa Model Hub
options:
-h, --help show this help message and exit
-pf, --profiling Enable profiling logs for the inference process
-st, --streamlit Run the inference in Streamlit UI, can be used with -lp or -hf
-lp, --local_path Indicate that the model path provided is the local path, must be used with -mt
-mt, --model_type Indicate the model running type, must be used with -lp or -hf, choose from [NLP, COMPUTER_VISION, MULTIMODAL, AUDIO]
-hf, --huggingface Load model from Hugging Face Hub, must be used with -mt
VLM generation options:
-t, --temperature TEMPERATURE
Temperature for sampling
-m, --max_new_tokens MAX_NEW_TOKENS
Maximum number of new tokens to generate
-k, --top_k TOP_K Top-k sampling parameter
-p, --top_p TOP_P Top-p sampling parameter
-sw, --stop_words [STOP_WORDS ...]
List of stop words for early stopping
Example Command:
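A sketch of a vision-language run (nanollava is an assumed model identifier):

```shell
nexa run nanollava
```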
Run audio model
Run audio models on your local computer.
nexa run MODEL_PATH
usage: nexa run [-h] [-o OUTPUT_DIR] [-b BEAM_SIZE] [-l LANGUAGE] [--task TASK] [-t TEMPERATURE] [-c COMPUTE_TYPE] [-st] model_path
positional arguments:
model_path Path or identifier for the model in Nexa Model Hub
options:
-h, --help show this help message and exit
-st, --streamlit Run the inference in Streamlit UI, can be used with -lp or -hf
-lp, --local_path Indicate that the model path provided is the local path, must be used with -mt
-mt, --model_type Indicate the model running type, must be used with -lp or -hf, choose from [NLP, COMPUTER_VISION, MULTIMODAL, AUDIO]
-hf, --huggingface Load model from Hugging Face Hub, must be used with -mt
Automatic Speech Recognition options:
-b, --beam_size BEAM_SIZE
Beam size to use for transcription
-l, --language LANGUAGE
The language spoken in the audio. It should be a language code such as 'en' or 'fr'.
--task TASK Task to execute (transcribe or translate)
-c, --compute_type COMPUTE_TYPE
Type to use for computation (e.g., float16, int8, int8_float16)
Example Command:
nexa run faster-whisper-tiny
Generate Embeddings
Generate Text Embeddings
nexa embed MODEL_PATH
usage: nexa embed [-h] [-lp] [-hf] [-n] [-nt] model_path prompt
positional arguments:
model_path Path or identifier for the model in Nexa Model Hub
prompt Prompt to generate embeddings
options:
-h, --help show this help message and exit
-lp, --local_path Indicate that the model path provided is the local path, must be used with -mt
-hf, --huggingface Load model from Hugging Face Hub, must be used with -mt
-n, --normalize Normalize the embeddings
-nt, --no_truncate Do not truncate the embeddings
Example Command:
nexa embed mxbai "I love Nexa AI."
nexa embed nomic "I love Nexa AI." >> generated_embeddings.txt
nexa embed nomic-embed-text-v1.5:fp16 "I love Nexa AI."
nexa embed sentence-transformers/all-MiniLM-L6-v2:gguf-fp16 "I love Nexa AI." >> generated_embeddings.txt
Start local server
Start a local server using models on your local computer.
nexa server MODEL_PATH
usage: nexa server [-h] [--host HOST] [--port PORT] [--reload] [--nctx NCTX] model_path
positional arguments:
model_path Path or identifier for the model in Nexa Model Hub
options:
-h, --help show this help message and exit
-lp, --local_path Indicate that the model path provided is the local path, must be used with -mt
-mt, --model_type Indicate the model running type, must be used with -lp or -hf, choose from [NLP, COMPUTER_VISION, MULTIMODAL, AUDIO]
-hf, --huggingface Load model from Hugging Face Hub, must be used with -mt
--host HOST Host to bind the server to
--port PORT Port to bind the server to
--reload Enable automatic reloading on code changes
Example Command:
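A sketch of starting the server on a custom host and port (llama2 is an assumed model identifier):

```shell
nexa server --host 0.0.0.0 --port 8000 llama2
```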
Run Model Evaluation
Run evaluation using models on your local computer.
usage: nexa eval model_path [-h] [--tasks TASKS] [--limit LIMIT]
positional arguments:
model_path Path or identifier for the model in Nexa Model Hub
options:
-h, --help show this help message and exit
--tasks TASKS Tasks to evaluate, comma-separated
--limit LIMIT Limit the number of examples per task. If < 1, limit is a percentage of the total number of examples.
Examples
nexa eval phi3 --tasks ifeval --limit 0.5
For more details, please refer to the Local Server page.
For model_path in nexa commands:
If you want to use models from the Nexa Model Hub, follow the standard format below to ensure correct model loading and execution:
[user_name]/[repo_name]:[tag_name] (user's model)
[repo_name]:[tag_name] (official model)
If you want to use models from Hugging Face, use the Hugging Face repo_id as the model_path.