⬆️ Upload Model

🤝 Share your model and connect with developers, researchers, and users for support and collaboration.

To upload models, you will need to create an account with Nexa AI Hub. Currently, models can be uploaded through the web interface. After uploading a model, you control which files to include and how to tag it to make it more discoverable; read more below.

Upload Method

Step 0. To upload models to the Hub, visit Nexa Hub and make sure you have registered an account for Nexa AI Hub.

Click the "Sign in/Sign up" button on the top right corner

Step 1. Click "Upload your model"

Click the "Upload your model" button on the top right corner

Step 2. Fill in the model name, parameters, model type, and license (optional)

Step 3. Fill in the model tag name; see the tag section below on how to tag your model

Name your model in the highlighted "Model tag name" field

Step 4. Edit model tag to make your model more discoverable

Click "Edit quantization tag"
Edit quantization tag information as needed

Step 5. Upload files

Click "Upload file" button under "Files" tab
Upload file here

Step 6. Edit README to add more descriptions for your model

Click "Edit" button under "Model Info" tab
Update README info here

About Model Tag

Model Tag Name

We recommend choosing model tag names that are descriptive. For official models in the Model Hub, we include the model precision in the model tag name.

| Model Precision | Bits per Weight (BPW) Approximation |
| --------------- | ----------------------------------- |
| gguf-q2_K       | 2                                   |
| gguf-q3_K_L     | 3                                   |
| gguf-q3_K_M     | 3                                   |
| gguf-q3_K_S     | 3                                   |
| gguf-q4_0       | 4                                   |
| gguf-q4_1       | 4                                   |
| gguf-q4_K_M     | 4                                   |
| gguf-q4_K_S     | 4                                   |
| gguf-q5_0       | 5                                   |
| gguf-q5_1       | 5                                   |
| gguf-q5_K_M     | 5                                   |
| gguf-q5_K_S     | 5                                   |
| gguf-q6_K       | 6                                   |
| gguf-q8_0       | 8                                   |
| onnx-int4       | 4                                   |
| onnx-int8      | 8                                   |
| onnx-bf16       | 16                                  |
| onnx-fp16       | 16                                  |
| onnx-fp32       | 32                                  |
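The tag-to-BPW mapping above can be sketched as a small lookup helper. This is a minimal illustration that simply mirrors the table; the dictionary and function names are our own, not part of any official Nexa SDK API:

```python
# Approximate bits per weight (BPW) for common precision tags,
# mirroring the table above. Illustrative only, not an official API.
BPW_BY_TAG = {
    "gguf-q2_K": 2, "gguf-q3_K_L": 3, "gguf-q3_K_M": 3, "gguf-q3_K_S": 3,
    "gguf-q4_0": 4, "gguf-q4_1": 4, "gguf-q4_K_M": 4, "gguf-q4_K_S": 4,
    "gguf-q5_0": 5, "gguf-q5_1": 5, "gguf-q5_K_M": 5, "gguf-q5_K_S": 5,
    "gguf-q6_K": 6, "gguf-q8_0": 8,
    "onnx-int4": 4, "onnx-int8": 8,
    "onnx-bf16": 16, "onnx-fp16": 16, "onnx-fp32": 32,
}

def bpw_for_tag(tag: str) -> int:
    """Return the approximate bits per weight for a precision tag."""
    return BPW_BY_TAG[tag]

print(bpw_for_tag("gguf-q4_K_M"))  # 4
```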

File Format Tag

  • GGUF

GGUF is an optimized binary format designed for efficient model loading and saving, particularly suited for inference tasks. It is compatible with GGML and other executors. Developed by @ggerganov, the creator of llama.cpp (a widely-used C/C++ LLM inference framework), GGUF forms the foundation of the Nexa SDK's GGML component.

  • ONNX

ONNX is an open standard format for representing machine learning models. It establishes a common set of operators and a unified file format, enabling AI developers to use models across various frameworks, tools, runtimes, and compilers. ONNX offers distinct performance advantages on devices with limited RAM (mobile, IoT). The Nexa SDK's ONNX component is built upon the onnxruntime framework.

RAM

This metric indicates the minimum random access memory (RAM) necessary to run the model locally. You can estimate the required RAM (in bytes) with the formula:

RAM = Parameters × BPW / 8
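For example, the formula can be applied as follows. This is a minimal sketch; the 7B parameter count and 4-bit precision are illustrative values, not tied to any particular model:

```python
# Estimate the minimum RAM (in bytes) needed to run a model locally:
#   RAM = Parameters * BPW / 8
def estimate_ram_bytes(parameters: int, bpw: int) -> float:
    return parameters * bpw / 8

# Illustrative example: a 7B-parameter model quantized at 4 bits per weight
# (e.g. a gguf-q4_K_M tag) needs roughly 3.5 GB of RAM.
ram_gb = estimate_ram_bytes(7_000_000_000, 4) / 1e9
print(f"{ram_gb:.1f} GB")  # 3.5 GB
```

Note that this is a lower bound: actual usage also includes the inference runtime's overhead and the context (KV cache) for generation.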

File Size

Displays the total storage space required for the model.
