GGUF Interface

NexaTextInference

A class used for loading text models and running text generation.

Methods

  • run(): Run the text generation loop.

  • run_streamlit(): Run the Streamlit UI.

  • create_embedding(input): Embed a string.

  • create_chat_completion(messages): Generate completion for a chat conversation.

  • create_completion(prompt): Generate completion for a given prompt.

Arguments

  • model_path (str): Path or identifier for the model in Nexa Model Hub.

  • local_path (str): Local path of the model. Either model_path or local_path should be provided.

  • embedding (bool): Enable embedding generation.

  • stop_words (list): List of stop words for early stopping.

  • temperature (float): Temperature for sampling.

  • max_new_tokens (int): Maximum number of new tokens to generate.

  • top_k (int): Top-k sampling parameter.

  • top_p (float): Top-p sampling parameter.

  • profiling (bool): Enable timing measurements for the generation process.

  • streamlit (bool): Run the inference in a Streamlit UI.

Example Code

from nexa.gguf import NexaTextInference

model_path = "llama2"
inference = NexaTextInference(
    model_path=model_path,
    local_path=None,
    stop_words=[],
    temperature=0.7,
    max_new_tokens=512,
    top_k=50,
    top_p=0.9,
    profiling=True
)

# run() method
inference.run()

# run_streamlit() method
inference.run_streamlit(model_path)

# create_embedding(input) method (requires embedding=True at construction)
inference.create_embedding("Hello, world!")

# create_chat_completion(messages)
inference.create_chat_completion(
    messages=[{"role": "user", "content": "write a long 1000 word story about a detective"}]
)

# create_completion(prompt)
inference.create_completion("Q: Name the planets in the solar system? A:")
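
The completion methods return structured results rather than printing them. A minimal sketch of reading the generated text follows, assuming the OpenAI-style response dictionaries common to llama.cpp-based backends; verify the exact shape against your installed version:

# Hedged sketch: the response layout below is an assumption based on
# OpenAI-style dictionaries used by llama.cpp backends.
chat_response = inference.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize Hamlet in two sentences."}]
)
print(chat_response["choices"][0]["message"]["content"])

completion_response = inference.create_completion("Q: Name the planets in the solar system? A:")
print(completion_response["choices"][0]["text"])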

NexaImageInference

A class used for loading image models and running image generation.

Methods

  • txt2img(prompt): Generate images from text.

  • img2img(image_path, prompt): Generate images from an input image and a text prompt.

  • run_txt2img(): Run the text-to-image generation loop.

  • run_img2img(): Run the image-to-image generation loop.

  • run_streamlit(): Run the Streamlit UI.

Arguments

  • model_path (str): Path or identifier for the model in Nexa Model Hub.

  • local_path (str): Local path of the model. Either model_path or local_path should be provided.

  • output_path (str): Output path for the generated image.

  • num_inference_steps (int): Number of inference steps.

  • width (int): Width of the output image.

  • height (int): Height of the output image.

  • guidance_scale (float): Guidance scale for diffusion.

  • random_seed (int): Random seed for image generation.

  • streamlit (bool): Run the inference in a Streamlit UI.

Example Code

from nexa.gguf import NexaImageInference

model_path = "lcm-dreamshaper"
inference = NexaImageInference(
    model_path=model_path,
    local_path=None,
    num_inference_steps=4,
    width=512,
    height=512,
    guidance_scale=1.0,
    random_seed=0,
)

# txt2img(prompt) method
inference.txt2img("a lovely cat")

# img2img(image_path, prompt) method
inference.img2img(image_path="path/to/local/image", prompt="blue sky")

# run_txt2img() method
inference.run_txt2img()

# run_img2img() method
inference.run_img2img()

# run_streamlit() method
inference.run_streamlit(model_path)
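
Since generation settings such as width, height, and guidance_scale are fixed when the class is constructed, producing a batch of images is just a loop over txt2img. A minimal sketch using only the methods documented above (the prompt list is illustrative):

# Illustrative prompts; each call saves its result according to the
# configured output_path.
prompts = [
    "a lovely cat",
    "a watercolor landscape at sunset",
    "an isometric city block, pixel art",
]
for prompt in prompts:
    inference.txt2img(prompt)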

NexaVLMInference

A class used for loading vision-language models (VLMs) and running multimodal text generation.

Methods

  • run(): Run the text generation loop.

  • run_streamlit(): Run the Streamlit UI.

  • create_chat_completion(messages): Generate text completion for a given chat prompt.

  • _chat(user_input, image_path): Generate text about the given image.

Arguments

  • model_path (str): Path or identifier for the model in Nexa Model Hub.

  • local_path (str): Local path of the model. Either model_path or local_path should be provided.

  • stop_words (list): List of stop words for early stopping.

  • temperature (float): Temperature for sampling.

  • max_new_tokens (int): Maximum number of new tokens to generate.

  • top_k (int): Top-k sampling parameter.

  • top_p (float): Top-p sampling parameter.

  • profiling (bool): Enable timing measurements for the generation process.

  • streamlit (bool): Run the inference in a Streamlit UI.

Example Code

from nexa.gguf import NexaVLMInference

model_path = "nanollava"
inference = NexaVLMInference(
    model_path=model_path,
    local_path=None,
    stop_words=[],
    temperature=0.7,
    max_new_tokens=2048,
    top_k=50,
    top_p=1.0,
    profiling=True
)

# run() method
inference.run()

# run_streamlit() method
inference.run_streamlit(model_path)

# create_chat_completion(messages) method
inference.create_chat_completion(
    messages=[{"role": "user", "content": "write a long 1000 word story about a detective"}]
)

# _chat(user_input, image_path) method
inference._chat(user_input="Describe this image in detail.", image_path="path/to/local/image")
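
For image-grounded chat through create_chat_completion, a hedged sketch follows. It assumes the llava-style message schema used by llama.cpp chat handlers, where image parts are passed as base64 data URIs; check the schema expected by your version:

import base64

def image_to_data_uri(path):
    # Encode a local image as a base64 data URI; llama.cpp chat handlers
    # commonly accept this format for image_url parts (assumed here).
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode("utf-8")

response = inference.create_chat_completion(
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_to_data_uri("path/to/local/image")}},
            {"type": "text", "text": "Describe this image in detail."},
        ],
    }]
)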

NexaVoiceInference

A class used for loading voice models and running voice transcription.

Methods

  • run(): Run the voice transcription loop.

  • run_streamlit(): Run the Streamlit UI.

  • transcribe(audio_path): Transcribe the audio file into text.

Arguments

  • model_path (str): Path or identifier for the model in Nexa Model Hub.

  • local_path (str): Local path of the model. Either model_path or local_path should be provided.

  • output_dir (str): Output directory for transcriptions.

  • compute_type (str): Type to use for computation (e.g., float16, int8, int8_float16).

  • beam_size (int): Beam size to use for transcription.

  • language (str): The language spoken in the audio.

  • task (str): Task to execute (transcribe or translate).

  • temperature (float): Temperature for sampling.

Example Code

from nexa.gguf import NexaVoiceInference

model_path = "faster-whisper-large"
inference = NexaVoiceInference(
    model_path=model_path,
    local_path=None,
    beam_size=5,
    language=None,
    task="transcribe",
    temperature=0.0,
    compute_type="default"
)

# run() method
inference.run()

# run_streamlit() method
inference.run_streamlit(model_path)

# transcribe(audio_path) method
inference.transcribe("path/to/your/audio.wav")
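
transcribe handles one file per call, so batch transcription is a simple loop. A minimal sketch over a directory of WAV files (the directory path is illustrative); each transcription is written according to output_dir:

from pathlib import Path

audio_dir = Path("path/to/recordings")  # illustrative location
for audio_file in sorted(audio_dir.glob("*.wav")):
    # transcribe(audio_path) writes its output according to output_dir
    inference.transcribe(str(audio_file))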

Generate Embeddings

Generate text embeddings for use in retrieval-augmented generation (RAG).

Example Code

from nexa.gguf import NexaTextInference
import chromadb

documents = [
    "The Great Wall of China is the longest wall in the world, stretching over 13,000 miles",
    "Construction of the Great Wall began more than 2,300 years ago",
    "The Great Wall is made from stone, brick, rammed earth, wood, and other materials",
    "Contrary to popular belief, the Great Wall is not visible from space with the naked eye",
    "The Great Wall was built by several dynasties over many centuries",
    "Some sections of the Great Wall have become popular tourist attractions"
]

client = chromadb.Client()
collection = client.create_collection(name="docs")

inference = NexaTextInference(
    model_path="mxbai",
    embedding=True
)

# store each document in a vector embedding database
for i, d in enumerate(documents):
    embedding = inference.create_embedding(d)
    collection.add(
        ids=[str(i)],
        embeddings=[embedding],
        documents=[d]
    )

prompt = "What is the Great Wall made of?"

# generate an embedding for the prompt and retrieve the most relevant doc
embedding = inference.create_embedding(prompt)
results = collection.query(
    query_embeddings=[embedding],
    n_results=1
)
data = results['documents'][0][0]

print(data)
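
To close the RAG loop, inline the retrieved document into a prompt for a generation model. A minimal sketch, assuming a second NexaTextInference instance loaded for text generation (the "llama2" model choice is illustrative):

# Illustrative generation model; any text model from the hub works here
generator = NexaTextInference(model_path="llama2", max_new_tokens=256)
rag_prompt = f"Using this context:\n{data}\n\nAnswer this question: {prompt}"
response = generator.create_completion(rag_prompt)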
