Nidum-Llama-3.2-3B-Uncensored-MLX-4bit

Welcome to Nidum!

At Nidum, we are committed to delivering cutting-edge AI models that offer advanced capabilities and unrestricted access to innovation. With Nidum-Llama-3.2-3B-Uncensored-MLX-4bit, we bring you a performance-optimized, space-efficient, and feature-rich model designed for diverse use cases.


Explore Nidum's Open-Source Projects on GitHub: https://github.com/NidumAI-Inc


Key Features

  1. Compact and Efficient: Quantized to 4 bits in the MLX format for strong performance with minimal memory usage (see the back-of-envelope estimate after this list).
  2. Versatility: Excels in a wide range of tasks, including technical problem-solving, educational queries, and casual conversations.
  3. Extended Context Handling: Capable of maintaining coherence in long-context interactions.
  4. Seamless Integration: Enhanced compatibility with the mlx-lm library for a streamlined development experience.
  5. Uncensored Access: Provides uninhibited responses across a variety of topics and applications.
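
As a rough, illustrative estimate of the memory claim in point 1 (ignoring activations, the KV cache, and quantization metadata such as scales), 4-bit weights for a 3-billion-parameter model occupy on the order of 1.5 GB:

# Back-of-envelope weight memory for a 3B-parameter model at 4 bits per weight.
# Illustrative only: real usage adds quantization scales, activations, and cache.
params = 3_000_000_000
bits_per_weight = 4
weight_bytes = params * bits_per_weight / 8
print(f"~{weight_bytes / 1e9:.1f} GB of raw weight storage")  # ~1.5 GB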

How to Use

To use Nidum-Llama-3.2-3B-Uncensored-MLX-4bit, install the mlx-lm library and follow the example code below:

Installation

pip install mlx-lm
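
To confirm the installation before loading the model, you can query the installed version with the standard library (this generic check is ours, not part of the original card; the card notes the conversion was done with mlx-lm 0.19.2):

# Print the installed mlx-lm version using only the Python standard library.
import importlib.metadata

print(importlib.metadata.version("mlx-lm"))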

Usage

from mlx_lm import load, generate

# Load the model and tokenizer
model, tokenizer = load("nidum/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit")

# Create a prompt
prompt = "hello"

# Apply the chat template if available
if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

# Generate the response
response = generate(model, tokenizer, prompt=prompt, verbose=True)

# Print the response
print(response)
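
For multi-turn conversations, the same calls extend naturally: keep the message history in a list, re-apply the chat template each turn, and append the model's reply. The sketch below is illustrative rather than official; it uses only the load, generate, and apply_chat_template calls from the example above, plus max_tokens, a standard mlx-lm generation limit:

from mlx_lm import load, generate

model, tokenizer = load("nidum/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit")

# Running conversation history as chat-template messages.
messages = [{"role": "user", "content": "Explain 4-bit quantization briefly."}]

# Render the full history into a single prompt string.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Generate a bounded-length reply.
reply = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(reply)

# Append the reply so the next user turn sees the whole exchange,
# which is how the model's long-context handling is exercised.
messages.append({"role": "assistant", "content": reply})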

About the Model

The nidum/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit model was converted to MLX format from nidum/Nidum-Llama-3.2-3B-Uncensored using mlx-lm version 0.19.2 (a conversion sketch follows the list below), and offers the following benefits:

  • Smaller Memory Footprint: Ideal for environments with limited hardware resources.
  • High Performance: Retains the advanced capabilities of the original model while optimizing inference speed and efficiency.
  • Plug-and-Play Compatibility: Easily integrate with the mlx-lm ecosystem for seamless deployment.
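
For reference, a conversion of this kind can be reproduced with mlx-lm's convert utility. This is a sketch under the assumption that convert is exposed at the package top level with the quantize and q_bits parameters, as in recent mlx-lm releases; it is not necessarily the exact command the authors ran:

from mlx_lm import convert

# Convert the original Hugging Face weights to MLX and quantize to 4 bits.
# q_bits=4 matches this repository's format; mlx_path is the output directory.
convert(
    "nidum/Nidum-Llama-3.2-3B-Uncensored",
    mlx_path="Nidum-Llama-3.2-3B-Uncensored-MLX-4bit",
    quantize=True,
    q_bits=4,
)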

Use Cases

  • Technical Problem Solving
  • Research and Educational Assistance
  • Open-Ended Q&A
  • Creative Writing and Ideation
  • Long-Context Dialogues
  • Unrestricted Knowledge Exploration

Datasets and Fine-Tuning

The model inherits the fine-tuned capabilities of its predecessor, Nidum-Llama-3.2-3B-Uncensored, including:

  • Uncensored Data: Ensures detailed and uninhibited responses.
  • RAG-Based Fine-Tuning: Optimizes retrieval-augmented generation for information-intensive tasks (a usage sketch follows this list).
  • Math-Instruct Data: Tailored for precise mathematical reasoning.
  • Long-Context Fine-Tuning: Enhanced coherence and relevance in extended interactions.
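
To show how the RAG-oriented fine-tuning might be exercised at inference time, here is a minimal, hypothetical sketch. The retrieved_passages list and the prompt layout are placeholders for whatever retriever and format you actually use; only the load, generate, and apply_chat_template calls come from the usage example above:

from mlx_lm import load, generate

model, tokenizer = load("nidum/Nidum-Llama-3.2-3B-Uncensored-MLX-4bit")

# Hypothetical retrieved context; in practice this comes from your retriever.
retrieved_passages = [
    "MLX is an array framework for machine learning on Apple silicon.",
    "4-bit quantization stores each weight in 4 bits to cut memory use.",
]
question = "Why does 4-bit quantization reduce memory usage?"

# Prepend the retrieved context to the question, then apply the chat template.
context = "\n\n".join(retrieved_passages)
user_msg = (
    f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"
)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": user_msg}],
    tokenize=False,
    add_generation_prompt=True,
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=200))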

Quantized Model Download

The MLX-4bit weights balance output quality against memory usage and are downloaded automatically from this repository the first time you call load with the model ID shown above.


Benchmark

Benchmark    Metric                         LLaMA 3B   Nidum 3B   Observation
GPQA         Exact Match (Flexible)         0.3        0.5        Nidum 3B demonstrates significant improvement, particularly in generative tasks.
             Accuracy                       0.4        0.5        Consistent improvement, especially in zero-shot scenarios.
HellaSwag    Accuracy                       0.3        0.4        Better performance in common sense reasoning tasks.
             Normalized Accuracy            0.3        0.4        Enhanced ability to understand and predict context in sentence completion.
             Normalized Accuracy (Stderr)   0.15275    0.1633     Slightly improved consistency in normalized accuracy.
             Accuracy (Stderr)              0.15275    0.1633     Shows robustness in reasoning accuracy compared to LLaMA 3B.

Insights:

  1. Compact Efficiency: The MLX-4bit model ensures high performance with reduced resource usage.
  2. Enhanced Usability: Optimized for seamless integration with lightweight deployment scenarios.

Contributing

We invite contributions to further enhance the MLX-4bit model's capabilities. Reach out to us for collaboration opportunities.


Contact

For inquiries, support, or feedback, email us at [email protected].


Explore the Future

Embrace the power of innovation with Nidum-Llama-3.2-3B-Uncensored-MLX-4bit—the ideal blend of performance and efficiency.

