πŸš€ Falcon-7b-sharded-bf16-finetuned-sft

Phase-Technologies/falcon-7b-sharded-bf16-finetuned-sft is a sharded, supervised-fine-tuned (SFT) version of the Falcon-7B model, stored in BF16 (Brain Floating Point, 16-bit) precision for efficient inference and fine-tuning on limited-memory GPUs.

πŸ”₯ Key Features:

  • πŸ¦… Based on Falcon-7B, a powerful transformer model
  • πŸ— Sharded for multi-GPU loading
  • 🎯 BF16 precision for lower memory usage
  • ⚑ Optimized for inference & fine-tuning

πŸ“‚ Model Details

| Feature πŸ† | Details πŸ“œ |
|---|---|
| Architecture πŸ— | Falcon-7B (Transformer-based) |
| Parameters πŸ”’ | 3.84B |
| Precision 🎯 | Brain Floating Point 16 (BF16) |
| Use Cases 🎯 | Chatbots πŸ€–, Summarization πŸ“š, Q&A ❓, Text Generation ✍️ |
| Tokenizer πŸ”€ | Hugging Face AutoTokenizer |
| License πŸ“„ | Apache 2.0 |
| Developer 🏒 | Phase Technologies |

πŸš€ Installation, Setup & Model Loading

πŸ”Ή Install Dependencies

```bash
pip install transformers accelerate torch
```
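
πŸ”Ή Verify BF16 Support (Optional)

BF16 requires hardware support (on NVIDIA, Ampere-generation GPUs or newer). A quick, optional sanity check before downloading the weights:

```python
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
if torch.cuda.is_available():
    # Reports whether the current GPU has native BF16 support (Ampere or newer)
    print("BF16 supported:", torch.cuda.is_bf16_supported())
```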

πŸ”Ή Load the Model in Python

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Phase-Technologies/falcon-7b-sharded-bf16-finetuned-sft"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load model (sharded BF16 checkpoint, loaded shard by shard)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # picks up the checkpoint's BF16 dtype
    device_map="auto",    # places shards on available GPUs (and CPU if needed)
)

print("βœ… Model Loaded Successfully!")
```

🎯 Usage

πŸ”Ή Text Generation

```python
prompt = "Once upon a time, in a futuristic world..."
# model.device is the first device the sharded model was placed on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate up to 100 new tokens beyond the prompt
output = model.generate(**inputs, max_new_tokens=100)

# Decode and print result
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
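
For chatbot-style use, it is often nicer to stream tokens as they are produced. A minimal sketch using transformers' TextStreamer; the sampling values are illustrative defaults, not tuned for this model:

```python
from transformers import TextStreamer

# Prints tokens to stdout as they are generated, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)

inputs = tokenizer("Give me three uses for a sharded checkpoint.", return_tensors="pt").to(model.device)
model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.7,     # illustrative values; tune for your use case
    top_p=0.9,
    streamer=streamer,
)
```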

πŸ”Ή Running on Multiple GPUs

```python
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",          # split layers across all visible GPUs
    offload_folder="offload",   # disk directory for weights that fit on neither GPU nor CPU
)
```
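
`device_map="auto"` lets Accelerate plan the placement. If you want to cap how much memory each device may use, `from_pretrained` also accepts a `max_memory` map; the budgets below are placeholders to adjust to your hardware:

```python
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    # Example budgets only; set these to what your GPUs/CPU actually have free
    max_memory={0: "14GiB", 1: "14GiB", "cpu": "48GiB"},
    offload_folder="offload",
)
```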

πŸ”— References

πŸ› Falcon Model Paper

πŸš€ Hugging Face Documentation

πŸ”₯ Phase Technologies

πŸ“’ Contributions & Issues: If you find a bug or have a feature request, feel free to open an issue! 😊


πŸš€ Happy Coding! πŸ’»πŸŽ‰
