# Falcon-7b-sharded-bf16-finetuned-sft

`Phase-Technologies/falcon-7b-sharded-bf16-finetuned-sft` is a sharded version of the Falcon-7B model, stored in BF16 (Brain Floating Point 16-bit) precision for efficient inference and training on GPUs with limited memory.
## Key Features

- Based on Falcon-7B, a powerful transformer model
- Sharded checkpoints for multi-GPU loading
- BF16 precision for lower memory usage (see the sketch after this list)
- Optimized for inference & fine-tuning
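Why BF16 helps with memory: each value is stored in 2 bytes instead of the 4 bytes used by FP32. A minimal PyTorch sketch, illustrative only and unrelated to loading this specific model:

```python
import torch

# Per-element storage: FP32 uses 4 bytes, BF16 uses 2 bytes.
fp32 = torch.zeros(1024, dtype=torch.float32)
bf16 = torch.zeros(1024, dtype=torch.bfloat16)

print(fp32.element_size(), "bytes per FP32 value")  # 4
print(bf16.element_size(), "bytes per BF16 value")  # 2
```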
## Model Details

| Feature | Details |
|---|---|
| Architecture | Falcon-7B (Transformer-based) |
| Parameters | 3.84B |
| Precision | Brain Floating Point 16 (BF16) |
| Tokenizer | Hugging Face AutoTokenizer |
| Use Cases | Chatbots, Summarization, Q&A, Text Generation |
| License | Apache 2.0 |
| Developer | Phase Technologies |
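As a rough back-of-the-envelope check, using the parameter count from the table and 2 bytes per BF16 weight (runtime memory will be higher because of activations, the KV cache, and framework overhead):

```python
# Rough weight-memory estimate: parameters x bytes per BF16 value.
num_params = 3.84e9     # parameter count from the table above
bytes_per_param = 2     # BF16 stores each weight in 2 bytes
print(f"~{num_params * bytes_per_param / 1e9:.1f} GB of weights")  # ~7.7 GB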
## Installation, Setup & Model Loading

### Install Dependencies

```bash
pip install transformers accelerate torch
```
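Optionally, you can verify the installation and confirm that a GPU is visible and supports BF16 before loading the model:

```python
import torch
import transformers

print(transformers.__version__)
print(torch.__version__)
print(torch.cuda.is_available())            # True if a CUDA GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.is_bf16_supported())   # True if that GPU supports BF16
```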
### Load the Model in Python

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Phase-Technologies/falcon-7b-sharded-bf16-finetuned-sft"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load model (BF16 weights, sharded loading across available devices)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

print("Model loaded successfully!")
```
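Once the model is loaded, you can optionally check which dtype was selected and how `device_map="auto"` placed the layers; note that `hf_device_map` is only present when a `device_map` was used:

```python
# Optional checks after loading
print(model.dtype)             # typically torch.bfloat16 for this checkpoint
print(model.num_parameters())  # total parameter count
print(model.hf_device_map)     # which modules landed on which GPU/CPU
```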
## Usage

### Text Generation

```python
prompt = "Once upon a time, in a futuristic world..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate output
output = model.generate(**inputs, max_length=100)

# Decode and print the result
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
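For more control over the output, `generate` also accepts sampling parameters; the values below are illustrative rather than tuned for this model:

```python
# Sampling-based generation; parameter values are illustrative, not tuned.
output = model.generate(
    **inputs,
    max_new_tokens=100,   # cap on newly generated tokens
    do_sample=True,       # sample instead of greedy decoding
    temperature=0.7,      # lower = more deterministic
    top_p=0.9,            # nucleus sampling threshold
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```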
### Running on Multiple GPUs

```python
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",        # split layers across available GPUs
    offload_folder="offload"  # disk folder for weights that don't fit in GPU/CPU memory
)
```
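If you need to limit how much memory each device may use, `from_pretrained` also accepts a `max_memory` mapping; the sizes below are placeholders to adjust for your hardware:

```python
# Per-device memory caps; the sizes are placeholders, adjust to your hardware.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    max_memory={0: "12GiB", 1: "12GiB", "cpu": "30GiB"},
    offload_folder="offload",
)
```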
## Performance
## References

- Falcon Model Paper
- Hugging Face Documentation
- Phase Technologies
Contributions & Issues: If you find a bug or have a feature request, feel free to open an issue!

Happy coding!