πŸš€ Falcon-7b-sharded-bf16-finetuned-sft

Phase-Technologies/falcon-7b-sharded-bf16-finetuned-sft is a sharded, supervised-fine-tuned (SFT) version of the Falcon-7B model, stored in BF16 (Brain Floating Point, 16-bit) precision for efficient inference and fine-tuning on limited-memory GPUs.

πŸ”₯ Key Features:

  • πŸ¦… Based on Falcon-7B, a powerful transformer model
  • πŸ— Sharded for multi-GPU loading
  • 🎯 BF16 precision for lower memory usage
  • ⚑ Optimized for inference & fine-tuning

πŸ“‚ Model Details

| Feature πŸ† | Details πŸ“œ |
|---|---|
| Architecture πŸ— | Falcon-7B (Transformer-based) |
| Parameters πŸ”’ | 3.84B |
| Precision 🎯 | Brain Floating Point 16 (BF16) |
| Use Cases 🎯 | Chatbots πŸ€–, Summarization πŸ“š, Q&A ❓, Text Generation ✍️ |
| Tokenizer πŸ”€ | Hugging Face AutoTokenizer |
| License πŸ“„ | Apache 2.0 |
| Developer 🏒 | Phase Technologies |

πŸš€ Installation, Setup & Model Loading

πŸ”Ή Install Dependencies

```bash
pip install transformers accelerate torch
```
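
πŸ”Ή Verify BF16 Support (Optional)

BF16 requires hardware support (on NVIDIA, Ampere-generation GPUs or newer). A quick, optional sanity check before downloading the weights:

```python
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
if torch.cuda.is_available():
    # Reports whether the current GPU has native BF16 support (Ampere or newer)
    print("BF16 supported:", torch.cuda.is_bf16_supported())
```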

πŸ”Ή Load the Model in Python

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Phase-Technologies/falcon-7b-sharded-bf16-finetuned-sft"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load model (sharded BF16 checkpoint, loaded shard by shard)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # picks up the checkpoint's BF16 dtype
    device_map="auto",    # places shards on available GPUs (and CPU if needed)
)

print("βœ… Model Loaded Successfully!")
```

🎯 Usage

πŸ”Ή Text Generation

```python
prompt = "Once upon a time, in a futuristic world..."
# model.device is the first device the sharded model was placed on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate up to 100 new tokens beyond the prompt
output = model.generate(**inputs, max_new_tokens=100)

# Decode and print result
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
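
For chatbot-style use, it is often nicer to stream tokens as they are produced. A minimal sketch using transformers' TextStreamer; the sampling values are illustrative defaults, not tuned for this model:

```python
from transformers import TextStreamer

# Prints tokens to stdout as they are generated, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)

inputs = tokenizer("Give me three uses for a sharded checkpoint.", return_tensors="pt").to(model.device)
model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.7,     # illustrative values; tune for your use case
    top_p=0.9,
    streamer=streamer,
)
```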

πŸ”Ή Running on Multiple GPUs

```python
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",          # split layers across all visible GPUs
    offload_folder="offload",   # disk directory for weights that fit on neither GPU nor CPU
)
```
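
`device_map="auto"` lets Accelerate plan the placement. If you want to cap how much memory each device may use, `from_pretrained` also accepts a `max_memory` map; the budgets below are placeholders to adjust to your hardware:

```python
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    # Example budgets only; set these to what your GPUs/CPU actually have free
    max_memory={0: "14GiB", 1: "14GiB", "cpu": "48GiB"},
    offload_folder="offload",
)
```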

πŸ”— References

πŸ› Falcon Model Paper

πŸš€ Hugging Face Documentation

πŸ”₯ Phase Technologies

πŸ“’ Contributions & Issues: If you find a bug or have a feature request, feel free to open an issue! 😊


πŸš€ Happy Coding! πŸ’»πŸŽ‰
