Meta-Llama-2-7b-chat-hf-Quantized
This repo contains a 4-bit quantized version (using bitsandbytes) of Meta's meta-llama/Llama-2-7b-chat-hf model. A sketch of how such a quantized checkpoint can be produced follows the references below.
QLoRA paper: "QLoRA: Efficient Finetuning of Quantized LLMs" (arXiv)
Hugging Face blog post on 4-bit quantization using bitsandbytes: "Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA"
bitsandbytes GitHub repo
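The sketch below shows, under stated assumptions, how a 4-bit bitsandbytes checkpoint like this one can be produced from the base model. The exact BitsAndBytesConfig values used for this repo are not documented here, so the NF4 and double-quantization settings are assumptions; note that meta-llama/Llama-2-7b-chat-hf is a gated repo and requires an approved Hugging Face access token.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative 4-bit NF4 config; the exact settings used for this repo are an assumption.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base_model_id = "meta-llama/Llama-2-7b-chat-hf"  # gated repo; requires license acceptance and an HF token
tokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# The quantized model and tokenizer can then be pushed to the Hub, e.g.:
# model.push_to_hub("Llama-2-7b-chat-hf-bnb-4bit")
# tokenizer.push_to_hub("Llama-2-7b-chat-hf-bnb-4bit")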
Use the code below to get started with the model.
pip install -q -U bitsandbytes accelerate torch huggingface_hub
pip install -q -U git+https://github.com/huggingface/transformers.git # Install latest version of transformers
pip install -q -U git+https://github.com/huggingface/peft.git
pip install flash-attn --no-build-isolation
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Load the tokenizer and the pre-quantized 4-bit model from the Hub.
# The bitsandbytes quantization config is stored in the repo's config,
# so it typically does not need to be passed again at load time.
model_id_llama = "alokabhishek/Llama-2-7b-chat-hf-bnb-4bit"
tokenizer_llama = AutoTokenizer.from_pretrained(model_id_llama, use_fast=True)
model_llama = AutoModelForCausalLM.from_pretrained(
    model_id_llama,
    device_map="auto"
)
# Build a text-generation pipeline around the quantized model and run a sample prompt.
pipe_llama = pipeline(model=model_llama, tokenizer=tokenizer_llama, task='text-generation')

prompt_llama = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."
output_llama = pipe_llama(prompt_llama, max_new_tokens=512)
print(output_llama[0]["generated_text"])
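Llama-2 chat models are trained on prompts wrapped in the [INST] ... [/INST] format. Assuming this repo's tokenizer carries over the chat template from the base model (an assumption, not verified here), apply_chat_template can build a correctly formatted prompt, as in this sketch:

# Build a properly formatted Llama-2 chat prompt from a message list.
# Assumes the tokenizer ships the base model's chat template.
messages = [
    {"role": "system", "content": "You are a witty assistant."},
    {"role": "user", "content": prompt_llama},
]
chat_prompt = tokenizer_llama.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
output_chat = pipe_llama(chat_prompt, max_new_tokens=512, do_sample=True, temperature=0.7)
print(output_chat[0]["generated_text"])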