Model Card for alokabhishek/Llama-2-7b-chat-hf-bnb-4bit

This repo contains 4-bit quantized (using bitsandbytes) model of Meta's meta-llama/Llama-2-7b-chat-hf

Model Details

Model creator: Meta
Original model: Llama-2-7b-chat-hf

About 4 bit quantization using bitsandbytes

QLoRA: Efficient Finetuning of Quantized LLMs: arXiv - QLoRA: Efficient Finetuning of Quantized LLMs

Hugging Face Blog post on 4-bit quantization using bitsandbytes: Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA

bitsandbytes github repo: bitsandbytes github repo

How to Get Started with the Model

Use the code below to get started with the model.

How to run from Python code

First install the package

pip install -q -U bitsandbytes accelerate torch huggingface_hub
pip install -q -U git+https://github.com/huggingface/transformers.git # Install latest version of transformers
pip install -q -U git+https://github.com/huggingface/peft.git
pip install flash-attn --no-build-isolation

Import

import torch
import os
from torch import bfloat16
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig, LlamaForCausalLM

Use a pipeline as a high-level helper

model_id_llama = "alokabhishek/Llama-2-7b-chat-hf-bnb-4bit"

tokenizer_llama = AutoTokenizer.from_pretrained(model_id_llama, use_fast=True)

model_llama = AutoModelForCausalLM.from_pretrained(
    model_id_llama,
    device_map="auto"
)


pipe_llama = pipeline(model=model_llama, tokenizer=tokenizer_llama, task='text-generation')

prompt_llama = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."

output_llama = pipe_llama(prompt_llama, max_new_tokens=512)

print(output_llama[0]["generated_text"])

Uses

Direct Use

[More Information Needed]

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]

alokabhishek
/

Llama-2-7b-chat-hf-bnb-4bit

Model Card for alokabhishek/Llama-2-7b-chat-hf-bnb-4bit

Model Details

About 4 bit quantization using bitsandbytes

How to Get Started with the Model

How to run from Python code

First install the package

Import

Use a pipeline as a high-level helper

Uses

Direct Use

Downstream Use [optional]

Out-of-Scope Use

Bias, Risks, and Limitations

Model Card Authors [optional]

Model Card Contact

Collection including alokabhishek/Llama-2-7b-chat-hf-bnb-4bit

Meta-Llama-2-7b-chat-hf-Quantized