FineLlama-3.2-3B-Instruct-16bit-ead
This repository contains a fine-tuned version of Llama-3.2-3B-Instruct specifically trained to understand and generate EAD (Encoded Archival Description) XML for describing archival records.
Model Description
- Base Model: meta-llama/Llama-3.2-3B-Instruct
- Training Dataset: Geraldine/Ead-Instruct-12k
- Task: Generation of EAD/XML-compliant archival descriptions
- Training Type: Instruction fine-tuning with PEFT (Parameter Efficient Fine-Tuning) using LoRA
Key Features
- Specialized in generating EAD/XML format for archival metadata
- Trained on a comprehensive dataset of EAD/XML examples
- Optimized for archival description tasks
- Memory-efficient through 4-bit quantization
Training Details
Technical Specifications
Quantization: 4-bit quantization using bitsandbytes
- NF4 quantization type
- Double quantization enabled
- bfloat16 compute dtype
LoRA Configuration
- r: 256
- alpha: 128
- dropout: 0.05
- target modules: all-linear
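For reference, these settings map onto peft's LoraConfig roughly as follows. This is a minimal sketch, not the original training script; the task_type value is an assumption based on the causal-LM base model.

from peft import LoraConfig

# Sketch of the LoRA setup listed above (exact arguments may differ)
lora_config = LoraConfig(
    r=256,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",  # assumed: causal language modeling
)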
Training parameters
- Epochs: 3
- Batch size: 3
- Gradient accumulation steps: 2
- Learning rate: 2e-4
- Warmup ratio: 0.03
- Max sequence length: 2048
- Scheduler: Constant
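As a hedged sketch, the hyperparameters above correspond to a transformers TrainingArguments along these lines; output_dir is a placeholder, and the max sequence length (2048) is passed to the trainer rather than here.

from transformers import TrainingArguments

# Approximate equivalent of the training parameters listed above
training_args = TrainingArguments(
    output_dir="finellama-ead",   # placeholder output path
    num_train_epochs=3,
    per_device_train_batch_size=3,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
    optim="adamw_torch_fused",
    bf16=True,                    # or fp16=True, depending on hardware support
)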
Training Infrastructure
- Libraries: transformers, peft, trl
- Mixed Precision: FP16/BF16 (based on hardware support)
- Optimizer: fused AdamW
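Putting the pieces together, the training loop with trl's SFTTrainer would look roughly like this. The model, dataset, lora_config, and training_args names refer to the sketches above; keyword names vary between trl versions, so treat this as an illustration rather than the exact script.

from trl import SFTTrainer

# Illustrative wiring of the components described above
trainer = SFTTrainer(
    model=model,                 # the 4-bit quantized base model
    args=training_args,
    train_dataset=dataset,       # e.g. the Geraldine/Ead-Instruct-12k train split
    peft_config=lora_config,
    max_seq_length=2048,         # older trl versions take this here; newer ones via SFTConfig
)
trainer.train()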
Training Notebook
The training notebook is available on Kaggle.
Usage
Installation
pip install torch transformers accelerate bitsandbytes
Loading the model
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Configure 4-bit quantization (same settings as used during training)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "Geraldine/FineLlama-3.2-3B-Instruct-16bit-ead"

# Load model and tokenizer; device_map places the quantized model on the GPU
# (calling .to("cuda") on a bitsandbytes-quantized model is not supported)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
Example usage
messages = [
    {"role": "system", "content": "You are an expert in EAD/XML generation for archival records metadata."},
    {"role": "user", "content": "Generate a minimal and compliant <eadheader> template with all required EAD/XML tags"},
]

# Build the prompt with the model's chat template
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    add_generation_prompt=True,  # required for generation
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    pad_token_id=tokenizer.eos_token_id,
    use_cache=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
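Since the decoded output includes the prompt, you can optionally slice it off and keep only the newly generated EAD/XML:

# Keep only the newly generated tokens, dropping the echoed prompt
generated = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))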
Limitations
- The model is specifically trained for EAD/XML format and may not perform well on general archival tasks
- Performance depends on the quality and specificity of the input prompts
- Maximum sequence length is limited to 2048 tokens
Citation
BibTeX:
@misc{ead-llama,
  author = {Géraldine Geoffroy},
  title = {EAD-XML LLaMa: Fine-tuned LLaMa Model for Archival Description},
  year = {2024},
  publisher = {HuggingFace},
  journal = {HuggingFace Repository},
  howpublished = {\url{https://huggingface.co./Geraldine/FineLlama-3.2-3B-Instruct-16bit-ead}}
}
License
This model is subject to the same license as the base Llama model. Please refer to Meta's Llama license for usage terms and conditions.