Quantized to exl2 using ExLlamaV2 0.0.2.

Instruction Pre-Training: Language Models are Supervised Multitask Learners

This repo contains the finance model developed from Llama3-8B in our paper Instruction Pre-Training: Language Models are Supervised Multitask Learners.

We explore supervised multitask pre-training by proposing Instruction Pre-Training, a framework that scalably augments massive raw corpora with instruction-response pairs to pre-train language models. The instruction-response pairs are generated by an efficient instruction synthesizer built on open-source models. Instruction Pre-Training outperforms Vanilla Pre-training in both general pre-training from scratch and domain-adaptive continual pre-training. In pre-training from scratch, Instruction Pre-Training not only improves pre-trained base models but also benefits more from further instruction tuning. In continual pre-training, Instruction Pre-Training enables Llama3-8B to be comparable to or even outperform Llama3-70B.
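
To make the idea concrete, the sketch below (purely illustrative; the field names and the pair format are assumptions, not the paper's actual data format) shows how a raw corpus chunk is turned into an instruction-augmented pre-training example:

# Illustrative sketch only: the real synthesizer output format may differ.
raw_text = "Common Stock, Par Value $.01 Per Share MMM New York Stock Exchange ..."

# Instruction-response pairs grounded in raw_text, as produced by the instruction synthesizer.
synthesized_pairs = [
    ("Which exchange lists 3M's common stock?", "The New York Stock Exchange."),
]

# Vanilla pre-training trains on raw_text alone; Instruction Pre-Training trains on the raw
# text followed by the synthesized pairs, so the model sees supervised tasks during pre-training.
augmented_example = raw_text + "\n\n" + "\n\n".join(
    f"Q: {q}\nA: {a}" for q, a in synthesized_pairs
)
print(augmented_example)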

Resources

🤗 We share our data and models with example usages; feel free to open any issues or discussions! 🤗

Domain-Adaptive Continued Pre-Training

Following AdaptLLM, we augment the domain-specific raw corpora with instruction-response pairs generated by our context-based instruction synthesizer.
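
The synthesizer itself is an open-source causal LM, so it can be driven with transformers in the usual way. A minimal, hedged sketch follows; the checkpoint name instruction-pretrain/instruction-synthesizer and the raw-context prompting shown here are assumptions, and the exact prompt/output templates are documented in the synthesizer's own model card:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name for the released context-based instruction synthesizer.
synth_name = "instruction-pretrain/instruction-synthesizer"
synthesizer = AutoModelForCausalLM.from_pretrained(synth_name)
synth_tokenizer = AutoTokenizer.from_pretrained(synth_name)

# A chunk of raw finance text to augment (illustrative).
context = "1.500% Notes due 2026 MMM26 New York Stock Exchange ..."

# Prompt the synthesizer with the raw context; it generates instruction-response
# pairs grounded in that context, which are then appended to the corpus.
inputs = synth_tokenizer(context, return_tensors="pt").input_ids.to(synthesizer.device)
outputs = synthesizer.generate(input_ids=inputs, max_new_tokens=400)[0]
print(synth_tokenizer.decode(outputs[inputs.shape[-1]:], skip_special_tokens=True))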

1. To chat with the finance-Llama3-8B model:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("instruction-pretrain/finance-Llama3-8B")
tokenizer = AutoTokenizer.from_pretrained("instruction-pretrain/finance-Llama3-8B")

# Put your input here, NO prompt template is required
user_input = '''Use this fact to answer the question: Title of each class Trading Symbol(s) Name of each exchange on which registered
Common Stock, Par Value $.01 Per Share MMM New York Stock Exchange
MMM Chicago Stock Exchange, Inc.
1.500% Notes due 2026 MMM26 New York Stock Exchange
1.750% Notes due 2030 MMM30 New York Stock Exchange
1.500% Notes due 2031 MMM31 New York Stock Exchange

Which debt securities are registered to trade on a national securities exchange under 3M's name as of Q2 of 2023?'''

inputs = tokenizer(user_input, return_tensors="pt", add_special_tokens=True).input_ids.to(model.device)
outputs = model.generate(input_ids=inputs, max_new_tokens=400)[0]

# decode only the newly generated tokens, skipping the prompt
answer_start = int(inputs.shape[-1])
pred = tokenizer.decode(outputs[answer_start:], skip_special_tokens=True)

print(pred)
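
Note that the snippet above loads the original full-precision checkpoint through transformers. Because this repo hosts an exl2 quantization, the quantized weights are instead loaded with the ExLlamaV2 runtime. A minimal sketch is below, written against the current ExLlamaV2 Python API (which may differ from the 0.0.2 release used for quantization); the local model directory is a placeholder:

from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Placeholder path: the directory where this repo's exl2 weights were downloaded.
config = ExLlamaV2Config()
config.model_dir = "./finance-Llama3-8B-exl2"
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # load the model, splitting layers across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7

# Reuse the same plain prompt as above; no chat template is required.
print(generator.generate_simple(user_input, settings, 400))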

2. To evaluate our models on the domain-specific tasks:

  1. Set up dependencies
git clone https://github.com/microsoft/LMOps
cd LMOps/adaptllm
pip install -r requirements.txt
  2. Evaluate
DOMAIN='finance'

# if the model can fit on a single GPU: set MODEL_PARALLEL=False
# elif the model is too large to fit on a single GPU: set MODEL_PARALLEL=True
MODEL_PARALLEL=False

# number of GPUs, chosen from [1,2,4,8]
N_GPU=1

# whether to add the BOS token to each input; keep this set to True for this model
add_bos_token=True

bash scripts/inference.sh ${DOMAIN} 'instruction-pretrain/finance-Llama3-8B' ${add_bos_token} ${MODEL_PARALLEL} ${N_GPU}

Citation

If you find our work helpful, please cite us:

AdaptLLM

@inproceedings{cheng2024adapting,
  title={Adapting Large Language Models via Reading Comprehension},
  author={Daixuan Cheng and Shaohan Huang and Furu Wei},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=y886UXPEZ0}
}