Testing Notes and Recommendations

  1. Could you push safetensors weights to the repos instead of, or in addition to, the PyTorch weights? It greatly speeds up downloads for users. You just need to pass safe_serialization=True when you push to the Hub.

  2. There seems to be an issue with the tokenizer in this repo. It doesn't load cleanly and raises:

ValueError: Couldn't instantiate the backend tokenizer from one of: 
(1) a `tokenizers` library serialization file, 
(2) a slow tokenizer instance to convert or 
(3) an equivalent slow tokenizer class to instantiate and convert. 
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.

  3. The model is much better than the previous SFT model for conversation. I tested the 13B model on passkey retrieval, and it fails at 2k, 4k, and 8k lengths. The base Llama model succeeds at 2k, so it looks like performance there has degraded. For example:
Length of input is 1915
[INST] we are here live in passkey-u89dsnakj8 Omaha Nebraska good morning everybody I'm Becky quick along with Mike santoli and in just 30 min...
...I guess we'll dig into that a little deeper today and we've got some numbers that you've been going through be very interested to hear the color on th

Respond with the passkey contained within the above text. [/INST]

Berkshire Hathaway's first quarter earnings were released moments ago, and the results are quite intriguing. The company reported earnings of $35.8 billion, a 13% increas...

The response is sensible; it just doesn't answer the request for the passkey.
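For anyone wanting to reproduce this kind of check, here is a minimal sketch of a passkey-retrieval harness. It assumes plain-text filler sentences and a needle sentence of the form "The passkey is ...", both hypothetical; the test in this thread instead embeds the key directly in a transcript. Target lengths such as 2k, 4k, or 8k would have to be measured with the model's own tokenizer, which is not shown here.

```python
def build_passkey_prompt(filler_sentences, passkey, depth=0.5):
    """Insert a passkey sentence at a relative depth of the filler text
    and wrap everything in Llama 2 style [INST] ... [/INST] tags.

    depth=0.0 puts the passkey first, depth=1.0 puts it last.
    The needle wording is a hypothetical choice for illustration.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    needle = f"The passkey is {passkey}. Remember it."
    idx = round(depth * len(filler_sentences))
    sentences = list(filler_sentences[:idx]) + [needle] + list(filler_sentences[idx:])
    context = " ".join(sentences)
    return (
        f"[INST] {context}\n\n"
        "Respond with the passkey contained within the above text. [/INST]"
    )


def passkey_retrieved(response, passkey):
    """Count a response as correct if it contains the passkey verbatim."""
    return passkey in response
```

Sweeping `depth` and the number of filler sentences gives a grid of prompts per target length, which helps separate "fails at long context" from "ignores the question", as in the Berkshire Hathaway response above.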


Thanks for your messages.

  1. Thanks for the reminder. I will do so tomorrow.

  2. That is strange. I can load this model without problems on my machine. Could you share your environment details, e.g. the output of `conda list`? I will test it in your environment.

  3. This is an interesting observation. For SFT, we initially trained the models only on long-context data, and the resulting model degraded seriously on short QA. We then mixed the training data with short QA examples from Alpaca. This gives better performance on short QA while remaining good at long QA, but the model is then weaker at retrieval. It is something of a trade-off.

Models without SFT are good at retrieval because they are trained consistently on long-context data (e.g. 32768 tokens). See, for example, our 7B 32k model: https://huggingface.co./Yukang/Llama-2-7b-longlora-32k-ft
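The trade-off described above comes down to how long-context and short QA examples are mixed in the SFT set. The sketch below shows one simple way to blend the two at a chosen ratio; the function name and the `short_fraction` parameter are hypothetical, and the thread does not state the ratio actually used for LongAlpaca.

```python
import random


def mix_sft_data(long_examples, short_examples, short_fraction, seed=0):
    """Blend long-context SFT examples with short QA examples.

    short_fraction is a hypothetical target share of short examples in
    the mixed set (0.0 means long-context data only). All long examples
    are kept; short ones are sampled without replacement.
    """
    if not 0.0 <= short_fraction < 1.0:
        raise ValueError("short_fraction must be in [0, 1)")
    # Solve n_short / (n_long + n_short) == short_fraction for n_short.
    n_short = round(len(long_examples) * short_fraction / (1.0 - short_fraction))
    n_short = min(n_short, len(short_examples))
    rng = random.Random(seed)
    mixed = list(long_examples) + rng.sample(list(short_examples), n_short)
    rng.shuffle(mixed)
    return mixed
```

Raising `short_fraction` would be expected to help short QA, while keeping it at 0.0 corresponds to the "long-context data only" setup that the reply says degraded short QA.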

Yukang Chen


I have uploaded the safetensors files.

Yukang Chen
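For reference, a minimal sketch of pushing weights as safetensors, assuming `model` and `tokenizer` are already-loaded Hugging Face transformers objects and that you are logged in to the Hub. `push_as_safetensors` is a hypothetical helper name; `push_to_hub` and its `safe_serialization` argument come from transformers.

```python
def push_as_safetensors(model, tokenizer, repo_id):
    """Hypothetical helper: upload weights as .safetensors plus tokenizer files.

    Assumes transformers objects, e.g. from AutoModelForCausalLM and
    AutoTokenizer, which expose push_to_hub via PushToHubMixin.
    """
    # safe_serialization=True writes the weights as .safetensors files
    model.push_to_hub(repo_id, safe_serialization=True)
    # Tokenizer files are pushed to the same repo
    tokenizer.push_to_hub(repo_id)
```

Keeping the call in one helper makes it harder to forget the flag when re-uploading updated checkpoints.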

Thanks, Yukang, for the safetensors; they saved me a lot of loading time today. BTW, it may be better to push bf16 weights, since I'm not sure anyone uses float32 and it's 2x as big.
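The 2x figure follows from bytes per parameter: float32 stores 4 bytes per weight and bfloat16 stores 2. A back-of-the-envelope sketch, assuming a nominal 13B parameter count and ignoring file metadata and non-weight files:

```python
# Bytes used to store one weight in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def checkpoint_size_gb(n_params: int, dtype: str) -> float:
    """Approximate weight size in GB (10**9 bytes), ignoring metadata."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("float32", "bfloat16"):
        print(dtype, checkpoint_size_gb(13_000_000_000, dtype), "GB")
```

For a nominal 13B model this gives about 52 GB in float32 versus about 26 GB in bfloat16. Producing the bf16 copy would typically mean loading with `torch_dtype=torch.bfloat16` in `from_pretrained` before pushing.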

Regarding point 2, here is my `pip list` output:

I'm using a Jupyter notebook.
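When comparing environments for a tokenizer error like the one in point 2, the versions that matter most are transformers, tokenizers, and sentencepiece (the error message explicitly asks for sentencepiece). A small stdlib-only sketch that collects them; the package list is an assumption and can be extended:

```python
import importlib.metadata
import platform

# Packages most relevant to loading a Llama tokenizer (assumed list).
PACKAGES = ("transformers", "tokenizers", "sentencepiece", "torch", "protobuf")


def environment_report(packages=PACKAGES):
    """Return {name: installed version or None}, plus the Python version."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

A `None` for sentencepiece would be consistent with the error above. If it is installed from inside a running notebook, restarting the kernel is usually needed so that transformers detects it.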


We have updated our LongAlpaca models from Alpaca prompting to Llama 2 prompting, which is consistent with their pre-trained models. Please refer to the updated models and the inference code below, which uses Llama 2 prompting.


In addition, we have updated our requirements list and tested it. It should work for both training and inference.


Yukang Chen

Thanks, Yukang.

Does this mean that all data used to train the LongAlpaca models is generated with Llama 2 and thus has a Llama 2 type license (i.e. it is no longer limited by having used OpenAI conversations)?



The training data is not generated by Llama 2; we collected it ourselves. We only use the Llama 2 prompt format:

    # Llama 2 chat prompt (variable name illustrative); fill with prompt_template.format(instruction=...)
    prompt_template = (
        "<s>[INST] <<SYS>>\n"
        "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\n"
        "If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n"
        "<</SYS>> \n\n {instruction} [/INST]"
    )

Yukang Chen
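The prompt template above can be wrapped in a small helper that takes a custom system prompt. This is a sketch mirroring the layout shown (including the space after `<</SYS>>` and before the instruction), not code from the LongAlpaca repository; the function name and the `include_bos` flag are hypothetical. The flag exists because, if the tokenizer already prepends a BOS token, keeping the literal `<s>` may produce a duplicated BOS, so it can be worth inspecting the token ids.

```python
def build_llama2_prompt(instruction: str, system_prompt: str, include_bos: bool = True) -> str:
    """Format one single-turn Llama 2 chat prompt in the LongAlpaca layout.

    include_bos=False drops the literal "<s>" for tokenizers that add
    the BOS token themselves (an assumption to verify per tokenizer).
    """
    bos = "<s>" if include_bos else ""
    return f"{bos}[INST] <<SYS>>\n{system_prompt}\n<</SYS>> \n\n {instruction} [/INST]"
```

With the default system prompt from the template as `system_prompt`, the result matches the template filled in with the same instruction.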

Thanks! That helps clarify.

I believe the Alpaca data license states: "The dataset is CC BY NC 4.0 (allowing only non-commercial use) and models trained using the dataset should not be used outside of research purposes."

So that means this model still wouldn't be available for commercial use?

Yes. I think so.

