Simple Stories

Simple Stories will be a series of small text generation models trained on the TinyStories dataset.

The goal is to experiment with creating small language models that can perform highly specific tasks. In this case, the task is generating children's stories.

Model Details

The model has 4M parameters (Safetensors seems to have inflated this to about 13M; I will look into why in the future). This model has not been fine-tuned for instructions. It will simply continue whatever text it is given. I will be working on an instruct model in the coming days.
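One way to look into a parameter-count discrepancy like this is to read the tensor shapes straight from the header of the .safetensors file and total them per top-level name. The following is a minimal sketch using only the standard library; the function names are made up for illustration, and grouping by the first dotted component of the tensor name is an assumption about how the checkpoint is laid out.

```python
import json
import struct
from collections import defaultdict


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def count_params_by_prefix(header):
    """Sum element counts per top-level tensor-name prefix (hypothetical grouping)."""
    totals = defaultdict(int)
    for name, info in header.items():
        n = 1
        for dim in info["shape"]:
            n *= dim
        totals[name.split(".")[0]] += n
    return dict(totals)
```

Calling `count_params_by_prefix(read_safetensors_header(path))` on a locally downloaded weights file would show which groups of tensors account for the total, without loading the weights themselves.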

The model is a decoder-only transformer with 4 decoder layers and 2 attention heads. It was trained for 3 epochs on only ~50MB of text and can already produce semi-coherent stories.
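To get a feel for how a model of this shape ends up in the low millions of parameters, here is a rough back-of-the-envelope estimate for a GPT-style decoder. It is only a sketch: it assumes learned positional embeddings, a 4x MLP expansion, and an output head tied to the token embedding, and it ignores biases and layer norms. The vocabulary size, embedding width, and context length below are hypothetical values chosen for illustration, not this model's actual configuration; only the layer count comes from the description above. The number of attention heads does not appear because the heads split the same embedding width between them.

```python
def estimate_decoder_params(vocab_size, d_model, n_layers, context_len, mlp_ratio=4):
    """Rough weight count for a GPT-style decoder, ignoring biases and layer norms."""
    token_embedding = vocab_size * d_model
    position_embedding = context_len * d_model  # assumes learned positions
    attention = 4 * d_model * d_model  # query, key, value, and output projections
    mlp = 2 * mlp_ratio * d_model * d_model  # up and down projections
    return token_embedding + position_embedding + n_layers * (attention + mlp)


# Hypothetical sizes for illustration only; n_layers=4 matches the description.
example = estimate_decoder_params(vocab_size=2048, d_model=256, n_layers=4, context_len=256)
print(f"{example:,} weights")
```

Under these assumptions an untied output head would add another `vocab_size * d_model` weights on top of the estimate.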

The code used to train the model can be found on my GitHub.

Usage

  1. Import the relevant HuggingFace Auto classes and load model and tokenizer:
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("broskicodes/simple-stories-4M", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("broskicodes/simple-stories-4M", trust_remote_code=True)
  2. Tokenize your input sequence and call the model.model.generate function:
inputs = tokenizer("Once upon a time,", return_tensors="pt", return_attention_mask=False)
outputs = model.model.generate(inputs['input_ids'], 250)

Note that we are calling model.model.generate, not just model.generate.

  3. Decode the output and print the text:
text = tokenizer.batch_decode(outputs)[0]
print(text)
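If the 250 passed to generate is a fixed token budget (an assumption; the custom generation code is not shown here), the decoded text may stop in the middle of a sentence. A small post-processing helper along these lines could trim the story back to its last complete sentence. The helper name is made up for illustration and is not part of the model's code.

```python
def trim_to_last_sentence(text, terminators=".!?"):
    """Cut text after its last sentence terminator; return it unchanged if none is found."""
    last = max(text.rfind(ch) for ch in terminators)
    if last == -1:
        return text
    end = last + 1
    # Keep a closing quote that directly follows the terminator.
    if end < len(text) and text[end] in "\"'":
        end += 1
    return text[:end]
```

It would be applied to the decoded string, for example `print(trim_to_last_sentence(text))`.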

Sample

Here is a short sample generated by the model.

Once upon a time, there was a little girl called Daisy. Daisy wanted to go to the park with her mommy. She packed some yummy food and chirpies and carried them . Daisy was so excited for her mommy to try. The puppy and Mommy brought a big spoon to make souping. Daisy loved swimming and jun ate until she was disappointed. They began to start playing in the garden. They gathered around and ate and boot into the bread . As Daisy got hungry on the grass, she found some magic. She read more to see what was Luckily, Daisy was very impressed. When the lady opened the pot, something tickling to another. It was a rare. Daisy was so happy that she gave the tunately. Daisy was no longer scared. She knew she had to tell Mommy at the store. She took her to the soup and opened the tasty hot chocolate. When Daisy gave it to Daisy and princessed around a special spoon every day.

No, the story doesn't fully make sense. But most of the words are valid English and the characters and overarching plot are consistent. This is progress :)

Going forward

The direct next step is creating an instruct model for interacting with and generating custom stories. After that, I will continue working to improve the base model by increasing the amount of data it is trained on and continuing to experiment with different hyperparameters.

If you have any suggestions or questions, or you want to discuss anything about the model, please reach out to me on Twitter @_broskitweets.

Safetensors model size: 13.8M params, tensor type F32.