distilgpt2-finetuned-stories
This model is a fine-tuned version of distilgpt2 on the demelin/understanding_fables dataset. It achieves the following results on the evaluation set:
- Loss: 3.3089
Autoregressive and Prefix Language Modelling
Language modelling, particularly text generation, works on the principle of predicting the next token from the tokens that precede it.
This is what autoregressive modelling is based on: it predicts the next token (here, a word) on the basis of the tokens preceding it. Formally, it models P(w_i | w_{i-1}), where w_i is the next word, w_{i-1} is the token preceding it, and P is the probability of generating w_i given w_{i-1}.
In prefix language modelling, by contrast, the input is used as a context for generating the next tokens, so we calculate the conditional probability of the next word with respect to that context: P(w | x), where w is the next token, x is the context, and P is the probability of generating w given x.
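The autoregressive idea above can be sketched with a toy bigram model in plain Python (an illustration only, not the model card's training code): estimate P(w_i | w_{i-1}) from counts, then generate one token at a time, each conditioned on the previous one. The corpus and function names here are invented for the example.

```python
import random
from collections import Counter, defaultdict

# Toy corpus; in the real model this role is played by the fables dataset.
corpus = "the fox saw the crow . the crow held the cheese .".split()

# Count how often each word follows each preceding word.
followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def next_word_probs(prev):
    """Return the estimated distribution P(w | prev) as a dict."""
    counts = followers[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def generate(start, n_tokens, seed=0):
    """Autoregressive generation: each new token conditions on the last one."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n_tokens):
        probs = next_word_probs(out[-1])
        if not probs:  # dead end: no observed continuation
            break
        words, weights = zip(*probs.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)

print(next_word_probs("the"))  # "the" is followed by fox, crow, or cheese
print(generate("the", 5))
```

A neural language model such as distilgpt2 replaces the count table with a network, but the generation loop has the same shape: sample the next token from P(w_i | preceding tokens), append it, repeat.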
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
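As a hedged sketch, the hyperparameters above map onto the Hugging Face `Trainer` roughly as follows; dataset loading and tokenization are omitted, and `output_dir` is an assumed name.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model = AutoModelForCausalLM.from_pretrained("distilgpt2")
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")

args = TrainingArguments(
    output_dir="distilgpt2-finetuned-stories",  # assumed output path
    learning_rate=2e-5,             # learning_rate
    per_device_train_batch_size=8,  # train_batch_size
    per_device_eval_batch_size=8,   # eval_batch_size
    seed=42,                        # seed
    lr_scheduler_type="linear",     # lr_scheduler_type
    num_train_epochs=3.0,           # num_epochs
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the Trainer's
    # default optimizer settings, so they need no explicit arguments.
)
```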
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 1.0 | 20 | 3.4065 |
| No log | 2.0 | 40 | 3.3288 |
| No log | 3.0 | 60 | 3.3089 |
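Since the reported validation loss is a mean cross-entropy in nats, perplexity can be recovered as exp(loss); a quick stdlib check:

```python
import math

# Validation losses from the table above, keyed by epoch.
val_losses = {1: 3.4065, 2: 3.3288, 3: 3.3089}

for epoch, loss in val_losses.items():
    # Perplexity = exp(cross-entropy); lower is better.
    print(f"epoch {epoch}: perplexity = {math.exp(loss):.2f}")
```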
Framework versions
- Transformers 4.36.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0