distilgpt2-finetuned-stories
This model is a fine-tuned version of distilgpt2 on the demelin/understanding_fables dataset. It achieves the following results on the evaluation set:
- Loss: 3.3089
Autoregressive and Prefix Language Modelling
Language modelling, particularly text generation, works on the principle of predicting the next token from the tokens that precede it.
This is what autoregressive modelling is based on: it predicts the next token (here, a word) on the basis of the tokens preceding it. Formally, it models P(w_i | w_{i-1}), where w_i is the next word, w_{i-1} is the token preceding it, and P is the probability of generating w_i given w_{i-1}.
In prefix language modelling, by contrast, the input is used as a context for generating the next tokens, so we calculate the conditional probability of the next word with respect to that context: P(w | x), where w is the next token, x is the context, and P is the probability of generating w given x.
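The autoregressive idea above can be sketched with a toy bigram model in plain Python (an illustration only, not the model card's training code): estimate P(w_i | w_{i-1}) from counts, then generate one token at a time, each conditioned on the previous one. The corpus and function names here are invented for the example.

```python
import random
from collections import Counter, defaultdict

# Toy corpus; in the real model this role is played by the fables dataset.
corpus = "the fox saw the crow . the crow held the cheese .".split()

# Count how often each word follows each preceding word.
followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def next_word_probs(prev):
    """Return the estimated distribution P(w | prev) as a dict."""
    counts = followers[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def generate(start, n_tokens, seed=0):
    """Autoregressive generation: each new token conditions on the last one."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n_tokens):
        probs = next_word_probs(out[-1])
        if not probs:  # dead end: no observed continuation
            break
        words, weights = zip(*probs.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)

print(next_word_probs("the"))  # "the" is followed by fox, crow, or cheese
print(generate("the", 5))
```

A neural language model such as distilgpt2 replaces the count table with a network, but the generation loop has the same shape: sample the next token from P(w_i | preceding tokens), append it, repeat.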
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
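As a hedged sketch, the hyperparameters above map onto the Hugging Face `Trainer` roughly as follows; dataset loading and tokenization are omitted, and `output_dir` is an assumed name.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model = AutoModelForCausalLM.from_pretrained("distilgpt2")
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")

args = TrainingArguments(
    output_dir="distilgpt2-finetuned-stories",  # assumed output path
    learning_rate=2e-5,             # learning_rate
    per_device_train_batch_size=8,  # train_batch_size
    per_device_eval_batch_size=8,   # eval_batch_size
    seed=42,                        # seed
    lr_scheduler_type="linear",     # lr_scheduler_type
    num_train_epochs=3.0,           # num_epochs
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the Trainer's
    # default optimizer settings, so they need no explicit arguments.
)
```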
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 1.0 | 20 | 3.4065 |
| No log | 2.0 | 40 | 3.3288 |
| No log | 3.0 | 60 | 3.3089 |
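Since the reported validation loss is a mean cross-entropy in nats, perplexity can be recovered as exp(loss); a quick stdlib check:

```python
import math

# Validation losses from the table above, keyed by epoch.
val_losses = {1: 3.4065, 2: 3.3288, 3: 3.3089}

for epoch, loss in val_losses.items():
    # Perplexity = exp(cross-entropy); lower is better.
    print(f"epoch {epoch}: perplexity = {math.exp(loss):.2f}")
```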
Framework versions
- Transformers 4.36.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0