Commit 06071d6 (committed by umarbutler)
Parent(s): 80abc11
Update README.md
README.md
CHANGED
@@ -57,7 +57,6 @@ The training dataset was subsequently fed to [DistilGPT2](https://huggingface.co
 | Batch size per device | 4 |
 | Weight decay | 0.01 |
 | Warmup ratio | 0.06 |
-| Gradient accumulation steps | 1 |
 
 After training for 3 epochs, or 465,441 steps, over a period of ~40 hours on a single GeForce RTX 2080 Ti, the model achieved a loss of 0.65.
 
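The removed row set gradient accumulation steps to 1, which means no accumulation, so the effective batch size should be unchanged by this edit. As a rough sanity check, the sketch below works out how many training examples per epoch the README's figures imply (465,441 steps, 3 epochs, batch size 4 per device, one GPU). The function name `examples_per_epoch` is hypothetical. The code assumes one optimizer step per batch and ignores a possibly short final batch, so the result is an upper bound rather than the author's exact dataset size.

```python
def examples_per_epoch(total_steps: int, epochs: int, per_device_batch: int,
                       devices: int = 1, grad_accum_steps: int = 1) -> int:
    """Return an upper bound on the training examples seen per epoch.

    Assumes every step consumes a full effective batch; the last batch of
    an epoch may in practice be smaller.
    """
    steps_per_epoch, remainder = divmod(total_steps, epochs)
    if remainder:
        raise ValueError("total_steps is not evenly divisible by epochs")
    # Effective batch = per-device batch x devices x accumulation steps.
    effective_batch = per_device_batch * devices * grad_accum_steps
    return steps_per_epoch * effective_batch


# Figures from the README: 465,441 steps over 3 epochs, batch size 4, one GPU.
print(examples_per_epoch(465_441, 3, 4))  # 620588
```

With an accumulation value of 1, the effective batch equals the per-device batch of 4, so the estimate is the same whether or not the row appears in the table.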