Training in progress, step 300, checkpoint

Files changed (14)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +34 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/added_tokens.json +25 -0
last-checkpoint/merges.txt +0 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +31 -0
last-checkpoint/tokenizer.json +3 -0
last-checkpoint/tokenizer_config.json +216 -0
last-checkpoint/trainer_state.json +2166 -0
last-checkpoint/training_args.bin +3 -0
last-checkpoint/vocab.json +0 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: unsloth/Qwen2.5-Coder-1.5B-Instruct
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "unsloth/Qwen2.5-Coder-1.5B-Instruct",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 128,
+  "lora_dropout": 0.04,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "gate_proj",
+    "k_proj",
+    "up_proj",
+    "o_proj",
+    "v_proj",
+    "down_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
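
The `lora_alpha` and `r` values in this config determine how strongly the adapter update is scaled. As a minimal sketch, assuming the usual LoRA convention (`lora_alpha / r`, or `lora_alpha / sqrt(r)` when `use_rslora` is true), the effective factor for this checkpoint can be computed as below. The helper name `lora_scaling` is hypothetical, and only fields that appear in the diff are used.

```python
import json
import math

# Subset of the adapter_config.json fields shown in the diff.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 128,
  "lora_dropout": 0.04,
  "r": 64,
  "use_rslora": false,
  "target_modules": ["q_proj", "gate_proj", "k_proj", "up_proj",
                     "o_proj", "v_proj", "down_proj"]
}
""")


def lora_scaling(config: dict) -> float:
    """Return the factor applied to the low-rank update (hypothetical helper)."""
    r = config["r"]
    if config.get("use_rslora", False):
        # Rank-stabilized variant divides by sqrt(r) instead of r.
        return config["lora_alpha"] / math.sqrt(r)
    return config["lora_alpha"] / r


scaling = lora_scaling(ADAPTER_CONFIG)
print(f"scaling = {scaling}")  # 128 / 64 = 2.0
```

With `r` at 64 and `lora_alpha` at 128, the update is multiplied by 2.0; switching on `use_rslora` with the same values would raise that to 16.0.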

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e228b17a73bc3f81947c086347fce87ac063d7350d4c76e1f3155aca666ddba5
+size 295488936
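
The three added lines here are a Git LFS pointer rather than the weights themselves: the repository records the spec version, the SHA-256 digest of the real file, and its size in bytes. A small sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its fields (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents copied from the adapter_model.safetensors diff.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e228b17a73bc3f81947c086347fce87ac063d7350d4c76e1f3155aca666ddba5
size 295488936
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], f"{info['size_bytes'] / 2**20:.1f} MiB")
```

The same function applies unchanged to the other pointer files in this commit (`optimizer.pt`, `rng_state.pth`, `scheduler.pt`, `tokenizer.json`, `training_args.bin`).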

last-checkpoint/added_tokens.json ADDED

	@@ -0,0 +1,25 @@

+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|PAD_TOKEN|>": 151665,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}
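
`added_tokens.json` maps each token string to its ID and is sorted by string, which hides the shape of the ID range. As a rough sketch (the helper name `id_range_is_contiguous` is hypothetical), inverting the mapping lets one check whether the entries occupy a single unbroken block of IDs:

```python
# Token-to-ID mapping copied from the added_tokens.json diff.
ADDED_TOKENS = {
    "</tool_call>": 151658, "<tool_call>": 151657, "<|PAD_TOKEN|>": 151665,
    "<|box_end|>": 151649, "<|box_start|>": 151648, "<|endoftext|>": 151643,
    "<|file_sep|>": 151664, "<|fim_middle|>": 151660, "<|fim_pad|>": 151662,
    "<|fim_prefix|>": 151659, "<|fim_suffix|>": 151661, "<|im_end|>": 151645,
    "<|im_start|>": 151644, "<|image_pad|>": 151655,
    "<|object_ref_end|>": 151647, "<|object_ref_start|>": 151646,
    "<|quad_end|>": 151651, "<|quad_start|>": 151650, "<|repo_name|>": 151663,
    "<|video_pad|>": 151656, "<|vision_end|>": 151653,
    "<|vision_pad|>": 151654, "<|vision_start|>": 151652,
}


def id_range_is_contiguous(token_to_id: dict) -> bool:
    """True when the IDs form one gap-free range (hypothetical helper)."""
    ids = sorted(token_to_id.values())
    return ids == list(range(ids[0], ids[-1] + 1))


# Invert the mapping so tokens can be looked up by ID.
id_to_token = {token_id: token for token, token_id in ADDED_TOKENS.items()}

print(len(ADDED_TOKENS), min(id_to_token), max(id_to_token))
print(id_range_is_contiguous(ADDED_TOKENS))
print(id_to_token[max(id_to_token)])
```

For this checkpoint the 23 entries run from 151643 to 151665 without gaps, and the highest ID belongs to `<|PAD_TOKEN|>`.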

last-checkpoint/merges.txt ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:484b5ff183c7d02282937a1fb0b6a0beb0819da03c3b077a457d1db88a133e6f
+size 150487412

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5aa4de7450a948b9728f964c952892496ba1ad747b45f21e3f4394cdc4b34487
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:97217b203f1ec36d5ff0b43e1fbe7c384792a66d6e8afc16c5c4e545b12b1358
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|PAD_TOKEN|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fab42efe8d17406525a9154b728cf9e957629a8ed7ce997770efdd71128c6a1a
+size 11422086

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,216 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151665": {
+      "content": "<|PAD_TOKEN|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "model_max_length": 131072,
+  "pad_token": "<|PAD_TOKEN|>",
+  "padding_side": "left",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,2166 @@

+{
+  "best_metric": 0.8408719301223755,
+  "best_model_checkpoint": "miner_id_24/checkpoint-300",
+  "epoch": 0.3018867924528302,
+  "eval_steps": 150,
+  "global_step": 300,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0010062893081761006,
+      "grad_norm": 0.34507399797439575,
+      "learning_rate": 3.0000000000000004e-07,
+      "loss": 0.6716,
+      "step": 1
+    },
+    {
+      "epoch": 0.0010062893081761006,
+      "eval_loss": 1.1011072397232056,
+      "eval_runtime": 72.3489,
+      "eval_samples_per_second": 46.262,
+      "eval_steps_per_second": 11.569,
+      "step": 1
+    },
+    {
+      "epoch": 0.002012578616352201,
+      "grad_norm": 0.310740202665329,
+      "learning_rate": 6.000000000000001e-07,
+      "loss": 0.6763,
+      "step": 2
+    },
+    {
+      "epoch": 0.0030188679245283017,
+      "grad_norm": 0.33329853415489197,
+      "learning_rate": 9.000000000000001e-07,
+      "loss": 0.7372,
+      "step": 3
+    },
+    {
+      "epoch": 0.004025157232704402,
+      "grad_norm": 0.3545615077018738,
+      "learning_rate": 1.2000000000000002e-06,
+      "loss": 0.7506,
+      "step": 4
+    },
+    {
+      "epoch": 0.005031446540880503,
+      "grad_norm": 0.3150895833969116,
+      "learning_rate": 1.5e-06,
+      "loss": 0.7077,
+      "step": 5
+    },
+    {
+      "epoch": 0.0060377358490566035,
+      "grad_norm": 0.2971160113811493,
+      "learning_rate": 1.8000000000000001e-06,
+      "loss": 0.7474,
+      "step": 6
+    },
+    {
+      "epoch": 0.007044025157232704,
+      "grad_norm": 0.3241315186023712,
+      "learning_rate": 2.1000000000000002e-06,
+      "loss": 0.7257,
+      "step": 7
+    },
+    {
+      "epoch": 0.008050314465408805,
+      "grad_norm": 0.3870707154273987,
+      "learning_rate": 2.4000000000000003e-06,
+      "loss": 0.8577,
+      "step": 8
+    },
+    {
+      "epoch": 0.009056603773584906,
+      "grad_norm": 0.42336294054985046,
+      "learning_rate": 2.7e-06,
+      "loss": 0.9492,
+      "step": 9
+    },
+    {
+      "epoch": 0.010062893081761006,
+      "grad_norm": 0.3823484778404236,
+      "learning_rate": 3e-06,
+      "loss": 0.9361,
+      "step": 10
+    },
+    {
+      "epoch": 0.011069182389937107,
+      "grad_norm": 0.39906373620033264,
+      "learning_rate": 3.3e-06,
+      "loss": 0.8811,
+      "step": 11
+    },
+    {
+      "epoch": 0.012075471698113207,
+      "grad_norm": 0.4224358797073364,
+      "learning_rate": 3.6000000000000003e-06,
+      "loss": 0.9665,
+      "step": 12
+    },
+    {
+      "epoch": 0.013081761006289308,
+      "grad_norm": 0.3891141712665558,
+      "learning_rate": 3.900000000000001e-06,
+      "loss": 0.9568,
+      "step": 13
+    },
+    {
+      "epoch": 0.014088050314465408,
+      "grad_norm": 0.44890257716178894,
+      "learning_rate": 4.2000000000000004e-06,
+      "loss": 0.9954,
+      "step": 14
+    },
+    {
+      "epoch": 0.01509433962264151,
+      "grad_norm": 0.3897688388824463,
+      "learning_rate": 4.5e-06,
+      "loss": 0.9744,
+      "step": 15
+    },
+    {
+      "epoch": 0.01610062893081761,
+      "grad_norm": 0.3893052041530609,
+      "learning_rate": 4.800000000000001e-06,
+      "loss": 0.9226,
+      "step": 16
+    },
+    {
+      "epoch": 0.01710691823899371,
+      "grad_norm": 0.4676080346107483,
+      "learning_rate": 5.1e-06,
+      "loss": 1.0271,
+      "step": 17
+    },
+    {
+      "epoch": 0.018113207547169812,
+      "grad_norm": 0.4254531264305115,
+      "learning_rate": 5.4e-06,
+      "loss": 0.9791,
+      "step": 18
+    },
+    {
+      "epoch": 0.019119496855345912,
+      "grad_norm": 0.4190981984138489,
+      "learning_rate": 5.7000000000000005e-06,
+      "loss": 1.011,
+      "step": 19
+    },
+    {
+      "epoch": 0.02012578616352201,
+      "grad_norm": 0.3880650997161865,
+      "learning_rate": 6e-06,
+      "loss": 1.0195,
+      "step": 20
+    },
+    {
+      "epoch": 0.021132075471698115,
+      "grad_norm": 0.385028213262558,
+      "learning_rate": 6.300000000000001e-06,
+      "loss": 1.05,
+      "step": 21
+    },
+    {
+      "epoch": 0.022138364779874214,
+      "grad_norm": 0.3784335255622864,
+      "learning_rate": 6.6e-06,
+      "loss": 1.0114,
+      "step": 22
+    },
+    {
+      "epoch": 0.023144654088050314,
+      "grad_norm": 0.37113457918167114,
+      "learning_rate": 6.9e-06,
+      "loss": 1.0068,
+      "step": 23
+    },
+    {
+      "epoch": 0.024150943396226414,
+      "grad_norm": 0.40349870920181274,
+      "learning_rate": 7.2000000000000005e-06,
+      "loss": 1.0286,
+      "step": 24
+    },
+    {
+      "epoch": 0.025157232704402517,
+      "grad_norm": 0.3880467712879181,
+      "learning_rate": 7.5e-06,
+      "loss": 1.0293,
+      "step": 25
+    },
+    {
+      "epoch": 0.026163522012578617,
+      "grad_norm": 0.4099384546279907,
+      "learning_rate": 7.800000000000002e-06,
+      "loss": 1.0203,
+      "step": 26
+    },
+    {
+      "epoch": 0.027169811320754716,
+      "grad_norm": 0.4069879651069641,
+      "learning_rate": 8.1e-06,
+      "loss": 0.9713,
+      "step": 27
+    },
+    {
+      "epoch": 0.028176100628930816,
+      "grad_norm": 0.4327114522457123,
+      "learning_rate": 8.400000000000001e-06,
+      "loss": 1.0237,
+      "step": 28
+    },
+    {
+      "epoch": 0.02918238993710692,
+      "grad_norm": 0.4324794411659241,
+      "learning_rate": 8.7e-06,
+      "loss": 0.9488,
+      "step": 29
+    },
+    {
+      "epoch": 0.03018867924528302,
+      "grad_norm": 0.46345117688179016,
+      "learning_rate": 9e-06,
+      "loss": 1.0397,
+      "step": 30
+    },
+    {
+      "epoch": 0.03119496855345912,
+      "grad_norm": 0.4823172688484192,
+      "learning_rate": 9.3e-06,
+      "loss": 0.9933,
+      "step": 31
+    },
+    {
+      "epoch": 0.03220125786163522,
+      "grad_norm": 0.4677373170852661,
+      "learning_rate": 9.600000000000001e-06,
+      "loss": 1.0105,
+      "step": 32
+    },
+    {
+      "epoch": 0.03320754716981132,
+      "grad_norm": 0.48578086495399475,
+      "learning_rate": 9.9e-06,
+      "loss": 1.0846,
+      "step": 33
+    },
+    {
+      "epoch": 0.03421383647798742,
+      "grad_norm": 0.5035053491592407,
+      "learning_rate": 1.02e-05,
+      "loss": 1.0945,
+      "step": 34
+    },
+    {
+      "epoch": 0.03522012578616352,
+      "grad_norm": 0.520107090473175,
+      "learning_rate": 1.0500000000000001e-05,
+      "loss": 1.0162,
+      "step": 35
+    },
+    {
+      "epoch": 0.036226415094339624,
+      "grad_norm": 0.5222728848457336,
+      "learning_rate": 1.08e-05,
+      "loss": 0.9715,
+      "step": 36
+    },
+    {
+      "epoch": 0.03723270440251572,
+      "grad_norm": 0.5960727334022522,
+      "learning_rate": 1.11e-05,
+      "loss": 1.165,
+      "step": 37
+    },
+    {
+      "epoch": 0.038238993710691824,
+      "grad_norm": 0.5684401392936707,
+      "learning_rate": 1.1400000000000001e-05,
+      "loss": 1.1352,
+      "step": 38
+    },
+    {
+      "epoch": 0.03924528301886793,
+      "grad_norm": 0.6207726001739502,
+      "learning_rate": 1.1700000000000001e-05,
+      "loss": 1.066,
+      "step": 39
+    },
+    {
+      "epoch": 0.04025157232704402,
+      "grad_norm": 0.6263030171394348,
+      "learning_rate": 1.2e-05,
+      "loss": 1.1443,
+      "step": 40
+    },
+    {
+      "epoch": 0.041257861635220126,
+      "grad_norm": 0.5903211832046509,
+      "learning_rate": 1.23e-05,
+      "loss": 1.1045,
+      "step": 41
+    },
+    {
+      "epoch": 0.04226415094339623,
+      "grad_norm": 0.6511650085449219,
+      "learning_rate": 1.2600000000000001e-05,
+      "loss": 1.049,
+      "step": 42
+    },
+    {
+      "epoch": 0.043270440251572326,
+      "grad_norm": 0.7136890292167664,
+      "learning_rate": 1.2900000000000002e-05,
+      "loss": 1.2288,
+      "step": 43
+    },
+    {
+      "epoch": 0.04427672955974843,
+      "grad_norm": 0.7521870732307434,
+      "learning_rate": 1.32e-05,
+      "loss": 1.2577,
+      "step": 44
+    },
+    {
+      "epoch": 0.045283018867924525,
+      "grad_norm": 0.7466375827789307,
+      "learning_rate": 1.3500000000000001e-05,
+      "loss": 1.2349,
+      "step": 45
+    },
+    {
+      "epoch": 0.04628930817610063,
+      "grad_norm": 0.845863401889801,
+      "learning_rate": 1.38e-05,
+      "loss": 1.3065,
+      "step": 46
+    },
+    {
+      "epoch": 0.04729559748427673,
+      "grad_norm": 0.945936381816864,
+      "learning_rate": 1.4100000000000002e-05,
+      "loss": 1.4172,
+      "step": 47
+    },
+    {
+      "epoch": 0.04830188679245283,
+      "grad_norm": 0.979352593421936,
+      "learning_rate": 1.4400000000000001e-05,
+      "loss": 1.2013,
+      "step": 48
+    },
+    {
+      "epoch": 0.04930817610062893,
+      "grad_norm": 1.2940466403961182,
+      "learning_rate": 1.47e-05,
+      "loss": 1.2903,
+      "step": 49
+    },
+    {
+      "epoch": 0.050314465408805034,
+      "grad_norm": 2.658918857574463,
+      "learning_rate": 1.5e-05,
+      "loss": 1.4156,
+      "step": 50
+    },
+    {
+      "epoch": 0.05132075471698113,
+      "grad_norm": 0.3177829682826996,
+      "learning_rate": 1.5300000000000003e-05,
+      "loss": 0.7362,
+      "step": 51
+    },
+    {
+      "epoch": 0.052327044025157234,
+      "grad_norm": 0.25816917419433594,
+      "learning_rate": 1.5600000000000003e-05,
+      "loss": 0.5831,
+      "step": 52
+    },
+    {
+      "epoch": 0.05333333333333334,
+      "grad_norm": 0.2627386748790741,
+      "learning_rate": 1.59e-05,
+      "loss": 0.5988,
+      "step": 53
+    },
+    {
+      "epoch": 0.05433962264150943,
+      "grad_norm": 0.2952287793159485,
+      "learning_rate": 1.62e-05,
+      "loss": 0.6925,
+      "step": 54
+    },
+    {
+      "epoch": 0.055345911949685536,
+      "grad_norm": 0.330160915851593,
+      "learning_rate": 1.65e-05,
+      "loss": 0.7455,
+      "step": 55
+    },
+    {
+      "epoch": 0.05635220125786163,
+      "grad_norm": 0.3190469443798065,
+      "learning_rate": 1.6800000000000002e-05,
+      "loss": 0.7658,
+      "step": 56
+    },
+    {
+      "epoch": 0.057358490566037736,
+      "grad_norm": 0.3036893308162689,
+      "learning_rate": 1.7100000000000002e-05,
+      "loss": 0.7393,
+      "step": 57
+    },
+    {
+      "epoch": 0.05836477987421384,
+      "grad_norm": 0.31386005878448486,
+      "learning_rate": 1.74e-05,
+      "loss": 0.7351,
+      "step": 58
+    },
+    {
+      "epoch": 0.059371069182389935,
+      "grad_norm": 0.36971819400787354,
+      "learning_rate": 1.77e-05,
+      "loss": 0.8898,
+      "step": 59
+    },
+    {
+      "epoch": 0.06037735849056604,
+      "grad_norm": 0.36234527826309204,
+      "learning_rate": 1.8e-05,
+      "loss": 0.8539,
+      "step": 60
+    },
+    {
+      "epoch": 0.06138364779874214,
+      "grad_norm": 0.2921310365200043,
+      "learning_rate": 1.83e-05,
+      "loss": 0.7897,
+      "step": 61
+    },
+    {
+      "epoch": 0.06238993710691824,
+      "grad_norm": 0.3270234763622284,
+      "learning_rate": 1.86e-05,
+      "loss": 0.872,
+      "step": 62
+    },
+    {
+      "epoch": 0.06339622641509433,
+      "grad_norm": 0.31730490922927856,
+      "learning_rate": 1.8900000000000002e-05,
+      "loss": 0.8473,
+      "step": 63
+    },
+    {
+      "epoch": 0.06440251572327044,
+      "grad_norm": 0.3125281035900116,
+      "learning_rate": 1.9200000000000003e-05,
+      "loss": 0.8964,
+      "step": 64
+    },
+    {
+      "epoch": 0.06540880503144654,
+      "grad_norm": 0.3257617652416229,
+      "learning_rate": 1.9500000000000003e-05,
+      "loss": 0.9189,
+      "step": 65
+    },
+    {
+      "epoch": 0.06641509433962264,
+      "grad_norm": 0.3161460757255554,
+      "learning_rate": 1.98e-05,
+      "loss": 0.8713,
+      "step": 66
+    },
+    {
+      "epoch": 0.06742138364779875,
+      "grad_norm": 0.31015118956565857,
+      "learning_rate": 2.01e-05,
+      "loss": 0.8749,
+      "step": 67
+    },
+    {
+      "epoch": 0.06842767295597484,
+      "grad_norm": 0.32371950149536133,
+      "learning_rate": 2.04e-05,
+      "loss": 0.8607,
+      "step": 68
+    },
+    {
+      "epoch": 0.06943396226415094,
+      "grad_norm": 0.31971532106399536,
+      "learning_rate": 2.0700000000000002e-05,
+      "loss": 0.9075,
+      "step": 69
+    },
+    {
+      "epoch": 0.07044025157232704,
+      "grad_norm": 0.3293505907058716,
+      "learning_rate": 2.1000000000000002e-05,
+      "loss": 0.8704,
+      "step": 70
+    },
+    {
+      "epoch": 0.07144654088050315,
+      "grad_norm": 0.3107757866382599,
+      "learning_rate": 2.1300000000000003e-05,
+      "loss": 0.9121,
+      "step": 71
+    },
+    {
+      "epoch": 0.07245283018867925,
+      "grad_norm": 0.3114102780818939,
+      "learning_rate": 2.16e-05,
+      "loss": 0.8762,
+      "step": 72
+    },
+    {
+      "epoch": 0.07345911949685535,
+      "grad_norm": 0.34217050671577454,
+      "learning_rate": 2.1900000000000004e-05,
+      "loss": 0.8136,
+      "step": 73
+    },
+    {
+      "epoch": 0.07446540880503144,
+      "grad_norm": 0.34302064776420593,
+      "learning_rate": 2.22e-05,
+      "loss": 0.9353,
+      "step": 74
+    },
+    {
+      "epoch": 0.07547169811320754,
+      "grad_norm": 0.3483302891254425,
+      "learning_rate": 2.25e-05,
+      "loss": 0.9131,
+      "step": 75
+    },
+    {
+      "epoch": 0.07647798742138365,
+      "grad_norm": 0.3389265537261963,
+      "learning_rate": 2.2800000000000002e-05,
+      "loss": 0.8294,
+      "step": 76
+    },
+    {
+      "epoch": 0.07748427672955975,
+      "grad_norm": 0.3670293390750885,
+      "learning_rate": 2.31e-05,
+      "loss": 0.9307,
+      "step": 77
+    },
+    {
+      "epoch": 0.07849056603773585,
+      "grad_norm": 0.3749659061431885,
+      "learning_rate": 2.3400000000000003e-05,
+      "loss": 0.974,
+      "step": 78
+    },
+    {
+      "epoch": 0.07949685534591194,
+      "grad_norm": 0.40262284874916077,
+      "learning_rate": 2.37e-05,
+      "loss": 0.864,
+      "step": 79
+    },
+    {
+      "epoch": 0.08050314465408805,
+      "grad_norm": 0.40775737166404724,
+      "learning_rate": 2.4e-05,
+      "loss": 0.9825,
+      "step": 80
+    },
+    {
+      "epoch": 0.08150943396226415,
+      "grad_norm": 0.43855974078178406,
+      "learning_rate": 2.4300000000000005e-05,
+      "loss": 1.0478,
+      "step": 81
+    },
+    {
+      "epoch": 0.08251572327044025,
+      "grad_norm": 0.43877044320106506,
+      "learning_rate": 2.46e-05,
+      "loss": 0.9031,
+      "step": 82
+    },
+    {
+      "epoch": 0.08352201257861636,
+      "grad_norm": 0.43200093507766724,
+      "learning_rate": 2.4900000000000002e-05,
+      "loss": 0.8508,
+      "step": 83
+    },
+    {
+      "epoch": 0.08452830188679246,
+      "grad_norm": 0.4515688717365265,
+      "learning_rate": 2.5200000000000003e-05,
+      "loss": 1.043,
+      "step": 84
+    },
+    {
+      "epoch": 0.08553459119496855,
+      "grad_norm": 0.48888376355171204,
+      "learning_rate": 2.55e-05,
+      "loss": 0.9322,
+      "step": 85
+    },
+    {
+      "epoch": 0.08654088050314465,
+      "grad_norm": 0.5086196064949036,
+      "learning_rate": 2.5800000000000004e-05,
+      "loss": 0.9251,
+      "step": 86
+    },
+    {
+      "epoch": 0.08754716981132075,
+      "grad_norm": 0.5476843118667603,
+      "learning_rate": 2.61e-05,
+      "loss": 0.9828,
+      "step": 87
+    },
+    {
+      "epoch": 0.08855345911949686,
+      "grad_norm": 0.5471930503845215,
+      "learning_rate": 2.64e-05,
+      "loss": 1.0144,
+      "step": 88
+    },
+    {
+      "epoch": 0.08955974842767296,
+      "grad_norm": 0.49611005187034607,
+      "learning_rate": 2.6700000000000005e-05,
+      "loss": 0.8959,
+      "step": 89
+    },
+    {
+      "epoch": 0.09056603773584905,
+      "grad_norm": 0.5451375842094421,
+      "learning_rate": 2.7000000000000002e-05,
+      "loss": 0.9002,
+      "step": 90
+    },
+    {
+      "epoch": 0.09157232704402515,
+      "grad_norm": 0.604302704334259,
+      "learning_rate": 2.7300000000000003e-05,
+      "loss": 0.9707,
+      "step": 91
+    },
+    {
+      "epoch": 0.09257861635220126,
+      "grad_norm": 0.6355348825454712,
+      "learning_rate": 2.76e-05,
+      "loss": 1.0894,
+      "step": 92
+    },
+    {
+      "epoch": 0.09358490566037736,
+      "grad_norm": 0.6959015130996704,
+      "learning_rate": 2.79e-05,
+      "loss": 1.1298,
+      "step": 93
+    },
+    {
+      "epoch": 0.09459119496855346,
+      "grad_norm": 0.7101635932922363,
+      "learning_rate": 2.8200000000000004e-05,
+      "loss": 1.0116,
+      "step": 94
+    },
+    {
+      "epoch": 0.09559748427672957,
+      "grad_norm": 0.7091540694236755,
+      "learning_rate": 2.85e-05,
+      "loss": 1.0213,
+      "step": 95
+    },
+    {
+      "epoch": 0.09660377358490566,
+      "grad_norm": 0.7268860340118408,
+      "learning_rate": 2.8800000000000002e-05,
+      "loss": 0.9775,
+      "step": 96
+    },
+    {
+      "epoch": 0.09761006289308176,
+      "grad_norm": 0.7862218022346497,
+      "learning_rate": 2.91e-05,
+      "loss": 1.1169,
+      "step": 97
+    },
+    {
+      "epoch": 0.09861635220125786,
+      "grad_norm": 0.9230831861495972,
+      "learning_rate": 2.94e-05,
+      "loss": 1.0902,
+      "step": 98
+    },
+    {
+      "epoch": 0.09962264150943397,
+      "grad_norm": 1.130565881729126,
+      "learning_rate": 2.9700000000000004e-05,
+      "loss": 1.2398,
+      "step": 99
+    },
+    {
+      "epoch": 0.10062893081761007,
+      "grad_norm": 1.6438910961151123,
+      "learning_rate": 3e-05,
+      "loss": 1.191,
+      "step": 100
+    },
+    {
+      "epoch": 0.10163522012578616,
+      "grad_norm": 0.2781599462032318,
+      "learning_rate": 3.03e-05,
+      "loss": 0.5496,
+      "step": 101
+    },
+    {
+      "epoch": 0.10264150943396226,
+      "grad_norm": 0.4129364788532257,
+      "learning_rate": 3.0600000000000005e-05,
+      "loss": 0.6212,
+      "step": 102
+    },
+    {
+      "epoch": 0.10364779874213836,
+      "grad_norm": 0.38384121656417847,
+      "learning_rate": 3.09e-05,
+      "loss": 0.6719,
+      "step": 103
+    },
+    {
+      "epoch": 0.10465408805031447,
+      "grad_norm": 0.29824548959732056,
+      "learning_rate": 3.1200000000000006e-05,
+      "loss": 0.6618,
+      "step": 104
+    },
+    {
+      "epoch": 0.10566037735849057,
+      "grad_norm": 0.3344751000404358,
+      "learning_rate": 3.15e-05,
+      "loss": 0.7009,
+      "step": 105
+    },
+    {
+      "epoch": 0.10666666666666667,
+      "grad_norm": 0.2800855338573456,
+      "learning_rate": 3.18e-05,
+      "loss": 0.6809,
+      "step": 106
+    },
+    {
+      "epoch": 0.10767295597484276,
+      "grad_norm": 0.2987403869628906,
+      "learning_rate": 3.21e-05,
+      "loss": 0.7403,
+      "step": 107
+    },
+    {
+      "epoch": 0.10867924528301887,
+      "grad_norm": 0.3085695207118988,
+      "learning_rate": 3.24e-05,
+      "loss": 0.7999,
+      "step": 108
+    },
+    {
+      "epoch": 0.10968553459119497,
+      "grad_norm": 0.36500483751296997,
+      "learning_rate": 3.27e-05,
+      "loss": 0.8602,
+      "step": 109
+    },
+    {
+      "epoch": 0.11069182389937107,
+      "grad_norm": 0.31299957633018494,
+      "learning_rate": 3.3e-05,
+      "loss": 0.8269,
+      "step": 110
+    },
+    {
+      "epoch": 0.11169811320754718,
+      "grad_norm": 0.34132349491119385,
+      "learning_rate": 3.33e-05,
+      "loss": 0.8163,
+      "step": 111
+    },
+    {
+      "epoch": 0.11270440251572326,
+      "grad_norm": 0.34863606095314026,
+      "learning_rate": 3.3600000000000004e-05,
+      "loss": 0.8515,
+      "step": 112
+    },
+    {
+      "epoch": 0.11371069182389937,
+      "grad_norm": 0.347277969121933,
+      "learning_rate": 3.39e-05,
+      "loss": 0.8504,
+      "step": 113
+    },
+    {
+      "epoch": 0.11471698113207547,
+      "grad_norm": 0.28663745522499084,
+      "learning_rate": 3.4200000000000005e-05,
+      "loss": 0.7537,
+      "step": 114
+    },
+    {
+      "epoch": 0.11572327044025157,
+      "grad_norm": 0.3243389427661896,
+      "learning_rate": 3.4500000000000005e-05,
+      "loss": 0.7972,
+      "step": 115
+    },
+    {
+      "epoch": 0.11672955974842768,
+      "grad_norm": 0.33570441603660583,
+      "learning_rate": 3.48e-05,
+      "loss": 0.9207,
+      "step": 116
+    },
+    {
+      "epoch": 0.11773584905660377,
+      "grad_norm": 0.329550564289093,
+      "learning_rate": 3.5100000000000006e-05,
+      "loss": 0.8054,
+      "step": 117
+    },
+    {
+      "epoch": 0.11874213836477987,
+      "grad_norm": 0.35539188981056213,
+      "learning_rate": 3.54e-05,
+      "loss": 0.8552,
+      "step": 118
+    },
+    {
+      "epoch": 0.11974842767295597,
+      "grad_norm": 0.3422081172466278,
+      "learning_rate": 3.57e-05,
+      "loss": 0.8242,
+      "step": 119
+    },
+    {
+      "epoch": 0.12075471698113208,
+      "grad_norm": 0.3479245603084564,
+      "learning_rate": 3.6e-05,
+      "loss": 0.9574,
+      "step": 120
+    },
+    {
+      "epoch": 0.12176100628930818,
+      "grad_norm": 0.3424389064311981,
+      "learning_rate": 3.63e-05,
+      "loss": 0.8586,
+      "step": 121
+    },
+    {
+      "epoch": 0.12276729559748428,
+      "grad_norm": 0.3260529339313507,
+      "learning_rate": 3.66e-05,
+      "loss": 0.8643,
+      "step": 122
+    },
+    {
+      "epoch": 0.12377358490566037,
+      "grad_norm": 0.32257863879203796,
+      "learning_rate": 3.69e-05,
+      "loss": 0.7959,
+      "step": 123
+    },
+    {
+      "epoch": 0.12477987421383648,
+      "grad_norm": 0.3997071087360382,
+      "learning_rate": 3.72e-05,
+      "loss": 0.9142,
+      "step": 124
+    },
+    {
+      "epoch": 0.12578616352201258,
+      "grad_norm": 0.34371042251586914,
+      "learning_rate": 3.7500000000000003e-05,
+      "loss": 0.8725,
+      "step": 125
+    },
+    {
+      "epoch": 0.12679245283018867,
+      "grad_norm": 0.3547852635383606,
+      "learning_rate": 3.7800000000000004e-05,
+      "loss": 0.8642,
+      "step": 126
+    },
+    {
+      "epoch": 0.12779874213836478,
+      "grad_norm": 0.39939579367637634,
+      "learning_rate": 3.8100000000000005e-05,
+      "loss": 0.8844,
+      "step": 127
+    },
+    {
+      "epoch": 0.12880503144654087,
+      "grad_norm": 0.43842682242393494,
+      "learning_rate": 3.8400000000000005e-05,
+      "loss": 0.9023,
+      "step": 128
+    },
+    {
+      "epoch": 0.129811320754717,
+      "grad_norm": 0.39390280842781067,
+      "learning_rate": 3.87e-05,
+      "loss": 0.8407,
+      "step": 129
+    },
+    {
+      "epoch": 0.13081761006289308,
+      "grad_norm": 0.4113910496234894,
+      "learning_rate": 3.9000000000000006e-05,
+      "loss": 0.8262,
+      "step": 130
+    },
+    {
+      "epoch": 0.13182389937106917,
+      "grad_norm": 0.4252544641494751,
+      "learning_rate": 3.93e-05,
+      "loss": 0.9217,
+      "step": 131
+    },
+    {
+      "epoch": 0.1328301886792453,
+      "grad_norm": 0.4059774577617645,
+      "learning_rate": 3.96e-05,
+      "loss": 0.891,
+      "step": 132
+    },
+    {
+      "epoch": 0.13383647798742138,
+      "grad_norm": 0.4556514024734497,
+      "learning_rate": 3.990000000000001e-05,
+      "loss": 0.8422,
+      "step": 133
+    },
+    {
+      "epoch": 0.1348427672955975,
+      "grad_norm": 0.4458742141723633,
+      "learning_rate": 4.02e-05,
+      "loss": 0.8469,
+      "step": 134
+    },
+    {
+      "epoch": 0.13584905660377358,
+      "grad_norm": 0.5275019407272339,
+      "learning_rate": 4.05e-05,
+      "loss": 0.9503,
+      "step": 135
+    },
+    {
+      "epoch": 0.13685534591194967,
+      "grad_norm": 0.5571326017379761,
+      "learning_rate": 4.08e-05,
+      "loss": 0.9006,
+      "step": 136
+    },
+    {
+      "epoch": 0.1378616352201258,
+      "grad_norm": 0.5211470723152161,
+      "learning_rate": 4.11e-05,
+      "loss": 0.896,
+      "step": 137
+    },
+    {
+      "epoch": 0.13886792452830188,
+      "grad_norm": 0.5354921221733093,
+      "learning_rate": 4.1400000000000003e-05,
+      "loss": 0.8761,
+      "step": 138
+    },
+    {
+      "epoch": 0.139874213836478,
+      "grad_norm": 0.5351850986480713,
+      "learning_rate": 4.1700000000000004e-05,
+      "loss": 0.913,
+      "step": 139
+    },
+    {
+      "epoch": 0.14088050314465408,
+      "grad_norm": 0.5430857539176941,
+      "learning_rate": 4.2000000000000004e-05,
+      "loss": 0.9542,
+      "step": 140
+    },
+    {
+      "epoch": 0.1418867924528302,
+      "grad_norm": 0.6346920728683472,
+      "learning_rate": 4.23e-05,
+      "loss": 0.9038,
+      "step": 141
+    },
+    {
+      "epoch": 0.1428930817610063,
+      "grad_norm": 0.6297568678855896,
+      "learning_rate": 4.2600000000000005e-05,
+      "loss": 0.9314,
+      "step": 142
+    },
+    {
+      "epoch": 0.14389937106918238,
+      "grad_norm": 0.699191689491272,
+      "learning_rate": 4.2900000000000006e-05,
+      "loss": 0.9658,
+      "step": 143
+    },
+    {
+      "epoch": 0.1449056603773585,
+      "grad_norm": 0.6862769722938538,
+      "learning_rate": 4.32e-05,
+      "loss": 1.0581,
+      "step": 144
+    },
+    {
+      "epoch": 0.1459119496855346,
+      "grad_norm": 0.7385261058807373,
+      "learning_rate": 4.35e-05,
+      "loss": 1.0378,
+      "step": 145
+    },
+    {
+      "epoch": 0.1469182389937107,
+      "grad_norm": 0.8822638988494873,
+      "learning_rate": 4.380000000000001e-05,
+      "loss": 1.0691,
+      "step": 146
+    },
+    {
+      "epoch": 0.1479245283018868,
+      "grad_norm": 0.8276723027229309,
+      "learning_rate": 4.41e-05,
+      "loss": 1.0017,
+      "step": 147
+    },
+    {
+      "epoch": 0.14893081761006288,
+      "grad_norm": 0.9372941851615906,
+      "learning_rate": 4.44e-05,
+      "loss": 1.1179,
+      "step": 148
+    },
+    {
+      "epoch": 0.149937106918239,
+      "grad_norm": 1.1694546937942505,
+      "learning_rate": 4.47e-05,
+      "loss": 1.0597,
+      "step": 149
+    },
+    {
+      "epoch": 0.1509433962264151,
+      "grad_norm": 2.0057129859924316,
+      "learning_rate": 4.5e-05,
+      "loss": 0.9425,
+      "step": 150
+    },
+    {
+      "epoch": 0.1509433962264151,
+      "eval_loss": 0.8647085428237915,
+      "eval_runtime": 72.4006,
+      "eval_samples_per_second": 46.229,
+      "eval_steps_per_second": 11.561,
+      "step": 150
+    },
+    {
+      "epoch": 0.1519496855345912,
+      "grad_norm": 0.288259893655777,
+      "learning_rate": 4.5299999999999997e-05,
+      "loss": 0.4872,
+      "step": 151
+    },
+    {
+      "epoch": 0.1529559748427673,
+      "grad_norm": 0.3159155249595642,
+      "learning_rate": 4.5600000000000004e-05,
+      "loss": 0.4908,
+      "step": 152
+    },
+    {
+      "epoch": 0.15396226415094338,
+      "grad_norm": 0.3822842836380005,
+      "learning_rate": 4.5900000000000004e-05,
+      "loss": 0.6747,
+      "step": 153
+    },
+    {
+      "epoch": 0.1549685534591195,
+      "grad_norm": 0.34155163168907166,
+      "learning_rate": 4.62e-05,
+      "loss": 0.6869,
+      "step": 154
+    },
+    {
+      "epoch": 0.1559748427672956,
+      "grad_norm": 0.30352383852005005,
+      "learning_rate": 4.6500000000000005e-05,
+      "loss": 0.7332,
+      "step": 155
+    },
+    {
+      "epoch": 0.1569811320754717,
+      "grad_norm": 0.27222898602485657,
+      "learning_rate": 4.6800000000000006e-05,
+      "loss": 0.6474,
+      "step": 156
+    },
+    {
+      "epoch": 0.1579874213836478,
+      "grad_norm": 0.25012049078941345,
+      "learning_rate": 4.71e-05,
+      "loss": 0.6288,
+      "step": 157
+    },
+    {
+      "epoch": 0.1589937106918239,
+      "grad_norm": 0.3045409023761749,
+      "learning_rate": 4.74e-05,
+      "loss": 0.7165,
+      "step": 158
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 0.35603249073028564,
+      "learning_rate": 4.770000000000001e-05,
+      "loss": 0.7622,
+      "step": 159
+    },
+    {
+      "epoch": 0.1610062893081761,
+      "grad_norm": 0.3265491724014282,
+      "learning_rate": 4.8e-05,
+      "loss": 0.7179,
+      "step": 160
+    },
+    {
+      "epoch": 0.1620125786163522,
+      "grad_norm": 0.3346395194530487,
+      "learning_rate": 4.83e-05,
+      "loss": 0.795,
+      "step": 161
+    },
+    {
+      "epoch": 0.1630188679245283,
+      "grad_norm": 0.30010277032852173,
+      "learning_rate": 4.860000000000001e-05,
+      "loss": 0.7067,
+      "step": 162
+    },
+    {
+      "epoch": 0.16402515723270442,
+      "grad_norm": 0.2951001524925232,
+      "learning_rate": 4.89e-05,
+      "loss": 0.7975,
+      "step": 163
+    },
+    {
+      "epoch": 0.1650314465408805,
+      "grad_norm": 0.31591150164604187,
+      "learning_rate": 4.92e-05,
+      "loss": 0.8215,
+      "step": 164
+    },
+    {
+      "epoch": 0.1660377358490566,
+      "grad_norm": 0.2965446412563324,
+      "learning_rate": 4.9500000000000004e-05,
+      "loss": 0.7495,
+      "step": 165
+    },
+    {
+      "epoch": 0.1670440251572327,
+      "grad_norm": 0.2988322675228119,
+      "learning_rate": 4.9800000000000004e-05,
+      "loss": 0.7753,
+      "step": 166
+    },
+    {
+      "epoch": 0.1680503144654088,
+      "grad_norm": 0.3438805937767029,
+      "learning_rate": 5.01e-05,
+      "loss": 0.8675,
+      "step": 167
+    },
+    {
+      "epoch": 0.16905660377358492,
+      "grad_norm": 0.3534359633922577,
+      "learning_rate": 5.0400000000000005e-05,
+      "loss": 0.8761,
+      "step": 168
+    },
+    {
+      "epoch": 0.170062893081761,
+      "grad_norm": 0.366860032081604,
+      "learning_rate": 5.0700000000000006e-05,
+      "loss": 0.7727,
+      "step": 169
+    },
+    {
+      "epoch": 0.1710691823899371,
+      "grad_norm": 0.325207382440567,
+      "learning_rate": 5.1e-05,
+      "loss": 0.8902,
+      "step": 170
+    },
+    {
+      "epoch": 0.1720754716981132,
+      "grad_norm": 0.3613484501838684,
+      "learning_rate": 5.13e-05,
+      "loss": 0.8652,
+      "step": 171
+    },
+    {
+      "epoch": 0.1730817610062893,
+      "grad_norm": 0.34094613790512085,
+      "learning_rate": 5.160000000000001e-05,
+      "loss": 0.8303,
+      "step": 172
+    },
+    {
+      "epoch": 0.17408805031446542,
+      "grad_norm": 0.3416270613670349,
+      "learning_rate": 5.19e-05,
+      "loss": 0.8074,
+      "step": 173
+    },
+    {
+      "epoch": 0.1750943396226415,
+      "grad_norm": 0.35190239548683167,
+      "learning_rate": 5.22e-05,
+      "loss": 0.8995,
+      "step": 174
+    },
+    {
+      "epoch": 0.1761006289308176,
+      "grad_norm": 0.355774849653244,
+      "learning_rate": 5.250000000000001e-05,
+      "loss": 0.7938,
+      "step": 175
+    },
+    {
+      "epoch": 0.17710691823899372,
+      "grad_norm": 0.40358009934425354,
+      "learning_rate": 5.28e-05,
+      "loss": 0.8999,
+      "step": 176
+    },
+    {
+      "epoch": 0.1781132075471698,
+      "grad_norm": 0.3395152986049652,
+      "learning_rate": 5.31e-05,
+      "loss": 0.8034,
+      "step": 177
+    },
+    {
+      "epoch": 0.17911949685534592,
+      "grad_norm": 0.3633538484573364,
+      "learning_rate": 5.340000000000001e-05,
+      "loss": 0.8634,
+      "step": 178
+    },
+    {
+      "epoch": 0.180125786163522,
+      "grad_norm": 0.367371141910553,
+      "learning_rate": 5.3700000000000004e-05,
+      "loss": 0.8588,
+      "step": 179
+    },
+    {
+      "epoch": 0.1811320754716981,
+      "grad_norm": 0.38628652691841125,
+      "learning_rate": 5.4000000000000005e-05,
+      "loss": 0.9101,
+      "step": 180
+    },
+    {
+      "epoch": 0.18213836477987422,
+      "grad_norm": 0.4008401036262512,
+      "learning_rate": 5.4300000000000005e-05,
+      "loss": 0.8333,
+      "step": 181
+    },
+    {
+      "epoch": 0.1831446540880503,
+      "grad_norm": 0.3967512547969818,
+      "learning_rate": 5.4600000000000006e-05,
+      "loss": 0.8003,
+      "step": 182
+    },
+    {
+      "epoch": 0.18415094339622642,
+      "grad_norm": 0.4281199276447296,
+      "learning_rate": 5.49e-05,
+      "loss": 0.8564,
+      "step": 183
+    },
+    {
+      "epoch": 0.1851572327044025,
+      "grad_norm": 0.44826948642730713,
+      "learning_rate": 5.52e-05,
+      "loss": 0.9245,
+      "step": 184
+    },
+    {
+      "epoch": 0.1861635220125786,
+      "grad_norm": 0.46945154666900635,
+      "learning_rate": 5.550000000000001e-05,
+      "loss": 0.9026,
+      "step": 185
+    },
+    {
+      "epoch": 0.18716981132075472,
+      "grad_norm": 0.5053539872169495,
+      "learning_rate": 5.58e-05,
+      "loss": 0.8419,
+      "step": 186
+    },
+    {
+      "epoch": 0.1881761006289308,
+      "grad_norm": 0.539570152759552,
+      "learning_rate": 5.61e-05,
+      "loss": 0.933,
+      "step": 187
+    },
+    {
+      "epoch": 0.18918238993710693,
+      "grad_norm": 0.5156318545341492,
+      "learning_rate": 5.640000000000001e-05,
+      "loss": 0.8187,
+      "step": 188
+    },
+    {
+      "epoch": 0.19018867924528302,
+      "grad_norm": 0.5218313932418823,
+      "learning_rate": 5.67e-05,
+      "loss": 0.9329,
+      "step": 189
+    },
+    {
+      "epoch": 0.19119496855345913,
+      "grad_norm": 0.5657551288604736,
+      "learning_rate": 5.7e-05,
+      "loss": 0.9744,
+      "step": 190
+    },
+    {
+      "epoch": 0.19220125786163522,
+      "grad_norm": 0.6356487274169922,
+      "learning_rate": 5.730000000000001e-05,
+      "loss": 1.1001,
+      "step": 191
+    },
+    {
+      "epoch": 0.1932075471698113,
+      "grad_norm": 0.5877925753593445,
+      "learning_rate": 5.7600000000000004e-05,
+      "loss": 0.8659,
+      "step": 192
+    },
+    {
+      "epoch": 0.19421383647798743,
+      "grad_norm": 0.6765493750572205,
+      "learning_rate": 5.7900000000000005e-05,
+      "loss": 0.9198,
+      "step": 193
+    },
+    {
+      "epoch": 0.19522012578616352,
+      "grad_norm": 0.7951050400733948,
+      "learning_rate": 5.82e-05,
+      "loss": 1.0845,
+      "step": 194
+    },
+    {
+      "epoch": 0.19622641509433963,
+      "grad_norm": 0.7103424668312073,
+      "learning_rate": 5.8500000000000006e-05,
+      "loss": 0.9394,
+      "step": 195
+    },
+    {
+      "epoch": 0.19723270440251572,
+      "grad_norm": 0.8180978298187256,
+      "learning_rate": 5.88e-05,
+      "loss": 1.0566,
+      "step": 196
+    },
+    {
+      "epoch": 0.1982389937106918,
+      "grad_norm": 0.87216717004776,
+      "learning_rate": 5.91e-05,
+      "loss": 1.0402,
+      "step": 197
+    },
+    {
+      "epoch": 0.19924528301886793,
+      "grad_norm": 0.9373968839645386,
+      "learning_rate": 5.940000000000001e-05,
+      "loss": 1.0354,
+      "step": 198
+    },
+    {
+      "epoch": 0.20025157232704402,
+      "grad_norm": 1.068832516670227,
+      "learning_rate": 5.97e-05,
+      "loss": 0.9874,
+      "step": 199
+    },
+    {
+      "epoch": 0.20125786163522014,
+      "grad_norm": 1.516627311706543,
+      "learning_rate": 6e-05,
+      "loss": 1.0475,
+      "step": 200
+    },
+    {
+      "epoch": 0.20226415094339623,
+      "grad_norm": 0.2923840880393982,
+      "learning_rate": 6.030000000000001e-05,
+      "loss": 0.5281,
+      "step": 201
+    },
+    {
+      "epoch": 0.20327044025157232,
+      "grad_norm": 0.29050689935684204,
+      "learning_rate": 6.06e-05,
+      "loss": 0.4798,
+      "step": 202
+    },
+    {
+      "epoch": 0.20427672955974843,
+      "grad_norm": 0.4115563929080963,
+      "learning_rate": 6.09e-05,
+      "loss": 0.6778,
+      "step": 203
+    },
+    {
+      "epoch": 0.20528301886792452,
+      "grad_norm": 0.3951669931411743,
+      "learning_rate": 6.120000000000001e-05,
+      "loss": 0.6916,
+      "step": 204
+    },
+    {
+      "epoch": 0.20628930817610064,
+      "grad_norm": 0.2904520332813263,
+      "learning_rate": 6.15e-05,
+      "loss": 0.6837,
+      "step": 205
+    },
+    {
+      "epoch": 0.20729559748427673,
+      "grad_norm": 0.2610388696193695,
+      "learning_rate": 6.18e-05,
+      "loss": 0.6065,
+      "step": 206
+    },
+    {
+      "epoch": 0.20830188679245282,
+      "grad_norm": 0.23877385258674622,
+      "learning_rate": 6.21e-05,
+      "loss": 0.5818,
+      "step": 207
+    },
+    {
+      "epoch": 0.20930817610062893,
+      "grad_norm": 0.3209669589996338,
+      "learning_rate": 6.240000000000001e-05,
+      "loss": 0.6615,
+      "step": 208
+    },
+    {
+      "epoch": 0.21031446540880502,
+      "grad_norm": 0.373333603143692,
+      "learning_rate": 6.27e-05,
+      "loss": 0.7769,
+      "step": 209
+    },
+    {
+      "epoch": 0.21132075471698114,
+      "grad_norm": 0.3545984923839569,
+      "learning_rate": 6.3e-05,
+      "loss": 0.7624,
+      "step": 210
+    },
+    {
+      "epoch": 0.21232704402515723,
+      "grad_norm": 0.3442912697792053,
+      "learning_rate": 6.330000000000001e-05,
+      "loss": 0.744,
+      "step": 211
+    },
+    {
+      "epoch": 0.21333333333333335,
+      "grad_norm": 0.3227010667324066,
+      "learning_rate": 6.36e-05,
+      "loss": 0.8385,
+      "step": 212
+    },
+    {
+      "epoch": 0.21433962264150944,
+      "grad_norm": 0.2944405674934387,
+      "learning_rate": 6.39e-05,
+      "loss": 0.7201,
+      "step": 213
+    },
+    {
+      "epoch": 0.21534591194968553,
+      "grad_norm": 0.3218012750148773,
+      "learning_rate": 6.42e-05,
+      "loss": 0.799,
+      "step": 214
+    },
+    {
+      "epoch": 0.21635220125786164,
+      "grad_norm": 0.3213767111301422,
+      "learning_rate": 6.450000000000001e-05,
+      "loss": 0.7873,
+      "step": 215
+    },
+    {
+      "epoch": 0.21735849056603773,
+      "grad_norm": 0.3547811806201935,
+      "learning_rate": 6.48e-05,
+      "loss": 0.8379,
+      "step": 216
+    },
+    {
+      "epoch": 0.21836477987421385,
+      "grad_norm": 0.3962251543998718,
+      "learning_rate": 6.510000000000001e-05,
+      "loss": 0.8377,
+      "step": 217
+    },
+    {
+      "epoch": 0.21937106918238994,
+      "grad_norm": 0.33094266057014465,
+      "learning_rate": 6.54e-05,
+      "loss": 0.7869,
+      "step": 218
+    },
+    {
+      "epoch": 0.22037735849056603,
+      "grad_norm": 0.3101328909397125,
+      "learning_rate": 6.57e-05,
+      "loss": 0.805,
+      "step": 219
+    },
+    {
+      "epoch": 0.22138364779874214,
+      "grad_norm": 0.3209860622882843,
+      "learning_rate": 6.6e-05,
+      "loss": 0.8163,
+      "step": 220
+    },
+    {
+      "epoch": 0.22238993710691823,
+      "grad_norm": 0.340444952249527,
+      "learning_rate": 6.630000000000001e-05,
+      "loss": 0.9052,
+      "step": 221
+    },
+    {
+      "epoch": 0.22339622641509435,
+      "grad_norm": 0.36003029346466064,
+      "learning_rate": 6.66e-05,
+      "loss": 0.8657,
+      "step": 222
+    },
+    {
+      "epoch": 0.22440251572327044,
+      "grad_norm": 0.40881285071372986,
+      "learning_rate": 6.69e-05,
+      "loss": 0.8777,
+      "step": 223
+    },
+    {
+      "epoch": 0.22540880503144653,
+      "grad_norm": 0.3710480332374573,
+      "learning_rate": 6.720000000000001e-05,
+      "loss": 0.8709,
+      "step": 224
+    },
+    {
+      "epoch": 0.22641509433962265,
+      "grad_norm": 0.35468587279319763,
+      "learning_rate": 6.75e-05,
+      "loss": 0.8358,
+      "step": 225
+    },
+    {
+      "epoch": 0.22742138364779874,
+      "grad_norm": 0.3594996929168701,
+      "learning_rate": 6.78e-05,
+      "loss": 0.8274,
+      "step": 226
+    },
+    {
+      "epoch": 0.22842767295597485,
+      "grad_norm": 0.34879687428474426,
+      "learning_rate": 6.81e-05,
+      "loss": 0.7641,
+      "step": 227
+    },
+    {
+      "epoch": 0.22943396226415094,
+      "grad_norm": 0.3559417128562927,
+      "learning_rate": 6.840000000000001e-05,
+      "loss": 0.8445,
+      "step": 228
+    },
+    {
+      "epoch": 0.23044025157232703,
+      "grad_norm": 0.3670806884765625,
+      "learning_rate": 6.87e-05,
+      "loss": 0.7682,
+      "step": 229
+    },
+    {
+      "epoch": 0.23144654088050315,
+      "grad_norm": 0.36322924494743347,
+      "learning_rate": 6.900000000000001e-05,
+      "loss": 0.8262,
+      "step": 230
+    },
+    {
+      "epoch": 0.23245283018867924,
+      "grad_norm": 0.3891243040561676,
+      "learning_rate": 6.93e-05,
+      "loss": 0.8628,
+      "step": 231
+    },
+    {
+      "epoch": 0.23345911949685536,
+      "grad_norm": 0.39287006855010986,
+      "learning_rate": 6.96e-05,
+      "loss": 0.7999,
+      "step": 232
+    },
+    {
+      "epoch": 0.23446540880503144,
+      "grad_norm": 0.45216286182403564,
+      "learning_rate": 6.99e-05,
+      "loss": 0.8008,
+      "step": 233
+    },
+    {
+      "epoch": 0.23547169811320753,
+      "grad_norm": 0.41083312034606934,
+      "learning_rate": 7.020000000000001e-05,
+      "loss": 0.7043,
+      "step": 234
+    },
+    {
+      "epoch": 0.23647798742138365,
+      "grad_norm": 0.467481404542923,
+      "learning_rate": 7.05e-05,
+      "loss": 0.9042,
+      "step": 235
+    },
+    {
+      "epoch": 0.23748427672955974,
+      "grad_norm": 0.47072046995162964,
+      "learning_rate": 7.08e-05,
+      "loss": 0.8493,
+      "step": 236
+    },
+    {
+      "epoch": 0.23849056603773586,
+      "grad_norm": 0.5148504972457886,
+      "learning_rate": 7.110000000000001e-05,
+      "loss": 0.8944,
+      "step": 237
+    },
+    {
+      "epoch": 0.23949685534591195,
+      "grad_norm": 0.4873788356781006,
+      "learning_rate": 7.14e-05,
+      "loss": 0.8484,
+      "step": 238
+    },
+    {
+      "epoch": 0.24050314465408806,
+      "grad_norm": 0.5475208163261414,
+      "learning_rate": 7.170000000000001e-05,
+      "loss": 0.938,
+      "step": 239
+    },
+    {
+      "epoch": 0.24150943396226415,
+      "grad_norm": 0.562898576259613,
+      "learning_rate": 7.2e-05,
+      "loss": 0.968,
+      "step": 240
+    },
+    {
+      "epoch": 0.24251572327044024,
+      "grad_norm": 0.5775845050811768,
+      "learning_rate": 7.230000000000001e-05,
+      "loss": 0.9167,
+      "step": 241
+    },
+    {
+      "epoch": 0.24352201257861636,
+      "grad_norm": 0.6406755447387695,
+      "learning_rate": 7.26e-05,
+      "loss": 0.9806,
+      "step": 242
+    },
+    {
+      "epoch": 0.24452830188679245,
+      "grad_norm": 0.7658033967018127,
+      "learning_rate": 7.290000000000001e-05,
+      "loss": 1.0084,
+      "step": 243
+    },
+    {
+      "epoch": 0.24553459119496857,
+      "grad_norm": 0.6914187669754028,
+      "learning_rate": 7.32e-05,
+      "loss": 1.0369,
+      "step": 244
+    },
+    {
+      "epoch": 0.24654088050314465,
+      "grad_norm": 0.7622518539428711,
+      "learning_rate": 7.35e-05,
+      "loss": 1.0694,
+      "step": 245
+    },
+    {
+      "epoch": 0.24754716981132074,
+      "grad_norm": 0.7900513410568237,
+      "learning_rate": 7.38e-05,
+      "loss": 0.9953,
+      "step": 246
+    },
+    {
+      "epoch": 0.24855345911949686,
+      "grad_norm": 0.7765729427337646,
+      "learning_rate": 7.410000000000001e-05,
+      "loss": 0.9145,
+      "step": 247
+    },
+    {
+      "epoch": 0.24955974842767295,
+      "grad_norm": 0.9165322780609131,
+      "learning_rate": 7.44e-05,
+      "loss": 0.9706,
+      "step": 248
+    },
+    {
+      "epoch": 0.25056603773584907,
+      "grad_norm": 1.0286595821380615,
+      "learning_rate": 7.47e-05,
+      "loss": 0.885,
+      "step": 249
+    },
+    {
+      "epoch": 0.25157232704402516,
+      "grad_norm": 1.5639967918395996,
+      "learning_rate": 7.500000000000001e-05,
+      "loss": 1.1141,
+      "step": 250
+    },
+    {
+      "epoch": 0.25257861635220125,
+      "grad_norm": 0.2546844184398651,
+      "learning_rate": 7.53e-05,
+      "loss": 0.4977,
+      "step": 251
+    },
+    {
+      "epoch": 0.25358490566037734,
+      "grad_norm": 0.29418128728866577,
+      "learning_rate": 7.560000000000001e-05,
+      "loss": 0.601,
+      "step": 252
+    },
+    {
+      "epoch": 0.2545911949685535,
+      "grad_norm": 0.25388413667678833,
+      "learning_rate": 7.590000000000002e-05,
+      "loss": 0.5847,
+      "step": 253
+    },
+    {
+      "epoch": 0.25559748427672957,
+      "grad_norm": 0.2464444786310196,
+      "learning_rate": 7.620000000000001e-05,
+      "loss": 0.6033,
+      "step": 254
+    },
+    {
+      "epoch": 0.25660377358490566,
+      "grad_norm": 0.2632049024105072,
+      "learning_rate": 7.65e-05,
+      "loss": 0.679,
+      "step": 255
+    },
+    {
+      "epoch": 0.25761006289308175,
+      "grad_norm": 0.2602216899394989,
+      "learning_rate": 7.680000000000001e-05,
+      "loss": 0.6452,
+      "step": 256
+    },
+    {
+      "epoch": 0.25861635220125784,
+      "grad_norm": 0.2294234186410904,
+      "learning_rate": 7.71e-05,
+      "loss": 0.6278,
+      "step": 257
+    },
+    {
+      "epoch": 0.259622641509434,
+      "grad_norm": 0.25241509079933167,
+      "learning_rate": 7.74e-05,
+      "loss": 0.6991,
+      "step": 258
+    },
+    {
+      "epoch": 0.26062893081761007,
+      "grad_norm": 0.31433945894241333,
+      "learning_rate": 7.77e-05,
+      "loss": 0.7421,
+      "step": 259
+    },
+    {
+      "epoch": 0.26163522012578616,
+      "grad_norm": 0.27750471234321594,
+      "learning_rate": 7.800000000000001e-05,
+      "loss": 0.7088,
+      "step": 260
+    },
+    {
+      "epoch": 0.26264150943396225,
+      "grad_norm": 0.2663845121860504,
+      "learning_rate": 7.83e-05,
+      "loss": 0.6647,
+      "step": 261
+    },
+    {
+      "epoch": 0.26364779874213834,
+      "grad_norm": 0.2817147374153137,
+      "learning_rate": 7.86e-05,
+      "loss": 0.7404,
+      "step": 262
+    },
+    {
+      "epoch": 0.2646540880503145,
+      "grad_norm": 0.2903977632522583,
+      "learning_rate": 7.890000000000001e-05,
+      "loss": 0.7516,
+      "step": 263
+    },
+    {
+      "epoch": 0.2656603773584906,
+      "grad_norm": 0.2894536554813385,
+      "learning_rate": 7.92e-05,
+      "loss": 0.7456,
+      "step": 264
+    },
+    {
+      "epoch": 0.26666666666666666,
+      "grad_norm": 0.3023775517940521,
+      "learning_rate": 7.950000000000001e-05,
+      "loss": 0.7271,
+      "step": 265
+    },
+    {
+      "epoch": 0.26767295597484275,
+      "grad_norm": 0.3151356279850006,
+      "learning_rate": 7.980000000000002e-05,
+      "loss": 0.7934,
+      "step": 266
+    },
+    {
+      "epoch": 0.26867924528301884,
+      "grad_norm": 0.33253538608551025,
+      "learning_rate": 8.010000000000001e-05,
+      "loss": 0.8091,
+      "step": 267
+    },
+    {
+      "epoch": 0.269685534591195,
+      "grad_norm": 0.31868165731430054,
+      "learning_rate": 8.04e-05,
+      "loss": 0.8129,
+      "step": 268
+    },
+    {
+      "epoch": 0.2706918238993711,
+      "grad_norm": 0.310516893863678,
+      "learning_rate": 8.07e-05,
+      "loss": 0.8405,
+      "step": 269
+    },
+    {
+      "epoch": 0.27169811320754716,
+      "grad_norm": 0.33763372898101807,
+      "learning_rate": 8.1e-05,
+      "loss": 0.8847,
+      "step": 270
+    },
+    {
+      "epoch": 0.27270440251572325,
+      "grad_norm": 0.3426532447338104,
+      "learning_rate": 8.13e-05,
+      "loss": 0.8414,
+      "step": 271
+    },
+    {
+      "epoch": 0.27371069182389934,
+      "grad_norm": 0.32909977436065674,
+      "learning_rate": 8.16e-05,
+      "loss": 0.7759,
+      "step": 272
+    },
+    {
+      "epoch": 0.2747169811320755,
+      "grad_norm": 0.32656487822532654,
+      "learning_rate": 8.190000000000001e-05,
+      "loss": 0.8483,
+      "step": 273
+    },
+    {
+      "epoch": 0.2757232704402516,
+      "grad_norm": 0.34775760769844055,
+      "learning_rate": 8.22e-05,
+      "loss": 0.8266,
+      "step": 274
+    },
+    {
+      "epoch": 0.27672955974842767,
+      "grad_norm": 0.3375285565853119,
+      "learning_rate": 8.25e-05,
+      "loss": 0.8038,
+      "step": 275
+    },
+    {
+      "epoch": 0.27773584905660376,
+      "grad_norm": 0.33193573355674744,
+      "learning_rate": 8.280000000000001e-05,
+      "loss": 0.8453,
+      "step": 276
+    },
+    {
+      "epoch": 0.2787421383647799,
+      "grad_norm": 0.3424042761325836,
+      "learning_rate": 8.31e-05,
+      "loss": 0.8105,
+      "step": 277
+    },
+    {
+      "epoch": 0.279748427672956,
+      "grad_norm": 0.34351646900177,
+      "learning_rate": 8.340000000000001e-05,
+      "loss": 0.9062,
+      "step": 278
+    },
+    {
+      "epoch": 0.2807547169811321,
+      "grad_norm": 0.3484840393066406,
+      "learning_rate": 8.370000000000002e-05,
+      "loss": 0.8524,
+      "step": 279
+    },
+    {
+      "epoch": 0.28176100628930817,
+      "grad_norm": 0.4118354022502899,
+      "learning_rate": 8.400000000000001e-05,
+      "loss": 0.8195,
+      "step": 280
+    },
+    {
+      "epoch": 0.28276729559748426,
+      "grad_norm": 0.3858529329299927,
+      "learning_rate": 8.43e-05,
+      "loss": 0.8834,
+      "step": 281
+    },
+    {
+      "epoch": 0.2837735849056604,
+      "grad_norm": 0.43352559208869934,
+      "learning_rate": 8.46e-05,
+      "loss": 0.9224,
+      "step": 282
+    },
+    {
+      "epoch": 0.2847798742138365,
+      "grad_norm": 0.4096381664276123,
+      "learning_rate": 8.49e-05,
+      "loss": 0.8505,
+      "step": 283
+    },
+    {
+      "epoch": 0.2857861635220126,
+      "grad_norm": 0.4414840638637543,
+      "learning_rate": 8.520000000000001e-05,
+      "loss": 0.8873,
+      "step": 284
+    },
+    {
+      "epoch": 0.28679245283018867,
+      "grad_norm": 0.42483052611351013,
+      "learning_rate": 8.55e-05,
+      "loss": 0.844,
+      "step": 285
+    },
+    {
+      "epoch": 0.28779874213836476,
+      "grad_norm": 0.47505491971969604,
+      "learning_rate": 8.580000000000001e-05,
+      "loss": 0.8902,
+      "step": 286
+    },
+    {
+      "epoch": 0.2888050314465409,
+      "grad_norm": 0.4909922778606415,
+      "learning_rate": 8.61e-05,
+      "loss": 0.9051,
+      "step": 287
+    },
+    {
+      "epoch": 0.289811320754717,
+      "grad_norm": 0.47221359610557556,
+      "learning_rate": 8.64e-05,
+      "loss": 0.8076,
+      "step": 288
+    },
+    {
+      "epoch": 0.2908176100628931,
+      "grad_norm": 0.492624431848526,
+      "learning_rate": 8.67e-05,
+      "loss": 0.863,
+      "step": 289
+    },
+    {
+      "epoch": 0.2918238993710692,
+      "grad_norm": 0.5304532647132874,
+      "learning_rate": 8.7e-05,
+      "loss": 0.8684,
+      "step": 290
+    },
+    {
+      "epoch": 0.29283018867924526,
+      "grad_norm": 0.5642457604408264,
+      "learning_rate": 8.730000000000001e-05,
+      "loss": 0.9337,
+      "step": 291
+    },
+    {
+      "epoch": 0.2938364779874214,
+      "grad_norm": 0.5815471410751343,
+      "learning_rate": 8.760000000000002e-05,
+      "loss": 0.9618,
+      "step": 292
+    },
+    {
+      "epoch": 0.2948427672955975,
+      "grad_norm": 0.6020749807357788,
+      "learning_rate": 8.790000000000001e-05,
+      "loss": 0.9948,
+      "step": 293
+    },
+    {
+      "epoch": 0.2958490566037736,
+      "grad_norm": 0.5898910164833069,
+      "learning_rate": 8.82e-05,
+      "loss": 0.9052,
+      "step": 294
+    },
+    {
+      "epoch": 0.2968553459119497,
+      "grad_norm": 0.7975096106529236,
+      "learning_rate": 8.85e-05,
+      "loss": 0.9531,
+      "step": 295
+    },
+    {
+      "epoch": 0.29786163522012576,
+      "grad_norm": 0.821437418460846,
+      "learning_rate": 8.88e-05,
+      "loss": 1.0587,
+      "step": 296
+    },
+    {
+      "epoch": 0.2988679245283019,
+      "grad_norm": 0.8027580976486206,
+      "learning_rate": 8.910000000000001e-05,
+      "loss": 1.0143,
+      "step": 297
+    },
+    {
+      "epoch": 0.299874213836478,
+      "grad_norm": 0.8040540218353271,
+      "learning_rate": 8.94e-05,
+      "loss": 0.9964,
+      "step": 298
+    },
+    {
+      "epoch": 0.3008805031446541,
+      "grad_norm": 0.9765298962593079,
+      "learning_rate": 8.970000000000001e-05,
+      "loss": 1.0449,
+      "step": 299
+    },
+    {
+      "epoch": 0.3018867924528302,
+      "grad_norm": 1.631256341934204,
+      "learning_rate": 9e-05,
+      "loss": 0.9422,
+      "step": 300
+    },
+    {
+      "epoch": 0.3018867924528302,
+      "eval_loss": 0.8408719301223755,
+      "eval_runtime": 72.1015,
+      "eval_samples_per_second": 46.421,
+      "eval_steps_per_second": 11.609,
+      "step": 300
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 600,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 300,
+  "stateful_callbacks": {
+    "EarlyStoppingCallback": {
+      "args": {
+        "early_stopping_patience": 4,
+        "early_stopping_threshold": 0.0
+      },
+      "attributes": {
+        "early_stopping_patience_counter": 0
+      }
+    },
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.7158725269716992e+17,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0c5ef83948df5db42f26b475e9297655348dd354de8f9033fdfe8214e9d2b6f1
+size 6840

last-checkpoint/vocab.json ADDED

The diff for this file is too large to render.