---
base_model: amazingvince/zephyr-smol_llama-100m-sft-full
inference: false
license: apache-2.0
model-index:
- name: zephyr-smol_llama-100m-sft-full
  results: []
model_creator: amazingvince
model_name: zephyr-smol_llama-100m-sft-full
pipeline_tag: text-generation
quantized_by: afrideva
tags:
- generated_from_trainer
- gguf
- ggml
- quantized
- q2_k
- q3_k_m
- q4_k_m
- q5_k_m
- q6_k
- q8_0
---

# amazingvince/zephyr-smol_llama-100m-sft-full-GGUF

Quantized GGUF model files for [zephyr-smol_llama-100m-sft-full](https://huggingface.co./amazingvince/zephyr-smol_llama-100m-sft-full) from [amazingvince](https://huggingface.co./amazingvince).

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [zephyr-smol_llama-100m-sft-full.fp16.gguf](https://huggingface.co./afrideva/zephyr-smol_llama-100m-sft-full-GGUF/resolve/main/zephyr-smol_llama-100m-sft-full.fp16.gguf) | fp16 | 204.25 MB |
| [zephyr-smol_llama-100m-sft-full.q2_k.gguf](https://huggingface.co./afrideva/zephyr-smol_llama-100m-sft-full-GGUF/resolve/main/zephyr-smol_llama-100m-sft-full.q2_k.gguf) | q2_k | 51.90 MB |
| [zephyr-smol_llama-100m-sft-full.q3_k_m.gguf](https://huggingface.co./afrideva/zephyr-smol_llama-100m-sft-full-GGUF/resolve/main/zephyr-smol_llama-100m-sft-full.q3_k_m.gguf) | q3_k_m | 58.04 MB |
| [zephyr-smol_llama-100m-sft-full.q4_k_m.gguf](https://huggingface.co./afrideva/zephyr-smol_llama-100m-sft-full-GGUF/resolve/main/zephyr-smol_llama-100m-sft-full.q4_k_m.gguf) | q4_k_m | 66.38 MB |
| [zephyr-smol_llama-100m-sft-full.q5_k_m.gguf](https://huggingface.co./afrideva/zephyr-smol_llama-100m-sft-full-GGUF/resolve/main/zephyr-smol_llama-100m-sft-full.q5_k_m.gguf) | q5_k_m | 75.31 MB |
| [zephyr-smol_llama-100m-sft-full.q6_k.gguf](https://huggingface.co./afrideva/zephyr-smol_llama-100m-sft-full-GGUF/resolve/main/zephyr-smol_llama-100m-sft-full.q6_k.gguf) | q6_k | 84.80 MB |
| [zephyr-smol_llama-100m-sft-full.q8_0.gguf](https://huggingface.co./afrideva/zephyr-smol_llama-100m-sft-full-GGUF/resolve/main/zephyr-smol_llama-100m-sft-full.q8_0.gguf) | q8_0 | 109.33 MB |

## Original Model Card:

# zephyr-smol_llama-100m-sft-full

This model is a fine-tuned version of [BEE-spoke-data/smol_llama-101M-GQA](https://huggingface.co./BEE-spoke-data/smol_llama-101M-GQA) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.9579

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.9642        | 0.7   | 1141 | 1.9578          |

### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1
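
### Hyperparameters as code (sketch)

For readers who want the settings above in code form, here is a minimal, illustrative `transformers.TrainingArguments` sketch derived from the hyperparameter list. It is not the author's actual training script; `output_dir` is an assumed placeholder, and the totals follow from 16 per device × 2 GPUs × 4 accumulation steps = 128 for training (16 × 2 = 32 for evaluation).

```python
# Illustrative sketch only: the card's listed hyperparameters expressed
# as transformers.TrainingArguments. NOT the author's training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-smol_llama-100m-sft-full",  # assumed placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=16,  # x 2 GPUs x 4 accum steps = 128 total
    per_device_eval_batch_size=16,   # x 2 GPUs = 32 total
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    seed=42,
    adam_beta1=0.9,      # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```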
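
## Example usage of the GGUF files (sketch)

Below is a minimal sketch of downloading one of the quantized files above and running it with `llama-cpp-python`. The Zephyr-style `<|user|>`/`<|assistant|>` prompt format is an assumption based on the base model's naming; the card itself does not specify a chat template.

```python
# Minimal sketch: fetch the q4_k_m file from this repo and run it with
# llama-cpp-python (pip install llama-cpp-python huggingface_hub).
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="afrideva/zephyr-smol_llama-100m-sft-full-GGUF",
    filename="zephyr-smol_llama-100m-sft-full.q4_k_m.gguf",
)

llm = Llama(model_path=model_path, n_ctx=1024)  # small model, small context

# Assumption: Zephyr-style chat formatting; adjust if your results look off.
prompt = "<|user|>\nWhat is quantization?</s>\n<|assistant|>\n"
out = llm(prompt, max_tokens=128, stop=["</s>"])
print(out["choices"][0]["text"])
```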