FINAL BENCHMARKING

  • Time to First Token (TTFT): 0.001 s
  • Time Per Output Token (TPOT): 41.83 ms/token
  • Throughput: 24.35 tokens/s
  • Average Token Latency: 41.92 ms/token
  • Total Generation Time: 18.427 s
  • Input Tokenization Time: 0.009 s
  • Input Tokens: 1909
  • Output Tokens: 443
  • Total Tokens: 2352
  • Memory Usage (GPU): 3.38 GB
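
For reference, the derived metrics above (TPOT, throughput, average latency) can be computed from raw timings roughly as sketched below. The function and variable names are hypothetical, and the exact measurement methodology used for this benchmark run is not specified here, so treat this as an illustration of the formulas rather than the script that produced the numbers.

```python
def decode_metrics(ttft_s, total_gen_s, n_output_tokens):
    """Derive per-token metrics from raw generation timings.

    ttft_s          -- time to first token, in seconds (prefill latency)
    total_gen_s     -- total generation wall-clock time, in seconds
    n_output_tokens -- number of output tokens produced
    """
    # TPOT conventionally excludes the first token: the remaining
    # tokens are produced during (total_gen_s - ttft_s).
    tpot_ms = (total_gen_s - ttft_s) / max(n_output_tokens - 1, 1) * 1000
    # Throughput counts all output tokens over the whole generation window.
    throughput = n_output_tokens / total_gen_s
    # Average token latency spreads total time evenly across all output tokens.
    avg_latency_ms = total_gen_s / n_output_tokens * 1000
    return tpot_ms, throughput, avg_latency_ms

# Example with round numbers (not the benchmark run above):
tpot, tps, avg = decode_metrics(ttft_s=0.5, total_gen_s=10.5, n_output_tokens=101)
print(round(tpot, 2), round(tps, 2), round(avg, 2))  # 100.0 9.62 103.96
```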

Uploaded model

  • Developed by: vietphuon
  • License: apache-2.0
  • Finetuned from model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.


Model tree for vietphuon/Llama-3.2-1B-Instruct-bnb-4bit-alpaca-then-quizgen-241016-1

Collection including vietphuon/Llama-3.2-1B-Instruct-bnb-4bit-alpaca-then-quizgen-241016-1