FINAL BENCHMARKING

  • Time to First Token (TTFT): 0.001 s
  • Time Per Output Token (TPOT): 41.83 ms/token
  • Throughput: 24.35 tokens/s
  • Average Token Latency: 41.92 ms/token
  • Total Generation Time: 18.427 s
  • Input Tokenization Time: 0.009 s
  • Input Tokens: 1909
  • Output Tokens: 443
  • Total Tokens: 2352
  • Memory Usage (GPU): 3.38 GB
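
For reference, the derived metrics above (TPOT, throughput, average latency) can be computed from raw timings roughly as sketched below. The function and variable names are hypothetical, and the exact measurement methodology used for this benchmark run is not specified here, so treat this as an illustration of the formulas rather than the script that produced the numbers.

```python
def decode_metrics(ttft_s, total_gen_s, n_output_tokens):
    """Derive per-token metrics from raw generation timings.

    ttft_s          -- time to first token, in seconds (prefill latency)
    total_gen_s     -- total generation wall-clock time, in seconds
    n_output_tokens -- number of output tokens produced
    """
    # TPOT conventionally excludes the first token: the remaining
    # tokens are produced during (total_gen_s - ttft_s).
    tpot_ms = (total_gen_s - ttft_s) / max(n_output_tokens - 1, 1) * 1000
    # Throughput counts all output tokens over the whole generation window.
    throughput = n_output_tokens / total_gen_s
    # Average token latency spreads total time evenly across all output tokens.
    avg_latency_ms = total_gen_s / n_output_tokens * 1000
    return tpot_ms, throughput, avg_latency_ms

# Example with round numbers (not the benchmark run above):
tpot, tps, avg = decode_metrics(ttft_s=0.5, total_gen_s=10.5, n_output_tokens=101)
print(round(tpot, 2), round(tps, 2), round(avg, 2))  # 100.0 9.62 103.96
```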

Uploaded model

  • Developed by: vietphuon
  • License: apache-2.0
  • Finetuned from model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.


Model tree for vietphuon/Llama-3.2-1B-Instruct-bnb-4bit-alpaca-then-quizgen-241016-1

Collection including vietphuon/Llama-3.2-1B-Instruct-bnb-4bit-alpaca-then-quizgen-241016-1