
distilbert-base-uncased: edu classifier

This is a (rare) encoder that supports Flash Attention 2 (FA2)! For faster inference on a supported GPU, install FA2 and pass attn_implementation="flash_attention_2" when loading the model.
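A minimal sketch of loading the model with FA2 when the environment supports it and falling back to the default attention otherwise. It assumes a CUDA GPU and the `flash-attn` package, and requests `torch.bfloat16` because the FA2 kernels only run in fp16 or bf16. The repo ID is taken from this card's model tree; the helper names `fa2_load_kwargs` and `load_classifier` are hypothetical, not part of any library.

```python
import importlib.util

MODEL_ID = "pszemraj/distilbert-base-uncased-edu-classifier"


def fa2_load_kwargs(cuda_available: bool, flash_attn_installed: bool) -> dict:
    """Return extra from_pretrained() kwargs; FA2 needs a CUDA GPU and the flash-attn package."""
    if cuda_available and flash_attn_installed:
        # FA2 kernels only support fp16/bf16, so request bf16 weights (resolved to a torch dtype later)
        return {"attn_implementation": "flash_attention_2", "torch_dtype": "bfloat16"}
    return {}  # fall back to the default attention implementation


def load_classifier(model_id: str = MODEL_ID):
    """Load tokenizer and model, enabling FA2 only when the environment supports it."""
    import torch  # imported lazily so fa2_load_kwargs stays dependency-free
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    kwargs = fa2_load_kwargs(
        cuda_available=torch.cuda.is_available(),
        flash_attn_installed=importlib.util.find_spec("flash_attn") is not None,
    )
    if "torch_dtype" in kwargs:
        kwargs["torch_dtype"] = getattr(torch, kwargs["torch_dtype"])  # "bfloat16" -> torch.bfloat16
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id, **kwargs)
    if torch.cuda.is_available():
        model = model.to("cuda")  # FA2 only runs on the GPU
    return tokenizer, model
```

Keeping the decision in a pure function makes the fallback logic easy to check without a GPU; `load_classifier()` is only where the real downloads and device placement happen.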

This model is a fine-tuned version of distilbert-base-uncased on the HuggingFaceFW/fineweb-edu-llama3-annotations dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2324
  • MSE: 0.2324
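Loss and MSE coincide because the model has a single regression output: with `num_labels=1`, `AutoModelForSequenceClassification` treats the task as regression and trains with mean squared error. Taking the square root puts the error back on the 0 to 5 score scale, which gives a rough sense of the typical prediction error (assuming the evaluation labels are the annotated educational scores):

```latex
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2 = 0.2324,
\qquad
\mathrm{RMSE} = \sqrt{0.2324} \approx 0.482
```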

Usage

Note: this example runs on CPU; for GPU you will need to make some (small) changes.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "pszemraj/distilbert-base-uncased-edu-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "This is a test sentence."
inputs = tokenizer(text, return_tensors="pt", padding="longest", truncation=True)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
# single regression output: squeeze (1, 1) -> (1,), then convert to a Python float
score = outputs.logits.squeeze(-1).float().numpy().item()
result = {
    "text": text,
    "score": score,
    # clamp to the 0-5 scale, then round to the nearest integer
    "int_score": int(round(max(0, min(score, 5)))),
}

print(result)
# example output (the exact score may vary):
# {'text': 'This is a test sentence.', 'score': 0.3350256383419037, 'int_score': 0}
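For GPU inference, the main changes are moving the model and the tokenized batch to the same device and bringing the logits back to the CPU before converting them; calling `.numpy()` on a CUDA tensor raises an error, so the sketch uses `.tolist()`, which works on either device. It also scores several texts per padded batch and factors the clamp-and-round step from the example above into a helper. `to_int_score`, `score_texts`, and the `batch_size` default are hypothetical choices, not part of this card.

```python
def to_int_score(score: float, low: int = 0, high: int = 5) -> int:
    """Clamp a raw regression score to [low, high] and round to the nearest integer."""
    return int(round(max(low, min(score, high))))


def score_texts(texts, model_id="pszemraj/distilbert-base-uncased-edu-classifier", batch_size=32):
    """Score a list of texts, using a CUDA GPU when one is available."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id).to(device).eval()

    results = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start : start + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding="longest", truncation=True)
        inputs = {k: v.to(device) for k, v in inputs.items()}  # inputs must live on the model's device
        with torch.no_grad():
            logits = model(**inputs).logits.squeeze(-1)  # shape (batch,)
        # tolist() copies the values back to the CPU as plain Python floats
        for text, score in zip(batch, logits.float().tolist()):
            results.append({"text": text, "score": score, "int_score": to_int_score(score)})
    return results
```

Note that Python's `round` sends exact halves to the nearest even integer, so a score of exactly 2.5 maps to 2; switch to `math.floor(score + 0.5)` if you prefer half-up rounding.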

Intended uses & limitations

Refer to the model card of Hugging Face's HuggingFaceFW/fineweb-edu-classifier for more details.
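One typical use is filtering a corpus by predicted educational value. The sketch keeps records whose `int_score` meets a threshold; the default of 3 follows what the FineWeb-Edu classifier card describes for building FineWeb-Edu, but treat it as an assumption to tune for your data. `filter_educational` is a hypothetical helper that expects records shaped like the `result` dict from the usage example, and the sample records below use made-up scores.

```python
def filter_educational(records, threshold=3):
    """Keep records whose rounded score is at least `threshold` (assumed default: 3)."""
    return [r for r in records if r["int_score"] >= threshold]


# illustrative records with made-up scores, in the same shape as `result` above
records = [
    {"text": "Photosynthesis converts light energy into chemical energy.", "score": 3.4, "int_score": 3},
    {"text": "This is a test sentence.", "score": 0.34, "int_score": 0},
]
kept = filter_educational(records)
```

Filtering on `int_score` keeps the decision on the same discrete scale as the annotations; filtering on the raw `score` instead gives a finer-grained cutoff.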

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 90085
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-09
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1.0
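Several of these values are linked. The effective batch is train_batch_size × gradient_accumulation_steps = 16 × 8 = 128, which matches total_train_batch_size and implies training on a single device. The `linear` scheduler with a 0.05 warmup ratio ramps the learning rate from 0 to 1e-05 over the first 5% of optimizer steps, then decays it linearly to 0. The plain-Python sketch below mirrors the Hugging Face `Trainer` warmup rule (`ceil(total_steps × warmup_ratio)`) and the lambda in `get_linear_schedule_with_warmup`; the helper names are hypothetical, and `total_steps = 1000` is an illustrative value, since the card does not report the number of optimizer steps.

```python
import math

PEAK_LR = 1e-05
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.05


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device * accum_steps * num_devices


def warmup_steps(total_steps: int, ratio: float = WARMUP_RATIO) -> int:
    """Number of warmup steps derived from a warmup ratio (rounded up)."""
    return math.ceil(total_steps * ratio)


def linear_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR, ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    warmup = warmup_steps(total_steps, ratio)
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup))


total_steps = 1000  # hypothetical: the card does not report the number of optimizer steps
print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))  # 128
print(warmup_steps(total_steps))  # 50
print(linear_lr(50, total_steps))  # 1e-05
print(linear_lr(total_steps, total_steps))  # 0.0
```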

Model details

  • Model size: 67M params
  • Tensor type: F32 (Safetensors)
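A rough estimate of the weight memory implied by the 67M parameter count listed above. It assumes the rounded 67M figure and ignores activations, the tokenizer, and framework overhead; loading in bf16 (which FA2 requires, along with fp16) halves the footprint. `weight_memory_mb` is a hypothetical helper.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_mb(num_params: float, dtype: str = "float32") -> float:
    """Approximate size of the model weights in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


print(weight_memory_mb(67e6))  # 268.0
print(weight_memory_mb(67e6, "bfloat16"))  # 134.0
```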

Model tree: distilbert-base-uncased (base model) → pszemraj/distilbert-base-uncased-edu-classifier (this model), trained on HuggingFaceFW/fineweb-edu-llama3-annotations.