---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- generated_from_trainer
- text-classification
- news-classification
- english
- modernbert
metrics:
- f1
model-index:
- name: ModernBERT-NewsClassifier-EN-small
  results: []
---

# ModernBERT-NewsClassifier-EN-small

This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co./answerdotai/ModernBERT-base) on an English **News Category** dataset covering 15 distinct topics (e.g., **Politics**, **Sports**, **Business**). It achieves the following results on the evaluation set:

- **Validation Loss**: `3.1201`
- **Weighted F1 Score**: `0.5475`

---

## Model Description

**Architecture**: This model is based on [ModernBERT-base](https://huggingface.co./answerdotai/ModernBERT-base), a modern Transformer encoder featuring Rotary Position Embeddings (RoPE), Flash Attention, and a native long context window (up to 8,192 tokens). For the classification task, a linear classification head is added on top of the encoder outputs.

**Task**: **Multi-class News Classification**. The model classifies English news headlines or short texts into one of 15 categories.

**Use Cases**:
- Automatically tagging news headlines with appropriate categories in editorial pipelines.
- Classifying short text blurbs for social media or aggregator systems.
- Building a quick filter for content-based recommendation engines.

---

## Intended Uses & Limitations

- **Intended for**: Users who need to categorize short English news texts into broad topics.
- **Language**: Trained primarily on **English** text. Performance on non-English text is not guaranteed.
- **Limitations**:
  - Certain categories (e.g., `BLACK VOICES`, `QUEER VOICES`) may contain nuanced language that can lead to misclassification when context is limited or the text is ambiguous.

---

## Training and Evaluation Data

- **Dataset**: Curated from an English news-category dataset with 15 labels (e.g., `POLITICS`, `ENTERTAINMENT`, `SPORTS`, `BUSINESS`).
- **Data Size**: ~30,000 samples in total, balanced at 2,000 samples per category.
- **Split**: 90% training (27,000 samples) and 10% testing (3,000 samples).

### Categories

1. POLITICS
2. WELLNESS
3. ENTERTAINMENT
4. TRAVEL
5. STYLE & BEAUTY
6. PARENTING
7. HEALTHY LIVING
8. QUEER VOICES
9. FOOD & DRINK
10. BUSINESS
11. COMEDY
12. SPORTS
13. BLACK VOICES
14. HOME & LIVING
15. PARENTS

---

## Training Procedure

### Hyperparameters

| Hyperparameter                  | Value                                                   |
|--------------------------------:|:--------------------------------------------------------|
| **learning_rate**               | 5e-05                                                   |
| **train_batch_size**            | 8                                                       |
| **eval_batch_size**             | 4                                                       |
| **seed**                        | 42                                                      |
| **gradient_accumulation_steps** | 2                                                       |
| **total_train_batch_size**      | 16 (8 × 2)                                              |
| **optimizer**                   | `adamw_torch_fused` (betas=(0.9, 0.999), epsilon=1e-08) |
| **lr_scheduler_type**           | linear                                                  |
| **lr_scheduler_warmup_steps**   | 100                                                     |
| **num_epochs**                  | 5                                                       |

**Optimizer**: `AdamW` with fused kernels (`adamw_torch_fused`) for efficiency.
**Loss Function**: Cross-entropy, with weighted F1 as the evaluation metric.
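To show how the hyperparameters above fit together, here is a minimal fine-tuning sketch using the Hugging Face `Trainer`. It is not the exact training script: the label ordering, the toy two-example dataset, the `output_dir`, and the per-epoch evaluation cadence are assumptions for illustration.

```python
import numpy as np
from datasets import Dataset
from sklearn.metrics import f1_score
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

# Label set from the category list above; this id ordering is an assumption.
labels = [
    "POLITICS", "WELLNESS", "ENTERTAINMENT", "TRAVEL", "STYLE & BEAUTY",
    "PARENTING", "HEALTHY LIVING", "QUEER VOICES", "FOOD & DRINK", "BUSINESS",
    "COMEDY", "SPORTS", "BLACK VOICES", "HOME & LIVING", "PARENTS",
]
id2label = dict(enumerate(labels))
label2id = {label: i for i, label in id2label.items()}

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=len(labels),
    id2label=id2label,
    label2id=label2id,
)

# Toy stand-in for the real ~30k-sample corpus (illustration only).
raw = Dataset.from_dict({
    "text": [
        "The President pledges new infrastructure initiatives.",
        "Ten stretches for a healthier morning routine.",
    ],
    "label": [label2id["POLITICS"], label2id["WELLNESS"]],
})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
)

def compute_metrics(eval_pred):
    # Weighted F1, matching the metric reported in this card.
    logits, label_ids = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"f1": f1_score(label_ids, preds, average="weighted")}

# Mirrors the hyperparameter table above.
training_args = TrainingArguments(
    output_dir="ModernBERT-NewsClassifier-EN-small",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    num_train_epochs=5,
    lr_scheduler_type="linear",
    warmup_steps=100,
    optim="adamw_torch_fused",
    seed=42,
    eval_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    eval_dataset=tokenized,  # use a held-out split in practice
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics,
)
trainer.train()
```

As a sanity check, with the real 27,000-sample training split and an effective batch size of 16, one epoch is ~1,688 optimizer steps, which matches the step counts in the results table below.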
---

## Training Results

| Training Loss | Epoch  | Step | Validation Loss | F1 (Weighted) |
|:-------------:|:------:|:----:|:---------------:|:-------------:|
| 2.6251        | 1.0    | 1688 | 1.3810          | 0.5543        |
| 1.9267        | 2.0    | 3376 | 1.4378          | 0.5588        |
| 0.6349        | 3.0    | 5064 | 2.1705          | 0.5415        |
| 0.1273        | 4.0    | 6752 | 2.9007          | 0.5402        |
| 0.0288        | 4.9973 | 8435 | 3.1201          | 0.5475        |

- The best weighted F1 on the validation set (**0.5588**) is reached at epoch 2; after that, validation loss rises while training loss keeps falling, a sign of overfitting. The final checkpoint scores **0.5475**.

---

## Inference Example

Below are two ways to use this model: via a **pipeline** and by using the **model & tokenizer** directly.

### 1) Quick Start with `pipeline`

```python
from transformers import pipeline

# Instantiate the pipeline
classifier = pipeline(
    "text-classification",
    model="Sengil/ModernBERT-NewsClassifier-EN-small"
)

# Sample text
text = "The President pledges new infrastructure initiatives amid economic concerns."

outputs = classifier(text)
# Example output (scores will vary): [{'label': 'POLITICS', 'score': 0.95}]
print(outputs)
```

### 2) Direct Model Usage

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Sengil/ModernBERT-NewsClassifier-EN-small"

# Load model & tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

sample_text = "Local authorities call for better healthcare policies."

inputs = tokenizer(sample_text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities
probs = F.softmax(logits, dim=1)[0]
predicted_label_id = torch.argmax(probs).item()

# Map the predicted id to its label string
id2label = model.config.id2label
predicted_label = id2label[predicted_label_id]
confidence_score = probs[predicted_label_id].item()

print(f"Predicted Label: {predicted_label} | Score: {confidence_score:.4f}")
```

---

## Additional Information

- **Framework Versions**:
  - **Transformers**: 4.49.0.dev0
  - **PyTorch**: 2.5.1+cu121
  - **Datasets**: 3.2.0
  - **Tokenizers**: 0.21.0
- **License**: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
- **Intellectual Property**: The original ModernBERT base model is provided by [answerdotai](https://huggingface.co./answerdotai). This fine-tuned checkpoint inherits the same license.

---

**Citation** (if you use or extend this model in your research or applications, please consider citing it):

```
@misc{ModernBERTNewsClassifierENsmall,
  title        = {ModernBERT-NewsClassifier-EN-small},
  author       = {Mert Sengil},
  year         = {2025},
  howpublished = {\url{https://huggingface.co./Sengil/ModernBERT-NewsClassifier-EN-small}},
}
```