---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- generated_from_trainer
- text-classification
- news-classification
- english
- modernbert
metrics:
- f1
model-index:
- name: ModernBERT-NewsClassifier-EN-small
  results: []
---

# ModernBERT-NewsClassifier-EN-small

This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co./answerdotai/ModernBERT-base) on an English **News Category** dataset covering 15 distinct topics (e.g., **Politics**, **Sports**, **Business**). It achieves the following results on the evaluation set:

- **Validation Loss**: `3.1201`
- **Weighted F1 Score**: `0.5475`

---

## Model Description

**Architecture**: This model is based on [ModernBERT-base](https://huggingface.co./answerdotai/ModernBERT-base), a modern Transformer encoder featuring Rotary Position Embeddings (RoPE), Flash Attention, and a native long context window (up to 8,192 tokens). For the classification task, a linear classification head is added on top of the encoder outputs.

**Task**: **Multi-class News Classification**. The model classifies English news headlines or short texts into one of 15 categories.

**Use Cases**:
- Automatically tagging news headlines with appropriate categories in editorial pipelines.
- Classifying short text blurbs for social media or aggregator systems.
- Building a quick filter for content-based recommendation engines.

---

## Intended Uses & Limitations

- **Intended for**: Users who need to categorize short English news texts into broad topics.
- **Language**: Trained primarily on **English** text. Performance on non-English text is not guaranteed.
- **Limitations**:
  - Certain categories (e.g., `BLACK VOICES`, `QUEER VOICES`) may contain nuanced language that can lead to misclassification when context is limited or the text is ambiguous.

---

## Training and Evaluation Data

- **Dataset**: Curated from an English news-category dataset with 15 labels (e.g., `POLITICS`, `ENTERTAINMENT`, `SPORTS`, `BUSINESS`).
- **Data Size**: ~30,000 samples in total, balanced at 2,000 samples per category.
- **Split**: 90% training (27,000 samples) and 10% testing (3,000 samples).

### Categories

1. POLITICS
2. WELLNESS
3. ENTERTAINMENT
4. TRAVEL
5. STYLE & BEAUTY
6. PARENTING
7. HEALTHY LIVING
8. QUEER VOICES
9. FOOD & DRINK
10. BUSINESS
11. COMEDY
12. SPORTS
13. BLACK VOICES
14. HOME & LIVING
15. PARENTS

---

## Training Procedure

### Hyperparameters

| Hyperparameter                  | Value                                                   |
|--------------------------------:|:--------------------------------------------------------|
| **learning_rate**               | 5e-05                                                   |
| **train_batch_size**            | 8                                                       |
| **eval_batch_size**             | 4                                                       |
| **seed**                        | 42                                                      |
| **gradient_accumulation_steps** | 2                                                       |
| **total_train_batch_size**      | 16 (8 × 2)                                              |
| **optimizer**                   | `adamw_torch_fused` (betas=(0.9, 0.999), epsilon=1e-08) |
| **lr_scheduler_type**           | linear                                                  |
| **lr_scheduler_warmup_steps**   | 100                                                     |
| **num_epochs**                  | 5                                                       |

**Optimizer**: `AdamW` with fused kernels (`adamw_torch_fused`) for efficiency.
**Loss Function**: Cross-entropy, with weighted F1 as the evaluation metric.
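To show how the hyperparameters above fit together, here is a minimal fine-tuning sketch using the Hugging Face `Trainer`. It is not the exact training script: the label ordering, the toy two-example dataset, the `output_dir`, and the per-epoch evaluation cadence are assumptions for illustration.

```python
import numpy as np
from datasets import Dataset
from sklearn.metrics import f1_score
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

# Label set from the category list above; this id ordering is an assumption.
labels = [
    "POLITICS", "WELLNESS", "ENTERTAINMENT", "TRAVEL", "STYLE & BEAUTY",
    "PARENTING", "HEALTHY LIVING", "QUEER VOICES", "FOOD & DRINK", "BUSINESS",
    "COMEDY", "SPORTS", "BLACK VOICES", "HOME & LIVING", "PARENTS",
]
id2label = dict(enumerate(labels))
label2id = {label: i for i, label in id2label.items()}

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=len(labels),
    id2label=id2label,
    label2id=label2id,
)

# Toy stand-in for the real ~30k-sample corpus (illustration only).
raw = Dataset.from_dict({
    "text": [
        "The President pledges new infrastructure initiatives.",
        "Ten stretches for a healthier morning routine.",
    ],
    "label": [label2id["POLITICS"], label2id["WELLNESS"]],
})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
)

def compute_metrics(eval_pred):
    # Weighted F1, matching the metric reported in this card.
    logits, label_ids = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"f1": f1_score(label_ids, preds, average="weighted")}

# Mirrors the hyperparameter table above.
training_args = TrainingArguments(
    output_dir="ModernBERT-NewsClassifier-EN-small",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    num_train_epochs=5,
    lr_scheduler_type="linear",
    warmup_steps=100,
    optim="adamw_torch_fused",
    seed=42,
    eval_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    eval_dataset=tokenized,  # use a held-out split in practice
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics,
)
trainer.train()
```

As a sanity check, with the real 27,000-sample training split and an effective batch size of 16, one epoch is ~1,688 optimizer steps, which matches the step counts in the results table below.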
---

## Training Results

| Training Loss | Epoch  | Step | Validation Loss | F1 (Weighted) |
|:-------------:|:------:|:----:|:---------------:|:-------------:|
| 2.6251        | 1.0    | 1688 | 1.3810          | 0.5543        |
| 1.9267        | 2.0    | 3376 | 1.4378          | 0.5588        |
| 0.6349        | 3.0    | 5064 | 2.1705          | 0.5415        |
| 0.1273        | 4.0    | 6752 | 2.9007          | 0.5402        |
| 0.0288        | 4.9973 | 8435 | 3.1201          | 0.5475        |

- The best weighted F1 on the validation set (**0.5588**) is reached at epoch 2; after that, validation loss rises while training loss keeps falling, a sign of overfitting. The final checkpoint scores **0.5475**.

---

## Inference Example

Below are two ways to use this model: via a **pipeline** and by using the **model & tokenizer** directly.

### 1) Quick Start with `pipeline`

```python
from transformers import pipeline

# Instantiate the pipeline
classifier = pipeline(
    "text-classification",
    model="Sengil/ModernBERT-NewsClassifier-EN-small"
)

# Sample text
text = "The President pledges new infrastructure initiatives amid economic concerns."

outputs = classifier(text)
# Example output (scores will vary): [{'label': 'POLITICS', 'score': 0.95}]
print(outputs)
```

### 2) Direct Model Usage

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Sengil/ModernBERT-NewsClassifier-EN-small"

# Load model & tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

sample_text = "Local authorities call for better healthcare policies."

inputs = tokenizer(sample_text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities
probs = F.softmax(logits, dim=1)[0]
predicted_label_id = torch.argmax(probs).item()

# Map the predicted id to its label string
id2label = model.config.id2label
predicted_label = id2label[predicted_label_id]
confidence_score = probs[predicted_label_id].item()

print(f"Predicted Label: {predicted_label} | Score: {confidence_score:.4f}")
```

---

## Additional Information

- **Framework Versions**:
  - **Transformers**: 4.49.0.dev0
  - **PyTorch**: 2.5.1+cu121
  - **Datasets**: 3.2.0
  - **Tokenizers**: 0.21.0
- **License**: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
- **Intellectual Property**: The original ModernBERT base model is provided by [answerdotai](https://huggingface.co./answerdotai). This fine-tuned checkpoint inherits the same license.

---

**Citation** (if you use or extend this model in your research or applications, please consider citing it):

```
@misc{ModernBERTNewsClassifierENsmall,
  title        = {ModernBERT-NewsClassifier-EN-small},
  author       = {Mert Sengil},
  year         = {2025},
  howpublished = {\url{https://huggingface.co./Sengil/ModernBERT-NewsClassifier-EN-small}},
}
```