---
base_model: meta-llama/Llama-3.2-1B
library_name: peft
datasets:
- cjziems/Article-Bias-Prediction
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- Bias
- News
- Articles
- Political
---
# **Llama-3.2-1B (Political Bias Detection)**

## **Overview** 

This model detects potential political bias in news articles. Given a text passage (e.g., a news article), the model returns probabilities indicating whether the text leans *Left*, *Center*, or *Right* on the political spectrum.

## **Model Description**

### **Model Architecture**
- **Base Model**: [meta-llama/Llama-3.2-1B](https://huggingface.co./meta-llama/Llama-3.2-1B)  
- **Adapters**: LoRA (Low-Rank Adaptation)  
- **Precision**: 4-bit quantization enabled for efficient inference and training (with nested/double quantization).  

### **Intended Use**
- **Primary**: Given the text of a news article, the model outputs probabilities corresponding to three political bias labels: 
  - **LABEL_0**: Left  
  - **LABEL_1**: Center  
  - **LABEL_2**: Right  
- **Usage Scenarios**:  
  - Media research and analytics  
  - Automated or semi-automated political bias detection in digital news  
  - Educational or journalistic explorations of bias  

> **Note**: This model is *not* an authoritative arbiter of political bias. It can be used as a *supplementary* tool to help flag potential leanings.
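
For downstream code it can help to keep this label mapping in one place. A minimal sketch (the names simply mirror the list above, and could optionally be passed as `id2label`/`label2id` when loading the classification head):

```python
# Mapping between the model's label ids and the political leanings listed above.
ID2LABEL = {0: "Left", 1: "Center", 2: "Right"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}
```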

---

## **How to Use**

Below is a sample code snippet demonstrating how to load the model and apply LoRA adapters for classification:

```python
import transformers
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel

# 1. Load the *base* LLaMA model for sequence classification
base_model_name = "meta-llama/Llama-3.2-1B"
access_token = "YOUR_HF_ACCESS_TOKEN"  # If needed

model = AutoModelForSequenceClassification.from_pretrained(
    base_model_name,
    token=access_token,
    num_labels=3,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# 2. Load the LoRA adapter on top of the base model
adapter_path = "tzoulio/news-bias-finder-llama-3.2-1B"
model = PeftModel.from_pretrained(model, adapter_path)

# 3. Create the pipeline with the specified model and tokenizer
pipeline = transformers.pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer
)

# Example usage
text = "Insert the news article text here..."
prediction = pipeline(text)
print(prediction)
```
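
Optionally, the LoRA weights can be merged back into the base model so that inference does not require the PEFT wrapper. A brief sketch, assuming the base model was loaded in full or half precision as in the snippet above (the save path is hypothetical):

```python
# Merge the LoRA weights into the base weights and drop the PEFT wrapper.
# Only advisable when the base model is not loaded in 4-bit/8-bit quantization.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("llama-3.2-1b-bias-merged")  # hypothetical local path
```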

### **Input / Output Details**

**Input**: A single string containing the text of a news article.

**Output**: A list of dictionaries, where each dictionary contains:
  - `"label"`: The predicted label (e.g., `"LABEL_2"`)
  - `"score"`: The probability for that label.
Example output (with all label scores returned):

```text
[[{"label": "LABEL_0", "score": 0.23},
  {"label": "LABEL_1", "score": 0.30},
  {"label": "LABEL_2", "score": 0.47}]]
```

This indicates a 23% chance of Left, a 30% chance of Center, and a 47% chance of Right.
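
To retrieve the score for every label rather than only the most likely one, `top_k=None` can be passed to the pipeline call. A minimal sketch, reusing the `pipeline` object from the snippet above:

```python
# Request scores for all three labels instead of only the top one.
all_scores = pipeline(text, top_k=None)

# Depending on the transformers version, the result may be nested one level deeper.
if all_scores and isinstance(all_scores[0], list):
    all_scores = all_scores[0]

# Map raw label ids to the leanings listed above and report the most likely one.
names = {"LABEL_0": "Left", "LABEL_1": "Center", "LABEL_2": "Right"}
best = max(all_scores, key=lambda item: item["score"])
print(f"Predicted leaning: {names[best['label']]} ({best['score']:.2%})")
```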

## **Training & Fine-tuning**
### **Dataset Sizes**

  - **Training Set**: 17,984 examples
  - **Evaluation Set**: 4,496 examples
  - **Test Set**: 5,620 examples

### **Hyperparameters and Important Settings**
```python
# Precision & Quantization
load_in_4bit = True
bnb_4bit_use_double_quant = True
bnb_4bit_quant_type = "nf4"
bnb_4bit_compute_dtype = torch.bfloat16

# LoRA Configuration
lora_r = 16
lora_alpha = 64
lora_dropout = 0.1
bias = "none"

# Task Type
task_type = "SEQ_CLS"

# Training Setup
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
learning_rate = 2e-4
optim = "paged_adamw_32bit"
num_train_epochs = 3
warmup_steps = 2
fp16 = True
logging_steps = 1
```
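
For reference, below is a minimal sketch of how these settings might be assembled into the corresponding `transformers`/`peft` configuration objects. The original training script is not included here, so treat this as illustrative; the output directory and any omitted arguments are assumptions:

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit quantization settings, mirroring the values listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter configuration for sequence classification.
lora_config = LoraConfig(
    r=16,
    lora_alpha=64,
    lora_dropout=0.1,
    bias="none",
    task_type="SEQ_CLS",
)

# Core training arguments; evaluation/saving strategies and other options are omitted.
training_args = TrainingArguments(
    output_dir="news-bias-finder",  # hypothetical output directory
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    optim="paged_adamw_32bit",
    num_train_epochs=3,
    warmup_steps=2,
    fp16=True,
    logging_steps=1,
)
```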
## **Evaluation**
### **Metrics**
We report the F1-score on each dataset split.
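
A minimal sketch of a `compute_metrics` function that a `Trainer` could use to produce such a score (the averaging mode behind the reported numbers is not stated, so `"weighted"` below is an assumption):

```python
import numpy as np
from sklearn.metrics import f1_score

def compute_metrics(eval_pred):
    """Compute F1 from a transformers Trainer EvalPrediction (logits, labels)."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    # "weighted" averaging is an assumption; the card does not state the mode used.
    return {"f1": f1_score(labels, predictions, average="weighted")}
```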
### **Results**

| Split      | F1-Score |
|------------|----------|
| Training   | 0.96658  |
| Evaluation | 0.96664  |
| Test       | 0.96299  |