OrK7/parler_hate_speech · Hugging Face

Social Network Hate Detection: Finding Social Media Posts Containing Hateful Information Using Ensemble Methods and Back-Translation

Recent research has focused on automated systems that detect hateful content, helping social media providers identify and remove it before the public can see it. This paper introduces an ensemble approach built on DeBERTa models that benefits from pre-training on massive synthetic data and from back-translation applied during both training and testing. Our findings show that this approach delivers state-of-the-art results in hate-speech detection. Combining back-translation, ensembling, and test-time augmentation yields a considerable improvement across various metrics and models on both the Parler and GAB datasets. We show that our method reduces models' bias in an effective and meaningful way, lowers the RMSE from 0.838 to around 0.766, and raises R-squared from 0.520 to 0.599. The biggest improvement was seen in small DeBERTa models, while large models showed either a minor improvement or no change.
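
The combination of ensembling and test-time augmentation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `tta_ensemble_score` is a hypothetical helper, the toy scoring functions stand in for the ensemble's DeBERTa models, and `toy_back_translate` stands in for a real round-trip translation system.

```python
from statistics import mean
from typing import Callable, Sequence


def tta_ensemble_score(
    text: str,
    score_fns: Sequence[Callable[[str], float]],
    augment_fns: Sequence[Callable[[str], str]],
) -> float:
    """Average the predictions of every model over every text variant."""
    # Variants: the original text plus one augmented copy per augmenter.
    variants = [text] + [augment(text) for augment in augment_fns]
    # One prediction per (model, variant) pair, averaged with equal weight.
    return mean(fn(variant) for fn in score_fns for variant in variants)


# Hypothetical stand-ins, used only to make the sketch runnable.
def toy_back_translate(text: str) -> str:
    # A real system would translate to another language and back.
    return text.lower()


def toy_model_a(text: str) -> float:
    # Case-sensitive toy scorer: misses "HATE" in upper case.
    return 2.0 if "hate" in text else 1.0


def toy_model_b(text: str) -> float:
    # Case-insensitive toy scorer.
    return 4.0 if "hate" in text.lower() else 1.0


score = tta_ensemble_score(
    "I HATE this",
    score_fns=[toy_model_a, toy_model_b],
    augment_fns=[toy_back_translate],
)
print(score)
```

In this toy setup the augmented variant lets the case-sensitive scorer recover a signal it misses on the original text, which is the intuition behind averaging over augmented inputs at test time.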

Results
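
The RMSE and R-squared figures quoted in the abstract are regression metrics. The source does not say how they were computed, so the sketch below assumes the standard definitions; `rmse` and `r_squared` are hypothetical helper names, and the sample values are made up for illustration.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between labels and predictions."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only, not data from the paper.
labels = [1.0, 2.0, 3.0, 4.0, 5.0]
predictions = [1.0, 2.0, 3.0, 4.0, 4.0]
print(rmse(labels, predictions), r_squared(labels, predictions))
```

A lower RMSE and a higher R-squared both indicate predictions that track the annotated scores more closely.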

!pip install huggingface_hub
!pip install tokenizers transformers
!pip install iterative-stratification
!git clone https://github.com/OrKatz7/parler-hate-speech
%cd parler-hate-speech/src

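from huggingface_hub import hf_hub_download
import numpy as np
import torch
from transformers import AutoTokenizer

# CustomModel and MeanPooling are not called directly, but they must be
# importable so that torch.load can unpickle the saved model object.
from model import CustomModel, MeanPooling

# Training-time configuration; kept here because the pickled model may reference it.
class CFG:
    model="microsoft/deberta-v3-base"
    target_cols=['label_mean']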

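name = "OrK7/parler_hate_speech"
downloaded_model_path = hf_hub_download(repo_id=name, filename="pytorch_model.bin")
# The checkpoint is a full pickled model rather than a state dict, so recent
# PyTorch versions need weights_only=False. Only load checkpoints you trust.
# map_location="cpu" lets the example run on machines without a GPU.
model = torch.load(downloaded_model_path, map_location="cpu", weights_only=False)
model.eval()  # disable dropout for inference
tokenizer = AutoTokenizer.from_pretrained(name)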

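def prepare_input(text):
    # Tokenize a single text, padding or truncating it to 512 tokens.
    inputs = tokenizer(
        text,
        return_tensors=None,
        add_special_tokens=True,
        max_length=512,
        padding="max_length",  # replaces the deprecated pad_to_max_length=True
        truncation=True
    )
    # Convert each field into a (1, seq_len) long tensor, i.e. a batch of one.
    for k, v in inputs.items():
        inputs[k] = torch.tensor(np.array(v).reshape(1,-1), dtype=torch.long)
    return inputs

def collate(inputs):
    # Trim trailing padding so the model only processes real tokens.
    mask_len = int(inputs["attention_mask"].sum(axis=1).max())
    for k, v in inputs.items():
        inputs[k] = inputs[k][:,:mask_len]
    return inputs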

from transformers import Pipeline

class HatePipeline(Pipeline):
    def _sanitize_parameters(self, **kwargs):
        # No extra preprocess, forward, or postprocess parameters are supported.
        return {}, {}, {}

    def preprocess(self, inputs):
        # Tokenize one text, then trim the padding.
        out = prepare_input(inputs)
        return collate(out)

    def _forward(self, model_inputs):
        # The custom model takes the dict of input tensors as a single argument.
        outputs = self.model(model_inputs)
        return outputs

    def postprocess(self, model_outputs):
        # Clip the raw prediction to [0, 1], then rescale it to the 1-5 range.
        return np.array(model_outputs[0,0].numpy()).clip(0,1)*4+1

pipe = HatePipeline(model=model)
pipe("I Love you #")

results: 1.0

pipe("I Hate #$%#$%Jewish%$#@%^^@#")

results: 4.155200004577637
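
The scores above come from the `postprocess` step, which clips the raw model output to [0, 1] and rescales it linearly to the range 1 to 5. A minimal standalone sketch of that mapping follows; `to_hate_scale` and `from_hate_scale` are hypothetical helper names, and reading 1 as least hateful and 5 as most hateful is an assumption based on the two examples shown.

```python
def to_hate_scale(raw_output: float) -> float:
    """Clip a raw prediction to [0, 1] and map it linearly onto [1, 5]."""
    clipped = min(max(raw_output, 0.0), 1.0)
    return clipped * 4 + 1


def from_hate_scale(score: float) -> float:
    """Invert the mapping for scores that were not affected by clipping."""
    return (score - 1) / 4


# Raw outputs at or below 0 map to 1, and those at or above 1 map to 5.
low = to_hate_scale(-0.3)
mid = to_hate_scale(0.5)
high = to_hate_scale(2.0)

# A reported score of about 4.155 corresponds to a raw output of about 0.789.
raw = from_hate_scale(4.1552)
print(low, mid, high, round(raw, 4))
```

Because of the clipping, a score of exactly 1.0 or 5.0 only says that the raw output was at or beyond the corresponding end of the [0, 1] interval.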