AITSecNER - Entity Recognition for Cybersecurity

This repository demonstrates how to use the AITSecNER model, hosted on Hugging Face and built on the GLiNER library, to extract cybersecurity-related entities from text.

Installation

Install GLiNER via pip:

pip install gliner

Usage

Import and Load Model

Load the pretrained AITSecNER model directly from Hugging Face:

from gliner import GLiNER

model = GLiNER.from_pretrained("selfconstruct3d/AITSecNER", load_tokenizer=True)

Predict Entities

Define the input text and entity labels you wish to extract:

# Example input text
text = """
Upon opening Emotet maldocs, victims are greeted with fake Microsoft 365 prompt that states 
“THIS DOCUMENT IS PROTECTED,” and instructs victims on how to enable macros.
"""

# Entity labels
labels = [
    'CLICommand/CodeSnippet', 'CON', 'DATE', 'GROUP', 'LOC', 
    'MALWARE', 'ORG', 'SECTOR', 'TACTIC', 'TECHNIQUE', 'TOOL'
]

# Predict entities
entities = model.predict_entities(text, labels, threshold=0.5)

# Display results
for entity in entities:
    print(f"{entity['text']} => {entity['label']}")

Sample Output

Emotet => MALWARE
Microsoft => ORG
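
When the same entity appears several times, it can help to collapse the flat list into one entry per label. The sketch below is a minimal example and not part of the AITSecNER or GLiNER API: `group_by_label` and `min_score` are hypothetical names, and the sample entities are invented for illustration. It assumes each entity dictionary carries a `score` key alongside the `text` and `label` keys used above.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group entity texts by label, skipping duplicates and low-scoring hits."""
    grouped = defaultdict(list)
    for entity in entities:
        # Assumption: entities expose a "score"; if absent, keep the entity.
        if entity.get("score", 1.0) < min_score:
            continue
        texts = grouped[entity["label"]]
        if entity["text"] not in texts:
            texts.append(entity["text"])
    return dict(grouped)


# Hypothetical entities shaped like the list returned by predict_entities
sample_entities = [
    {"start": 14, "end": 20, "text": "Emotet", "label": "MALWARE", "score": 0.93},
    {"start": 60, "end": 69, "text": "Microsoft", "label": "ORG", "score": 0.71},
    {"start": 120, "end": 126, "text": "Emotet", "label": "MALWARE", "score": 0.88},
]

print(group_by_label(sample_entities, min_score=0.75))
# → {'MALWARE': ['Emotet']}
```

With real predictions, the call would be `group_by_label(entities)` on the list produced in the previous step.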

Model Details

The AITSecNER model was fine-tuned from the urchade/gliner_small model on Hugging Face, using the priamai/AnnoCTR dataset. For more details about the dataset, see the paper "AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports".

GLiNER is described in detail in the paper "GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer".

About

AITSecNER uses GLiNER to extract cybersecurity-specific entities from text, which makes it suited to tasks such as:

Cyber threat intelligence analysis
Incident response documentation
Automated cybersecurity reporting
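
For tasks like these, the input is often a full report and not a short snippet. One possible approach, sketched below, is to run prediction paragraph by paragraph and shift the character offsets back into the coordinates of the full document. This is an illustration under stated assumptions, not a documented AITSecNER workflow: `split_paragraphs` and `predict_long_text` are hypothetical helpers, paragraphs are assumed to be separated by blank lines, and each entity is assumed to carry `start` and `end` character offsets.

```python
def split_paragraphs(text):
    """Yield (offset, paragraph) for each non-empty blank-line-separated paragraph."""
    offset = 0
    for block in text.split("\n\n"):
        stripped = block.strip()
        if stripped:
            # Position of the stripped paragraph inside the full text
            yield offset + block.index(stripped), stripped
        offset += len(block) + 2  # skip the "\n\n" separator


def predict_long_text(predict, text, labels, threshold=0.5):
    """Apply `predict` to each paragraph and remap spans to the full text."""
    results = []
    for offset, paragraph in split_paragraphs(text):
        for entity in predict(paragraph, labels, threshold=threshold):
            shifted = dict(entity)  # copy so the original entity is untouched
            shifted["start"] = entity["start"] + offset
            shifted["end"] = entity["end"] + offset
            results.append(shifted)
    return results
```

Passing the predictor as an argument keeps the helper independent of the model, so a call might look like `predict_long_text(model.predict_entities, report_text, labels)`, where `report_text` is a hypothetical variable holding the report.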
