T5-for-information-extraction

This is an encoder-decoder model that was trained on various information extraction tasks, including text classification, named entity recognition, relation extraction and entity linking.

How to use:

First of all, initialize the model:

from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")

model = T5ForConditionalGeneration.from_pretrained("knowledgator/t5-for-ie").to(device)

To run a task, prepend a task-specific prompt to your text and pass the result to the model. The examples below show the prompts for different tasks:

Named entity recognition

input_text = "Extract entity types from the text: <e1>Kyiv</e1> is the capital of <e2>Ukraine</e2>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

Text classification

input_text = "Classify the following text into the most relevant categories: Kyiv is the capital of Ukraine"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

Relation extraction

input_text = "Extract relations between entities in the text: <e1>Kyiv</e1> is the capital of <e2>Ukraine</e2>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
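
The three prompts above share the same shape: a fixed task prefix followed by the text, with entities optionally wrapped in <e1>...</e1>, <e2>...</e2> tags. The following is a minimal sketch of a helper that assembles such prompts from raw text and character offsets. The names TASK_PROMPTS, tag_entities and build_prompt are hypothetical and not part of any package; only the prefix strings are taken from the examples above.

```python
# Prompt prefixes copied from the examples above.
TASK_PROMPTS = {
    "ner": "Extract entity types from the text: ",
    "classification": "Classify the following text into the most relevant categories: ",
    "relation_extraction": "Extract relations between entities in the text: ",
}


def tag_entities(text, spans):
    """Wrap each (start, end) character span in <eN>...</eN> tags, numbered in text order."""
    pieces = []
    cursor = 0
    for index, (start, end) in enumerate(sorted(spans), start=1):
        # Reject empty, out-of-range, or overlapping spans.
        if start < cursor or end > len(text) or start >= end:
            raise ValueError(f"invalid or overlapping span: {(start, end)}")
        pieces.append(text[cursor:start])
        pieces.append(f"<e{index}>{text[start:end]}</e{index}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def build_prompt(task, text, spans=None):
    """Prepend the task prefix; tag entities first when spans are given."""
    body = tag_entities(text, spans) if spans else text
    return TASK_PROMPTS[task] + body


text = "Kyiv is the capital of Ukraine."
prompt = build_prompt("relation_extraction", text, spans=[(0, 4), (23, 30)])
print(prompt)
# Extract relations between entities in the text: <e1>Kyiv</e1> is the capital of <e2>Ukraine</e2>.
```

The resulting string can be passed to the tokenizer in place of input_text in any of the examples above.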

Unlimited-classifier

With our unlimited-classifier you can use t5-for-ie to classify text into millions of categories. It applies generation with constraints, which is especially helpful when structured and deterministic outputs are needed.

To install it, run the following command:

pip install -U unlimited-classifier

Right now you can try it with the following example:

from unlimited_classifier import TextClassifier

labels = [
    "e1 - capital of Ukraine",
    "e1 - capital of Poland",
    "e1 - European city",
    "e1 - Asian city",
    "e1 - small country",
]

classifier = TextClassifier(
    labels=['default'],
    model=model,
    tokenizer=tokenizer,
    device=device,  # CUDA device if available, otherwise CPU
)
classifier.initialize_labels_trie(labels)

text = "<e1>Kyiv</e1> is the capital of <e2>Ukraine</e2>."

output = classifier.invoke(text)
print(output)
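
To give an intuition for what a labels trie does during constrained generation, here is a simplified, self-contained sketch. It is not the implementation used by unlimited-classifier: the LabelTrie class and the whitespace tokenization are assumptions made for illustration only. The idea is that at each decoding step, only tokens that continue some valid label are allowed, so the output is always one of the labels.

```python
END = "</s>"  # hypothetical end-of-label marker for this toy example


class LabelTrie:
    """Prefix tree over token sequences; each label is one root-to-END path."""

    def __init__(self, sequences=()):
        self.root = {}
        for sequence in sequences:
            self.add(sequence)

    def add(self, tokens):
        node = self.root
        for token in tokens:
            node = node.setdefault(token, {})
        node[END] = {}  # mark that a complete label ends here

    def allowed_next(self, prefix):
        """Return the tokens that may follow `prefix`, or [] if it matches no label."""
        node = self.root
        for token in prefix:
            if token not in node:
                return []
            node = node[token]
        return sorted(node)


labels = [
    "e1 - capital of Ukraine",
    "e1 - capital of Poland",
    "e1 - European city",
    "e1 - Asian city",
    "e1 - small country",
]

# Toy tokenization: split on whitespace instead of using the real T5 vocabulary.
trie = LabelTrie(label.split() for label in labels)

print(trie.allowed_next(["e1", "-"]))
# ['Asian', 'European', 'capital', 'small']
print(trie.allowed_next(["e1", "-", "capital", "of"]))
# ['Poland', 'Ukraine']
print(trie.allowed_next(["e1", "-", "small", "country"]))
# ['</s>']
```

With a real model the trie would be built over tokenizer ids rather than words. In Hugging Face transformers, a lookup like allowed_next is the kind of function that can be supplied to generate through its prefix_allowed_tokens_fn argument; whether unlimited-classifier uses that mechanism internally is not stated here.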

Turbo T5

We recommend running this model on a GPU with our TurboT5 package. It uses custom CUDA kernels that accelerate computation and allow much longer sequences.

First, install the package:

pip install turbot5 -U

Then you can import different heads for various purposes. We have released encoder heads for tasks such as token classification, question answering and text classification, as well as encoder-decoder heads for conditional generation:

from turbot5 import T5ForConditionalGeneration
from turbot5 import T5Config
from transformers import T5Tokenizer
import torch

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")
model = T5ForConditionalGeneration.from_pretrained(
    "knowledgator/t5-for-ie",
    attention_type="flash",  # attention type you want to use
    use_triton=True,
).to("cuda")

Feedback

We value your input! Share your feedback and suggestions to help us improve our models. Fill out the feedback form

Join Our Discord

Connect with our community on Discord for news, support, and discussion about our models. Join Discord