No tokenizer file is present in the model

#2
by AD233 - opened

How can I recreate the tokenizer, or use the model without one?
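
A quick way to confirm which tokenizer files are missing is to compare the repository's file list against the names a fast tokenizer typically saves. This is only a minimal sketch: `missing_tokenizer_files` is a hypothetical helper, and the `repo_files` list is made up for illustration, not the actual contents of the model repo.

```python
# Files a fast tokenizer typically saves next to the model weights.
EXPECTED_TOKENIZER_FILES = (
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
)


def missing_tokenizer_files(repo_files):
    """Return the expected tokenizer files that are absent from repo_files."""
    present = set(repo_files)
    return [name for name in EXPECTED_TOKENIZER_FILES if name not in present]


# Hypothetical file list for illustration; a real one could come from
# huggingface_hub.list_repo_files("<repo id>").
repo_files = ["config.json", "model.safetensors", "README.md"]
print(missing_tokenizer_files(repo_files))
```

If the returned list is not empty, the tokenizer has to come from somewhere else, such as the base model the checkpoint was fine-tuned from.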

Load the tokenizer from the base model:

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")

Full example:

# Notebook cell: install transformers from source
!pip install -q git+https://github.com/huggingface/transformers.git
from transformers import pipeline, AutoTokenizer

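# Load the tokenizer from the base model, since the fine-tuned repo has none
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")

# Build the classification pipeline, passing the tokenizer explicitly
classifier = pipeline(
    task="text-classification",
    model="prithivMLmods/MBERT-Context-Specifier",
    tokenizer=tokenizer,
    device=0  # first CUDA GPU; use device=-1 to run on CPU
)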

# Sample text
sample = "The global market for sustainable technologies has seen rapid growth over the past decade as businesses increasingly prioritize environmental sustainability."

# Run classification
result = classifier(sample)
print(result)
Device set to use cuda:0

[{'label': 'business-and-industrial', 'score': nan}]
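
The score in the output above is nan, so the prediction carries no usable confidence. A small guard like the one below makes that failure explicit instead of letting it pass silently. It is only a sketch: `nan_predictions` is a hypothetical helper, and it detects the problem without saying anything about its cause.

```python
import math


def nan_predictions(result):
    """Return the predictions whose score is NaN."""
    return [p for p in result if math.isnan(p["score"])]


# Result copied from the output above; float("nan") stands in for the printed nan.
result = [{"label": "business-and-industrial", "score": float("nan")}]

bad = nan_predictions(result)
if bad:
    print(f"{len(bad)} of {len(result)} predictions have a NaN score")
```

The same check can be run on the list returned by `classifier(sample)` before the label is used anywhere downstream.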
prithivMLmods pinned discussion

Bro, I am trying to make a text formatter that takes unformatted text and returns proper Markdown, but I am new to this field and unable to use your model. Can you give me an example script that uses your model for text formatting?
