# cl100k_base as a `transformers` GPT2 tokenizer

This is the `cl100k_base` vocabulary (as used by OpenAI's `tiktoken`) converted to the Hugging Face `transformers` format, via this code by Xenova.
```python
from transformers import AutoTokenizer, GPT2TokenizerFast

tokenizer = AutoTokenizer.from_pretrained("BEE-spoke-data/cl100k_base")

# if you run into issues, try loading via GPT2TokenizerFast directly:
# tokenizer = GPT2TokenizerFast.from_pretrained("BEE-spoke-data/cl100k_base")
```
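Once loaded, the tokenizer behaves like any other fast tokenizer. A quick sanity check (a sketch; downloading the tokenizer files from the Hub is required):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BEE-spoke-data/cl100k_base")

text = "Counting tokens with cl100k_base."
ids = tokenizer.encode(text)
print(len(ids))

# round-trip decoding should reproduce the input for ordinary text
assert tokenizer.decode(ids) == text
```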
## Details
```
GPT2TokenizerFast(
    name_or_path="BEE-spoke-data/cl100k_base",
    vocab_size=100261,
    model_max_length=8192,
    is_fast=True,
    padding_side="right",
    truncation_side="right",
    special_tokens={
        "bos_token": "<|endoftext|>",
        "eos_token": "<|endoftext|>",
        "unk_token": "<|endoftext|>",
    },
    clean_up_tokenization_spaces=True,
    added_tokens_decoder={
        "100257": AddedToken(
            "<|endoftext|>", rstrip=False, lstrip=False,
            single_word=False, normalized=False, special=True,
        ),
        "100258": AddedToken(
            "<|fim_prefix|>", rstrip=False, lstrip=False,
            single_word=False, normalized=False, special=True,
        ),
        "100259": AddedToken(
            "<|fim_middle|>", rstrip=False, lstrip=False,
            single_word=False, normalized=False, special=True,
        ),
        "100260": AddedToken(
            "<|fim_suffix|>", rstrip=False, lstrip=False,
            single_word=False, normalized=False, special=True,
        ),
        "100276": AddedToken(
            "<|endofprompt|>", rstrip=False, lstrip=False,
            single_word=False, normalized=False, special=True,
        ),
    },
)
```