After downloading the model, loading it directly with the provided snippet raises an error.

#2 · opened by deleted
# Load model directly

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_path = "/home/kioedru/code/prepare_data/ProtT5"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

Exception Traceback (most recent call last)
Cell In[5], line 6
2 from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
4 model_path = "/home/kioedru/code/prepare_data/ProtT5"
----> 6 tokenizer = AutoTokenizer.from_pretrained(model_path)
7 model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

File ~/.conda/envs/PT_new/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py:843, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
841 tokenizer_class_py, tokenizer_class_fast = TOKENIZER_MAPPING[type(config)]
842 if tokenizer_class_fast and (use_fast or tokenizer_class_py is None):
--> 843 return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
844 else:
845 if tokenizer_class_py is not None:

File ~/.conda/envs/PT_new/lib/python3.8/site-packages/transformers/tokenization_utils_base.py:2048, in PreTrainedTokenizerBase.from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, trust_remote_code, *init_inputs, **kwargs)
2045 else:
2046 logger.info(f"loading file {file_path} from cache at {resolved_vocab_files[file_id]}")
-> 2048 return cls._from_pretrained(
2049 resolved_vocab_files,
2050 pretrained_model_name_or_path,
2051 init_configuration,
2052 *init_inputs,
2053 token=token,
2054 cache_dir=cache_dir,
...
583 "You're trying to run a Unigram model but you're file was trained with a different algorithm"
584 )
586 return tokenizer

Exception: You're trying to run a Unigram model but you're file was trained with a different algorithm
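The exception comes from the fast-tokenizer path: AutoTokenizer tries to build a Rust-based fast tokenizer from the checkpoint's SentencePiece files, and that conversion fails for this model. A minimal sketch of a common workaround (an assumption, not confirmed for this particular repo) is to pass `use_fast=False` so the slow, pure-Python tokenizer reads the SentencePiece model directly; this requires the `sentencepiece` package to be installed:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def load_with_slow_tokenizer(model_path):
    # use_fast=False skips the fast-tokenizer conversion that raises the
    # Unigram exception and loads the SentencePiece vocabulary directly.
    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
    return tokenizer, model

# Usage (path from the question above):
# tokenizer, model = load_with_slow_tokenizer("/home/kioedru/code/prepare_data/ProtT5")
```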


For me it works with T5Tokenizer and T5EncoderModel, though I can't say why. I've seen this question come up several times, but it was never answered.
