Using the model locally without calling the Hugging Face API

#20
by Mincookie - opened

Hi there, I'm currently attempting to vectorize strings locally by downloading all the files from the Hugging Face repo. Which JSON file should I be referencing, and how should I be setting up the arguments?

I've currently set up a test scratch file like this:

from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]

json_file = ["config.json", "data_config.json", "modules.json", "sentence_bert_config.json", "special_tokens_map.json", "tokenizer.json", "tokenizer_config.json"]

for json in json_file:
    try:
        model = SentenceTransformer(rf'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\{json}')
    except Exception as e:
        print(e)
        print(json)
        print("")
        continue

    embeddings = model.encode(sentences)
    print(embeddings)

But every attempt fails; each JSON file available in the repository returns a different error message:

Unable to load weights from pytorch checkpoint file for 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\config.json' at 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\config.json'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
config.json

list indices must be integers or slices, not str
data_config.json

list indices must be integers or slices, not str
modules.json

Unable to load weights from pytorch checkpoint file for 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\sentence_bert_config.json' at 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\sentence_bert_config.json'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
sentence_bert_config.json

Unable to load weights from pytorch checkpoint file for 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\special_tokens_map.json' at 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\special_tokens_map.json'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
special_tokens_map.json

Unable to load weights from pytorch checkpoint file for 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\tokenizer.json' at 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\tokenizer.json'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
tokenizer.json

Unable to load weights from pytorch checkpoint file for 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\tokenizer_config.json' at 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\tokenizer_config.json'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
tokenizer_config.json

Which JSON should be used, and how should the SentenceTransformer arguments be set up, if I'm running this locally?

Edit: I found a way to save the loaded model manually and then reload it from the local folder.

https://stackoverflow.com/questions/65419499/download-pre-trained-sentence-transformers-model-locally
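For anyone who hits the same wall: SentenceTransformer expects a Hub model ID or the path to the whole model folder, not an individual JSON file. A minimal sketch of the save-then-reload approach from that answer (the local path below is just the folder from my scratch file; adjust it to your setup):

from sentence_transformers import SentenceTransformer

local_path = r'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2'

# One-time download from the Hub, then persist the complete model folder
# (config.json, modules.json, tokenizer files, weights, ...) to disk.
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
model.save(local_path)

# Later, load entirely from disk by pointing at the folder, not a single JSON file.
local_model = SentenceTransformer(local_path)
embeddings = local_model.encode(["This is an example sentence", "Each sentence is converted"])
print(embeddings.shape)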

Mincookie changed discussion status to closed
