Using the model locally without calling the Hugging Face API

#20
by Mincookie - opened

Hi there, I'm currently attempting to vectorize strings locally by downloading all the files from the Hugging Face repo. Which JSON file should I be referencing, and how should I be setting up the arguments?

I've currently set up a test scratch file like this:

from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]

json_file = ["config.json", "data_config.json", "modules.json", "sentence_bert_config.json", "special_tokens_map.json", "tokenizer.json", "tokenizer_config.json"]

for json in json_file:
    try:
        model = SentenceTransformer(rf'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\{json}')
    except Exception as e:
        print(e)
        print(json)
        print("")
        continue

    embeddings = model.encode(sentences)
    print(embeddings)

But every attempt fails; each JSON file available in the repository returns a different error message:

Unable to load weights from pytorch checkpoint file for 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\config.json' at 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\config.json'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
config.json

list indices must be integers or slices, not str
data_config.json

list indices must be integers or slices, not str
modules.json

Unable to load weights from pytorch checkpoint file for 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\sentence_bert_config.json' at 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\sentence_bert_config.json'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
sentence_bert_config.json

Unable to load weights from pytorch checkpoint file for 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\special_tokens_map.json' at 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\special_tokens_map.json'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
special_tokens_map.json

Unable to load weights from pytorch checkpoint file for 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\tokenizer.json' at 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\tokenizer.json'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
tokenizer.json

Unable to load weights from pytorch checkpoint file for 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\tokenizer_config.json' at 'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2\tokenizer_config.json'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
tokenizer_config.json

Which JSON should be used, and how should the SentenceTransformer arguments be set up, if I'm running this locally?

Edit: I found a way to save the loaded model manually and then reload it from the local folder.

https://stackoverflow.com/questions/65419499/download-pre-trained-sentence-transformers-model-locally
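For anyone who hits the same wall: SentenceTransformer expects a Hub model ID or the path to the whole model folder, not an individual JSON file. A minimal sketch of the save-then-reload approach from that answer (the local path below is just the folder from my scratch file; adjust it to your setup):

from sentence_transformers import SentenceTransformer

local_path = r'C:\Users\MinCookie\Documents\git_repos\hyperDB\all-MiniLM-L6-v2'

# One-time download from the Hub, then persist the complete model folder
# (config.json, modules.json, tokenizer files, weights, ...) to disk.
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
model.save(local_path)

# Later, load entirely from disk by pointing at the folder, not a single JSON file.
local_model = SentenceTransformer(local_path)
embeddings = local_model.encode(["This is an example sentence", "Each sentence is converted"])
print(embeddings.shape)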

Mincookie changed discussion status to closed
