Semantic similarity between two texts!

#63
by Systeme - opened

Hello, I have written a small PHP script that calls the API with the model https://api-inference.huggingface.co/models/sentence-transformers/all-MiniLM-L6-v2 to measure, as a percentage, the semantic similarity between two French texts. However, the scores I get are often around 50%, which seems too low, especially when the two texts express the same idea in different words. In those cases I would expect scores between 75% and 100%. Do you think it would be better to use a different model, or are there adjustments that could be made to this model to improve the results?
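
For context on how such a score is usually produced: sentence-embedding models typically report the cosine of the angle between two embedding vectors, which ranges from -1 to 1 and is not a calibrated percentage. Below is a minimal sketch of that computation in PHP. The function name `cosineSimilarity` and the three-dimensional toy vectors are assumptions for illustration; real embeddings from these models have hundreds of dimensions.

```php
<?php
// Cosine similarity between two equal-length numeric vectors.
// Returns a value in [-1, 1]; it is not a calibrated percentage.
function cosineSimilarity(array $a, array $b): float
{
    if (count($a) !== count($b) || count($a) === 0) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }
    if ($normA == 0.0 || $normB == 0.0) {
        throw new InvalidArgumentException('A zero vector has no direction.');
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

// Toy vectors standing in for real embeddings (hypothetical values).
$embedding1 = [1.0, 2.0, 3.0];
$embedding2 = [2.0, 4.0, 6.0];   // same direction as $embedding1
$embedding3 = [3.0, 0.0, -1.0];  // orthogonal to $embedding1

printf("%.2f\n", cosineSimilarity($embedding1, $embedding2)); // prints 1.00
printf("%.2f\n", cosineSimilarity($embedding1, $embedding3)); // prints 0.00
```

Multiplying the cosine by 100 gives a number that looks like a percentage, but where "similar" texts land on that scale depends on the model, so a threshold such as 75% has to be chosen per model.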

Hello Tom,
Thank you for your response. I have chosen these two models for their good efficiency.

https://api-inference.huggingface.co/models/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
https://api-inference.huggingface.co/models/intfloat/multilingual-e5-small

$text1 = "Au dernier trimestre, il y eu de nouvelles technologies.";
$text2 = "Au cours des trois derniers mois, des technologies innovantes sont apparues.";

When I submit the same pair of texts to both models, the results still differ.
The intfloat/multilingual-e5-small model gives a similarity score of 94%, while the sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 model gives 78%.
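
A comparison like this can be sketched in PHP as follows. This is a rough sketch, not the script used above: the function names `buildSimilarityPayload` and `querySimilarity` are hypothetical, `$token` stands for your own API token, and the `source_sentence` / `sentences` request shape is an assumption based on the hosted sentence-similarity task, so it should be checked against each model page.

```php
<?php
// Build the JSON body for a sentence-similarity request.
// Assumption: the endpoint accepts {"inputs": {"source_sentence", "sentences"}}.
function buildSimilarityPayload(string $source, array $candidates): string
{
    return json_encode(
        ['inputs' => [
            'source_sentence' => $source,
            'sentences' => array_values($candidates),
        ]],
        JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR
    );
}

// Send one source sentence and its candidates to a model URL and
// return the decoded JSON response (scores, or an error message).
// $token is a placeholder for your own API token.
function querySimilarity(string $modelUrl, string $token, string $source, array $candidates): array
{
    $ch = curl_init($modelUrl);
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => [
            'Authorization: Bearer ' . $token,
            'Content-Type: application/json',
        ],
        CURLOPT_POSTFIELDS => buildSimilarityPayload($source, $candidates),
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return json_decode($body, true, 512, JSON_THROW_ON_ERROR);
}
```

Calling `querySimilarity` once per model URL with the same `$text1` and `[$text2]` would give the two scores side by side. Each model places its scores on its own scale, so the raw numbers are not directly comparable between models.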

Unfortunately, both models still judge similarity less accurately than a human reader would.
Thanks again for your help.
Marc
