kornwtp/ConGen-WangchanBERT-Small

This is a ConGen model: it maps sentences to a 128-dimensional dense vector space and can be used for tasks like semantic search.

Usage

To use this model, first install ConGen:

pip install -U git+https://github.com/KornWtp/ConGen.git

Then you can use the model like this:

from sentence_transformers import SentenceTransformer
sentences = ["กลุ่มผู้ชายเล่นฟุตบอลบนชายหาด", "กลุ่มเด็กชายกำลังเล่นฟุตบอลบนชายหาด"]

model = SentenceTransformer('kornwtp/ConGen-WangchanBERT-Small')
embeddings = model.encode(sentences)
print(embeddings)
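
Since the model is intended for tasks like semantic search, the embeddings are typically compared with a similarity measure. The following is a minimal sketch, not part of the ConGen package, that assumes cosine similarity is the measure of choice. The helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the toy 3-dimensional vectors stand in for the 128-dimensional rows that `model.encode(sentences)` would return.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, corpus_vecs):
    """Return (index, score) pairs for the corpus, most similar first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy stand-ins for embeddings (assumption: real vectors come from model.encode).
query = [1.0, 0.0, 0.0]
corpus = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

ranking = rank_by_similarity(query, corpus)
print(ranking)
```

With real embeddings, `query` would be one row of `embeddings` and `corpus` the remaining rows; the functions only iterate over the values, so they should accept the rows returned by `encode` as well as plain lists.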

Evaluation Results

For an automated evaluation of this model, see the Thai Sentence Embeddings Benchmark: Semantic Textual Similarity.

Citing & Authors

@inproceedings{limkonchotiwat-etal-2022-congen,
    title = "{ConGen}: Unsupervised Control and Generalization Distillation For Sentence Representation",
    author = "Limkonchotiwat, Peerat  and
      Ponwitayarat, Wuttikorn  and
      Lowphansirikul, Lalita and
      Udomcharoenchaikit, Can  and
      Chuangsuwanich, Ekapol  and
      Nutanong, Sarana",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    year = "2022",
    publisher = "Association for Computational Linguistics",
}