Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 956 column 3

by zerozj - opened Jan 23

Discussion

zerozj

Jan 23

Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 956 column 3

billingsmoore

Owner Jan 23

Hi,

Can you please provide the context or code that produced this error?

Thanks!
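When reporting an error like this, it helps to include the installed library versions. Errors of this shape come from the `tokenizers` library failing to parse a `tokenizer.json` file. A commonly reported cause is an installed `tokenizers` older than the version that wrote the file, though nothing in this thread confirms that was the cause here. The sketch below only collects version information; the helper name `installed_versions` and the default package list are illustrative assumptions.

```python
from importlib import metadata


def installed_versions(packages=("tokenizers", "transformers")):
    """Return {package: version or None} for each requested package.

    `installed_versions` is a hypothetical helper written for this example;
    the default package names are an assumption about what is relevant.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return versions


if __name__ == "__main__":
    # Print one line per package, suitable for pasting into a bug report.
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output along with the code that triggered the exception gives enough context to check whether a version mismatch is involved.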

zerozj

Jan 24

Thank you very much for your reply. I don't know why, but it works now. However, the translation is not very accurate. For example, 'བདེ་མོ' is translated as 'a collection of advice like garlanded beams of nectar from the moon of'.

billingsmoore

Owner Jan 24

Yes, the translations in this dataset aren't great.

I recommend that you instead use "billingsmoore/tibetan-to-english-translation-dataset".

It's a smaller dataset but it's much higher quality.

Let me know if you have any other issues or questions!

zerozj

Jan 24

I have a question about usage. Is this model intended for literary and Buddhist texts rather than everyday language?

billingsmoore

Owner Jan 24

Yes, the datasets that are currently available on my page are extracted from Buddhist texts and the models on my page have been trained on that data.

For daily life translations, I recommend Monlam AI. Their model is a work in progress but is the best option available right now.

You can use their website here: https://monlam.ai/model/mt

If you are interested in training a model for daily life, I don't know of a high-quality dataset that is currently available.

If you would like to stay up to date on Tibetan language machine translation, I recommend following the OpenPecha forum, which you can find here:

https://forum.openpecha.org/

zerozj

Jan 24

Thank you very much for your suggestion. I am more interested in translating everyday Tibetan.

billingsmoore changed discussion status to closed Jan 29
