Releasing a new series of 8 zeroshot classifiers: better performance, fully commercially usable thanks to synthetic data, context windows of up to 8192 tokens, and small enough to run on any hardware.
Summary:
The zeroshot-v2.0-c series replaces commercially restrictive training data with synthetic data generated with mistralai/Mixtral-8x7B-Instruct-v0.1 (Apache 2.0). All models are released under the MIT license.
The best model performs 17 percentage points better across 28 tasks than facebook/bart-large-mnli (the most downloaded commercially friendly baseline).
The series includes a multilingual variant fine-tuned from BAAI/bge-m3 for zeroshot classification in 100+ languages, with a context window of 8192 tokens.
The models are small, at 0.2-0.6B parameters, so they run on any hardware. The base-size models are more than 2x faster than bart-large-mnli while performing significantly better.
The models are not generative LLMs; they are efficient encoder-only models specialized in zeroshot classification through the universal NLI task.
For users for whom commercially restrictive training data is not an issue, I've also trained variants with even more human data for improved performance.
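To make "zeroshot classification through the universal NLI task" concrete, here is a minimal sketch of the idea in plain Python. Each candidate label is turned into a hypothesis sentence, an NLI model scores how strongly the input text entails each hypothesis, and the scores are normalized into label probabilities. The hypothesis template, the function names, and especially `toy_entail_score` are assumptions made for illustration: the toy scorer only counts shared words so that the example runs without downloading anything. In practice its place would be taken by the entailment logit of one of the models from the collection.

```python
import math
from typing import Callable, Dict, List

# Hypothetical template: turns a class label into an NLI hypothesis.
DEFAULT_TEMPLATE = "This text is about {}"


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zeroshot_classify(
    text: str,
    labels: List[str],
    entail_score: Callable[[str, str], float],
    template: str = DEFAULT_TEMPLATE,
) -> Dict[str, float]:
    """Score each label by how strongly `text` entails its hypothesis."""
    # One (premise, hypothesis) pair per candidate label.
    logits = [entail_score(text, template.format(label)) for label in labels]
    probs = softmax(logits)
    # Return labels sorted from most to least likely.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return dict(ranked)


def toy_entail_score(premise: str, hypothesis: str) -> float:
    """Stand-in for a real NLI model (assumption): counts shared words."""
    premise_words = set(premise.lower().split())
    hypothesis_words = set(hypothesis.lower().split())
    return float(len(premise_words & hypothesis_words))


result = zeroshot_classify(
    "The economy slowed after the central bank raised rates",
    ["politics", "economy", "sports"],
    toy_entail_score,
)
print(result)
```

Because the labels only appear inside the hypothesis sentences, the same model can be pointed at a new set of classes without retraining, which is what makes the NLI formulation "universal".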
Next steps:
I'll publish a blog post with more details soon.
There are several improvements I'm planning for v2.1. The multilingual model in particular has room for improvement.
All models are available for download in this Hugging Face collection: MoritzLaurer/zeroshot-classifiers-6548b4ff407bb19ff5c3ad6f
These models are an extension of the approach explained in this paper, but with additional synthetic data: https://arxiv.org/abs/2312.17543