Which datasets are included in the NLI training data / NLI head?
Very interesting model and multi-task learning approach!
Which datasets were included for training the NLI classification head that is used for 0-shot classification? I understand it is mostly this collection: https://huggingface.co./datasets/tasksource/zero-shot-label-nli. Was anything else included when training the NLI head?
Did you use a binary head (entailment vs. contradiction+neutral)?
Hi, thank you! It's a three-way NLI head.
I used weight sharing for almost all three-way classification tasks, so the backbone + head was trained on dozens of NLI datasets (including label-nli).
Thus, the checkpoint as loaded by Hugging Face can perform zero-shot classification (it was trained to do so) as well as standard NLI.
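For illustration, a minimal sketch of both usages with the standard transformers pipelines (the model id and example texts below are placeholders, not something stated in this thread):

```python
from transformers import pipeline

model_name = "sileod/deberta-v3-base-tasksource-nli"  # placeholder model id

# Zero-shot classification: candidate labels are turned into entailment hypotheses
zero_shot = pipeline("zero-shot-classification", model=model_name)
print(zero_shot("I booked a table for our anniversary dinner.",
                candidate_labels=["travel", "dining", "sports"]))

# Standard NLI: classify a premise/hypothesis pair as entailment / neutral / contradiction
nli = pipeline("text-classification", model=model_name)
print(nli({"text": "A man is playing a guitar on stage.",
           "text_pair": "Someone is performing music."}))
```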
In addition, using tasknet.load_pipeline, you can swap in a task-specific head and a task embedding. By changing the head, you can directly predict for a new task (e.g. sentiment analysis). The task embedding helps the model focus on a task. For NLI tasks, even if you don't change the head, you can still change the task embedding with tasknet.load_pipeline to make the model a bit more focused on entailment-based zero-shot NLI, for instance.
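A rough sketch of what that can look like, based on the tasknet README; the model id and task name here are assumptions, and the exact arguments should be double-checked against the README:

```python
import tasknet as tn

# Load the shared backbone together with a task-specific head and task embedding
# (model id and task name are assumptions for illustration)
pipe = tn.load_pipeline("sileod/deberta-v3-base-tasksource-nli", "glue/sst2")
print(pipe(["That movie was great!", "Awful movie."]))
```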
@sileod
, thank you for this model!
I have been playing around with this model for a while now and it is really interesting. I'm new to NLI, so forgive my dumb questions xD
I have been using this model within the Hugging Face pipeline, somewhat like this: `nlp = pipeline('zero-shot-classification', model=model_dir, tokenizer=token_dir)`. This has been working fine, but when I looked into improving it without fine-tuning, I stumbled upon this comment. How would one go about making this more specific to, let's say, classification of some popular US holidays?
@automatron900
Hi, thank you for your kind words!
Improving the model is mostly done by a form of fine-tuning
I suggest using tasknet for an easier experience: https://github.com/sileod/tasknet
The main thing to do is to convert your dataset to a Hugging Face Dataset
For more info about how it works:
https://huggingface.co./docs/transformers/tasks/sequence_classification
Yup, that was the direction I was thinking in; however, I am worried about overfitting the model. I could choose x instances of yay and nay for each class. But how would you ensure the model's generalization abilities remain?
You should monitor validation accuracy
You can try to make the validation split as different as possible from the training split to emphasize generalization
Early stopping also helps preserve generalization
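A minimal sketch of that setup with the vanilla transformers Trainer; the model id, toy data, and exact TrainingArguments names are placeholders and may differ across transformers versions:

```python
import numpy as np
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          EarlyStoppingCallback, Trainer, TrainingArguments)

model_name = "sileod/deberta-v3-base-tasksource-nli"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2, ignore_mismatched_sizes=True)  # new 2-way head for illustration

# Toy data; replace with your own converted Dataset
data = Dataset.from_dict({"text": ["yay"] * 8 + ["nay"] * 8, "label": [1] * 8 + [0] * 8})
data = data.map(lambda x: tokenizer(x["text"], truncation=True), batched=True)
splits = data.train_test_split(test_size=0.25)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",      # evaluate every epoch so early stopping can trigger
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    num_train_epochs=10,
)
trainer = Trainer(
    model=model, args=args,
    train_dataset=splits["train"], eval_dataset=splits["test"],
    tokenizer=tokenizer, compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],  # stop when val accuracy stops improving
)
trainer.train()
```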
You can even formulate your own data as an NLI task to make the most of the model's initial capabilities
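For instance, a hedged sketch of recasting labeled data as NLI pairs; the hypothesis template and the entailment/contradiction label ids are assumptions, so check the model's id2label mapping in its config:

```python
from datasets import Dataset

# Hypothetical labeled examples for a US-holidays classifier
texts = ["Fireworks are planned downtown tonight.", "We carved pumpkins all evening."]
labels = ["Independence Day", "Halloween"]
all_labels = ["Independence Day", "Halloween", "Thanksgiving"]

# Assumed label ids: many NLI checkpoints use 0=entailment, 2=contradiction,
# but verify against the model's config.json (id2label).
ENTAILMENT, CONTRADICTION = 0, 2

pairs = {"premise": [], "hypothesis": [], "label": []}
for text, gold in zip(texts, labels):
    for candidate in all_labels:
        pairs["premise"].append(text)
        pairs["hypothesis"].append(f"This example is about {candidate}.")
        pairs["label"].append(ENTAILMENT if candidate == gold else CONTRADICTION)

nli_dataset = Dataset.from_dict(pairs)
print(nli_dataset[0])
```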
Awesome!
I like the idea of creating an NLI task for this with my own dataset. Is using tasknet the way to do it?
Tasknet, or this for specifically few-shot: https://github.com/Knowledgator/LiqFit
Oh, this is very simple and perfect!
Thank you!