Tags: Text Classification, sentence-transformers, PyTorch, setfit, English, bert

librarian-bots/is_new_dataset_student_model

This is a SetFit model for text classification. It is trained to predict whether the title and abstract of an arXiv paper introduce a new dataset. The training data consists of arXiv papers returned by a search for "dataset", so the model aims to distinguish papers that merely discuss datasets from papers that introduce a new one. It was trained by distillation from a larger teacher model, librarian-bots/is_new_dataset_teacher_model.

Usage

To use this model for inference, first install the SetFit library:

python -m pip install setfit

You can then run inference as follows:

from setfit import SetFitModel

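# Download the model from the Hub
model = SetFitModel.from_pretrained("librarian-bots/is_new_dataset_student_model")
# Placeholder input in the training format (a single-newline separator is assumed)
text = "TITLE: title text\nABSTRACT: abstract text"
# Run inference on a list of formatted strings
preds = model([text])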

During model training, the text was formatted using the following format:

TITLE: title text
ABSTRACT: abstract text

For best results, format your inputs the same way when running inference with this model.
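The formatting step can be wrapped in a small helper so that every input reaches the model in the TITLE/ABSTRACT layout shown above. The following is a minimal sketch, not part of the SetFit API: `format_paper`, `load_model`, and `predict_batch` are hypothetical names, the single-newline separator between the two fields is an assumption, and the whitespace normalization is an extra step not described above.

```python
from typing import Callable, Iterable, List, Tuple


def format_paper(title: str, abstract: str) -> str:
    """Build one input string in the TITLE/ABSTRACT layout.

    Assumption: the two fields are separated by a single newline.
    """
    # Optional extra step (an assumption): collapse runs of whitespace so
    # line breaks copied from an abstract do not leak into the input.
    title = " ".join(title.split())
    abstract = " ".join(abstract.split())
    return f"TITLE: {title}\nABSTRACT: {abstract}"


def load_model():
    """Load the classifier from the Hub (needs `setfit` and network access)."""
    # Imported lazily so the formatting helpers work without setfit installed.
    from setfit import SetFitModel

    return SetFitModel.from_pretrained(
        "librarian-bots/is_new_dataset_student_model"
    )


def predict_batch(model: Callable, papers: Iterable[Tuple[str, str]]) -> List:
    """Format (title, abstract) pairs and pass them to the model in one call."""
    texts = [format_paper(title, abstract) for title, abstract in papers]
    return model(texts)
```

Typical use would be `model = load_model()` followed by `preds = predict_batch(model, [(title, abstract)])`, where `title` and `abstract` are your own strings. Keeping the formatting in one function makes it easy to change the separator if the assumed layout turns out to differ from the one used in training.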

BibTeX entry and citation info

To cite the SetFit approach used to train this model, please use this citation:

@article{https://doi.org/10.48550/arxiv.2209.11055,
doi = {10.48550/ARXIV.2209.11055},
url = {https://arxiv.org/abs/2209.11055},
author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
title = {Efficient Few-Shot Learning Without Prompts},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}