Model Card for ANGEL_bc5cdr

This model card provides detailed information about the ANGEL_bc5cdr model, designed for biomedical entity linking.

Model Details

Model Description

  • Developed by: Chanhwi Kim, Hyunjae Kim, Sihyeon Park, Jiwoo Lee, Mujeen Sung, Jaewoo Kang
  • Model type: Generative Biomedical Entity Linking Model
  • Language(s): English
  • License: GPL-3.0
  • Finetuned from model: BART-large

Model Sources

  • Paper: Learning from Negative Samples in Generative Biomedical Entity Linking (arXiv:2408.16493)

Direct Use

ANGEL_bc5cdr is a generative model for biomedical entity linking, fine-tuned to link the chemical and disease mentions in the BC5CDR dataset to their corresponding entities. To use it, you need a virtual environment and the inference code. First clone the ANGEL GitHub repository, then run the following script to set up the environment:

bash script/environment/set_environment.sh

Running the model on a single sample requires no preprocessing. Execute the run_sample.sh script, passing the dataset name as its argument:

bash script/inference/run_sample.sh bc5cdr
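The steps above (clone, set up, run the sample) can be wrapped in one shell function. This is a minimal sketch, not part of the ANGEL repository: the function name angel_quickstart is hypothetical, and the repository URL https://github.com/dmis-lab/ANGEL.git is an assumption (override it with the hypothetical ANGEL_REPO_URL variable). The card does not say whether set_environment.sh requires a separate activation step before inference, so check the repository README if the second script fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the ANGEL repository): clone the code,
# set up the environment, and run the single-sample inference script.
# ASSUMPTION: the default repository URL below; override via ANGEL_REPO_URL.

angel_quickstart() {
  local dataset="${1:-bc5cdr}"   # dataset name forwarded to run_sample.sh
  local workdir="${2:-ANGEL}"    # local checkout directory
  local repo_url="${ANGEL_REPO_URL:-https://github.com/dmis-lab/ANGEL.git}"

  # Clone only when the checkout directory does not exist yet.
  if [ ! -d "$workdir" ]; then
    git clone "$repo_url" "$workdir" || return 1
  fi

  # Both scripts use paths relative to the repository root, so run them
  # from there inside a subshell (the caller's working directory is left
  # unchanged). Each step runs only if the previous one succeeded.
  (
    cd "$workdir" &&
      bash script/environment/set_environment.sh &&
      bash script/inference/run_sample.sh "$dataset"
  )
}
```

Usage: source the file and call angel_quickstart bc5cdr; an existing checkout is reused instead of being cloned again.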

To replace the sample with your own input, see the Direct Use section of our GitHub repository. For training or evaluating the model, see the Fine-tuning and Evaluation sections there.

Training

Training Data

The model was trained on the BC5CDR dataset, which contains annotated chemical and disease mentions.

Training Procedure

Training proceeded in two stages:

  • Positive-only pre-training: the model was first trained on positive examples only, following the standard approach.
  • Negative-aware training: the model was then further trained with negative examples to improve its ability to discriminate correct entities from incorrect ones.
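For the positive-only stage, a sketch of the usual objective for a sequence-to-sequence model such as BART is the teacher-forced negative log-likelihood below. This is an assumption: the card does not state the loss, and the notation is illustrative (x is an input mention with its context, y⁺ is the target sequence identifying the correct entity, for example its name, and D is the training set). The negative-aware stage's objective is not given in the card and is not reproduced here.

```latex
\mathcal{L}_{\text{pos}}(\theta) \;=\; -\sum_{(x,\,y^{+}) \in \mathcal{D}} \;\sum_{t=1}^{|y^{+}|} \log p_{\theta}\!\left(y^{+}_{t} \mid y^{+}_{<t},\, x\right)
```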

Evaluation

Testing Data

The model was evaluated on the BC5CDR dataset.

Metrics

Accuracy at Top-1 (Acc@1): the percentage of mentions for which the model's top-ranked prediction is the correct entity.
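The Acc@1 definition can be written as follows, using illustrative notation not taken from the card: N is the number of evaluated mentions, y_i is the gold entity of mention i, ŷ_i^(1) is the model's top-ranked prediction for it, and 𝟙[·] is 1 when the condition holds and 0 otherwise.

```latex
\mathrm{Acc@1} \;=\; \frac{100}{N} \sum_{i=1}^{N} \mathbb{1}\!\left[\hat{y}_i^{(1)} = y_i\right]
```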

Scores

Acc@1 (%):

Dataset | BioSYN (Sung et al., 2020) | SapBERT (Liu et al., 2021) | GenBioEL (Yuan et al., 2022b) | ANGEL (Ours)
BC5CDR  | -                          | -                          | 93.1                          | 94.5

The GenBioEL score is from our reproduction.

BioSYN and SapBERT are omitted because they were evaluated separately on the chemical and disease subsets, which differs from our setting.
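The two reported BC5CDR scores (GenBioEL 93.1, ANGEL 94.5) can be compared as absolute gain and as relative reduction in error rate. This is simple arithmetic on those numbers, not a figure reported by the authors; e denotes the error rate, 100 minus Acc@1.

```latex
\begin{aligned}
\Delta_{\text{abs}} &= 94.5 - 93.1 = 1.4 \text{ points} \\
e_{\text{GenBioEL}} &= 100 - 93.1 = 6.9, \qquad e_{\text{ANGEL}} = 100 - 94.5 = 5.5 \\
\frac{e_{\text{GenBioEL}} - e_{\text{ANGEL}}}{e_{\text{GenBioEL}}} &= \frac{1.4}{6.9} \approx 0.203 \;(\approx 20.3\%)
\end{aligned}
```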

Citation

If you use the ANGEL_bc5cdr model, please cite:

@article{kim2024learning,
  title={Learning from Negative Samples in Generative Biomedical Entity Linking},
  author={Kim, Chanhwi and Kim, Hyunjae and Park, Sihyeon and Lee, Jiwoo and Sung, Mujeen and Kang, Jaewoo},
  journal={arXiv preprint arXiv:2408.16493},
  year={2024}
}

Contact

For questions or issues, please contact [email protected].
