---
license: apache-2.0
library_name: span-marker
tags:
- span-marker
- token-classification
- ner
- named-entity-recognition
pipeline_tag: token-classification
widget:
- text: "X-Linked adrenoleukodystrophy (ALD) is a genetic disease associated with demyelination of the central nervous system, adrenal insufficiency, and accumulation of very long chain fatty acids in tissue and body fluids."
  example_title: "Example 1"
- text: "Canavan disease is inherited as an autosomal recessive trait that is caused by the deficiency of aspartoacylase (ASPA)."
  example_title: "Example 2"
- text: "However, both models lack other frequent DM symptoms including the fibre-type dependent atrophy, myotonia, cataract and male-infertility."
  example_title: "Example 3"
model-index:
- name: SpanMarker w. bert-base-cased on NCBI Disease by Tom Aarsen
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      type: ncbi_disease
      name: NCBI Disease
      split: test
      revision: acd0e6451198d5b615c12356ab6a05fff4610920
    metrics:
    - type: f1
      value: 0.8813
      name: F1
    - type: precision
      value: 0.8661
      name: Precision
    - type: recall
      value: 0.8971
      name: Recall
datasets:
- ncbi_disease
language:
- en
metrics:
- f1
- recall
- precision
---

# SpanMarker for Disease Named Entity Recognition

This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained on the [ncbi_disease](https://huggingface.co/datasets/ncbi_disease) dataset. In particular, this SpanMarker model uses [bert-base-cased](https://huggingface.co/bert-base-cased) as the underlying encoder. See [train.py](train.py) for the training script.

## Metrics

This model achieves the following results on the test set:

- Overall Precision: 0.8661
- Overall Recall: 0.8971
- Overall F1: 0.8813
- Overall Accuracy: 0.9837

## Labels

| **Label** | **Examples** |
|-----------|--------------|
| DISEASE | "ataxia-telangiectasia", "T-cell leukaemia", "C5D", "neutrophilic leukocytosis", "pyogenic infection" |

## Usage

To use this model for inference, first install the `span_marker` library:

```bash
pip install span_marker
```

You can then run inference with this model like so:

```python
from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-ncbi-disease")
# Run inference
entities = model.predict("Canavan disease is inherited as an autosomal recessive trait that is caused by the deficiency of aspartoacylase (ASPA).")
```

See the [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) repository for documentation and additional information on this library.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3

### Training results

| Training Loss | Epoch | Step | Validation Loss | Overall Precision | Overall Recall | Overall F1 | Overall Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|:--------------:|:----------:|:----------------:|
| 0.0038        | 1.41  | 300  | 0.0059          | 0.8141            | 0.8579         | 0.8354     | 0.9818           |
| 0.0018        | 2.82  | 600  | 0.0054          | 0.8315            | 0.8720         | 0.8513     | 0.9840           |

### Framework versions

- SpanMarker 1.2.4
- Transformers 4.31.0
- Pytorch 1.13.1+cu117
- Datasets 2.14.3
- Tokenizers 0.13.2
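
### Checking the reported F1 scores

The F1 values in the Metrics section and the training results table above should be the harmonic mean of the precision and recall listed next to them. The following is a minimal sketch for sanity-checking those figures. The helper `f1_score` and the `reported` list are illustrative names introduced here and are not part of the `span_marker` library; the numbers are copied from this model card.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (description, precision, recall, reported F1), copied from this model card
reported = [
    ("test set", 0.8661, 0.8971, 0.8813),
    ("validation, step 300", 0.8141, 0.8579, 0.8354),
    ("validation, step 600", 0.8315, 0.8720, 0.8513),
]

for name, precision, recall, f1 in reported:
    computed = f1_score(precision, recall)
    # Compare at the four decimal places used in the card.
    status = "matches" if round(computed, 4) == f1 else "differs"
    print(f"{name}: computed F1 = {computed:.4f}, reported = {f1:.4f} ({status})")
```

Because the card rounds precision and recall to four decimals, a recomputed F1 could in principle differ from a reported one in the last digit; the check therefore compares rounded values rather than exact floats.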