bernardo-de-almeida committed on
Commit cc91ff8 · verified · 1 parent: 63e97d3

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -8,7 +8,7 @@ tags:
 - genomics
 - segmentation
 ---
-# segment-nt
+# SegmentNT
 
 SegmentNT is a segmentation model leveraging the [Nucleotide Transformer](https://huggingface.co/InstaDeepAI/nucleotide-transformer-v2-500m-multi-species) (NT) DNA foundation model to predict the location of several types of genomics
 elements in a sequence at a single nucleotide resolution. It was trained on 14 different classes of human genomics elements in input sequences up to 30kb. These
@@ -92,7 +92,7 @@ print(f"Intron probabilities shape: {probabilities_intron.shape}")
 
 ## Training data
 
-The **segment-nt** model was trained on all human chromosomes except for chromosomes 20 and 21, kept as test set, and chromosome 22, used as a validation set.
+The **SegmentNT** model was trained on all human chromosomes except for chromosomes 20 and 21, kept as test set, and chromosome 22, used as a validation set.
 During training, sequences are randomly sampled in the genome with associated annotations. However, we keep the sequences in the validation and test set fixed by
 using a sliding window of length 30,000 over the chromosomes 20 and 21. The validation set was used to monitor training and for early stopping.
 