---
license: apache-2.0
base_model: openai/whisper-small
tags:
- generated_from_trainer
model-index:
- name: whisper-small-diarization-0.2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# whisper-small-diarization-0.2

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co./openai/whisper-small) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4057
- Speech Scored: 802.6678
- Speech Miss: 209.2589
- Speech Falarm: 9.0497
- Speaker Miss: 412.5650
- Speaker Falarm: 185.8256
- Speaker Error: 153.1979
- Speaker Correct: 1198.4045
- Diarization Error: 751.5885
- Frames: 1500.0
- Speaker Wide Frames: 1564.7402
- Speech Scored Ratio: 0.5351
- Speech Miss Ratio: 0.1395
- Speech Falarm Ratio: 0.0060
- Speaker Correct Ratio: 0.7989
- Speaker Miss Ratio: 0.2382
- Speaker Falarm Ratio: 0.1203
- Speaker Error Ratio: 0.0831
- Diarization Error Ratio: 0.4416
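
As a consistency check on these numbers (the decomposition below is inferred from the reported values, not taken from the training code): the diarization error count is the sum of the three speaker-level error counts, and the speech-level ratios are the corresponding counts divided by Frames:

$$
\text{Diarization Error} = 412.5650 + 185.8256 + 153.1979 = 751.5885
$$

For example, Speech Scored Ratio \\( = 802.6678 / 1500 \approx 0.5351 \\).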

## Model description

More information needed

## Intended uses & limitations

More information needed
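
Pending details from the authors, here is a minimal loading sketch. It assumes the checkpoint keeps a Whisper-compatible architecture and uses a placeholder repo id; both are assumptions, and a diarization fine-tune may define a custom head that the stock `WhisperForConditionalGeneration` class cannot build:

```python
# Minimal loading sketch. The repo id is a placeholder, and the checkpoint may
# use a custom diarization head that the stock Whisper class cannot load.
from transformers import AutoProcessor, WhisperForConditionalGeneration

repo_id = "your-namespace/whisper-small-diarization-0.2"  # placeholder

processor = AutoProcessor.from_pretrained(repo_id)
model = WhisperForConditionalGeneration.from_pretrained(repo_id)
```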

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 24
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 10
- mixed_precision_training: Native AMP
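
The training script is not included in this card, so the following is only a sketch of how the hyperparameters above map onto `transformers.Seq2SeqTrainingArguments` (the `output_dir` value is a placeholder; dataset and model wiring are omitted):

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the hyperparameters listed above; output_dir is a placeholder.
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-diarization-0.2",
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=24,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    num_train_epochs=10,
    fp16=True,  # "Native AMP" mixed precision
)
```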

### Training results

| Training Loss | Epoch | Step | Validation Loss | Speech Scored | Speech Miss | Speech Falarm | Speaker Miss | Speaker Falarm | Speaker Error | Speaker Correct | Diarization Error | Frames | Speaker Wide Frames | Speech Scored Ratio | Speech Miss Ratio | Speech Falarm Ratio | Speaker Correct Ratio | Speaker Miss Ratio | Speaker Falarm Ratio | Speaker Error Ratio | Diarization Error Ratio |
|:-------------:|:-----:|:----:|:---------------:|:-------------:|:-----------:|:-------------:|:------------:|:--------------:|:-------------:|:---------------:|:-----------------:|:------:|:-------------------:|:-------------------:|:-----------------:|:-------------------:|:---------------------:|:------------------:|:--------------------:|:-------------------:|:-----------------------:|
| 0.4753        | 1.0   | 225  | 0.4855          | 731.3740      | 280.5527    | 69.0541       | 614.8317     | 199.7986       | 151.0314      | 1127.7690       | 965.6617          | 1500.0 | 1564.7402           | 0.4876              | 0.1870            | 0.0460              | 0.7518                | 0.3772             | 0.1827               | 0.0796              | 0.6395                  |
| 0.4572        | 2.0   | 450  | 0.4857          | 642.0541      | 369.8727    | 29.7105       | 773.0375     | 89.8413        | 131.1212      | 1124.9596       | 994.0             | 1500.0 | 1564.7402           | 0.4280              | 0.2466            | 0.0198              | 0.7500                | 0.4566             | 0.0839               | 0.0709              | 0.6114                  |
| 0.4638        | 3.0   | 675  | 0.4618          | 748.0445      | 263.8823    | 43.8745       | 466.9677     | 356.8117       | 116.3025      | 1147.8718       | 940.0820          | 1500.0 | 1564.7402           | 0.4987              | 0.1759            | 0.0292              | 0.7652                | 0.2922             | 0.2364               | 0.0631              | 0.5917                  |
| 0.4423        | 4.0   | 900  | 0.4477          | 740.9529      | 270.9738    | 30.4473       | 509.6905     | 235.4464       | 132.9747      | 1162.9712       | 878.1116          | 1500.0 | 1564.7402           | 0.4940              | 0.1806            | 0.0203              | 0.7753                | 0.3072             | 0.1581               | 0.0712              | 0.5365                  |
| 0.4164        | 5.0   | 1125 | 0.4309          | 737.6809      | 274.2459    | 12.4037       | 512.4150     | 173.0157       | 138.8300      | 1178.9698       | 824.2607          | 1500.0 | 1564.7402           | 0.4918              | 0.1828            | 0.0083              | 0.7860                | 0.3010             | 0.1130               | 0.0754              | 0.4893                  |
| 0.3924        | 6.0   | 1350 | 0.4112          | 812.0453      | 199.8814    | 13.5414       | 382.1125     | 253.1543       | 140.2999      | 1194.7111       | 775.5667          | 1500.0 | 1564.7402           | 0.5414              | 0.1333            | 0.0090              | 0.7965                | 0.2235             | 0.1713               | 0.0755              | 0.4702                  |
| 0.3765        | 7.0   | 1575 | 0.4085          | 806.7515      | 205.1752    | 12.1369       | 405.6992     | 202.0323       | 149.4699      | 1197.7762       | 757.2014          | 1500.0 | 1564.7402           | 0.5378              | 0.1368            | 0.0081              | 0.7985                | 0.2361             | 0.1250               | 0.0829              | 0.4439                  |
| 0.3814        | 8.0   | 1800 | 0.4051          | 802.6016      | 209.3252    | 9.5911        | 398.2677     | 213.9582       | 144.1378      | 1199.8329       | 756.3636          | 1500.0 | 1564.7402           | 0.5351              | 0.1396            | 0.0064              | 0.7999                | 0.2367             | 0.1275               | 0.0794              | 0.4436                  |
| 0.3965        | 9.0   | 2025 | 0.4111          | 768.8736      | 243.0532    | 6.9250        | 474.9355     | 148.3069       | 146.9695      | 1194.2729       | 770.2119          | 1500.0 | 1564.7402           | 0.5126              | 0.1620            | 0.0046              | 0.7962                | 0.2742             | 0.0932               | 0.0806              | 0.4480                  |
| 0.4048        | 10.0  | 2250 | 0.4057          | 802.6678      | 209.2589    | 9.0497        | 412.5650     | 185.8256       | 153.1979      | 1198.4045       | 751.5885          | 1500.0 | 1564.7402           | 0.5351              | 0.1395            | 0.0060              | 0.7989                | 0.2382             | 0.1203               | 0.0831              | 0.4416                  |


### Framework versions

- Transformers 4.36.2
- PyTorch 2.0.0
- Datasets 2.16.1
- Tokenizers 0.15.0