---
license: apache-2.0
base_model: openai/whisper-tiny
tags:
- generated_from_trainer
metrics:
- wer
model-index:
- name: whisper-multi-diar-wer
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# whisper-multi-diar-wer

This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co./openai/whisper-tiny) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 10.0666
- Wer: 582.8864
- Cer: 168.6522
- Speech Scored: 693831
- Speech Miss: 52252
- Speech Falarm: 117030
- Speaker Miss: 52252
- Speaker Falarm: 117030
- Speaker Error: 187216
- Speaker Correct: 1437.5240
- Diarization Error: 356498
- Frames: 600
- Speaker Wide Frames: 746083
- Speech Scored Ratio: 1156.385
- Speech Miss Ratio: 87.0867
- Speech Falarm Ratio: 195.05
- Speaker Correct Ratio: 2.3959
- Speaker Miss Ratio: 0.0700
- Speaker Falarm Ratio: 0.1569
- Speaker Error Ratio: 0.2509
- Diarization Error Ratio: 0.4778
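
The WER above is well over 100 and the card also reports frame-level diarization metrics, which suggests the checkpoint was trained for joint transcription and diarization rather than plain ASR; the exact decoding setup is not documented here. As a minimal sketch, the weights can still be loaded like any Whisper fine-tune with `transformers` (the repository path and audio file name below are placeholders):

```python
# Minimal loading/inference sketch for a Whisper fine-tune.
# "whisper-multi-diar-wer" and "sample.wav" are placeholders; the
# task-specific decoding used during training is not documented here.
import torch
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("whisper-multi-diar-wer")
model = WhisperForConditionalGeneration.from_pretrained("whisper-multi-diar-wer")
model.eval()

# Whisper expects 16 kHz mono audio.
audio, _ = librosa.load("sample.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    generated_ids = model.generate(inputs.input_features)

print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```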

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 24
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 48
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 10
- mixed_precision_training: Native AMP
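
For reference, a hedged sketch of how the hyperparameters above map onto `Seq2SeqTrainingArguments`; the output directory is a placeholder, and the diarization-specific `compute_metrics` used to produce the table below is not shown:

```python
# Sketch of the configuration above as Seq2SeqTrainingArguments.
# output_dir is a placeholder; Adam betas/epsilon are the defaults
# (0.9, 0.999) and 1e-08, matching the values listed above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-multi-diar-wer",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,  # effective train batch size: 24 * 2 = 48
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    num_train_epochs=10,
    fp16=True,  # "Native AMP" mixed precision
)
```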

### Training results

| Training Loss | Epoch | Step | Validation Loss | Wer      | Cer      | Speech Scored | Speech Miss | Speech Falarm | Speaker Miss | Speaker Falarm | Speaker Error | Speaker Correct | Diarization Error | Frames | Speaker Wide Frames | Speech Scored Ratio | Speech Miss Ratio | Speech Falarm Ratio | Speaker Correct Ratio | Speaker Miss Ratio | Speaker Falarm Ratio | Speaker Error Ratio | Diarization Error Ratio |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|:-------------:|:-----------:|:-------------:|:------------:|:--------------:|:-------------:|:---------------:|:-----------------:|:------:|:-------------------:|:-------------------:|:-----------------:|:-------------------:|:---------------------:|:------------------:|:--------------------:|:-------------------:|:-----------------------:|
| 11.3437       | 1.0   | 42   | 10.7905         | 574.9471 | 166.1650 | 743633        | 2450        | 150302        | 2450         | 150302         | 202641        | 1427.9773       | 355393            | 600    | 746083              | 1239.3883           | 4.0833            | 250.5033            | 2.3800                | 0.0033             | 0.2015               | 0.2716              | 0.4763                  |
| 10.3627       | 2.0   | 84   | 10.4901         | 578.0875 | 167.1479 | 735121        | 10962       | 136397        | 10962        | 136397         | 201434        | 1433.1820       | 348793            | 600    | 746083              | 1225.2017           | 18.27             | 227.3283            | 2.3886                | 0.0147             | 0.1828               | 0.2700              | 0.4675                  |
| 9.9444        | 3.0   | 126  | 10.3015         | 569.6851 | 166.4943 | 715188        | 30895       | 127221        | 30895        | 127221         | 194291        | 1435.5347       | 352407            | 600    | 746083              | 1191.98             | 51.4917           | 212.035             | 2.3926                | 0.0414             | 0.1705               | 0.2604              | 0.4723                  |
| 9.7658        | 4.0   | 168  | 10.2071         | 572.0536 | 166.8688 | 706081        | 40002       | 122962        | 40002        | 122962         | 191357        | 1436.2147       | 354321            | 600    | 746083              | 1176.8017           | 66.67             | 204.9367            | 2.3937                | 0.0536             | 0.1648               | 0.2565              | 0.4749                  |
| 9.5093        | 5.0   | 210  | 10.1640         | 572.3712 | 166.9189 | 703250        | 42833       | 121335        | 42833        | 121335         | 190255        | 1436.8813       | 354423            | 600    | 746083              | 1172.0833           | 71.3883           | 202.225             | 2.3948                | 0.0574             | 0.1626               | 0.2550              | 0.4750                  |
| 9.3069        | 6.0   | 252  | 10.1287         | 573.2534 | 167.0644 | 700202        | 45881       | 119938        | 45881        | 119938         | 189349        | 1436.9886       | 355168            | 600    | 746083              | 1167.0033           | 76.4683           | 199.8967            | 2.3950                | 0.0615             | 0.1608               | 0.2538              | 0.4760                  |
| 9.2209        | 7.0   | 294  | 10.1009         | 582.8864 | 168.6522 | 698009        | 48074       | 118866        | 48074        | 118866         | 188639        | 1437.1880       | 355579            | 600    | 746083              | 1163.3483           | 80.1233           | 198.11              | 2.3953                | 0.0644             | 0.1593               | 0.2528              | 0.4766                  |
| 9.0761        | 8.0   | 336  | 10.0912         | 582.8864 | 168.6522 | 695719        | 50364       | 117834        | 50364        | 117834         | 187684        | 1437.6227       | 355882            | 600    | 746083              | 1159.5317           | 83.94             | 196.39              | 2.3960                | 0.0675             | 0.1579               | 0.2516              | 0.4770                  |
| 8.9928        | 9.0   | 378  | 10.0654         | 582.8864 | 168.6522 | 694031        | 52052       | 117145        | 52052        | 117145         | 187295        | 1437.4753       | 356492            | 600    | 746083              | 1156.7183           | 86.7533           | 195.2417            | 2.3958                | 0.0698             | 0.1570               | 0.2510              | 0.4778                  |
| 8.9674        | 10.0  | 420  | 10.0666         | 582.8864 | 168.6522 | 693831        | 52252       | 117030        | 52252        | 117030         | 187216        | 1437.5240       | 356498            | 600    | 746083              | 1156.385            | 87.0867           | 195.05              | 2.3959                | 0.0700             | 0.1569               | 0.2509              | 0.4778                  |
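
The Wer and Cer columns appear to be reported as percentages (error rate × 100); values above 100 are possible when insertions dominate, as here. A minimal illustration of how these two metrics are computed with the `evaluate` library (the strings are toy examples, not drawn from the evaluation set):

```python
# Toy WER/CER computation with the evaluate library; compute()
# returns a fraction, while the table above reports percentages.
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

predictions = ["hello world this is a test"]
references = ["hello world it is a test"]

print(wer_metric.compute(predictions=predictions, references=references))  # ~0.1667
print(cer_metric.compute(predictions=predictions, references=references))
```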


### Framework versions

- Transformers 4.36.2
- Pytorch 2.0.0
- Datasets 2.16.1
- Tokenizers 0.15.0