End-to-End Neural Diarization (EEND) trained on the AMI headset dataset.
This example can be found at `egs2/ami/diar1`.
Configurations:
- Use ESPnet's default frontend to extract features. The sampling rate is 8000 Hz, with a frame length of 25 ms (200 samples) and a frame shift of 10 ms (80 samples). The frontend extracts 23 log-scaled Mel filterbank features per frame.
- Follow the frame concatenation (splicing) and subsampling strategy described in [2]: each frame is concatenated with its 7 preceding and 7 following frames, and the result is subsampled by a factor of 10. This yields a 345-dimensional acoustic feature (23 × 15) every 100 ms.
- Training and testing are performed exclusively on data with 4 speakers.
- Use a 4-layer stacked Transformer encoder; each layer outputs 256-dimensional frame-wise embeddings.
- Training runs for 500 epochs.
- Detailed configurations are defined in `exp/diar/train_diar_diar_raw/config.yaml`.
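The splicing and subsampling step can be sketched in NumPy as follows. This is a minimal illustration, not ESPnet's implementation: the helpers `splice` and `splice_and_subsample` are hypothetical, random numbers stand in for the 23-dimensional log-Mel features, and zero padding at the utterance edges is an assumption.

```python
import numpy as np

# Values taken from the configuration list above.
SAMPLE_RATE = 8000      # Hz
FRAME_LENGTH_MS = 25
FRAME_SHIFT_MS = 10
N_MELS = 23
CONTEXT = 7             # frames concatenated on each side
SUBSAMPLING = 10

frame_len = SAMPLE_RATE * FRAME_LENGTH_MS // 1000    # 200 samples
frame_shift = SAMPLE_RATE * FRAME_SHIFT_MS // 1000   # 80 samples


def splice(feats: np.ndarray, context: int) -> np.ndarray:
    """Concatenate each frame with `context` frames on either side.

    Hypothetical helper; edges are zero-padded (an assumption).
    Input shape (T, D) -> output shape (T, D * (2 * context + 1)).
    """
    n_frames = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="constant")
    # Shifted copy i holds frame t - context + i at row t.
    shifted = [padded[i : i + n_frames] for i in range(2 * context + 1)]
    return np.concatenate(shifted, axis=1)


def splice_and_subsample(feats: np.ndarray,
                         context: int = CONTEXT,
                         subsampling: int = SUBSAMPLING) -> np.ndarray:
    """Splice, then keep every `subsampling`-th frame (10 ms -> 100 ms)."""
    return splice(feats, context)[::subsampling]


# Stand-in for 10 s of log-Mel features: 1000 frames at a 10 ms shift.
feats = np.random.default_rng(0).standard_normal((1000, N_MELS))
out = splice_and_subsample(feats)
print(frame_len, frame_shift, out.shape)  # 200 80 (100, 345)
```

With 1,000 input frames the output has 100 rows of 345 dimensions, one row per 100 ms, which matches the dimensions stated in the configuration list.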
RESULTS
Environments
- date: Thu Dec 19 22:03:53 EST 2024
- python version: 3.11.10 (main, Oct 3 2024, 07:29:13) [GCC 11.2.0]
- espnet version: espnet 202409
- pytorch version: pytorch 2.4.0
- Git hash: c12b3d59ca4fd8847edf274e56a1716474d2a30e
- Commit date: Thu Dec 19 21:58:26 2024 -0500
diar_train_diar_raw
DER
diarized_test
threshold_median_collar | DER (%) |
---|---|
result_th0.3_med11_collar0.0 | 71.73 |
result_th0.3_med1_collar0.0 | 74.62 |
result_th0.4_med11_collar0.0 | 70.10 |
result_th0.4_med1_collar0.0 | 71.98 |
result_th0.5_med11_collar0.0 | 70.57 |
result_th0.5_med1_collar0.0 | 72.44 |
result_th0.6_med11_collar0.0 | 72.64 |
result_th0.6_med1_collar0.0 | 74.63 |
result_th0.7_med11_collar0.0 | 76.52 |
result_th0.7_med1_collar0.0 | 78.41 |
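Each row name encodes the post-processing applied to the network's per-speaker posteriors before scoring: `th` is the decision threshold, `med` is the median filter length in frames (`med1` means no filtering), and `collar` is the forgiveness collar in seconds around reference segment boundaries (0.0 means none). The sketch below illustrates the first two steps under stated assumptions: the helper names are hypothetical, the posteriors are thresholded before the median filter is applied, and the filter zero-pads at the edges. It is not the recipe's scoring script.

```python
import re

import numpy as np


def parse_setting(name: str) -> tuple[float, int, float]:
    """Split a row name such as 'result_th0.4_med11_collar0.0' into its parts."""
    m = re.fullmatch(r"result_th([\d.]+)_med(\d+)_collar([\d.]+)", name)
    if m is None:
        raise ValueError(f"unrecognized setting: {name}")
    return float(m.group(1)), int(m.group(2)), float(m.group(3))


def median_filter_time(x: np.ndarray, kernel: int) -> np.ndarray:
    """Median-filter each speaker column over time (odd kernel, zero-padded)."""
    if kernel <= 1:
        return x.copy()
    if kernel % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = kernel // 2
    padded = np.pad(x, ((half, half), (0, 0)), mode="constant")
    # Stack the kernel-length window around every frame: shape (kernel, T, S).
    windows = np.stack([padded[i : i + x.shape[0]] for i in range(kernel)])
    return np.median(windows, axis=0)


def postprocess(posteriors: np.ndarray, threshold: float, median: int) -> np.ndarray:
    """Threshold (T, S) posteriors into 0/1 activity, then median-filter it."""
    active = (posteriors > threshold).astype(np.int64)
    return median_filter_time(active, median).astype(np.int64)


th, med, collar = parse_setting("result_th0.4_med11_collar0.0")
print(th, med, collar)  # 0.4 11 0.0

# One speaker: a one-frame spike and a one-frame dropout.
post = np.array([0.1, 0.1, 0.9, 0.1, 0.1, 0.8, 0.8, 0.2, 0.8, 0.8])[:, None]
print(postprocess(post, 0.5, 1).ravel())  # [0 0 1 0 0 1 1 0 1 1]
print(postprocess(post, 0.5, 3).ravel())  # [0 0 0 0 0 1 1 1 1 1]
```

The filter removes the isolated spike and fills the short gap. In the test table, `med11` gives a lower DER than `med1` at every threshold, which is consistent with this smoothing effect.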
diar_train_diar_raw
DER
diarized_dev
threshold_median_collar | DER (%) |
---|---|
result_th0.3_med11_collar0.0 | 75.88 |
result_th0.3_med1_collar0.0 | 78.21 |
result_th0.4_med11_collar0.0 | 71.45 |
result_th0.4_med1_collar0.0 | 73.32 |
result_th0.5_med11_collar0.0 | 70.53 |
result_th0.5_med1_collar0.0 | 72.34 |
result_th0.6_med11_collar0.0 | 72.03 |
result_th0.6_med1_collar0.0 | 73.96 |
result_th0.7_med11_collar0.0 | 76.66 |
result_th0.7_med1_collar0.0 | 78.33 |
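DER sums missed speech, false-alarm speech, and speaker confusion, divided by the total reference speech time. The sketch below computes a frame-level version with no collar (matching `collar0.0`) on 0/1 activity matrices. It is an approximation for illustration, not the recipe's scoring tool: the function name `frame_der` is hypothetical, both matrices are assumed to have the same shape, and the speaker mapping is found by brute force over all permutations (24 for 4 speakers).

```python
from itertools import permutations

import numpy as np


def frame_der(ref: np.ndarray, hyp: np.ndarray) -> float:
    """Frame-level DER in percent, no collar, best speaker mapping.

    ref, hyp: 0/1 arrays of shape (n_frames, n_speakers) with equal shapes.
    """
    if ref.shape != hyp.shape:
        raise ValueError("ref and hyp must have the same shape")
    n_ref = ref.sum(axis=1)          # active reference speakers per frame
    n_hyp = hyp.sum(axis=1)          # active hypothesis speakers per frame
    total_ref = n_ref.sum()
    if total_ref == 0:
        raise ValueError("reference contains no speech")
    best_err = None
    for perm in permutations(range(hyp.shape[1])):
        mapped = hyp[:, list(perm)]
        n_correct = np.logical_and(ref == 1, mapped == 1).sum(axis=1)
        # Per frame, miss + false alarm + confusion = max(n_ref, n_hyp) - n_correct.
        err = int((np.maximum(n_ref, n_hyp) - n_correct).sum())
        if best_err is None or err < best_err:
            best_err = err
    return 100.0 * best_err / float(total_ref)


ref = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
hyp = np.array([[0, 1], [0, 1], [1, 0], [1, 0]])  # speaker labels swapped
print(frame_der(ref, hyp))  # 50.0
```

Under the best mapping, the remaining error comes from one missed speaker in the overlapped frame and one false-alarm frame, out of four reference speaker-frames. Because the first term of the per-frame error does not depend on the mapping, minimizing the error is equivalent to maximizing the matched overlap; the Hungarian algorithm would avoid enumerating permutations when there are many speakers.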