---
license: cc-by-nc-4.0
---
# ECAPA2 Speaker Embedding Extractor
Link to paper: [ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings](https://arxiv.org/abs/2401.08342).
ECAPA2 is a hybrid neural network architecture and training strategy for generating robust speaker embeddings.
The provided pre-trained model offers an easy-to-use API for extracting speaker embeddings and other hierarchical features. More information can be found in our ECAPA2 paper linked above.
## Usage Guide
### Download model
You need to install the `huggingface_hub` package to download the ECAPA2 model:
```bash
pip install --upgrade huggingface_hub
```
Or with Conda:
```bash
conda install -c conda-forge huggingface_hub
```
Download model:
```python
from huggingface_hub import hf_hub_download
# automatically checks for cached file, optionally set `cache_dir` location
model_file = hf_hub_download(repo_id='Jenthe/ECAPA2', filename='ecapa2.pt', cache_dir=None)
```
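On subsequent runs the cached file is reused automatically. If you want to avoid any network access entirely, `hf_hub_download` also accepts a `local_files_only` flag; the small sketch below assumes the model has been downloaded at least once before:
```python
from huggingface_hub import hf_hub_download

# reuse the previously cached checkpoint without contacting the Hub;
# raises an error if the file is not in the local cache yet
model_file = hf_hub_download(repo_id='Jenthe/ECAPA2', filename='ecapa2.pt', local_files_only=True)
```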
### Speaker Embedding Extraction
Extracting speaker embeddings is easy and only requires a few lines of code:
```python
import torch
import torchaudio
ecapa2 = torch.jit.load(model_file, map_location='cpu')
audio, sr = torchaudio.load('sample.wav') # sample rate of 16 kHz expected
embedding = ecapa2(audio)
```
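The model expects audio sampled at 16 kHz. If a recording has a different sample rate, it can be resampled first; the sketch below uses `torchaudio.functional.resample` and a placeholder file name:
```python
import torchaudio
from torchaudio.functional import resample

audio, sr = torchaudio.load('recording_48k.wav')  # placeholder: any file not sampled at 16 kHz
if sr != 16000:
    audio = resample(audio, orig_freq=sr, new_freq=16000)
embedding = ecapa2(audio)
```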
For faster, 16-bit half-precision CUDA inference (recommended):
```python
import torch
import torchaudio
ecapa2 = torch.jit.load(model_file, map_location='cuda')
ecapa2.half() # optional, but results in faster inference
audio, sr = torchaudio.load('sample.wav') # sample rate of 16 kHz expected
embedding = ecapa2(audio)
```
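To use the embeddings for speaker verification, two utterances can be scored with cosine similarity. This is not part of the model's API, just a common convention; the sketch assumes the `ecapa2` object from one of the snippets above, and the file names are placeholders:
```python
import torch.nn.functional as F
import torchaudio

audio1, _ = torchaudio.load('utterance1.wav')  # placeholder, 16 kHz expected
audio2, _ = torchaudio.load('utterance2.wav')  # placeholder, 16 kHz expected

emb1 = ecapa2(audio1)
emb2 = ecapa2(audio2)

# higher score means the utterances are more likely from the same speaker;
# the accept/reject threshold should be tuned on held-out data
score = F.cosine_similarity(emb1.flatten(), emb2.flatten(), dim=0).item()
```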
The initial calls to the JIT model can take a long time in some cases because the compiler attempts to optimize the graph. If this causes issues, the JIT optimizer can be disabled as follows:
```python
with torch.jit.optimized_execution(False):
    embedding = ecapa2(audio)
```
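Alternatively, the one-time optimization cost can be paid up front by running the model a few times on a short dummy waveform before serving real requests. This is only a sketch; it assumes the model accepts a (channels, samples) tensor like the output of `torchaudio.load`:
```python
import torch

# one second of random audio at 16 kHz, shaped (1, samples) like a mono torchaudio.load output
dummy = torch.randn(1, 16000)
for _ in range(3):  # the JIT profiling executor typically optimizes over the first few calls
    _ = ecapa2(dummy)
```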
There is no need to call `ecapa2.eval()` or to wrap inference in `torch.no_grad()`; this is handled automatically.
## Citation
**BibTeX:**
```
@INPROCEEDINGS{ecapa2,
  author={Jenthe Thienpondt and Kris Demuynck},
  booktitle={2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  title={ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings},
  year={2023}
}
```
**APA:**
```
Thienpondt, J., & Demuynck, K. (2023). ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings. In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
```
## Contact
Name: Jenthe Thienpondt\
E-mail: jenthe.thienpondt@ugent.be