pyannote.audio

non-profit

https://github.com/pyannote/pyannote-audio

pyannote

AI & ML interests

speaker diarization // speaker recognition // speaker segmentation // voice activity detection // overlapped speech detection // speaker change detection

Organization Card


pyannote.audio is an open-source toolkit for speaker diarization.

Pretrained pipelines reach state-of-the-art performance on most academic benchmarks.

Using it in production?
Consider switching to pyannoteAI, which offers more accurate and faster pipelines.

Benchmark	v2.1	v3.1	pyannoteAI
AISHELL-4	14.1	12.2	11.2
AliMeeting (channel 1)	27.4	24.4	19.3
AMI (IHM)	18.9	18.8	15.8
AMI (SDM)	27.1	22.4	19.3
AVA-AVD	66.3	50.0	44.8
CALLHOME (part 2)	31.6	28.4	19.8
DIHARD 3 (full)	26.9	21.7	16.8
Earnings21	17.0	9.4	9.1
Ego4D (dev.)	61.5	51.2	44.0
MSDWild	32.8	25.3	19.8
RAMC	22.5	22.2	11.1
REPERE (phase2)	8.2	7.8	7.6
VoxConverse (v0.3)	11.2	11.3	9.8
Diarization error rate (in %)
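
For readers unfamiliar with the metric, diarization error rate is conventionally the sum of false alarm, missed detection, and speaker confusion durations, divided by the total duration of reference speech. The sketch below shows that standard definition only. The function name and the example durations are hypothetical, and evaluation details such as a forgiveness collar or the handling of overlapped speech are not stated in this card.

```python
def diarization_error_rate(false_alarm: float,
                           missed_detection: float,
                           confusion: float,
                           total: float) -> float:
    """Return the diarization error rate as a percentage.

    All arguments are durations in seconds:
      false_alarm      -- non-speech labelled as speech
      missed_detection -- reference speech that was not detected
      confusion        -- speech attributed to the wrong speaker
      total            -- total duration of reference speech
    """
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return 100.0 * (false_alarm + missed_detection + confusion) / total


# Hypothetical durations, chosen only to illustrate the arithmetic.
der = diarization_error_rate(false_alarm=12.0,
                             missed_detection=30.0,
                             confusion=18.0,
                             total=600.0)
print(f"DER = {der:.1f}%")  # (12 + 30 + 18) / 600 = 10.0%
```

Lower is better. Because false alarms are counted in the numerator but not in the denominator, the value can exceed 100% on difficult recordings.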

On high-end NVIDIA hardware:

v2.1 takes around 1m30s to process 1h of audio
v3.1 takes around 1m20s to process 1h of audio
On-premise pyannoteAI takes less than 30s to process 1h of audio
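
These timings can be restated as a real-time factor (processing time divided by audio duration). The sketch below does that arithmetic with the figures quoted above. The helper name is hypothetical, and the pyannoteAI entry uses 30s as an upper bound, since the card only says "less than 30s".

```python
def real_time_factor(processing_seconds: float,
                     audio_seconds: float = 3600.0) -> float:
    """Processing time divided by audio duration (lower is faster)."""
    return processing_seconds / audio_seconds


# Approximate processing time, in seconds, for 1h of audio.
timings = {
    "v2.1": 90.0,                      # around 1m30s
    "v3.1": 80.0,                      # around 1m20s
    "pyannoteAI (upper bound)": 30.0,  # stated as "less than 30s"
}

for name, seconds in timings.items():
    rtf = real_time_factor(seconds)
    print(f"{name}: RTF ~ {rtf:.4f}, about {1 / rtf:.0f}x faster than real time")
```

With these figures, v2.1 runs at roughly 40x real time, v3.1 at roughly 45x, and on-premise pyannoteAI at more than 120x.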

spaces 1

Pretrained pipelines

models 15

pyannote/speech-separation-ami-1.0

Updated Nov 11, 2024 • 8.69k • 52

pyannote/separation-ami-1.0

Updated Jul 16, 2024 • 10

pyannote/speaker-diarization-3.1

Automatic Speech Recognition • Updated May 10, 2024 • 9.89M • 657

pyannote/overlapped-speech-detection

Automatic Speech Recognition • Updated May 10, 2024 • 25.5k • 33

pyannote/speaker-segmentation

Automatic Speech Recognition • Updated May 10, 2024 • 648 • 29

pyannote/voice-activity-detection

Automatic Speech Recognition • Updated May 10, 2024 • 250k • 172

pyannote/segmentation

Voice Activity Detection • Updated May 10, 2024 • 6.58M • 525

pyannote/speaker-diarization

Automatic Speech Recognition • Updated May 10, 2024 • 5.98M • 927

pyannote/speaker-diarization-3.0

Automatic Speech Recognition • Updated May 10, 2024 • 2.07M • 174

pyannote/embedding

Updated May 10, 2024 • 375k • 126

datasets

None public yet