whisper-swedish-telephonic

Model Overview

whisper-swedish-telephonic is a fine-tuned version of OpenAI's Whisper-Small model, specifically designed for transcribing Swedish telephonic audio. The model is optimized for low-bandwidth, multi-speaker conversations such as call center interactions.

Key Features:

  • Language: Swedish (primary), with limited support for minor English segments.
  • Audio Types: Telephonic conversations, customer support recordings, and general low-bandwidth audio.
  • Sample Rate: 8kHz telephone audio, resampled to 16kHz before inference.
  • Special Tokens: Supports conversational markers, disfluencies, and speaker-specific tags.
  • Performance: WER of 0.170 versus 0.888 for the base Whisper-Small model on a 207-segment Swedish telephonic test set.

Dataset

The model was fine-tuned using the Swedish Telephonic Dataset, consisting of:

  • Duration: ~97 hours of annotated audio.
  • Domains: Call center recordings, customer service conversations.
  • Annotations:
    • Speaker IDs and timestamps.
    • Conversational tags: (()), ~, <overlap>.
    • Language switching: <lang:English>...</lang:English>.

Preprocessing:

  • Audio: Resampled to 16kHz.
  • Segmentations: Aligned with timestamps.
  • Special Tokens: Includes non-speech sounds like [cough], [laugh].
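
The annotation conventions listed above suggest that raw transcripts may need cleaning before they are compared with model output. The following is a minimal sketch, assuming that non-speech sounds always appear as lowercase bracketed tokens such as [cough] and that language switches always use the <lang:English>...</lang:English> form shown above. The helper names `language_spans` and `clean_transcript` are hypothetical and are not part of the repository, and the sample sentence is invented for illustration.

```python
import re

# Assumed patterns, based only on the examples given in this card.
NON_SPEECH = re.compile(r"\[[a-z]+\]")          # e.g. [cough], [laugh]
LANG_TAG = re.compile(r"</?lang:[A-Za-z]+>")    # opening or closing language tag
LANG_SPAN = re.compile(r"<lang:([A-Za-z]+)>(.*?)</lang:\1>")


def language_spans(text):
    """Return (language, text) pairs for every tagged language switch."""
    return [(m.group(1), m.group(2).strip()) for m in LANG_SPAN.finditer(text)]


def clean_transcript(text):
    """Drop non-speech tokens and language tags, keeping the spoken words."""
    text = NON_SPEECH.sub(" ", text)
    text = LANG_TAG.sub(" ", text)
    return " ".join(text.split())  # collapse the whitespace left behind


# Invented sample line that mixes the annotation types.
raw = "ja [cough] jag kommer <lang:English>thank you</lang:English> inte ihåg [laugh]"
print(language_spans(raw))
print(clean_transcript(raw))
```

The other conversational tags, (()), ~ and <overlap>, are left untouched here because the card does not define how they should be handled.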

Model Performance

Word Error Rate (WER) Evaluation

The fine-tuned model was benchmarked against OpenAI's base Whisper-Small model using a Swedish telephonic test dataset containing 207 labeled speech segments.

| Metric | Fine-Tuned Model | Base Whisper-Small |
| --- | --- | --- |
| WER | 0.170 | 0.888 |

Key Observations:

  • Fine-Tuned Model:
    • Excellent transcription accuracy for colloquial Swedish, domain-specific terminology, and long utterances.
    • Handles speaker-specific annotations and conversational markers effectively.
  • Base Model:
    • Struggles with Swedish syntax and domain-specific vocabulary.
    • Outputs nonsensical transcriptions for longer or complex sentences.

Example Transcriptions

| Segment | Ground Truth | Fine-Tuned Model | Base Model | WER (Fine-Tuned) | WER (Base) |
| --- | --- | --- | --- | --- | --- |
| 1 | så nu | så nu | so, no | 0.000 | 1.000 |
| 2 | nu record du båda va | nu record du båda va | nu rekordar du båda | 0.000 | 0.400 |
| 3 | ja jag kommer inte ihåg | ja jag kommer inte ihåg | i am coming to you | 0.000 | 1.000 |
| 5 | sen när då, sen alltid... inga gäster | sen när då, sen alltid... inga gäster | sen då, sen alltid... ingen gest | 0.000 | 0.250 |
| 14 | till frankrike | till frankrike | thank you | 0.000 | 1.000 |

Note: Full segment-wise evaluation logs are available in the repository.
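
Several of the per-segment values above are consistent with the standard definition of WER: substitutions, deletions and insertions at word level, divided by the number of words in the reference. The following is a minimal sketch, assuming plain whitespace tokenization and no text normalization. The evaluation script actually used for this card is not shown here, and `word_error_rate` is a hypothetical helper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Ground truth and base-model output for segments 1, 2 and 14.
rows = [
    ("så nu", "so, no"),
    ("nu record du båda va", "nu rekordar du båda"),
    ("till frankrike", "thank you"),
]
for reference, hypothesis in rows:
    print(f"{word_error_rate(reference, hypothesis):.3f}")
```

Under these assumptions the three rows print 1.000, 0.400 and 1.000, matching the WER (Base) column. Segment 5 contains elided text ("..."), so its value cannot be reproduced from the excerpt shown.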


Audio Example

This audio file demonstrates the model's transcription abilities:

  • File: trimmed_resampled_audio.wav
  • Content: Hej du har kommit till Dressmann. Du pratar med Isabelle. Vad kan jag hjälpa dig?
  • Audio Type: Telephonic conversation.
  • Sample Rate: 16kHz (resampled).
  • Purpose: Showcasing the model's capabilities in transcribing Swedish telephonic speech.

Intended Use

This model is designed for:

  • Customer Support Automation: Transcription and analysis of call center recordings.
  • Telephony Analytics: Sentiment analysis, compliance monitoring, and business intelligence.
  • Swedish Language Research: Study of conversational patterns and colloquial expressions.

Limitations:

  • Language Support: Primarily Swedish; limited support for English.
  • Audio Quality: Optimized for telephonic audio; performance may degrade with studio-quality or highly noisy audio.
  • Preprocessing Requirement: Audio at any other sample rate, including 8kHz telephone recordings, must be resampled to 16kHz before it is passed to the processor.
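
Before transcribing a batch of recordings, it can help to check which files need resampling or downmixing. The following is a minimal sketch using only Python's standard `wave` module, assuming uncompressed PCM WAV input; telephony recordings stored with a-law or mu-law compression would need a different reader. The helper name `inspect_wav` and the demo file are hypothetical.

```python
import os
import tempfile
import wave

TARGET_RATE = 16000  # sample rate expected by Whisper's feature extractor


def inspect_wav(path):
    """Read a PCM WAV header and report whether the file needs preprocessing."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        duration = wav_file.getnframes() / rate
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": duration,
        "needs_resampling": rate != TARGET_RATE,
        "needs_downmix": channels > 1,
    }


# Demo: write one second of 8kHz mono silence, then inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_8k.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 16-bit PCM
    out.setframerate(8000)
    out.writeframes(b"\x00\x00" * 8000)

print(inspect_wav(demo_path))
```

For the demo file the report shows a sample rate of 8000 and `needs_resampling` set to True, which is the typical case for narrowband telephone audio.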

Try the Model

You can test the model using the Hugging Face Playground or the dedicated endpoint.


How to Use

import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load model and processor
model_name = "WMRNORDIC/whisper-swedish-telephonic"
model = WhisperForConditionalGeneration.from_pretrained(model_name)
processor = WhisperProcessor.from_pretrained(model_name)

# Load audio as mono and resample to 16kHz, the rate the Whisper feature extractor expects
audio, sample_rate = librosa.load("path_to_audio.wav", sr=16000, mono=True)
inputs = processor(audio, sampling_rate=sample_rate, return_tensors="pt")

# Transcribe
generated_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print("Transcription:", transcription)