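---
language: sv
tags:
- speech
- audio
- automatic-speech-recognition
---

## Wav2Vec 2.0 XLSR Swedish

A Swedish version of Wav2Vec 2.0 XLSR, fine-tuned on NST Swedish Dictation and evaluated on Common Voice.

**WER**: 23.3%

The model does not work in the browser for unknown reasons, but it can be run locally with the script below (adapted in part from Hugging Face example code). The model expects 16 kHz mono audio.

```python
#!/usr/bin/env python3
from sys import argv, exit

import soundfile as sf
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

if __name__ == '__main__':
    if len(argv) < 3:
        print(f'usage: {argv[0]} <model> <audio-file>')
        exit(1)

    # argv[1]: model name on the Hub or path to a local model directory
    processor = Wav2Vec2Processor.from_pretrained(argv[1])
    model = Wav2Vec2ForCTC.from_pretrained(argv[1])

    # Read the audio file; sf.read returns (samples, sample_rate)
    s, sample_rate = sf.read(argv[2])
    if s.ndim > 1:
        s = s.mean(axis=1)  # downmix multi-channel audio to mono
    expected_rate = processor.feature_extractor.sampling_rate  # 16000 for XLSR
    if sample_rate != expected_rate:
        print(f'expected {expected_rate} Hz audio, got {sample_rate} Hz; resample first')
        exit(1)

    input_values = processor(s, sampling_rate=sample_rate, return_tensors="pt").input_values

    # Inference only, so skip gradient tracking
    with torch.no_grad():
        logits = model(input_values).logits

    # Greedy CTC decoding: most likely token per frame, then decode to text
    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = processor.decode(predicted_ids[0])
    print(transcription)
```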
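The last lines of the usage script perform greedy CTC decoding: `torch.argmax` picks the most likely token for every audio frame, and `processor.decode` turns that frame sequence into text. The plain-Python sketch below shows roughly what that decoding step does, so it can be followed without downloading the model. The helper name `ctc_greedy_decode` and the toy vocabulary are hypothetical; the real vocabulary comes from the model's tokenizer. It assumes, as in Wav2Vec2's CTC setup, that the padding token acts as the CTC blank and that `|` marks word boundaries.

```python
from itertools import groupby


def ctc_greedy_decode(ids, id_to_token, blank_id, word_delimiter="|"):
    """Hypothetical helper: turn per-frame argmax ids into a transcript."""
    # 1. Collapse runs of the same id (one character often spans several frames).
    collapsed = [k for k, _ in groupby(ids)]
    # 2. Drop the blank token, which separates genuine repeated characters.
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # 3. Map the word delimiter to a space and normalize whitespace.
    text = "".join(" " if t == word_delimiter else t for t in tokens)
    return " ".join(text.split())


# Toy vocabulary (hypothetical); id 0 is the padding token used as CTC blank.
vocab = {0: "<pad>", 1: "|", 2: "h", 3: "e", 4: "j", 5: "d", 6: "å",
         7: "a", 8: "l", 9: "t"}

frames = [2, 2, 0, 3, 3, 4, 0, 1, 1, 5, 0, 6, 6]
print(ctc_greedy_decode(frames, vocab, blank_id=0))  # hej då
```

The blank is what lets CTC emit a double letter: the frames `a a l l <pad> l t t` decode to "allt", whereas `a l l l t` without a blank collapses to "alt".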
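The reported WER (word error rate) is the word-level edit distance between the model's transcript and the reference (substitutions + deletions + insertions) divided by the number of words in the reference. A minimal sketch of that computation, assuming plain whitespace tokenization and no further text normalization; the normalization behind the 23.3% figure is not specified here, so this sketch is not guaranteed to reproduce it.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


print(wer("jag heter anna", "jag hette anna"))  # 0.3333333333333333
```

For a whole test set, WER is usually computed by summing edits and reference words over all utterances rather than averaging per-utterance rates.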