	---
	language: ca
	datasets:
	- projecte-aina/3catparla_asr
	tags:
	- audio
	- automatic-speech-recognition
	- catalan
	- whisper-large-v3
	- projecte-aina
	- barcelona-supercomputing-center
	- bsc
	license: apache-2.0
	model-index:
	- name: whisper-large-v3-ca-3catparla
	  results:
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: 3CatParla (Test)
	      type: projecte-aina/3catparla_asr
	      split: test
	      args:
	        language: ca
	    metrics:
	    - name: WER
	      type: wer
	      value: 0.96
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: 3CatParla (Dev)
	      type: projecte-aina/3catparla_asr
	      split: dev
	      args:
	        language: ca
	    metrics:
	    - name: WER
	      type: wer
	      value: 0.92
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: Mozilla Common Voice 17.0 (Test)
	      type: mozilla-foundation/common_voice_17_0
	      split: test
	      args:
	        language: ca
	    metrics:
	    - name: WER
	      type: wer
	      value: 10.32
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: Mozilla Common Voice 17.0 (Dev)
	      type: mozilla-foundation/common_voice_17_0
	      split: validation
	      args:
	        language: ca
	    metrics:
	    - name: WER
	      type: wer
	      value: 9.26
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: CV Benchmark Catalan Accents (Balearic fem)
	      type: projecte-aina/commonvoice_benchmark_catalan_accents
	      split: Balearic female
	      args:
	        language: ca
	    metrics:
	    - name: WER
	      type: wer
	      value: 12.25
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: CV Benchmark Catalan Accents (Balearic male)
	      type: projecte-aina/commonvoice_benchmark_catalan_accents
	      split: Balearic male
	      args:
	        language: ca
	    metrics:
	    - name: WER
	      type: wer
	      value: 12.18
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: CV Benchmark Catalan Accents (Central fem)
	      type: projecte-aina/commonvoice_benchmark_catalan_accents
	      split: Central female
	      args:
	        language: ca
	    metrics:
	    - name: WER
	      type: wer
	      value: 8.51
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: CV Benchmark Catalan Accents (Central male)
	      type: projecte-aina/commonvoice_benchmark_catalan_accents
	      split: Central male
	      args:
	        language: ca
	    metrics:
	    - name: WER
	      type: wer
	      value: 8.73
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: CV Benchmark Catalan Accents (Northern fem)
	      type: projecte-aina/commonvoice_benchmark_catalan_accents
	      split: Northern female
	      args:
	        language: ca
	    metrics:
	    - name: WER
	      type: wer
	      value: 8.09
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: CV Benchmark Catalan Accents (Northern male)
	      type: projecte-aina/commonvoice_benchmark_catalan_accents
	      split: Northern male
	      args:
	        language: ca
	    metrics:
	    - name: WER
	      type: wer
	      value: 8.28
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: CV Benchmark Catalan Accents (Northwestern fem)
	      type: projecte-aina/commonvoice_benchmark_catalan_accents
	      split: Northwestern female
	      args:
	        language: ca
	    metrics:
	    - name: WER
	      type: wer
	      value: 7.88
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: CV Benchmark Catalan Accents (Northwestern male)
	      type: projecte-aina/commonvoice_benchmark_catalan_accents
	      split: Northwestern male
	      args:
	        language: ca
	    metrics:
	    - name: WER
	      type: wer
	      value: 8.44
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: CV Benchmark Catalan Accents (Valencian fem)
	      type: projecte-aina/commonvoice_benchmark_catalan_accents
	      split: Valencian female
	      args:
	        language: ca
	    metrics:
	    - name: WER
	      type: wer
	      value: 9.58
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: CV Benchmark Catalan Accents (Valencian male)
	      type: projecte-aina/commonvoice_benchmark_catalan_accents
	      split: Valencian male
	      args:
	        language: ca
	    metrics:
	    - name: WER
	      type: wer
	      value: 9.1
	library_name: transformers
	---
	# whisper-large-v3-ca-3catparla

	## Table of Contents
	<details>
	<summary>Click to expand</summary>

	- [Model Description](#model-description)
	- [Intended Uses and Limitations](#intended-uses-and-limitations)
	- [How to Get Started with the Model](#how-to-get-started-with-the-model)
	- [Training Details](#training-details)
	- [Citation](#citation)
	- [Additional Information](#additional-information)

	</details>

	## Summary

	The "whisper-large-v3-ca-3catparla" is an acoustic model based on ["openai/whisper-large-v3"](https://huggingface.co./openai/whisper-large-v3) suitable for Automatic Speech Recognition in Catalan.

	## Model Description

	The "whisper-large-v3-ca-3catparla" is an acoustic model suitable for Automatic Speech Recognition in Catalan. It is the result of fine-tuning the model ["openai/whisper-large-v3"](https://huggingface.co./openai/whisper-large-v3) on 710 hours of Catalan data released by the [Projecte AINA](https://projecteaina.cat/) from Barcelona, Spain.

	## Intended Uses and Limitations

	This model can be used for Automatic Speech Recognition (ASR) in Catalan. The model is intended to transcribe audio files in Catalan to plain text without punctuation.

	## How to Get Started with the Model

	For an updated and functional version of this code, please see our [Notebook](https://colab.research.google.com/drive/1MHiPrffNTwiyWeUyMQvSdSbfkef_8aJC?usp=sharing).

	### Installation

	To use this model, install [datasets](https://huggingface.co./docs/datasets/installation) and [transformers](https://huggingface.co./docs/transformers/installation) in a virtual environment.

	Create a virtual environment:
	```bash
	python -m venv /path/to/venv
	```
	Activate the environment:
	```bash
	source /path/to/venv/bin/activate
	```
	Install the modules:
	```bash
	pip install datasets transformers
	```

	### For Inference
	To transcribe audio in Catalan with this model, you can follow this example, which transcribes the 3CatParla test split and computes the WER:

	```bash
	#Install Prerequisites
	pip install torch
	pip install datasets
	pip install 'transformers[torch]'
	pip install evaluate
	pip install jiwer
	```

	```python
	#This code requires a CUDA-capable GPU.

	#Note (November 2024): load_metric is no longer part of datasets,
	#so the metric is loaded with evaluate's load instead.

	import torch
	from datasets import load_dataset, Audio
	from evaluate import load
	from transformers import WhisperForConditionalGeneration, WhisperProcessor

	#Load the processor and model.
	MODEL_NAME = "projecte-aina/whisper-large-v3-ca-3catparla"
	processor = WhisperProcessor.from_pretrained(MODEL_NAME)
	model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda")

	#Load the dataset
	ds = load_dataset("projecte-aina/3catparla_asr", split="test")

	#Resample the audio to 16kHz, the rate the processor expects
	ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

	#Transcribe one example and normalize both reference and prediction
	def map_to_pred(batch):
	    audio = batch["audio"]
	    input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
	    batch["reference"] = processor.tokenizer._normalize(batch["normalized_text"])

	    with torch.no_grad():
	        predicted_ids = model.generate(input_features.to("cuda"))[0]

	    transcription = processor.decode(predicted_ids)
	    batch["prediction"] = processor.tokenizer._normalize(transcription)

	    return batch

	#Do the evaluation
	result = ds.map(map_to_pred)

	#Compute the overall WER (as a percentage).
	wer = load("wer")
	WER = 100 * wer.compute(references=result["reference"], predictions=result["prediction"])
	print(WER)
	```
	Test Result (WER, %): 0.96
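
For readers unfamiliar with the metric, the word error rate (WER) is the word-level edit distance between a reference and a prediction (substitutions, deletions and insertions), divided by the number of words in the reference. The following is a minimal pure-Python sketch for a single sentence pair, meant only to illustrate the definition; the function name `word_error_rate` and the example sentences are hypothetical, and this is not the implementation used by `evaluate` or `jiwer`.

```python
def word_error_rate(reference: str, prediction: str) -> float:
    """Return the WER (in percent) of `prediction` against `reference`."""
    ref = reference.split()
    hyp = prediction.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j words of the prediction.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current

    return 100.0 * previous[-1] / len(ref)


# Hypothetical example: one of four reference words is missing.
print(word_error_rate("bon dia a tothom", "bon dia tothom"))  # 25.0
```

Because the denominator is the reference length, a prediction with many insertions can yield a WER above 100.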

	## Training Details

	### Training data

	The specific dataset used to create the model is called ["3CatParla"](https://huggingface.co./datasets/projecte-aina/3catparla_asr).

	### Training procedure

	This model is the result of fine-tuning the model ["openai/whisper-large-v3"](https://huggingface.co./openai/whisper-large-v3) by following this [tutorial](https://huggingface.co./blog/fine-tune-whisper) provided by Hugging Face.

	### Training Hyperparameters

	* language: catalan
	* hours of training audio: 710
	* learning rate: 1.95e-07
	* sample rate: 16000
	* train batch size: 32 (x4 GPUs)
	* gradient accumulation steps: 1
	* eval batch size: 32
	* save total limit: 3
	* max steps: 19842
	* warmup steps: 1984
	* eval steps: 3307
	* save steps: 3307
	* shuffle buffer size: 480
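
A few derived quantities can help when reading the list above. The sketch below is only arithmetic on the listed values; it assumes that "32 (x4 GPUs)" means a per-GPU batch of 32 replicated on 4 GPUs, which the list suggests but does not state explicitly, and all variable names are illustrative.

```python
# Values copied from the hyperparameter list.
train_batch_size = 32          # assumed to be per GPU
num_gpus = 4
gradient_accumulation_steps = 1
max_steps = 19842
warmup_steps = 1984
eval_steps = 3307

# Examples consumed per optimizer step, under the per-GPU assumption.
effective_batch_size = train_batch_size * num_gpus * gradient_accumulation_steps

# Total training examples processed over the whole run.
examples_processed = effective_batch_size * max_steps

# Share of the run spent in learning-rate warmup.
warmup_fraction = warmup_steps / max_steps

# Number of evaluation (and checkpoint) points during training.
num_evaluations = max_steps // eval_steps

print(effective_batch_size)        # 128
print(examples_processed)          # 2539776
print(round(warmup_fraction, 3))   # 0.1
print(num_evaluations)             # 6
```

Under that assumption, warmup covers roughly the first 10% of training, and evaluation and checkpointing happen six times, the last one coinciding with the final step.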

	## Citation
	If this model contributes to your research, please cite the work:
	```bibtex
	@misc{mena2024whisperlarge3catparla,
	      title={Acoustic Model in Catalan: whisper-large-v3-ca-3catparla.},
	      author={Hernandez Mena, Carlos Daniel and Armentano-Oller, Carme and Solito, Sarah and Külebi, Baybars},
	      organization={Barcelona Supercomputing Center},
	      url={https://huggingface.co./projecte-aina/whisper-large-v3-ca-3catparla},
	      year={2024}
	}
	```

	## Additional Information

	### Author

	The fine-tuning process was performed in July 2024 in the [Language Technologies Unit](https://huggingface.co./BSC-LT) of the [Barcelona Supercomputing Center](https://www.bsc.es/) by [Carlos Daniel Hernández Mena](https://huggingface.co./carlosdanielhernandezmena).

	### Contact
	For further information, please send an email to <[email protected]>.

	### Copyright
	Copyright (c) 2024 by Language Technologies Unit, Barcelona Supercomputing Center.

	### License

	[Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)

	### Funding
	This work has been promoted and financed by the Generalitat de Catalunya through the [Aina project](https://projecteaina.cat/).

	The training of the model was made possible by the compute time provided by [Barcelona Supercomputing Center](https://www.bsc.es/) through MareNostrum 5.