---
language:
- zh
license: cc-by-nc-sa-4.0
library_name: transformers
tags:
- audio
- automatic-speech-recognition
widget:
- example_title: Model Introduction
  src: https://huggingface.co./andybi7676/cool-whisper-hf/resolve/main/sample1.weba
pipeline_tag: automatic-speech-recognition
---

# Cool-Whisper

### Leave No Knowledge Behind During Knowledge Distillation: Towards Practical and Effective Knowledge Distillation for Code-Switching ASR Using Realistic Data

Liang-Hsuan Tseng, Zih-Ching Chen, Wei-Shun Chang, Cheng-Kuang Lee, Tsung-Ren Huang, Hung-yi Lee

[![arXiv](https://img.shields.io/badge/arXiv-Paper-color.svg)](https://arxiv.org/abs/2407.10603)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ZikUWKch78Jv3Yw7LtUKUn4wMrFCx6lD?usp=sharing)

> ⚠️ Due to privacy and security concerns, this model will be temporarily taken offline. We are sorry for the inconvenience.
>
> ⚠️ 因為隱私安全疑慮,本模型將暫時下架。非常抱歉造成大家困擾。

## Introduction

* Cool-Whisper is a distilled version of Whisper, focused primarily on **Mandarin-English** code-switching ASR for users in Taiwan.
* The model is trained on 60,000 hours of **unlabeled** audio.
* In practice, we distill *knowledge* not only from the large model (Whisper-large-v2) but also from the small model (Whisper-base).

## Basic Usage

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "andybi7676/cool-whisper-hf"

# Load the model and move it to the target device
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=256,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

dataset = load_dataset("andybi7676/ntuml2021_long", "default", split="test")
sample = dataset[0]["audio"]
# ...or pass your own audio path instead:
# sample = "/your/path/to/audio.wav"

result = pipe(sample)
print("Basic Result:")
print(result["text"])

# Result with timestamps
print("\nResult with timestamps:")
for chunk in result["chunks"]:
    print(chunk)
```

## Faster-Whisper Support

[Faster-Whisper](https://github.com/SYSTRAN/faster-whisper) is a widely used reimplementation of Whisper built on [CTranslate2](https://github.com/OpenNMT/CTranslate2/) that substantially accelerates transcription. We also release our model in CTranslate2 format so that it can be loaded directly in faster-whisper. Please visit [cool-whisper](https://huggingface.co./andybi7676/cool-whisper) for more details.
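For reference, below is a minimal sketch of transcribing audio with the CTranslate2 build in faster-whisper. It assumes the `andybi7676/cool-whisper` repository hosts a standard CTranslate2 conversion (as described above); the audio path, `device`, `compute_type`, and `beam_size` values are illustrative and should be adjusted to your setup.

```python
# Minimal sketch: transcription with the CTranslate2 build via faster-whisper.
# Assumes andybi7676/cool-whisper contains a standard CTranslate2 conversion;
# the audio path, device, compute_type, and beam_size below are illustrative.
from faster_whisper import WhisperModel

# Downloads the CTranslate2 model from the Hugging Face Hub on first use
model = WhisperModel("andybi7676/cool-whisper", device="cuda", compute_type="float16")

# transcribe() returns a generator of segments plus metadata about the audio
segments, info = model.transcribe("/your/path/to/audio.wav", beam_size=5)

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```

On CPU-only machines, `device="cpu"` together with `compute_type="int8"` is a common alternative configuration.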