Commit fce4f27
Parent(s): b8248f0
finalise message
app.py
CHANGED
@@ -178,19 +178,19 @@ if __name__ == "__main__":
     gr.Markdown(
         """
         One of the major claims of the <a href="https://arxiv.org/abs/2311.00430"> Distil-Whisper paper</a> is that
-        that Distil-Whisper hallucinates less than Whisper on long-form audio. To demonstrate this, we
-        transcriptions generated by <a href="https://huggingface.co/openai/whisper-large-v2">
-        and <a href="https://huggingface.co/distil-whisper/distil-large-v2">
+        that Distil-Whisper hallucinates less than Whisper on long-form audio. To demonstrate this, we analyse the
+        transcriptions generated by Whisper <a href="https://huggingface.co/openai/whisper-large-v2"> large-v2</a>
+        and Distil-Whisper <a href="https://huggingface.co/distil-whisper/distil-large-v2"> distil-large-v2</a> on the
         <a href="https://huggingface.co/datasets/distil-whisper/tedlium-long-form"> TED-LIUM</a> validation set.

         To quantify the amount of repetition and hallucination in the predicted transcriptions, we measure the number
         of repeated 5-gram word duplicates (5-Dup.) and the insertion error rate (IER). Analysis is performed on the
-        overall level
+        <b>overall level</b>, where statistics are computed over the entire dataset, and also a <b>per-sample level</b> (i.e. an
         on an individual example basis).

         The transcriptions for both models are shown at the bottom of the demo. We compute a text difference for each
         relative to the ground truth transcriptions. Insertions are displayed in <span style='background-color:Lightgreen'>green</span>,
-        and deletions in <span style='background-color:#FFCCCB'><s>red</s></span>. Multiple words in <span style='background-color:Lightgreen'>green</span>
+        and deletions in <span style='background-color:#FFCCCB'><s>red</s></span>. Multiple consecutive words in <span style='background-color:Lightgreen'>green</span>
         indicates that a model has hallucinated, since it has inserted words not present in the ground truth transcription.

         Overall, Distil-Whisper has roughly half the number of 5-Dup. and IER. This indicates that it has a lower
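The Markdown text in this diff mentions two measurements: the count of repeated 5-gram word duplicates (5-Dup.) and a word-level text difference against the ground truth, from which insertions (and an insertion error rate) can be derived. A minimal sketch of how such metrics might be computed with the standard library; the function names and the exact counting conventions are assumptions for illustration, not the demo's actual code:

```python
from collections import Counter
from difflib import SequenceMatcher


def count_ngram_duplicates(text: str, n: int = 5) -> int:
    """Count n-gram occurrences beyond the first, i.e. how often an n-gram repeats."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    return sum(c - 1 for c in counts.values() if c > 1)


def word_diff(reference: str, hypothesis: str):
    """Word-level diff as (op, words) pairs: 'equal', 'insert' (green), 'delete' (red)."""
    ref, hyp = reference.split(), hypothesis.split()
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=ref, b=hyp).get_opcodes():
        if tag == "equal":
            ops.append(("equal", ref[i1:i2]))
        else:
            if j2 > j1:
                ops.append(("insert", hyp[j1:j2]))  # words added by the model
            if i2 > i1:
                ops.append(("delete", ref[i1:i2]))  # words the model dropped
    return ops


def insertion_error_rate(reference: str, hypothesis: str) -> float:
    """Inserted words divided by reference length (one plausible IER definition)."""
    inserted = sum(len(w) for op, w in word_diff(reference, hypothesis) if op == "insert")
    return inserted / len(reference.split())
```

A long run of `insert` words in `word_diff` is exactly the "multiple consecutive words in green" pattern the demo text describes as a hallucination signal.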