update readme
README.md
CHANGED
@@ -496,7 +496,7 @@ Note: For proprietary models, we calculate token density based on the image enco
 <th>Size</th>
 <th colspan="3">ASR (zh)</th>
 <th colspan="3">ASR (en)</th>
-<th colspan="2">
+<th colspan="2">AST</th>
 <th>Emotion</th>
 </tr>
 <tr>
@@ -1101,7 +1101,7 @@ else:

 ### Audio-Only mode
 #### Mimick
-
+The `Mimick` task reflects a model's end-to-end speech modeling capability. The model takes audio input, outputs an ASR transcription, and then reconstructs the original audio with high similarity. The higher the similarity between the reconstructed audio and the original audio, the stronger the model's foundational capability in end-to-end speech modeling.
 ```python
 mimick_prompt = "Please repeat each user's speech, including voice style and speech content."
 audio_input, _ = librosa.load('xxx.wav', sr=16000, mono=True)