openbmb/MiniCPM-o-2_6

Cuiunbo committed 5 days ago

Commit 9baca8a · 1 Parent(s): 9808eca

update readme

Files changed (1)

README.md +2 -2

@@ -496,7 +496,7 @@ Note: For proprietary models, we calculate token density based on the image enco
             <th>Size</th>
             <th colspan="3">ASR (zh)</th>
             <th colspan="3">ASR (en)</th>
-            <th colspan="2">ASR</th>
+            <th colspan="2">AST</th>
             <th>Emotion</th>
         </tr>
         <tr>
@@ -1101,7 +1101,7 @@ else:
 ### Audio-Only mode
 #### Mimick
-- In this task, you can see the models end-to-end  ability. MiniCPM-o 2.6 takes an audio input and produces both an automatic speech recognition (ASR) transcription and a voice imitation (TTS) output.
+`Mimick` task reflects a model's end-to-end speech modeling capability. The model takes audio input, and outputs an ASR transcription and subsequently reconstructs the original audio with high similarity. The higher the similarity between the reconstructed audio and the original audio, the stronger the model's foundational capability in end-to-end speech modeling.
 ```python
 mimick_prompt = "Please repeat each user's speech, including voice style and speech content."
 audio_input, _ = librosa.load('xxx.wav', sr=16000, mono=True)