hfw committed on
Commit 4d54644 · 1 Parent(s): a884350

update audio demo

README.md CHANGED
@@ -1127,7 +1127,7 @@ else:
  `Mimick` task reflects a model's end-to-end speech modeling capability. The model takes audio input, and outputs an ASR transcription and subsequently reconstructs the original audio with high similarity. The higher the similarity between the reconstructed audio and the original audio, the stronger the model's foundational capability in end-to-end speech modeling.
  ```python
  mimick_prompt = "Please repeat each user's speech, including voice style and speech content."
- audio_input, _ = librosa.load('xxx.wav', sr=16000, mono=True)
+ audio_input, _ = librosa.load('assets/mimick.wav', sr=16000, mono=True)
  msgs = [{'role': 'user', 'content': [mimick_prompt,audio_input]}]

  res = model.chat(
@@ -1155,7 +1155,7 @@ ref_audio, _ = librosa.load('assets/demo.wav', sr=16000, mono=True) # load the r

  Audio Assistant: # With this mode, model will speak with the voice in ref_audio as a AI assistant. (Stable and more suitable for general conversation)
  sys_prompt = model.get_sys_prompt(ref_audio=ref_audio, mode='audio_assistant', language='en')
- user_question = {'role': 'user', 'content': [librosa.load('xxx.wav', sr=16000, mono=True)[0]]} # Try to ask something by recording it in 'xxx.wav'!!!
+ user_question = {'role': 'user', 'content': [librosa.load('assets/qa.wav', sr=16000, mono=True)[0]]} # Try to ask something by recording it in 'xxx.wav'!!!
  ```
  ```python
  msgs = [sys_prompt, user_question]
@@ -1205,8 +1205,8 @@ General Audio:
  Audio Caption: Summarize the main content of the audio.
  Sound Scene Tagging: Utilize one keyword to convey the audio's content or the associated scene.
  '''
- task_prompt = "" # Choose the task prompt above
- audio_input, _ = librosa.load('xxx.wav', sr=16000, mono=True)
+ task_prompt = "Summarize the main content of the audio.\n" # Choose the task prompt above
+ audio_input, _ = librosa.load('assets/audio_understanding.mp3', sr=16000, mono=True)

  msgs = [{'role': 'user', 'content': [task_prompt,audio_input]}]
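
The last README hunk replaces the empty `task_prompt` with the Audio Caption prompt followed by a newline. A minimal sketch of selecting a prompt by task name is given here. The `TASK_PROMPTS` dict and the `get_task_prompt` helper are hypothetical names that do not appear in the README, and only the two prompts visible in this hunk are included.

```python
# Hypothetical lookup table: only the two task prompts visible in the hunk.
TASK_PROMPTS = {
    "Audio Caption": "Summarize the main content of the audio.",
    "Sound Scene Tagging": (
        "Utilize one keyword to convey the audio's content "
        "or the associated scene."
    ),
}


def get_task_prompt(task: str) -> str:
    """Return the prompt for `task`, with the trailing newline used in the commit."""
    try:
        return TASK_PROMPTS[task] + "\n"
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None


# Same value the commit assigns to task_prompt.
task_prompt = get_task_prompt("Audio Caption")
```

The resulting `task_prompt` can be placed in the `msgs` content list ahead of `audio_input`, as the README does.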
 
assets/audio_understanding.mp3 ADDED
Binary file (321 kB).
 
assets/mimick.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dbb0860cb4dd7c7003b6f0406299fc7c0febc5c6a990e1c670d29b763e84e7ed
+ size 384046
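
The three lines added for `assets/mimick.wav` are a Git LFS pointer, not audio data. If the repository is cloned without LFS, the file on disk holds this text and an audio loader would likely fail on it. A minimal sketch for detecting that case with only the standard library follows; `parse_lfs_pointer` is a hypothetical helper name, not part of the repository.

```python
from typing import Optional

LFS_VERSION_LINE = "version https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> Optional[dict]:
    """Return {'oid': ..., 'size': ...} if `text` is a Git LFS pointer, else None."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != LFS_VERSION_LINE:
        return None
    # Remaining lines are "key value" pairs separated by a single space.
    fields = dict(line.split(" ", 1) for line in lines[1:] if " " in line)
    if "oid" not in fields or not fields.get("size", "").isdigit():
        return None
    return {"oid": fields["oid"], "size": int(fields["size"])}


# The pointer content shown in this commit.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:dbb0860cb4dd7c7003b6f0406299fc7c0febc5c6a990e1c670d29b763e84e7ed\n"
    "size 384046\n"
)
```

If the helper returns a dict for a file that should be audio, running `git lfs pull` in the clone fetches the real content before the file is passed to `librosa.load`.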