Inquiry on Upcoming Language Support and Fine-Tuning Feasibility
Hi LiveKit team! 👋
First, thank you for your incredible work on the livekit/turn-detector model and the open-source ecosystem around it. The end-of-turn detection capabilities have been a game-changer for our conversational AI projects, especially with the improved accuracy over traditional VAD methods.
I wanted to ask about your plans for expanding language support. I recall seeing a post on X.com suggesting that multilingual support is in the pipeline for the near future. Could you share any updates on this? Many communities would greatly benefit from non-English implementations, and we'd love to hear about timelines or which languages are being prioritized.
Additionally, if broader language support isn’t imminent, is it feasible to fine-tune the current model on a custom language corpus?
For instance:
- Training Requirements: What dataset format and size are recommended (e.g., conversational transcripts with turn boundaries labeled)? A rough sketch of what we have in mind follows this list.
- Annotation Guidelines: Are specific metadata or annotations (e.g., silence duration, speaker roles) needed for training?
- Architecture Constraints: Does the ONNX-based inference setup allow for fine-tuning, or would adjustments to the model architecture be necessary?
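To make the question concrete, here is a rough sketch of the kind of labeled data we could prepare, assuming a per-utterance binary end-of-turn label; the field names (`speaker`, `text`, `silence_ms`, `is_end_of_turn`) are our own guesses rather than anything from the model card:

```python
import json

# Hypothetical JSONL schema for an EOU dataset: one record per utterance,
# with a binary label for whether the speaker's turn ends there.
# Field names are our own guesses, not an official LiveKit format.
examples = [
    {"speaker": "user", "text": "so what I was thinking is", "silence_ms": 120, "is_end_of_turn": False},
    {"speaker": "user", "text": "could we meet tomorrow at ten?", "silence_ms": 800, "is_end_of_turn": True},
    {"speaker": "agent", "text": "sure, ten works for me.", "silence_ms": 600, "is_end_of_turn": True},
]

with open("eou_train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

If the expected format differs (e.g., full conversations with token-level boundaries rather than per-utterance labels), we're happy to adapt.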
We’re prepared to collaborate on preparing a training set for our target language and would appreciate guidance on best practices.
Thanks again for your transparency and dedication to advancing real-time communication tools! Looking forward to your insights.
Hi johndili, I hope you are doing well.
I wanted to ask whether you have considered training your own EOU (BERT-based) classification model instead of fine-tuning "livekit/turn-detector".
Perhaps using a BERT-like (encoder-only) model pre-trained on your target language? I'd love to hear your thoughts on that. Thanks. 🥰
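To give a concrete idea of what I mean, here is a minimal sketch using Hugging Face Transformers, assuming a binary end-of-turn label per utterance; the checkpoint name, toy data, and hyperparameters are placeholders, not recommendations:

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Placeholder checkpoint: swap in an encoder-only model pre-trained on your language.
CHECKPOINT = "bert-base-multilingual-cased"

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=2)

# Toy data: label 1 = end of turn, label 0 = speaker still holds the floor.
data = Dataset.from_dict({
    "text": ["could we meet tomorrow at ten?", "so what I was thinking is"],
    "label": [1, 0],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="eou-bert", num_train_epochs=1, per_device_train_batch_size=8),
    train_dataset=data,
)
trainer.train()
```

This is just a sketch of the training loop; evaluating it against real conversational pauses and hesitations would be the harder part.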
Hi mate,
It's a great idea!
I wonder how it would behave in production, though.
Would you recommend training the model from scratch, or just fine-tuning a pre-trained BERT?
Thanks