license: apache-2.0
language:
- en
datasets:
- jondurbin/truthy-dpo-v0.1
- snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
Gazelle v0.2 is the mid-March release from Tincans of a joint speech-language model.
This repo contains an experimental DPO finetune. To our knowledge, this is the first multi-modal DPO finetune of a speech-language model - audio in, text out.
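For readers new to DPO, here is a minimal sketch of the standard per-pair DPO objective in plain Python. It is illustrative only and is not the training code used for this model. The function names and `beta=0.1` are assumptions, since the card does not state a beta or a loss variant. For a speech-language model, each log-probability would be that of a text reply conditioned on the audio prompt.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; not stated in this card
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full reply under
    either the policy being trained or the frozen reference model.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the policy favors the chosen reply more than
    # the reference does, relative to the rejected reply.
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))
```

When the policy equals the reference, both margins are zero and the loss is log(2). It drops below that once the policy shifts probability toward the chosen reply.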
The training data was snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset (first iteration) and jondurbin/truthy-dpo-v0.1. We trained for 2 epochs with a peak learning rate of 3e-4 (max_lr=3e-4), a batch size of 32, 10 warmup steps, and cosine decay.
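The learning-rate schedule described above can be sketched roughly as follows. This is a sketch under assumptions, not the actual training code: it assumes linear warmup and a decay floor of zero, neither of which the card specifies. `lr_at_step` and `total_steps` are hypothetical names, and the total step count is not given here because it depends on the dataset size.

```python
import math


def lr_at_step(
    step: int,
    total_steps: int,        # hypothetical; depends on dataset size
    max_lr: float = 3e-4,    # from the card
    warmup_steps: int = 10,  # from the card
    min_lr: float = 0.0,     # assumed floor; not stated in the card
) -> float:
    """Learning rate at a 0-indexed optimizer step."""
    if step < warmup_steps:
        # Assumed linear warmup, reaching max_lr on the last warmup step.
        return max_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Cosine decay from max_lr down to min_lr.
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

For example, with a hypothetical `total_steps=1000`, calling `[lr_at_step(s, 1000) for s in range(1000)]` gives a curve that climbs to 3e-4 over the first 10 steps and then decays smoothly toward zero.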
We see some tell-tale signs of preference modeling, most notably longer replies, which the base instruction-tuned model does not produce. Overall, we consider the quality mixed: we welcome experimentation but do not recommend production use.
Please see this notebook for an inference example.