How to finetune using DPO?

#31

by Maverick17 - opened Nov 12

Nov 12

Hello,

I have a standard DPO dataset with columns for images, rejected points, and chosen points, containing 2D coordinates for GUI visual grounding tasks. What prompt format is needed to correctly train the model using the DPO technique? The paper mentions that a 2D PixMo-Points dataset was used to train the model, but could you clarify the exact approach?
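To make the question concrete, here is a minimal sketch of the setup described above: each row becomes a (prompt, chosen, rejected) triple, and the standard DPO objective is computed for one pair from summed log-probabilities. The `PreferencePair` class, the `format_point` serialization, and the example instruction are hypothetical, introduced only for illustration. They are not the confirmed Molmo prompt format, which is exactly what is being asked about. The image that accompanies each prompt is omitted here.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One DPO example: a prompt with a preferred and a dispreferred answer."""
    prompt: str
    chosen: str
    rejected: str


def format_point(x: float, y: float) -> str:
    # Hypothetical text serialization of a 2D point; the format the model
    # actually expects is an open question in this thread.
    return f"({x:.1f}, {y:.1f})"


def build_pair(instruction: str, chosen_xy, rejected_xy) -> PreferencePair:
    # Turn one dataset row (instruction, chosen point, rejected point)
    # into the prompt/chosen/rejected strings used for DPO.
    return PreferencePair(
        prompt=instruction,
        chosen=format_point(*chosen_xy),
        rejected=format_point(*rejected_xy),
    )


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    # Standard DPO loss for a single pair: -log(sigmoid(margin)), where the
    # margin compares policy-vs-reference log-ratios of chosen and rejected.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


pair = build_pair("Click the Save button", (12.5, 40.0), (80.0, 10.0))
loss = dpo_loss(-1.0, -2.0, -1.5, -1.5)
```

When the policy and the reference model assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss falls below that value as the policy comes to prefer the chosen point more strongly than the reference does.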

amanrangapur

Ai2 org Nov 14

Hello @Maverick17, we will be releasing a paper with complete details of the dataset, training, and evaluation shortly.

Maverick17

Nov 14

Hello @amanrangapur, does "shortly" mean by the end of this week or by the end of November? :)

I'm really looking forward to the release of the dataset and the training and eval scripts!

amanrangapur

Ai2 org Nov 15

Hi @Maverick17, I mean the last week of November.

Maverick17

30 days ago

Hello @amanrangapur, what is the status of the data release? We are nearing the end of November :)

amanrangapur

Ai2 org 30 days ago

Hey @Maverick17, we're planning to release this week. Stay tuned.

Maverick17

21 days ago

@amanrangapur Seems you guys are still not ready...

amanrangapur

Ai2 org 21 days ago

edited 13 days ago

Hi @Maverick17, a subset of the dataset is out; check this: https://huggingface.co/collections/allenai/pixmo-674746ea613028006285687b
Training code, evals, and checkpoints are here: https://github.com/allenai/molmo
