meta-llama/LlamaGuard-7b · Replicating AUPRC of 0.624 on ToxicChat: Understanding Model Inference

Dec 21, 2023

Hello! 👋

I've been trying to replicate the AUPRC of 0.624 reported on ToxicChat with the LlamaGuard-7b model, and I'm looking for guidance on the specifics of the process. I would greatly appreciate insights from anyone familiar with the model or the dataset.

Prompt Configuration: Could someone shed light on the prompt used for the ToxicChat evaluation? What kind of inputs or context tend to yield the best results?
Dataset Handling: How is the ToxicChat dataset treated during inference? Are there any special preprocessing steps or considerations that contribute to the reported score?
Probability for "safe" and "unsafe": Could someone share how to extract the probabilities the model assigns to the words "safe" and "unsafe" in its output? Right now, I take the logits of the tokens "safe" and "unsafe" and pass them through a softmax, which gives two probabilities that sum to 1.
Probability Thresholds for AUPRC: When calculating the AUPRC, which probability thresholds are used to separate positive from negative predictions?
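
For reference, the last two points can be sketched roughly as below. This is a minimal illustration of the approach described above, not a confirmed recipe from the model authors. It assumes the two logits for "safe" and "unsafe" have already been read from the model's output, and the helper names `unsafe_probability` and `average_precision` are made up for this example. It also uses the step-wise average-precision estimate of AUPRC, which sweeps every distinct score as a threshold instead of relying on one hand-picked cut-off; whether the reported number was computed this way is an assumption.

```python
import math


def unsafe_probability(logit_safe, logit_unsafe):
    """Two-way softmax over the 'safe' and 'unsafe' logits; returns P(unsafe)."""
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logit_safe, logit_unsafe)
    e_safe = math.exp(logit_safe - m)
    e_unsafe = math.exp(logit_unsafe - m)
    return e_unsafe / (e_safe + e_unsafe)


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    labels: 1 for toxic (positive class), 0 for non-toxic.
    scores: P(unsafe) for each example.
    Each distinct score is used as a threshold, highest first.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(order):
        # Examples with the same score fall on the same side of any threshold.
        current = scores[order[i]]
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy data only: (logit_safe, logit_unsafe) pairs and made-up labels.
toy_logits = [(0.0, 3.0), (1.0, 2.0), (1.5, 1.0), (4.0, 0.0)]
toy_labels = [1, 0, 1, 0]
toy_scores = [unsafe_probability(s, u) for s, u in toy_logits]
print(average_precision(toy_labels, toy_scores))
```

With this formulation the fourth question changes shape: no single threshold is chosen, since the metric is an aggregate over all of them.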

Looking forward to a fruitful discussion!

Thank you! 🙌

Neuraugment

Feb 12

Same boat here, any progress?

paurue

Mar 7

Same here. I'm interested in your progress so far, @JaimeUPM and @Neuraugment.