Aman Singh Thakur's picture

1 2

Aman Singh Thakur

singh96aman

·

AI & ML interests

Responsible AI and Classical Machine Learning

Articles

𝗝𝘂𝗱𝗴𝗶𝗻𝗴 𝘁𝗵𝗲 𝗝𝘂𝗱𝗴𝗲𝘀: 𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗻𝗴 𝗔𝗹𝗶𝗴𝗻𝗺𝗲𝗻𝘁 𝗮𝗻𝗱 𝗩𝘂𝗹𝗻𝗲𝗿𝗮𝗯𝗶𝗹𝗶𝘁𝗶𝗲𝘀 𝗶𝗻 𝗟𝗟𝗠𝘀-𝗮𝘀-𝗝𝘂𝗱𝗴𝗲𝘀

Organizations

None yet

Posts 1

Post

2086

𝗝𝘂𝗱𝗴𝗶𝗻𝗴 𝘁𝗵𝗲 𝗝𝘂𝗱𝗴𝗲𝘀: 𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗻𝗴 𝗔𝗹𝗶𝗴𝗻𝗺𝗲𝗻𝘁 𝗮𝗻𝗱 𝗩𝘂𝗹𝗻𝗲𝗿𝗮𝗯𝗶𝗹𝗶𝘁𝗶𝗲𝘀 𝗶𝗻 𝗟𝗟𝗠𝘀-𝗮𝘀-𝗝𝘂𝗱𝗴𝗲𝘀
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges (2406.12624)

𝐂𝐚𝐧 𝐋𝐋𝐌𝐬 𝐬𝐞𝐫𝐯𝐞 𝐚𝐬 𝐫𝐞𝐥𝐢𝐚𝐛𝐥𝐞 𝐣𝐮𝐝𝐠𝐞𝐬 ⚖️?

We aim to identify the right metrics for evaluating Judge LLMs and understand their sensitivities to prompt guidelines, engineering, and specificity. With this paper, we want to raise caution ⚠️ to blindly using LLMs as human proxy.

Blog - https://huggingface.co./blog/singh96aman/judgingthejudges
Arxiv - https://arxiv.org/abs/2406.12624
Tweet - https://x.com/iamsingh96aman/status/1804148173008703509

@singh96aman @kartik727 @Srinik-1 @sankaranv @dieuwkehupkes

Papers 1

arxiv:2406.12624

models

None public yet

datasets

None public yet