rizzware posted an update Aug 25
Question about LightEval 🤗:

I've been searching for an LLM evaluation suite that can, out of the box, compare the outputs of a model with no enhancements vs. the same model with better prompt engineering, vs. the same model with RAG, vs. the same model with fine-tuning.

I unfortunately haven't found a tool that fits this exact description, but along the way I ran into LightEval.

A huge pain point of building large-scale projects that use LLMs is that, before an MVP exists, it is difficult to evaluate whether better prompt engineering, RAG, fine-tuning, or some combination of them is needed to get satisfactory LLM output for the project's use case.

Time and resources are then wasted on R&D just to figure out which LLM enhancements are needed.

I believe an out-of-the-box solution for comparing models with and without the aforementioned enhancements could help teams of any size decide which enhancements they need before they start building.

I wanted to know if the LightEval team or Hugging Face in general is thinking about such a tool.

Hi! Lighteval makes it easy to compare model enhancements, such as different prompting or fine-tuning. You can change the prompts for a given task or even create a new task using a different prompt, generation size, stop words, etc.
All you need to create a new task is listed in the lighteval readme.
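For example, a minimal community-task module that compares a plain prompt against an engineered prompt over the same dataset could look roughly like the sketch below. This is only illustrative: the dataset `my-org/my-eval-set` is hypothetical, and exact parameter names (e.g. `metric` vs. `metrics`) can differ between lighteval versions, so please check the readme for the current API.

```python
# Rough sketch of a community task module for lighteval.
# Assumptions are marked below; see the lighteval readme for the exact API.
from lighteval.metrics.metrics import Metrics
from lighteval.tasks.lighteval_task import LightevalTaskConfig
from lighteval.tasks.requests import Doc


def baseline_prompt(line, task_name: str = None):
    # Plain prompt: pass the question through unchanged.
    return Doc(
        task_name=task_name,
        query=line["question"],
        choices=[line["answer"]],
        gold_index=0,
    )


def engineered_prompt(line, task_name: str = None):
    # Same data, different prompt template, so the two tasks can be compared.
    return Doc(
        task_name=task_name,
        query=(
            "You are a domain expert. Answer concisely.\n\n"
            f"Question: {line['question']}\nAnswer:"
        ),
        choices=[line["answer"]],
        gold_index=0,
    )


# lighteval picks up custom tasks from a module-level TASKS_TABLE.
TASKS_TABLE = [
    LightevalTaskConfig(
        name="myproject_baseline",
        prompt_function=baseline_prompt,
        suite=["community"],
        hf_repo="my-org/my-eval-set",  # hypothetical evaluation dataset
        hf_subset="default",
        evaluation_splits=["test"],
        generation_size=256,
        stop_sequence=["\n"],
        metric=[Metrics.exact_match],  # may be `metrics` in newer versions
    ),
    LightevalTaskConfig(
        name="myproject_prompted",
        prompt_function=engineered_prompt,
        suite=["community"],
        hf_repo="my-org/my-eval-set",
        hf_subset="default",
        evaluation_splits=["test"],
        generation_size=256,
        stop_sequence=["\n"],
        metric=[Metrics.exact_match],
    ),
]
```

You would then point lighteval at this module via its custom-tasks option and run both task names against each model variant (base, fine-tuned, etc.); the readme shows the exact command-line invocation.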
Do you have a more specific use case in mind so that we can help you further?