Commit 8135339
Miaoran000 committed · Parent(s): 7ef82ad

update text description
Files changed:
- .gitignore: +3 -0
- app.py: +1 -1
- src/display/about.py: +5 -6
.gitignore CHANGED

@@ -14,12 +14,15 @@ auto_evals/
 eval-queue-bk/
 eval-results-bk/
 eval-results-bk_hhem21/
+eval-results_hhem21/
+hhem21_server/
 
 src/assets/model_counts.html
 
 generation_results/
 Hallucination Leaderboard Results
 dataset_stats.py
+hhem_v21_eval.py
 
 get_comparison.py
 GPT-4-Turbo_v.s._GPT-4o.csv
app.py CHANGED

@@ -24,7 +24,7 @@ except Exception:
 try:
     print(envs.EVAL_RESULTS_PATH)
     snapshot_download(
-        repo_id=envs.RESULTS_REPO, local_dir=envs.EVAL_RESULTS_PATH, repo_type="dataset", tqdm_class=None, etag_timeout=30
+        repo_id=envs.RESULTS_REPO, revision='hhem21', local_dir=envs.EVAL_RESULTS_PATH, repo_type="dataset", tqdm_class=None, etag_timeout=30
     )
 except Exception:
     restart_space()
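The one functional change in app.py is pinning `snapshot_download` to the `hhem21` revision of the results dataset, so the Space pulls the HHEM-2.1 results branch rather than the default branch. A minimal sketch of the same call in isolation, with placeholder values standing in for the Space's `envs` configuration:

```python
# Hedged sketch: fetching a specific branch of a dataset repo with
# huggingface_hub. The repo id and local path below are placeholders,
# not the Space's actual envs.RESULTS_REPO / envs.EVAL_RESULTS_PATH.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="your-org/leaderboard-results",  # placeholder dataset repo
    revision="hhem21",                       # branch, tag, or commit to pin
    repo_type="dataset",
    local_dir="./eval-results_hhem21",       # placeholder download target
    etag_timeout=30,
)
print(local_path)  # local directory containing that revision's files
```

Pinning the revision keeps the downloaded results stable even if the dataset's default branch changes later.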
src/display/about.py CHANGED

@@ -25,7 +25,6 @@ TITLE = """<h1 align="center" id="space-title">Hughes Hallucination Evaluation M
 INTRODUCTION_TEXT = """
 This leaderboard (by [Vectara](https://vectara.com)) evaluates how often an LLM introduces hallucinations when summarizing a document. <br>
 The leaderboard utilizes [HHEM](https://huggingface.co/vectara/hallucination_evaluation_model), an open source hallucination detection model.<br>
-An improved version (HHEM v2) is integrated into the [Vectara platform](https://console.vectara.com/signup/?utm_source=huggingface&utm_medium=space&utm_term=integration&utm_content=console&utm_campaign=huggingface-space-integration-console).
 
 """
 
@@ -46,7 +45,7 @@ The model card for HHEM can be found [here](https://huggingface.co/vectara/hallu
 ## Evaluation Dataset
 
 Our evaluation dataset consists of 1006 documents from multiple public datasets, primarily [CNN/Daily Mail Corpus](https://huggingface.co/datasets/cnn_dailymail/viewer/1.0.0/test).
-We generate summaries for each of these documents using submitted LLMs and compute hallucination scores for each pair of document and generated summary. (Check the prompt we used [here](https://
+We generate summaries for each of these documents using submitted LLMs and compute hallucination scores for each pair of document and generated summary. (Check the prompt we used [here](https://github.com/vectara/hallucination-leaderboard))
 
 ## Metrics Explained
 - Hallucination Rate: Percentage of summaries with a hallucination score below 0.5
@@ -55,14 +54,14 @@ We generate summaries for each of these documents using submitted LLMs and compu
 - Average Summary Length: The average word count of generated summaries
 
 ## Note on non-Hugging Face models
-On HHEM leaderboard,
-If you would like to submit your model that is not available on the Hugging Face model hub, please contact us at
+On HHEM leaderboard, there are currently models such as GPT variants that are not available on the Hugging Face model hub. We ran the evaluations for these models on our own and uploaded the results to the leaderboard.
+If you would like to submit your model that is not available on the Hugging Face model hub, please contact us at ofer@vectara.com.
 
 ## Model Submissions and Reproducibility
 You can submit your model for evaluation, whether it's hosted on the Hugging Face model hub or not. (Though it is recommended to host your model on the Hugging Face)
 
 ### For models not available on the Hugging Face model hub:
-1) Access generated summaries used for evaluation [here](https://
+1) Access generated summaries used for evaluation [here](https://huggingface.co/datasets/vectara/leaderboard_results).
 2) The text generation prompt is available under "Prompt Used" section in the repository's README.
 3) Details on API Integration for evaluations are under "API Integration Details".
 
@@ -114,7 +113,7 @@ The results are structured in JSON as follows:
 }
 }
 ```
-For additional queries or model submissions, please contact
+For additional queries or model submissions, please contact ofer@vectara.com.
 """
 
 EVALUATION_QUEUE_TEXT = """
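For reference, the "Metrics Explained" text above defines the hallucination rate as the share of summaries whose HHEM score falls below 0.5. A hedged sketch of that arithmetic, assuming the `predict` interface shown on the HHEM model card and two made-up document/summary pairs:

```python
# Illustrative only: the pairs and variable names are invented, and the
# `predict` method is assumed from the HHEM model card (trust_remote_code
# loads the model's custom scoring code).
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True
)

pairs = [  # hypothetical (source document, generated summary) pairs
    ("The cat sat on the mat.", "A cat was sitting on a mat."),
    ("The cat sat on the mat.", "A dog chased a ball in the park."),
]
scores = model.predict(pairs)  # one score in [0, 1] per pair

# A summary counts as hallucinated when its score is below 0.5;
# the leaderboard reports the percentage of such summaries.
hallucination_rate = 100 * sum(float(s) < 0.5 for s in scores) / len(pairs)
print(f"Hallucination rate: {hallucination_rate:.1f}%")
```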