Miaoran000 committed
Commit 8135339 · 1 Parent(s): 7ef82ad

update text description

Files changed (3)
  1. .gitignore +3 -0
  2. app.py +1 -1
  3. src/display/about.py +5 -6
.gitignore CHANGED
@@ -14,12 +14,15 @@ auto_evals/
 eval-queue-bk/
 eval-results-bk/
 eval-results-bk_hhem21/
+eval-results_hhem21/
+hhem21_server/
 
 src/assets/model_counts.html
 
 generation_results/
 Hallucination Leaderboard Results
 dataset_stats.py
+hhem_v21_eval.py
 
 get_comparison.py
 GPT-4-Turbo_v.s._GPT-4o.csv
app.py CHANGED
@@ -24,7 +24,7 @@ except Exception:
 try:
     print(envs.EVAL_RESULTS_PATH)
     snapshot_download(
-        repo_id=envs.RESULTS_REPO, local_dir=envs.EVAL_RESULTS_PATH, repo_type="dataset", tqdm_class=None, etag_timeout=30
+        repo_id=envs.RESULTS_REPO, revision='hhem21', local_dir=envs.EVAL_RESULTS_PATH, repo_type="dataset", tqdm_class=None, etag_timeout=30
     )
 except Exception:
     restart_space()
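
Note on the `revision` argument: `snapshot_download` in `huggingface_hub` accepts a branch name, tag, or commit hash as `revision`, so the change above makes the Space pull results from the `hhem21` revision rather than the repository's default branch. The following is a minimal sketch of that call pattern, not the Space's actual code. The helper names are hypothetical, and the repo id and local directory are supplied by the caller because the values behind `envs.RESULTS_REPO` and `envs.EVAL_RESULTS_PATH` are not shown in this commit.

```python
from typing import Any, Dict, Optional


def results_download_kwargs(
    repo_id: str, local_dir: str, revision: Optional[str] = None
) -> Dict[str, Any]:
    """Build keyword arguments for huggingface_hub.snapshot_download.

    Hypothetical helper: it mirrors the arguments used in app.py and adds
    `revision` only when one is given, so the hub default applies otherwise.
    """
    kwargs: Dict[str, Any] = {
        "repo_id": repo_id,
        "local_dir": local_dir,
        "repo_type": "dataset",
        "etag_timeout": 30,
    }
    if revision is not None:
        kwargs["revision"] = revision  # branch name, tag, or commit hash
    return kwargs


def download_results(
    repo_id: str, local_dir: str, revision: Optional[str] = None
) -> str:
    """Download a snapshot of the results dataset and return its local path."""
    # Imported lazily so the helper above can be used without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(**results_download_kwargs(repo_id, local_dir, revision))
```

Calling `download_results(repo_id, local_dir, revision="hhem21")` would correspond to the new line in the diff. Omitting `revision` corresponds to the old one.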
src/display/about.py CHANGED
@@ -25,7 +25,6 @@ TITLE = """<h1 align="center" id="space-title">Hughes Hallucination Evaluation M
 INTRODUCTION_TEXT = """
 This leaderboard (by [Vectara](https://vectara.com)) evaluates how often an LLM introduces hallucinations when summarizing a document. <br>
 The leaderboard utilizes [HHEM](https://huggingface.co/vectara/hallucination_evaluation_model), an open source hallucination detection model.<br>
-An improved version (HHEM v2) is integrated into the [Vectara platform](https://console.vectara.com/signup/?utm_source=huggingface&utm_medium=space&utm_term=integration&utm_content=console&utm_campaign=huggingface-space-integration-console).
 
 """
 
@@ -46,7 +45,7 @@ The model card for HHEM can be found [here](https://huggingface.co/vectara/hallu
 ## Evaluation Dataset
 
 Our evaluation dataset consists of 1006 documents from multiple public datasets, primarily [CNN/Daily Mail Corpus](https://huggingface.co/datasets/cnn_dailymail/viewer/1.0.0/test).
-We generate summaries for each of these documents using submitted LLMs and compute hallucination scores for each pair of document and generated summary. (Check the prompt we used [here](https://huggingface.co/spaces/vectara/Hallucination-evaluation-leaderboard))
+We generate summaries for each of these documents using submitted LLMs and compute hallucination scores for each pair of document and generated summary. (Check the prompt we used [here](https://github.com/vectara/hallucination-leaderboard))
 
 ## Metrics Explained
 - Hallucination Rate: Percentage of summaries with a hallucination score below 0.5
@@ -55,14 +54,14 @@ We generate summaries for each of these documents using submitted LLMs and compu
 - Average Summary Length: The average word count of generated summaries
 
 ## Note on non-Hugging Face models
-On HHEM leaderboard, There are currently models such as GPT variants that are not available on the Hugging Face model hub. We ran the evaluations for these models on our own and uploaded the results to the leaderboard.
-If you would like to submit your model that is not available on the Hugging Face model hub, please contact us at minseok@vectara.com.
+On HHEM leaderboard, there are currently models such as GPT variants that are not available on the Hugging Face model hub. We ran the evaluations for these models on our own and uploaded the results to the leaderboard.
+If you would like to submit your model that is not available on the Hugging Face model hub, please contact us at ofer@vectara.com.
 
 ## Model Submissions and Reproducibility
 You can submit your model for evaluation, whether it's hosted on the Hugging Face model hub or not. (Though it is recommended to host your model on the Hugging Face)
 
 ### For models not available on the Hugging Face model hub:
-1) Access generated summaries used for evaluation [here](https://github.com/vectara/hallucination-leaderboard) in "leaderboard_summaries.csv".
+1) Access generated summaries used for evaluation [here](https://huggingface.co/datasets/vectara/leaderboard_results).
 2) The text generation prompt is available under "Prompt Used" section in the repository's README.
 3) Details on API Integration for evaluations are under "API Integration Details".
 
@@ -114,7 +113,7 @@ The results are structured in JSON as follows:
 }
 }
 ```
-For additional queries or model submissions, please contact minseok@vectara.com.
+For additional queries or model submissions, please contact ofer@vectara.com.
 """
 
 EVALUATION_QUEUE_TEXT = """
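
Note on the metrics: the "Metrics Explained" text in `src/display/about.py` defines the hallucination rate as the percentage of summaries with a hallucination score below 0.5, and the average summary length as the mean word count of the generated summaries. A minimal sketch of that arithmetic follows. The function names are hypothetical, and counting words by splitting on whitespace is an assumption; the leaderboard's own evaluation code may differ.

```python
from typing import Sequence

# Threshold taken from the "Metrics Explained" text: scores below it count
# as hallucinated summaries.
HALLUCINATION_THRESHOLD = 0.5


def hallucination_rate(
    scores: Sequence[float], threshold: float = HALLUCINATION_THRESHOLD
) -> float:
    """Percentage of summaries whose score is strictly below the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    flagged = sum(1 for score in scores if score < threshold)
    return 100.0 * flagged / len(scores)


def average_summary_length(summaries: Sequence[str]) -> float:
    """Mean word count, counting words by whitespace splitting (assumed)."""
    if not summaries:
        raise ValueError("summaries must not be empty")
    return sum(len(text.split()) for text in summaries) / len(summaries)
```

With this definition, a score of exactly 0.5 is not counted as a hallucination, which matches the "below 0.5" wording.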