Unable to reproduce the same eval and test results
Dataset used: https://huggingface.co./datasets/conll2003
Evaluation metric: `load_metric("seqeval")`
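For reference, the dataset and metric are loaded along these lines (a minimal sketch; `raw_datasets` is my own variable name, while `label_names` and `metric` are the names the `compute_metrics` function below expects):

```python
from datasets import load_dataset, load_metric

raw_datasets = load_dataset("conll2003")
metric = load_metric("seqeval")

# Label list in the order used by the dataset's ner_tags feature
label_names = raw_datasets["train"].features["ner_tags"].feature.names
# ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']
```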
**Results obtained:**
{'eval_loss': 2.3160810470581055,
'eval_precision': 0.6153949670300094,
'eval_recall': 0.7696061932009425,
'eval_f1': 0.6839153518283106,
'eval_accuracy': 0.9621769588508859,
'eval_runtime': 556.8392,
'eval_samples_per_second': 5.838,
'eval_steps_per_second': 0.731}
NER label alignment code (from https://huggingface.co./course/chapter7/2):
def align_labels_with_tokens(labels, word_ids):
    new_labels = []
    current_word = None
    for word_id in word_ids:
        if word_id != current_word:
            # Start of a new word!
            current_word = word_id
            label = -100 if word_id is None else labels[word_id]
            new_labels.append(label)
        elif word_id is None:
            # Special token
            new_labels.append(-100)
        else:
            # Same word as previous token
            label = labels[word_id]
            # If the label is B-XXX we change it to I-XXX
            if label % 2 == 1:
                label += 1
            new_labels.append(label)
    return new_labels
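For context, the alignment function is applied during preprocessing as in the course (the `bert-base-cased` checkpoint below is just the course's example, not necessarily the model I evaluated):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")  # example checkpoint from the course

def tokenize_and_align_labels(examples):
    # Tokenize pre-split words and realign the word-level NER tags to subword tokens
    tokenized_inputs = tokenizer(
        examples["tokens"], truncation=True, is_split_into_words=True
    )
    all_labels = examples["ner_tags"]
    new_labels = []
    for i, labels in enumerate(all_labels):
        word_ids = tokenized_inputs.word_ids(i)
        new_labels.append(align_labels_with_tokens(labels, word_ids))
    tokenized_inputs["labels"] = new_labels
    return tokenized_inputs

tokenized_datasets = raw_datasets.map(
    tokenize_and_align_labels,
    batched=True,
    remove_columns=raw_datasets["train"].column_names,
)
```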
**Compute metrics:**
import numpy as np

def compute_metrics(eval_preds):
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)

    # References: gold label ids mapped to tag names via the dataset's label list
    true_labels = [
        [label_names[l] for l in label if l != -100] for label in labels
    ]
    # Predictions: predicted ids mapped to tag names via the model's id2labels mapping
    true_predictions = [
        [id2labels[str(p)] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    all_metrics = metric.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": all_metrics["overall_precision"],
        "recall": all_metrics["overall_recall"],
        "f1": all_metrics["overall_f1"],
        "accuracy": all_metrics["overall_accuracy"],
    }
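The evaluation numbers above would come from a `Trainer.evaluate()` call; below is a minimal sketch of such a setup. The checkpoint name and output directory are placeholders (the post does not show the actual configuration), and the eval batch size of 8 is only inferred from the ratio of `eval_samples_per_second` to `eval_steps_per_second` reported above:

```python
from transformers import (
    AutoModelForTokenClassification,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

# Placeholder: the fine-tuned NER checkpoint actually being evaluated
model = AutoModelForTokenClassification.from_pretrained("path-or-name-of-ner-checkpoint")
data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)

trainer = Trainer(
    model=model,
    args=TrainingArguments("eval-output", per_device_eval_batch_size=8),
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
print(trainer.evaluate())
```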
Note: I am using `id2labels` from your model's config. Please comment on this.
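To make sure the two mappings agree, one check I can run is to compare `id2labels` against `label_names`: if the model's config lists labels in a different order than the dataset's `ner_tags`, seqeval would be comparing mismatched tags, which could explain low precision and recall. A quick sketch (assuming `id2labels` is read directly from the model's `config.json`, which is why its keys are strings):

```python
import json

# Assumption: id2labels comes straight from config.json, so its keys are strings ("0", "1", ...)
with open("config.json") as f:
    id2labels = json.load(f)["id2label"]

mapped = [id2labels[str(i)] for i in range(len(label_names))]
assert mapped == label_names, (
    f"Model label order {mapped} differs from dataset order {label_names}"
)
```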
Was there any update on this? Did you manage to reproduce the results?
no