---
library_name: transformers
license: mit
base_model: intfloat/multilingual-e5-small
tags:
- generated_from_trainer
metrics:
- precision
- recall
- accuracy
model-index:
- name: owm-math-scorer-multilingual-e5-small
  results: []
---

# FineMath classifier

## Model summary
This is a classifier for evaluating mathematical reasoning and deduction in web pages, fine-tuned from [intfloat/multilingual-e5-small](https://huggingface.co./intfloat/multilingual-e5-small). It was developed to filter and curate mathematical content from web datasets and was trained on 1M annotations generated by [Llama3-70B-instruct](https://huggingface.co./meta-llama/Meta-Llama-3-70B-Instruct) for web samples from Common Crawl, which were extracted using the [OpenWebMath](https://github.com/keirp/OpenWebMath) text extraction pipeline. To ensure a balanced dataset, we upsampled pages containing mathematical content in the annotations, using a preliminary math classifier on 5M samples.

We used this classifier to build the [FineMath](https://huggingface.co./datasets/HuggingFaceTB/finemath) dataset.

### How to use in transformers
To load the FineMath classifier, use the following code:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/finemath-classifier")
model = AutoModelForSequenceClassification.from_pretrained("HuggingFaceTB/finemath-classifier")

text = "This is a test sentence."
inputs = tokenizer(text, return_tensors="pt", padding="longest", truncation=True)
outputs = model(**inputs)

# The head has a single regression output, so the logit is the raw score.
logits = outputs.logits.squeeze(-1).float().detach().numpy()
score = logits.item()
result = {
    "text": text,
    "score": score,
    # Clamp to [0, 5] and round to get an integer score.
    "int_score": int(round(max(0, min(score, 5)))),
}

print(result)
# {'text': 'This is a test sentence.', 'score': 0.07964489609003067, 'int_score': 0}
```

## Training
The classifier was trained on 1M pairs of web samples and their scores from 0 to 5, generated by Llama3.
The samples were annotated based on their usefulness for studying mathematics, with 0 meaning not educational or lacking mathematical content and 5 meaning outstanding for mathematics education. Below is the prompt used for Llama3 annotations:
Prompt for LLM annotation

We added a classification head with a single regression output to [intfloat/multilingual-e5-small](https://huggingface.co./intfloat/multilingual-e5-small) and trained the model for 20 epochs with a learning rate of 3e-4. During training, the embedding and encoder layers were frozen so that only the classification head was updated. The model achieved an F1 score of 87% when converted to a binary classifier using a score threshold of 3.

**Training Details:**

- Model: intfloat/multilingual-e5-small with a classification head
- Dataset: 1M samples from Llama3 annotations
- Epochs: 20
- Learning Rate: 3e-4
- Evaluation Metric: F1 score

**Evaluation:**
The model achieves the following results on the evaluation set:
- Loss: 0.4478
- Precision: 0.8771
- Recall: 0.8769
- F1 Macro: 0.8770
- Accuracy: 0.8770
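
The binary evaluation described above can be sketched in plain Python. This is a minimal illustration, not the evaluation script used for this model: the helper names `binarize` and `macro_prf` are hypothetical, the example scores are made up, and it assumes that scores are clamped and rounded (as in the usage snippet) before applying the threshold of 3, and that precision, recall, and F1 are macro-averaged over the two classes.

```python
def binarize(scores, threshold=3):
    """Clamp each score to [0, 5], round it, and mark it positive if >= threshold.

    Assumption: thresholding is applied to the rounded integer score.
    """
    return [int(round(max(0.0, min(s, 5.0))) >= threshold) for s in scores]


def macro_prf(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over the classes 0 and 1."""
    per_class = []
    for cls in (0, 1):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append((precision, recall, f1))
    # Average each metric across the two classes.
    n = len(per_class)
    return tuple(sum(m[i] for m in per_class) / n for i in range(3))


# Made-up example values: classifier outputs and annotation scores (0 to 5).
predicted_scores = [0.2, 3.4, 2.6, 4.9, 1.1, 2.4]
annotated_scores = [0, 3, 2, 5, 1, 3]

y_pred = binarize(predicted_scores)
y_true = binarize(annotated_scores)
precision, recall, f1 = macro_prf(y_true, y_pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Note that Python's built-in `round` rounds halves to the nearest even integer, so a score of exactly 2.5 rounds to 2 and falls below the threshold under this assumption.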