---
language:
- pt
license: apache-2.0
model-index:
- name: CabraMistral-v3-7b-32k
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: ENEM Challenge (No Images)
type: eduagarcia/enem_challenge
split: train
args:
num_few_shot: 3
metrics:
- type: acc
value: 58.64
name: accuracy
source:
url: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraMistral-v3-7b-32k
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: BLUEX (No Images)
type: eduagarcia-temp/BLUEX_without_images
split: train
args:
num_few_shot: 3
metrics:
- type: acc
value: 45.62
name: accuracy
source:
url: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraMistral-v3-7b-32k
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: OAB Exams
type: eduagarcia/oab_exams
split: train
args:
num_few_shot: 3
metrics:
- type: acc
value: 41.46
name: accuracy
source:
url: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraMistral-v3-7b-32k
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: Assin2 RTE
type: assin2
split: test
args:
num_few_shot: 15
metrics:
- type: f1_macro
value: 86.14
name: f1-macro
source:
url: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraMistral-v3-7b-32k
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: Assin2 STS
type: eduagarcia/portuguese_benchmark
split: test
args:
num_few_shot: 15
metrics:
- type: pearson
value: 68.06
name: pearson
source:
url: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraMistral-v3-7b-32k
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: FaQuAD NLI
type: ruanchaves/faquad-nli
split: test
args:
num_few_shot: 15
metrics:
- type: f1_macro
value: 47.46
name: f1-macro
source:
url: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraMistral-v3-7b-32k
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: HateBR Binary
type: ruanchaves/hatebr
split: test
args:
num_few_shot: 25
metrics:
- type: f1_macro
value: 70.46
name: f1-macro
source:
url: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraMistral-v3-7b-32k
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: PT Hate Speech Binary
type: hate_speech_portuguese
split: test
args:
num_few_shot: 25
metrics:
- type: f1_macro
value: 62.39
name: f1-macro
source:
url: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraMistral-v3-7b-32k
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: tweetSentBR
type: eduagarcia/tweetsentbr_fewshot
split: test
args:
num_few_shot: 25
metrics:
- type: f1_macro
value: 65.71
name: f1-macro
source:
url: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraMistral-v3-7b-32k
name: Open Portuguese LLM Leaderboard
---
# Cabra Mistral 7b v3 - 32k
<img src="https://uploads-ssl.webflow.com/65f77c0240ae1c68f8192771/660b1a4d574293d8a1ce48ca_cabra1.png" width="400" height="400">
This model is a finetune of [Mistral 7b Instruct 0.3](https://huggingface.co./mistralai/mistral-7b-instruct-v0.3) trained on the BotBot Cabra 10k dataset. It is optimized for Portuguese.
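As a minimal usage sketch (not part of the original card), the model should load with the standard `transformers` causal-LM API; the repo id `botbot-ai/CabraMistral-v3-7b-32k` is taken from the leaderboard links in this card, and the prompt is only illustrative.
```
# Minimal usage sketch (assumes the repo id botbot-ai/CabraMistral-v3-7b-32k and the standard transformers API)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "botbot-ai/CabraMistral-v3-7b-32k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # fall back to float16/float32 if bf16 is unavailable
    device_map="auto",
)

prompt = "[INST] Quem foi Machado de Assis? [/INST]"  # illustrative Portuguese prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```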
**Check out our other models: [Cabra](https://huggingface.co./collections/botbot-ai/models-6604c2069ceef04f834ba99b).**
## Model Details
### Model: Mistral 7b Instruct 0.3
Mistral-7B-v0.3 is a transformer model with the following architectural choices (a short inspection sketch follows the list):
- Grouped-Query Attention
- Sliding-Window Attention
- Byte-fallback BPE tokenizer
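As a sanity-check sketch only (not from the original card), the first two choices can be read off the published model config via the standard `transformers` `AutoConfig` API; the repo id is taken from the leaderboard links above.
```
# Inspection sketch (assumes transformers is installed and the repo id is correct)
from transformers import AutoConfig

config = AutoConfig.from_pretrained("botbot-ai/CabraMistral-v3-7b-32k")

# Grouped-Query Attention: fewer key/value heads than query heads
print("attention heads:", config.num_attention_heads)
print("key/value heads:", config.num_key_value_heads)

# Sliding-Window Attention window size (None means full attention)
print("sliding window:", config.sliding_window)

# The byte-fallback behaviour lives in the tokenizer itself, not in the config.
```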
### Dataset: Cabra 10k
Internal dataset used for finetuning. It will be released soon.
### Quantization / GGUF
We provide several quantized (GGUF) versions in the "quantanization" branch.
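A hedged sketch of pulling one of those files and running it locally with `llama-cpp-python`: the branch name comes from the sentence above, but the exact GGUF filename is hypothetical, so check the branch's file listing for the real names.
```
# Download-and-run sketch (branch name from the card; the GGUF filename is hypothetical)
from huggingface_hub import hf_hub_download
from llama_cpp import Llama  # pip install llama-cpp-python

gguf_path = hf_hub_download(
    repo_id="botbot-ai/CabraMistral-v3-7b-32k",
    filename="cabramistral-v3-7b-32k.Q4_K_M.gguf",  # hypothetical name; list the branch to confirm
    revision="quantanization",                      # branch mentioned above
)

llm = Llama(model_path=gguf_path, n_ctx=4096)
out = llm("[INST] Quem foi Santos Dumont? [/INST]", max_tokens=256)
print(out["choices"][0]["text"])
```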
### Example
```
<s> [INST] who is Elon Musk? [/INST]Elon Musk é um empreendedor, inventor e capitalista americano. Ele é o fundador, CEO e CTO da SpaceX, CEO da Neuralink e fundador do The Boring Company. Musk também é o proprietário do Twitter.</s>
```
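The transcript above uses Mistral's `[INST] ... [/INST]` chat format. As a minimal sketch, assuming the repo's tokenizer ships the standard Mistral chat template, the same prompt can be built programmatically:
```
# Prompt-format sketch (assumes the tokenizer ships Mistral's standard chat template)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("botbot-ai/CabraMistral-v3-7b-32k")
messages = [{"role": "user", "content": "who is Elon Musk?"}]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # expected to resemble: <s>[INST] who is Elon Musk? [/INST]
```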
### Training parameters
```
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 3
```
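As an illustrative mapping only (the actual training script is not published), these values correspond roughly to a `transformers.TrainingArguments` configuration; note that 4 per-device batch × 2 GPUs × 8 accumulation steps gives the reported total train batch size of 64.
```
# Rough mapping of the listed hyperparameters onto TrainingArguments
# (illustrative sketch, not the authors' actual training script)
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="cabra-mistral-v3-7b-32k",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=4,         # 4 x 2 GPUs x 8 accumulation steps = 64 total
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```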
### Framework
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu118
- Datasets 2.14.6
- Tokenizers 0.15.2
### Evals
#### Open Portuguese LLM Leaderboard Evaluation Results
Detailed results can be found [here](https://huggingface.co./datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/botbot-ai/CabraMistral-v3-7b-32k) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard).
| Metric | Value |
|--------------------------|---------|
|Average |**60.66**|
|ENEM Challenge (No Images)| 58.64|
|BLUEX (No Images) | 45.62|
|OAB Exams | 41.46|
|Assin2 RTE | 86.14|
|Assin2 STS | 68.06|
|FaQuAD NLI | 47.46|
|HateBR Binary | 70.46|
|PT Hate Speech Binary | 62.39|
|tweetSentBR | 65.71|