---
pipeline_tag: feature-extraction
tags:
- feature-extraction
- transformers
license: apache-2.0
language:
- id
metrics:
- accuracy
- f1
- precision
- recall
datasets:
- squad_v2
---

### indo-dpr-question_encoder-single-squad-base

Indonesian Dense Passage Retrieval (DPR) question encoder, trained on a translated version of the SQuAD v2.0 dataset converted to DPR format.

### Evaluation

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| hard_negative | 0.9963 | 0.9963 | 0.9963 | 183090 |
| positive | 0.8849 | 0.8849 | 0.8849 | 5910 |

| Metric | Value |
|--------|-------|
| Accuracy | 0.9928 |
| Macro Average | 0.9406 |
| Weighted Average | 0.9928 |

Note: This report is for evaluation on the dev set, after 12000 batches.
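The averages in the evaluation report can be reproduced from the per-class rows. The following is a minimal sketch, assuming the macro average is the unweighted mean of the per-class scores and the weighted average uses each class's support as its weight; the variable and function names are illustrative and not part of the original evaluation script.

```python
# Per-class scores and support counts copied from the evaluation table.
# Precision, recall, and F1 are identical within each class in this report,
# so a single score per class is enough here.
per_class = {
    "hard_negative": {"score": 0.9963, "support": 183090},
    "positive": {"score": 0.8849, "support": 5910},
}


def macro_average(rows):
    """Unweighted mean of the per-class scores."""
    return sum(r["score"] for r in rows.values()) / len(rows)


def weighted_average(rows):
    """Mean of the per-class scores, weighted by class support."""
    total = sum(r["support"] for r in rows.values())
    return sum(r["score"] * r["support"] for r in rows.values()) / total


macro = round(macro_average(per_class), 4)
weighted = round(weighted_average(per_class), 4)
print(macro, weighted)
```

Because `hard_negative` makes up roughly 97% of the examples, the weighted average stays close to that class's score, while the macro average reflects the lower score on the `positive` class more directly.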

### Usage

```python
from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer

# Load the question encoder and its tokenizer.
tokenizer = DPRQuestionEncoderTokenizer.from_pretrained('firqaaa/indo-dpr-question_encoder-single-squad-base')
model = DPRQuestionEncoder.from_pretrained('firqaaa/indo-dpr-question_encoder-single-squad-base')

# The query means "Where is the capital of Indonesia located?"
input_ids = tokenizer("Ibukota Indonesia terletak dimana?", return_tensors='pt')["input_ids"]
embeddings = model(input_ids).pooler_output
```

You can also use it with `haystack` as follows:

```python
from haystack.nodes import DensePassageRetriever
from haystack.document_stores import InMemoryDocumentStore

# Questions go through the question encoder, passages through the context encoder.
retriever = DensePassageRetriever(
    document_store=InMemoryDocumentStore(),
    query_embedding_model="firqaaa/indo-dpr-question_encoder-single-squad-base",
    passage_embedding_model="firqaaa/indo-dpr-ctx_encoder-single-squad-base",
    max_seq_len_query=64,
    max_seq_len_passage=256,
    batch_size=16,
    use_gpu=True,
    embed_title=True,
    use_fast_tokenizers=True,
)
```
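DPR scores a passage against a question by taking the dot product of their embeddings. The sketch below illustrates that ranking step with small hand-written tensors standing in for real `pooler_output` vectors (which would be much higher-dimensional and carry a leading batch dimension); `rank_passages` is a hypothetical helper written for this example, not part of `transformers` or `haystack`.

```python
import torch


def rank_passages(question_emb, passage_embs):
    """Rank passages by dot-product similarity to one question embedding.

    question_emb: tensor of shape (dim,)
    passage_embs: tensor of shape (num_passages, dim)
    Returns passage indices ordered from best to worst, plus the raw scores.
    """
    scores = passage_embs @ question_emb
    order = torch.argsort(scores, descending=True)
    return order.tolist(), scores.tolist()


# Toy 3-dimensional embeddings (assumption: stand-ins for real encoder outputs).
question_emb = torch.tensor([1.0, 0.0, 2.0])
passage_embs = torch.tensor([
    [0.5, 0.1, 0.2],
    [1.0, 0.0, 2.0],
    [-1.0, 0.3, 0.0],
])

order, scores = rank_passages(question_emb, passage_embs)
print(order)
```

With real models, the question vector would come from the question encoder and the passage vectors from the matching context encoder, since the two encoders are trained together to share one embedding space.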