Subhabrata Mukherjee
committed on
Commit fbedb81
1 Parent(s): 79d89b0
Update README.md
README.md CHANGED
@@ -12,6 +12,8 @@ XtremeDistil is a distilled task-agnostic transformer model leveraging multi-tas
 
 This l6-h384 checkpoint with **6** layers, **384** hidden size, **12** attention heads corresponds to **22 million** parameters with **5.3x** speedup over BERT-base.
 
+Other available checkpoints: [xtremedistil-l6-h256-uncased](https://huggingface.co/microsoft/xtremedistil-l6-h256-uncased) and [xtremedistil-l6-h384-uncased](https://huggingface.co/microsoft/xtremedistil-l6-h384-uncased)
+
 The following table shows the results on GLUE dev set and SQuAD-v2.
 
 | Models | #Params | Speedup | MNLI | QNLI | QQP | RTE | SST | MRPC | SQUAD2 | Avg |
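
Note on the **22 million** figure in the context line above: it can be roughly cross-checked with a back-of-the-envelope count for a BERT-style encoder. The sketch below is only an estimate under assumptions that the README does not state: a 30522-token uncased vocabulary, 512 positions, 2 token types, a feed-forward size of 4x the hidden size, and a pooler layer. The function name `bert_param_count` is hypothetical and is not part of any library. The number of attention heads does not change the count, since the heads split the same projection matrices.

```python
# Rough parameter count for a BERT-style encoder.
# ASSUMPTIONS (not from the README): vocab=30522, max_pos=512, type_vocab=2,
# feed-forward size = 4 * hidden, and a pooler layer on top.

def bert_param_count(layers, hidden, vocab=30522, max_pos=512,
                     type_vocab=2, ffn_mult=4):
    layer_norm = 2 * hidden                      # scale + bias
    inter = ffn_mult * hidden                    # feed-forward inner size

    # Word, position, and token-type embeddings, plus one LayerNorm.
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm

    # Q, K, V, and output projections (weights + biases), plus LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Two dense layers (hidden -> inter -> hidden) with biases, plus LayerNorm.
    feed_forward = 2 * hidden * inter + inter + hidden + layer_norm

    # Pooler: one hidden x hidden dense layer with bias.
    pooler = hidden * hidden + hidden

    return embeddings + layers * (attention + feed_forward) + pooler


total = bert_param_count(layers=6, hidden=384)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the estimate comes to about 22.7 million parameters, which is in line with the stated 22 million; roughly half of that sits in the embedding table rather than in the 6 transformer layers.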