StructBERT: Unofficial Copy
Official Repository Link: https://github.com/alibaba/AliceMind/tree/main/StructBERT
Disclaimer
- This model card was not produced by the AliceMind team.
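Reproducing the Hugging Face Hub model:
Download the model config, tokenizer vocabulary, and weights, then load them with transformers and push them to the Hub: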
wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/large_bert_config.json && mv large_bert_config.json config.json
wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/vocab.txt
wget https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/en_model && mv en_model pytorch_model.bin
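from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer

# Load the config, weights, and vocabulary downloaded into the current directory.
config = AutoConfig.from_pretrained("./config.json")
model = AutoModelForMaskedLM.from_pretrained(".", config=config)
tokenizer = AutoTokenizer.from_pretrained(".", config=config)

# Upload both to the Hub (requires an authenticated session, e.g. `huggingface-cli login`).
model.push_to_hub("structbert-large")
tokenizer.push_to_hub("structbert-large")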
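Before running the loading step above, it can help to confirm that the three downloaded files are in place under the names `from_pretrained(".")` looks for. The following is a minimal sketch, not part of the official repository; the helper name `missing_files` is hypothetical, and the file list simply mirrors the `wget`/`mv` commands above.

```python
from pathlib import Path

# File names produced by the wget/mv commands above.
REQUIRED_FILES = ("config.json", "vocab.txt", "pytorch_model.bin")


def missing_files(directory="."):
    """Return the required files that are absent from `directory` (hypothetical helper)."""
    root = Path(directory)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All files present; the directory is ready to load.")
```

An empty list means the directory should be loadable; any name returned points at a download or rename step that did not complete.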
https://arxiv.org/abs/1908.04577
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
Introduction
We extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks to make the most of the sequential order of words and sentences, which leverage language structures at the word and sentence levels, respectively.
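To make the two auxiliary tasks more concrete, here is a rough sketch of how their training inputs could be built. It is an illustration under stated assumptions, not the AliceMind implementation: the trigram span length, the 5% selection ratio, and the three-way sentence label (next / previous / random) follow the description in the paper linked above, while the function names, the label numbering, and the choice of non-overlapping spans are assumptions made here for simplicity.

```python
import random


def shuffle_trigrams(tokens, ratio=0.05, seed=None):
    """Word-level objective sketch: shuffle a fraction of trigrams.

    Returns (shuffled_tokens, targets), where `targets` maps each position in a
    shuffled span to the original token the model should reconstruct there.
    """
    rng = random.Random(seed)
    tokens = list(tokens)
    # Assumption: candidate spans are non-overlapping trigrams.
    starts = [3 * i for i in range(len(tokens) // 3)]
    k = max(1, round(ratio * len(starts))) if starts else 0
    out, targets = list(tokens), {}
    for start in sorted(rng.sample(starts, k)):
        span = tokens[start:start + 3]
        permuted = span[:]
        rng.shuffle(permuted)  # may occasionally leave the order unchanged
        out[start:start + 3] = permuted
        for offset, original in enumerate(span):
            targets[start + offset] = original
    return out, targets


def sample_sentence_pair(doc, index, other_doc, rng):
    """Sentence-level objective sketch: pick S2 as next, previous, or random.

    Assumed labels: 0 = S2 follows S1, 1 = S2 precedes S1,
    2 = S2 comes from another document. Requires 1 <= index <= len(doc) - 2.
    """
    label = rng.randrange(3)  # each case is chosen with probability 1/3
    s1 = doc[index]
    if label == 0:
        s2 = doc[index + 1]
    elif label == 1:
        s2 = doc[index - 1]
    else:
        s2 = rng.choice(other_doc)
    return s1, s2, label
```

In this sketch the word-level loss would be computed only at the positions listed in `targets`, and the sentence-level task becomes a three-class classification over the returned `label`.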
Pre-trained models
Model | Description | #params | Download |
---|---|---|---|
structbert.en.large | StructBERT using the BERT-large architecture | 340M | structbert.en.large |
structroberta.en.large | StructRoBERTa, continued training from RoBERTa | 355M | Coming soon |
structbert.ch.large | Chinese StructBERT; BERT-large architecture | 330M | structbert.ch.large |
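As a sanity check on the #params column, the size of a BERT-style encoder can be estimated directly from its config values. The sketch below assumes the standard BERT-large settings (vocabulary 30522, hidden size 1024, 24 layers, intermediate size 4096, 512 positions, 2 token types); these defaults are an assumption and have not been read from `large_bert_config.json`, so substitute the values from that file when checking. The function name `bert_param_count` is hypothetical.

```python
def bert_param_count(vocab_size=30522, hidden=1024, layers=24,
                     intermediate=4096, max_positions=512, type_vocab=2):
    """Estimate encoder parameters (embeddings + transformer layers + pooler)."""
    layer_norm = 2 * hidden  # scale and bias

    # Word, position, and token-type embeddings, followed by a LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Self-attention: query, key, value, and output projections (weights + biases).
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Feed-forward block: up-projection, down-projection, and a LayerNorm.
    feed_forward = (hidden * intermediate + intermediate
                    + intermediate * hidden + hidden + layer_norm)

    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


if __name__ == "__main__":
    print(f"{bert_param_count() / 1e6:.1f}M parameters")
```

With the assumed BERT-large values this gives roughly 335M parameters for the encoder, in line with the rounded 340M figure usually quoted for that architecture; task heads add a small amount on top.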
Results
The results on the GLUE and CLUE tasks can be reproduced using the hyperparameters listed in the "Example usage" section below.
structbert.en.large
Model | MNLI | QNLIv2 | QQP | SST-2 | MRPC |
---|---|---|---|---|---|
structbert.en.large | 86.86% | 93.04% | 91.67% | 93.23% | 86.51% |
structbert.ch.large
Model | CMNLI | OCNLI | TNEWS | AFQMC |
---|---|---|---|---|
structbert.ch.large | 84.47% | 81.28% | 68.67% | 76.11% |
Example usage
Requirements and Installation
- PyTorch version >= 1.0.1
- Install other libraries via
pip install -r requirements.txt
- For faster training, install NVIDIA's apex library
Finetune MNLI
python run_classifier_multi_task.py \
--task_name MNLI \
--do_train \
--do_eval \
--do_test \
--amp_type O1 \
--lr_decay_factor 1 \
--dropout 0.1 \
--do_lower_case \
--detach_index -1 \
--core_encoder bert \
--data_dir path_to_glue_data \
--vocab_file config/vocab.txt \
--bert_config_file config/large_bert_config.json \
--init_checkpoint path_to_pretrained_model \
--max_seq_length 128 \
--train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3 \
--fast_train \
--gradient_accumulation_steps 1 \
--output_dir path_to_output_dir
Citation
If you use our work, please cite:
@article{wang2019structbert,
title={Structbert: Incorporating language structures into pre-training for deep language understanding},
author={Wang, Wei and Bi, Bin and Yan, Ming and Wu, Chen and Bao, Zuyi and Xia, Jiangnan and Peng, Liwei and Si, Luo},
journal={arXiv preprint arXiv:1908.04577},
year={2019}
}