ConvBERT

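Overview

The ConvBERT model was proposed in ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, and Shuicheng Yan.

The abstract from the paper is the following:

Pre-trained language models like BERT and its variants have recently achieved impressive performance in various natural language understanding tasks. However, BERT relies heavily on the global self-attention block and therefore suffers from a large memory footprint and high computation cost. Although all of its attention heads query the whole input sequence to generate the attention map from a global perspective, we observe that some heads only need to learn local dependencies, which means there is computational redundancy. We therefore propose a novel span-based dynamic convolution to replace these self-attention heads and directly model local dependencies. The novel convolution heads, together with the remaining self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning. We equip BERT with this mixed attention design and build a ConvBERT model. Experiments have shown that ConvBERT significantly outperforms BERT and its variants in various downstream tasks, with lower training cost and fewer model parameters. Remarkably, the ConvBERTbase model achieves a GLUE score of 86.4, 0.7 higher than ELECTRAbase, while using less than 1/4 of the training cost. Code and pre-trained models will be released.

This model was contributed by abhishek. The original implementation can be found here: https://github.com/yitu-opensource/ConvBert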
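
The span-based dynamic convolution mentioned in the abstract can be illustrated with a small NumPy sketch. It is a simplified, single-head version written for this page and is not the library's or the authors' implementation. It assumes that the convolution kernel for each position is produced by a softmax over a linear projection of the element-wise product of a query and a "span-aware" key, where the key comes from a depthwise-separable convolution over the input. The function name `span_dynamic_conv`, all weight names, and the toy sizes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def span_dynamic_conv(x, w_q, w_v, w_dw, w_pw, w_f):
    """Simplified single-head span-based dynamic convolution (illustrative).

    x:    (n, d) input sequence
    w_q:  (d, d) query projection
    w_v:  (d, d) value projection
    w_dw: (k, d) depthwise kernel (one width-k filter per channel)
    w_pw: (d, d) pointwise projection applied after the depthwise step
    w_f:  (d, k) projection that produces one width-k kernel per position
    """
    n, d = x.shape
    k = w_dw.shape[0]
    pad = k // 2  # assumes an odd kernel width so the window is centered

    q = x @ w_q
    v = x @ w_v

    # Span-aware key: each position sees a local window of the input.
    x_pad = np.pad(x, ((pad, pad), (0, 0)))
    x_windows = np.stack([x_pad[i:i + k] for i in range(n)])  # (n, k, d)
    k_s = (x_windows * w_dw).sum(axis=1) @ w_pw               # (n, d)

    # Dynamic kernel: one normalized width-k filter per position.
    kernel = softmax((q * k_s) @ w_f, axis=-1)                # (n, k)

    # Lightweight convolution: weighted sum over each local window of values.
    v_pad = np.pad(v, ((pad, pad), (0, 0)))
    v_windows = np.stack([v_pad[i:i + k] for i in range(n)])  # (n, k, d)
    out = (kernel[:, :, None] * v_windows).sum(axis=1)        # (n, d)
    return out, kernel


# Toy sizes chosen only for illustration.
rng = np.random.default_rng(0)
n, d, k = 6, 8, 3
x = rng.normal(size=(n, d))
out, kernel = span_dynamic_conv(
    x,
    w_q=rng.normal(size=(d, d)),
    w_v=rng.normal(size=(d, d)),
    w_dw=rng.normal(size=(k, d)),
    w_pw=rng.normal(size=(d, d)),
    w_f=rng.normal(size=(d, k)),
)
print(out.shape, kernel.shape)  # (6, 8) (6, 3)
```

Each output position mixes only `k` neighbouring value vectors, so the cost grows linearly with sequence length, whereas a self-attention head attends over the whole sequence. In the mixed attention block, heads of this kind sit alongside the remaining self-attention heads.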

Usage tips

ConvBERT training tips are similar to those of BERT. For usage tips, refer to the BERT documentation.
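
Because usage follows BERT, a forward pass looks much like it does for a BERT model. The sketch below builds a deliberately tiny, randomly initialized `ConvBertModel` from a `ConvBertConfig` so that it runs without downloading weights. The hyperparameter values are assumptions chosen for speed, not recommended settings, and the input IDs are random integers rather than real tokenizer output.

```python
import torch
from transformers import ConvBertConfig, ConvBertModel

# Tiny illustrative configuration (assumed values, not a released checkpoint).
config = ConvBertConfig(
    vocab_size=100,
    hidden_size=32,
    embedding_size=32,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=64,
    head_ratio=2,        # ratio by which the number of attention heads is reduced
    conv_kernel_size=9,  # width of the span-based dynamic convolution
)

model = ConvBertModel(config)  # random weights, no download
model.eval()

# Fake batch: 1 sequence of 10 token IDs drawn from the toy vocabulary.
input_ids = torch.randint(0, config.vocab_size, (1, 10))

with torch.no_grad():
    outputs = model(input_ids=input_ids)

print(outputs.last_hidden_state.shape)  # torch.Size([1, 10, 32])
```

For real use, one would typically load pretrained weights and a matching tokenizer with `from_pretrained` instead, for example from a Hub checkpoint such as `YituTech/conv-bert-base`, and pass the tokenizer's output to the model.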

ConvBertConfig

class transformers.ConvBertConfig

ConvBertTokenizer

class transformers.ConvBertTokenizer

build_inputs_with_special_tokens

get_special_tokens_mask

create_token_type_ids_from_sequences

save_vocabulary

ConvBertTokenizerFast

class transformers.ConvBertTokenizerFast

build_inputs_with_special_tokens

create_token_type_ids_from_sequences

ConvBertModel

class transformers.ConvBertModel

forward

ConvBertForMaskedLM

class transformers.ConvBertForMaskedLM

forward

ConvBertForSequenceClassification

class transformers.ConvBertForSequenceClassification

forward

ConvBertForMultipleChoice

class transformers.ConvBertForMultipleChoice

forward

ConvBertForTokenClassification

class transformers.ConvBertForTokenClassification

forward

ConvBertForQuestionAnswering

class transformers.ConvBertForQuestionAnswering

forward

TFConvBertModel

class transformers.TFConvBertModel

call

TFConvBertForMaskedLM

class transformers.TFConvBertForMaskedLM

call

TFConvBertForSequenceClassification

class transformers.TFConvBertForSequenceClassification

call

TFConvBertForMultipleChoice

class transformers.TFConvBertForMultipleChoice

call

TFConvBertForTokenClassification

class transformers.TFConvBertForTokenClassification

call

TFConvBertForQuestionAnswering

class transformers.TFConvBertForQuestionAnswering

call