πŸ€— Transformers

State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

πŸ€— Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs and carbon footprint, and save you the time and resources required to train a model from scratch. These models support common tasks in different modalities (a short usage sketch follows this list), such as:

πŸ“ Natural Language Processing: text classification, named entity recognition, question answering, language modeling, code generation, summarization, translation, multiple choice, and text generation.
πŸ–ΌοΈ Computer Vision: image classification, object detection, and segmentation.
πŸ—£οΈ Audio: automatic speech recognition and audio classification.
πŸ™ Multimodal: table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.

πŸ€— Transformers supports framework interoperability between PyTorch, TensorFlow, and JAX. This provides the flexibility to use a different framework at each stage of a model's life: train a model in three lines of code in one framework, and load it for inference in another. Models can also be exported to formats like ONNX and TorchScript for deployment in production environments.
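
As a minimal sketch of this interoperability (assuming both PyTorch and TensorFlow are installed), a checkpoint saved from a PyTorch model can be reloaded as a TensorFlow model, and vice versa:

```python
# Minimal sketch, assuming both torch and tensorflow are installed.
# A checkpoint saved from a PyTorch model can be reloaded in TensorFlow
# with `from_pt=True` (the reverse direction uses `from_tf=True`).
from transformers import (
    AutoModelForSequenceClassification,
    TFAutoModelForSequenceClassification,
)

pt_model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")  # PyTorch
pt_model.save_pretrained("./my-bert")  # writes the PyTorch weights and config.json

tf_model = TFAutoModelForSequenceClassification.from_pretrained("./my-bert", from_pt=True)  # TensorFlow
```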

Join the growing community on the Hub, forum, or Discord today!

If you are looking for custom support from the Hugging Face team, check out the HuggingFace Expert Acceleration Program.

Contents

The documentation is organized into five sections:

  • GET STARTED provides a quick tour of the library and installation instructions to get up and running.

  • TUTORIALS are a great place to start if you’re a beginner. This section will help you gain the basic skills you need to start using the library.

  • HOW-TO GUIDES show you how to achieve a specific goal, like finetuning a pretrained model for language modeling or writing and sharing a custom model.

  • CONCEPTUAL GUIDES offer more discussion and explanation of the underlying concepts and ideas behind models, tasks, and the design philosophy of πŸ€— Transformers.

  • API describes all classes and functions:

    • MAIN CLASSES details the most important classes like configuration, model, tokenizer, and pipeline (a short sketch follows this list).
    • MODELS details the classes and functions related to each model implemented in the library.
    • INTERNAL HELPERS details utility classes and functions used internally.
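
As a minimal sketch of how these main classes fit together (the checkpoint name is only an example):

```python
# Minimal sketch of the main classes: configuration, tokenizer, model, pipeline.
from transformers import AutoConfig, AutoModel, AutoTokenizer, pipeline

checkpoint = "bert-base-uncased"  # illustrative checkpoint

config = AutoConfig.from_pretrained(checkpoint)        # architecture hyperparameters
tokenizer = AutoTokenizer.from_pretrained(checkpoint)  # text -> token ids
model = AutoModel.from_pretrained(checkpoint)          # pretrained weights

inputs = tokenizer("Hello!", return_tensors="pt")
outputs = model(**inputs)  # e.g. outputs.last_hidden_state

# A pipeline wraps the tokenizer and model behind a single task-oriented call.
fill_mask = pipeline("fill-mask", model=checkpoint)
print(fill_mask("Paris is the [MASK] of France."))
```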

Supported models and frameworks

The table below represents the current support in the library for each of these models: whether they have support in PyTorch, TensorFlow, and/or Flax (via JAX).

Model PyTorch support TensorFlow support Flax support
ALBERT βœ… βœ… βœ…
ALIGN βœ… ❌ ❌
AltCLIP βœ… ❌ ❌
Aria βœ… ❌ ❌
AriaText βœ… ❌ ❌
Audio Spectrogram Transformer βœ… ❌ ❌
Autoformer βœ… ❌ ❌
Bamba βœ… ❌ ❌
Bark βœ… ❌ ❌
BART βœ… βœ… βœ…
BARThez βœ… βœ… βœ…
BARTpho βœ… βœ… βœ…
BEiT βœ… ❌ βœ…
BERT βœ… βœ… βœ…
Bert Generation βœ… ❌ ❌
BertJapanese βœ… βœ… βœ…
BERTweet βœ… βœ… βœ…
BigBird βœ… ❌ βœ…
BigBird-Pegasus βœ… ❌ ❌
BioGpt βœ… ❌ ❌
BiT βœ… ❌ ❌
Blenderbot βœ… βœ… βœ…
BlenderbotSmall βœ… βœ… βœ…
BLIP βœ… βœ… ❌
BLIP-2 βœ… ❌ ❌
BLOOM βœ… ❌ βœ…
BORT βœ… βœ… βœ…
BridgeTower βœ… ❌ ❌
BROS βœ… ❌ ❌
ByT5 βœ… βœ… βœ…
CamemBERT βœ… βœ… ❌
CANINE βœ… ❌ ❌
Chameleon βœ… ❌ ❌
Chinese-CLIP βœ… ❌ ❌
CLAP βœ… ❌ ❌
CLIP βœ… βœ… βœ…
CLIPSeg βœ… ❌ ❌
CLVP βœ… ❌ ❌
CodeGen βœ… ❌ ❌
CodeLlama βœ… ❌ βœ…
Cohere βœ… ❌ ❌
Cohere2 βœ… ❌ ❌
ColPali βœ… ❌ ❌
Conditional DETR βœ… ❌ ❌
ConvBERT βœ… βœ… ❌
ConvNeXT βœ… βœ… ❌
ConvNeXTV2 βœ… βœ… ❌
CPM βœ… βœ… βœ…
CPM-Ant βœ… ❌ ❌
CTRL βœ… βœ… ❌
CvT βœ… βœ… ❌
DAC βœ… ❌ ❌
Data2VecAudio βœ… ❌ ❌
Data2VecText βœ… ❌ ❌
Data2VecVision βœ… βœ… ❌
DBRX βœ… ❌ ❌
DeBERTa βœ… βœ… ❌
DeBERTa-v2 βœ… βœ… ❌
Decision Transformer βœ… ❌ ❌
Deformable DETR βœ… ❌ ❌
DeiT βœ… βœ… ❌
DePlot βœ… ❌ ❌
Depth Anything βœ… ❌ ❌
DETA βœ… ❌ ❌
DETR βœ… ❌ ❌
DialoGPT βœ… βœ… βœ…
DiNAT βœ… ❌ ❌
DINOv2 βœ… ❌ βœ…
DINOv2 with Registers βœ… ❌ ❌
DistilBERT βœ… βœ… βœ…
DiT βœ… ❌ βœ…
DonutSwin βœ… ❌ ❌
DPR βœ… βœ… ❌
DPT βœ… ❌ ❌
EfficientFormer βœ… βœ… ❌
EfficientNet βœ… ❌ ❌
ELECTRA βœ… βœ… βœ…
EnCodec βœ… ❌ ❌
Encoder decoder βœ… βœ… βœ…
ERNIE βœ… ❌ ❌
ErnieM βœ… ❌ ❌
ESM βœ… βœ… ❌
FairSeq Machine-Translation βœ… ❌ ❌
Falcon βœ… ❌ ❌
Falcon3 βœ… ❌ βœ…
FalconMamba βœ… ❌ ❌
FastSpeech2Conformer βœ… ❌ ❌
FLAN-T5 βœ… βœ… βœ…
FLAN-UL2 βœ… βœ… βœ…
FlauBERT βœ… βœ… ❌
FLAVA βœ… ❌ ❌
FNet βœ… ❌ ❌
FocalNet βœ… ❌ ❌
Funnel Transformer βœ… βœ… ❌
Fuyu βœ… ❌ ❌
Gemma βœ… ❌ βœ…
Gemma2 βœ… ❌ ❌
GIT βœ… ❌ ❌
GLM βœ… ❌ ❌
GLPN βœ… ❌ ❌
GPT Neo βœ… ❌ βœ…
GPT NeoX βœ… ❌ ❌
GPT NeoX Japanese βœ… ❌ ❌
GPT-J βœ… βœ… βœ…
GPT-Sw3 βœ… βœ… βœ…
GPTBigCode βœ… ❌ ❌
GPTSAN-japanese βœ… ❌ ❌
Granite βœ… ❌ ❌
GraniteMoeMoe βœ… ❌ ❌
Graphormer βœ… ❌ ❌
Grounding DINO βœ… ❌ ❌
GroupViT βœ… βœ… ❌
HerBERT βœ… βœ… βœ…
Hiera βœ… ❌ ❌
Hubert βœ… βœ… ❌
I-BERT βœ… ❌ ❌
I-JEPA βœ… ❌ ❌
IDEFICS βœ… βœ… ❌
Idefics2 βœ… ❌ ❌
Idefics3 βœ… ❌ ❌
Idefics3VisionTransformer ❌ ❌ ❌
ImageGPT βœ… ❌ ❌
Informer βœ… ❌ ❌
InstructBLIP βœ… ❌ ❌
InstructBlipVideo βœ… ❌ ❌
Jamba βœ… ❌ ❌
JetMoe βœ… ❌ ❌
Jukebox βœ… ❌ ❌
KOSMOS-2 βœ… ❌ ❌
LayoutLM βœ… βœ… ❌
LayoutLMv2 βœ… ❌ ❌
LayoutLMv3 βœ… βœ… ❌
LayoutXLM βœ… ❌ ❌
LED βœ… βœ… ❌
LeViT βœ… ❌ ❌
LiLT βœ… ❌ ❌
LLaMA βœ… ❌ βœ…
Llama2 βœ… ❌ βœ…
Llama3 βœ… ❌ βœ…
LLaVa βœ… ❌ ❌
LLaVA-NeXT βœ… ❌ ❌
LLaVa-NeXT-Video βœ… ❌ ❌
LLaVA-Onevision βœ… ❌ ❌
Longformer βœ… βœ… ❌
LongT5 βœ… ❌ βœ…
LUKE βœ… ❌ ❌
LXMERT βœ… βœ… ❌
M-CTC-T βœ… ❌ ❌
M2M100 βœ… ❌ ❌
MADLAD-400 βœ… βœ… βœ…
Mamba βœ… ❌ ❌
mamba2 βœ… ❌ ❌
Marian βœ… βœ… βœ…
MarkupLM βœ… ❌ ❌
Mask2Former βœ… ❌ ❌
MaskFormer βœ… ❌ ❌
MatCha βœ… ❌ ❌
mBART βœ… βœ… βœ…
mBART-50 βœ… βœ… βœ…
MEGA βœ… ❌ ❌
Megatron-BERT βœ… ❌ ❌
Megatron-GPT2 βœ… βœ… βœ…
MGP-STR βœ… ❌ ❌
Mimi βœ… ❌ ❌
Mistral βœ… βœ… βœ…
Mixtral βœ… ❌ ❌
Mllama βœ… ❌ ❌
mLUKE βœ… ❌ ❌
MMS βœ… βœ… βœ…
MobileBERT βœ… βœ… ❌
MobileNetV1 βœ… ❌ ❌
MobileNetV2 βœ… ❌ ❌
MobileViT βœ… βœ… ❌
MobileViTV2 βœ… ❌ ❌
ModernBERT βœ… ❌ ❌
Moshi βœ… ❌ ❌
MPNet βœ… βœ… ❌
MPT βœ… ❌ ❌
MRA βœ… ❌ ❌
MT5 βœ… βœ… βœ…
MusicGen βœ… ❌ ❌
MusicGen Melody βœ… ❌ ❌
MVP βœ… ❌ ❌
NAT βœ… ❌ ❌
Nemotron βœ… ❌ ❌
Nezha βœ… ❌ ❌
NLLB βœ… ❌ ❌
NLLB-MOE βœ… ❌ ❌
Nougat βœ… βœ… βœ…
NystrΓΆmformer βœ… ❌ ❌
OLMo βœ… ❌ ❌
OLMo2 βœ… ❌ ❌
OLMoE βœ… ❌ ❌
OmDet-Turbo βœ… ❌ ❌
OneFormer βœ… ❌ ❌
OpenAI GPT βœ… βœ… ❌
OpenAI GPT-2 βœ… βœ… βœ…
OpenLlama βœ… ❌ ❌
OPT βœ… βœ… βœ…
OWL-ViT βœ… ❌ ❌
OWLv2 βœ… ❌ ❌
PaliGemma βœ… ❌ ❌
PatchTSMixer βœ… ❌ ❌
PatchTST βœ… ❌ ❌
Pegasus βœ… βœ… βœ…
PEGASUS-X βœ… ❌ ❌
Perceiver βœ… ❌ ❌
Persimmon βœ… ❌ ❌
Phi βœ… ❌ ❌
Phi3 βœ… ❌ ❌
Phimoe βœ… ❌ ❌
PhoBERT βœ… βœ… βœ…
Pix2Struct βœ… ❌ ❌
Pixtral βœ… ❌ ❌
PLBart βœ… ❌ ❌
PoolFormer βœ… ❌ ❌
Pop2Piano βœ… ❌ ❌
ProphetNet βœ… ❌ ❌
PVT βœ… ❌ ❌
PVTv2 βœ… ❌ ❌
QDQBert βœ… ❌ ❌
Qwen2 βœ… ❌ ❌
Qwen2Audio βœ… ❌ ❌
Qwen2MoE βœ… ❌ ❌
Qwen2VL βœ… ❌ ❌
RAG βœ… βœ… ❌
REALM βœ… ❌ ❌
RecurrentGemma βœ… ❌ ❌
Reformer βœ… ❌ ❌
RegNet βœ… βœ… βœ…
RemBERT βœ… βœ… ❌
ResNet βœ… βœ… βœ…
RetriBERT βœ… ❌ ❌
RoBERTa βœ… βœ… βœ…
RoBERTa-PreLayerNorm βœ… βœ… βœ…
RoCBert βœ… ❌ ❌
RoFormer βœ… βœ… βœ…
RT-DETR βœ… ❌ ❌
RT-DETR-ResNet βœ… ❌ ❌
RWKV βœ… ❌ ❌
SAM βœ… βœ… ❌
SeamlessM4T βœ… ❌ ❌
SeamlessM4Tv2 βœ… ❌ ❌
SegFormer βœ… βœ… ❌
SegGPT βœ… ❌ ❌
SEW βœ… ❌ ❌
SEW-D βœ… ❌ ❌
SigLIP βœ… ❌ ❌
Speech Encoder decoder βœ… ❌ βœ…
Speech2Text βœ… βœ… ❌
SpeechT5 βœ… ❌ ❌
Splinter βœ… ❌ ❌
SqueezeBERT βœ… ❌ ❌
StableLm βœ… ❌ ❌
Starcoder2 βœ… ❌ ❌
SuperPoint βœ… ❌ ❌
SwiftFormer βœ… βœ… ❌
Swin Transformer βœ… βœ… ❌
Swin Transformer V2 βœ… ❌ ❌
Swin2SR βœ… ❌ ❌
SwitchTransformers βœ… ❌ ❌
T5 βœ… βœ… βœ…
T5v1.1 βœ… βœ… βœ…
Table Transformer βœ… ❌ ❌
TAPAS βœ… βœ… ❌
TAPEX βœ… βœ… βœ…
Time Series Transformer βœ… ❌ ❌
TimeSformer βœ… ❌ ❌
TimmWrapperModel βœ… ❌ ❌
Trajectory Transformer βœ… ❌ ❌
Transformer-XL βœ… βœ… ❌
TrOCR βœ… ❌ ❌
TVLT βœ… ❌ ❌
TVP βœ… ❌ ❌
UDOP βœ… ❌ ❌
UL2 βœ… βœ… βœ…
UMT5 βœ… ❌ ❌
UniSpeech βœ… ❌ ❌
UniSpeechSat βœ… ❌ ❌
UnivNet βœ… ❌ ❌
UPerNet βœ… ❌ ❌
VAN βœ… ❌ ❌
VideoLlava βœ… ❌ ❌
VideoMAE βœ… ❌ ❌
ViLT βœ… ❌ ❌
VipLlava βœ… ❌ ❌
Vision Encoder decoder βœ… βœ… βœ…
VisionTextDualEncoder βœ… βœ… βœ…
VisualBERT βœ… ❌ ❌
ViT βœ… βœ… βœ…
ViT Hybrid βœ… ❌ ❌
VitDet βœ… ❌ ❌
ViTMAE βœ… βœ… ❌
ViTMatte βœ… ❌ ❌
ViTMSN βœ… ❌ ❌
VITS βœ… ❌ ❌
ViViT βœ… ❌ ❌
Wav2Vec2 βœ… βœ… βœ…
Wav2Vec2-BERT βœ… ❌ ❌
Wav2Vec2-Conformer βœ… ❌ ❌
Wav2Vec2Phoneme βœ… βœ… βœ…
WavLM βœ… ❌ ❌
Whisper βœ… βœ… βœ…
X-CLIP βœ… ❌ ❌
X-MOD βœ… ❌ ❌
XGLM βœ… βœ… βœ…
XLM βœ… βœ… ❌
XLM-ProphetNet βœ… ❌ ❌
XLM-RoBERTa βœ… βœ… βœ…
XLM-RoBERTa-XL βœ… ❌ ❌
XLM-V βœ… βœ… βœ…
XLNet βœ… βœ… ❌
XLS-R βœ… βœ… βœ…
XLSR-Wav2Vec2 βœ… βœ… βœ…
YOLOS βœ… ❌ ❌
YOSO βœ… ❌ ❌
Zamba βœ… ❌ ❌
ZoeDepth βœ… ❌ ❌