ProtBert-BFD finetuned on the Rosetta 20AA, 40AA, and 60AA dataset

This model is finetuned to predict Rosetta fold energy from a dataset of 300k protein sequences: 100k each of length 20, 40, and 60 amino acids (AA).

Current model in this repo: prot_bert_bfd-finetuned-032822_1323

Performance

  • 20AA sequences (1k eval set):
    MAE 0.100418, R2 0.989028, MSE 0.016266, RMSE 0.127537
  • 40AA sequences (10k eval set):
    MAE 0.173888, R2 0.963361, MSE 0.048218, RMSE 0.219587
  • 60AA sequences (10k eval set):
    MAE 0.235238, R2 0.930164, MSE 0.088131, RMSE 0.2968
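
For readers who want to compute the same four metrics on their own predictions, here is a minimal sketch using the standard definitions. It is not the evaluation script used for this model, which is not shown in this card; the function name `regression_metrics` and the toy values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R2 for two equal-length lists of floats.

    Hypothetical helper using the textbook definitions; not the
    evaluation code that produced the numbers reported above.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    ss_res = sum(e * e for e in errors)     # residual sum of squares
    mse = ss_res / n                        # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R2 is undefined when every target is identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "r2": r2, "mse": mse, "rmse": rmse}


# Toy energies, made up purely for illustration.
example = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(example)
```

Since RMSE is the square root of MSE, the reported pairs can be cross-checked directly: for the 20AA set, sqrt(0.016266) is approximately 0.1275, which matches the listed RMSE.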

prot_bert_bfd from ProtTrans

The starting pretrained model, prot_bert_bfd, is from ProtTrans and was trained on 2.1 billion protein sequences from BFD using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository.
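
The ProtTrans model card for prot_bert_bfd describes its input convention: uppercase amino acid letters separated by single spaces, with the rare residues U, Z, O, and B mapped to X. The sketch below formats a raw sequence that way. It assumes the finetuned model keeps the base model's tokenizer conventions, which this card does not state; the function name `format_for_protbert` is hypothetical.

```python
import re


def format_for_protbert(sequence: str) -> str:
    """Format a raw amino acid string in the ProtBert input style.

    Assumption: the finetuned model expects the same input format as
    the prot_bert_bfd base model.
    """
    seq = sequence.strip().upper()
    # Map rare or ambiguous residues to X, as the base model card does.
    seq = re.sub(r"[UZOB]", "X", seq)
    # ProtBert tokenizes per residue, so separate letters with spaces.
    return " ".join(seq)


# Made-up 20AA sequence, for illustration only.
print(format_for_protbert("MKTAYIAKQRQISFVKSHFS"))
```

The formatted string is what would then be passed to the tokenizer before scoring.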

Created by Ladislav Rampasek
