QBind: QLoRA for ESM-2 Binding Sites Prediction
QLoRA adapters for several ESM-2 models, trained to predict binding sites in protein sequences.
Note: A QLoRA trained on ~16M protein sequences with binding site annotations from UniProt.
AmelieSchreiber/esm2_t33_650M_qlora_binding_12M
Note: A QLoRA trained on ~12M protein sequences with binding site annotations from UniProt.
AmelieSchreiber/esm2_t6_8m_qlora_binding_sites_v0
Note: This model is not overfit, but it shows more signs of overfitting than a model that uses more QLoRA adapter layers. It only uses adapters for the query, key, and value weight matrices.
AmelieSchreiber/esm2_t12_35M_qlora_binding_sites_v0
Note: This model only uses adapters for the query, key, and value weight matrices, so it is more overfit than a model that uses more adapter layers.
AmelieSchreiber/esm2_t6_8m_qlora_binding_sites_v1
Note: This model overfits less because more of its weight matrices are adapted with QLoRA.
AmelieSchreiber/esm2_t12_35M_qlora_binding_sites_v1
Note: This model overfits less because more of its weight matrices are adapted with QLoRA.
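
To give a rough sense of what adapting "more weight matrices" means in trainable parameters, here is a minimal sketch that counts LoRA parameters for the two smaller ESM-2 sizes. The rank (`RANK = 8`) and the set of extra matrices adapted in the v1 models (attention output projection plus both feed-forward matrices) are assumptions for illustration; the collection does not state either. The helper names `lora_params` and `adapter_params` are hypothetical.

```python
# Hypothetical LoRA rank; the collection does not state the rank actually used.
RANK = 8


def lora_params(d_in, d_out, rank=RANK):
    """Trainable parameters LoRA adds to one d_in x d_out weight matrix."""
    # LoRA learns two low-rank factors: A (rank x d_in) and B (d_out x rank).
    return rank * (d_in + d_out)


def adapter_params(hidden, ffn, layers, adapt_all):
    """Total LoRA parameters summed over all transformer layers."""
    # Query, key, and value projections are each hidden x hidden.
    per_layer = 3 * lora_params(hidden, hidden)
    if adapt_all:
        # Assumed extra targets: the attention output projection
        # and the two feed-forward matrices.
        per_layer += lora_params(hidden, hidden)
        per_layer += lora_params(hidden, ffn) + lora_params(ffn, hidden)
    return layers * per_layer


# ESM-2 shapes: (hidden size, feed-forward size, number of layers)
ESM2_SHAPES = {
    "esm2_t6_8M": (320, 1280, 6),
    "esm2_t12_35M": (480, 1920, 12),
}

for name, (hidden, ffn, layers) in ESM2_SHAPES.items():
    qkv_only = adapter_params(hidden, ffn, layers, adapt_all=False)
    all_linear = adapter_params(hidden, ffn, layers, adapt_all=True)
    print(f"{name}: qkv-only={qkv_only:,} all-linear={all_linear:,}")
```

Under these assumptions, adapting the extra matrices triples the number of trainable adapter parameters for both model sizes. The sketch only counts parameters; it says nothing about why the wider adapter set overfits less in practice.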
AmelieSchreiber/esm2_t12_35M_qlora_binding_2600K_cp1
AmelieSchreiber/600K_binding_sites
Note: This dataset is curated from UniProt. The test set was created by holding out entire protein families chosen at random. The train/test split is approximately 80/20. All binding site and active site annotations were merged. All sequences longer than 1000 amino acids were split into non-overlapping chunks of 1000 residues or fewer.
AmelieSchreiber/1111K_binding_sites
Note: This dataset is curated from UniProt. The test set was created by holding out entire protein families chosen at random. The train/test split is approximately 80/20. All binding site and active site annotations were merged. All sequences longer than 1000 amino acids were split into non-overlapping chunks of 1000 residues or fewer.
AmelieSchreiber/2600K_binding_sites
Note: This dataset is curated from UniProt. The test set was created by holding out entire protein families chosen at random. The train/test split is approximately 80/20. All binding site and active site annotations were merged. All sequences longer than 1000 amino acids were split into non-overlapping chunks of 1000 residues or fewer.
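
The preprocessing described in the dataset notes (merged binding and active site labels, non-overlapping chunks of at most 1000 residues, and a split that holds out whole families) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used to build the datasets: the function names, the input layout (1-based position lists, a protein-to-family dictionary), and the choice to hold out about 20% of families rather than 20% of sequences are all hypothetical.

```python
import random


def merge_site_labels(length, binding_sites, active_sites):
    """Return a per-residue 0/1 label list with both annotation types merged."""
    labels = [0] * length
    # Positions are assumed to be 1-based, as in UniProt feature annotations.
    for pos in set(binding_sites) | set(active_sites):
        labels[pos - 1] = 1
    return labels


def chunk(sequence, labels, max_len=1000):
    """Split a sequence and its labels into non-overlapping chunks of at most max_len."""
    return [
        (sequence[i:i + max_len], labels[i:i + max_len])
        for i in range(0, len(sequence), max_len)
    ]


def split_by_family(family_of, test_fraction=0.2, seed=0):
    """Hold out whole families at random so no family appears in both splits."""
    families = sorted(set(family_of.values()))
    random.Random(seed).shuffle(families)
    # Assumption: the fraction is applied to families, not to sequences.
    n_test = max(1, round(len(families) * test_fraction))
    test_families = set(families[:n_test])
    train = [p for p, fam in family_of.items() if fam not in test_families]
    test = [p for p, fam in family_of.items() if fam in test_families]
    return train, test


# Toy example: a 2300-residue sequence with overlapping annotations.
seq = "M" * 2300
labels = merge_site_labels(len(seq), binding_sites=[5, 1001], active_sites=[5, 2300])
chunks = chunk(seq, labels)
print([len(s) for s, _ in chunks])  # [1000, 1000, 300]

# Toy family assignment with hypothetical protein and family identifiers.
family_of = {"P1": "famA", "P2": "famA", "P3": "famB", "P4": "famC", "P5": "famD", "P6": "famE"}
train_ids, test_ids = split_by_family(family_of)
print(len(train_ids), len(test_ids))
```

Splitting by family rather than by individual sequence keeps close homologs of test proteins out of the training set, which is the usual motivation for this kind of split.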