# Pro-1-preview
Pro-1 is a reasoning model trained with GRPO (Group Relative Policy Optimization) against a physics-based reward function for protein stability.
It takes a protein sequence, a text description of the protein, and the effects of previous engineering attempts, reasons over the information given, and proposes modifications to improve the stability of the given sequence.
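The exact prompt template is defined in the pro-1 repo, but as a rough sketch of how the three inputs fit together (the field names below are illustrative, not the official format):

```python
# Illustrative only: the official prompt template is defined at
# https://github.com/michaelhla/pro-1 and may differ from this sketch.
def build_prompt(sequence: str, description: str, prior_attempts: list[str]) -> str:
    attempts = "\n".join(f"- {a}" for a in prior_attempts) if prior_attempts else "- none"
    return (
        f"Protein sequence:\n{sequence}\n\n"
        f"Description:\n{description}\n\n"
        f"Previous engineering attempts and their measured effects:\n{attempts}\n\n"
        "Propose mutations to improve the stability of this sequence and explain your reasoning."
    )
```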
## LoRA checkpoints
| Model | Checkpoint |
|---|---|
| 8B base GRPO | best-checkpoint |
| 8B creative reward | creativity-lm-grpo-mega-run-full |
| 8B creative + specificity reward (default) | all-lm-grpo-mega-run |
| 70B SFT only | llama_70b_4bit_sft_lora_model |
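Assuming each checkpoint is stored as a subfolder of the mhla/pro-1 repo under the names in the table (check the repo for the actual layout), one of them can be attached to an already-loaded base model with PEFT; here `base_model` stands for the model loaded as in the example below:

```python
from peft import PeftModel

# Assumption: adapters live in subfolders of mhla/pro-1 named as in the
# table above; adjust the subfolder to the checkpoint you want.
model = PeftModel.from_pretrained(
    base_model,                        # an already-loaded Llama 3.1 8B Instruct model
    "mhla/pro-1",
    subfolder="all-lm-grpo-mega-run",  # default: creative + specificity reward
)
```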
## Example Usage
```python
from unsloth import FastLanguageModel
from transformers import TextIteratorStreamer
import threading


def run_protein_engineering_example():
    # Load the base model and tokenizer
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/meta-Llama-3.1-8B-Instruct",
        max_seq_length=32768,
        load_in_4bit=True,
        fast_inference=True,
        max_lora_rank=32,
        gpu_memory_utilization=0.6,
    )

    # Load the protein engineering adapter weights
    model.load_adapter("your-username/protein-engineering-llama-3.1")
    FastLanguageModel.for_inference(model)

    protein_sequence = "MSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRILNNGHAFNVEFDDSQDKAVLKGGPLDGTY"

    # Fill in the structured prompt; see https://github.com/michaelhla/pro-1
    # for the correct format.
    prompt = """
...{STRUCTURED PROMPT SEE https://github.com/michaelhla/pro-1 FOR CORRECT USAGE}...
"""

    # Stream tokens back as they are generated
    streamer = TextIteratorStreamer(tokenizer, skip_special_tokens=True)

    # Set up generation parameters
    generation_kwargs = dict(
        input_ids=tokenizer(prompt, return_tensors="pt").input_ids.to(model.device),
        streamer=streamer,
        max_new_tokens=4096,
        temperature=0.9,
        top_p=0.95,
        do_sample=True,
    )

    # Run generation in a background thread so the stream can be consumed here
    thread = threading.Thread(target=model.generate, kwargs=generation_kwargs)
    thread.start()

    # Print the response as it streams
    print("Model response (streaming):")
    for new_text in streamer:
        print(new_text, end="", flush=True)
    thread.join()  # Ensure generation is complete


if __name__ == "__main__":
    run_protein_engineering_example()
```
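Generation runs in a background thread because `model.generate` blocks until completion; handing it a `TextIteratorStreamer` lets the main thread print tokens as they arrive instead of waiting for the full 4096-token budget.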
Note: While the model was specifically trained on enzymes, it should work for any protein sequence. Curious to hear if this is true!
Disclaimer: This is a preview version, and as a result the model can be very dumb. Always double-check that your modified sequences have the correct mutations applied. Assume all references from the model are hallucinated.
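One way to do that double-checking: parse each proposed mutation (standard `A123V` notation, 1-indexed) and verify the wild-type residue before applying it. A minimal, hypothetical helper, not part of the pro-1 repo:

```python
import re

def apply_mutations(sequence: str, mutations: list[str]) -> str:
    """Apply point mutations like 'A123V' (wild-type residue, 1-indexed position,
    new residue), refusing to proceed if the wild type does not match the sequence."""
    seq = list(sequence)
    for m in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", m)
        if not match:
            raise ValueError(f"Unparseable mutation: {m}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if pos < 1 or pos > len(seq):
            raise ValueError(f"{m}: position {pos} out of range")
        if seq[pos - 1] != wt:
            raise ValueError(f"{m}: expected {wt} at position {pos}, found {seq[pos - 1]}")
        seq[pos - 1] = new
    return "".join(seq)
```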
## Model tree for mhla/pro-1

Base model: meta-llama/Llama-3.1-70B, fine-tuned as meta-llama/Llama-3.3-70B-Instruct