Pro-1-preview

Pro-1 is a reasoning model trained with GRPO against a physics-based reward function for protein stability.

It takes a protein sequence, a text description of the protein, and the effects of previous engineering attempts, reasons over that information, and proposes modifications to improve the stability of the given sequence.
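To make that concrete, here is a minimal sketch of what a physics-based stability reward for GRPO could look like. This is an illustration only, not the actual training code (that lives in the GitHub repo); `estimate_free_energy` is a hypothetical stand-in for whatever physics-based estimator is used.

```python
# Illustrative sketch only: the real reward function is in
# https://github.com/michaelhla/pro-1. estimate_free_energy is a
# hypothetical stand-in for a physics-based stability estimator.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

def estimate_free_energy(seq: str) -> float:
    """Hypothetical physics-based score (lower = more stable)."""
    raise NotImplementedError("replace with a real stability estimator")

def stability_reward(original: str, proposed: str) -> float:
    # Reject malformed proposals before doing any physics.
    if not proposed or set(proposed) - VALID_RESIDUES:
        return -1.0
    # Positive reward when the proposed variant is predicted to be more
    # stable (lower free energy) than the original sequence.
    return estimate_free_energy(original) - estimate_free_energy(proposed)
```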

LoRA checkpoints

| Model | Checkpoint |
|-------|------------|
| 8b base GRPO | best-checkpoint |
| 8b creative reward | creativity-lm-grpo-mega-run-full |
| 8b creative + specificity reward (default) | all-lm-grpo-mega-run |
| 70b SFT only | llama_70b_4bit_sft_lora_model |

Example Usage

```python
from unsloth import FastLanguageModel
from transformers import TextIteratorStreamer
import threading

def run_protein_engineering_example():
    # Load the model and tokenizer
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/meta-Llama-3.1-8B-Instruct",
        max_seq_length=32768,
        load_in_4bit=True,
        fast_inference=True,
        max_lora_rank=32,
        gpu_memory_utilization=0.6,
    )
    
    # Load the protein engineering adapter weights
    # (placeholder repo ID; substitute one of the LoRA checkpoints above)
    model.load_adapter("your-username/protein-engineering-llama-3.1")
    FastLanguageModel.for_inference(model)
    
    protein_sequence = "MSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRILNNGHAFNVEFDDSQDKAVLKGGPLDGTY"
    
    # The placeholder below is intentional; see
    # https://github.com/michaelhla/pro-1 for the exact structured prompt
    # (sequence + protein description + effects of previous attempts).
    prompt = """

...{STRUCTURED PROMPT SEE https://github.com/michaelhla/pro-1 FOR CORRECT USAGE}...

"""

    # Initialize the streamer for incremental text generation
    streamer = TextIteratorStreamer(tokenizer, skip_special_tokens=True)

    # Tokenize once so the attention mask is passed along with the input ids
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Set up generation parameters
    generation_kwargs = dict(
        **inputs,
        streamer=streamer,
        max_new_tokens=4096,
        temperature=0.9,
        top_p=0.95,
        do_sample=True,
    )
    
    # Create a thread to run the generation
    thread = threading.Thread(target=model.generate, kwargs=generation_kwargs)
    thread.start()
    
    # Print the response as it streams
    print("Model response (streaming):")
    for new_text in streamer:
        print(new_text, end="", flush=True)
    
    thread.join()  # Ensure generation is complete

if __name__ == "__main__":
    run_protein_engineering_example()
```

Note: While the model was specifically trained on enzymes, it should work for any protein sequence. Curious to hear if this is true!

Disclaimer: This is a preview version, and as a result the model can be very dumb. Always double-check that your modified sequences have the correct mutations applied (see the sketch below). Assume all references the model cites are hallucinated.
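One quick way to do that check: parse each proposed point mutation (wild-type residue, 1-based position, new residue, e.g. A123G), verify the wild-type residue actually appears at that position in the original sequence, and only then apply it. The helper below is a hypothetical sanity-check sketch, not part of Pro-1 or its repo.

```python
import re

def apply_mutations(seq: str, mutations: list[str]) -> str:
    """Apply point mutations like 'A123G' and verify wild-type residues."""
    residues = list(seq)
    for mutation in mutations:
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if m is None:
            raise ValueError(f"unparseable mutation: {mutation!r}")
        wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if pos < 1 or pos > len(residues):
            raise ValueError(f"{mutation}: position {pos} out of range")
        if residues[pos - 1] != wt:
            raise ValueError(
                f"{mutation}: sequence has {residues[pos - 1]} at {pos}, not {wt}"
            )
        residues[pos - 1] = new
    return "".join(residues)

# e.g. apply_mutations(protein_sequence, ["M1A"]) changes the first residue,
# and raises if the model hallucinated a wild-type residue that isn't there.
```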

