base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
library_name: peft
license: mit
language:
  - en
pipeline_tag: text-generation

Model Card for DeepSeek-Llama3.1-8B-DeepThinker-v1

The DeepSeek-Llama3.1-8B-DeepThinker-v1 model shows behavior that appears emergent, particularly in self-correction, epistemic humility, structured reasoning, and ethical awareness. Rather than only reciting knowledge, it verifies, analyzes, and refines its own reasoning, traits that distinguish advanced models from basic language generators.

How to Use

!pip install transformers accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# **Model Name on Hugging Face**
MODEL_NAME = "Vijayendra/DeepSeek-Llama3.1-8B-DeepThinker-v1"

# 🛠 **Load Model & Tokenizer from Hugging Face**
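tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Some Llama-family tokenizers ship without a pad token; fall back to EOS so padding=True works
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",  # Automatically assigns model layers to available GPUs/CPUs
    torch_dtype=torch.float16  # Use 16-bit precision for memory efficiency
)  # No extra .to(...) call: device_map="auto" has already placed the weights, and moving them again can fail when layers are offloaded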

# 🛠 **Define Inference Function**
def generate_response(model, tokenizer, prompt, max_new_tokens=2048, temperature=0.7):
    """Sample a completion for `prompt`; the decoded text includes the prompt itself."""
    # Tokenize input and move the tensors to the model's device
    inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True).to(model.device)
    # Generate response
    with torch.no_grad():
        generated_tokens = model.generate(
            inputs.input_ids,
            attention_mask=inputs.attention_mask,  # Ensure attention mask is passed
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            do_sample=True,
            top_k=40,
            top_p=0.9,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id
        )

    # Decode response
    return tokenizer.decode(generated_tokens[0], skip_special_tokens=True)

# **Test Questions**
questions = [
    "The sun orbits the Earth once every 365 days. Is this true?",
    "Write a brief summary about the impact of World War I and World War II on human history, ensuring that you do not hallucinate numbers or dates.",
    "Explain in detail how a nuclear reactor works, including the roles of moderation, control rods, and coolant, without resorting to overly generic explanations.",
    "Analyze the ethical implications of using AI in decision-making within the criminal justice system, highlighting both potential benefits and risks."
]

# **Generate and Print Responses**
for i, question in enumerate(questions, 1):
    response = generate_response(model, tokenizer, question)
    print(f"\n🟢 Question {i}: {question}")
    print(f"🔵 Response: {response}")
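
For decoder-only models, `generate` returns the prompt tokens followed by the newly generated tokens, so the string printed after "Response:" starts with the question itself. A minimal sketch of trimming the prompt at the token level is shown here. `continuation_ids` is a hypothetical helper, not part of the original snippet, and it is demonstrated on plain Python lists so it runs without a GPU or a model download.

```python
def continuation_ids(sequence_ids, prompt_length):
    """Return only the ids produced after the prompt.

    Hypothetical helper: assumes `sequence_ids` is one row of the output of
    `model.generate`, which for decoder-only models begins with the prompt ids.
    """
    if not 0 <= prompt_length <= len(sequence_ids):
        raise ValueError("prompt_length must lie within the sequence")
    return sequence_ids[prompt_length:]


# Toy ids standing in for real token ids: three prompt ids, two generated ids.
full_sequence = [101, 102, 103, 7, 8]
new_ids = continuation_ids(full_sequence, prompt_length=3)
print(new_ids)
```

Inside `generate_response`, the equivalent slice would be `generated_tokens[0][inputs.input_ids.shape[1]:]`, applied before calling `tokenizer.decode`.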

Example Output

Example: Historical Impact of World War I & II Without Hallucinating Dates

Prompt: "Write a brief summary about the impact of World War I and World War II on human history, ensuring that you do not hallucinate numbers or dates."

Model’s Response: "While precise casualty estimates vary, both wars collectively caused immense human and economic devastation. World War I significantly reshaped geopolitical structures, leading to the Treaty of Versailles, while World War II accelerated global technological advancements and institutional reforms, such as the formation of the United Nations. However, the exact number of deaths varies by source, and caution should be taken when citing figures."

Observed Meta-Cognition:

The model explicitly states uncertainty regarding the exact casualty figures rather than generating misleading or confident-sounding but incorrect numbers.
The phrase "caution should be taken when citing figures" indicates a higher-order awareness of knowledge limitations.
Instead of making up statistics, the model provides a generalized but accurate analysis of historical impact.
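
The observations above can be checked mechanically, at least in a rough way. The sketch below flags any digits in a response and lists the hedging words it contains. It is only a heuristic under stated assumptions: `audit_response` and the `HEDGE_MARKERS` list are hypothetical names introduced for illustration, and the absence of digits does not by itself prove a response is accurate.

```python
import re

# Hypothetical marker list; adjust it for your own evaluation prompts.
HEDGE_MARKERS = ("vary", "varies", "caution", "uncertain", "approximately")


def audit_response(text):
    """Report whether a response contains digits and which hedging words it uses."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {
        "contains_digits": bool(re.search(r"\d", text)),
        "hedges": [marker for marker in HEDGE_MARKERS if marker in words],
    }


# The model response quoted in the example above.
response = (
    "While precise casualty estimates vary, both wars collectively caused "
    "immense human and economic devastation. World War I significantly "
    "reshaped geopolitical structures, leading to the Treaty of Versailles, "
    "while World War II accelerated global technological advancements and "
    "institutional reforms, such as the formation of the United Nations. "
    "However, the exact number of deaths varies by source, and caution "
    "should be taken when citing figures."
)

report = audit_response(response)
print(report)
```

A check like this is best run over many sampled responses, since generation uses `do_sample=True` and a single output may not be representative.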

✅ Significance:

This suggests an emergent form of reasoning awareness.
In this example the model avoids hallucination, showing it can restrain overconfidence in areas of factual uncertainty.
The behavior resembles that of expert human researchers, who qualify statements when precise data is unavailable.