linyueqian/ME555_llava_v1.5_finetuned

vision-language

Model Card for Unsolvable Robotic Task Detection

Model Details

Purpose: Detects when robotic tasks are impossible to complete
Base Model: LLaVA v1.5 7B
Developed by: Duke University
Type: Vision-Language Model

Use Cases

Identifying unsolvable robotic tasks in real-time
Explaining why tasks cannot be completed
Supporting safe human-robot interaction
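As a rough illustration of how a task might be posed to the model, the sketch below formats a robot instruction as a single-turn prompt in the `USER: <image>\n... ASSISTANT:` layout used by LLaVA v1.5 checkpoints. The helper name `build_prompt` and the wording of the question are assumptions made for this example; the card does not specify the prompt used during fine-tuning.

```python
def build_prompt(task: str) -> str:
    """Format a robot task as a LLaVA v1.5 style single-turn prompt.

    The question wording is hypothetical; adapt it to match the
    question-answer pairs the model was actually fine-tuned on.
    """
    question = (
        f"The robot is asked to: {task.strip()} "
        "Can this task be completed in the scene shown? "
        "If not, explain why."
    )
    # "<image>" marks where the processor inserts the image features.
    return f"USER: <image>\n{question} ASSISTANT:"


prompt = build_prompt("Pick up the red mug on the table.")
print(prompt)
```

The resulting string would be passed to the processor together with the camera image; the text generated after `ASSISTANT:` is where a solvability verdict and explanation would be expected.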

Training Data

4,920 synthetic images with question-answer pairs
Covers five categories: Status Conflicts, Item Absences, Logical Contradictions, Ambiguous Tasks, and Ethical Constraints
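One way to represent such a dataset in code is sketched below. The five category names come from this card; the `QAExample` record, its field names, and the sample values are hypothetical and are not taken from the actual dataset.

```python
from dataclasses import dataclass
from enum import Enum


class UnsolvableCategory(Enum):
    """The five categories listed in this model card."""
    STATUS_CONFLICT = "Status Conflicts"
    ITEM_ABSENCE = "Item Absences"
    LOGICAL_CONTRADICTION = "Logical Contradictions"
    AMBIGUOUS_TASK = "Ambiguous Tasks"
    ETHICAL_CONSTRAINT = "Ethical Constraints"


@dataclass(frozen=True)
class QAExample:
    """Hypothetical record layout for one image with a question-answer pair."""
    image_path: str
    question: str
    answer: str
    category: UnsolvableCategory


# Made-up sample for illustration only; not a real dataset entry.
example = QAExample(
    image_path="images/placeholder.png",
    question="Please hand me the blue cup.",
    answer="The task cannot be completed: no blue cup is visible in the scene.",
    category=UnsolvableCategory.ITEM_ABSENCE,
)
print(example.category.value)
```

Tagging each record with its category makes it straightforward to report results per category rather than only in aggregate.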

Performance

Success rate on SDXL synthetic data: 78.05%
Success rate on simulator synthetic data: 81.00%

Limitations

Reliable only on tasks similar to those in the training data
Requires human oversight
May not catch novel types of impossible tasks

Getting Started

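# Basic configuration
config = {
    "USE_LORA": True,       # fine-tune with LoRA adapters
    "LORA_R": 8,            # rank of the low-rank update matrices
    "LORA_ALPHA": 8,        # LoRA scaling numerator (scaling = alpha / r)
    "MODEL_MAX_LEN": 1024,  # maximum sequence length in tokens
}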
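The sketch below shows one way these settings could be interpreted and how the weights might be loaded. It assumes the repository holds a LoRA adapter for `llava-hf/llava-1.5-7b-hf`, which the card's settings suggest but do not state outright. The helper names `lora_kwargs`, `lora_scaling`, and `load_model` are made up for this example, and the mapping of the card's keys onto `peft.LoraConfig` arguments (`r`, `lora_alpha`) is an assumption about the training script.

```python
config = {
    "USE_LORA": True,
    "LORA_R": 8,
    "LORA_ALPHA": 8,
    "MODEL_MAX_LEN": 1024,
}


def lora_kwargs(cfg: dict) -> dict:
    """Map the card's keys to peft.LoraConfig-style keyword arguments (assumed mapping)."""
    if not cfg["USE_LORA"]:
        return {}
    return {"r": cfg["LORA_R"], "lora_alpha": cfg["LORA_ALPHA"]}


def lora_scaling(cfg: dict) -> float:
    """Standard LoRA scaling factor applied to the low-rank update: alpha / r."""
    return cfg["LORA_ALPHA"] / cfg["LORA_R"]


def load_model(
    base_id: str = "llava-hf/llava-1.5-7b-hf",
    adapter_id: str = "linyueqian/ME555_llava_v1.5_finetuned",
):
    """Load the base model and attach the adapter (assumes the repo is a PEFT adapter).

    Imports are local so the rest of this sketch runs without
    transformers or peft installed. Calling this downloads the weights.
    """
    from peft import PeftModel
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(base_id)
    base = LlavaForConditionalGeneration.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    return processor, model


kwargs = lora_kwargs(config)
scale = lora_scaling(config)
print(kwargs, scale)
```

With `LORA_R` and `LORA_ALPHA` both set to 8, the scaling factor is 1.0, so the low-rank update is added to the frozen weights without extra amplification or damping.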

Contact

{yixuan.yang,yueqian.lin}@duke.edu

Model tree for linyueqian/ME555_llava_v1.5_finetuned

Base model: llava-hf/llava-1.5-7b-hf
Adapter: this model