---
title: BrowserGym Leaderboard
emoji: 🏆
colorFrom: purple
colorTo: green
sdk: docker
pinned: false
license: mit
---

# BrowserGym Leaderboard

This leaderboard tracks the performance of agents on web navigation tasks.

## How to Submit Results for New Agents

### 1. Create Results Directory
Create a new folder in the `results` directory with your agent's name:
```bash
results/
└── your-agent-name/
    ├── README.md
    ├── webarena.json
    ├── workarena-l1.json
    ├── workarena-l2.json
    ├── workarena-l3.json
    └── miniwob.json
```
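The layout above can be created by hand, or with a small script. The following is a minimal sketch, not part of the leaderboard tooling: `scaffold_agent_dir` is a hypothetical helper, the file names are taken from the naming list in step 3, and the empty `[]` placeholders are an assumption that must be replaced with real entries before submitting.

```python
from pathlib import Path

# File names from the naming list in step 3 of this README.
RESULT_FILES = [
    "webarena.json",
    "workarena-l1.json",
    "workarena-l2.json",
    "workarena-l3.json",
    "miniwob.json",
]


def scaffold_agent_dir(results_root, agent_name):
    """Create <results_root>/<agent_name>/ with a README.md stub and placeholder result files.

    Hypothetical helper for illustration; existing files are left untouched.
    """
    agent_dir = Path(results_root) / agent_name
    agent_dir.mkdir(parents=True, exist_ok=True)

    readme = agent_dir / "README.md"
    if not readme.exists():
        readme.write_text(f"# {agent_name}\n", encoding="utf-8")

    for name in RESULT_FILES:
        path = agent_dir / name
        if not path.exists():
            # Placeholder only: fill in a real entry before opening a PR.
            path.write_text("[]\n", encoding="utf-8")

    return agent_dir
```

Calling `scaffold_agent_dir("results", "your-agent-name")` from the repository root would produce the folder shown above. Delete the placeholder for any benchmark you did not evaluate.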


### 2. Add Agent Details

Create a `README.md` in your agent's folder with the following details:

#### Required Information
- **Model Name**: Base model used (e.g., GPT-4, Claude-2)
- **Model Architecture**: Architecture details and any modifications
- **Input/Output Format**: How inputs are processed and outputs generated
- **Training Details**: Training configuration if applicable
  - Dataset used
  - Number of training steps
  - Hardware used
  - Training time

#### Optional Information
- **Paper Link**: Link to published paper/preprint if available
- **Code Repository**: Link to public code implementation
- **Additional Notes**: Any special configurations or requirements
- **License**: License information for your agent

Make sure to organize the information in clear sections using Markdown.

### 3. Add Benchmark Results

Create a separate JSON file for each benchmark, following this format:

```json
[
    {
        "agent_name": "your-agent-name",
        "study_id": "unique-study-identifier-from-agentlab", 
        "date_time": "YYYY-MM-DD HH:MM:SS",
        "benchmark": "WebArena",
        "score": 0.0,
        "std_err": 0.0,
        "benchmark_specific": "Yes/No",
        "benchmark_tuned": "Yes/No",
        "followed_evaluation_protocol": "Yes/No", 
        "reproducible": "Yes/No",
        "comments": "Additional details",
        "original_or_reproduced": "Original"
    }
]
```

Name the result files as follows:

- `webarena.json`
- `workarena-l1.json`
- `workarena-l2.json`
- `workarena-l3.json`
- `miniwob.json`

Each file must contain a JSON array with a single object in the format above. The `benchmark` field must match one of the benchmark names exactly (`WebArena`, `WorkArena-L1`, `WorkArena-L2`, `WorkArena-L3`, `MiniWoB`), and the filename must be the lowercase benchmark name followed by `.json` (for example, `WorkArena-L1` goes in `workarena-l1.json`).
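Before opening a PR, it can help to check each file against the rules above. The sketch below is one way to do that, assuming every key from the example entry is required; `check_results_file` is a hypothetical helper and is not the check the leaderboard itself runs. It accepts arrays with more than one entry so that files containing reproduced results also pass.

```python
import json
from pathlib import Path

# Benchmark names and keys as listed in this README.
BENCHMARKS = ["WebArena", "WorkArena-L1", "WorkArena-L2", "WorkArena-L3", "MiniWoB"]
REQUIRED_FIELDS = [
    "agent_name", "study_id", "date_time", "benchmark", "score", "std_err",
    "benchmark_specific", "benchmark_tuned", "followed_evaluation_protocol",
    "reproducible", "comments", "original_or_reproduced",
]


def check_results_file(path):
    """Return a list of problems found in one results file (empty list if none)."""
    path = Path(path)
    try:
        entries = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path.name}: invalid JSON ({exc})"]

    if not isinstance(entries, list) or not entries:
        return [f"{path.name}: expected a non-empty JSON array"]

    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"{path.name}[{i}]: entry is not a JSON object")
            continue
        missing = [f for f in REQUIRED_FIELDS if f not in entry]
        if missing:
            problems.append(f"{path.name}[{i}]: missing fields {missing}")
        benchmark = entry.get("benchmark")
        if benchmark not in BENCHMARKS:
            problems.append(f"{path.name}[{i}]: unknown benchmark {benchmark!r}")
        elif f"{benchmark.lower()}.json" != path.name:
            # The filename must be the lowercase benchmark name plus ".json".
            problems.append(f"{path.name}[{i}]: benchmark {benchmark!r} does not match filename")
    return problems
```

For example, `check_results_file("results/your-agent-name/webarena.json")` returns an empty list when the file follows the format, and a list of human-readable messages otherwise.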

### 4. Submit PR

1. Open the Community tab and click "New Pull Request".
2. Give the PR a title and follow the steps shown.
3. Publish the branch.

## How to Submit Reproducibility Results for Existing Agents

Open the results file for the agent and benchmark whose results you reproduced.

### 1. Add Reproduced Results

Append the following entry to the JSON array in that file, keeping the existing entries. Set `original_or_reproduced` to `Reproduced`.

```json
[
    {
        "agent_name": "your-agent-name",
        "study_id": "unique-study-identifier-from-agentlab", 
        "date_time": "YYYY-MM-DD HH:MM:SS",
        "benchmark": "WebArena",
        "score": 0.0,
        "std_err": 0.0,
        "benchmark_specific": "Yes/No",
        "benchmark_tuned": "Yes/No",
        "followed_evaluation_protocol": "Yes/No", 
        "reproducible": "Yes/No",
        "comments": "Additional details",
        "original_or_reproduced": "Reproduced"
    }
]
```
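Appending by hand makes it easy to break the JSON array, for instance by dropping a comma. As a rough sketch, the append could also be scripted; `append_reproduced_entry` is a hypothetical helper, and the 4-space indentation is an assumption chosen to match the examples in this README.

```python
import json
from pathlib import Path


def append_reproduced_entry(path, entry):
    """Append `entry` to the JSON array stored in `path`, marked as a reproduced result.

    Hypothetical helper for illustration; existing entries are preserved.
    """
    path = Path(path)
    entries = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(entries, list):
        raise ValueError(f"{path.name} does not contain a JSON array")

    # Copy the entry so the caller's dict is not modified, and force the marker.
    new_entry = dict(entry, original_or_reproduced="Reproduced")
    entries.append(new_entry)

    path.write_text(
        json.dumps(entries, indent=4, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return entries
```

The function takes a dict with the same keys as the entry above and rewrites the file in place, so review the resulting diff before publishing the branch.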

### 2. Submit PR

1. Open the Community tab and click "New Pull Request".
2. Give the PR a title and follow the steps shown.
3. Publish the branch.

## License

MIT