|
--- |
|
title: BrowserGym Leaderboard |
|
emoji: π |
|
colorFrom: purple |
|
colorTo: green |
|
sdk: docker |
|
pinned: false |
|
license: mit |
|
--- |
|
|
|
# BrowserGym Leaderboard |
|
|
|
This leaderboard tracks the performance of various agents on web navigation tasks.
|
|
|
## How to Submit Results for New Agents |
|
|
|
### 1. Create Results Directory |
|
Create a new folder in the `results` directory with your agent's name: |
|
```bash |
|
results/
└── your-agent-name/
    ├── README.md
    ├── webarena.json
    ├── workarena-l1.json
    ├── workarena-l2.json
    ├── workarena-l3.json
    └── miniwob.json
|
``` |
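
If it helps, the layout can be scaffolded with a short script along these lines (a sketch only; `your-agent-name` is a placeholder, and the script assumes it is run from the repository root):

```python
# Sketch: scaffold the results folder for a new agent.
# "your-agent-name" is a placeholder; run from the repository root.
from pathlib import Path

agent_dir = Path("results") / "your-agent-name"
agent_dir.mkdir(parents=True, exist_ok=True)

# Empty README to be filled in with the agent details (see step 2).
(agent_dir / "README.md").touch()

# One file per benchmark, each starting as an empty JSON array (see step 3).
for name in ["webarena", "workarena-l1", "workarena-l2", "workarena-l3", "miniwob"]:
    (agent_dir / f"{name}.json").write_text("[]\n")
```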
|
|
|
|
|
### 2. Add Agent Details |
|
|
|
Create a `README.md` in your agent's folder with the following details: |
|
|
|
#### Required Information |
|
- **Model Name**: Base model used (e.g., GPT-4, Claude-2) |
|
- **Model Architecture**: Architecture details and any modifications |
|
- **Input/Output Format**: How inputs are processed and outputs generated |
|
- **Training Details**: Training configuration, if applicable

  - Dataset used

  - Number of training steps

  - Hardware used

  - Training time
|
|
|
#### Optional Information |
|
- **Paper Link**: Link to published paper/preprint if available |
|
- **Code Repository**: Link to public code implementation |
|
- **Additional Notes**: Any special configurations or requirements |
|
- **License**: License information for your agent |
|
|
|
Make sure to organize the information in clear sections using Markdown. |
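
As a rough illustration (section names and values below are placeholders, not a required template), the agent README might be organized like this:

```markdown
# your-agent-name

## Model Name
GPT-4 (base model)

## Model Architecture
Architecture details and any modifications.

## Input/Output Format
How inputs are processed and how outputs are generated.

## Training Details
- Dataset used
- Number of training steps
- Hardware used
- Training time

## Additional Notes (optional)
Paper link, code repository, license, and any special configurations or requirements.
```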
|
|
|
### 3. Add Benchmark Results |
|
|
|
Create a separate JSON file for each benchmark, following this format:
|
|
|
```json |
|
[ |
|
{ |
|
"agent_name": "your-agent-name", |
|
"study_id": "unique-study-identifier-from-agentlab", |
|
"date_time": "YYYY-MM-DD HH:MM:SS", |
|
"benchmark": "WebArena", |
|
"score": 0.0, |
|
"std_err": 0.0, |
|
"benchmark_specific": "Yes/No", |
|
"benchmark_tuned": "Yes/No", |
|
"followed_evaluation_protocol": "Yes/No", |
|
"reproducible": "Yes/No", |
|
"comments": "Additional details", |
|
"original_or_reproduced": "Original" |
|
} |
|
] |
|
``` |
|
|
|
Add the results for each benchmark in a separate JSON file, named as follows:
|
|
|
- `webarena.json` |
|
- `workarena-l1.json` |
|
- `workarena-l2.json` |
|
- `workarena-l3.json` |
|
- `miniwob.json` |
|
|
|
Each file must contain a JSON array with a single object following the format above. The `benchmark` field in each file must match the benchmark name exactly (`WebArena`, `WorkArena-L1`, `WorkArena-L2`, `WorkArena-L3`, or `MiniWoB`), and the filename must be the lowercase benchmark name followed by `.json` (for example, `workarena-l1.json` for `WorkArena-L1`).
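
Before opening a PR, you can sanity-check the files locally. The script below is a minimal sketch of such a check (it is hypothetical, not part of the leaderboard tooling; replace `your-agent-name` with your folder name):

```python
# Sketch: local sanity check for benchmark result files (not part of the leaderboard tooling).
import json
from pathlib import Path

# Expected filename -> exact benchmark name, as listed above.
EXPECTED = {
    "webarena.json": "WebArena",
    "workarena-l1.json": "WorkArena-L1",
    "workarena-l2.json": "WorkArena-L2",
    "workarena-l3.json": "WorkArena-L3",
    "miniwob.json": "MiniWoB",
}

agent_dir = Path("results") / "your-agent-name"  # placeholder agent folder
for filename, benchmark in EXPECTED.items():
    entries = json.loads((agent_dir / filename).read_text())
    assert isinstance(entries, list) and entries, f"{filename}: expected a non-empty JSON array"
    for entry in entries:
        assert entry["benchmark"] == benchmark, f"{filename}: 'benchmark' must be '{benchmark}'"
        assert entry["original_or_reproduced"] in ("Original", "Reproduced")
print("All benchmark files look consistent.")
```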
|
|
|
### 4. Submit PR |
|
|
|
1. Open the community tab and press "New Pull Request" |
|
2. Give the PR a descriptive title and follow the steps shown
|
3. Publish the branch |
|
|
|
## How to Submit Reproducibility Results for Existing Agents |
|
|
|
Open the results file for the agent and benchmark whose results you reproduced.
|
|
|
### 1. Add Reproduced Results
|
|
|
|
|
Append the following entry to the JSON file, making sure to set `original_or_reproduced` to `Reproduced`.
|
|
|
```json |
|
[ |
|
{ |
|
"agent_name": "your-agent-name", |
|
"study_id": "unique-study-identifier-from-agentlab", |
|
"date_time": "YYYY-MM-DD HH:MM:SS", |
|
"benchmark": "WebArena", |
|
"score": 0.0, |
|
"std_err": 0.0, |
|
"benchmark_specific": "Yes/No", |
|
"benchmark_tuned": "Yes/No", |
|
"followed_evaluation_protocol": "Yes/No", |
|
"reproducible": "Yes/No", |
|
"comments": "Additional details", |
|
"original_or_reproduced": "Reproduced" |
|
} |
|
] |
|
``` |
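
If you prefer to script the edit, something along these lines works (a sketch only; the path and all field values are placeholders for your actual run):

```python
# Sketch: append a reproduced result to an existing results file (path and values are placeholders).
import json
from pathlib import Path

path = Path("results") / "agent-name" / "webarena.json"  # file of the agent/benchmark you reproduced
entries = json.loads(path.read_text())

entries.append({
    "agent_name": "agent-name",
    "study_id": "unique-study-identifier-from-agentlab",
    "date_time": "YYYY-MM-DD HH:MM:SS",
    "benchmark": "WebArena",
    "score": 0.0,
    "std_err": 0.0,
    "benchmark_specific": "Yes/No",
    "benchmark_tuned": "Yes/No",
    "followed_evaluation_protocol": "Yes/No",
    "reproducible": "Yes/No",
    "comments": "Additional details",
    "original_or_reproduced": "Reproduced",
})

path.write_text(json.dumps(entries, indent=4) + "\n")
```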
|
|
|
### 2. Submit PR |
|
|
|
1. Open the community tab and press "New Pull Request" |
|
2. Give the PR a descriptive title and follow the steps shown
|
3. Publish the branch |
|
|
|
## License |
|
|
|
MIT |