---
title: BrowserGym Leaderboard
emoji: π
colorFrom: purple
colorTo: green
sdk: docker
pinned: false
license: mit
---
# BrowserGym Leaderboard
This leaderboard tracks the performance of various agents on web navigation tasks.
## How to Submit Results for New Agents
### 1. Create Results Directory
Create a new folder in the `results` directory with your agent's name:
```bash
results/
└── your-agent-name/
    ├── README.md
    ├── webarena.json
    ├── workarena-l1.json
    ├── workarena-l2.json
    ├── workarena-l3.json
    └── miniwob.json
```
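If you prefer to script this step, here is a minimal sketch (the agent name `my-agent` is a placeholder) that creates the folder and empty placeholder files using the names listed above:
```python
from pathlib import Path

# Placeholder agent name -- replace with your own.
agent_dir = Path("results") / "my-agent"
agent_dir.mkdir(parents=True, exist_ok=True)

# One README plus one results file per benchmark, as listed above.
for filename in [
    "README.md",
    "webarena.json",
    "workarena-l1.json",
    "workarena-l2.json",
    "workarena-l3.json",
    "miniwob.json",
]:
    (agent_dir / filename).touch()
```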
### 2. Add Agent Details
Create a `README.md` in your agent's folder with the following details:
#### Required Information
- **Model Name**: Base model used (e.g., GPT-4, Claude-2)
- **Model Architecture**: Architecture details and any modifications
- **Input/Output Format**: How inputs are processed and outputs generated
- **Training Details**: Training configuration if applicable
- Dataset used
- Number of training steps
- Hardware used
- Training time
#### Optional Information
- **Paper Link**: Link to published paper/preprint if available
- **Code Repository**: Link to public code implementation
- **Additional Notes**: Any special configurations or requirements
- **License**: License information for your agent
Make sure to organize the information in clear sections using Markdown.
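For example, a minimal `README.md` skeleton (an illustrative layout, not a required template) might look like:
```markdown
# your-agent-name

## Model Name
GPT-4 (example)

## Model Architecture
Brief description of the architecture and any modifications.

## Input/Output Format
How observations are processed and actions are produced.

## Training Details
- Dataset used: ...
- Number of training steps: ...
- Hardware used: ...
- Training time: ...

## Additional Information (optional)
- Paper Link: ...
- Code Repository: ...
- Additional Notes: ...
- License: ...
```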
### 3. Add Benchmark Results
Create a separate JSON file for each benchmark, following this format:
```json
[
{
"agent_name": "your-agent-name",
"study_id": "unique-study-identifier-from-agentlab",
"date_time": "YYYY-MM-DD HH:MM:SS",
"benchmark": "WebArena",
"score": 0.0,
"std_err": 0.0,
"benchmark_specific": "Yes/No",
"benchmark_tuned": "Yes/No",
"followed_evaluation_protocol": "Yes/No",
"reproducible": "Yes/No",
"comments": "Additional details",
"original_or_reproduced": "Original"
}
]
```
Add the results for each benchmark in a separate JSON file, named as follows:
- `webarena.json`
- `workarena-l1.json`
- `workarena-l2.json`
- `workarena-l3.json`
- `miniwob.json`
Each file must contain a JSON array with a single object following the format above. The `benchmark` field in each file must exactly match one of the benchmark names (`WebArena`, `WorkArena-L1`, `WorkArena-L2`, `WorkArena-L3`, `MiniWoB`), and the filename must be the lowercase benchmark name followed by `.json`.
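As a quick sanity check before opening a PR, a short script like the one below can verify these constraints. This is only a sketch (it is not part of the leaderboard tooling); the key names and filename rule are taken from the format above:
```python
import json
from pathlib import Path

BENCHMARKS = {"WebArena", "WorkArena-L1", "WorkArena-L2", "WorkArena-L3", "MiniWoB"}
REQUIRED_KEYS = {
    "agent_name", "study_id", "date_time", "benchmark", "score", "std_err",
    "benchmark_specific", "benchmark_tuned", "followed_evaluation_protocol",
    "reproducible", "comments", "original_or_reproduced",
}

def check_results_file(path: str) -> None:
    """Raise AssertionError if a results file does not follow the format above."""
    p = Path(path)
    entries = json.loads(p.read_text())
    assert isinstance(entries, list) and entries, "expected a non-empty JSON array"
    for entry in entries:
        missing = REQUIRED_KEYS - entry.keys()
        assert not missing, f"missing keys: {missing}"
        assert entry["benchmark"] in BENCHMARKS, f"unknown benchmark: {entry['benchmark']}"
        # The filename must be the lowercase benchmark name, e.g. workarena-l1.json.
        assert p.name == entry["benchmark"].lower() + ".json", (
            f"filename {p.name} does not match benchmark {entry['benchmark']}"
        )

# Hypothetical usage:
# check_results_file("results/my-agent/webarena.json")
```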
### 4. Submit PR
1. Open the community tab and press "New Pull Request"
2. Give the PR a descriptive title and follow the steps shown
3. Publish the branch
## How to Submit Reproducibility Results for Existing Agents
Open the results file for the agent and benchmark whose results you reproduced.
### 1. Add reproduced results
Append the following entry to the JSON file, and make sure to set `original_or_reproduced` to `Reproduced`.
```json
[
{
"agent_name": "your-agent-name",
"study_id": "unique-study-identifier-from-agentlab",
"date_time": "YYYY-MM-DD HH:MM:SS",
"benchmark": "WebArena",
"score": 0.0,
"std_err": 0.0,
"benchmark_specific": "Yes/No",
"benchmark_tuned": "Yes/No",
"followed_evaluation_protocol": "Yes/No",
"reproducible": "Yes/No",
"comments": "Additional details",
"original_or_reproduced": "Reproduced"
}
]
```
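You can append the entry by hand or with a short script. The sketch below is a minimal illustration (the file path and field values are placeholders) that loads the existing array, appends the reproduced entry, and writes the file back:
```python
import json
from pathlib import Path

# Placeholder path -- point this at the agent/benchmark you reproduced.
path = Path("results/some-agent/webarena.json")

entries = json.loads(path.read_text())

# Placeholder values -- fill in your actual study id, date, score, etc.
entries.append({
    "agent_name": "some-agent",
    "study_id": "unique-study-identifier-from-agentlab",
    "date_time": "YYYY-MM-DD HH:MM:SS",
    "benchmark": "WebArena",
    "score": 0.0,
    "std_err": 0.0,
    "benchmark_specific": "Yes/No",
    "benchmark_tuned": "Yes/No",
    "followed_evaluation_protocol": "Yes/No",
    "reproducible": "Yes/No",
    "comments": "Additional details",
    "original_or_reproduced": "Reproduced",
})

path.write_text(json.dumps(entries, indent=4) + "\n")
```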
### 2. Submit PR
1. Open the community tab and press "New Pull Request"
2. Give the PR a descriptive title and follow the steps shown
3. Publish the branch
## License
MIT