---
title: BrowserGym Leaderboard
emoji: πŸ†
colorFrom: purple
colorTo: green
sdk: docker
pinned: false
license: mit
---

# BrowserGym Leaderboard

This leaderboard tracks the performance of various agents on web navigation tasks.

## How to Submit Results for New Agents

### 1. Create Results Directory
Create a new folder in the `results` directory with your agent's name:
```text
results/
└── your-agent-name/
    ├── README.md
    ├── webarena.json
    ├── workarena-l1.json
    ├── workarena-l2.json
    ├── workarena-l3.json
    └── miniwob.json
```
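If you prefer to script this step, the following minimal Python sketch creates the layout above using the benchmark file names this guide requires. `scaffold_results_dir` and `BENCHMARK_FILES` are hypothetical names, not part of BrowserGym or AgentLab. Each JSON file starts as an empty-array placeholder that you must replace with your result entry before submitting.

```python
from pathlib import Path

# File names required by this guide (lowercased benchmark name + ".json").
BENCHMARK_FILES = [
    "webarena.json",
    "workarena-l1.json",
    "workarena-l2.json",
    "workarena-l3.json",
    "miniwob.json",
]


def scaffold_results_dir(root: Path, agent_name: str) -> Path:
    """Hypothetical helper: create results/<agent_name>/ with a README and one file per benchmark."""
    agent_dir = root / "results" / agent_name
    agent_dir.mkdir(parents=True, exist_ok=True)
    # Empty README.md; fill it in as described in step 2.
    (agent_dir / "README.md").touch(exist_ok=True)
    for name in BENCHMARK_FILES:
        path = agent_dir / name
        if not path.exists():
            # Placeholder only: replace "[]" with a one-element array holding your result.
            path.write_text("[]\n", encoding="utf-8")
    return agent_dir
```

For example, `scaffold_results_dir(Path("."), "your-agent-name")` run from the repository root creates `results/your-agent-name/` and leaves any existing files untouched.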


### 2. Add Agent Details

Create a `README.md` in your agent's folder with the following details:

#### Required Information
- **Model Name**: Base model used (e.g., GPT-4, Claude-2)
- **Model Architecture**: Architecture details and any modifications
- **Input/Output Format**: How inputs are processed and outputs generated
- **Training Details**: Training configuration if applicable
  - Dataset used
  - Number of training steps
  - Hardware used
  - Training time

#### Optional Information
- **Paper Link**: Link to published paper/preprint if available
- **Code Repository**: Link to public code implementation
- **Additional Notes**: Any special configurations or requirements
- **License**: License information for your agent

Make sure to organize the information in clear sections using Markdown.

### 3. Add Benchmark Results

Create separate JSON files for each benchmark following this format:

```json
[
    {
        "agent_name": "your-agent-name",
        "study_id": "unique-study-identifier-from-agentlab", 
        "date_time": "YYYY-MM-DD HH:MM:SS",
        "benchmark": "WebArena",
        "score": 0.0,
        "std_err": 0.0,
        "benchmark_specific": "Yes/No",
        "benchmark_tuned": "Yes/No",
        "followed_evaluation_protocol": "Yes/No", 
        "reproducible": "Yes/No",
        "comments": "Additional details",
        "original_or_reproduced": "Original"
    }
]
```
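As an illustration, the Python sketch below builds one entry with the fields and key order of the template and writes it as a one-element array. `make_entry` and `write_benchmark_file` are hypothetical helpers, not AgentLab APIs; the `study_id` string is assumed to be copied from your AgentLab study, and the Yes/No fields are passed explicitly because only you can answer them.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Optional

YES_NO = {"Yes", "No"}


def make_entry(
    agent_name: str,
    study_id: str,
    benchmark: str,
    score: float,
    std_err: float,
    benchmark_specific: str,
    benchmark_tuned: str,
    followed_evaluation_protocol: str,
    reproducible: str,
    comments: str = "",
    original_or_reproduced: str = "Original",
    when: Optional[datetime] = None,
) -> dict:
    """Hypothetical helper: build one result object with the template's fields."""
    flags = {
        "benchmark_specific": benchmark_specific,
        "benchmark_tuned": benchmark_tuned,
        "followed_evaluation_protocol": followed_evaluation_protocol,
        "reproducible": reproducible,
    }
    # The template only allows the literal strings "Yes" or "No" for these fields.
    for key, value in flags.items():
        if value not in YES_NO:
            raise ValueError(f"{key} must be 'Yes' or 'No', got {value!r}")
    when = when or datetime.now()
    return {
        "agent_name": agent_name,
        "study_id": study_id,
        # Matches the "YYYY-MM-DD HH:MM:SS" format shown in the template.
        "date_time": when.strftime("%Y-%m-%d %H:%M:%S"),
        "benchmark": benchmark,
        "score": float(score),
        "std_err": float(std_err),
        **flags,
        "comments": comments,
        "original_or_reproduced": original_or_reproduced,
    }


def write_benchmark_file(agent_dir: Path, entry: dict) -> Path:
    """Hypothetical helper: write the entry as a one-element array to <benchmark lowercased>.json."""
    path = agent_dir / f"{entry['benchmark'].lower()}.json"
    path.write_text(json.dumps([entry], indent=4) + "\n", encoding="utf-8")
    return path
```

Deriving the file name from the `benchmark` field keeps the two consistent, so a `WorkArena-L1` entry always lands in `workarena-l1.json`.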

Name the benchmark files as follows:

- `webarena.json`
- `workarena-l1.json`
- `workarena-l2.json`
- `workarena-l3.json`
- `miniwob.json`

Each file must contain a JSON array holding a single object in the format above. The `benchmark` field must match one of these names exactly: `WebArena`, `WorkArena-L1`, `WorkArena-L2`, `WorkArena-L3`, `MiniWoB`. The file name is the lowercased benchmark name followed by `.json` (for example, `WorkArena-L1` results go in `workarena-l1.json`).
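Before opening a PR, you could check a file against these rules with a short script. The Python sketch below is one possible local check, not an official validator used by the leaderboard; `check_benchmark_file`, `BENCHMARKS`, and `REQUIRED_KEYS` are hypothetical names. It tests the new-agent case described above (exactly one object per file).

```python
import json
from pathlib import Path
from typing import List

# Benchmark names accepted by the leaderboard, as listed above.
BENCHMARKS = ["WebArena", "WorkArena-L1", "WorkArena-L2", "WorkArena-L3", "MiniWoB"]

# Keys from the JSON template in step 3.
REQUIRED_KEYS = {
    "agent_name", "study_id", "date_time", "benchmark", "score", "std_err",
    "benchmark_specific", "benchmark_tuned", "followed_evaluation_protocol",
    "reproducible", "comments", "original_or_reproduced",
}


def check_benchmark_file(path: Path) -> List[str]:
    """Hypothetical check: return the problems found in one file (empty list if none)."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path.name}: invalid JSON ({exc})"]
    # A new submission holds exactly one result object inside an array.
    if not isinstance(data, list) or len(data) != 1 or not isinstance(data[0], dict):
        return [f"{path.name}: expected a JSON array with exactly one object"]
    entry = data[0]
    problems = []
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        problems.append(f"{path.name}: missing keys {sorted(missing)}")
    benchmark = entry.get("benchmark")
    if benchmark not in BENCHMARKS:
        problems.append(f"{path.name}: unknown benchmark {benchmark!r}")
    elif path.name != f"{benchmark.lower()}.json":
        # The file name must be the lowercased benchmark name.
        problems.append(f"{path.name}: expected file name {benchmark.lower()}.json")
    return problems
```

Running it over every `*.json` file in your agent's folder and printing any non-empty results is a quick way to catch naming mistakes such as `MiniWoB.json` or a misspelled benchmark field.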

### 4. Submit PR

1. Open the Community tab and click "New Pull Request".
2. Give the PR a descriptive title and follow the steps shown.
3. Publish the branch.

## How to Submit Reproducibility Results for Existing Agents

Open the existing results file for the agent and benchmark whose results you reproduced.

### 1. Add reproduced results

Add the object below (without the surrounding brackets) as a new element of the existing JSON array, keeping the original entries. Make sure `original_or_reproduced` is set to `Reproduced`.

```json
[
    {
        "agent_name": "your-agent-name",
        "study_id": "unique-study-identifier-from-agentlab", 
        "date_time": "YYYY-MM-DD HH:MM:SS",
        "benchmark": "WebArena",
        "score": 0.0,
        "std_err": 0.0,
        "benchmark_specific": "Yes/No",
        "benchmark_tuned": "Yes/No",
        "followed_evaluation_protocol": "Yes/No", 
        "reproducible": "Yes/No",
        "comments": "Additional details",
        "original_or_reproduced": "Reproduced"
    }
]
```
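A minimal Python sketch of the append step, assuming you edit the file locally before opening the PR. `append_reproduced_entry` is a hypothetical helper: it keeps the existing entries, refuses an entry whose `original_or_reproduced` is not `Reproduced`, and, as an extra sanity check that this guide does not require, rejects an entry whose `benchmark` differs from the file's first entry.

```python
import json
from pathlib import Path


def append_reproduced_entry(path: Path, entry: dict) -> int:
    """Hypothetical helper: append a reproduced result to a benchmark file; return the entry count."""
    if entry.get("original_or_reproduced") != "Reproduced":
        raise ValueError('original_or_reproduced must be "Reproduced"')
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError(f"{path.name} does not contain a JSON array")
    # Extra check: a file should only hold results for one benchmark.
    if data and entry.get("benchmark") != data[0].get("benchmark"):
        raise ValueError("benchmark does not match the existing entries")
    data.append(entry)  # original entries stay first; the new one goes at the end
    path.write_text(json.dumps(data, indent=4) + "\n", encoding="utf-8")
    return len(data)
```

Loading and re-serializing the whole array, rather than pasting text, avoids the common mistake of adding a second `[...]` block, which would make the file invalid JSON.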

### 2. Submit PR

1. Open the Community tab and click "New Pull Request".
2. Give the PR a descriptive title and follow the steps shown.
3. Publish the branch.

## License

MIT