|
--- |
|
title: BrowserGym Leaderboard |
|
emoji: π |
|
colorFrom: purple |
|
colorTo: green |
|
sdk: docker |
|
pinned: false |
|
license: mit |
|
--- |
|
|
|
# BrowserGym Leaderboard |
|
|
|
This leaderboard tracks the performance of various agents on web navigation tasks.
|
|
|
## How to Submit Results for New Agents |
|
|
|
### 1. Create Results Directory |
|
Create a new folder in the `results` directory with your agent's name: |
|
```bash |
|
results/
└── your-agent-name/
    ├── README.md
    ├── webarena.json
    ├── workarena-l1.json
    ├── workarena-l2.json
    ├── workarena-l3.json
    └── miniwob.json
|
``` |
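
If it helps, the layout can be scaffolded with a short script along these lines (a sketch only; `your-agent-name` is a placeholder, and the script assumes it is run from the repository root):

```python
# Sketch: scaffold the results folder for a new agent.
# "your-agent-name" is a placeholder; run from the repository root.
from pathlib import Path

agent_dir = Path("results") / "your-agent-name"
agent_dir.mkdir(parents=True, exist_ok=True)

# Empty README to be filled in with the agent details (see step 2).
(agent_dir / "README.md").touch()

# One file per benchmark, each starting as an empty JSON array (see step 3).
for name in ["webarena", "workarena-l1", "workarena-l2", "workarena-l3", "miniwob"]:
    (agent_dir / f"{name}.json").write_text("[]\n")
```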
|
|
|
|
|
### 2. Add Agent Details |
|
|
|
Create a `README.md` in your agent's folder with the following details: |
|
|
|
#### Required Information |
|
- **Model Name**: Base model used (e.g., GPT-4, Claude-2) |
|
- **Model Architecture**: Architecture details and any modifications |
|
- **Input/Output Format**: How inputs are processed and outputs generated |
|
- **Training Details**: Training configuration, if applicable

  - Dataset used

  - Number of training steps

  - Hardware used

  - Training time
|
|
|
#### Optional Information |
|
- **Paper Link**: Link to published paper/preprint if available |
|
- **Code Repository**: Link to public code implementation |
|
- **Additional Notes**: Any special configurations or requirements |
|
- **License**: License information for your agent |
|
|
|
Make sure to organize the information in clear sections using Markdown. |
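
As a rough illustration (section names and values below are placeholders, not a required template), the agent README might be organized like this:

```markdown
# your-agent-name

## Model Name
GPT-4 (base model)

## Model Architecture
Architecture details and any modifications.

## Input/Output Format
How inputs are processed and how outputs are generated.

## Training Details
- Dataset used
- Number of training steps
- Hardware used
- Training time

## Additional Notes (optional)
Paper link, code repository, license, and any special configurations or requirements.
```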
|
|
|
### 3. Add Benchmark Results |
|
|
|
Create a separate JSON file for each benchmark, following this format:
|
|
|
```json |
|
[ |
|
{ |
|
"agent_name": "your-agent-name", |
|
"study_id": "unique-study-identifier-from-agentlab", |
|
"date_time": "YYYY-MM-DD HH:MM:SS", |
|
"benchmark": "WebArena", |
|
"score": 0.0, |
|
"std_err": 0.0, |
|
"benchmark_specific": "Yes/No", |
|
"benchmark_tuned": "Yes/No", |
|
"followed_evaluation_protocol": "Yes/No", |
|
"reproducible": "Yes/No", |
|
"comments": "Additional details", |
|
"original_or_reproduced": "Original" |
|
} |
|
] |
|
``` |
|
|
|
Add the results for each benchmark in a separate JSON file, named as follows:
|
|
|
- `webarena.json` |
|
- `workarena-l1.json` |
|
- `workarena-l2.json` |
|
- `workarena-l3.json` |
|
- `miniwob.json` |
|
|
|
Each file must contain a JSON array with a single object following the format above. The `benchmark` field in each file must match the benchmark name exactly (`WebArena`, `WorkArena-L1`, `WorkArena-L2`, `WorkArena-L3`, or `MiniWoB`), and the filename must be the lowercase benchmark name followed by `.json` (for example, `workarena-l1.json` for `WorkArena-L1`).
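
Before opening a PR, you can sanity-check the files locally. The script below is a minimal sketch of such a check (it is hypothetical, not part of the leaderboard tooling; replace `your-agent-name` with your folder name):

```python
# Sketch: local sanity check for benchmark result files (not part of the leaderboard tooling).
import json
from pathlib import Path

# Expected filename -> exact benchmark name, as listed above.
EXPECTED = {
    "webarena.json": "WebArena",
    "workarena-l1.json": "WorkArena-L1",
    "workarena-l2.json": "WorkArena-L2",
    "workarena-l3.json": "WorkArena-L3",
    "miniwob.json": "MiniWoB",
}

agent_dir = Path("results") / "your-agent-name"  # placeholder agent folder
for filename, benchmark in EXPECTED.items():
    entries = json.loads((agent_dir / filename).read_text())
    assert isinstance(entries, list) and entries, f"{filename}: expected a non-empty JSON array"
    for entry in entries:
        assert entry["benchmark"] == benchmark, f"{filename}: 'benchmark' must be '{benchmark}'"
        assert entry["original_or_reproduced"] in ("Original", "Reproduced")
print("All benchmark files look consistent.")
```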
|
|
|
### 4. Submit PR |
|
|
|
1. Open the community tab and press "New Pull Request" |
|
2. Give the PR a descriptive title and follow the steps shown
|
3. Publish the branch |
|
|
|
## How to Submit Reproducibility Results for Existing Agents |
|
|
|
Open the results file for the agent and benchmark whose results you reproduced.
|
|
|
### 1. Add Reproduced Results
|
|
|
|
|
Append the following entry to the JSON file, making sure to set `original_or_reproduced` to `Reproduced`.
|
|
|
```json |
|
[ |
|
{ |
|
"agent_name": "your-agent-name", |
|
"study_id": "unique-study-identifier-from-agentlab", |
|
"date_time": "YYYY-MM-DD HH:MM:SS", |
|
"benchmark": "WebArena", |
|
"score": 0.0, |
|
"std_err": 0.0, |
|
"benchmark_specific": "Yes/No", |
|
"benchmark_tuned": "Yes/No", |
|
"followed_evaluation_protocol": "Yes/No", |
|
"reproducible": "Yes/No", |
|
"comments": "Additional details", |
|
"original_or_reproduced": "Reproduced" |
|
} |
|
] |
|
``` |
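
If you prefer to script the edit, something along these lines works (a sketch only; the path and all field values are placeholders for your actual run):

```python
# Sketch: append a reproduced result to an existing results file (path and values are placeholders).
import json
from pathlib import Path

path = Path("results") / "agent-name" / "webarena.json"  # file of the agent/benchmark you reproduced
entries = json.loads(path.read_text())

entries.append({
    "agent_name": "agent-name",
    "study_id": "unique-study-identifier-from-agentlab",
    "date_time": "YYYY-MM-DD HH:MM:SS",
    "benchmark": "WebArena",
    "score": 0.0,
    "std_err": 0.0,
    "benchmark_specific": "Yes/No",
    "benchmark_tuned": "Yes/No",
    "followed_evaluation_protocol": "Yes/No",
    "reproducible": "Yes/No",
    "comments": "Additional details",
    "original_or_reproduced": "Reproduced",
})

path.write_text(json.dumps(entries, indent=4) + "\n")
```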
|
|
|
### 2. Submit PR |
|
|
|
1. Open the community tab and press "New Pull Request" |
|
2. Give the PR a descriptive title and follow the steps shown
|
3. Publish the branch |
|
|
|
## License |
|
|
|
MIT |