refreshdotdev/webarena-environments

WebArena Environments

Self-contained web application environments for evaluating browser-use agents. Each app is a standalone HTML/CSS/JS application served by a small Python HTTP server.

Based on WebArena-Infinity.
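
To illustrate the "small Python HTTP server" pattern described above, here is a minimal sketch of a static file server built on the standard library. It is not the repository's actual server.py: the names make_server and parse_args and the --dir flag are hypothetical and used only for illustration; only --port appears in this README.

```python
import argparse
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory: str, port: int) -> ThreadingHTTPServer:
    """Build a server that serves the static files in `directory` on localhost."""
    # SimpleHTTPRequestHandler accepts a `directory` keyword (Python 3.7+),
    # so the process does not need to chdir into the app folder.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Serve one static app (sketch).")
    parser.add_argument("--port", type=int, default=8000)
    # Hypothetical flag, not taken from the repository.
    parser.add_argument("--dir", default=".", help="directory holding the HTML/CSS/JS")
    return parser.parse_args(argv)


def main(argv=None) -> None:
    """Serve until interrupted; call main() from a script to start the server."""
    args = parse_args(argv)
    server = make_server(args.dir, args.port)
    print(f"Serving {args.dir} at http://localhost:{args.port}")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Binding to 127.0.0.1 keeps the sketch local-only; the real servers may bind differently.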

Quick Start

# Install dependencies
bash setup.sh

# Run a single app
cd apps/gmail && python3 server.py --port 8000

# Run all apps with a hub page
python3 serve_all.py --port 9000 --demo

Open http://localhost:8000 (single app) or http://localhost:9000 (hub) in your browser.
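
When a script rather than a person opens the app, it helps to wait until the server is accepting connections. The following is a rough sketch of such a readiness check, assuming the server listens on localhost at the port passed via --port; wait_for_port is a hypothetical helper, not part of this repository.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 8000) could be called after starting a single app in the background and before pointing a browser or agent at it.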

Available Environments

| App | Description |
| --- | --- |
| elation-clinical-records | EHR clinical records management |
| elation-patient-communication | EHR patient messaging |
| elation-prescriptions | EHR prescription management |
| figma-slides | Slide deck editor (Figma-style) |
| figma-text-and-typography | Text/typography editor (Figma-style) |
| gitlab-plan-and-track | Project planning and issue tracking |
| gmail | Email client |
| gmail-accounts-and-contacts | Gmail account and contacts management |
| google-sheets | Spreadsheet editor with formulas, charts, and multi-sheet workbooks |
| handshake-career-exploration | Career exploration platform |
| linear-account-settings | Project management account settings |
| paypal-my-wallet | Digital wallet management |
| superhuman-general | Email client (Superhuman-style) |
| xero-invoicing | Invoice management |

Running Agent Evaluations

# Run a single task
uv run python evaluation/run_eval_parallel.py \
    --model gpt \
    --task-id task_e1 \
    --workers 1 \
    --web-app apps/gmail

# Run all easy tasks with a visible browser
uv run python evaluation/run_eval_parallel.py \
    --model gpt \
    --difficulty easy \
    --workers 1 \
    --web-app apps/google-sheets \
    --headed
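
To run the same evaluation across several apps from a script, the commands above can be assembled programmatically. This is only a sketch: build_eval_command is a hypothetical helper, it uses just the flags shown in this README, and it assumes --task-id and --difficulty are both optional filters, which this README does not confirm.

```python
import shlex
from typing import List, Optional


def build_eval_command(
    web_app: str,
    model: str = "gpt",
    task_id: Optional[str] = None,
    difficulty: Optional[str] = None,
    workers: int = 1,
    headed: bool = False,
) -> List[str]:
    """Assemble the argv list for evaluation/run_eval_parallel.py."""
    cmd = ["uv", "run", "python", "evaluation/run_eval_parallel.py", "--model", model]
    if task_id is not None:
        cmd += ["--task-id", task_id]
    if difficulty is not None:
        cmd += ["--difficulty", difficulty]
    cmd += ["--workers", str(workers), "--web-app", web_app]
    if headed:
        cmd.append("--headed")
    return cmd


# Print the commands for two of the listed apps; pass each list to
# subprocess.run(cmd, check=True) from the repository root to execute it.
for app in ("apps/gmail", "apps/google-sheets"):
    print(shlex.join(build_eval_command(app, difficulty="easy")))
```

Building an argument list instead of a shell string avoids quoting problems if an app path ever contains spaces.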

Supported Models

| Flag | Model | API Key |
| --- | --- | --- |
| gpt | GPT-4o | OPENAI_API_KEY |
| gemini-flash | Gemini Flash 3 | GOOGLE_API_KEY |
| gemini-pro | Gemini Pro 3 | GOOGLE_API_KEY |
| claude | Claude Sonnet 4.6 | ANTHROPIC_API_KEY |
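
Before starting a long run, it can be useful to confirm that the key for the chosen model is present. The sketch below copies the mapping from the table above and assumes the API Key column names environment variables read by the evaluation harness; missing_api_key is a hypothetical helper, not part of the repository.

```python
import os
from typing import Mapping, Optional

# Copied from the Supported Models table: --model flag -> required variable.
REQUIRED_KEY = {
    "gpt": "OPENAI_API_KEY",
    "gemini-flash": "GOOGLE_API_KEY",
    "gemini-pro": "GOOGLE_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
}


def missing_api_key(model_flag: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the unset (or empty) variable, or None if it is set."""
    env = os.environ if env is None else env
    try:
        var = REQUIRED_KEY[model_flag]
    except KeyError:
        raise ValueError(f"unknown --model value: {model_flag!r}") from None
    return None if env.get(var) else var
```

A caller might run missing_api_key("claude") and abort with a clear message if it returns a variable name.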

Test Mode

Add --test-mode when launching a server to get an in-browser test panel for manually running and verifying tasks:

cd apps/google-sheets && python3 server.py --port 8000 --test-mode

