Running Evaluations
Launch and monitor evaluations from the web UI.
Select a Config
Open Run Evaluation from the sidebar. The page lists all eval configs found in the evals directory. Pick one to load its scenarios and agents.

Choose Agents
The agent picker shows agents defined in the selected config plus any agents loaded from the library. Select one or more agents — each selected agent runs every scenario.
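To illustrate how the workload grows with each agent you select, the sketch below enumerates every agent and scenario pair. The function `plan_runs` and the agent and scenario names are hypothetical, chosen only for this example; they are not part of the app, and the app's own scheduling may differ.

```python
from itertools import product

def plan_runs(agents, scenarios):
    """Return every (agent, scenario) pair: each agent runs every scenario."""
    return list(product(agents, scenarios))

# Hypothetical names for illustration only.
agents = ["agent-a", "agent-b"]
scenarios = ["login", "checkout", "search"]

pairs = plan_runs(agents, scenarios)
print(len(pairs))  # 2 agents x 3 scenarios = 6 pairs
```

Selecting one more agent adds a full pass over the scenario list, so the number of pairs is always agents times scenarios.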
Set Variance Runs
Increase the run count to execute each scenario multiple times. Repeated runs surface consistency issues: an agent that passes 3 out of 5 runs on the same prompt is less reliable than one that passes 5 out of 5.
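The comparison above can be expressed as a pass rate per scenario. This is a minimal sketch, assuming each run produces a simple pass or fail outcome; `pass_rate` and the sample data are hypothetical and do not reflect how the app stores or scores results.

```python
def pass_rate(outcomes):
    """Fraction of runs that passed; outcomes is a list of booleans."""
    if not outcomes:
        raise ValueError("need at least one run")
    return sum(outcomes) / len(outcomes)

# Hypothetical outcomes for one scenario across 5 variance runs.
flaky = [True, False, True, True, False]  # passes 3 of 5
steady = [True] * 5                       # passes 5 of 5

print(pass_rate(flaky))   # 0.6
print(pass_rate(steady))  # 1.0
```

With a single run, both agents could look identical; the gap only becomes visible once the same scenario is repeated.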
Launch and Monitor
Click Run to start the evaluation. The page shows live progress as scenarios complete. When all scenarios finish, the results are saved and you can open the Result Detail page to review them.
