Running Evaluations
The mcplab run command and all its options.
Basic Run
Point mcplab at your eval config to run all scenarios.
mcplab run -c eval.yaml
Filter Scenarios
Run a single scenario by its ID using -s. Pass the flag multiple times to run several.
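Each scenario needs its own -s flag. A script that starts from a list of scenario IDs, for example in CI, therefore has to expand that list into repeated flags. The sketch below does this with the POSIX positional-parameter idiom. The function name run_scenarios is hypothetical and not part of mcplab, and the sketch assumes mcplab is on PATH.

```shell
# Hypothetical helper (not part of mcplab): run one config against a list of
# scenario IDs by turning each ID into its own "-s <id>" pair.
# Assumes mcplab is on PATH.
run_scenarios() {
  local config="$1"
  shift
  # Rebuild the positional parameters so that "a b" becomes "-s a -s b".
  # The for loop reads the original list once, so appending new arguments
  # while shifting old ones off the front is safe.
  for id in "$@"; do
    set -- "$@" -s "$id"
    shift
  done
  # With no IDs given, this is a plain run of every scenario.
  mcplab run -c "$config" "$@"
}
```

Calling run_scenarios eval.yaml test-one test-two expands to mcplab run -c eval.yaml -s test-one -s test-two.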
mcplab run -c eval.yaml -s basic-test
mcplab run -c eval.yaml -s test-one -s test-two
Select Agents
By default, all agents defined in the config are used. Narrow the selection with --agents, which takes a comma-separated list, or use --agents-all to include every agent defined in the library.
mcplab run -c eval.yaml --agents claude,gpt4o
mcplab run -c eval.yaml --agents-all
Variance Runs
Run each scenario multiple times to measure consistency. The -n flag sets the number of runs per scenario. Results include a pass rate across all runs.
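The following worked example assumes the pass rate is the fraction of a scenario's n runs that pass. The exact way mcplab reports it is not specified here.

```latex
\text{pass rate} = \frac{\text{passing runs}}{n},
\qquad n = 5,\ 4 \text{ passing runs} \;\Rightarrow\; \frac{4}{5} = 0.8 = 80\%
```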
mcplab run -c eval.yaml -n 5
Interactive Mode
Interactive mode prompts you to pick a config and scenarios at the terminal instead of specifying them as flags. Useful for ad-hoc runs during development.
mcplab run --interactive
Annotate and Organise Runs
Use --run-note to attach a human-readable note to a run, which makes it easier to identify in reports and the App. Use --runs-dir to change the output directory.
mcplab run -c eval.yaml --run-note "after refactor"
mcplab run -c eval.yaml --runs-dir ./my-runs
Exit Codes
mcplab run exits 0 when all scenarios pass and non-zero when any scenario fails. Use this in CI to fail a pipeline on a regression.
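Most CI systems fail a step whenever its last command exits non-zero, so running mcplab run -c eval.yaml as a step is usually enough on its own. The sketch below is a thin wrapper that logs the outcome and passes mcplab's exit code through unchanged. The function name eval_gate is hypothetical, and the sketch assumes mcplab is on PATH.

```shell
# Hypothetical CI wrapper (not part of mcplab): run the evals, log the
# outcome, and return mcplab's exit code so the CI step fails on a regression.
# Assumes mcplab is on PATH; the config path defaults to eval.yaml.
eval_gate() {
  local config="${1:-eval.yaml}"
  local status=0
  # Capture the exit code without aborting the script, even under `set -e`.
  mcplab run -c "$config" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "All scenarios passed."
  else
    echo "mcplab run exited with status $status; failing the pipeline." >&2
  fi
  return "$status"
}
```

In a CI step, make eval_gate eval.yaml the last command. The step then succeeds only when every scenario passes.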