CLI

Running Evaluations

The mcplab run command and all its options.

Basic Run

Point mcplab at your eval config to run all scenarios.

run all scenarios

mcplab run -c eval.yaml

Filter Scenarios

Run a single scenario by its ID using -s. Pass the flag multiple times to run several.

single scenario

mcplab run -c eval.yaml -s basic-test

multiple scenarios

mcplab run -c eval.yaml -s test-one -s test-two
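When the scenario IDs come from a variable or another script, the repeated -s flags can be built programmatically. The following is a minimal POSIX sh sketch; run_scenarios is a hypothetical helper name, not part of mcplab, and it relies only on the -c and -s flags shown above.

```shell
# Hypothetical helper (not part of mcplab): run the given scenario IDs
# by expanding them into repeated -s flags.
# Usage: run_scenarios <config> [scenario-id ...]
run_scenarios() {
  config=$1
  shift
  # Rewrite the argument list from "id1 id2 ..." to "-s id1 -s id2 ...".
  # Each pass appends "-s <first id>" and then drops that first id.
  n=$#
  while [ "$n" -gt 0 ]; do
    set -- "$@" -s "$1"
    shift
    n=$((n - 1))
  done
  mcplab run -c "$config" "$@"
}
```

Calling run_scenarios eval.yaml test-one test-two is equivalent to the multiple-scenarios command above. With no IDs, the helper passes no -s flags, so all scenarios run.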

Select Agents

By default, all agents defined in the config are used. Narrow the selection with --agents, which takes a comma-separated list of agent names, or expand it to include every agent defined in the library with --agents-all.

specific agents

mcplab run -c eval.yaml --agents claude,gpt4o

all agents (config + library)

mcplab run -c eval.yaml --agents-all

Variance Runs

Run each scenario multiple times to measure consistency. The -n flag sets the number of runs per scenario. Results include a pass rate across all runs.

5 runs per scenario

mcplab run -c eval.yaml -n 5
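As a rough illustration of what a pass rate means for variance runs, the sketch below assumes it is the number of passing runs divided by the total number of runs for a scenario. This is an assumption made for the example; the exact calculation and report format used by mcplab are not described here, and pass_rate is a hypothetical helper.

```shell
# Hypothetical helper: integer pass rate in percent.
# Assumes pass rate = passing runs / total runs (rounded down).
# Usage: pass_rate <passed> <total>
pass_rate() {
  passed=$1
  total=$2
  if [ "$total" -le 0 ]; then
    echo "total runs must be greater than zero" >&2
    return 1
  fi
  echo $(( passed * 100 / total ))
}
```

Under that assumption, a scenario that passes 4 of its 5 runs has a pass rate of 80 percent (pass_rate 4 5 prints 80).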

Interactive Mode

Interactive mode prompts you to pick a config and scenarios at the terminal instead of specifying them as flags. Useful for ad-hoc runs during development.

interactive

mcplab run --interactive

Annotate and Organise Runs

Add a human-readable note to a run with --run-note so it is easier to identify in reports and the App. Change the output directory with --runs-dir.

annotated run

mcplab run -c eval.yaml --run-note "after refactor"

custom output dir

mcplab run -c eval.yaml --runs-dir ./my-runs
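The two flags can be combined so that every run is labelled and filed automatically. The sketch below is one possible convention, not something mcplab prescribes: run_annotated is a hypothetical helper, and the note text and the dated ./runs/YYYY-MM-DD layout are assumptions chosen for illustration.

```shell
# Hypothetical helper: tag the run with the current git commit and
# write results into a per-day directory.
# Usage: run_annotated <config>
run_annotated() {
  config=$1
  # Abbreviated commit hash, or "unknown" outside a git repository.
  commit=$(git rev-parse --short HEAD 2>/dev/null || echo "unknown")
  # Per-day output directory (assumed layout, e.g. ./runs/2025-01-31).
  runs_dir="./runs/$(date +%Y-%m-%d)"
  mcplab run -c "$config" --run-note "commit $commit" --runs-dir "$runs_dir"
}
```

Tying the note to a commit makes it easier to match a run in a report to the code that produced it.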

Exit Codes

mcplab run exits 0 when all scenarios pass and non-zero when any scenario fails. Use this in CI to fail a pipeline on a regression.
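In a CI job, the exit code can be checked directly. The following is a minimal sketch of such a gate; ci_gate is a hypothetical wrapper name, and the messages are illustrative. It relies only on the behaviour described above: 0 when all scenarios pass, non-zero otherwise.

```shell
# Hypothetical CI wrapper: run the evaluation and propagate its exit code.
# Usage: ci_gate <config>
ci_gate() {
  config=$1
  if mcplab run -c "$config"; then
    echo "all scenarios passed"
    return 0
  else
    # $? still holds the exit status of the mcplab command here.
    status=$?
    echo "evaluation failed (exit code $status)" >&2
    return "$status"
  fi
}
```

Most CI systems already fail a step when a command exits non-zero, so calling mcplab run -c eval.yaml on its own is often enough. A wrapper like this is mainly useful when you want a custom message or extra cleanup before the job fails.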