Configuration

How to write eval.yaml: servers, agents, scenarios, assertions, and auth.

Structure Overview

An eval file has three top-level keys: servers, agents, and scenarios. All three are required unless the servers or agents come from a library (see Library Files).

eval.yaml skeleton

servers:
  - id: my-server
    transport: http
    url: http://localhost:3000/mcp

agents:
  - id: claude
    provider: anthropic
    model: claude-haiku-4-5-20251001
    temperature: 0

scenarios:
  - id: basic-test
    servers: [my-server]
    prompt: Describe what you want the agent to do.
    eval:
      tool_constraints:
        required_tools: [tool_name]

Servers

Each server entry needs an id, a transport (http for HTTP/SSE), and the URL of the MCP endpoint.

If the endpoint requires a bearer token, add a token field. Use a literal string for a hardcoded value or a $ENV_VAR reference to read from the environment.

server with bearer token

servers:
  - id: my-server
    transport: http
    url: http://localhost:3000/mcp
    token: "my-static-token"          # literal value

  - id: prod-server
    transport: http
    url: https://api.example.com/mcp
    token: $SERVER_API_TOKEN           # reads from env

Agents

Each agent entry needs an id, a provider, and a model. Supported providers are anthropic, openai, and azure.

temperature defaults to 0. Lower values produce more deterministic results, which generally makes eval runs more consistent.

agents

agents:
  - id: claude
    provider: anthropic
    model: claude-haiku-4-5-20251001
    temperature: 0

  - id: gpt4o
    provider: openai
    model: gpt-4o
    temperature: 0

  - id: azure-gpt
    provider: azure
    model: gpt-4o
    temperature: 0

Scenarios

Each scenario has an id, a list of servers to give the agent access to, a prompt describing the task, and an eval block with assertions.

The agent field is optional. When it is omitted, every agent in the config runs the scenario.

scenario

scenarios:
  - id: weather-lookup
    servers: [weather-server]
    prompt: What is the current weather in Amsterdam?
    eval:
      tool_constraints:
        required_tools: [get_weather]
        forbidden_tools: [send_email]
      response_assertions:
        - type: regex
          pattern: "Amsterdam"
        - type: contains
          value: "temperature"
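
The scenario above omits agent, so every configured agent runs it. As a minimal sketch of the opposite case, the scenario below pins a single agent and gives it access to two servers. The ids claude, weather-server, and calendar-server are placeholders chosen for illustration; each has to match an agent or server defined elsewhere in the config or in a library.

```yaml
scenarios:
  - id: weather-to-calendar
    agent: claude                                # hypothetical id; only this agent runs the scenario
    servers: [weather-server, calendar-server]   # hypothetical ids; the agent sees tools from both
    prompt: Check the weather in Amsterdam and add a calendar reminder to bring an umbrella if rain is forecast.
    eval:
      tool_constraints:
        required_tools: [get_weather]
        forbidden_tools: [send_email]
```

Pinning an agent is useful when a scenario only makes sense for one model, or when you want to keep an expensive model out of a large batch of scenarios.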

Assertions

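The eval block supports two groups of assertions: tool_constraints, which check the tools the agent called, and response_assertions, which check the text of the agent's response.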

tool_constraints.required_tools — list of tool names the agent MUST call.
tool_constraints.forbidden_tools — list of tool names the agent MUST NOT call.
response_assertions type: regex — the agent response must match the regular expression in pattern.
response_assertions type: contains — the agent response must contain the exact string in value.
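
As a sketch of how the four checks fit together, the scenario below requires one tool, forbids another, and applies both response assertion types. The tool names and the pattern are illustrative only. The pattern is single-quoted because YAML double-quoted strings reject unknown backslash escapes such as \d. It also assumes the regex engine supports the common \d shorthand, which this page does not confirm.

```yaml
scenarios:
  - id: weather-with-units
    servers: [weather-server]
    prompt: What is the current temperature in Amsterdam?
    eval:
      tool_constraints:
        required_tools: [get_weather]    # the agent must call this tool
        forbidden_tools: [send_email]    # the agent must not call this tool
      response_assertions:
        - type: regex
          pattern: '-?\d+ ?°[CF]'        # intended to match e.g. "18 °C" or "-3°F"
        - type: contains
          value: "Amsterdam"             # exact string that must appear in the response
```

Use contains for a fixed string and regex when the expected text varies, such as a number followed by a unit.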

Reusable Refs

Use $ref to reference a server or agent definition from a separate file instead of repeating it across configs. A reference takes the form <file>#<id>, for example servers.yaml#my-server.

servers.yaml (shared library file)

servers:
  - id: my-server
    transport: http
    url: http://localhost:3000/mcp

eval.yaml using a library ref

servers:
  - $ref: servers.yaml#my-server

agents:
  - id: claude
    provider: anthropic
    model: claude-haiku-4-5-20251001

scenarios:
  - id: basic-test
    servers: [my-server]
    prompt: Complete the task.
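
The example above pulls in a server by reference. Since $ref covers agents as well, the same pattern can be sketched for an agent. This assumes a shared agents.yaml that defines an agent with id claude-haiku, and that agent refs use the same <file>#<id> form as server refs; the page does not show an agent ref explicitly.

```yaml
servers:
  - $ref: servers.yaml#my-server

agents:
  - $ref: agents.yaml#claude-haiku   # assumed: same <file>#<id> form as the server ref

scenarios:
  - id: basic-test
    agent: claude-haiku              # refers to the id defined in agents.yaml
    servers: [my-server]
    prompt: Complete the task.
```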

Library Files

A library is a directory of shared agents.yaml and servers.yaml files that mcplab loads at startup. Library items are available to every eval config without an explicit $ref; you reference them by id.

Pass --libraries-dir when starting mcplab app to point it at a library directory. See the App / Library docs for managing library content through the UI.

agents.yaml (library file)

agents:
  - id: claude-haiku
    provider: anthropic
    model: claude-haiku-4-5-20251001
    temperature: 0

  - id: gpt4o-mini
    provider: openai
    model: gpt-4o-mini
    temperature: 0

using a library agent in eval.yaml

# No agents block needed: claude-haiku comes from the library.
# my-server is likewise assumed to be a library server, since no servers block is defined here.
scenarios:
  - id: basic-test
    agent: claude-haiku
    servers: [my-server]
    prompt: Complete the task.