
CLI Commands Reference

AgentTest provides a comprehensive command-line interface with pytest-like functionality for AI agent testing. This guide covers all available commands and their options.

πŸ“‹ Command Overview

| Command | Purpose | Common Usage |
|---------|---------|--------------|
| init | Initialize project | Set up new testing environment |
| run | Execute tests | Run test suites with various options |
| log | View test history | Browse past test results |
| compare | Compare results | Git-based performance comparison |
| generate | Auto-generate tests | AI-powered test creation |
| dashboard | Web interface | Visual test monitoring |

πŸš€ agenttest init

Initialize a new AgentTest project with configuration and directory structure.

Syntax

agenttest init [PATH] [OPTIONS]

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| PATH | Path | . | Directory to initialize |
| --template, -t | String | basic | Configuration template |
| --overwrite | Boolean | false | Overwrite existing config |

Templates

| Template | Description | Best For |
|----------|-------------|----------|
| basic | Standard setup with core evaluators | General agent testing |
| langchain | Optimized for LangChain agents | LangChain applications |
| llamaindex | Optimized for LlamaIndex | LlamaIndex applications |

Examples

# Initialize in current directory
agenttest init

# Initialize new project
agenttest init ./my-agent-project

# Use LangChain template
agenttest init --template langchain

# Force overwrite existing config
agenttest init --overwrite

Generated Structure

project/
β”œβ”€β”€ .agenttest/
β”‚   β”œβ”€β”€ config.yaml         # Main configuration
β”‚   └── results/            # Test results storage
β”œβ”€β”€ tests/
β”‚   β”œβ”€β”€ __init__.py
β”‚   └── test_example.py     # Sample test
└── .env                    # Environment variables
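
What config.yaml contains depends on the chosen template. The sketch below is illustrative only; the llm and generation blocks mirror the configuration shown under agenttest generate later in this guide, and init may write additional or different keys.

# .agenttest/config.yaml -- illustrative sketch, not the exact file init writes
llm:
  provider: openai
  model: gpt-4
  temperature: 0.7
  max_tokens: 3000

generation:
  default_count: 5
  default_format: python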

πŸ§ͺ agenttest run

Execute test suites with comprehensive filtering and output options.

Syntax

agenttest run [OPTIONS]

Core Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| --path, -p | Path | tests/ | Test files or directory |
| --pattern | String | test_*.py | File name pattern |
| --verbose, -v | Boolean | false | Detailed output |
| --quiet, -q | Boolean | false | Minimal output |
| --ci | Boolean | false | CI mode (exit on failure) |

Output Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| --output, -o | Path | None | Save results to file |
| --log-output | Path | None | Export detailed logs |

Filtering Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| --tag, -t | String[] | None | Run tests with specific tags |

Examples

Basic Usage

# Run all tests
agenttest run

# Run with verbose output
agenttest run --verbose

# Run specific test file
agenttest run --path tests/test_summarization.py

# Run tests matching pattern
agenttest run --pattern "*integration*"

Filtering Tests

# Run tests with specific tags
agenttest run --tag summarization --tag quality

# Run tests in specific directory
agenttest run --path tests/integration/
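
Tags are declared on the test function itself, so --tag selects tests by the tags listed in their decorator. A minimal sketch of a tagged test (the decorator and return shape follow the generated-test examples later in this guide; my_agent and summarize are hypothetical names for the agent under test):

from agent_test import agent_test
from my_agent import summarize  # hypothetical agent function under test

@agent_test(criteria=["similarity"], tags=["summarization", "quality"])
def test_summary_quality():
    """Selected by `agenttest run --tag summarization` or `--tag quality`."""
    article = "AgentTest is a pytest-style framework for evaluating AI agents."
    actual = summarize(article)
    return {
        "input": article,
        "actual": actual,
        "evaluation_criteria": {
            "similarity": "Summary should preserve the main claim of the article"
        },
    }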

Output and Logging

# Save results to JSON
agenttest run --output results.json

# Export detailed logs
agenttest run --log-output debug.log

# Combined output
agenttest run --output results.json --log-output debug.log --verbose

CI/CD Integration

# CI mode (exits with error code on failure)
agenttest run --ci --quiet

# Generate reports for CI
agenttest run --ci --output ci-results.json --quiet
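
Because --ci makes the command exit with a non-zero status when tests fail, a pipeline step can branch on the exit code, for example to surface the results file before failing the build. A minimal shell sketch (the script scaffolding is an assumption, not AgentTest output):

agenttest run --ci --quiet --output ci-results.json
status=$?

if [ "$status" -ne 0 ]; then
  echo "AgentTest reported failures; see ci-results.json" >&2
fi
exit "$status"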

Output Format

Standard Output

πŸ§ͺ Running AgentTest suite...

πŸ“Š Test Results Summary:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Test                          ┃ Status  ┃ Score   ┃ Duration     ┃
┑━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━┩
β”‚ test_summarization            β”‚ βœ… PASS β”‚ 0.873   β”‚ 2.45s        β”‚
β”‚ test_qa_accuracy              β”‚ ❌ FAIL β”‚ 0.654   β”‚ 1.23s        β”‚
β”‚ test_content_generation       β”‚ βœ… PASS β”‚ 0.912   β”‚ 3.67s        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ“ˆ Overall Results:
β€’ Total Tests: 3
β€’ Passed: 2 (67%)
β€’ Failed: 1 (33%)
β€’ Average Score: 0.813
β€’ Total Duration: 7.35s

❌ Failures Detected:
β€’ test_qa_accuracy: Score 0.654 below threshold 0.7
  - similarity: 0.654 (threshold: 0.7)
  - Expected: "The answer is 42"
  - Actual: "I think the answer might be 42"

Verbose Output

πŸ” SESSION: Starting test session (2024-06-26 14:30:15)
πŸ” DISCOVERY: Found 3 test files in tests/
πŸ” DISCOVERY: Discovered 5 test functions

πŸ” TEST_START: test_summarization
  πŸ“Š Evaluation Results:
    β€’ similarity: Score: 0.873, Passed: βœ…
    β€’ llm_judge: Score: 0.845, Passed: βœ…
βœ… TEST_PASS: test_summarization (2.45s)

πŸ” TEST_START: test_qa_accuracy
  πŸ“Š Evaluation Results:
    β€’ similarity: Score: 0.654, Passed: ❌
❌ TEST_FAIL: test_qa_accuracy (1.23s)

πŸ“š agenttest log

View and browse test execution history with git integration.

Syntax

agenttest log [OPTIONS]

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| --limit, -l | Integer | 10 | Number of runs to show |
| --commit, -c | String | None | Show results for specific commit |
| --branch, -b | String | None | Show results for specific branch |

Examples

# Show last 10 test runs
agenttest log

# Show last 20 runs
agenttest log --limit 20

# Show results for specific commit
agenttest log --commit abc123

# Show results for main branch
agenttest log --branch main

Output Format

πŸ“š Test History (last 10 runs):

┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Commit     ┃ Timestamp           ┃ Branch        ┃ Tests         ┃ Pass Rate     ┃
┑━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
β”‚ e1c83a6d   β”‚ 2024-06-26 14:45:12 β”‚ main          β”‚ 5 passed, 0   β”‚ 100%          β”‚
β”‚ 95eadec3   β”‚ 2024-06-26 14:38:33 β”‚ main          β”‚ 3 passed, 2   β”‚ 60%           β”‚
β”‚ 7b2af91e   β”‚ 2024-06-26 12:15:44 β”‚ feature-123   β”‚ 4 passed, 1   β”‚ 80%           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ”„ agenttest compare

Compare test results between git commits or branches with detailed analysis.

Syntax

agenttest compare BASE [TARGET] [OPTIONS]

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| BASE | String | Required | Base commit/branch |
| TARGET | String | HEAD | Target commit/branch |
| --metric, -m | String | None | Focus on specific evaluator |
| --filter, -f | String | None | Filter tests by name pattern |
| --min-change, -c | Float | 0.01 | Minimum change threshold |
| --include-unchanged, -u | Boolean | false | Include unchanged tests |
| --detailed, -d | Boolean | false | Show evaluator-level details |
| --export, -e | Path | None | Export to JSON file |

Examples

Basic Comparison

# Compare current HEAD with previous commit
agenttest compare abc123

# Compare two specific commits
agenttest compare abc123 def456

# Compare branches
agenttest compare main feature-branch

Filtered Comparison

# Focus on similarity evaluator only
agenttest compare abc123 --metric similarity

# Filter tests by name
agenttest compare abc123 --filter "summarization"

# Show only significant changes (>5%)
agenttest compare abc123 --min-change 0.05

Detailed Analysis

# Show evaluator-level details
agenttest compare abc123 --detailed

# Include unchanged tests
agenttest compare abc123 --include-unchanged

# Export full comparison
agenttest compare abc123 --export comparison.json
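
The exported file is plain JSON, so it can feed other tooling. Its exact schema is not documented in this guide, so the sketch below only loads the file and lists its top-level structure rather than assuming specific fields:

import json
from pathlib import Path

# Produced by: agenttest compare abc123 --export comparison.json
report = json.loads(Path("comparison.json").read_text())

# Inspect the top-level keys without assuming a particular schema
for key, value in report.items():
    print(f"{key}: {type(value).__name__}")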

Complex Filtering

# Combine multiple filters
agenttest compare abc123 def456 \
  --metric similarity \
  --filter "qa" \
  --min-change 0.02 \
  --detailed

Output Format

Standard Comparison

πŸ“Š Comparing abc123 β†’ def456
Base: abc123 (2024-06-26T14:38:33)
Target: def456 (2024-06-26T14:45:12)

πŸ“Š Overall Summary Changes:
┏━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┓
┃ Metric       ┃   Base ┃  Target ┃  Change ┃ % Change  ┃
┑━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━┩
β”‚ Pass Rate    β”‚   0.75 β”‚    0.85 β”‚  +0.100 β”‚    +13.3% β”‚
β”‚ Average Scoreβ”‚   0.692β”‚    0.751β”‚  +0.059 β”‚     +8.5% β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ” Test Changes Overview
β”œβ”€β”€ πŸ“ˆ Improvements: 2
β”œβ”€β”€ πŸ“‰ Regressions: 1
└── πŸ†• New Tests: 0

πŸ“ˆ Improvements:
  β€’ test_summarization: score: 0.734 β†’ 0.856 (+0.122)
  β€’ test_qa_accuracy: FAIL β†’ PASS, score: 0.650 β†’ 0.823 (+0.173)

πŸ“‰ Regressions:
  β€’ test_content_generation: score: 0.891 β†’ 0.734 (-0.157)

Detailed Comparison

agenttest compare abc123 --detailed
πŸ” Evaluator-Specific Changes:

  similarity:
    β€’ test_summarization: 0.734 β†’ 0.856 (+0.122)
    β€’ test_qa_accuracy: 0.432 β†’ 0.678 (+0.246)

  llm_judge:
    β€’ test_summarization: 0.823 β†’ 0.867 (+0.044)
    β€’ test_content_generation: 0.912 β†’ 0.745 (-0.167)

πŸ€– agenttest generate

Analyze an agent's source code and automatically generate test cases, using knowledge of the project structure to produce correct imports and realistic test calls.

Syntax

agenttest generate FILE_PATH [OPTIONS]

Core Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| FILE_PATH | String | Required | Agent file to analyze and test |
| --count, -c | Integer | 5 | Number of test cases to generate |
| --format, -f | String | python | Output format (python/yaml/json) |
| --output, -o | Path | None | Save generated tests to file |
| --template, -t | Path | None | Custom Jinja2 template |

Advanced Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| --no-llm | Boolean | false | Use fallback mode (no LLM) |
| --search-dirs | String | None | Additional directories to search |
| --include-edge | Boolean | true | Include edge case tests |
| --include-error | Boolean | true | Include error handling tests |

Intelligence Features

The generator automatically:

  • πŸ” Analyzes project structure to generate correct imports
  • 🎯 Understands functions and classes to create proper test calls
  • πŸ“ Generates realistic test data based on parameter names and types
  • πŸ§ͺ Creates multiple test scenarios (basic, edge cases, error handling)
  • πŸ—οΈ Handles class instantiation automatically for method testing

Examples

Basic Generation

# Generate tests for a specific file
agenttest generate examples/agents_sample.py

# Generate more test cases
agenttest generate examples/agents_sample.py --count 10

# Generate with specific format
agenttest generate examples/agents_sample.py --format yaml

Advanced Generation

# Save to specific file
agenttest generate agents/my_agent.py --output tests/generated_test.py

# Use custom template
agenttest generate agents/my_agent.py --template custom_template.py.j2

# Generate without LLM (fallback mode)
agenttest generate agents/my_agent.py --no-llm --count 3

Multiple Files

# Generate for multiple files
agenttest generate examples/*.py --count 3

# Generate for specific patterns
agenttest generate "agents/*_agent.py" --count 2

Output Formats

Python Format (Default)

Generates executable Python test files:

@agent_test(
    criteria=["execution", "output_type", "functionality"],
    tags=["basic", "function"]
)
def test_handle_customer_query_basic():
    """Test basic functionality of handle_customer_query"""
    input_data = {
        "query": "test query",
        "customer_type": "premium",
        "urgency": "high"
    }

    # Automatically generated function call
    actual = handle_customer_query(**input_data)

    return {
        "input": input_data,
        "actual": actual,
        "evaluation_criteria": {
            "execution": "Function should execute without errors",
            "output_type": "Should return appropriate type"
        }
    }
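
For context, the test above targets a function shaped roughly like the following. This is a hypothetical sketch of what examples/agents_sample.py could contain, with the signature inferred from the generated input_data keys; AgentTest does not ship this function. As noted under Intelligence Features, the generated file also imports the module under test based on the project structure.

def handle_customer_query(query: str, customer_type: str = "standard", urgency: str = "normal") -> dict:
    """Hypothetical agent entry point: routes a customer query and returns a structured reply."""
    reply = f"[{urgency}] ({customer_type}) {query}"
    return {"reply": reply, "escalated": urgency == "high"}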

YAML Format

agent: agents_sample
description: Generated tests for agents_sample
test_cases:
  - name: test_handle_customer_query_basic
    description: Test basic functionality of handle_customer_query
    function_to_test: handle_customer_query
    input_data:
      query: 'test query'
      customer_type: 'premium'
    expected_behavior: Should execute handle_customer_query successfully

JSON Format

{
  "agent": "agents_sample",
  "description": "Generated tests for agents_sample",
  "test_cases": [
    {
      "name": "test_handle_customer_query_basic",
      "description": "Test basic functionality of handle_customer_query",
      "function_to_test": "handle_customer_query",
      "input_data": {
        "query": "test query",
        "customer_type": "premium"
      }
    }
  ]
}
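
Because the JSON structure is explicit, generated suites can also be consumed programmatically, for example to feed a custom runner or review tool. A short sketch using the keys shown above (the file name is whatever was passed to --output):

import json

with open("generated_tests.json") as f:
    suite = json.load(f)

print(f"Agent: {suite['agent']}")
for case in suite["test_cases"]:
    print(f"- {case['name']}: calls {case['function_to_test']} with {case['input_data']}")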

Generated Test Types

Function Tests

  • Basic functionality: Normal operation with typical inputs
  • Edge cases: Empty inputs, boundary values, null conditions
  • Error handling: Invalid inputs and exception scenarios

Class Tests

  • Constructor tests: Object creation with proper arguments
  • Method tests: Instance method calls with realistic data
  • Integration tests: Multi-method workflows

Customization

Custom Templates

Create .agenttest/templates/test_template.py.j2:

"""
Custom test template for {{ agent_name }}.
"""

from agent_test import agent_test
{%- if agent_module_path %}
from {{ agent_module_path }} import *
{%- endif %}

{%- for test_case in test_cases %}
@agent_test(
    criteria=[{%- for criterion in test_case.evaluation_criteria.keys() -%}"{{ criterion }}"{%- if not loop.last %}, {% endif -%}{%- endfor -%}],
    tags={{ test_case.tags | tojson }}
)
def {{ test_case.name }}():
    """{{ test_case.description }}"""
    # Your custom test logic here
    pass
{%- endfor %}
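
Pass the template's path to the generator with --template:

# Use the custom template when generating tests
agenttest generate agents/my_agent.py --template .agenttest/templates/test_template.py.j2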

Configuration

Set generation preferences in .agenttest/config.yaml:

generation:
  default_count: 5
  include_edge_cases: true
  include_error_handling: true
  default_format: python

llm:
  provider: openai
  model: gpt-4
  temperature: 0.7
  max_tokens: 3000

Formats

| Format | Description | File Extension |
|--------|-------------|----------------|
| python | Python test functions | .py |
| yaml | YAML test cases | .yaml |
| json | JSON test data | .json |

Examples

# Generate 5 test cases for an agent
agenttest generate my_agent.py

# Generate 10 tests and save to file
agenttest generate my_agent.py --count 10 --output tests/generated_tests.py

# Generate YAML format tests
agenttest generate my_agent.py --format yaml --output test_cases.yaml

πŸ–₯️ agenttest dashboard

Launch a web-based dashboard for monitoring test results and performance.

Syntax

agenttest dashboard [OPTIONS]

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| --port, -p | Integer | 8080 | Server port |
| --host | String | localhost | Server host |

Examples

# Start dashboard on default port
agenttest dashboard

# Use custom port
agenttest dashboard --port 3000

# Bind to all interfaces
agenttest dashboard --host 0.0.0.0 --port 8080

Dashboard Features

  • Test Results Timeline: Visual performance tracking
  • Evaluator Breakdown: Per-evaluator performance analysis
  • Git Integration: Commit-based result comparison
  • Filter and Search: Find specific tests and patterns
  • Export Options: Download results and reports

🌟 Advanced Usage Patterns

CI/CD Pipeline Integration

# .github/workflows/test.yml
- name: Run AgentTest
  run: |
    agenttest run --ci --output results.json
    agenttest compare ${{ github.event.before }} HEAD --export comparison.json
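
A fuller workflow sketch, assuming AgentTest is installed from the project's own requirements and that the LLM provider key is supplied as a repository secret (the install step, secret name, and action versions below are assumptions, not part of this reference):

# .github/workflows/test.yml (illustrative sketch)
name: AgentTest
on: [push]

jobs:
  agenttest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0              # full history so agenttest compare can resolve both commits
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt   # assumes AgentTest is listed here
      - name: Run AgentTest
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          agenttest run --ci --output results.json
          agenttest compare ${{ github.event.before }} HEAD --export comparison.json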

Development Workflow

# Quick development loop
agenttest run --path tests/test_new_feature.py --verbose

# Check regression before commit
agenttest compare HEAD~1 HEAD --detailed

# Monitor specific evaluator
agenttest run --verbose | grep similarity

Batch Processing

# Run multiple test suites
for suite in unit integration e2e; do
  agenttest run --path tests/$suite/ --output results-$suite.json
done

# Compare across branches
for branch in main develop feature-123; do
  git checkout $branch
  agenttest run --quiet --output results-$branch.json
done
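
Note that the branch loop leaves the repository on the last branch in the list; a variant that restores the original branch afterwards:

# Collect results per branch, then switch back to where you started
original=$(git rev-parse --abbrev-ref HEAD)
for branch in main develop feature-123; do
  git checkout "$branch"
  agenttest run --quiet --output "results-$branch.json"
done
git checkout "$original"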

Performance Monitoring

# Track performance over time
agenttest log --limit 50 > performance-history.txt

# Generate detailed comparison reports
agenttest compare $(git rev-parse HEAD~10) HEAD \
  --detailed \
  --export performance-report.json

πŸ”§ Global Options

These options work with all commands:

| Option | Description |
|--------|-------------|
| --help | Show command help |
| --version | Show AgentTest version |
| --config | Use custom config file |

Examples

# Show version
agenttest --version

# Use custom configuration
agenttest run --config /path/to/custom-config.yaml

# Get help for any command
agenttest compare --help