# CLI Commands Reference

AgentTest provides a comprehensive command-line interface with pytest-like functionality for AI agent testing. This guide covers all available commands and their options.
## Command Overview

| Command | Purpose | Common Usage |
|---------|---------|--------------|
| `init` | Initialize project | Set up new testing environment |
| `run` | Execute tests | Run test suites with various options |
| `log` | View test history | Browse past test results |
| `compare` | Compare results | Git-based performance comparison |
| `generate` | Auto-generate tests | AI-powered test creation |
| `dashboard` | Web interface | Visual test monitoring |
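A typical first session chains these commands. The sketch below is illustrative (the project path is arbitrary) and uses only options documented in this guide:

```bash
# Set up a project, run the suite, and review history
agenttest init ./my-agent-project --template basic
cd my-agent-project
agenttest run --verbose --output results.json
agenttest log --limit 5
```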
## agenttest init

Initialize a new AgentTest project with configuration and directory structure.
### Syntax

```bash
agenttest init [PATH] [OPTIONS]
```
### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `PATH` | Path | `.` | Directory to initialize |
| `--template, -t` | String | `basic` | Configuration template |
| `--overwrite` | Boolean | `false` | Overwrite existing config |
### Templates

| Template | Description | Best For |
|----------|-------------|----------|
| `basic` | Standard setup with core evaluators | General agent testing |
| `langchain` | Optimized for LangChain agents | LangChain applications |
| `llamaindex` | Optimized for LlamaIndex | LlamaIndex applications |
### Examples

```bash
# Initialize in current directory
agenttest init

# Initialize new project
agenttest init ./my-agent-project

# Use LangChain template
agenttest init --template langchain

# Force overwrite existing config
agenttest init --overwrite
```
### Generated Structure

```
project/
├── .agenttest/
│   ├── config.yaml          # Main configuration
│   └── results/             # Test results storage
├── tests/
│   ├── __init__.py
│   └── test_example.py      # Sample test
└── .env                     # Environment variables
```
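The exact contents of `config.yaml` depend on the chosen template and are not fully documented in this guide. As a rough sketch, the `llm` and `generation` blocks shown later in the Configuration subsection of `agenttest generate` live in this file; any other, template-specific sections are omitted here:

```yaml
# .agenttest/config.yaml — minimal sketch; only the blocks documented
# later in this guide are shown. Template-specific sections are omitted.
llm:
  provider: openai
  model: gpt-4
  temperature: 0.7
  max_tokens: 3000

generation:
  default_count: 5
  include_edge_cases: true
  include_error_handling: true
  default_format: python
```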
## agenttest run

Execute test suites with comprehensive filtering and output options.
### Syntax

```bash
agenttest run [OPTIONS]
```

### Core Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `--path, -p` | Path | `tests/` | Test files or directory |
| `--pattern` | String | `test_*.py` | File name pattern |
| `--verbose, -v` | Boolean | `false` | Detailed output |
| `--quiet, -q` | Boolean | `false` | Minimal output |
| `--ci` | Boolean | `false` | CI mode (exit on failure) |
### Output Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `--output, -o` | Path | None | Save results to file |
| `--log-output` | Path | None | Export detailed logs |
### Filtering Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `--tag, -t` | String[] | None | Run tests with specific tags |
### Examples

#### Basic Usage

```bash
# Run all tests
agenttest run

# Run with verbose output
agenttest run --verbose

# Run specific test file
agenttest run --path tests/test_summarization.py

# Run tests matching pattern
agenttest run --pattern "*integration*"
```

#### Filtering Tests

```bash
# Run tests with specific tags
agenttest run --tag summarization --tag quality

# Run tests in specific directory
agenttest run --path tests/integration/
```

#### Output and Logging

```bash
# Save results to JSON
agenttest run --output results.json

# Export detailed logs
agenttest run --log-output debug.log

# Combined output
agenttest run --output results.json --log-output debug.log --verbose
```

#### CI/CD Integration

```bash
# CI mode (exits with error code on failure)
agenttest run --ci --quiet

# Generate reports for CI
agenttest run --ci --output ci-results.json --quiet
```
### Standard Output

```
🧪 Running AgentTest suite...

Test Results Summary:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┓
┃ Test                     ┃ Status  ┃ Score ┃ Duration ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━┩
│ test_summarization       │ ✅ PASS │ 0.873 │ 2.45s    │
│ test_qa_accuracy         │ ❌ FAIL │ 0.654 │ 1.23s    │
│ test_content_generation  │ ✅ PASS │ 0.912 │ 3.67s    │
└──────────────────────────┴─────────┴───────┴──────────┘

Overall Results:
  • Total Tests: 3
  • Passed: 2 (67%)
  • Failed: 1 (33%)
  • Average Score: 0.813
  • Total Duration: 7.35s

❌ Failures Detected:
  • test_qa_accuracy: Score 0.654 below threshold 0.7
    - similarity: 0.654 (threshold: 0.7)
    - Expected: "The answer is 42"
    - Actual: "I think the answer might be 42"
```
### Verbose Output

```
SESSION: Starting test session (2024-06-26 14:30:15)
DISCOVERY: Found 3 test files in tests/
DISCOVERY: Discovered 5 test functions

TEST_START: test_summarization
  Evaluation Results:
    • similarity: Score: 0.873, Passed: ✅
    • llm_judge: Score: 0.845, Passed: ✅
✅ TEST_PASS: test_summarization (2.45s)

TEST_START: test_qa_accuracy
  Evaluation Results:
    • similarity: Score: 0.654, Passed: ❌
❌ TEST_FAIL: test_qa_accuracy (1.23s)
```
## agenttest log

View and browse test execution history with git integration.
### Syntax

```bash
agenttest log [OPTIONS]
```

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `--limit, -l` | Integer | `10` | Number of runs to show |
| `--commit, -c` | String | None | Show results for specific commit |
| `--branch, -b` | String | None | Show results for specific branch |
### Examples

```bash
# Show last 10 test runs
agenttest log

# Show last 20 runs
agenttest log --limit 20

# Show results for specific commit
agenttest log --commit abc123

# Show results for main branch
agenttest log --branch main
```
### Output

```
Test History (last 10 runs):
┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Commit   ┃ Timestamp           ┃ Branch      ┃ Tests              ┃ Pass Rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ e1c83a6d │ 2024-06-26 14:45:12 │ main        │ 5 passed, 0 failed │ 100%      │
│ 95eadec3 │ 2024-06-26 14:38:33 │ main        │ 3 passed, 2 failed │ 60%       │
│ 7b2af91e │ 2024-06-26 12:15:44 │ feature-123 │ 4 passed, 1 failed │ 80%       │
└──────────┴─────────────────────┴─────────────┴────────────────────┴───────────┘
```
## agenttest compare

Compare test results between git commits or branches with detailed analysis.
### Syntax

```bash
agenttest compare BASE [TARGET] [OPTIONS]
```
### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `BASE` | String | Required | Base commit/branch |
| `TARGET` | String | `HEAD` | Target commit/branch |
| `--metric, -m` | String | None | Focus on specific evaluator |
| `--filter, -f` | String | None | Filter tests by name pattern |
| `--min-change, -c` | Float | `0.01` | Minimum change threshold |
| `--include-unchanged, -u` | Boolean | `false` | Include unchanged tests |
| `--detailed, -d` | Boolean | `false` | Show evaluator-level details |
| `--export, -e` | Path | None | Export to JSON file |
### Examples

#### Basic Comparison

```bash
# Compare current HEAD with previous commit
agenttest compare abc123

# Compare two specific commits
agenttest compare abc123 def456

# Compare branches
agenttest compare main feature-branch
```

#### Filtered Comparison

```bash
# Focus on similarity evaluator only
agenttest compare abc123 --metric similarity

# Filter tests by name
agenttest compare abc123 --filter "summarization"

# Show only significant changes (>5%)
agenttest compare abc123 --min-change 0.05
```

#### Detailed Analysis

```bash
# Show evaluator-level details
agenttest compare abc123 --detailed

# Include unchanged tests
agenttest compare abc123 --include-unchanged

# Export full comparison
agenttest compare abc123 --export comparison.json
```

#### Complex Filtering

```bash
# Combine multiple filters
agenttest compare abc123 def456 \
  --metric similarity \
  --filter "qa" \
  --min-change 0.02 \
  --detailed
```
### Standard Comparison

```
Comparing abc123 → def456
Base: abc123 (2024-06-26T14:38:33)
Target: def456 (2024-06-26T14:45:12)

Overall Summary Changes:
┏━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┓
┃ Metric        ┃ Base  ┃ Target ┃ Change ┃ % Change ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━┩
│ Pass Rate     │ 0.75  │ 0.85   │ +0.100 │ +13.3%   │
│ Average Score │ 0.692 │ 0.751  │ +0.059 │ +8.5%    │
└───────────────┴───────┴────────┴────────┴──────────┘

Test Changes Overview
├── Improvements: 2
├── Regressions: 1
└── New Tests: 0

Improvements:
  • test_summarization: score: 0.734 → 0.856 (+0.122)
  • test_qa_accuracy: FAIL → PASS, score: 0.650 → 0.823 (+0.173)

Regressions:
  • test_content_generation: score: 0.891 → 0.734 (-0.157)
```
### Detailed Comparison

```bash
agenttest compare abc123 --detailed
```

```
Evaluator-Specific Changes:

similarity:
  • test_summarization: 0.734 → 0.856 (+0.122)
  • test_qa_accuracy: 0.432 → 0.678 (+0.246)

llm_judge:
  • test_summarization: 0.823 → 0.867 (+0.044)
  • test_content_generation: 0.912 → 0.745 (-0.167)
```
## agenttest generate

Automatically analyze your code and generate comprehensive test cases with intelligent project structure understanding.
### Syntax

```bash
agenttest generate [FILE_PATH] [OPTIONS]
```
### Core Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `FILE_PATH` | String | Required | Agent file to analyze and test |
| `--count, -c` | Integer | `5` | Number of test cases to generate |
| `--format, -f` | String | `python` | Output format (python/yaml/json) |
| `--output, -o` | Path | None | Save generated tests to file |
| `--template, -t` | Path | None | Custom Jinja2 template |
### Advanced Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `--no-llm` | Boolean | `false` | Use fallback mode (no LLM) |
| `--search-dirs` | String | None | Additional directories to search |
| `--include-edge` | Boolean | `true` | Include edge case tests |
| `--include-error` | Boolean | `true` | Include error handling tests |
### Intelligence Features

The generator automatically:

- Analyzes project structure to generate correct imports
- Understands functions and classes to create proper test calls
- Generates realistic test data based on parameter names and types
- Creates multiple test scenarios (basic, edge cases, error handling)
- Handles class instantiation automatically for method testing

### Examples

#### Basic Generation

```bash
# Generate tests for a specific file
agenttest generate examples/agents_sample.py

# Generate more test cases
agenttest generate examples/agents_sample.py --count 10

# Generate with specific format
agenttest generate examples/agents_sample.py --format yaml
```
#### Advanced Generation

```bash
# Save to specific file
agenttest generate agents/my_agent.py --output tests/generated_test.py

# Use custom template
agenttest generate agents/my_agent.py --template custom_template.py.j2

# Generate without LLM (fallback mode)
agenttest generate agents/my_agent.py --no-llm --count 3
```

#### Multiple Files

```bash
# Generate for multiple files
agenttest generate examples/*.py --count 3

# Generate for specific patterns
agenttest generate "agents/*_agent.py" --count 2
```
### Output Formats

#### Python Format

Generates executable Python test files:

```python
@agent_test(
    criteria=["execution", "output_type", "functionality"],
    tags=["basic", "function"]
)
def test_handle_customer_query_basic():
    """Test basic functionality of handle_customer_query"""
    input_data = {
        "query": "test query",
        "customer_type": "premium",
        "urgency": "high"
    }

    # Automatically generated function call
    actual = handle_customer_query(**input_data)

    return {
        "input": input_data,
        "actual": actual,
        "evaluation_criteria": {
            "execution": "Function should execute without errors",
            "output_type": "Should return appropriate type"
        }
    }
```
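Generated Python tests run like any other suite; for example, reusing the output path from the Advanced Generation example above:

```bash
# Execute only the generated tests, with detailed output
agenttest run --path tests/generated_test.py --verbose
```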
#### YAML Format

With `--format yaml`, the same test case is emitted as YAML:

```yaml
agent: agents_sample
description: Generated tests for agents_sample
test_cases:
  - name: test_handle_customer_query_basic
    description: Test basic functionality of handle_customer_query
    function_to_test: handle_customer_query
    input_data:
      query: 'test query'
      customer_type: 'premium'
    expected_behavior: Should execute handle_customer_query successfully
```
#### JSON Format

With `--format json`:

```json
{
  "agent": "agents_sample",
  "description": "Generated tests for agents_sample",
  "test_cases": [
    {
      "name": "test_handle_customer_query_basic",
      "description": "Test basic functionality of handle_customer_query",
      "function_to_test": "handle_customer_query",
      "input_data": {
        "query": "test query",
        "customer_type": "premium"
      }
    }
  ]
}
```
### Generated Test Types

#### Function Tests

- Basic functionality: Normal operation with typical inputs
- Edge cases: Empty inputs, boundary values, null conditions
- Error handling: Invalid inputs and exception scenarios

#### Class Tests

- Constructor tests: Object creation with proper arguments
- Method tests: Instance method calls with realistic data
- Integration tests: Multi-method workflows

### Customization

#### Custom Templates

Create `.agenttest/templates/test_template.py.j2`:

```jinja
"""
Custom test template for {{ agent_name }}.
"""
from agent_test import agent_test
{%- if agent_module_path %}
from {{ agent_module_path }} import *
{%- endif %}

{%- for test_case in test_cases %}

@agent_test(
    criteria=[{%- for criterion in test_case.evaluation_criteria.keys() -%}"{{ criterion }}"{%- if not loop.last %}, {% endif -%}{%- endfor -%}],
    tags={{ test_case.tags | tojson }}
)
def {{ test_case.name }}():
    """{{ test_case.description }}"""
    # Your custom test logic here
    pass
{%- endfor %}
```
#### Configuration

Set generation preferences in `.agenttest/config.yaml`:

```yaml
generation:
  default_count: 5
  include_edge_cases: true
  include_error_handling: true
  default_format: python

llm:
  provider: openai
  model: gpt-4
  temperature: 0.7
  max_tokens: 3000
```
Supported output formats:

| Format | Description | File Extension |
|--------|-------------|----------------|
| `python` | Python test functions | `.py` |
| `yaml` | YAML test cases | `.yaml` |
| `json` | JSON test data | `.json` |
### More Examples

```bash
# Generate 5 test cases for an agent
agenttest generate my_agent.py

# Generate 10 tests and save to file
agenttest generate my_agent.py --count 10 --output tests/generated_tests.py

# Generate YAML format tests
agenttest generate my_agent.py --format yaml --output test_cases.yaml
```
## agenttest dashboard

Launch a web-based dashboard for monitoring test results and performance.
### Syntax

```bash
agenttest dashboard [OPTIONS]
```
### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `--port, -p` | Integer | `8080` | Server port |
| `--host` | String | `localhost` | Server host |
### Examples

```bash
# Start dashboard on default port
agenttest dashboard

# Use custom port
agenttest dashboard --port 3000

# Bind to all interfaces
agenttest dashboard --host 0.0.0.0 --port 8080
```
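With the defaults above, the dashboard is presumably reachable at http://localhost:8080 once the server is running; opening it in a browser is left to you, for example:

```bash
# Start the dashboard in the background, then open it in a browser
# (`open` is macOS; use xdg-open on Linux). URL assumes the defaults above.
agenttest dashboard --port 8080 &
open http://localhost:8080
```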
### Dashboard Features

- Test Results Timeline: Visual performance tracking
- Evaluator Breakdown: Per-evaluator performance analysis
- Git Integration: Commit-based result comparison
- Filter and Search: Find specific tests and patterns
- Export Options: Download results and reports

## Advanced Usage Patterns

### CI/CD Pipeline Integration

```yaml
# .github/workflows/test.yml
- name: Run AgentTest
  run: |
    agenttest run --ci --output results.json
    agenttest compare ${{ github.event.before }} HEAD --export comparison.json
```
### Development Workflow

```bash
# Quick development loop
agenttest run --path tests/test_new_feature.py --verbose

# Check regression before commit
agenttest compare HEAD~1 HEAD --detailed

# Monitor specific evaluator
agenttest run --verbose | grep similarity
```
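To make the pre-commit regression check automatic, one option is a git pre-commit hook that runs the suite in CI mode. The sketch below is illustrative (AgentTest does not install it for you) and uses only the flags documented above:

```sh
#!/bin/sh
# .git/hooks/pre-commit — illustrative sketch, not shipped with AgentTest.
# --ci makes the run exit non-zero on failure, which aborts the commit.
agenttest run --ci --quiet || {
    echo "AgentTest suite failed; commit aborted." >&2
    exit 1
}
```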
### Batch Processing

```bash
# Run multiple test suites
for suite in unit integration e2e; do
    agenttest run --path tests/$suite/ --output results-$suite.json
done

# Compare across branches
for branch in main develop feature-123; do
    git checkout $branch
    agenttest run --quiet --output results-$branch.json
done

# Track performance over time
agenttest log --limit 50 > performance-history.txt

# Generate detailed comparison reports
agenttest compare $(git rev-parse HEAD~10) HEAD \
  --detailed \
  --export performance-report.json
```
## Global Options

These options work with all commands:

| Option | Description |
|--------|-------------|
| `--help` | Show command help |
| `--version` | Show AgentTest version |
| `--config` | Use custom config file |
### Examples

```bash
# Show version
agenttest --version

# Use custom configuration
agenttest run --config /path/to/custom-config.yaml

# Get help for any command
agenttest compare --help
```