# Configuration Guide
AgentTest uses YAML-based configuration files to define evaluators, LLM providers, testing parameters, and more. This guide covers all configuration options and best practices.
## Configuration File Location

AgentTest looks for configuration in the following order:

1. `.agenttest/config.yaml` in the current directory
2. `.agenttest/config.yaml` in parent directories (walking up)
3. A custom path specified with the `--config` flag
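When no `--config` flag is given, the walk-up search behaves roughly like the sketch below (illustrative only; `find_config` is not AgentTest's actual API):

```python
from pathlib import Path

def find_config(start: Path = Path.cwd()) -> Path | None:
    """Walk from the current directory up to the filesystem root,
    returning the first .agenttest/config.yaml found."""
    for directory in [start, *start.parents]:
        candidate = directory / ".agenttest" / "config.yaml"
        if candidate.is_file():
            return candidate
    return None  # otherwise, rely on a path passed via --config
```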
## Basic Configuration Structure

```yaml
version: '1.0'
project_name: 'My Agent Project'

# LLM Configuration
llm:
  provider: 'openai'
  model: 'gpt-3.5-turbo'
  temperature: 0.0

# Evaluator Configurations
evaluators:
  - name: 'similarity'
    type: 'string_similarity'
    config:
      method: 'cosine'
      threshold: 0.8
    weight: 1.0
    enabled: true

# Testing Configuration
testing:
  test_dirs: ['tests']
  test_patterns: ['test_*.py']
  parallel: false
  timeout: 300

# Logging Configuration
logging:
  level: 'INFO'
  git_aware: true
  results_dir: '.agenttest/results'
```
## LLM Configuration

Configure language model providers for LLM judge evaluators.

### Basic LLM Config

```yaml
llm:
  provider: 'openai' # openai, anthropic, gemini
  model: 'gpt-3.5-turbo' # Model name
  api_key: null # Optional; uses env vars by default
  temperature: 0.0 # Generation temperature
  max_tokens: null # Max tokens (optional)
```
### Provider-Specific Configurations

#### OpenAI

```yaml
llm:
  provider: 'openai'
  model: 'gpt-4' # gpt-3.5-turbo, gpt-4, gpt-4-turbo
  api_key: ${OPENAI_API_KEY} # Environment variable
  api_base: null # Custom API base URL
  temperature: 0.0
  max_tokens: 1000
```

Supported Models:

- `gpt-3.5-turbo` - Fast, cost-effective
- `gpt-4` - High quality, slower
- `gpt-4-turbo` - Latest GPT-4 variant
- `gpt-4o` - Optimized GPT-4
#### Anthropic

```yaml
llm:
  provider: 'anthropic'
  model: 'claude-3-sonnet-20240229'
  api_key: ${ANTHROPIC_API_KEY}
  temperature: 0.0
  max_tokens: 1000
```

Supported Models:

- `claude-3-haiku-20240307` - Fast, lightweight
- `claude-3-sonnet-20240229` - Balanced performance
- `claude-3-opus-20240229` - Highest capability
#### Google Gemini

```yaml
llm:
  provider: 'gemini'
  model: 'gemini-pro'
  api_key: ${GOOGLE_API_KEY}
  temperature: 0.0
  max_output_tokens: 1000
```

Supported Models:

- `gemini-pro` - General purpose
- `gemini-1.5-pro` - Latest version
- `gemini-1.5-flash` - Fast inference
### Environment Variables

AgentTest automatically resolves API keys from environment variables:

| Provider  | Environment Variables              |
| --------- | ---------------------------------- |
| OpenAI    | `OPENAI_API_KEY`                   |
| Anthropic | `ANTHROPIC_API_KEY`                |
| Gemini    | `GOOGLE_API_KEY`, `GEMINI_API_KEY` |
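The `${OPENAI_API_KEY}` syntax shown in the provider examples substitutes values from the environment. A minimal sketch of how such interpolation can work (illustrative, not AgentTest's internals):

```python
import os
import re

_ENV_PATTERN = re.compile(r"\$\{(\w+)\}")

def interpolate_env(value: str) -> str:
    """Replace ${VAR} placeholders with values from the environment,
    leaving unresolved placeholders intact."""
    return _ENV_PATTERN.sub(
        lambda m: os.environ.get(m.group(1), m.group(0)), value
    )

# With OPENAI_API_KEY exported in the shell,
# interpolate_env("${OPENAI_API_KEY}") returns the key's value.
```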
## Evaluator Configuration

Configure multiple evaluators with specific settings and weights.

### Evaluator Structure

```yaml
evaluators:
  - name: 'evaluator_name' # Unique identifier
    type: 'evaluator_type' # Implementation type
    config: # Evaluator-specific configuration
      key: value
    weight: 1.0 # Weight in overall score (0-1)
    enabled: true # Enable/disable evaluator
```
### String Similarity Evaluator

```yaml
evaluators:
  - name: 'similarity'
    type: 'string_similarity'
    config:
      method: 'cosine' # cosine, levenshtein, jaccard, exact
      threshold: 0.8 # Pass threshold (0-1)
    weight: 1.0
    enabled: true
```

Method Options:

| Method        | Description              | Best For                  | Performance |
| ------------- | ------------------------ | ------------------------- | ----------- |
| `cosine`      | TF-IDF cosine similarity | General text comparison   | Medium      |
| `levenshtein` | Edit distance            | Exact matching with typos | Fast        |
| `jaccard`     | Set intersection/union   | Keyword similarity        | Fast        |
| `exact`       | Exact string match       | Precise outputs           | Very Fast   |
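To make the differences concrete, here are minimal, self-contained versions of the `exact`, `jaccard`, and `levenshtein` methods. These are illustrative implementations, not AgentTest's internals; `cosine` additionally requires TF-IDF vectorization, so it is omitted here.

```python
def exact(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0

def jaccard(a: str, b: str) -> float:
    """Similarity of the two token sets: |A & B| / |A | B|."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def levenshtein_similarity(a: str, b: str) -> float:
    """Edit distance normalized into a 0-1 similarity score."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    longest = max(len(a), len(b)) or 1
    return 1.0 - prev[-1] / longest

# A threshold of 0.8 then means: pass when similarity >= 0.8.
print(jaccard("the cat sat", "the cat stood"))  # 0.5
```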
### LLM Judge Evaluator

```yaml
evaluators:
  - name: 'llm_judge'
    type: 'llm_as_judge'
    config:
      provider: 'openai' # Override global LLM config
      model: 'gpt-4'
      criteria: ['accuracy', 'relevance', 'clarity']
      temperature: 0.0
      threshold: 0.7 # Pass threshold
    weight: 1.0
    enabled: true
```

Built-in Criteria:

- `accuracy` - Factual correctness
- `relevance` - Topic relevance
- `clarity` - Communication clarity
- `creativity` - Original thinking
- `helpfulness` - Practical utility
- `conciseness` - Brevity
- `completeness` - Comprehensive coverage
### Metrics Evaluator

```yaml
evaluators:
  - name: 'metrics'
    type: 'nlp_metrics'
    config:
      rouge:
        variants: ['rouge1', 'rouge2', 'rougeL']
        min_score: 0.4
      bleu:
        n_grams: [1, 2, 3, 4]
        min_score: 0.3
      meteor:
        min_score: 0.5
    weight: 1.0
    enabled: true
```
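If you want to sanity-check what a given `min_score` means, the standalone `rouge-score` package computes the same ROUGE variants. This is shown for illustration only; AgentTest may use a different backend.

```python
# pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(
    "the cat sat on the mat",        # reference
    "a cat was sitting on the mat",  # candidate
)
for variant, score in scores.items():
    # min_score thresholds like 0.4 are in this same 0-1 range
    print(variant, round(score.fmeasure, 3))
```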
### Contains Evaluator

```yaml
evaluators:
  - name: 'contains'
    type: 'contains_check'
    config:
      case_sensitive: false # Case-sensitive matching
      partial_match: true # Allow partial word matches
    weight: 1.0
    enabled: true
```
### Regex Evaluator

```yaml
evaluators:
  - name: 'regex'
    type: 'regex_pattern'
    config:
      flags: ['IGNORECASE'] # Regex flags
    weight: 1.0
    enabled: true
```
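The flag names correspond to Python's `re` module constants. A quick illustration of what `['IGNORECASE']` implies for matching (the helper and pattern below are illustrative, not part of AgentTest):

```python
import re
from functools import reduce

def compile_with_flags(pattern: str, flag_names: list[str]) -> re.Pattern:
    """Combine named flags (e.g. 'IGNORECASE', 'MULTILINE') into one regex."""
    flags = reduce(lambda acc, name: acc | getattr(re, name), flag_names, 0)
    return re.compile(pattern, flags)

rx = compile_with_flags(r"hello\s+world", ["IGNORECASE"])
print(bool(rx.search("Hello   WORLD")))  # True: case is ignored
```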
## Testing Configuration

Configure test discovery, execution, and behavior.

```yaml
testing:
  test_dirs: ['tests', 'integration'] # Directories to search
  test_patterns: ['test_*.py', '*_test.py'] # File patterns
  parallel: false # Parallel execution
  timeout: 300 # Test timeout (seconds)
  retry_count: 0 # Retry failed tests
```
### Advanced Testing Options

```yaml
testing:
  test_dirs:
    - 'tests/unit'
    - 'tests/integration'
    - 'tests/e2e'
  test_patterns:
    - 'test_*.py'
    - '*_test.py'
    - 'test_*.yaml' # YAML test files
  parallel: true # Enable parallel execution
  max_workers: 4 # Number of parallel workers
  timeout: 600 # Longer timeout for complex tests
  retry_count: 2 # Retry failed tests up to 2 times
  retry_delay: 1.0 # Delay between retries (seconds)

  # Test filtering
  include_tags: ['smoke', 'regression']
  exclude_tags: ['slow', 'experimental']

  # Test environment
  environment:
    TEST_MODE: 'true'
    DEBUG_LEVEL: 'INFO'
```
## Logging Configuration

Configure logging behavior and output formats.

```yaml
logging:
  level: 'INFO' # DEBUG, INFO, WARNING, ERROR
  file: null # Log file path (optional)
  git_aware: true # Include git information
  results_dir: '.agenttest/results' # Results storage directory
  format: 'rich' # Output format

  # Advanced logging options
  console_output: true # Enable console output
  json_logs: false # JSON-formatted logs
  log_evaluations: true # Log individual evaluations
  performance_tracking: true # Track performance metrics
```
### Log Levels

| Level     | Description             | Use Case               |
| --------- | ----------------------- | ---------------------- |
| `DEBUG`   | Detailed execution info | Development, debugging |
| `INFO`    | General information     | Normal operation       |
| `WARNING` | Warning messages        | Production monitoring  |
| `ERROR`   | Error messages only     | Production, CI/CD      |
## Agent Configuration

Configure specific agents for testing (optional).

```yaml
agents:
  - name: 'chatbot'
    type: 'langchain'
    path: 'agents/chatbot.py'
    config:
      llm_provider: 'openai'
      memory_type: 'buffer'

  - name: 'summarizer'
    type: 'function'
    path: 'agents/summarizer.py'
    config:
      max_length: 100
      style: 'concise'
```
## Plugin Configuration

Configure plugins and extensions.

```yaml
plugins:
  - 'custom_evaluators' # Custom evaluator modules
  - 'langchain_integration' # LangChain support
  - 'dashboard_extensions' # Dashboard plugins

custom_evaluators:
  sentiment: 'evaluators.sentiment.SentimentEvaluator'
  toxicity: 'evaluators.safety.ToxicityEvaluator'
```
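Each dotted path points at an importable class. The exact base-class interface is covered in the Evaluators guide; as a rough sketch of the shape such a class might take (the class name matches the config above, but the method names and return shape here are assumptions, not AgentTest's actual API):

```python
# evaluators/sentiment.py -- hypothetical module matching the config above
class SentimentEvaluator:
    """Illustrative custom evaluator; the real interface may differ."""

    name = "sentiment"

    def __init__(self, config: dict | None = None):
        self.config = config or {}

    def evaluate(self, output: str, expected: str) -> dict:
        # Toy heuristic: count positive words. A real evaluator would
        # call a sentiment model here instead.
        positive = {"good", "great", "excellent", "helpful"}
        hits = sum(word in positive for word in output.lower().split())
        score = min(1.0, hits / 3)
        threshold = self.config.get("threshold", 0.5)
        return {"score": score, "passed": score >= threshold}
```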
## Environment-Specific Configurations

### Development Configuration

```yaml
# .agenttest/config.dev.yaml
version: '1.0'
project_name: 'Agent Project (Development)'

llm:
  provider: 'openai'
  model: 'gpt-3.5-turbo' # Faster model for development
  temperature: 0.1

evaluators:
  - name: 'similarity'
    config:
      threshold: 0.7 # Lower threshold for development

testing:
  timeout: 120 # Shorter timeout
  retry_count: 0 # No retries in development

logging:
  level: 'DEBUG' # Verbose logging
  console_output: true
```
### Production Configuration

```yaml
# .agenttest/config.prod.yaml
version: '1.0'
project_name: 'Agent Project (Production)'

llm:
  provider: 'openai'
  model: 'gpt-4' # Higher-quality model
  temperature: 0.0

evaluators:
  - name: 'similarity'
    config:
      threshold: 0.85 # Higher threshold for production
  - name: 'llm_judge'
    config:
      criteria: ['accuracy', 'safety', 'relevance']

testing:
  timeout: 600 # Longer timeout
  retry_count: 2 # Retry failed tests
  parallel: true # Parallel execution

logging:
  level: 'INFO' # Standard logging
  json_logs: true # Structured logs
  git_aware: true
```
### CI/CD Configuration

```yaml
# .agenttest/config.ci.yaml
version: '1.0'
project_name: 'Agent Project (CI/CD)'

llm:
  provider: 'openai'
  model: 'gpt-3.5-turbo' # Cost-effective for CI

evaluators:
  - name: 'similarity'
    config:
      threshold: 0.8

testing:
  timeout: 300
  parallel: true
  max_workers: 2 # Limited parallelism in CI

logging:
  level: 'WARNING' # Minimal logging
  console_output: false # No console output in CI
  json_logs: true
  results_dir: 'ci-results'
```
## Configuration Templates

### Basic Template

```yaml
version: '1.0'
project_name: 'Basic Agent Testing'

llm:
  provider: 'openai'
  model: 'gpt-3.5-turbo'
  temperature: 0.0

evaluators:
  - name: 'similarity'
    type: 'string_similarity'
    config:
      threshold: 0.8
    weight: 1.0
    enabled: true

testing:
  test_dirs: ['tests']
  test_patterns: ['test_*.py']
  timeout: 300

logging:
  level: 'INFO'
  git_aware: true
```
### Comprehensive Template

```yaml
version: '1.0'
project_name: 'Comprehensive Agent Testing'

llm:
  provider: 'openai'
  model: 'gpt-4'
  temperature: 0.0

evaluators:
  - name: 'similarity'
    type: 'string_similarity'
    config:
      method: 'cosine'
      threshold: 0.8
    weight: 0.3
    enabled: true

  - name: 'llm_judge'
    type: 'llm_as_judge'
    config:
      criteria: ['accuracy', 'relevance', 'clarity']
      threshold: 0.7
    weight: 0.4
    enabled: true

  - name: 'contains'
    type: 'contains_check'
    config:
      case_sensitive: false
    weight: 0.1
    enabled: true

  - name: 'metrics'
    type: 'nlp_metrics'
    config:
      rouge:
        variants: ['rouge1', 'rougeL']
        min_score: 0.4
    weight: 0.2
    enabled: true

testing:
  test_dirs: ['tests', 'integration']
  test_patterns: ['test_*.py', '*_test.py']
  parallel: true
  timeout: 600
  retry_count: 1

logging:
  level: 'INFO'
  git_aware: true
  performance_tracking: true
  json_logs: true
```
## Configuration Validation

AgentTest validates configuration files on startup. Common validation errors:

### Required Fields

- `version` - Configuration version
- `llm.provider` - LLM provider
- `llm.model` - Model name

### Valid Values

- `llm.provider`: `openai`, `anthropic`, `gemini`
- `logging.level`: `DEBUG`, `INFO`, `WARNING`, `ERROR`
- `evaluator.weight`: 0.0 to 1.0
### Example Validation Error

```
❌ Configuration Error:
  - llm.provider: 'invalid_provider' is not a valid provider
  - evaluators[0].weight: 1.5 is greater than maximum allowed value 1.0
  - testing.timeout: must be a positive integer
```
## Configuration Inheritance

You can extend configurations using inheritance:

```yaml
# base-config.yaml
version: "1.0"
llm:
  provider: "openai"
  temperature: 0.0
```

```yaml
# specific-config.yaml
extends: "base-config.yaml"
project_name: "Specific Project"
llm:
  model: "gpt-4" # Overrides base config
```
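Overrides apply per key rather than per section: in the example above, the child config keeps `provider` and `temperature` from the base and only adds `model`. A minimal sketch of that merge semantics (illustrative; `deep_merge` is not AgentTest's API):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base; keys in override win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

base = {"llm": {"provider": "openai", "temperature": 0.0}}
child = {"project_name": "Specific Project", "llm": {"model": "gpt-4"}}
print(deep_merge(base, child))
# {'llm': {'provider': 'openai', 'temperature': 0.0, 'model': 'gpt-4'},
#  'project_name': 'Specific Project'}
```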
## Configuration Best Practices

### 1. Environment-Specific Configs

- Separate configs for dev/staging/prod
- Use environment variables for secrets
- Lower thresholds in development

### 2. Evaluator Selection

- Start with the similarity evaluator
- Add an LLM judge for subjective quality
- Use metrics for specific NLP tasks
- Combine multiple evaluators for robustness

### 3. Performance Optimization

- Use faster models in CI/CD
- Enable parallel execution for large test suites
- Adjust timeouts based on test complexity

### 4. Security

- Never commit API keys
- Use environment variables
- Restrict permissions in production

### 5. Monitoring

- Enable git integration
- Use structured logging in production
- Track performance metrics over time
## Related Documentation

- Installation - Setting up configuration
- Evaluators - Detailed evaluator options
- CLI Commands - Using configuration with commands
- Git Integration - Git-aware configuration