Git Integration¶
AgentTest provides powerful git integration features that track test performance across commits, branches, and time. This enables regression detection, performance monitoring, and comparison workflows.
π Overview¶
Git integration automatically:
- Tracks Performance: Records test results with git metadata
- Enables Comparisons: Compare results between commits/branches
- Detects Regressions: Identify performance degradations
- Provides History: Browse test results over time
- Supports Workflows: Integrate with CI/CD pipelines
π Automatic Test Tracking¶
Every test run is automatically logged with git information:
# Run tests - automatically logged with git metadata
agenttest run
# Results are stored with:
# - Commit hash
# - Branch name
# - Timestamp
# - Test outcomes
# - Evaluation scores
Stored Information¶
Each test run includes:
Field | Description | Example |
---|---|---|
commit_hash | Full commit SHA | e1c83a6d4f2b8a9c7e5d3f1a8b6c4e2d9f7a5b3c |
commit_hash_short | Short commit SHA | e1c83a6d |
branch | Git branch name | main , feature-123 |
timestamp | Execution time | 2024-06-26T14:45:12.789012 |
author | Commit author | john.doe@example.com |
message | Commit message | Fix: Improve summarization accuracy |
test_results | Individual test outcomes | Scores, pass/fail status |
summary | Overall test summary | Pass rate, average score |
π Viewing Test History¶
Basic History¶
Filtered History¶
# Show results for specific commit
agenttest log --commit abc123
# Show results for specific branch
agenttest log --branch main
# Show results for feature branch
agenttest log --branch feature-summarization
History Output¶
π Test History (last 10 runs):
ββββββββββββββ³ββββββββββββββββββββββ³ββββββββββββββββ³βββββββββββββββ³βββββββββββββββ
β Commit β Timestamp β Branch β Tests β Pass Rate β
β‘βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ©
β e1c83a6d β 2024-06-26 14:45:12 β main β 5 passed, 0 β 100% β
β 95eadec3 β 2024-06-26 14:38:33 β main β 3 passed, 2 β 60% β
β 7b2af91e β 2024-06-26 12:15:44 β feature-123 β 4 passed, 1 β 80% β
β 4c9e8f1d β 2024-06-26 11:22:15 β main β 5 passed, 0 β 100% β
β 2a7b5e9f β 2024-06-25 16:45:33 β develop β 4 passed, 1 β 80% β
ββββββββββββββ΄ββββββββββββββββββββββ΄ββββββββββββββββ΄ββββββββββββββββ΄ββββββββββββββββ
π Performance Comparison¶
The compare
command provides detailed analysis between any two git references.
Basic Comparison¶
# Compare current commit with previous
agenttest compare HEAD~1
# Compare specific commits
agenttest compare abc123 def456
# Compare branches
agenttest compare main feature-branch
Advanced Comparison Options¶
# Focus on specific evaluator
agenttest compare abc123 def456 --metric similarity
# Filter tests by name pattern
agenttest compare abc123 def456 --filter "summarization"
# Adjust sensitivity threshold
agenttest compare abc123 def456 --min-change 0.05
# Show detailed evaluator breakdown
agenttest compare abc123 def456 --detailed
# Include unchanged tests
agenttest compare abc123 def456 --include-unchanged
# Export results to JSON
agenttest compare abc123 def456 --export comparison.json
Comparison Output¶
Summary Changes¶
π Comparing abc123 β def456
Base: abc123 (2024-06-26T14:38:33)
Target: def456 (2024-06-26T14:45:12)
π Overall Summary Changes:
ββββββββββββββββ³βββββββββ³ββββββββββ³ββββββββββ³ββββββββββββ
β Metric β Base β Target β Change β % Change β
β‘ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ©
β Pass Rate β 0.75 β 0.85 β +0.100 β +13.3% β
β Average Scoreβ 0.692β 0.751β +0.059 β +8.5% β
β Total Tests β 4 β 5 β +1 β +25.0% β
ββββββββββββββββ΄βββββββββ΄ββββββββββ΄ββββββββββ΄ββββββββββββ
Test-Level Changes¶
π Test Changes Overview
βββ π Improvements: 2
βββ π Regressions: 1
βββ π New Tests: 1
βββ ποΈ Removed Tests: 0
π Improvements:
β’ test_summarization: score: 0.734 β 0.856 (+0.122)
β’ test_qa_accuracy: FAIL β PASS, score: 0.650 β 0.823 (+0.173)
π Regressions:
β’ test_content_generation: score: 0.891 β 0.734 (-0.157)
π New Tests:
β’ test_new_feature: PASS, score: 0.912
Detailed Evaluator Analysis¶
π Evaluator-Specific Changes:
similarity:
β’ test_summarization: 0.734 β 0.856 (+0.122)
β’ test_qa_accuracy: 0.432 β 0.678 (+0.246)
β’ test_content_generation: 0.891 β 0.734 (-0.157)
llm_judge:
β’ test_summarization: 0.823 β 0.867 (+0.044)
β’ test_qa_accuracy: 0.712 β 0.845 (+0.133)
β’ test_content_generation: 0.912 β 0.745 (-0.167)
π Development Workflows¶
Pre-Commit Checks¶
# Check for regressions before committing
agenttest compare HEAD~1 HEAD --detailed
# Fail if significant regressions detected
agenttest compare HEAD~1 HEAD --min-change 0.05 | grep "π Regressions: 0" || exit 1
Feature Development¶
# Start feature branch
git checkout -b feature-new-capability
# Develop and test iteratively
agenttest run --verbose
# Compare with main branch before merge
agenttest compare main feature-new-capability --detailed
# Check specific evaluator performance
agenttest compare main feature-new-capability --metric llm_judge
Release Validation¶
# Compare release candidate with previous release
agenttest compare v1.2.0 v1.3.0-rc1 --detailed --export release-comparison.json
# Validate no regressions in core functionality
agenttest compare v1.2.0 v1.3.0-rc1 --filter "core" --min-change 0.02
π§ CI/CD Integration¶
GitHub Actions¶
# .github/workflows/agent-tests.yml
name: Agent Tests
on:
push:
branches: [main, develop]
pull_request:
branches: [main]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0 # Fetch full history for comparisons
- name: Set up Python
uses: actions/setup-python@v3
with:
python-version: '3.11'
- name: Install dependencies
run: |
pip install agenttest
- name: Run tests
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
agenttest run --ci --output results.json
- name: Compare with main (for PRs)
if: github.event_name == 'pull_request'
run: |
agenttest compare origin/main HEAD \
--detailed \
--export comparison.json \
--min-change 0.02
- name: Upload results
uses: actions/upload-artifact@v3
with:
name: test-results
path: |
results.json
comparison.json
- name: Comment PR with results
if: github.event_name == 'pull_request'
uses: actions/github-script@v6
with:
script: |
const fs = require('fs');
const comparison = JSON.parse(fs.readFileSync('comparison.json', 'utf8'));
const comment = `## π§ͺ Agent Test Results
### Summary Changes
- **Pass Rate**: ${comparison.summary_changes.pass_rate?.change || 'N/A'}
- **Average Score**: ${comparison.summary_changes.average_score?.change || 'N/A'}
### Changes
- π Improvements: ${comparison.improvements.length}
- π Regressions: ${comparison.regressions.length}
- π New Tests: ${comparison.new_tests.length}
`;
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: comment
});
GitLab CI¶
# .gitlab-ci.yml
stages:
- test
- compare
agent_tests:
stage: test
script:
- pip install agenttest
- agenttest run --ci --output results.json
artifacts:
reports:
junit: results.json
paths:
- results.json
expire_in: 1 week
only:
- main
- develop
- merge_requests
compare_performance:
stage: compare
script:
- agenttest compare $CI_MERGE_REQUEST_TARGET_BRANCH_NAME $CI_COMMIT_SHA --detailed --export comparison.json
artifacts:
paths:
- comparison.json
only:
- merge_requests
π Performance Monitoring¶
Long-term Tracking¶
# Generate performance trend report
agenttest log --limit 50 --branch main > performance-history.txt
# Compare performance over time windows
agenttest compare $(git rev-parse HEAD~20) HEAD --detailed --export trend-analysis.json
Automated Monitoring Script¶
#!/bin/bash
# monitor-performance.sh
# Get current and previous commit
CURRENT=$(git rev-parse HEAD)
PREVIOUS=$(git rev-parse HEAD~1)
# Run comparison
agenttest compare $PREVIOUS $CURRENT \
--detailed \
--min-change 0.02 \
--export comparison.json
# Check for significant regressions
REGRESSIONS=$(cat comparison.json | jq '.regressions | length')
if [ $REGRESSIONS -gt 0 ]; then
echo "β οΈ Performance regressions detected!"
agenttest compare $PREVIOUS $CURRENT --detailed
exit 1
else
echo "β
No significant performance regressions detected"
fi
π Debugging Performance Issues¶
Identify Problem Areas¶
# Focus on specific evaluator that's regressing
agenttest compare abc123 def456 --metric similarity --detailed
# Look at specific test patterns
agenttest compare abc123 def456 --filter "summarization" --detailed
# Find tests with largest score drops
agenttest compare abc123 def456 --min-change 0.1 --detailed
Historical Analysis¶
# Find when performance started declining
for commit in $(git rev-list --reverse HEAD~10..HEAD); do
echo "Checking commit: $commit"
agenttest compare HEAD~10 $commit --quiet | grep "Average Score"
done
# Bisect performance issues
git bisect start HEAD HEAD~10
# Use agenttest compare in bisect script
ποΈ Data Storage¶
File Structure¶
.agenttest/results/
βββ index.json # Master index of all runs
βββ 20240626_144512_e1c83a6d.json # Individual test run results
βββ 20240626_143833_95eadec3.json
βββ 20240626_122144_7b2af91e.json
Index Format¶
{
"runs": [
{
"timestamp": "2024-06-26T14:45:12.789012",
"commit_hash": "e1c83a6d4f2b8a9c7e5d3f1a8b6c4e2d9f7a5b3c",
"commit_hash_short": "e1c83a6d",
"branch": "main",
"summary": {
"total_tests": 5,
"passed": 5,
"failed": 0,
"pass_rate": 100.0,
"average_score": 0.887
},
"filename": "20240626_144512_e1c83a6d.json"
}
],
"by_commit": {
"e1c83a6d": [...],
"95eadec3": [...]
},
"by_branch": {
"main": [...],
"feature-123": [...]
}
}
Individual Result Format¶
{
"timestamp": "2024-06-26T14:45:12.789012",
"git_info": {
"commit_hash": "e1c83a6d4f2b8a9c7e5d3f1a8b6c4e2d9f7a5b3c",
"commit_hash_short": "e1c83a6d",
"branch": "main",
"author": "john.doe@example.com",
"message": "Fix: Improve summarization accuracy"
},
"summary": {
"total_tests": 5,
"passed": 5,
"failed": 0,
"pass_rate": 100.0,
"average_score": 0.887,
"total_duration": 12.45
},
"test_results": [
{
"test_name": "test_summarization",
"passed": true,
"score": 0.856,
"duration": 2.45,
"evaluations": {
"similarity": { "score": 0.834, "passed": true },
"llm_judge": { "score": 0.878, "passed": true }
}
}
]
}
π§ Configuration¶
Enable git integration in configuration:
logging:
git_aware: true # Enable git integration
results_dir: '.agenttest/results' # Results storage location
Advanced Git Options¶
logging:
git_aware: true
results_dir: '.agenttest/results'
git_config:
track_author: true # Include commit author
track_message: true # Include commit message
track_changes: true # Include file changes
max_history: 100 # Limit stored results
cleanup_days: 30 # Auto-cleanup old results
π οΈ Best Practices¶
1. Commit Hygiene¶
- Make atomic commits for better tracking
- Use descriptive commit messages
- Tag releases for easy comparison
2. Branch Strategy¶
- Test feature branches before merging
- Compare with target branch regularly
- Use meaningful branch names
3. Performance Monitoring¶
- Set up automated comparison checks
- Monitor long-term trends
- Investigate regressions quickly
4. CI/CD Integration¶
- Include comparison in PR workflows
- Fail builds on significant regressions
- Generate comparison reports
5. Data Management¶
- Regular cleanup of old results
- Export important comparisons
- Back up critical performance data
π Related Documentation¶
- CLI Commands - Detailed command reference
- Configuration - Git integration configuration
- Writing Tests - Test structure for tracking
- Enhanced logging features (built-in)