
Git Integration

AgentTest's Git integration tracks test performance across commits, branches, and time, which enables regression detection, performance monitoring, and comparison workflows.

🔍 Overview

Git integration automatically:

  • Tracks Performance: Records test results with git metadata
  • Enables Comparisons: Compares results between commits and branches
  • Detects Regressions: Identifies performance degradations
  • Provides History: Lets you browse test results over time
  • Supports Workflows: Integrates with CI/CD pipelines

📊 Automatic Test Tracking

Every test run is automatically logged with git information:

# Run tests - automatically logged with git metadata
agenttest run

# Results are stored with:
# - Commit hash
# - Branch name
# - Timestamp
# - Test outcomes
# - Evaluation scores

Stored Information

Each test run includes:

Field              Description               Example
-----------------  ------------------------  ----------------------------------------
commit_hash        Full commit SHA           e1c83a6d4f2b8a9c7e5d3f1a8b6c4e2d9f7a5b3c
commit_hash_short  Short commit SHA          e1c83a6d
branch             Git branch name           main, feature-123
timestamp          Execution time            2024-06-26T14:45:12.789012
author             Commit author             john.doe@example.com
message            Commit message            Fix: Improve summarization accuracy
test_results       Individual test outcomes  Scores, pass/fail status
summary            Overall test summary      Pass rate, average score
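
As a rough illustration of how such a record could be assembled, the sketch below reads the git-related fields from the `git` command line. It is not AgentTest's implementation: `collect_git_info` and its injectable `run` parameter are hypothetical names chosen for this example, and the 8-character short hash simply mirrors the example values above.

```python
import subprocess
from datetime import datetime
from typing import Callable, Dict, List


def _git(args: List[str]) -> str:
    """Run a git command in the current directory and return trimmed stdout."""
    return subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    ).stdout.strip()


def collect_git_info(run: Callable[[List[str]], str] = _git) -> Dict[str, str]:
    """Gather git metadata for a test run (hypothetical helper, not AgentTest's API)."""
    commit_hash = run(["rev-parse", "HEAD"])
    return {
        "commit_hash": commit_hash,
        "commit_hash_short": commit_hash[:8],  # assumed length, matches the examples
        "branch": run(["rev-parse", "--abbrev-ref", "HEAD"]),
        "author": run(["log", "-1", "--format=%ae"]),   # author email
        "message": run(["log", "-1", "--format=%s"]),   # commit subject line
        "timestamp": datetime.now().isoformat(),
    }
```

Passing a custom `run` callable makes the helper easy to exercise without a real repository.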

📚 Viewing Test History

Basic History

# Show last 10 test runs
agenttest log

# Show last 20 runs
agenttest log --limit 20

Filtered History

# Show results for specific commit
agenttest log --commit abc123

# Show results for specific branch
agenttest log --branch main

# Show results for feature branch
agenttest log --branch feature-summarization

History Output

📚 Test History (last 10 runs):

┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Commit   ┃ Timestamp           ┃ Branch      ┃ Tests       ┃ Pass Rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ e1c83a6d │ 2024-06-26 14:45:12 │ main        │ 5 passed, 0 │ 100%      │
│ 95eadec3 │ 2024-06-26 14:38:33 │ main        │ 3 passed, 2 │ 60%       │
│ 7b2af91e │ 2024-06-26 12:15:44 │ feature-123 │ 4 passed, 1 │ 80%       │
│ 4c9e8f1d │ 2024-06-26 11:22:15 │ main        │ 5 passed, 0 │ 100%      │
│ 2a7b5e9f │ 2024-06-25 16:45:33 │ develop     │ 4 passed, 1 │ 80%       │
└──────────┴─────────────────────┴─────────────┴─────────────┴───────────┘

🔄 Performance Comparison

The compare command provides detailed analysis between any two git references.

Basic Comparison

# Compare current commit with previous
agenttest compare HEAD~1

# Compare specific commits
agenttest compare abc123 def456

# Compare branches
agenttest compare main feature-branch

Advanced Comparison Options

# Focus on specific evaluator
agenttest compare abc123 def456 --metric similarity

# Filter tests by name pattern
agenttest compare abc123 def456 --filter "summarization"

# Adjust sensitivity threshold
agenttest compare abc123 def456 --min-change 0.05

# Show detailed evaluator breakdown
agenttest compare abc123 def456 --detailed

# Include unchanged tests
agenttest compare abc123 def456 --include-unchanged

# Export results to JSON
agenttest compare abc123 def456 --export comparison.json

Comparison Output

Summary Changes

📊 Comparing abc123 → def456
Base: abc123 (2024-06-26T14:38:33)
Target: def456 (2024-06-26T14:45:12)

📊 Overall Summary Changes:
┏━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┓
┃ Metric        ┃  Base ┃ Target ┃ Change ┃ % Change ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━┩
│ Pass Rate     │  0.75 │   0.85 │ +0.100 │   +13.3% │
│ Average Score │ 0.692 │  0.751 │ +0.059 │    +8.5% │
│ Total Tests   │     4 │      5 │     +1 │   +25.0% │
└───────────────┴───────┴────────┴────────┴──────────┘
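
The Change and % Change columns follow from the Base and Target values by simple arithmetic. A minimal sketch, assuming the percentage is taken relative to the base value (which matches the numbers shown); `metric_change` is a hypothetical helper, not part of AgentTest:

```python
from typing import Optional, Tuple


def metric_change(base: float, target: float) -> Tuple[float, Optional[float]]:
    """Return (absolute change, percent change relative to base).

    The percent change is None when the base is zero, since it is undefined.
    """
    change = target - base
    percent = (change / base * 100.0) if base != 0 else None
    return change, percent


change, percent = metric_change(0.75, 0.85)
print(f"{change:+.3f} {percent:+.1f}%")  # +0.100 +13.3%
```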

Test-Level Changes

🔍 Test Changes Overview
├── 📈 Improvements: 2
├── 📉 Regressions: 1
├── 🆕 New Tests: 1
└── 🗑️ Removed Tests: 0

📈 Improvements:
  • test_summarization: score: 0.734 → 0.856 (+0.122)
  • test_qa_accuracy: FAIL → PASS, score: 0.650 → 0.823 (+0.173)

📉 Regressions:
  • test_content_generation: score: 0.891 → 0.734 (-0.157)

🆕 New Tests:
  • test_new_feature: PASS, score: 0.912
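
One way to picture the grouping above is as a bucketing of per-test score deltas against a threshold such as `--min-change`. The sketch below is an assumption about the general idea, not AgentTest's actual logic: `classify_changes` is a hypothetical function, the default threshold is arbitrary, and it looks only at scores, ignoring pass/fail transitions.

```python
from typing import Dict, List


def classify_changes(
    base: Dict[str, float], target: Dict[str, float], min_change: float = 0.01
) -> Dict[str, List[str]]:
    """Bucket test names by how their score moved between two runs."""
    buckets: Dict[str, List[str]] = {
        "improvements": [],
        "regressions": [],
        "new_tests": [],
        "removed_tests": [],
        "unchanged": [],
    }
    for name in sorted(set(base) | set(target)):
        if name not in base:
            buckets["new_tests"].append(name)
        elif name not in target:
            buckets["removed_tests"].append(name)
        else:
            delta = target[name] - base[name]
            if delta >= min_change:
                buckets["improvements"].append(name)
            elif delta <= -min_change:
                buckets["regressions"].append(name)
            else:
                buckets["unchanged"].append(name)
    return buckets


base = {"test_summarization": 0.734, "test_qa_accuracy": 0.650,
        "test_content_generation": 0.891}
target = {"test_summarization": 0.856, "test_qa_accuracy": 0.823,
          "test_content_generation": 0.734, "test_new_feature": 0.912}
result = classify_changes(base, target, min_change=0.05)
print({key: len(names) for key, names in result.items()})
```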

Detailed Evaluator Analysis

agenttest compare abc123 def456 --detailed

🔍 Evaluator-Specific Changes:

  similarity:
    • test_summarization: 0.734 → 0.856 (+0.122)
    • test_qa_accuracy: 0.432 → 0.678 (+0.246)
    • test_content_generation: 0.891 → 0.734 (-0.157)

  llm_judge:
    • test_summarization: 0.823 → 0.867 (+0.044)
    • test_qa_accuracy: 0.712 → 0.845 (+0.133)
    • test_content_generation: 0.912 → 0.745 (-0.167)

🚀 Development Workflows

Pre-Commit Checks

# Check for regressions before committing
agenttest compare HEAD~1 HEAD --detailed

# Fail if significant regressions detected
agenttest compare HEAD~1 HEAD --min-change 0.05 | grep "📉 Regressions: 0" || exit 1

Feature Development

# Start feature branch
git checkout -b feature-new-capability

# Develop and test iteratively
agenttest run --verbose

# Compare with main branch before merge
agenttest compare main feature-new-capability --detailed

# Check specific evaluator performance
agenttest compare main feature-new-capability --metric llm_judge

Release Validation

# Compare release candidate with previous release
agenttest compare v1.2.0 v1.3.0-rc1 --detailed --export release-comparison.json

# Validate no regressions in core functionality
agenttest compare v1.2.0 v1.3.0-rc1 --filter "core" --min-change 0.02

🔧 CI/CD Integration

GitHub Actions

# .github/workflows/agent-tests.yml
name: Agent Tests

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Fetch full history for comparisons

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install agenttest

      - name: Run tests
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          agenttest run --ci --output results.json

      - name: Compare with main (for PRs)
        if: github.event_name == 'pull_request'
        run: |
          agenttest compare origin/main HEAD \
            --detailed \
            --export comparison.json \
            --min-change 0.02

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: test-results
          path: |
            results.json
            comparison.json

      - name: Comment PR with results
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const comparison = JSON.parse(fs.readFileSync('comparison.json', 'utf8'));

            const comment = `## 🧪 Agent Test Results

            ### Summary Changes
            - **Pass Rate**: ${comparison.summary_changes.pass_rate?.change ?? 'N/A'}
            - **Average Score**: ${comparison.summary_changes.average_score?.change ?? 'N/A'}

            ### Changes
            - 📈 Improvements: ${comparison.improvements.length}
            - 📉 Regressions: ${comparison.regressions.length}
            - 🆕 New Tests: ${comparison.new_tests.length}
            `;

            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: comment
            });

GitLab CI

# .gitlab-ci.yml
stages:
  - test
  - compare

agent_tests:
  stage: test
  script:
    - pip install agenttest
    - agenttest run --ci --output results.json
  artifacts:
    paths:
      - results.json
    expire_in: 1 week
  only:
    - main
    - develop
    - merge_requests

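compare_performance:
  stage: compare
  script:
    # Each job starts in a fresh environment, so install agenttest again
    - pip install agenttest
    # Make sure the target branch is available locally as origin/<branch>
    - git fetch origin "$CI_MERGE_REQUEST_TARGET_BRANCH_NAME"
    - agenttest compare "origin/$CI_MERGE_REQUEST_TARGET_BRANCH_NAME" "$CI_COMMIT_SHA" --detailed --export comparison.json
  artifacts:
    paths:
      - comparison.json
  only:
    - merge_requests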

📊 Performance Monitoring

Long-term Tracking

# Generate performance trend report
agenttest log --limit 50 --branch main > performance-history.txt

# Compare performance over time windows
agenttest compare $(git rev-parse HEAD~20) HEAD --detailed --export trend-analysis.json

Automated Monitoring Script

#!/bin/bash
# monitor-performance.sh

# Get current and previous commit
CURRENT=$(git rev-parse HEAD)
PREVIOUS=$(git rev-parse HEAD~1)

# Run comparison
agenttest compare "$PREVIOUS" "$CURRENT" \
  --detailed \
  --min-change 0.02 \
  --export comparison.json

# Check for significant regressions
REGRESSIONS=$(jq '.regressions | length' comparison.json)

if [ "$REGRESSIONS" -gt 0 ]; then
  echo "⚠️  Performance regressions detected!"
  agenttest compare "$PREVIOUS" "$CURRENT" --detailed
  exit 1
else
  echo "✅ No significant performance regressions detected"
fi
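
Where `jq` is not available, the same gate can be sketched in Python. This assumes only what the examples on this page already rely on, namely that the exported JSON has a `regressions` list; `load_comparison`, `count_regressions`, and `gate_exit_code` are hypothetical helper names.

```python
import json
from typing import Any, Dict


def load_comparison(path: str = "comparison.json") -> Dict[str, Any]:
    """Read an exported comparison file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def count_regressions(comparison: Dict[str, Any]) -> int:
    """Number of entries in the 'regressions' list (0 if the key is absent)."""
    return len(comparison.get("regressions", []))


def gate_exit_code(comparison: Dict[str, Any]) -> int:
    """Return 1 when any regression is present, otherwise 0."""
    regressions = count_regressions(comparison)
    if regressions > 0:
        print(f"{regressions} regression(s) detected")
        return 1
    print("No significant performance regressions detected")
    return 0
```

A CI step could then call `sys.exit(gate_exit_code(load_comparison()))` to fail the build on regressions.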

🔍 Debugging Performance Issues

Identify Problem Areas

# Focus on specific evaluator that's regressing
agenttest compare abc123 def456 --metric similarity --detailed

# Look at specific test patterns
agenttest compare abc123 def456 --filter "summarization" --detailed

# Find tests with largest score drops
agenttest compare abc123 def456 --min-change 0.1 --detailed

Historical Analysis

# Find when performance started declining
for commit in $(git rev-list --reverse HEAD~10..HEAD); do
  echo "Checking commit: $commit"
  agenttest compare HEAD~10 "$commit" --quiet | grep "Average Score"
done

# Bisect performance issues
git bisect start HEAD HEAD~10
# Use agenttest compare in bisect script
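
`git bisect run` decides whether a commit is good or bad from a script's exit status: 0 marks the commit good, 125 skips it, and any other code from 1 to 127 marks it bad. The sketch below shows only that mapping. How the average score is obtained for each commit is left open, and both `bisect_exit_code` and the threshold are hypothetical.

```python
from typing import Optional

# Exit codes understood by `git bisect run`
GOOD, BAD, SKIP = 0, 1, 125


def bisect_exit_code(average_score: Optional[float], threshold: float) -> int:
    """Map a commit's average score to a bisect verdict.

    None means the score could not be measured (for example, the test run
    crashed), so the commit is skipped instead of being judged.
    """
    if average_score is None:
        return SKIP
    return GOOD if average_score >= threshold else BAD
```

A bisect script would finish with `sys.exit(bisect_exit_code(score, threshold))` once it has measured `score` for the checked-out commit.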

🗂️ Data Storage

File Structure

.agenttest/results/
├── index.json                    # Master index of all runs
├── 20240626_144512_e1c83a6d.json # Individual test run results
├── 20240626_143833_95eadec3.json
└── 20240626_122144_7b2af91e.json
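
The result filenames in this listing appear to follow a `YYYYMMDD_HHMMSS_<short hash>.json` pattern. Assuming that pattern holds (it is inferred from the listing, not from a documented guarantee), a filename can be split back into its parts with a small hypothetical helper:

```python
from datetime import datetime
from typing import Tuple


def parse_result_filename(filename: str) -> Tuple[datetime, str]:
    """Split 'YYYYMMDD_HHMMSS_<hash>.json' into (run time, short commit hash)."""
    stem = filename.rsplit(".", 1)[0]          # drop the .json extension
    date_part, time_part, short_hash = stem.split("_")
    run_time = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return run_time, short_hash


run_time, short_hash = parse_result_filename("20240626_144512_e1c83a6d.json")
print(run_time.isoformat(), short_hash)  # 2024-06-26T14:45:12 e1c83a6d
```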

Index Format

{
  "runs": [
    {
      "timestamp": "2024-06-26T14:45:12.789012",
      "commit_hash": "e1c83a6d4f2b8a9c7e5d3f1a8b6c4e2d9f7a5b3c",
      "commit_hash_short": "e1c83a6d",
      "branch": "main",
      "summary": {
        "total_tests": 5,
        "passed": 5,
        "failed": 0,
        "pass_rate": 100.0,
        "average_score": 0.887
      },
      "filename": "20240626_144512_e1c83a6d.json"
    }
  ],
  "by_commit": {
    "e1c83a6d": [...],
    "95eadec3": [...]
  },
  "by_branch": {
    "main": [...],
    "feature-123": [...]
  }
}
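
Because the index keeps a summary for every run, simple trends can be read from it without opening the individual result files. A minimal sketch, assuming the index has been loaded into a dictionary with the `runs` layout shown above; `pass_rate_history` is a hypothetical helper:

```python
from typing import Any, Dict, List, Tuple


def pass_rate_history(index: Dict[str, Any], branch: str) -> List[Tuple[str, float]]:
    """Return (short commit hash, pass rate) pairs for one branch, oldest first."""
    runs = [run for run in index["runs"] if run["branch"] == branch]
    # ISO 8601 timestamps in the same format sort chronologically as strings
    runs.sort(key=lambda run: run["timestamp"])
    return [(run["commit_hash_short"], run["summary"]["pass_rate"]) for run in runs]
```

In practice the dictionary would come from something like `json.load(open(".agenttest/results/index.json"))`.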

Individual Result Format

{
  "timestamp": "2024-06-26T14:45:12.789012",
  "git_info": {
    "commit_hash": "e1c83a6d4f2b8a9c7e5d3f1a8b6c4e2d9f7a5b3c",
    "commit_hash_short": "e1c83a6d",
    "branch": "main",
    "author": "john.doe@example.com",
    "message": "Fix: Improve summarization accuracy"
  },
  "summary": {
    "total_tests": 5,
    "passed": 5,
    "failed": 0,
    "pass_rate": 100.0,
    "average_score": 0.887,
    "total_duration": 12.45
  },
  "test_results": [
    {
      "test_name": "test_summarization",
      "passed": true,
      "score": 0.856,
      "duration": 2.45,
      "evaluations": {
        "similarity": { "score": 0.834, "passed": true },
        "llm_judge": { "score": 0.878, "passed": true }
      }
    }
  ]
}
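
The per-evaluator scores under `evaluations` make it possible to see which evaluator is pulling a test's score down. A small sketch over one `test_results` entry in the format above; `weakest_evaluator` is a hypothetical helper:

```python
from typing import Any, Dict, Tuple


def weakest_evaluator(test_result: Dict[str, Any]) -> Tuple[str, float]:
    """Return the (evaluator name, score) pair with the lowest score for a test."""
    evaluations = test_result["evaluations"]
    name = min(evaluations, key=lambda key: evaluations[key]["score"])
    return name, evaluations[name]["score"]


entry = {
    "test_name": "test_summarization",
    "evaluations": {
        "similarity": {"score": 0.834, "passed": True},
        "llm_judge": {"score": 0.878, "passed": True},
    },
}
print(weakest_evaluator(entry))  # ('similarity', 0.834)
```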

🔧 Configuration

Enable git integration in configuration:

logging:
  git_aware: true # Enable git integration
  results_dir: '.agenttest/results' # Results storage location

Advanced Git Options

logging:
  git_aware: true
  results_dir: '.agenttest/results'
  git_config:
    track_author: true # Include commit author
    track_message: true # Include commit message
    track_changes: true # Include file changes
    max_history: 100 # Limit stored results
    cleanup_days: 30 # Auto-cleanup old results

🛠️ Best Practices

1. Commit Hygiene

  • Make atomic commits for better tracking
  • Use descriptive commit messages
  • Tag releases for easy comparison

2. Branch Strategy

  • Test feature branches before merging
  • Compare with target branch regularly
  • Use meaningful branch names

3. Performance Monitoring

  • Set up automated comparison checks
  • Monitor long-term trends
  • Investigate regressions quickly

4. CI/CD Integration

  • Include comparison in PR workflows
  • Fail builds on significant regressions
  • Generate comparison reports

5. Data Management

  • Regular cleanup of old results
  • Export important comparisons
  • Back up critical performance data