optimization-benchmark
Comprehensive performance benchmarking, regression detection, automated testing, and performance validation. Use for CI/CD quality gates, baseline comparisons, and systematic performance testing.
Installer
```bash
git clone https://github.com/vamseeachanta/workspace-hub /tmp/workspace-hub && cp -r /tmp/workspace-hub/.claude/skills/tools/optimization/optimization-benchmark ~/.claude/skills/workspace-hub/
```
Tip: Run this command in your terminal to install the skill.
SKILL.md
```yaml
name: optimization-benchmark
description: Comprehensive performance benchmarking, regression detection, automated testing, and performance validation. Use for CI/CD quality gates, baseline comparisons, and systematic performance testing.
```
Benchmark Suite Skill
Overview
This skill provides automated performance testing, including benchmark execution, regression detection, performance validation, and quality assessment, to help keep system performance within target.
When to Use
- Running performance benchmark suites before deployment
- Detecting performance regressions between versions
- Validating SLA compliance through automated testing
- Load, stress, and endurance testing
- CI/CD pipeline performance quality gates
- Comparing performance across configurations
Quick Start
```bash
# Run comprehensive benchmark suite
npx claude-flow benchmark-run --suite comprehensive --duration 300

# Execute specific benchmark
npx claude-flow benchmark-run --suite throughput --iterations 10

# Compare with baseline
npx claude-flow benchmark-compare --current <results> --baseline <baseline>

# Quality assessment
npx claude-flow quality-assess --target swarm-performance --criteria throughput,latency

# Performance validation
npx claude-flow validate-performance --results <file> --criteria <file>
```
Architecture
```
+-----------------------------------------------------------+
|                      Benchmark Suite                      |
+-----------------------------------------------------------+
|  Benchmark Runner  |  Regression Detector  |  Validator   |
+--------------------+-----------------------+--------------+
         |                      |                    |
         v                      v                    v
+------------------+  +-------------------+  +--------------+
| Benchmark Types  |  | Detection Methods |  | Validation   |
| - Throughput     |  | - Statistical     |  | - SLA        |
| - Latency        |  | - ML-based        |  | - Regression |
| - Scalability    |  | - Threshold       |  | - Scalability|
| - Coordination   |  | - Trend Analysis  |  | - Reliability|
+------------------+  +-------------------+  +--------------+
         |                      |                    |
         v                      v                    v
+-----------------------------------------------------------+
|                   Reporter & Comparator                   |
+-----------------------------------------------------------+
```
Benchmark Types
Standard Benchmarks
| Benchmark | Metrics | Duration | Targets |
|---|---|---|---|
| Throughput | requests/sec, tasks/sec, messages/sec | 5 min | min: 1000, optimal: 5000 |
| Latency | p50, p90, p95, p99, max | 5 min | p50<100ms, p99<1s |
| Scalability | linear coefficient, efficiency retention | variable | coefficient>0.8 |
| Coordination | message latency, sync time | 5 min | <50ms |
| Fault Tolerance | recovery time, failover success | 10 min | <30s recovery |
Test Campaign Types
- Load Testing: Gradual ramp-up to sustained load
- Stress Testing: Find breaking points
- Volume Testing: Large data set handling
- Endurance Testing: Long-duration stability
- Spike Testing: Sudden load changes
- Configuration Testing: Different settings comparison
Core Capabilities
1. Comprehensive Benchmarking
```javascript
// Run benchmark suite
const results = await benchmarkSuite.run({
  duration: 300000,      // 5 minutes
  iterations: 10,        // 10 iterations
  warmupTime: 30000,     // 30 seconds warmup
  cooldownTime: 10000,   // 10 seconds cooldown
  parallel: false,       // Sequential execution
  baseline: previousRun  // Compare with baseline
});

// Results include:
// - summary: Overall scores and status
// - detailed: Per-benchmark results
// - baseline_comparison: Delta from baseline
// - recommendations: Optimization suggestions
```
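A minimal sketch of consuming these results, using the field names listed above; the inner shapes of `summary` and `recommendations` are assumptions for illustration, not a documented schema:

```javascript
// Gate on the benchmark outcome and surface suggestions.
// Assumes results.summary has { status, score } and
// results.recommendations is an array of strings (illustrative shape).
function reportBenchmark(results) {
  const { summary, baseline_comparison, recommendations } = results;

  console.log(`Status: ${summary.status}, score: ${summary.score}`);

  if (baseline_comparison) {
    console.log('Delta from baseline:', baseline_comparison);
  }

  for (const tip of recommendations ?? []) {
    console.log(`Recommendation: ${tip}`);
  }

  return summary.status === 'pass';
}
```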
2. Regression Detection
Multi-algorithm detection:
| Method | Description | Use Case |
|---|---|---|
| Statistical | CUSUM change point detection | Detect gradual degradation |
| Machine Learning | Anomaly detection models | Identify unusual patterns |
| Threshold | Fixed limit comparisons | Hard performance limits |
| Trend | Time series regression | Long-term degradation |
```bash
# Detect performance regressions
npx claude-flow detect-regression --current <results> --historical <data>

# Set up automated regression monitoring
npx claude-flow regression-monitor --enable --sensitivity 0.95
```
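For intuition, here is a self-contained sketch of the statistical (CUSUM) method from the table above. This is the textbook algorithm applied to latency samples, not claude-flow's internal implementation:

```javascript
// One-sided CUSUM: flag the point where cumulative drift above the
// baseline mean exceeds a decision threshold h.
// k is the allowed slack per sample (often ~0.5 standard deviations).
function cusumUpper(samples, mean, k, h) {
  let s = 0;
  for (let i = 0; i < samples.length; i++) {
    s = Math.max(0, s + (samples[i] - mean - k));
    if (s > h) return i; // change point: sustained upward shift
  }
  return -1; // no regression detected
}

// Example: p99 latency (ms) drifting upward partway through the series
const baseline = 100;
const latencies = [98, 101, 99, 102, 100, 140, 138, 145, 150, 142];
console.log(cusumUpper(latencies, baseline, 5, 50)); // -> 6 (index of detected shift)
```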
3. Automated Performance Testing
```javascript
// Execute test campaign
const campaign = await tester.runTestCampaign({
  tests: [
    { type: 'load', config: loadTestConfig },
    { type: 'stress', config: stressTestConfig },
    { type: 'endurance', config: enduranceConfig }
  ],
  constraints: {
    maxDuration: 3600000, // 1 hour max
    failFast: true        // Stop on first failure
  }
});
```
4. Performance Validation
Validation framework with multi-criteria assessment:
| Validation Type | Criteria |
|---|---|
| SLA Validation | Availability, response time, throughput, error rate |
| Regression Validation | Comparison with historical data |
| Scalability Validation | Linear scaling, efficiency retention |
| Reliability Validation | Error handling, recovery, consistency |
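A minimal sketch of what an SLA criteria object and check could look like, using the criteria named in the table; the field names and bounds here are illustrative assumptions, not a fixed claude-flow schema:

```javascript
// Illustrative SLA criteria matching the table above.
const slaCriteria = {
  availability: { min: 0.999 },        // fraction of successful probes
  response_time_p99_ms: { max: 1000 }, // 99th percentile latency
  throughput_rps: { min: 1000 },       // requests per second
  error_rate: { max: 0.005 }           // 0.5% errors
};

// Compare measured values against min/max bounds.
function validateSla(measured, criteria) {
  const violations = [];
  for (const [name, bound] of Object.entries(criteria)) {
    const value = measured[name];
    if (bound.min !== undefined && value < bound.min) {
      violations.push(`${name}: ${value} < min ${bound.min}`);
    }
    if (bound.max !== undefined && value > bound.max) {
      violations.push(`${name}: ${value} > max ${bound.max}`);
    }
  }
  return { passed: violations.length === 0, violations };
}
```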
MCP Integration
```javascript
// Comprehensive benchmark integration
const benchmarkIntegration = {
  // Execute performance benchmarks
  async runBenchmarks(config = {}) {
    const [benchmark, metrics, trends, cost] = await Promise.all([
      mcp.benchmark_run({ suite: config.suite || 'comprehensive' }),
      mcp.metrics_collect({ components: ['system', 'agents', 'coordination'] }),
      mcp.trend_analysis({ metric: 'performance', period: '24h' }),
      mcp.cost_analysis({ timeframe: '24h' })
    ]);
    return { benchmark, metrics, trends, cost, timestamp: Date.now() };
  },

  // Quality assessment
  async assessQuality(criteria) {
    return await mcp.quality_assess({
      target: 'swarm-performance',
      criteria: criteria || ['throughput', 'latency', 'reliability', 'scalability']
    });
  }
};
```
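A usage sketch, assuming the `mcp` client referenced above is already in scope:

```javascript
// Collect a full report, then run a focused quality assessment.
const report = await benchmarkIntegration.runBenchmarks({ suite: 'comprehensive' });
const quality = await benchmarkIntegration.assessQuality(['throughput', 'latency']);
console.log(report.timestamp, quality);
```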
Key Metrics
Benchmark Targets
```javascript
const benchmarkTargets = {
  throughput: {
    requests_per_second: { min: 1000, optimal: 5000 },
    tasks_per_second: { min: 100, optimal: 500 },
    messages_per_second: { min: 10000, optimal: 50000 }
  },
  latency: {
    p50: { max: 100 },   // 100ms
    p90: { max: 200 },   // 200ms
    p95: { max: 500 },   // 500ms
    p99: { max: 1000 },  // 1s
    max: { max: 5000 }   // 5s
  },
  scalability: {
    linear_coefficient: { min: 0.8 },
    efficiency_retention: { min: 0.7 }
  }
};
```
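One way to use the min/optimal pairs is a normalized score. A sketch follows; the linear scoring rule is an assumption for illustration, not a documented claude-flow formula:

```javascript
// Score a measured value against a { min, optimal } target:
// 0 below min, 1 at or above optimal, linear in between.
function scoreAgainstTarget(value, { min, optimal }) {
  if (value < min) return 0;
  if (value >= optimal) return 1;
  return (value - min) / (optimal - min);
}

// Example: 3000 req/s against the throughput target above
const t = benchmarkTargets.throughput.requests_per_second;
console.log(scoreAgainstTarget(3000, t).toFixed(2)); // "0.50"
```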
CI/CD Quality Gates
| Gate | Criteria | Action on Failure |
|---|---|---|
| Performance | < 10% degradation | Block deployment |
| Latency | p99 < 1s | Warning |
| Error Rate | < 0.5% | Block deployment |
| Scalability | > 80% linear | Warning |
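A sketch of how the gate table could translate into pass/warn/block decisions; the gate names and the shape of `results` are illustrative assumptions:

```javascript
// Map each gate from the table to a failure check and an action.
const gates = [
  { name: 'performance', failed: (r) => r.degradation_pct >= 10, action: 'block' },
  { name: 'latency',     failed: (r) => r.p99_ms >= 1000,        action: 'warn'  },
  { name: 'error_rate',  failed: (r) => r.error_rate >= 0.005,   action: 'block' },
  { name: 'scalability', failed: (r) => r.linear_pct <= 80,      action: 'warn'  }
];

function evaluateGates(results) {
  const outcome = { block: [], warn: [] };
  for (const gate of gates) {
    if (gate.failed(results)) outcome[gate.action].push(gate.name);
  }
  return outcome; // deployment blocked if outcome.block is non-empty
}
```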
Load Testing Example
```javascript
// Load test with gradual ramp-up
const loadTest = {
  type: 'load',
  phases: [
    { phase: 'ramp-up', duration: 60000, startLoad: 10, endLoad: 100 },
    { phase: 'sustained', duration: 300000, load: 100 },
    { phase: 'ramp-down', duration: 30000, startLoad: 100, endLoad: 0 }
  ],
  successCriteria: {
    p99_latency: { max: 1000 },
    error_rate: { max: 0.01 },
    throughput: { min: 80 } // % of expected
  }
};
```
Stress Testing Example
```javascript
// Stress test to find breaking point
const stressTest = {
  type: 'stress',
  startLoad: 100,
  maxLoad: 10000,
  loadIncrement: 100,
  duration: 60000,               // Per load level
  breakingCriteria: {
    error_rate: { max: 0.05 },   // 5% errors
    latency_p99: { max: 5000 },  // 5s latency
    timeout_rate: { max: 0.10 }  // 10% timeouts
  }
};
```
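To make the breakingCriteria semantics concrete, here is a minimal driver loop; `measureAtLoad` is a hypothetical placeholder for an actual load generator, not a claude-flow API:

```javascript
// Increase load until any breaking criterion is exceeded and
// report the last level that passed.
async function findBreakingPoint(test, measureAtLoad) {
  let lastGoodLoad = null;
  for (let load = test.startLoad; load <= test.maxLoad; load += test.loadIncrement) {
    // measureAtLoad(load, duration) -> { error_rate, latency_p99, timeout_rate }
    const m = await measureAtLoad(load, test.duration);
    const broken =
      m.error_rate   > test.breakingCriteria.error_rate.max ||
      m.latency_p99  > test.breakingCriteria.latency_p99.max ||
      m.timeout_rate > test.breakingCriteria.timeout_rate.max;
    if (broken) {
      return { breakingLoad: load, lastGoodLoad };
    }
    lastGoodLoad = load;
  }
  return { breakingLoad: null, lastGoodLoad }; // never broke within maxLoad
}
```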
Integration Points
| Integration | Purpose |
|---|---|
| Performance Monitor | Continuous monitoring data for benchmarking |
| Load Balancer | Validates load balancing effectiveness |
| Topology Optimizer | Tests topology configurations |
| CI/CD Pipeline | Automated quality gates |
Best Practices
- Consistent Environment: Run benchmarks in consistent, isolated environments
- Warmup Period: Always include warmup to eliminate cold-start effects
- Multiple Iterations: Run multiple iterations for statistical significance (see the sketch after this list)
- Baseline Maintenance: Keep baseline updated with expected performance
- Historical Tracking: Store all benchmark results for trend analysis
- Realistic Workloads: Use production-like workload patterns
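For the multiple-iterations practice, a sketch of summarizing per-iteration results with a mean and an approximate 95% confidence interval (normal approximation; the example values are made up):

```javascript
// Summarize repeated iterations: mean, sample stddev, and an
// approximate 95% confidence interval (z = 1.96).
function summarizeIterations(values) {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const variance =
    values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / (n - 1);
  const stddev = Math.sqrt(variance);
  const halfWidth = 1.96 * stddev / Math.sqrt(n);
  return { mean, stddev, ci95: [mean - halfWidth, mean + halfWidth] };
}

// Example: throughput (req/s) across 10 iterations
console.log(summarizeIterations([4820, 4910, 4750, 4990, 4880, 4805, 4930, 4860, 4795, 4905]));
```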
Example: CI/CD Integration
```bash
#!/bin/bash
# ci-performance-gate.sh

# Run benchmark suite
RESULTS=$(npx claude-flow benchmark-run --suite quick --output json)

# Compare with baseline
COMPARISON=$(npx claude-flow benchmark-compare \
  --current "$RESULTS" \
  --baseline ./baseline.json)

# Check for regressions
if echo "$COMPARISON" | jq -e '.regression_detected == true' > /dev/null; then
  echo "Performance regression detected!"
  echo "$COMPARISON" | jq '.regressions'
  exit 1
fi

echo "Performance validation passed"
exit 0
```
Related Skills
- optimization-monitor - Real-time performance monitoring
- optimization-analyzer - Bottleneck analysis and reporting
- optimization-load-balancer - Load distribution optimization
- optimization-topology - Topology performance testing
Version History
- 1.0.0 (2026-01-02): Initial release - converted from benchmark-suite agent with comprehensive benchmarking, regression detection, automated testing, and performance validation