evaluation-metrics

Applies automatically when evaluating LLM performance. Ensures well-constructed eval datasets and correct metrics computation, and covers A/B testing, LLM-as-judge patterns, and experiment tracking.

$ Install

git clone https://github.com/majiayu000/claude-skill-registry /tmp/claude-skill-registry && cp -r /tmp/claude-skill-registry/skills/product/evaluation-metrics ~/.claude/skills/claude-skill-registry

Tip: run this command in your terminal to install the skill.