ai-agents
Production-grade AI agent patterns with MCP integration, agentic RAG, handoff orchestration, multi-layer guardrails, and observability (modern best practices)
$ Install
git clone https://github.com/vasilyu1983/AI-Agents-public /tmp/AI-Agents-public && cp -r /tmp/AI-Agents-public/frameworks/claude-code-kit/framework/skills/ai-agents ~/.claude/skills/AI-Agents-public
Tip: Run this command in your terminal to install the skill.
name: ai-agents
description: Production-grade AI agent patterns with MCP integration, agentic RAG, handoff orchestration, multi-layer guardrails, and observability (modern best practices)
AI Agents Development — Production Skill Hub
Modern Best Practices: MCP-based tool integration, agentic RAG, handoff-first orchestration, multi-layer guardrails, LangGraph workflows, OpenTelemetry observability, and human-in-the-loop controls.
This skill provides production-ready operational patterns for designing, building, evaluating, and deploying AI agents. It centralizes procedures, checklists, decision rules, and templates used across RAG agents, tool-using agents, OS agents, and multi-agent systems.
No theory. No narrative. Only what Claude can execute.
When to Use This Skill
Claude should activate this skill whenever the user asks for:
- Designing an agent (LLM-based, tool-based, OS-based, or multi-agent).
- Scoping capability maturity and rollout risk for new agent behaviors.
- Creating action loops, plans, workflows, or delegation logic.
- Writing tool definitions, MCP tools, schemas, or validation logic.
- Generating RAG pipelines, retrieval modules, or context injection.
- Building memory systems (session, long-term, episodic, task).
- Creating evaluation harnesses, observability plans, or safety gates.
- Preparing CI/CD, rollout, deployment, or production operational specs.
- Producing any template in /resources/ or /templates/.
- Implementing MCP servers or integrating Model Context Protocol.
- Setting up agent handoffs and orchestration patterns.
- Configuring multi-layer guardrails and safety controls.
- For prompt scaffolds, retrieval tuning, or security depth, see Scope Boundaries below.
Scope Boundaries (Use These Skills for Depth)
- Prompt scaffolds & structured outputs → ai-prompt-engineering
- RAG retrieval & chunking → ai-rag
- Search tuning (BM25/HNSW/hybrid) → ai-rag
- Security/guardrails → ai-mlops
- Inference optimization → ai-llm-inference
Quick Reference
| Agent Type | Capability | Frameworks | MCP/A2A | When to Use |
|---|---|---|---|---|
| RAG Agent | Knowledge-grounded responses | LangChain, LlamaIndex | MCP for tools | Answering questions from knowledge base |
| Tool-Using | API/function calls | LangGraph, AutoGen | MCP for tools | External actions (search, DB, APIs) |
| Multi-Agent | Task delegation, collaboration | CrewAI, AutoGen, ADK | A2A for handoffs | Complex workflows requiring specialization |
| OS Agent | Computer/browser control | Anthropic Computer Use | MCP for system | Desktop automation, web browsing |
| Agentic RAG | Dynamic multi-step retrieval | Custom (ReAct + RAG) | MCP for data | Complex queries requiring iterative search |
| Planning Agent | Strategic decomposition | LangGraph (ReAct/Plan-Execute) | A2A for delegation | Multi-step problems, long-horizon tasks |
| Code/SWE Agent | Autonomous coding, PR creation | HyperAgent, Devin, Claude Code | MCP for git/fs | Issue resolution, feature implementation |
Decision Tree: Choosing Agent Architecture
What does the agent need to do?
├─ Answer questions from knowledge base?
│ ├─ Simple lookup? → RAG Agent (LangChain/LlamaIndex + vector DB)
│ └─ Complex multi-step? → Agentic RAG (iterative retrieval + reasoning)
│
├─ Perform external actions (APIs, tools, functions)?
│ ├─ 1-3 tools, linear flow? → Tool-Using Agent (LangGraph + MCP)
│ └─ Complex workflows, branching? → Planning Agent (ReAct/Plan-Execute)
│
├─ Write/modify code autonomously?
│ ├─ Single file edits? → Tool-Using Agent with code tools
│ └─ Multi-file, issue resolution? → Code/SWE Agent (HyperAgent pattern)
│
├─ Delegate tasks to specialists?
│ ├─ Fixed workflow? → Multi-Agent Sequential (A → B → C)
│ ├─ Manager-Worker? → Multi-Agent Hierarchical (Manager + Workers)
│ └─ Dynamic routing? → Multi-Agent Group Chat (collaborative)
│
├─ Control desktop/browser?
│ └─ OS Agent (Anthropic Computer Use + MCP for system access)
│
└─ Hybrid (combination of above)?
   └─ Planning Agent that coordinates:
      - Tool-using for actions (MCP)
      - RAG for knowledge (MCP)
      - Multi-agent for delegation (A2A)
      - Code agents for implementation
Protocol Selection:
- Use MCP for: Tool access, data retrieval, single-agent integration
- Use A2A for: Agent-to-agent handoffs, multi-agent coordination, task delegation
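The decision tree and protocol rules above can be read as a lookup table. The following is a minimal sketch under that reading, not part of the skill itself: the function name `choose_architecture`, the `goal`/`detail` keys, and the fallback to the hybrid branch are illustrative assumptions. The protocol column is taken from the Quick Reference table.

```python
# Hypothetical encoding of the decision tree: (goal, detail) -> (architecture, protocol).
# Keys such as "answer" or "multi-step" are illustrative labels, not names from the skill.
DECISION_TREE = {
    ("answer", "simple"): ("RAG Agent", "MCP"),
    ("answer", "multi-step"): ("Agentic RAG", "MCP"),
    ("act", "linear"): ("Tool-Using Agent", "MCP"),
    ("act", "branching"): ("Planning Agent", "A2A"),
    ("code", "single-file"): ("Tool-Using Agent with code tools", "MCP"),
    ("code", "multi-file"): ("Code/SWE Agent", "MCP"),
    ("delegate", "fixed"): ("Multi-Agent Sequential", "A2A"),
    ("delegate", "manager-worker"): ("Multi-Agent Hierarchical", "A2A"),
    ("delegate", "dynamic"): ("Multi-Agent Group Chat", "A2A"),
    ("os", "any"): ("OS Agent", "MCP"),
}


def choose_architecture(goal: str, detail: str = "any") -> tuple:
    """Return (architecture, protocol) for a need; unknown combinations go to the hybrid branch."""
    try:
        return DECISION_TREE[(goal, detail)]
    except KeyError:
        # Assumption: anything the tree does not cover is treated as "Hybrid".
        return ("Planning Agent (hybrid coordinator)", "MCP + A2A")


print(choose_architecture("answer", "multi-step"))
```

Keeping the tree as data rather than nested conditionals makes it easy to review against the table when a branch changes.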
Navigation: Core Concepts & Patterns
Governance & Maturity
- Agent Maturity & Governance - resources/agent-maturity-governance.md
  - Capability maturity levels (L0-L4)
  - Identity & policy enforcement
  - Fleet control and registry management
  - Deprecation rules and kill switches
Modern Best Practices
- Modern Best Practices - resources/modern-best-practices.md
  - Model Context Protocol (MCP)
  - Agent-to-Agent Protocol (A2A)
  - Agentic RAG (Dynamic Retrieval)
  - Multi-layer guardrails
  - LangGraph over LangChain
  - OpenTelemetry for agents
Context Management
- Context Engineering - resources/context-engineering.md
  - Progressive disclosure
  - Session management
  - Memory provenance
  - Retrieval timing
  - Multimodal context
Core Operational Patterns
- Operational Patterns - resources/operational-patterns.md
  - Agent loop pattern (PLAN → ACT → OBSERVE → UPDATE)
  - OS agent action loop
  - RAG pipeline pattern
  - Tool specification
  - Memory system pattern
  - Multi-agent workflow
  - Safety & guardrails
  - Observability
  - Evaluation patterns
  - Deployment & CI/CD
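The agent loop pattern (PLAN → ACT → OBSERVE → UPDATE) is detailed in resources/operational-patterns.md, which is not reproduced here. As a rough illustration only, the loop might look like the sketch below. `AgentState`, `run_agent_loop`, the planner signature, and the step cap are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical state container: the goal plus a history of (step, observation) pairs."""
    goal: str
    history: list = field(default_factory=list)
    done: bool = False


def run_agent_loop(state, plan, tools, max_steps=5):
    """Run PLAN -> ACT -> OBSERVE -> UPDATE until the planner returns None or the step cap is hit."""
    for _ in range(max_steps):
        step = plan(state)                       # PLAN: next (tool_name, argument), or None to stop
        if step is None:
            state.done = True
            break
        tool_name, arg = step
        try:
            observation = tools[tool_name](arg)  # ACT: call the chosen tool
        except Exception as exc:                 # OBSERVE: failures become observations too
            observation = f"error: {exc}"
        state.history.append((step, observation))  # UPDATE: record the result for the next plan
    return state


def demo_plan(state):
    # Hypothetical planner: look something up once, then stop.
    return None if state.history else ("lookup", state.goal)


tools = {"lookup": lambda query: f"result for {query}"}
final = run_agent_loop(AgentState(goal="refund policy"), demo_plan, tools)
print(final.done, final.history)
```

The step cap is the simplest possible bound on a runaway loop. A production loop would also bound wall-clock time and cost.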
Navigation: Protocol Implementation
- MCP Practical Guide - resources/mcp-practical-guide.md
  Building MCP servers, tool integration, and standardized data access
- MCP Server Builder - resources/mcp-server-builder.md
  End-to-end checklist for workflow-focused MCP servers (design → build → test)
- A2A Handoff Patterns - resources/a2a-handoff-patterns.md
  Agent-to-agent communication, task delegation, and coordination protocols
- Protocol Decision Tree - resources/protocol-decision-tree.md
  When to use MCP vs A2A, decision framework, and selection criteria
Navigation: Agent Capabilities
- Agent Operations - resources/agent-operations-best-practices.md
  Action loops, planning, observation, and execution patterns
- RAG Patterns - resources/rag-patterns.md
  Contextual retrieval, agentic RAG, and hybrid search strategies
- Memory Systems - resources/memory-systems.md
  Session, long-term, episodic, and task memory architectures
- Tool Design & Validation - resources/tool-design-specs.md
  Tool schemas, validation, error handling, and MCP integration
Skill Packaging & Sharing
- Skill Lifecycle - resources/skill-lifecycle.md
  Scaffold, validate, package, and share Claude skills with teams (Slack-ready)
- API Contracts for Agents - resources/api-contracts-for-agents.md
  Request/response envelopes, safety gates, streaming/async patterns, error taxonomy
- Multi-Agent Patterns - resources/multi-agent-patterns.md
  Manager-worker, sequential, handoff, and group chat orchestration
- OS Agent Capabilities - resources/os-agent-capabilities.md
  Desktop automation, UI grounding, and computer use patterns
- Code/SWE Agents - resources/code-swe-agents.md
  SE 3.0 paradigm, autonomous coding patterns, SWE-Bench, HyperAgent architecture
Navigation: Production Operations
- Evaluation & Observability - resources/evaluation-and-observability.md
  OpenTelemetry GenAI, metrics, LLM-as-judge, and monitoring
- Deployment, CI/CD & Safety - resources/deployment-ci-cd-and-safety.md
  Multi-layer guardrails, HITL controls, NIST AI RMF, production checklists
Navigation: Templates (Copy-Paste Ready)
Core Agent Templates
- Standard Agent Template - templates/core/agent-template-standard.md
  Full production spec: memory, tools, RAG, evaluation, observability, safety
- Specialized Agent Template - templates/core/agent-template-specialized.md
  Domain-specific agents with custom capabilities and constraints
- Quick Agent Template - templates/core/agent-template-quick.md
  Minimal viable agent for rapid prototyping
RAG Templates
- Basic RAG - templates/rag/rag-basic.md
  Simple retrieval-augmented generation pipeline
- Advanced RAG - templates/rag/rag-advanced.md
  Contextual retrieval, reranking, and agentic RAG patterns
- Hybrid Retrieval - templates/rag/hybrid-retrieval.md
  Semantic + keyword search with BM25 fusion
Tool Templates
- Tool Definition - templates/tools/tool-definition.md
  MCP-compatible tool schemas with validation and error handling
- Tool Validation Checklist - templates/tools/tool-validation-checklist.md
  Testing, security, and production readiness checks
Multi-Agent Templates
- Manager-Worker Template - templates/multi-agent/manager-worker-template.md
  Orchestration pattern with task delegation and result aggregation
- Evaluator-Router Template - templates/multi-agent/evaluator-router-template.md
  Dynamic routing with quality assessment and domain classification
Service Layer Templates
- FastAPI Agent Service - ../dev-api-design/templates/fastapi/fastapi-complete-api.md
  Auth, pagination, validation, error handling; extend with model lifespan loads, SSE, background tasks
External Sources Metadata
- Curated References - data/sources.json
  95 authoritative sources across 13 categories including arXiv research papers and Code/SWE agents
Shared Utilities (Centralized patterns — extract, don't duplicate)
- ../_shared/utilities/llm-utilities.md — Token counting, streaming, cost estimation
- ../_shared/utilities/error-handling.md — Effect Result types, correlation IDs
- ../_shared/utilities/resilience-utilities.md — p-retry v6, circuit breaker for API calls
- ../_shared/utilities/logging-utilities.md — pino v9 + OpenTelemetry integration
- ../_shared/utilities/observability-utilities.md — OpenTelemetry SDK, tracing, metrics
- ../_shared/utilities/testing-utilities.md — Test factories, fixtures, mocks
- ../_shared/resources/code-quality-operational-playbook.md — Canonical coding rules & LLM code review
Related Skills
This skill integrates with complementary Claude Code skills:
Core Dependencies
- ../ai-llm/ - LLM patterns, prompt engineering, and model selection for agents
- ../ai-rag/ - Deep RAG implementation: chunking, embedding, reranking
- ../ai-prompt-engineering/ - System prompt design, few-shot patterns, reasoning strategies
Production & Operations
- ../qa-observability/ - OpenTelemetry, metrics, distributed tracing
- ../software-security-appsec/ - OWASP Top 10, input validation, secure tool design
- ../ops-devops-platform/ - CI/CD pipelines, deployment strategies, infrastructure
Supporting Patterns
- ../dev-api-design/ - REST/GraphQL design for agent APIs and tool interfaces
- ../ai-mlops/ - Model deployment, monitoring, drift detection
- ../qa-debugging/ - Agent debugging, error analysis, root cause investigation
Usage pattern: Start here for agent architecture, then reference specialized skills for deep implementation details.
Usage Notes for Claude
- Modern Standards: Default to MCP for tools, agentic RAG for retrieval, handoff-first for multi-agent
- Lightweight SKILL.md: Use this file for quick reference and navigation
- Drill-down resources: Reference detailed resources for implementation guidance
- Copy-paste templates: Use templates when the user asks for structured artifacts
- External sources: Reference data/sources.json for authoritative documentation links
- No theory: Never include theoretical explanations; only operational steps
Key Modern Migrations
Traditional → Modern:
- Custom APIs → Model Context Protocol (MCP)
- Static RAG → Agentic RAG with contextual retrieval
- Ad-hoc handoffs → Versioned handoff APIs with JSON Schema
- Single guardrail → Multi-layer defense (5+ layers)
- LangChain agents → LangGraph stateful workflows
- Custom observability → OpenTelemetry GenAI standards
- Model-centric → Context engineering-centric
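To make the "versioned handoff APIs with JSON Schema" migration concrete, here is a minimal sketch of a handoff envelope check. It uses hand-rolled field checks from the standard library as a stand-in for a real JSON Schema validator. The field names, `HANDOFF_VERSION`, and the major-version rule are assumptions for illustration, not the contract defined in resources/a2a-handoff-patterns.md.

```python
import json

HANDOFF_VERSION = "1.0"  # hypothetical contract version

# Required fields and their Python types; a stand-in for a JSON Schema document.
REQUIRED_FIELDS = {
    "version": str,
    "from_agent": str,
    "to_agent": str,
    "task": str,
    "context": dict,
}


def validate_handoff(raw: str) -> dict:
    """Parse a JSON handoff message and raise ValueError for anything off-contract."""
    msg = json.loads(raw)  # malformed JSON raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(msg, dict):
        raise ValueError("handoff must be a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in msg:
            raise ValueError(f"missing field: {name}")
        if not isinstance(msg[name], expected):
            raise ValueError(f"wrong type for field: {name}")
    unknown = set(msg) - set(REQUIRED_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # Assumption: only the major version must match for the receiver to accept the message.
    if msg["version"].split(".")[0] != HANDOFF_VERSION.split(".")[0]:
        raise ValueError(f"unsupported major version: {msg['version']}")
    return msg


example = json.dumps({
    "version": "1.0",
    "from_agent": "triage",
    "to_agent": "billing",
    "task": "refund order",
    "context": {"order_id": "A-1"},
})
print(validate_handoff(example)["to_agent"])
```

Rejecting unknown fields is a deliberately strict choice. A more lenient contract could ignore them so that senders can add fields ahead of a minor version bump.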
AI-Native SDLC Pattern (Delegate → Review → Own)
- Plan: Have the agent draft PLAN.md or use a planning tool; require code-path trace, dependency map, and risk/edge-case list before build starts.
- Design: Convert mocks to components; enforce design tokens/style guides; surface accessibility gaps; keep MCP-linked component libraries in context.
- Build: Let the agent scaffold end-to-end (models/APIs/UI/tests/docs); enforce long-run guardrails (time cap, allowed commands/tools, commit/PR gating, kill switch).
- Test: Demand failing test first; agent generates and runs suites; require coverage deltas and flaky-test notes; human reviews assertions and fixtures.
- Review: Agent runs first-pass review tuned for P0/P1; human focuses on architecture, performance, safety, and migration risk; always own final merge.
- Document: Agent drafts PR summaries, module/file notes, and mermaid diagrams; require doc updates in the same run; human adds “why” and approvals.
- Deploy & Maintain: Agent links logs/metrics via MCP for triage; propose hotfixes with rollback plans; human approves rollouts; track drift/regressions with evals.
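The long-run guardrails named in the Build step (time cap, allowed commands, kill switch) can be sketched as a single check that runs before every command the agent issues. `RunGuard` and its methods are hypothetical names. The allowlist here matches only the first token of a command, which is a deliberately naive assumption, and commit/PR gating is not covered.

```python
import time


class RunGuard:
    """Hypothetical guard consulted before each command during a long agent run."""

    def __init__(self, allowed_commands, max_seconds, clock=time.monotonic):
        self.allowed = set(allowed_commands)
        self.max_seconds = max_seconds
        self.clock = clock           # injectable so tests can control time
        self.started = clock()
        self.killed = False

    def kill(self):
        """Kill switch: every later check fails until a new guard is created."""
        self.killed = True

    def check(self, command: str) -> None:
        """Raise if the command must not run; return None if it may."""
        if self.killed:
            raise PermissionError("kill switch engaged")
        if self.clock() - self.started > self.max_seconds:
            raise TimeoutError("time cap exceeded")
        parts = command.split()
        # Naive allowlist: only the executable name (first token) is inspected.
        if not parts or parts[0] not in self.allowed:
            raise PermissionError(f"command not allowed: {command!r}")


guard = RunGuard({"pytest", "git"}, max_seconds=3600)
guard.check("pytest -q")  # passes silently
```

A real deployment would probably inspect full argument lists (for example, allowing `git status` but not `git push`) and enforce the cap outside the agent process as well.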
Executive Briefing (Optional)
- Value: Coding agents compress SDLC time; delegate mechanical work, keep humans on intent/architecture; measurable gains come from tight guardrails plus eval loops.
- Cost & Risk: Training vs inference economics; long runs need caps/kill switches; data/secret handling and supply-chain policies stay human-owned.
- Governance: Multi-layer guardrails (policy prompt, tool allowlist, auth scopes, eval gates, audit logs); require human sign-off for deploys and safety-sensitive changes.
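The Governance bullet lists five layers: policy prompt, tool allowlist, auth scopes, eval gates, and audit logs. A minimal sketch of how the last four might be chained in code follows. The policy prompt lives in the system prompt and is not modeled. All names and values here (`ToolCall`, `authorize`, the allowlist, the scope strings, the 0.8 threshold) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    agent: str
    tool: str
    scopes: frozenset   # auth scopes the caller holds
    eval_score: float   # latest eval score for this agent build, 0..1


# Hypothetical policy data
TOOL_ALLOWLIST = {"search_docs", "create_ticket"}
REQUIRED_SCOPE = {"search_docs": "docs:read", "create_ticket": "tickets:write"}
MIN_EVAL_SCORE = 0.8


def check_allowlist(call):
    return None if call.tool in TOOL_ALLOWLIST else "tool not on allowlist"


def check_scopes(call):
    needed = REQUIRED_SCOPE.get(call.tool)
    return None if needed in call.scopes else f"missing scope {needed}"


def check_eval_gate(call):
    return None if call.eval_score >= MIN_EVAL_SCORE else "eval gate failed"


# Layers run in order; each returns None to pass or a denial reason.
LAYERS = [check_allowlist, check_scopes, check_eval_gate]
AUDIT_LOG = []  # every decision is recorded, allowed or not


def authorize(call: ToolCall) -> bool:
    """Return True only if every layer passes; log the outcome either way."""
    for layer in LAYERS:
        reason = layer(call)
        if reason is not None:
            AUDIT_LOG.append((call.agent, call.tool, "deny", reason))
            return False
    AUDIT_LOG.append((call.agent, call.tool, "allow", ""))
    return True
```

Calling `authorize(ToolCall("support-bot", "search_docs", frozenset({"docs:read"}), 0.9))` would pass all three layers and append an "allow" entry. Any single failing layer denies the call, which is the point of layering.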
