
agent-safety

Ensure agent safety - guardrails, content filtering, monitoring, and compliance

$ Install

git clone https://github.com/pluginagentmarketplace/custom-plugin-ai-agents /tmp/custom-plugin-ai-agents && cp -r /tmp/custom-plugin-ai-agents/skills/agent-safety ~/.claude/skills/custom-plugin-ai-agents

// tip: Run this command in your terminal to install the skill


name: agent-safety
description: Ensure agent safety - guardrails, content filtering, monitoring, and compliance
sasmp_version: "1.3.0"
bonded_agent: 07-agent-safety
bond_type: PRIMARY_BOND
version: "2.0.0"

Agent Safety

Implement safety systems for responsible AI agent deployment.

When to Use This Skill

Invoke this skill when:

  • Adding input/output guardrails
  • Implementing content filtering
  • Setting up rate limiting
  • Ensuring compliance (GDPR, SOC2)

Parameter Schema

| Parameter  | Type   | Required | Description                  | Default                          |
|------------|--------|----------|------------------------------|----------------------------------|
| task       | string | Yes      | Safety goal                  | -                                |
| risk_level | enum   | No       | strict, moderate, permissive | strict                           |
| filters    | list   | No       | Filter types to enable       | ["injection", "pii", "toxicity"] |

Quick Start

from guardrails import Guard
from guardrails.validators import ToxicLanguage, PIIFilter

guard = Guard.from_validators([
    ToxicLanguage(threshold=0.8, on_fail="exception"),
    PIIFilter(on_fail="fix")
])

# Validate output
validated = guard.validate(llm_response)

Guardrail Types

Input Guardrails

# Prompt injection detection
INJECTION_PATTERNS = [
    r"ignore (previous|all) instructions",
    r"you are now",
    r"forget everything"
]
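The patterns above can be wrapped in a simple pre-flight check on user input. This is a minimal sketch; `detect_injection` is an illustrative helper, not part of any library, and regex matching alone will miss paraphrased attacks:

```python
import re

# Same patterns as above, matched case-insensitively against raw user input
INJECTION_PATTERNS = [
    r"ignore (previous|all) instructions",
    r"you are now",
    r"forget everything",
]

def detect_injection(text: str) -> bool:
    """Return True if the input matches a known injection pattern."""
    return any(re.search(p, text, re.IGNORECASE) for p in INJECTION_PATTERNS)
```

Regex checks are cheap, so they work well as a first layer before a more expensive LLM-based classifier.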

Output Guardrails

# Content filtering
filters = [
    ToxicityFilter(),
    PIIRedactor(),
    HallucinationDetector()
]
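A filter chain like the one above can be applied sequentially, with each filter either transforming the text or rejecting it outright. The classes below are hypothetical stand-ins for the filters named above; real deployments would back them with trained classifiers or an LLM judge:

```python
import re

class PIIRedactor:
    # Redacts email addresses; real PII filters cover names, phones, SSNs, etc.
    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

    def apply(self, text: str) -> str:
        return self.EMAIL.sub("[REDACTED]", text)

class BlocklistFilter:
    # Crude toxicity stand-in: reject if any banned term appears
    def __init__(self, banned):
        self.banned = [w.lower() for w in banned]

    def apply(self, text: str) -> str:
        if any(w in text.lower() for w in self.banned):
            raise ValueError("blocked by content filter")
        return text

def run_filters(text, filters):
    # Each filter returns transformed text or raises to block the response
    for f in filters:
        text = f.apply(text)
    return text
```

Keeping every filter behind the same `apply` interface makes it easy to reorder layers or add new ones without touching the calling code.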

Rate Limiting

import time

class RateLimiter:
    def __init__(self, rpm=60, tpm=100000):
        self.rpm, self.tpm = rpm, tpm
        self.buckets = {}  # user_id -> (requests_left, tokens_left, last_refill)

    def check(self, user_id, tokens):
        # Token bucket: refill both buckets in proportion to elapsed time
        now = time.time()
        reqs, toks, last = self.buckets.get(user_id, (self.rpm, self.tpm, now))
        reqs = min(self.rpm, reqs + (now - last) * self.rpm / 60)
        toks = min(self.tpm, toks + (now - last) * self.tpm / 60)
        if reqs < 1 or toks < tokens:
            return False  # over the limit; caller should reject or queue
        self.buckets[user_id] = (reqs - 1, toks - tokens, now)
        return True

Troubleshooting

| Issue           | Solution                 |
|-----------------|--------------------------|
| False positives | Tune thresholds          |
| Injection bypass| Add LLM-based detection  |
| PII leakage     | Add secondary validation |
| Performance hit | Cache filter results     |

Best Practices

  • Defense in depth (multiple layers)
  • Fail-safe defaults (deny by default)
  • Audit everything
  • Regular red team testing
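Defense in depth can be sketched as layered checks wrapped around the model call, failing closed at every layer. The helper below is illustrative only; `model_fn` stands in for whatever client actually calls the model:

```python
import re

def guarded_call(user_input: str, model_fn) -> str:
    # Layer 1: input guardrail -- deny by default on a known injection phrase
    if re.search(r"ignore (previous|all) instructions", user_input, re.IGNORECASE):
        raise PermissionError("input rejected by guardrail")
    # Layer 2: the model call itself
    output = model_fn(user_input)
    # Layer 3: output guardrail -- redact SSN-shaped strings before returning
    return re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[REDACTED]", output)
```

Because each layer is independent, an attacker who bypasses the input check still faces the output filter.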

Compliance Checklist

  • Input validation active
  • Output filtering enabled
  • Audit logging configured
  • Rate limits set
  • PII handling compliant
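The audit-logging item on the checklist can be satisfied with append-only structured records. A minimal JSON-lines sketch, with illustrative field names:

```python
import json
import time

def audit_log(stream, event: str, user_id: str, detail: str) -> None:
    # Append-only JSON-lines audit record; one event per line.
    # Field names here are illustrative, not a standard schema.
    record = {"ts": time.time(), "event": event, "user": user_id, "detail": detail}
    stream.write(json.dumps(record) + "\n")
```

Writing one JSON object per line keeps logs machine-parseable for compliance review while remaining greppable by hand.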

Related Skills

  • tool-calling - Input validation
  • llm-integration - API security
  • multi-agent - Per-agent permissions
