Marketplace

prompt-forge

Generate and iteratively harden prompts and templates using frozen evaluation harnesses and adversarial validation.

allowed_tools: Read, Write, Edit, Bash, Glob, Grep, Task, TodoWrite

model: sonnet

$ インストール

git clone https://github.com/DNYoussef/context-cascade /tmp/context-cascade && cp -r /tmp/context-cascade/skills/foundry/prompt-forge ~/.claude/skills/context-cascade

// tip: Run this command in your terminal to install the skill

SKILL.md

View on GitHub →

name: prompt-forge description: Generate and iteratively harden prompts and templates using frozen evaluation harnesses and adversarial validation. allowed-tools: Read, Write, Edit, Bash, Glob, Grep, Task, TodoWrite model: sonnet x-version: 3.2.0 x-category: foundry x-vcl-compliance: v3.1.1 x-cognitive-frames:

HON
MOR
COM
CLS
EVD
ASP
SPC

L1 Improvement

Aligned the forge SOP with Skill Forge required sections, adding explicit validation gates and delivery artifacts.
Incorporated prompt-architect confidence ceilings, dual-pass refinement, and frozen eval harness references.

STANDARD OPERATING PROCEDURE

Purpose

Create or improve prompts/templates with systematic iteration, evaluation harnesses, and adversarial checks to ensure reliability and reuse.

Trigger Conditions

Positive: generating a new prompt/template, improving an existing prompt, or hardening prompts for CI/automation.
Negative/reroute: broad prompt clarity work (prompt-architect) or full skill/agent creation (skill-forge/agent-creator).

Guardrails

Always pair generation with validation: run harness tests (happy, edge, adversarial) and record results.
Keep outputs in English with explicit confidence ceilings and changelogs of iteration deltas.
Avoid one-pass generation; require at least two refinement cycles or document why not.
Respect template scoping; do not mix unrelated responsibilities.

Execution Phases

Intake: Capture goal, audience, constraints, success criteria, and target format; classify constraints as HARD/SOFT/INFERRED.
Drafting: Produce an initial template with structure, inputs/outputs, refusal policy, and examples.
Evaluation: Run frozen harness cases and adversarial probes; record metrics and failures.
Refinement: Address failures, adjust structure, calibrate ceilings; repeat until pass or documented residual risk.
Delivery: Provide final prompt, test artifacts, usage notes, and confidence statement.

Pattern Recognition

Deterministic automation prompts → enforce strict formats and refusal rules.
Creative prompts → use guardrails for IP/safety and clear tone guidance.
Review/checklist prompts → add scoring rubrics and evidence expectations.

Advanced Techniques

Use self-play (two-model debate) for risky domains to expose weaknesses.
Apply contrastive examples to narrow scope and improve refusal posture.
Store successful patterns in references for reuse by skill-forge and agent-creator.

Common Anti-Patterns

Shipping prompts without evaluation evidence.
Mixing multiple intents in one template.
Omitting confidence ceilings or iteration logs.

Practical Guidelines

Keep templates modular; separate context, instructions, and output schema.
Include quick-start examples and boundary cases.
Maintain a small changelog of iteration deltas with dates/metrics.

Cross-Skill Coordination

Upstream: prompt-architect for clarity; base-template-generator for structural scaffolds when code is involved.
Downstream: skill-forge/agent-creator embedding the prompt; recursive-improvement for ongoing tuning.

MCP Requirements

Optional memory/vector MCP to store harness results; tag WHO=prompt-forge-{session}, WHY=skill-execution.

Input/Output Contracts

inputs:
  goal: string  # required description of what the prompt must achieve
  audience: string  # required target user/agent
  constraints: list[string]  # optional constraints and policies
  format: string  # optional output format
outputs:
  prompt_template: string  # final prompt/template text
  eval_report: file  # harness results and adversarial findings
  change_log: file  # iteration deltas and confidence ceilings

Recursive Improvement

Feed harness failures into recursive-improvement to propose targeted edits until metrics meet thresholds or risk is documented.

Examples

Forge a CI-ready code-review prompt with security and performance rubrics.
Improve a research-synthesis prompt with source-citation enforcement and ceiling discipline.

Troubleshooting

Harness failures → prioritize fixes by severity and rerun critical cases.
Overly long prompts → modularize sections and remove redundancy.
Drift in outputs → tighten schema, add examples, and reassert refusal rules.

Completion Verification

Draft + refined prompt delivered with changelog.
Evaluation harness results recorded (happy/edge/adversarial).
Confidence ceiling stated; residual risks documented.
Usage notes and output schema included.

Confidence: 0.70 (ceiling: inference 0.70) - Prompt Forge SOP rewritten with Skill Forge cadence and prompt-architect ceilings.