reproducibility-audit
Verify that results, builds, and experiments can be reproduced consistently with documented steps and deterministic inputs.
allowed_tools: Read, Write, Edit, Bash, Glob, Grep, Task, TodoWrite
model: sonnet
Installer
Run this command in your terminal to install the skill:
git clone https://github.com/DNYoussef/context-cascade /tmp/context-cascade && cp -r /tmp/context-cascade/skills/quality/reproducibility-audit ~/.claude/skills/context-cascade/
SKILL.md
name: reproducibility-audit
description: Verify that results, builds, and experiments can be reproduced consistently with documented steps and deterministic inputs.
allowed-tools:
- Read
- Write
- Edit
- Bash
- Glob
- Grep
- Task
- TodoWrite
model: sonnet
x-version: 3.2.0
x-category: quality
x-vcl-compliance: v3.1.1
x-cognitive-frames:
- HON
- MOR
- COM
- CLS
- EVD
- ASP
- SPC
STANDARD OPERATING PROCEDURE
Purpose
Assess reproducibility across code, data, and build pipelines by validating deterministic processes, pinned dependencies, and documented steps.
Trigger Conditions
- Positive: scientific/ML experiment verification, release build audits, or compliance checks requiring reproducible artifacts.
- Negative: stylistic reviews or runtime debugging (route to style-audit or functionality-audit).
Guardrails
- Confidence ceiling: Append `Confidence: X.XX (ceiling: TYPE Y.YY)` using ceilings {inference/report 0.70, research 0.85, observation/definition 0.95}.
- Evidence & determinism: Require commands, seeds, data sources, and hashes for artifacts; rerun steps to confirm.
- Structure-first: Maintain examples/tests showing successful and failing reproduction attempts.
- Adversarial validation: Introduce clean environments and altered dependency versions to detect hidden variability.
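The seed and determinism guardrails above can be sketched in a few lines. This is a minimal illustration, not part of the skill itself; `pin_seeds` is a hypothetical helper name, and the `PYTHONHASHSEED` assignment only affects subprocesses launched afterwards, not the current interpreter.

```python
import os
import random

def pin_seeds(seed: int = 42) -> None:
    """Pin the sources of randomness an audit needs to control."""
    random.seed(seed)
    # Only effective for child processes spawned after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)

# Two replays from the same seed must yield identical draws;
# a mismatch here is exactly the hidden variability the probes hunt for.
pin_seeds()
first_run = [random.randint(0, 99) for _ in range(5)]
pin_seeds()
second_run = [random.randint(0, 99) for _ in range(5)]
assert first_run == second_run
```

In a real audit the same idea extends to NumPy, framework, and data-shuffling seeds, each of which must be recorded alongside the command that used it.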
Execution Phases
- Inventory & Scope
- Identify targets (builds, experiments, reports) and required inputs (code revision, data, config, seeds).
- Note environments (OS, container, hardware) and expected outcomes.
- Replay & Measurement
- Follow documented steps exactly; log commands, outputs, and timestamps.
- Compare generated artifacts via hashes or checksums; capture diffs.
- Variability Probes
- Change environments/dependency versions within constraints to test stability.
- Document non-deterministic behaviors and their causes.
- Reporting & Remediation
- Summarize reproduction success/failure, missing documentation, and proposed fixes (pinning, automation scripts, data versioning).
- Provide confidence with ceiling and attach logs/hashes.
Output Format
- Scope, inputs, environments, and expected results.
- Step-by-step replay log with evidence (commands, outputs, hashes).
- Variability findings and fixes.
- Confidence statement using ceiling syntax.
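The ceiling syntax required above is simple enough to express as a formatter; this one-liner is a hypothetical sketch of the convention, not an official helper.

```python
def confidence_line(score: float, ceiling_type: str, ceiling: float) -> str:
    """Render the ceiling syntax, capping the reported score at the ceiling."""
    return f"Confidence: {min(score, ceiling):.2f} (ceiling: {ceiling_type} {ceiling:.2f})"

# An inference claim can never report above 0.70, however strong it feels.
line = confidence_line(0.90, "inference", 0.70)
# → "Confidence: 0.70 (ceiling: inference 0.70)"
```

Capping at the ceiling in code keeps reports honest by construction rather than by reviewer vigilance.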
Validation Checklist
- Inputs and environments captured with versions/seeds.
- Replay executed with logs and hashes recorded.
- Variability probes performed; nondeterminism documented.
- Remediation steps proposed and owners identified.
- Confidence ceiling provided; English-only output.
Confidence: 0.70 (ceiling: inference 0.70) - SOP rewritten per Prompt Architect confidence discipline and Skill Forge structure-first reproducibility focus.
Repository: DNYoussef/context-cascade/skills/quality/reproducibility-audit
Author: DNYoussef
Stars: 8 | Forks: 2
Updated 5d ago | Added 1w ago