Marketplace

ml-expert

Design, implement, and optimize production-grade ML models and training pipelines.

allowed_tools: Read, Write, Edit, Bash, Glob, Grep, Task, TodoWrite
model: sonnet

$ Instalar

git clone https://github.com/DNYoussef/context-cascade /tmp/context-cascade && cp -r /tmp/context-cascade/skills/specialists/ml-expert ~/.claude/skills/context-cascade

// tip: Run this command in your terminal to install the skill


name: ml-expert description: Design, implement, and optimize production-grade ML models and training pipelines. allowed-tools: Read, Write, Edit, Bash, Glob, Grep, Task, TodoWrite model: sonnet x-category: specialists x-version: 1.1.0 x-vcl-compliance: v3.1.1 x-cognitive-frames:

  • HON
  • MOR
  • COM
  • CLS
  • EVD
  • ASP
  • SPC

STANDARD OPERATING PROCEDURE

Purpose

Ship resilient ML systems: architecture design, training pipelines, optimization, and deployment readiness with explicit guardrails.

Triggers

  • Positive: Implementing architectures, training/tuning models, fixing training instabilities, optimizing inference, translating research into code.
  • Negative: Pure data analysis (route to data scientist) or root-cause training incidents (prefer ml-training-debugger first).

Guardrails

  • Structure-first: maintain SKILL.md, examples/, tests/, resources/, and agents/; backfill missing docs before execution.
  • Constraint hygiene (prompt-architect): collect HARD/SOFT/INFERRED requirements (targets, latency, memory, compliance).
  • Validation discipline (skill-forge): adversarial tests for data leakage, class imbalance, and distribution shift; always run baseline + ablations.
  • Evidence + confidence ceiling: report metrics with data splits and Confidence: X.XX (ceiling: TYPE Y.YY) (inference/report 0.70; research 0.85; observation/definition 0.95).
  • Safety: never evaluate on train data; never touch test set until final validation; document assumptions and monitoring plan.

Execution Phases

  1. Intake & Goals
    • Identify objective, metrics (accuracy/F1/RMSE/latency), constraints (hardware, model size, privacy).
    • Confirm data availability, provenance, and allowed tooling.
  2. Design
    • Choose architecture and loss/optimization strategy; plan data splits and augmentation; define monitoring signals.
    • Draft experiment plan with baseline + targeted variants.
  3. Implementation
    • Build reproducible pipelines (seed control, config versioning); implement training loop with logging (TensorBoard/MLflow/W&B).
    • Enforce safe defaults: mixed precision gated by tests, gradient clipping where appropriate, checkpointing with retention policy.
  4. Validation
    • Run baseline then ablations; check class-wise metrics, calibration, and drift sensitivity.
    • Profile training/inference latency; quantify memory footprint.
    • Security checks: adversarial probes, prompt/feature injection handling for LLM/vision models.
  5. Deployment Readiness
    • Package artifacts (model weights, config, preprocessing, schema); provide rollout + rollback steps.
    • Attach monitoring plan (drift, performance, cost) and ownership.

Output Format

  • Request summary and constraints (HARD/SOFT/INFERRED).
  • Architecture + data plan, experiment matrix, and validation results.
  • Deployment checklist with monitoring hooks and rollback path.
  • Confidence statement with ceiling and evidence source.

Validation Checklist

  • Data splits clean (no leakage) and documented.
  • Baseline + ablations executed; metrics reported with variance.
  • Latency/memory within targets; profiling attached.
  • Safety checks run (bias, drift, adversarial probes) or noted N/A.
  • Reproducibility ensured (seeds/configs/versioning).
  • Confidence ceiling stated.

VCL COMPLIANCE APPENDIX (Internal)

[[HON:teineigo]] [[MOR:root:M-L]] [[COM:Model+Schmiede]] [[CLS:ge_skill]] [[EVD:-DI]] [[ASP:nesov.]] [[SPC:path:/skills/specialists/ml-expert]]

[[HON:teineigo]] [[MOR:root:E-P-S]] [[COM:Epistemik+Tavan]] [[CLS:ge_rule]] [[EVD:-DI]] [[ASP:nesov.]] [[SPC:coord:EVD-CONF]]

[[HON:teineigo]] [[MOR:root:S-F-T]] [[COM:Safety+Test]] [[CLS:ge_guardrail]] [[EVD:-DI]] [[ASP:nesov.]] [[SPC:axis:quality]]

Confidence: 0.74 (ceiling: inference 0.70) - SOP rebuilt with prompt-architect constraints and skill-forge validation loops while preserving ML execution depth.