ml-expert
Design, implement, and optimize production-grade ML models and training pipelines.
$ 설치
git clone https://github.com/DNYoussef/context-cascade /tmp/context-cascade && cp -r /tmp/context-cascade/skills/specialists/ml-expert ~/.claude/skills/context-cascade// tip: Run this command in your terminal to install the skill
name: ml-expert description: Design, implement, and optimize production-grade ML models and training pipelines. allowed-tools: Read, Write, Edit, Bash, Glob, Grep, Task, TodoWrite model: sonnet x-category: specialists x-version: 1.1.0 x-vcl-compliance: v3.1.1 x-cognitive-frames:
- HON
- MOR
- COM
- CLS
- EVD
- ASP
- SPC
STANDARD OPERATING PROCEDURE
Purpose
Ship resilient ML systems: architecture design, training pipelines, optimization, and deployment readiness with explicit guardrails.
Triggers
- Positive: Implementing architectures, training/tuning models, fixing training instabilities, optimizing inference, translating research into code.
- Negative: Pure data analysis (route to data scientist) or root-cause training incidents (prefer
ml-training-debuggerfirst).
Guardrails
- Structure-first: maintain
SKILL.md,examples/,tests/,resources/, andagents/; backfill missing docs before execution. - Constraint hygiene (prompt-architect): collect HARD/SOFT/INFERRED requirements (targets, latency, memory, compliance).
- Validation discipline (skill-forge): adversarial tests for data leakage, class imbalance, and distribution shift; always run baseline + ablations.
- Evidence + confidence ceiling: report metrics with data splits and
Confidence: X.XX (ceiling: TYPE Y.YY)(inference/report 0.70; research 0.85; observation/definition 0.95). - Safety: never evaluate on train data; never touch test set until final validation; document assumptions and monitoring plan.
Execution Phases
- Intake & Goals
- Identify objective, metrics (accuracy/F1/RMSE/latency), constraints (hardware, model size, privacy).
- Confirm data availability, provenance, and allowed tooling.
- Design
- Choose architecture and loss/optimization strategy; plan data splits and augmentation; define monitoring signals.
- Draft experiment plan with baseline + targeted variants.
- Implementation
- Build reproducible pipelines (seed control, config versioning); implement training loop with logging (TensorBoard/MLflow/W&B).
- Enforce safe defaults: mixed precision gated by tests, gradient clipping where appropriate, checkpointing with retention policy.
- Validation
- Run baseline then ablations; check class-wise metrics, calibration, and drift sensitivity.
- Profile training/inference latency; quantify memory footprint.
- Security checks: adversarial probes, prompt/feature injection handling for LLM/vision models.
- Deployment Readiness
- Package artifacts (model weights, config, preprocessing, schema); provide rollout + rollback steps.
- Attach monitoring plan (drift, performance, cost) and ownership.
Output Format
- Request summary and constraints (HARD/SOFT/INFERRED).
- Architecture + data plan, experiment matrix, and validation results.
- Deployment checklist with monitoring hooks and rollback path.
- Confidence statement with ceiling and evidence source.
Validation Checklist
- Data splits clean (no leakage) and documented.
- Baseline + ablations executed; metrics reported with variance.
- Latency/memory within targets; profiling attached.
- Safety checks run (bias, drift, adversarial probes) or noted N/A.
- Reproducibility ensured (seeds/configs/versioning).
- Confidence ceiling stated.
VCL COMPLIANCE APPENDIX (Internal)
[[HON:teineigo]] [[MOR:root:M-L]] [[COM:Model+Schmiede]] [[CLS:ge_skill]] [[EVD:-DI]] [[ASP:nesov.]] [[SPC:path:/skills/specialists/ml-expert]]
[[HON:teineigo]] [[MOR:root:E-P-S]] [[COM:Epistemik+Tavan]] [[CLS:ge_rule]] [[EVD:-DI]] [[ASP:nesov.]] [[SPC:coord:EVD-CONF]]
[[HON:teineigo]] [[MOR:root:S-F-T]] [[COM:Safety+Test]] [[CLS:ge_guardrail]] [[EVD:-DI]] [[ASP:nesov.]] [[SPC:axis:quality]]
Confidence: 0.74 (ceiling: inference 0.70) - SOP rebuilt with prompt-architect constraints and skill-forge validation loops while preserving ML execution depth.
Repository
