# cpu-gpu-performance
Monitor and optimize CPU/GPU usage with load measurement and cost-effective validation strategies.
## Install

```bash
git clone https://github.com/athola/claude-night-market /tmp/claude-night-market && \
  cp -r /tmp/claude-night-market/plugins/conserve/skills/performance-monitoring/cpu-gpu-performance ~/.claude/skills/claude-night-market/
```

Tip: run this command in your terminal to install the skill.
```yaml
name: cpu-gpu-performance
description: |
  Monitor and optimize CPU/GPU usage with load measurement and cost-effective validation strategies.

  Triggers: CPU usage, GPU usage, performance, load monitoring, build performance, training, resource consumption, test suite, compilation

  Use when: session starts (auto-load with token-conservation), planning builds or training that could pin CPUs/GPUs for >1 minute, retrying failed resource-heavy commands

  DO NOT use when: simple operations with no resource impact. DO NOT use when: quick single-file operations.

  Use this skill BEFORE resource-intensive operations. Establish baselines proactively.
location: plugin
token_budget: 400
progressive_loading: true
dependencies:
  hub: [token-conservation]
  modules: []
```
## CPU/GPU Performance Discipline
### When to Use

- At the beginning of every session (auto-load alongside `token-conservation`).
- Whenever you plan to build, train, or test anything that could pin CPU cores or GPUs for more than a minute.
- Before retrying a failing command that previously consumed significant resources.
### Required TodoWrite Items

- `cpu-gpu-performance:baseline`
- `cpu-gpu-performance:scope`
- `cpu-gpu-performance:instrument`
- `cpu-gpu-performance:throttle`
- `cpu-gpu-performance:log`
### Step 1 – Establish Current Baseline (`baseline`)

- Capture current utilization:

  ```bash
  uptime
  ps -eo pcpu,cmd | head
  nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv
  ```

  Note which hosts/GPUs are already busy.
- Record any CI/cluster budgets (time quotas, GPU hours) before launching work.
- Set a per-task CPU-minute / GPU-minute budget that respects those limits (see the snapshot sketch after this list).
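As a concrete starting point, the baseline capture can be wrapped in a small script. This is a minimal sketch; the log path, process count, and GPU guard are illustrative choices, not part of the skill:

```bash
#!/usr/bin/env bash
# Minimal baseline snapshot (hypothetical log path): record load average,
# top CPU consumers, and GPU state before launching heavy work.
log="/tmp/perf-baseline-$(date +%Y%m%dT%H%M%S).log"

{
  echo "== load =="
  uptime
  echo "== top CPU processes =="
  ps -eo pcpu,pmem,cmd --sort=-pcpu | head -n 10
  # Only query GPUs when an NVIDIA driver is present.
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "== GPU =="
    nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv
  fi
} | tee "$log"
```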
### Step 2 – Narrow the Scope (`scope`)

- Avoid running "whole world" jobs after a small fix. Prefer diff-based or tag-based selective testing:
  - `pytest -k` expressions
  - Bazel target patterns
  - `cargo test <module>`
- Batch low-level fixes so you can validate multiple changes with a single targeted command.
- For GPU jobs, favor unit-scale smoke inputs or lower epoch counts before scheduling the full training/eval sweep (a diff-based selection sketch follows this list).
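One way to keep reruns diff-based is to feed changed test files straight into `pytest`. This sketch assumes a `main` branch and a `tests/` directory of Python test files; neither layout is prescribed by the skill:

```bash
# Hypothetical diff-based selection: run only test files changed vs. main.
changed_tests=$(git diff --name-only main...HEAD | grep '^tests/.*\.py$' || true)
if [ -n "$changed_tests" ]; then
  pytest $changed_tests
else
  echo "No changed test files; fall back to a targeted pytest -k expression."
fi
```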
### Step 3 – Instrument Before You Optimize (`instrument`)

- Pick the right profiler/monitor:
  - CPU work: `perf`, Intel VTune, `cargo flamegraph`, language-specific profilers
  - GPU work: `nvidia-smi dmon`, `nsys`, `nvprof`, DLProf, framework timeline tracers
- Capture kernel/op timelines, memory footprints, and data-pipeline latency so you have evidence when throttling or parallelizing.
- Record hot paths and I/O bottlenecks in notes so future reruns can jump straight to the culprit (see the sampling sketch after this list).
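For GPU jobs, a lightweight pattern is to sample utilization in the background while a smoke-scale run executes. The training command below is a placeholder, and `nvidia-smi dmon` and GNU `time` must be available on the host:

```bash
# Sample GPU utilization once per second (timestamped) while the job runs.
nvidia-smi dmon -s u -d 1 -o T > /tmp/gpu-trace.log &
monitor_pid=$!

# Placeholder smoke-scale workload; substitute your real command.
/usr/bin/time -v python train.py --epochs 1

kill "$monitor_pid"
```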
### Step 4 – Throttle and Sequence Work (`throttle`)

- Use `nice`, `ionice`, or Kubernetes/Slurm quotas to prevent starvation of shared nodes.
- Chain heavy tasks with guardrails:
  - Rerun only the failed test/module.
  - Then (optionally) escalate to the next-wider shard.
  - Reserve the full suite for the final gate.
- Stagger GPU kernels (smaller batch sizes or gradient accumulation) when memory pressure risks eviction; prefer checkpoint/restore over restarts. A throttled-invocation sketch follows this list.
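On a shared node, throttling can be as simple as wrapping the command. This sketch uses the lowest CPU priority and the idle I/O class, with `cargo` commands standing in for whatever build/test tool the project uses:

```bash
# Lowest CPU priority (nice 19) and idle I/O class so interactive users win.
nice -n 19 ionice -c 3 cargo build --release

# Staged guardrail: failed module first, full suite only as the final gate.
cargo test orders && cargo test
```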
### Step 5 – Log Decisions + Next Steps (`log`)
Conclude by documenting the commands that were run and their resource cost (duration, CPU%, GPU%), confirming whether they remained within the per-task budget. If a full suite or long training run was necessary, justify why selective or staged approaches were not feasible. Capture any follow-up tasks, such as adding a new test marker or profiling documentation, to streamline future sessions.
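To make those cost numbers easy to capture, heavy commands can be wrapped in GNU `time` with a custom format; the format string and log path here are illustrative:

```bash
# Append command, wall time, CPU%, and peak memory to a session cost log.
/usr/bin/time -f '%C -> wall %es, cpu %P, maxrss %M KB' \
  -a -o /tmp/session-cost.log \
  pytest tests/test_orders.py -k test_refund
```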
### Output Expectations

- Brief summary covering:
  - baseline metrics
  - scope chosen
  - instrumentation captured
  - throttling tactics
  - follow-up items
- Concrete example(s) of what ran, e.g.:
  - "reran `pytest tests/test_orders.py -k test_refund` instead of `pytest -m slow`"
  - "profiled `nvidia-smi dmon` output to prove GPU idle time before scaling"
- "reran
Repository
