hpc-dev-testing-workflow

Efficient development testing workflow for HPC environments with VS Code tunneling. Trigger: testing code changes on HPC, development workflow with external data directories, VS Code Remote SSH development

$ Install

git clone https://github.com/smith6jt-cop/Skills_Registry /tmp/Skills_Registry && cp -r /tmp/Skills_Registry/plugins/general/hpc-dev-testing-workflow/skills/hpc-dev-testing-workflow ~/.claude/skills/Skills_Registry

// tip: Run this command in your terminal to install the skill


name: hpc-dev-testing-workflow
description: "Efficient development testing workflow for HPC environments with VS Code tunneling. Trigger: testing code changes on HPC, development workflow with external data directories, VS Code Remote SSH development"
author: KINTSUGI Team
date: 2025-12-14

HPC Development Testing Workflow

Experiment Overview

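| Item        | Details                                                                     |
|-------------|-----------------------------------------------------------------------------|
| Date        | 2025-12-14                                                                  |
| Goal        | Create efficient workflow for testing code changes when data is external to repo |
| Environment | HiperGator HPC, VS Code Remote SSH, Python scientific pipeline              |
| Status      | Success                                                                     |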

Problem Statement

When developing scientific pipelines on HPC with VS Code tunneling:

  • Repository code lives in one directory
  • Project data (large image files) lives in external directory
  • After each code change, user must:
    1. Re-initialize project folder to get updated code
    2. Open project folder in VS Code (ends Claude session)
    3. Copy errors back to repo session

This fragments the workflow: every code change forces a folder switch, a lost Claude session, and manual copying of errors.

Verified Solution

1. VS Code Multi-Root Workspace

Create a workspace file that opens all folders simultaneously:

// kintsugi-dev.code-workspace
{
  "folders": [
    { "path": "/path/to/repo", "name": "Repo (code)" },
    { "path": "/path/to/repo/test_data/mini_project", "name": "Test Project" },
    { "path": "/path/to/full_project", "name": "Full Project" }
  ],
  "settings": {
    "python.defaultInterpreterPath": "/path/to/repo/.venv/bin/python",
    "search.exclude": {
      "**/data/raw": true,
      "**/data/processed": true,
      "**/*.zarr": true
    }
  }
}

Open with: code /path/to/kintsugi-dev.code-workspace

Benefits:

  • Claude session stays alive across all folders
  • Switch between repo and project without losing context
  • Single kernel/terminal session spans all work
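The workspace file can also be generated instead of written by hand. The following is a minimal sketch, assuming the three-folder layout shown above; the function names `build_workspace` and `write_workspace` are hypothetical and the paths are the same placeholders used in the example file.

```python
import json
from pathlib import Path


def build_workspace(repo: Path, full_project: Path) -> dict:
    """Return a multi-root workspace dict mirroring the layout above.

    `repo` and `full_project` are placeholders for your own absolute paths.
    """
    repo = Path(repo)
    return {
        "folders": [
            {"path": str(repo), "name": "Repo (code)"},
            {"path": str(repo / "test_data" / "mini_project"), "name": "Test Project"},
            {"path": str(full_project), "name": "Full Project"},
        ],
        "settings": {
            "python.defaultInterpreterPath": str(repo / ".venv" / "bin" / "python"),
            # Keep VS Code search away from the large data directories.
            "search.exclude": {
                "**/data/raw": True,
                "**/data/processed": True,
                "**/*.zarr": True,
            },
        },
    }


def write_workspace(workspace: dict, out_file: Path) -> None:
    """Write the workspace dict as JSON to a .code-workspace file."""
    Path(out_file).write_text(json.dumps(workspace, indent=2) + "\n")
```

Generating the file keeps the test project path derived from the repo path, so the two cannot drift apart when the repo moves.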

2. Minimal Test Dataset Inside Repo

Create a small test dataset inside the repository:

repo/
├── src/
├── notebooks/
└── test_data/
    └── mini_project/
        ├── data/raw/cyc001/  # Subset of tiles
        ├── meta/
        └── notebooks/

Key decisions:

  • Extract center tiles (most representative of real processing)
  • Include all z-planes (tests full stack processing)
  • Include all channels (tests multi-channel workflows)
  • Use 2-3 cycles (tests multi-cycle registration)
  • Keep size manageable (~3-5 GB vs 50+ GB full dataset)

3. Tile Renumbering Script

When extracting tiles, renumber them to form a valid grid:

#!/usr/bin/env python3
"""Extract and renumber center tiles for test dataset."""

import shutil
from pathlib import Path

# For 13x9 grid, center 2x2 tiles (snake pattern)
TILE_MAPPING = {
    58: 1,  # Row 4, Col 5 → position (0,0)
    59: 2,  # Row 4, Col 6 → position (0,1)
    73: 3,  # Row 5, Col 5 → position (1,0) - snake reverses
    72: 4,  # Row 5, Col 6 → position (1,1) - snake reverses
}

def rename_tile(filename: str, old_tile: int, new_tile: int) -> str:
    """Swap the zero-padded tile number in a filename (first match only)."""
    old_str = f"1_{old_tile:05d}_"
    new_str = f"1_{new_tile:05d}_"
    # count=1 so a matching substring later in the name is left untouched
    return filename.replace(old_str, new_str, 1)

def copy_cycle(source_dir: Path, dest_dir: Path) -> None:
    """Copy the mapped tiles of one cycle into dest_dir under their new numbers."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    for old_tile, new_tile in TILE_MAPPING.items():
        pattern = f"1_{old_tile:05d}_*.tif"
        for src_file in source_dir.glob(pattern):
            new_name = rename_tile(src_file.name, old_tile, new_tile)
            shutil.copy2(src_file, dest_dir / new_name)
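The source tile numbers in `TILE_MAPPING` follow from the snake scan order. As a rough sketch, assuming 1-based tile numbers, 0-based row and column indices, and even rows scanned left to right (an inference from the mapping comments, not something the script states), the numbers can be derived as follows. The helper name `snake_tile_number` is hypothetical.

```python
def snake_tile_number(row: int, col: int, n_cols: int = 13) -> int:
    """1-based tile number at 0-based (row, col) in a snake (serpentine) scan.

    Assumes even rows run left to right and odd rows run right to left.
    """
    if row % 2 == 0:
        return row * n_cols + col + 1
    return row * n_cols + (n_cols - col)


# Center 2x2 block of the 13-column grid: rows 4-5, columns 5-6.
center = [(4, 5), (4, 6), (5, 5), (5, 6)]
source_tiles = [snake_tile_number(r, c) for r, c in center]
print(source_tiles)  # [58, 59, 73, 72]
```

`TILE_MAPPING` then assigns the new numbers 1 to 4 to these tiles in the same order.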

4. Updated Test Parameters

After creating test dataset, update notebook parameters:

# Original parameters (full 13x9 grid)
n = 9   # rows
m = 13  # columns

# Test parameters (2x2 grid)
n = 2   # rows
m = 2   # columns
overlap_percentage = 30  # keep same
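Before running a notebook with the new parameters, it can help to confirm that the extracted files really form an n × m grid. The check below is a minimal sketch, assuming the `1_{tile:05d}_*.tif` naming used by the renumbering script; the helper names `tile_numbers` and `grid_matches` are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Matches names such as 1_00003_<anything>.tif and captures the tile number.
TILE_RE = re.compile(r"^1_(\d{5})_.*\.tif$")


def tile_numbers(cycle_dir: Path) -> set[int]:
    """Collect the distinct tile numbers found in one cycle directory."""
    found = set()
    for path in Path(cycle_dir).glob("*.tif"):
        match = TILE_RE.match(path.name)
        if match:
            found.add(int(match.group(1)))
    return found


def grid_matches(cycle_dir: Path, n: int, m: int) -> bool:
    """True when the directory holds exactly tiles 1..n*m."""
    return tile_numbers(cycle_dir) == set(range(1, n * m + 1))
```

For the test dataset, `grid_matches(cycle_dir, n=2, m=2)` should return `True` for each cycle directory.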

Failed Attempts (Critical)

| Attempt                       | Why it Failed                                                | Lesson Learned                              |
|-------------------------------|--------------------------------------------------------------|---------------------------------------------|
| Symlinks to notebooks         | Changes don't reflect immediately with editable install      | Copy notebooks, use %autoreload 2           |
| Single folder workflow        | Constantly switching folders kills Claude session            | Multi-root workspace keeps session          |
| Random tile subset            | Tiles don't form valid grid for stitching                    | Must extract contiguous tiles and renumber  |
| Too small test set (1 tile)   | Can't test stitching, registration, or multi-tile algorithms | Need minimum 2x2 grid                       |
| Too large test set (full row) | Still too slow for rapid iteration                           | 2x2 is optimal balance                      |
| PYTHONPATH manipulation       | Confusing, doesn't handle notebook imports                   | Editable install + autoreload is cleaner    |
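The "editable install + autoreload" combination from the table can be sketched as follows, assuming the package was installed with `pip install -e .` from the repo root. In a notebook cell the usual form is `%load_ext autoreload` followed by `%autoreload 2`; the helper below does the same through IPython's Python API so that it is also safe to call from a plain script. The function name `enable_autoreload` is hypothetical.

```python
def enable_autoreload() -> bool:
    """Turn on IPython autoreload (mode 2) when running inside IPython/Jupyter.

    Returns False when IPython is not installed or no shell is active.
    """
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    shell = get_ipython()
    if shell is None:
        return False
    # Equivalent to the cell magics %load_ext autoreload and %autoreload 2.
    shell.run_line_magic("load_ext", "autoreload")
    shell.run_line_magic("autoreload", "2")
    return True
```

Calling this in the first notebook cell means edits to the repo code are picked up on the next cell execution without restarting the kernel.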

Workflow Summary

┌─────────────────────────────────────────────────────────┐
│           VS Code Multi-Root Workspace                   │
├─────────────────┬─────────────────┬─────────────────────┤
│   Repo (code)   │  Test Project   │   Full Project      │
│                 │                 │                     │
│  Make changes   │  Quick test     │  Final validation   │
│  Claude session │  2x2 grid       │  Full 13x9 grid     │
│  ~0 GB data     │  ~3 GB data     │  ~50+ GB data       │
└─────────────────┴─────────────────┴─────────────────────┘
         │                │                  │
         └────────────────┴──────────────────┘
                    Single session

Test Dataset Specifications

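| Parameter    | Test Dataset | Full Dataset      |
|--------------|--------------|-------------------|
| Grid size    | 2×2          | 13×9 (117 tiles)  |
| Tiles        | 4            | 117               |
| Cycles       | 3            | 9+                |
| Channels     | 4            | 4                 |
| Z-planes     | 13           | 13                |
| Files/cycle  | 208          | 6,084             |
| Total size   | ~3 GB        | ~50+ GB           |
| Process time | ~2-5 min     | ~30-60 min        |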
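The Files/cycle figures are consistent with one image file per tile, channel, and z-plane. That relationship is an inference from the numbers in the table rather than something stated explicitly, and it can be checked with a few lines (the function name `files_per_cycle` is hypothetical):

```python
def files_per_cycle(tiles: int, channels: int, z_planes: int) -> int:
    """One image file per (tile, channel, z-plane) combination."""
    return tiles * channels * z_planes


print(files_per_cycle(4, 4, 13))    # 208  (test dataset)
print(files_per_cycle(117, 4, 13))  # 6084 (full dataset)
```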

Key Configuration Files

Workspace File Location

/blue/maigan/smith6jt/kintsugi-dev.code-workspace

Test Data Setup Script

repo/test_data/setup_test_data.py

Regenerate Test Data

python test_data/setup_test_data.py
kintsugi init test_data/mini_project --name "Dev Test 2x2" --force

Platform-Specific Notes

HiperGator (SLURM)

  • Use /blue/ filesystem for large data (faster I/O)
  • Workspace file can live in home directory
  • VS Code Server runs on login node, compute via SLURM

VS Code Remote SSH

  • Install "Remote - SSH" extension
  • Workspace file must use absolute paths
  • Search exclude patterns prevent indexing large data dirs
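The absolute-path requirement noted above can be checked before opening the workspace. This is a minimal sketch, assuming the workspace file is plain JSON without comments (Python's `json` module rejects comments even though VS Code tolerates them); the function name `relative_folders` is hypothetical.

```python
import json
from pathlib import PurePosixPath


def relative_folders(workspace_text: str) -> list[str]:
    """Return folder paths in a workspace file that are not absolute."""
    workspace = json.loads(workspace_text)
    return [
        folder["path"]
        for folder in workspace.get("folders", [])
        if not PurePosixPath(folder["path"]).is_absolute()
    ]
```

An empty list means every folder entry is an absolute path.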
