prepare-dataset
Process and validate datasets for training. Use when setting up data pipelines.
$ Install
git clone https://github.com/mvillmow/ProjectOdyssey /tmp/ProjectOdyssey && cp -r /tmp/ProjectOdyssey/.claude/skills/tier-2/prepare-dataset ~/.claude/skills/ProjectOdyssey/
Tip: run this command in your terminal to install the skill.
SKILL.md
name: prepare-dataset
description: "Process and validate datasets for training. Use when setting up data pipelines."
mcp_fallback: none
category: ml
tier: 2
user-invocable: false
Prepare Dataset
Load, preprocess, and validate datasets for machine learning model training, including normalization and augmentation.
When to Use
- Setting up data pipelines for training
- Normalizing and cleaning raw data
- Splitting into train/validation/test sets
- Applying data augmentation
Quick Reference
# Dataset preparation pipeline
from typing import Tuple

from numpy import ndarray


class DatasetLoader:
    def load(self, path: str) -> Tuple[ndarray, ndarray]:
        # Load raw data
        pass

    def normalize(self, data: ndarray) -> ndarray:
        # Normalize to [0, 1] or standardize
        pass

    def split(self, data: ndarray, ratios: Tuple[float, float, float]):
        # Split into train/val/test
        pass

    def augment(self, data: ndarray) -> ndarray:
        # Apply transformations if needed
        pass
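The `normalize` step above names two options: scaling to [0, 1] or standardizing. A minimal NumPy sketch of both follows. The function names `min_max_scale` and `standardize`, the per-column treatment, and the `eps` guard against constant columns are assumptions made for illustration, not part of the skill itself.

```python
import numpy as np


def min_max_scale(data: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Rescale each column to the range [0, 1]."""
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    # eps (an assumed guard) avoids division by zero for constant columns.
    return (data - lo) / np.maximum(hi - lo, eps)


def standardize(data: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shift each column to zero mean and scale it to unit variance."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    return (data - mean) / np.maximum(std, eps)


# Hypothetical 3x2 feature matrix used only to exercise the functions.
features = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaled = min_max_scale(features)
standardized = standardize(features)
```

In a real pipeline the statistics (`lo`, `hi`, `mean`, `std`) are usually computed on the training split only and then reused for validation and test data, so that no information leaks from held-out sets.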
Workflow
- Load raw data: Read dataset from file (CSV, HDF5, NumPy)
- Validate data: Check shape, dtype, missing values
- Preprocess: Normalize, standardize, encode categorical features
- Split sets: Create train/validation/test splits
- Augment data: Apply transformations if needed (rotation, flip, etc.)
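The validation and splitting steps of this workflow could be sketched as follows. This is one possible shape under stated assumptions: the helper names `validate` and `split`, the default 70/15/15 ratios, the fixed seed, and the choice to give rounding leftovers to the test set are all illustrative and not taken from the skill.

```python
import numpy as np


def validate(features: np.ndarray, labels: np.ndarray) -> dict:
    """Collect basic checks: shape, dtype, row agreement, missing values."""
    return {
        "shape": features.shape,
        "dtype": str(features.dtype),
        "rows_match": features.shape[0] == labels.shape[0],
        "missing_values": int(np.isnan(features).sum()),
    }


def split(features, labels, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle row indices once, then cut them into train/val/test."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    n = features.shape[0]
    order = np.random.default_rng(seed).permutation(n)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    # Whatever remains after train and val goes to the test set.
    train, val, test = np.split(order, [n_train, n_train + n_val])
    return tuple((features[idx], labels[idx]) for idx in (train, val, test))


# Hypothetical data: 20 rows, 2 features, row i holds [2i, 2i + 1].
features = np.arange(40, dtype=np.float64).reshape(20, 2)
labels = np.arange(20)

checks = validate(features, labels)
(train_x, train_y), (val_x, val_y), (test_x, test_y) = split(features, labels)
```

Shuffling indices rather than the arrays themselves keeps features and labels aligned, and a fixed seed makes the split reproducible between runs.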
Output Format
Dataset preparation report:
- Raw data shape and statistics
- Data validation results (missing values, outliers)
- Preprocessing applied (normalization, encoding)
- Train/val/test split sizes
- Final dataset shape and statistics
- Augmentation transformations applied
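A report with these fields might be assembled as in the sketch below. The `PreparationReport` dataclass, the `summarize` helper, and the 3-standard-deviation outlier rule are hypothetical choices for illustration; the skill does not prescribe a report schema or an outlier definition.

```python
from dataclasses import dataclass, field
from typing import Dict, List

import numpy as np


def summarize(data: np.ndarray) -> Dict[str, object]:
    """Shape plus NaN-aware summary statistics for one array."""
    return {
        "shape": data.shape,
        "mean": float(np.nanmean(data)),
        "std": float(np.nanstd(data)),
        "min": float(np.nanmin(data)),
        "max": float(np.nanmax(data)),
    }


@dataclass
class PreparationReport:
    raw: Dict[str, object]
    missing_values: int
    outliers: int
    preprocessing: List[str]
    split_sizes: Dict[str, int]
    final: Dict[str, object]
    augmentations: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Format the fields as a plain-text bullet report."""
        return "\n".join([
            "Dataset preparation report:",
            f"- Raw data: {self.raw}",
            f"- Missing values: {self.missing_values}, outliers: {self.outliers}",
            f"- Preprocessing applied: {', '.join(self.preprocessing) or 'none'}",
            f"- Split sizes: {self.split_sizes}",
            f"- Final data: {self.final}",
            f"- Augmentations applied: {', '.join(self.augmentations) or 'none'}",
        ])


# Hypothetical raw data with one missing value.
raw = np.arange(20, dtype=np.float64).reshape(10, 2)
raw[3, 1] = np.nan

# Assumed outlier rule: more than 3 standard deviations from the column mean.
z = (raw - np.nanmean(raw, axis=0)) / np.nanstd(raw, axis=0)
outliers = int((np.abs(z) > 3).sum())

clean = raw[~np.isnan(raw).any(axis=1)]   # drop incomplete rows
lo, hi = clean.min(axis=0), clean.max(axis=0)
final = (clean - lo) / (hi - lo)          # min-max scale to [0, 1]
train, val, test = final[:7], final[7:8], final[8:]

report = PreparationReport(
    raw=summarize(raw),
    missing_values=int(np.isnan(raw).sum()),
    outliers=outliers,
    preprocessing=["drop rows with NaN", "min-max normalization"],
    split_sizes={"train": len(train), "val": len(val), "test": len(test)},
    final=summarize(final),
)
print(report.render())
```

Keeping the report as structured data and rendering it separately lets the same values be logged, serialized, or compared across runs without parsing text.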
References
- See extract-hyperparameters skill for data preprocessing config
- See evaluate-model skill for test set evaluation
- See /notes/review/mojo-ml-patterns.md for Mojo data loading
Repository
mvillmow/ProjectOdyssey/.claude/skills/tier-2/prepare-dataset
Author: mvillmow
Stars: 8
Forks: 4
Updated 2d ago
Added 6d ago