
prepare-dataset

Process and validate datasets for training. Use when setting up data pipelines.

$ Install

git clone https://github.com/mvillmow/ProjectOdyssey /tmp/ProjectOdyssey && cp -r /tmp/ProjectOdyssey/.claude/skills/tier-2/prepare-dataset ~/.claude/skills/ProjectOdyssey

// tip: Run this command in your terminal to install the skill

SKILL.md


name: prepare-dataset
description: "Process and validate datasets for training. Use when setting up data pipelines."
mcp_fallback: none
category: ml
tier: 2
user-invocable: false

Prepare Dataset

Load, preprocess, and validate datasets for machine learning model training, including normalization and augmentation.

When to Use

Setting up data pipelines for training
Normalizing and cleaning raw data
Splitting into train/validation/test sets
Applying data augmentation

Quick Reference

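# Dataset preparation pipeline (interface skeleton; method bodies are left to the implementer)
from typing import Tuple

from numpy import ndarray


class DatasetLoader:
    def load(self, path: str) -> Tuple[ndarray, ndarray]:
        # Load raw data
        raise NotImplementedError

    def normalize(self, data: ndarray) -> ndarray:
        # Normalize to [0, 1] or standardize
        raise NotImplementedError

    def split(
        self, data: ndarray, ratios: Tuple[float, float, float]
    ) -> Tuple[ndarray, ndarray, ndarray]:
        # Split into train/val/test
        raise NotImplementedError

    def augment(self, data: ndarray) -> ndarray:
        # Apply transformations if needed
        raise NotImplementedError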
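
The `normalize` and `split` stubs above could be filled in along the following lines. This is a minimal sketch, not the skill's actual implementation: the free-function names `min_max_normalize`, `standardize`, and `split`, the default ratios, and the `seed` parameter are all assumptions made for illustration, and the data is assumed to be a 2-D array with one row per sample.

```python
from typing import Tuple

import numpy as np


def min_max_normalize(data: np.ndarray) -> np.ndarray:
    """Scale each feature column to [0, 1]; constant columns map to 0."""
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    # Avoid division by zero for columns where every value is identical.
    safe_span = np.where(span == 0, 1.0, span)
    return (data - lo) / safe_span


def standardize(data: np.ndarray) -> np.ndarray:
    """Shift each feature column to zero mean and scale it to unit variance."""
    std = data.std(axis=0)
    safe_std = np.where(std == 0, 1.0, std)
    return (data - data.mean(axis=0)) / safe_std


def split(
    data: np.ndarray,
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),  # hypothetical default
    seed: int = 0,  # hypothetical parameter, added for reproducible shuffling
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Shuffle rows, then cut them into train/val/test by the given ratios."""
    if not np.isclose(sum(ratios), 1.0):
        raise ValueError("ratios must sum to 1")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))
    n_train = int(ratios[0] * len(data))
    n_val = int(ratios[1] * len(data))
    # Whatever remains after the train and val cuts becomes the test set.
    train_idx, val_idx, test_idx = np.split(order, [n_train, n_train + n_val])
    return data[train_idx], data[val_idx], data[test_idx]
```

In practice the normalization statistics (min, max, mean, standard deviation) are usually computed on the training split only and then reused for the validation and test splits, so that no information leaks from held-out data. The sketch computes them on whatever array it is given.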

Workflow

Load raw data: Read dataset from file (CSV, HDF5, NumPy)
Validate data: Check shape, dtype, missing values
Preprocess: Normalize, standardize, encode categorical features
Split sets: Create train/validation/test splits
Augment data: Apply transformations if needed (rotation, flip, etc.)
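
The validation and augmentation steps of this workflow might look roughly like the sketch below. The function names (`validate`, `drop_missing_rows`, `flip_horizontal`), the `expected_features` parameter, and the assumption that images arrive as a batch shaped `(N, H, W)` are illustrative choices, not part of the skill itself.

```python
from typing import Dict

import numpy as np


def validate(data: np.ndarray, expected_features: int) -> Dict[str, object]:
    """Collect basic checks: shape, dtype, and missing (NaN) values."""
    is_float = np.issubdtype(data.dtype, np.floating)
    # NaN only exists for floating-point dtypes; integer arrays have none.
    missing = int(np.isnan(data).sum()) if is_float else 0
    return {
        "shape_ok": data.ndim == 2 and data.shape[1] == expected_features,
        "dtype": str(data.dtype),
        "missing_values": missing,
    }


def drop_missing_rows(data: np.ndarray) -> np.ndarray:
    """Remove every row of a float array that contains at least one NaN."""
    return data[~np.isnan(data).any(axis=1)]


def flip_horizontal(images: np.ndarray) -> np.ndarray:
    """Mirror a batch of images shaped (N, H, W) along the width axis."""
    return images[:, :, ::-1]
```

Dropping rows is only one way to handle missing values; imputing a column mean or median is a common alternative when the dataset is small. Augmentations such as flips are normally applied to the training split only, so they belong after the split step.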

Output Format

Dataset preparation report:

Raw data shape and statistics
Data validation results (missing values, outliers)
Preprocessing applied (normalization, encoding)
Train/val/test split sizes
Final dataset shape and statistics
Augmentation transformations applied
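
One way to assemble such a report is a small record type that is filled in as the pipeline runs and rendered at the end. The `PreparationReport` class, its field names, and the text layout below are hypothetical; the skill does not prescribe a concrete structure. Summary statistics are omitted to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PreparationReport:
    """Hypothetical container for the items listed in the output format."""

    raw_shape: Tuple[int, ...]
    missing_values: int
    preprocessing: List[str]
    split_sizes: Dict[str, int]  # expects the keys "train", "val", "test"
    final_shape: Tuple[int, ...]
    augmentations: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Format the report as plain text, one item per line."""
        sizes = self.split_sizes
        lines = [
            "Dataset preparation report:",
            f"  Raw data shape: {self.raw_shape}",
            f"  Missing values found: {self.missing_values}",
            f"  Preprocessing applied: {', '.join(self.preprocessing) or 'none'}",
            f"  Train/val/test split sizes: "
            f"{sizes['train']}/{sizes['val']}/{sizes['test']}",
            f"  Final dataset shape: {self.final_shape}",
            f"  Augmentations applied: {', '.join(self.augmentations) or 'none'}",
        ]
        return "\n".join(lines)
```

A usage example with made-up numbers:

```python
report = PreparationReport(
    raw_shape=(100, 4),
    missing_values=0,
    preprocessing=["min-max normalization"],
    split_sizes={"train": 80, "val": 10, "test": 10},
    final_shape=(100, 4),
)
print(report.render())
```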

References

See extract-hyperparameters skill for data preprocessing config
See evaluate-model skill for test set evaluation
See /notes/review/mojo-ml-patterns.md for Mojo data loading