unsloth-grpo

Implementation of Group Relative Policy Optimization (GRPO) for training reasoning models, optimized for 8x memory savings (triggers: GRPO, reasoning, DeepSeek-R1, reinforcement learning, RLVR, GRPOTrainer, thinking tokens).
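For readers unfamiliar with the "group relative" part of GRPO, here is a minimal sketch of the core idea: several completions are sampled for the same prompt, and each one's reward is normalized against the mean and standard deviation of its own group. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for illustration; this is not the skill's or Unsloth's actual code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group's mean and std.

    Hypothetical helper for illustration only; real trainers operate on
    batched tensors and may use a different std estimator.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std (an assumption)
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt, scored by a hypothetical verifier
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed to supply a baseline.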

$ Install

git clone https://github.com/majiayu000/claude-skill-registry /tmp/claude-skill-registry && cp -r /tmp/claude-skill-registry/skills/data/unsloth-grpo ~/.claude/skills/claude-skill-registry

// tip: Run this command in your terminal to install the skill