unsloth-grpo
Implementation of Group Relative Policy Optimization (GRPO) for training reasoning models, optimized for 8x memory savings (triggers: GRPO, reasoning, DeepSeek-R1, reinforcement learning, RLVR, GRPOTrainer, thinking tokens).
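The "group relative" part of GRPO can be illustrated with a minimal sketch: several completions are sampled for one prompt, and each reward is scored against the mean and spread of its own group instead of a learned value function. The function name `group_relative_advantages`, the `eps` constant, the use of population standard deviation, and the sample rewards are all assumptions made for illustration; they are not taken from this skill or from Unsloth's code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each reward relative to its own group (hypothetical helper).

    Each advantage is (reward - group mean) / (group std + eps).
    Population std and the eps value are illustrative assumptions.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four completions sampled from a single prompt
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat the group average get a positive advantage and the rest get a negative one, so a group whose rewards are all identical contributes no learning signal.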
$ Install
git clone https://github.com/majiayu000/claude-skill-registry /tmp/claude-skill-registry && cp -r /tmp/claude-skill-registry/skills/data/unsloth-grpo ~/.claude/skills/claude-skill-registry/
Tip: Run this command in your terminal to install the skill.
Repository: majiayu000/claude-skill-registry/skills/data/unsloth-grpo
Author: majiayu000
Stars: 0
Forks: 0
Updated: 5d ago
Added: 5d ago