unsloth-dpo

Direct Preference Optimization (DPO) aligns a model directly on preference data, without training a separate reward model. Triggers: dpo, preference optimization, rlhf, ref_model=none, patchdpotrainer, dpotrainer.
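As a rough illustration of what "without a separate reward model" means, the sketch below computes the standard DPO loss for a single preference pair in plain Python. It is not taken from this skill or from the Unsloth code base. The function name `dpo_loss`, its parameter names, and the default `beta=0.1` are assumptions made for illustration. The inputs are summed log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Hypothetical helper: DPO loss for one (chosen, rejected) pair.

    loss = -log(sigmoid(beta * margin)), where margin is the difference
    between the policy-vs-reference log-ratio of the chosen response and
    that of the rejected response.
    """
    # Implicit rewards: how much more the policy likes each response
    # than the reference model does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_ratio - rejected_ratio)

    # -log(sigmoid(x)) == softplus(-x), written in a numerically stable form.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


if __name__ == "__main__":
    # Policy prefers the chosen response more than the reference does,
    # so the loss falls below log(2), its value at zero margin.
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

The loss shrinks as the policy widens the gap between chosen and rejected responses relative to the reference, which is why the preference pairs themselves stand in for a learned reward model.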

$ Install

git clone https://github.com/majiayu000/claude-skill-registry /tmp/claude-skill-registry && cp -r /tmp/claude-skill-registry/skills/data/unsloth-dpo ~/.claude/skills/claude-skill-registry

// tip: Run this command in your terminal to install the skill