rlhf
Understanding Reinforcement Learning from Human Feedback (RLHF) for aligning language models. Use when learning about preference data, reward modeling, policy optimization, or direct alignment algorithms like DPO.
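The skill covers direct alignment algorithms such as DPO; purely as an illustration of what that involves (not code taken from the skill itself), here is a minimal sketch of the DPO loss in PyTorch. The function name, tensor shapes, and the default beta value are assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss over a batch of preference pairs.

    Each argument is a tensor of summed token log-probabilities, shape (batch,).
    beta controls how far the policy may drift from the reference model.
    """
    # Log-ratio of policy vs. reference for the preferred and dispreferred responses
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps

    # DPO objective: push the chosen log-ratio above the rejected one
    logits = beta * (chosen_rewards - rejected_rewards)
    return -F.logsigmoid(logits).mean()

# Toy usage with random log-probabilities standing in for real model outputs
if __name__ == "__main__":
    b = 4
    loss = dpo_loss(torch.randn(b), torch.randn(b), torch.randn(b), torch.randn(b))
    print(loss.item())
```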
Install

$ git clone https://github.com/itsmostafa/llm-engineering-skills /tmp/llm-engineering-skills && cp -r /tmp/llm-engineering-skills/skills/rlhf ~/.claude/skills/llm-engineering-skills/

Tip: Run this command in your terminal to install the skill.
Repository: itsmostafa/llm-engineering-skills/skills/rlhf
Author: itsmostafa
Stars: 1
Forks: 0
Updated: 6d ago
Added: 6d ago