unsloth-core

Core fundamentals of Unsloth for fast LLM fine-tuning, covering FastLanguageModel setup, optimized gradient checkpointing, and native inference acceleration (triggers: unsloth, FastLanguageModel, from_pretrained, get_peft_model, for_inference, gradient checkpointing).

$ Install

git clone https://github.com/majiayu000/claude-skill-registry /tmp/claude-skill-registry && cp -r /tmp/claude-skill-registry/skills/data/unsloth-core ~/.claude/skills/claude-skill-registry

// tip: Run this command in your terminal to install the skill


name: unsloth-core
description: Core fundamentals of Unsloth for fast LLM fine-tuning, covering FastLanguageModel setup, optimized gradient checkpointing, and native inference acceleration (triggers: unsloth, FastLanguageModel, from_pretrained, get_peft_model, for_inference, gradient checkpointing).

Overview

Unsloth-core provides the foundational framework for accelerating Large Language Model (LLM) fine-tuning. It leverages specialized kernels and optimized gradient checkpointing to achieve up to 2x faster native inference and significant VRAM savings compared to standard Hugging Face implementations.
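A minimal loading sketch (the checkpoint name, sequence length, and 4-bit setting below are illustrative assumptions, not requirements):

from unsloth import FastLanguageModel

# Load a 4-bit quantized base model. dtype=None lets Unsloth auto-detect
# the best precision (bfloat16 on Ampere+ GPUs, float16 otherwise).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # illustrative checkpoint
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)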

When to Use

  • When fine-tuning Llama-3, Mistral, or Phi models and seeking maximum speed.
  • When limited by VRAM and needing optimized gradient checkpointing.
  • When deploying models for inference and requiring low-latency responses.

Decision Tree

  1. Are you starting a new fine-tuning run?
    • Yes: Use FastLanguageModel.from_pretrained().
    • No (inference only): Load the model normally, then call FastLanguageModel.for_inference(model).
  2. Is your VRAM usage too high?
    • Yes: Set use_gradient_checkpointing = 'unsloth' (sketched after this tree).
  3. Are you ready for generation?
    • Yes: Call FastLanguageModel.for_inference(model) to enable 2x speed.
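
For the VRAM branch above, a sketch of attaching LoRA adapters with the optimized checkpointing enabled (continues from the loading sketch in the Overview; the rank, alpha, and target modules are illustrative):

# model comes from FastLanguageModel.from_pretrained(...) above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                  # LoRA rank (illustrative)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # offloads activations to system RAM
    random_state=3407,
)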

Workflows

  1. Basic Model Initialization: Import FastLanguageModel, load a model with a specific max_seq_length and load_in_4bit, and configure PEFT adapters via get_peft_model().
  2. Optimized Training Configuration: Pass use_gradient_checkpointing = 'unsloth' to get_peft_model(), train with SFTTrainer, and monitor loss values (ideally between 0.5 and 1.0).
  3. Saving and Post-Training Inference: Save LoRA adapters using save_pretrained(), then enable optimized generation via FastLanguageModel.for_inference(model). A combined training-and-saving sketch follows this list.
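
A combined sketch of workflows 2 and 3, assuming a TRL version whose SFTTrainer still accepts tokenizer and dataset_text_field directly, and a dataset with a "text" column; all hyperparameters are illustrative:

import torch
from trl import SFTTrainer
from transformers import TrainingArguments

# model and tokenizer come from the earlier sketches; dataset is assumed.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,       # watch the loss; ~0.5-1.0 is a healthy range
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()

# Save only the small LoRA adapter weights, not the full merged model.
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")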

Non-Obvious Insights

  • Unsloth's native inference speedup is not automatic; users must explicitly call FastLanguageModel.for_inference(model) to see the 2x performance gain (see the sketch after this list).
  • The specialized 'unsloth' gradient checkpointing option is 30% more VRAM-efficient than the standard implementation.
  • Unsloth fixes known tokenizer and chat template bugs present in official model releases to improve fine-tuning accuracy.
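
A minimal generation sketch illustrating the explicit for_inference() call (the prompt is arbitrary; model and tokenizer come from the earlier sketches):

from unsloth import FastLanguageModel

# Without this call, generation runs at standard speed; with it, Unsloth
# enables its native ~2x faster inference path.
FastLanguageModel.for_inference(model)

inputs = tokenizer(["The capital of France is"], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))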

Evidence

  • "Unsloth itself provides 2x faster inference natively as well, so always do not forget to call FastLanguageModel.for_inference(model)." Source
  • "We leverage our smart Unsloth gradient checkpointing algorithm... It smartly offloads intermediate activations to system RAM asynchronously whilst being only 1% slower." Source

Scripts

  • scripts/unsloth-core_tool.py: Python utility for model initialization and training setup.
  • scripts/unsloth-core_tool.js: Node.js interface for managing Unsloth training configurations.

Dependencies

  • unsloth
  • torch
  • transformers
  • peft
  • trl
