Marketplace
llm-basics
LLM architecture, tokenization, transformers, and inference optimization. Use for understanding and working with language models.
$ Instalar
git clone https://github.com/pluginagentmarketplace/custom-plugin-ai-engineer /tmp/custom-plugin-ai-engineer && cp -r /tmp/custom-plugin-ai-engineer/skills/llm-basics ~/.claude/skills/custom-plugin-ai-engineer// tip: Run this command in your terminal to install the skill
SKILL.md
name: llm-basics description: LLM architecture, tokenization, transformers, and inference optimization. Use for understanding and working with language models. sasmp_version: "1.3.0" bonded_agent: 01-llm-fundamentals bond_type: PRIMARY_BOND
LLM Basics
Master the fundamentals of Large Language Models.
Quick Start
Using OpenAI API
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain transformers briefly."}
],
temperature=0.7,
max_tokens=500
)
print(response.choices[0].message.content)
Using Hugging Face
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
inputs = tokenizer("Hello, how are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))
Core Concepts
Transformer Architecture
Input → Embedding → [N × Transformer Block] → Output
Transformer Block:
┌───────────────────────────┐
│ Multi-Head Self-Attention │
├───────────────────────────┤
│ Layer Normalization │
├───────────────────────────┤
│ Feed-Forward Network │
├───────────────────────────┤
│ Layer Normalization │
└───────────────────────────┘
Tokenization
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "Hello, world!"
# Encode
tokens = tokenizer.encode(text)
print(tokens) # [15496, 11, 995, 0]
# Decode
decoded = tokenizer.decode(tokens)
print(decoded) # "Hello, world!"
Key Parameters
# Generation parameters
params = {
'temperature': 0.7, # Randomness (0-2)
'max_tokens': 1000, # Output length limit
'top_p': 0.9, # Nucleus sampling
'top_k': 50, # Top-k sampling
'frequency_penalty': 0, # Reduce repetition
'presence_penalty': 0 # Encourage new topics
}
Model Comparison
| Model | Parameters | Context | Best For |
|---|---|---|---|
| GPT-4 | ~1.7T | 128K | Complex reasoning |
| GPT-3.5 | 175B | 16K | General tasks |
| Claude 3 | N/A | 200K | Long context |
| Llama 2 | 7-70B | 4K | Open source |
| Mistral 7B | 7B | 32K | Efficient inference |
Local Inference
With Ollama
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Run a model
ollama run llama2
# API usage
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt": "Why is the sky blue?"
}'
With vLLM
from vllm import LLM, SamplingParams
llm = LLM(model="meta-llama/Llama-2-7b-hf")
sampling = SamplingParams(temperature=0.8, max_tokens=100)
outputs = llm.generate(["Hello, my name is"], sampling)
Best Practices
- Start simple: Use API before local deployment
- Mind context: Stay within context window limits
- Temperature tuning: Lower for facts, higher for creativity
- Token efficiency: Shorter prompts = lower costs
- Streaming: Use for better UX in applications
Error Handling & Retry
from tenacity import retry, stop_after_attempt, wait_exponential
@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def call_llm_with_retry(prompt: str) -> str:
return client.chat.completions.create(...)
Troubleshooting
| Symptom | Cause | Solution |
|---|---|---|
| Rate limit errors | Too many requests | Add exponential backoff |
| Empty response | max_tokens=0 | Check parameter values |
| High latency | Large model | Use smaller model |
| Timeout | Prompt too long | Reduce input size |
Unit Test Template
def test_llm_completion():
response = call_llm("Hello")
assert response is not None
assert len(response) > 0
Repository

pluginagentmarketplace
Author
pluginagentmarketplace/custom-plugin-ai-engineer/skills/llm-basics
1
Stars
0
Forks
Updated21h ago
Added1w ago