cost-optimized-llm
Implement cost-optimized LLM routing with NO OpenAI. Use tiered model selection (DeepSeek, Haiku, Sonnet) to achieve 70-90% cost savings. Triggers on "LLM costs", "model selection", "cost optimization", "which model", "DeepSeek", "Claude pricing", "reduce AI costs".
$ Installer
git clone https://github.com/ScientiaCapital/scientia-superpowers /tmp/scientia-superpowers && cp -r /tmp/scientia-superpowers/skills/cost-optimized-llm ~/.claude/skills/scientia-superpowers// tip: Run this command in your terminal to install the skill
name: cost-optimized-llm description: Implement cost-optimized LLM routing with NO OpenAI. Use tiered model selection (DeepSeek, Haiku, Sonnet) to achieve 70-90% cost savings. Triggers on "LLM costs", "model selection", "cost optimization", "which model", "DeepSeek", "Claude pricing", "reduce AI costs".
Cost-Optimized LLM Routing
Achieve 70-90% cost savings with intelligent model routing. NO OpenAI allowed.
Critical Rule
NEVER use OpenAI models in this ecosystem.
Allowed providers:
- Anthropic Claude (Haiku, Sonnet, Opus)
- Google Gemini (Flash, Pro)
- DeepSeek (via OpenRouter)
- Qwen (via OpenRouter)
- Cerebras (speed-critical)
- Local: Ollama, sentence-transformers
Cost Comparison
| Model | Cost per 1M tokens | Use Case |
|---|---|---|
| DeepSeek V3 | $0.14 input / $0.28 output | Simple queries, classification |
| Claude Haiku | $0.25 input / $1.25 output | Moderate complexity |
| Gemini Flash | FREE (limited) | MVP, prototyping |
| Claude Sonnet | $3.00 input / $15.00 output | Complex reasoning |
| Claude Opus | $15.00 input / $75.00 output | Expert tasks only |
Tiered Routing Strategy
Tier 1: Simple Tasks → DeepSeek ($0.0001/1K)
Use for:
- Text classification
- Simple extractions
- Formatting
- Basic Q&A
- Sentiment analysis
from openai import OpenAI # OpenRouter uses OpenAI SDK
client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key=os.environ["OPENROUTER_API_KEY"]
)
response = client.chat.completions.create(
model="deepseek/deepseek-chat",
messages=[{"role": "user", "content": prompt}],
max_tokens=500
)
Tier 2: Moderate Tasks → Claude Haiku ($0.00075/1K)
Use for:
- Code review
- Summarization
- Multi-step reasoning
- Data analysis
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-3-5-haiku-20241022",
max_tokens=1024,
messages=[{"role": "user", "content": prompt}]
)
Tier 3: Complex Tasks → Claude Sonnet ($0.009/1K)
Use for:
- Architecture decisions
- Complex code generation
- Multi-file refactoring
- Nuanced analysis
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
messages=[{"role": "user", "content": prompt}]
)
Automatic Routing Implementation
from enum import Enum
from typing import Literal
class TaskComplexity(Enum):
SIMPLE = "simple"
MODERATE = "moderate"
COMPLEX = "complex"
def route_to_model(complexity: TaskComplexity) -> str:
"""Route to appropriate model based on complexity."""
routing = {
TaskComplexity.SIMPLE: "deepseek/deepseek-chat",
TaskComplexity.MODERATE: "claude-3-5-haiku-20241022",
TaskComplexity.COMPLEX: "claude-sonnet-4-20250514"
}
return routing[complexity]
def estimate_complexity(prompt: str) -> TaskComplexity:
"""Estimate task complexity from prompt characteristics."""
# Simple heuristics
word_count = len(prompt.split())
has_code = "```" in prompt or "def " in prompt or "function" in prompt
has_analysis = any(w in prompt.lower() for w in ["analyze", "compare", "evaluate"])
if word_count < 50 and not has_code and not has_analysis:
return TaskComplexity.SIMPLE
elif word_count < 200 or (has_code and not has_analysis):
return TaskComplexity.MODERATE
else:
return TaskComplexity.COMPLEX
def smart_complete(prompt: str, force_model: str = None) -> str:
"""Complete with automatic model routing."""
if force_model:
model = force_model
else:
complexity = estimate_complexity(prompt)
model = route_to_model(complexity)
# Route to appropriate client
if model.startswith("deepseek"):
return call_openrouter(model, prompt)
else:
return call_anthropic(model, prompt)
Free Tier Strategy (Gemini Flash)
For MVPs and prototyping, use Gemini Flash (FREE):
import google.generativeai as genai
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(prompt)
Limits:
- 15 requests/minute
- 1 million tokens/day
- 1,500 requests/day
Cost Tracking
Track costs per project:
import json
from datetime import datetime
from pathlib import Path
COST_LOG = Path.home() / ".claude" / "llm_costs.jsonl"
def log_cost(project: str, model: str, input_tokens: int, output_tokens: int):
"""Log LLM usage for cost tracking."""
costs = {
"deepseek/deepseek-chat": (0.00014, 0.00028),
"claude-3-5-haiku-20241022": (0.00025, 0.00125),
"claude-sonnet-4-20250514": (0.003, 0.015),
"gemini-1.5-flash": (0, 0) # Free
}
input_cost, output_cost = costs.get(model, (0.01, 0.03))
total = (input_tokens / 1_000_000 * input_cost) + (output_tokens / 1_000_000 * output_cost)
entry = {
"timestamp": datetime.utcnow().isoformat(),
"project": project,
"model": model,
"input_tokens": input_tokens,
"output_tokens": output_tokens,
"cost_usd": round(total, 6)
}
with open(COST_LOG, "a") as f:
f.write(json.dumps(entry) + "\n")
return total
Voice AI Cost Optimization
For voice pipelines (vozlux, solarvoice-ai):
STT (Speech-to-Text)
- Deepgram Nova-2: $0.0043/min (recommended)
- AssemblyAI: $0.00025/sec
TTS (Text-to-Speech)
- Cartesia Sonic-3: ~$0.01/1K chars (quality)
- AWS Polly: ~$0.004/1K chars (budget)
Tier-Based Voice Routing
def get_voice_tier(subscription: str) -> dict:
tiers = {
"starter": {
"tts": "polly",
"stt": "deepgram-base",
"llm": "deepseek"
},
"pro": {
"tts": "cartesia",
"stt": "deepgram-nova",
"llm": "haiku"
},
"enterprise": {
"tts": "cartesia",
"stt": "deepgram-nova",
"llm": "sonnet"
}
}
return tiers.get(subscription, tiers["starter"])
Monthly Budget Estimates
For a typical Scientia project:
| Usage Level | DeepSeek Heavy | Mixed Tier | Sonnet Heavy |
|---|---|---|---|
| Light (10K queries) | $1.40 | $8 | $90 |
| Medium (100K queries) | $14 | $80 | $900 |
| Heavy (1M queries) | $140 | $800 | $9,000 |
Recommendation: Use Mixed Tier routing for 90%+ of use cases.
Environment Variables
Required in .env:
# Primary (Anthropic)
ANTHROPIC_API_KEY=sk-ant-...
# Cost optimization (OpenRouter for DeepSeek)
OPENROUTER_API_KEY=sk-or-...
# Free tier (Google)
GOOGLE_API_KEY=AIza...
# NEVER set these:
# OPENAI_API_KEY= # FORBIDDEN
Validation
lang-core enforces NO OpenAI at runtime:
def validate_environment():
"""Block OpenAI usage."""
if os.environ.get("OPENAI_API_KEY"):
raise EnvironmentError(
"OpenAI is not allowed in Scientia projects. "
"Use ANTHROPIC_API_KEY or OPENROUTER_API_KEY instead."
)
Repository
