Model Manager
Test, validate, and add new AI models to the eval suite. Use when user asks to add new models, test model access, check pricing, or update models.yml.
Install
git clone https://github.com/sunholo-data/ailang /tmp/ailang && cp -r /tmp/ailang/.claude/skills/model-manager ~/.claude/skills/ailang
Tip: Run this command in your terminal to install the skill.
Test API access, validate configurations, and add new AI models to the AILANG eval suite.
Quick Start
Most common usage:
# User says: "Can we add GPT-5.1 to the eval suite?"
# This skill will:
# 1. Test API access to GPT-5.1
# 2. Find the correct API model name
# 3. Look up pricing information
# 4. Update models.yml configuration
# 5. Run a test benchmark to verify
When to Use This Skill
Invoke this skill when:
- User asks to "add a new model" to eval suite
- User mentions checking if a model is "accessible" or "available"
- User wants to "test API access" to a model
- User asks to "update models.yml" or "check pricing"
- User says "can we use [model name]?" for evaluations
Available Scripts
scripts/test_model_access.sh <provider> <model-name>
Test API access to a model and display authentication status.
Usage:
# Test OpenAI model
scripts/test_model_access.sh openai gpt-5.1
# Test Anthropic model
scripts/test_model_access.sh anthropic claude-sonnet-4-5-20250929
# Test Google Gemini via Vertex AI
scripts/test_model_access.sh google gemini-3-pro-preview-11-2025
Output:
Testing: openai/gpt-5.1
✓ OPENAI_API_KEY found
✓ API call successful
✓ Model: gpt-5.1-2025-11-13
✓ Tokens: 13 input, 10 output (10 reasoning)
Ready to add to models.yml
scripts/find_model_info.sh <model-keywords>
Search for model information using web search and return API names + pricing.
Usage:
# Find GPT-5.1 info
scripts/find_model_info.sh "GPT-5.1 API model name pricing"
# Find Gemini 3 Pro info
scripts/find_model_info.sh "Gemini 3 Pro API documentation"
Output:
Searching for: GPT-5.1 API model name pricing
✓ Found API names:
- gpt-5.1 (Thinking mode)
- gpt-5.1-chat-latest (Instant mode)
✓ Pricing:
Input: $1.25 per 1M tokens
Output: $10.00 per 1M tokens
Cached: $0.125 per 1M tokens
scripts/update_models_yml.sh <friendly-name> <api-name> <provider> <input-price> <output-price>
Add a new model to the models.yml configuration. Prices are given per 1K tokens.
Usage:
# Add GPT-5.1
scripts/update_models_yml.sh \
gpt5-1 \
"gpt-5.1" \
openai \
0.00125 \
0.01
Output:
Adding model to models.yml:
Friendly name: gpt5-1
API name: gpt-5.1
Provider: openai
Pricing: $0.00125 / $0.01 per 1K tokens
✓ Updated models.yml
✓ Validated YAML syntax
✓ Ready to test
scripts/verify_vertex_model.sh <model-name>
Check if a Gemini model is available in Vertex AI.
Usage:
# Check if Gemini 3 Pro is available
scripts/verify_vertex_model.sh gemini-3-pro-preview-11-2025
Output:
Checking Vertex AI for: gemini-3-pro-preview-11-2025
✓ GCP project: multivac-internal-prod
✓ Access token obtained
✗ Model not found (404)
Recommendation: Monitor for availability, check again in 1-2 weeks
scripts/run_test_benchmark.sh <model-name>
Run a small test benchmark to verify model works end-to-end.
Usage:
# Test GPT-5.1 with fizzbuzz benchmark
scripts/run_test_benchmark.sh gpt5-1
Output:
Running test benchmark: fizzbuzz
Model: gpt5-1
✓ Benchmark completed
✓ Result: PASS (100%)
✓ Tokens: 245 input, 89 output
✓ Cost: $0.002
Model is ready for production use
Workflow
1. Test API Access
First, verify you can call the model:
# Use test_model_access.sh
scripts/test_model_access.sh openai gpt-5.1
What to check:
- API key is set (OPENAI_API_KEY, ANTHROPIC_API_KEY, or gcloud auth)
- API call succeeds (not 401/403/404)
- Model returns expected structure
- Token usage is reported
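The status codes named above can be turned into a quick diagnosis. The following is a minimal sketch, not the logic of test_model_access.sh: the function name `classify_status` and the wording of each message are assumptions made for illustration.

```shell
# Map an HTTP status code from a model API call to a short diagnosis.
# classify_status is a hypothetical helper; the messages are illustrative only.
classify_status() {
  case "$1" in
    200) echo "ok" ;;
    401) echo "unauthorized: API key missing or invalid" ;;
    403) echo "forbidden: key lacks access to this model" ;;
    404) echo "not found: check the exact API model name" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

classify_status 404
```

The status code itself could be captured with curl's `-w '%{http_code}'` option against the endpoint for the provider in question (see resources/provider_endpoints.md).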
For Gemini models:
- Uses Vertex AI (not public API)
- Requires gcloud auth application-default login
- Check availability with verify_vertex_model.sh
2. Find Model Information
Search for official documentation:
# Find API model name and pricing
scripts/find_model_info.sh "GPT-5.1 API documentation pricing"
What to gather:
- Exact API model name (e.g., gpt-5.1 not GPT-5.1)
- Provider (openai, anthropic, google)
- Input price per 1K tokens
- Output price per 1K tokens
- Context limits (if relevant)
- Special features (adaptive reasoning, caching, etc.)
Reference: See resources/provider_endpoints.md
3. Update models.yml
Add the model configuration:
# Add to models.yml
scripts/update_models_yml.sh \
<friendly-name> \
<api-name> \
<provider> \
<input-per-1k> \
<output-per-1k>
Naming conventions:
- Friendly name: gpt5-1, claude-sonnet-4-5, gemini-3-pro
- API name: Exact string for API calls
- Use hyphens, lowercase
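The friendly-name convention can be checked before calling update_models_yml.sh. This is a sketch under one assumption: that "hyphens, lowercase" means lowercase letters and digits separated by single hyphens. The helper name `is_valid_friendly_name` is hypothetical and is not part of the skill's scripts.

```shell
# Succeed (exit 0) if the name is lowercase alphanumerics joined by single hyphens.
# is_valid_friendly_name is a hypothetical helper; the exact rule is an assumption.
is_valid_friendly_name() {
  # LC_ALL=C keeps [a-z] from matching uppercase letters in some locales.
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]+(-[a-z0-9]+)*$'
}

for name in gpt5-1 claude-sonnet-4-5 GPT-5.1; do
  if is_valid_friendly_name "$name"; then
    echo "$name: ok"
  else
    echo "$name: rejected"
  fi
done
```

Under this rule the API name gpt-5.1 would also be rejected as a friendly name because of the dot, which is consistent with the friendly name gpt5-1 used above.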
Also update:
- Model suites (benchmark_suite, extended_suite, dev_models)
- Add notes about special features
- Document agent CLI support (if available)
4. Run Test Benchmark
Verify end-to-end:
# Test with a simple benchmark
scripts/run_test_benchmark.sh <model-name>
What to verify:
- Benchmark completes successfully
- Results are reasonable (not garbage output)
- Token usage matches expectations
- Cost calculation works
- No errors in logs
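To sanity-check the reported cost by hand, the per-1K prices from models.yml can be applied to the token counts. This is a rough sketch that assumes cost is simply input and output tokens times their per-1K prices. It ignores caching discounts and any provider-specific billing rules, and `estimate_cost` is a hypothetical helper, not part of the eval harness.

```shell
# estimate_cost <input-tokens> <output-tokens> <input-per-1k> <output-per-1k>
# Hypothetical helper: plain per-1K arithmetic, no caching or discounts.
estimate_cost() {
  awk -v it="$1" -v ot="$2" -v ip="$3" -v op="$4" \
    'BEGIN { printf "%.6f\n", (it / 1000) * ip + (ot / 1000) * op }'
}

# 1000 input and 500 output tokens at $0.00125 / $0.01 per 1K tokens
estimate_cost 1000 500 0.00125 0.01   # prints 0.006250
```

If the benchmark's reported cost is far from this estimate, the prices in models.yml may have been entered per 1M rather than per 1K tokens.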
5. Document the Model
Update relevant documentation:
- Add model to this skill's resource guide
- Note any special parameters (e.g., max_completion_tokens for GPT-5.1)
- Document authentication requirements
- Add to teaching prompts if needed
6. Optional: Run Full Eval
If model looks good:
# Run small eval suite
ailang eval-suite --models <model-name> --benchmarks fizzbuzz,recursion_factorial
# Run full suite (expensive!)
make eval-baseline EVAL_VERSION=vX.Y.Z FULL=true
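Steps 1, 3, and 4 can be chained so that models.yml is only touched after the access test passes. The wrapper below is a sketch, not a script shipped with the skill: `run_step`, `add_model`, and the `DRY_RUN` variable are hypothetical, and only the script paths and argument orders are taken from the sections above. It defaults to a dry run that prints the commands instead of executing them.

```shell
# Print each step (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
# run_step, add_model, and DRY_RUN are hypothetical names used for illustration.
run_step() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# add_model <provider> <api-name> <friendly-name> <input-per-1k> <output-per-1k>
add_model() {
  # Stop before editing models.yml if the access test fails.
  run_step scripts/test_model_access.sh "$1" "$2" || return 1
  run_step scripts/update_models_yml.sh "$3" "$2" "$1" "$4" "$5" || return 1
  run_step scripts/run_test_benchmark.sh "$3"
}

add_model openai gpt-5.1 gpt5-1 0.00125 0.01
```

Setting `DRY_RUN=0` would run the real scripts from the skill directory. Step 2 (finding the model name and pricing) is left manual here because its output needs human review.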
Resources
Provider Endpoints
See resources/provider_endpoints.md for:
- API endpoint URLs for each provider
- Authentication methods
- How to test access manually
- Common errors and fixes
Pricing Guide
See resources/pricing_guide.md for:
- How to find official pricing
- Price conversion (per 1M → per 1K)
- Cost calculation verification
- Caching and discounts
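The per 1M to per 1K conversion is a division by 1000. As a worked example using the GPT-5.1 prices shown earlier ($1.25 input, $10.00 output per 1M tokens), a small helper could look like the sketch below. The name `per_1m_to_per_1k` is an assumption and is not a script in this skill.

```shell
# Convert a price per 1M tokens to a price per 1K tokens (divide by 1000).
# per_1m_to_per_1k is a hypothetical helper name.
per_1m_to_per_1k() {
  awk -v p="$1" 'BEGIN { printf "%g\n", p / 1000 }'
}

per_1m_to_per_1k 1.25    # prints 0.00125
per_1m_to_per_1k 10.00   # prints 0.01
```

These are the same values passed to update_models_yml.sh in the usage example above.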
Progressive Disclosure
This skill loads information progressively:
- Always loaded: This SKILL.md file (workflow and script descriptions)
- Execute as needed: Scripts in scripts/ (testing, updating, verification)
- Load on demand: Resources (detailed endpoint docs, pricing references)
Notes
Important:
- Always test API access BEFORE updating models.yml
- Vertex AI (Gemini) requires gcloud auth, not API key
- GPT-5.1+ uses max_completion_tokens instead of max_tokens
- New models may not be available in all regions immediately
- Check for preview/beta status before adding to production suites
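The parameter difference noted above matters when building a request by hand. The sketch below picks the token-limit parameter name from the API model name. It assumes that matching the prefix `gpt-5.1` is enough to cover "GPT-5.1+", which may not hold for later models, and `token_limit_param` is a hypothetical helper.

```shell
# Choose the token-limit parameter name for a given API model name.
# Assumption: only names starting with "gpt-5.1" need max_completion_tokens;
# extend the pattern as newer models are added.
token_limit_param() {
  case "$1" in
    gpt-5.1*) echo "max_completion_tokens" ;;
    *)        echo "max_tokens" ;;
  esac
}

model="gpt-5.1"
# Print an illustrative JSON fragment for a request body.
printf '{"model": "%s", "%s": %d}\n' "$model" "$(token_limit_param "$model")" 64
```

Keeping the choice in one function means a new model family only requires one added pattern rather than edits at every call site.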
Prerequisites:
- API keys set in environment (OPENAI_API_KEY, ANTHROPIC_API_KEY)
- For Gemini: gcloud CLI installed and authenticated
- For Gemini: GCP project set (gcloud config set project PROJECT_ID)
- curl, python3, and jq available in PATH
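The tool prerequisites can be checked up front. This is a minimal sketch using the POSIX `command -v` builtin. The helper name `missing_tools` is an assumption, and it checks only that each tool is in PATH, not API keys or gcloud authentication.

```shell
# Print the name of each tool that is not found in PATH, one per line.
# missing_tools is a hypothetical helper; no output means all tools were found.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools curl python3 jq)
if [ -n "$missing" ]; then
  echo "Missing tools:"
  echo "$missing"
else
  echo "All required tools found"
fi
```

For Gemini models, `gcloud` could be added to the argument list in the same way.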
Files modified by this skill:
- internal/eval_harness/models.yml - Model configurations
- (Optional) prompts/vX.Y.Z.md - Teaching prompts
- (Optional) .claude/skills/model-manager/resources/ - Local model database
