# ruby-integration

This skill is for writing integrations to the Ruby SDK. Claude acts as the engineer implementing LLM provider or agentic framework integrations. Use when adding support for OpenAI-like providers, Anthropic-like providers, or agent frameworks. Covers TDD workflow, comprehensive testing (streaming/non-streaming/tokens/multimodal), defensive coding, MCP validation, and StandardRB compliance.

## Install

```sh
git clone https://github.com/braintrustdata/braintrust-sdk-ruby /tmp/braintrust-sdk-ruby && \
  cp -r /tmp/braintrust-sdk-ruby/.claude/skills/ruby-integration ~/.claude/skills/braintrust-sdk-ruby/
```

Tip: run this command in your terminal to install the skill.
## Writing Ruby SDK Integrations

This skill is for writing integrations. Claude acts as the Braintrust engineer implementing new integrations to the Ruby SDK.
## Reference Integrations

Study existing integrations as examples:

- OpenAI: `lib/braintrust/trace/contrib/openai.rb` (tests: `test/braintrust/trace/openai_test.rb`, example: `examples/openai.rb`)
- Anthropic: `lib/braintrust/trace/contrib/anthropic.rb` (tests: `test/braintrust/trace/anthropic_test.rb`, example: `examples/anthropic.rb`)
**Important notes:**

- Examine the library thoroughly: study the library's documentation and source code to identify ALL critical methods that call LLMs/AI services. Plan to trace every method that makes API calls, not just the obvious ones.
- Some integrations (e.g. ruby-llm) support multiple providers (e.g. OpenAI and Anthropic). Test all supported providers.
## Core Pattern: Module Prepending

```ruby
# frozen_string_literal: true

require "json"

module Braintrust
  module Trace
    module YourProvider
      def self.wrap(client = nil, tracer_provider: nil)
        tracer_provider ||= ::OpenTelemetry.tracer_provider

        # Idempotent wrapping: return early if already wrapped
        return client if client && client.instance_variable_get(:@braintrust_wrapped)

        # Support class-level wrapping: wrap() with no args wraps the class globally
        if client.nil?
          # Class wrapping:    YourProvider.prepend(wrapper)
          # Instance wrapping: client.singleton_class.prepend(wrapper)
        end

        wrapper = Module.new do
          define_method(:your_api_method) do |**params|
            tracer = tracer_provider.tracer("braintrust")
            # IMPORTANT: start the span FIRST (before metadata extraction) for accurate timing.
            # Inside this block `self` is the client, so module helpers must be
            # called with an explicit receiver (YourProvider.set_json_attr).
            tracer.in_span("your_provider.operation") do |span|
              # 1. Capture input
              YourProvider.set_json_attr(span, "braintrust.input_json", extract_input(params))

              # 2. Set metadata (provider, model, endpoint, all params)
              YourProvider.set_json_attr(span, "braintrust.metadata", {
                "provider" => "your_provider",
                "endpoint" => "/v1/endpoint",
                "model" => params[:model]
              }.compact)

              # 3. Call the original method
              response = super(**params)

              # 4. Capture output
              YourProvider.set_json_attr(span, "braintrust.output_json", extract_output(response))

              # 5. Capture metrics (normalized tokens)
              YourProvider.set_json_attr(span, "braintrust.metrics", YourProvider.parse_usage_tokens(response.usage))

              response
            end
          end
        end

        client.your_api.singleton_class.prepend(wrapper)
        client.instance_variable_set(:@braintrust_wrapped, true) if client
        client
      end

      # NOTE: a bare `private` has no effect on `def self.` methods;
      # use `private_class_method` if these helpers should be hidden.
      def self.set_json_attr(span, key, value)
        span.set_attribute(key, JSON.generate(value)) if value
      rescue => e
        warn "Failed to serialize #{key}: #{e.message}"
      end

      def self.parse_usage_tokens(usage)
        return {} unless usage
        {
          "prompt_tokens" => usage[:input_tokens] || usage[:prompt_tokens],
          "completion_tokens" => usage[:output_tokens] || usage[:completion_tokens],
          "tokens" => usage[:total_tokens]
        }.compact
      end
    end
  end
end
```

## Code Organization

- Break large methods (>50 lines) into focused helpers
- Separate streaming/non-streaming into distinct handler methods (e.g., `handle_streaming_request`, `handle_non_streaming_request`)
- Extract metadata/input/output capture into helper methods (e.g., `extract_metadata`, `build_input_messages`, `capture_output`)
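The prepend-and-flag mechanics can be exercised without any tracing backend. The following is a toy sketch (`FakeClient` and the `traced_calls` counter are illustrative stand-ins, not SDK code) showing why the idempotency flag matters: without it, wrapping twice would prepend two modules and trace each call twice.

```ruby
# Toy demonstration of the prepend + idempotency-flag mechanics above.
# FakeClient stands in for a provider client; no real tracing is involved.
class FakeClient
  attr_reader :traced_calls

  def initialize
    @traced_calls = 0
  end

  def your_api_method(**params)
    "response for #{params[:model]}"
  end
end

def wrap(client)
  # Idempotent: a second wrap is a no-op.
  return client if client.instance_variable_get(:@braintrust_wrapped)

  wrapper = Module.new do
    define_method(:your_api_method) do |**params|
      @traced_calls += 1 # stands in for span creation
      super(**params)    # call the original method
    end
  end

  client.singleton_class.prepend(wrapper)
  client.instance_variable_set(:@braintrust_wrapped, true)
  client
end

client = FakeClient.new
wrap(client)
wrap(client) # no-op thanks to the flag

client.your_api_method(model: "gpt-4o-mini")
puts client.traced_calls # => 1 (one call traced once, not twice)
```

Because the wrapper is prepended to the singleton class, `super` reaches the original method and the client's observable behaviour is unchanged.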
## Streaming Pattern

```ruby
define_method(:stream) do |**params|
  tracer = tracer_provider.tracer("braintrust")
  aggregated_chunks = []

  span = tracer.start_span("your_provider.operation.stream")
  YourProvider.set_json_attr(span, "braintrust.input_json", extract_input(params))
  YourProvider.set_json_attr(span, "braintrust.metadata", extract_metadata(params))

  stream = begin
    super(**params)
  rescue => e
    span.record_exception(e)
    span.status = ::OpenTelemetry::Trace::Status.error("Error: #{e.message}")
    span.finish
    raise
  end

  original_each = stream.method(:each)
  stream.define_singleton_method(:each) do |&block|
    original_each.call do |chunk|
      aggregated_chunks << chunk
      block&.call(chunk)
    end
  rescue => e
    span.record_exception(e)
    span.status = ::OpenTelemetry::Trace::Status.error("Streaming error: #{e.message}")
    raise
  ensure
    # CRITICAL: always finish the span, even if the stream was only partially consumed
    unless aggregated_chunks.empty?
      aggregated = aggregate_chunks(aggregated_chunks)
      YourProvider.set_json_attr(span, "braintrust.output_json", aggregated)
      YourProvider.set_json_attr(span, "braintrust.metrics", YourProvider.parse_usage_tokens(aggregated[:usage]))
    end
    span.finish
  end

  stream
end
```
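A minimal sketch of the `aggregate_chunks` helper referenced above, under the assumption that each chunk carries an optional text delta and that the final chunk carries usage. Real chunk shapes vary by provider, so the `:delta`/`:usage` keys here are illustrative:

```ruby
# Illustrative chunk aggregation: concatenate text deltas and keep the
# last usage payload seen (providers typically send usage on the final chunk).
def aggregate_chunks(chunks)
  text = +"" # unfrozen string buffer
  usage = nil

  chunks.each do |chunk|
    text << chunk[:delta] if chunk[:delta]
    usage = chunk[:usage] if chunk[:usage]
  end

  {content: text, usage: usage}.compact
end

chunks = [
  {delta: "Hello"},
  {delta: ", world"},
  {usage: {input_tokens: 4, output_tokens: 2}}
]
puts aggregate_chunks(chunks)[:content] # => Hello, world
```

Because aggregation runs in the `ensure` block of `each`, a partially consumed stream still yields whatever content and usage arrived before termination.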
## Examples

Write two examples:

- Customer example (`examples/your_provider.rb`): concise example demonstrating setup and basic usage
- Internal example (`examples/internal/your_provider.rb`): comprehensive example using every library feature

Follow existing example patterns:

- Nest all API calls under a manual root span (see `examples/openai.rb`):

  ```ruby
  tracer = OpenTelemetry.tracer_provider.tracer("your-provider-example")
  root_span = nil
  response = tracer.in_span("examples/your_provider.rb") do |span|
    root_span = span
    client.your_api.call(...) # Automatically traced, nested under root_span
  end
  ```

- Use consistent nomenclature for spans and projects
- Print a permalink at the end: `Braintrust::Trace.permalink(root_span)`
## Required Components

Do in this order:

1. Appraisals FIRST: add to the `Appraisals` file (latest + 2 recent + uninstalled), then run `bundle exec appraisal generate`
2. Tests: `test/braintrust/trace/your_provider_test.rb`
3. Integration: `lib/braintrust/trace/contrib/your_provider.rb`
4. VCR cassettes: `test/fixtures/vcr_cassettes/your_provider/` (record as you write tests)
5. Auto-load: add to `lib/braintrust/trace.rb` with `begin`/`rescue LoadError`
6. Example: `examples/your_provider.rb`
7. Example: `examples/internal/your_provider.rb` (comprehensive internal example)
8. Env var: add to `.env.example` if needed
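The auto-load entry can be sketched as a guarded require, so the core SDK still loads when the provider gem is absent (the require path is illustrative):

```ruby
# In lib/braintrust/trace.rb: load the integration only when it is usable.
# A LoadError (gem or contrib file missing) is swallowed so the core SDK
# keeps working without the optional dependency.
loaded =
  begin
    require "braintrust/trace/contrib/your_provider" # hypothetical path
    true
  rescue LoadError
    # Provider gem not installed; skip this integration gracefully.
    false
  end
```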
## Test Coverage (LLM Providers)

- ✅ Non-streaming requests (basic + attributes + metrics)
- ✅ Streaming requests (full consumption)
- ✅ Early stream termination (partial consumption)
- ✅ Error handling (exception recording)
- ✅ All critical features: test ALL provider capabilities:
  - Tool/function calling (if supported)
  - Images/vision (if supported)
  - System messages (if supported)
  - Multiple messages/chat history (if supported)
  - Any other provider-specific features
- ✅ Token usage edge cases (cached, reasoning tokens)
- ✅ Multiple APIs (if the provider has multiple endpoints)
- ✅ Verify the integration's behaviour is unchanged by tracing
- ✅ LLM wrapper libraries: if tracing a library that wraps LLM providers (e.g., ruby_llm → OpenAI), verify traces match the underlying provider exactly (tools format, token format, output structure). Compare side by side with `BRAINTRUST_ENABLE_TRACE_CONSOLE_LOG=1`
## Appraisal Configuration (Set Up FIRST)

CRITICAL: Configure appraisal at the START, before writing tests. Test the latest version + 2 recent versions + uninstalled.

Step 1 - Add to the `Appraisals` file:

```ruby
# Appraisals file - ADD THIS FIRST
appraise "your_provider-latest" do
  gem "your_provider", ">= 2.0"
end

appraise "your_provider-1.5" do
  gem "your_provider", "~> 1.5.0"
end

appraise "your_provider-1.0" do
  gem "your_provider", "~> 1.0.0"
end

appraise "your_provider-uninstalled" do
  remove_gem "your_provider"
end
```

Step 2 - Generate gemfiles:

```sh
bundle exec appraisal generate
```

Step 3 - Use appraisal for ALL test runs:

```sh
bundle exec appraisal rake test  # Run all scenarios (use this in the TDD cycle)
```
Determine versions: Check release history, focus on API changes, include customer-likely versions.
## Testing Tools & Validation

Use multiple testing approaches to validate your integration:

### 1. Unit Tests (Primary)

- Location: `test/braintrust/trace/your_provider_test.rb`
- Purpose: test all code paths, edge cases, and error handling
- Run: `bundle exec appraisal rake test`
- Coverage: track with `bundle exec rake coverage` (>90% line, >80% branch)

### 2. Console Log Inspection

- Purpose: quickly verify trace structure during development
- Usage: `BRAINTRUST_ENABLE_TRACE_CONSOLE_LOG=true bundle exec ruby examples/your_provider.rb`
- Verify: check span hierarchy, attributes, and parent/child relationships

### 3. Braintrust MCP Server (Integration Testing)

- Purpose: query and inspect traces in the Braintrust platform
- Setup: should be auto-configured in the Docker environment
- Commands:

  ```
  # List recent traces
  mcp__braintrust__list_recent_objects(object_type: "project_logs", limit: 10)

  # Inspect specific span
  mcp__braintrust__resolve_object(object_type: "project_logs", object_id: "span_id")

  # BTQL query
  mcp__braintrust__btql_query(query: "SELECT * FROM project_logs WHERE metadata.provider = 'your_provider'")
  ```

- Verify attributes: `input`, `output`, `metadata`, `metrics`, `span_attributes.braintrust.parent`, `span_attributes.braintrust.org`

### 4. Examples (Manual Testing)

- Customer example: `bundle exec ruby examples/your_provider.rb`
- Internal example: `bundle exec ruby examples/internal/your_provider.rb`
- Purpose: end-to-end validation of real API calls
## Testing Workflow

1. TDD cycle: write a unit test → implement → run `bundle exec appraisal rake test`
2. Console log: use `BRAINTRUST_ENABLE_TRACE_CONSOLE_LOG=true` to debug span structure
3. MCP validation: query traces with the Braintrust MCP server
4. Examples: run examples to verify end-to-end behavior
## TDD Workflow (CRITICAL)

After EVERY major change: test → lint → fix → commit cycle.

1. Create a todo list at the start
2. Write one failing test
3. Implement minimal code to pass
4. Run tests with appraisal: `bundle exec appraisal rake test`
5. Lint: `bundle exec rake lint` (fix with `rake lint:fix`)
6. Verify with MCP tools
7. Refactor if needed
8. Repeat the cycle for: basic → attributes → streaming → errors → tokens → multimodal
## Defensive Coding

- ✅ Nil checks (`return {} unless usage`)
- ✅ Safe defaults (`params[:model] || "unknown"`)
- ✅ Compact hashes (`.compact`)
- ✅ Error handling (`begin`/`rescue`/`ensure`)
- ✅ JSON safety (rescue in `set_json_attr`)
- ✅ Graceful gem loading (`rescue LoadError`)
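These habits combine into extraction helpers that never raise on missing fields. A minimal sketch (the helper name and field set are illustrative, not the SDK's actual API):

```ruby
require "json"

# Illustrative helper: build span metadata defensively from request params.
# Missing keys either fall back to a safe default or are dropped by .compact.
def extract_metadata(params)
  return {} unless params # nil check

  {
    "provider" => "your_provider",
    "model" => params[:model] || "unknown", # safe default
    "max_tokens" => params[:max_tokens],    # nil values dropped by .compact
    "stream" => params[:stream]
  }.compact
end

metadata = extract_metadata(model: "gpt-4o-mini", stream: nil)
puts JSON.generate(metadata) # => {"provider":"your_provider","model":"gpt-4o-mini"}
```

The payoff: provider responses with unexpected shapes degrade to smaller metadata hashes instead of crashing the traced call.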
## StandardRB & CI

Lint after every change (part of the TDD cycle):

```sh
bundle exec rake lint      # Check StandardRB
bundle exec rake lint:fix  # Auto-fix
```

Coverage target (check periodically):

```sh
bundle exec rake coverage  # >90% line, >80% branch
```

CI requirements: StandardRB + tests on Ruby 3.2/3.3/3.4 + Ubuntu/macOS + all appraisal scenarios.
## Token Normalization

Use the shared `TokenParser.parse_usage_tokens(usage)` in `lib/braintrust/trace/token_parser.rb` to normalize tokens:

- `prompt_tokens` (input)
- `completion_tokens` (output)
- `tokens` (total, includes cache_creation_tokens)
- `prompt_cached_tokens` (if cached)
- `prompt_cache_creation_tokens` (if cache created)
- `completion_reasoning_tokens` (if reasoning)
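A sketch of that normalization, covering both OpenAI-style (`prompt`/`completion`) and Anthropic-style (`input`/`output`) usage hashes. The provider-specific field names for cached and reasoning tokens are assumptions here; the authoritative mapping is the real `TokenParser`:

```ruby
# Sketch of token normalization across provider usage shapes.
# Field names for cached/reasoning tokens are illustrative assumptions.
def parse_usage_tokens(usage)
  return {} unless usage

  prompt = usage[:input_tokens] || usage[:prompt_tokens]
  completion = usage[:output_tokens] || usage[:completion_tokens]
  cache_creation = usage[:cache_creation_input_tokens]

  # Total includes cache-creation tokens when the provider reports them.
  parts = [prompt, completion, cache_creation].compact
  total = usage[:total_tokens] || (parts.sum unless parts.empty?)

  {
    "prompt_tokens" => prompt,
    "completion_tokens" => completion,
    "tokens" => total,
    "prompt_cached_tokens" => usage[:cache_read_input_tokens],
    "prompt_cache_creation_tokens" => cache_creation,
    "completion_reasoning_tokens" => usage.dig(:completion_tokens_details, :reasoning_tokens)
  }.compact
end

puts parse_usage_tokens(input_tokens: 10, output_tokens: 5, cache_read_input_tokens: 4)
```

`.compact` keeps the output free of nil entries, so providers that omit cache or reasoning fields produce only the three core keys.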
## VCR Cassettes

```sh
VCR_MODE=all bundle exec rake test           # Re-record all
VCR_MODE=new_episodes bundle exec rake test  # Record new only
VCR_OFF=true bundle exec rake test           # Skip VCR
```
## Reference Files

- Integrations: `lib/braintrust/trace/contrib/{openai,anthropic}.rb`
- Tests: `test/braintrust/trace/{openai,anthropic}_test.rb`
- Test helpers: `test/test_helper.rb`
- Examples: `examples/{openai,anthropic}.rb`
- Config: `Rakefile`, `Appraisals`, `.github/workflows/ci.yml`
## Repository

https://github.com/braintrustdata/braintrust-sdk-ruby