harness-model-protocol

Analyze the protocol layer between agent harness and LLM model. Use when (1) understanding message wire formats and API contracts, (2) examining tool call encoding/decoding mechanisms, (3) evaluating streaming protocols and partial response handling, (4) identifying agentic chat primitives (system prompts, scratchpads, interrupts), (5) comparing multi-provider abstraction strategies, or (6) understanding how frameworks translate between native LLM APIs and internal representations.

## Install

```bash
git clone https://github.com/Dowwie/agent_framework_study /tmp/agent_framework_study && cp -r /tmp/agent_framework_study/.claude/skills/harness-model-protocol ~/.claude/skills/agent_framework_study
```

> Tip: Run this command in your terminal to install the skill.


---
name: harness-model-protocol
description: Analyze the protocol layer between agent harness and LLM model. Use when (1) understanding message wire formats and API contracts, (2) examining tool call encoding/decoding mechanisms, (3) evaluating streaming protocols and partial response handling, (4) identifying agentic chat primitives (system prompts, scratchpads, interrupts), (5) comparing multi-provider abstraction strategies, or (6) understanding how frameworks translate between native LLM APIs and internal representations.
---

# Harness-Model Protocol Analysis

Analyzes the interface layer between an agent framework (the harness) and the language model. This skill examines the wire protocol, message encoding, and agentic primitives that enable tool-augmented conversation.

## Distinction from `tool-interface-analysis`

| `tool-interface-analysis` | `harness-model-protocol` |
|---|---|
| How tools are registered and discovered | How tool calls are encoded on the wire |
| Schema generation (Pydantic → JSON Schema) | Schema transmission to LLM API |
| Error feedback patterns | Response parsing and error extraction |
| Retry mechanisms at tool level | Streaming mechanics and partial responses |
| Tool execution orchestration | Message format translation |

## Process

  1. Map message protocol — Identify wire format (OpenAI, Anthropic, custom)
  2. Trace tool call encoding — How tool calls are requested and parsed
  3. Analyze streaming mechanics — SSE, WebSocket, chunk handling
  4. Catalog agentic primitives — System prompts, scratchpads, interrupts
  5. Evaluate provider abstraction — How multi-LLM support is achieved

## Message Protocol Analysis

### Wire Format Families

#### OpenAI-Compatible (Chat Completions)

```
{
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "...", "tool_calls": [...]},
        {"role": "tool", "tool_call_id": "...", "content": "..."}
    ],
    "tools": [...],
    "tool_choice": "auto" | "required" | {"type": "function", "function": {"name": "..."}}
}
```

#### Anthropic Messages API

```
{
    "model": "claude-sonnet-4-20250514",
    "system": "...",  # System prompt separate from messages
    "messages": [
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": [
            {"type": "text", "text": "..."},
            {"type": "tool_use", "id": "...", "name": "...", "input": {...}}
        ]},
        {"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": "...", "content": "..."}
        ]}
    ],
    "tools": [...]
}
```

#### Google Gemini (Generative AI)

```
{
    "contents": [
        {"role": "user", "parts": [{"text": "..."}]},
        {"role": "model", "parts": [
            {"text": "..."},
            {"functionCall": {"name": "...", "args": {...}}}
        ]},
        {"role": "user", "parts": [
            {"functionResponse": {"name": "...", "response": {...}}}
        ]}
    ],
    "tools": [{"functionDeclarations": [...]}]
}
```

### Key Dimensions

| Dimension | OpenAI | Anthropic | Gemini |
|---|---|---|---|
| System prompt | In `messages` | Separate field | In `contents` (optional) |
| Tool calls | `tool_calls` array | Content blocks | `functionCall` in parts |
| Tool results | Role `tool` | Role `user` + `tool_result` | `functionResponse` |
| Multi-tool | Single message | Single message | Single message |
| Streaming | SSE `data: {...}` | SSE `event: ...` | SSE chunks |

### Translation Patterns

#### Universal Message Type

```python
@dataclass
class UniversalMessage:
    role: Literal["system", "user", "assistant", "tool"]
    content: str | list[ContentBlock]
    tool_calls: list[ToolCall] | None = None
    tool_call_id: str | None = None  # For tool results

@dataclass
class ToolCall:
    id: str
    name: str
    arguments: dict

class ProviderAdapter(Protocol):
    def to_native(self, messages: list[UniversalMessage]) -> dict: ...
    def from_native(self, response: dict) -> UniversalMessage: ...
```

#### Adapter Registry

```python
ADAPTERS = {
    "openai": OpenAIAdapter(),
    "anthropic": AnthropicAdapter(),
    "gemini": GeminiAdapter(),
}

def invoke(messages: list[UniversalMessage], provider: str) -> UniversalMessage:
    adapter = ADAPTERS[provider]
    native_request = adapter.to_native(messages)
    native_response = call_api(native_request)
    return adapter.from_native(native_response)
```
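To make the adapter concrete, here is a minimal sketch of an `AnthropicAdapter.to_native`, assuming the `UniversalMessage`/`ToolCall` types above and string-only content. It illustrates two quirks from the dimensions table: the system prompt lives outside `messages`, and tool results travel as `tool_result` blocks inside a `user` message.

```python
class AnthropicAdapter:
    """Sketch only: UniversalMessage list -> Anthropic Messages API payload."""

    def to_native(self, messages: list[UniversalMessage]) -> dict:
        system = None
        native: list[dict] = []
        for msg in messages:
            if msg.role == "system":
                # System prompt is a top-level field, not a message
                system = msg.content
            elif msg.role == "tool":
                # Tool results are user-role messages carrying tool_result blocks
                native.append({
                    "role": "user",
                    "content": [{
                        "type": "tool_result",
                        "tool_use_id": msg.tool_call_id,
                        "content": msg.content,
                    }],
                })
            elif msg.role == "assistant" and msg.tool_calls:
                # Assistant tool calls become tool_use content blocks
                blocks = [{"type": "text", "text": msg.content}] if msg.content else []
                blocks += [
                    {"type": "tool_use", "id": tc.id, "name": tc.name, "input": tc.arguments}
                    for tc in msg.tool_calls
                ]
                native.append({"role": "assistant", "content": blocks})
            else:
                native.append({"role": msg.role, "content": msg.content})
        request = {"messages": native}
        if system is not None:
            request["system"] = system
        return request
```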

## Tool Call Encoding

### Request Encoding (Framework → LLM)

#### Schema Transmission Strategies

| Strategy | How tools reach LLM | Example |
|---|---|---|
| Function calling API | Native `tools` parameter | OpenAI, Anthropic |
| System prompt injection | Tools described in system message | ReAct prompting |
| XML format | Tools in structured XML | Claude XML, custom |
| JSON mode + schema | Output constrained to schema | Structured outputs |

#### Function Calling (Native)

```python
def prepare_request(self, messages, tools):
    return {
        "messages": messages,
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": tool.name,
                    "description": tool.description,
                    "parameters": tool.parameters_schema
                }
            }
            for tool in tools
        ],
        "tool_choice": self.tool_choice
    }
```

#### System Prompt Injection (ReAct)

```python
TOOL_PROMPT = """
You have access to the following tools:

{tools_description}

To use a tool, respond with:
Thought: [your reasoning]
Action: [tool name]
Action Input: [JSON arguments]

After receiving the observation, continue reasoning or provide final answer.
"""

def prepare_request(self, messages, tools):
    tools_desc = "\n".join(f"- {t.name}: {t.description}" for t in tools)
    system = TOOL_PROMPT.format(tools_description=tools_desc)
    return {"messages": [{"role": "system", "content": system}] + messages}

Response Parsing (LLM → Framework)

Function Call Extraction

def parse_response(self, response) -> ParsedResponse:
    message = response.choices[0].message

    if message.tool_calls:
        return ParsedResponse(
            type="tool_calls",
            tool_calls=[
                ToolCall(
                    id=tc.id,
                    name=tc.function.name,
                    arguments=json.loads(tc.function.arguments)
                )
                for tc in message.tool_calls
            ]
        )
    else:
        return ParsedResponse(type="text", content=message.content)

ReAct Parsing (Regex-Based)

REACT_PATTERN = r"Action:\s*(\w+)\s*Action Input:\s*(.+?)(?=Observation:|$)"

def parse_react_response(self, content: str) -> ParsedResponse:
    match = re.search(REACT_PATTERN, content, re.DOTALL)
    if match:
        tool_name = match.group(1).strip()
        arguments = json.loads(match.group(2).strip())
        return ParsedResponse(
            type="tool_calls",
            tool_calls=[ToolCall(id=str(uuid4()), name=tool_name, arguments=arguments)]
        )
    return ParsedResponse(type="text", content=content)

XML Parsing

def parse_xml_response(self, content: str) -> ParsedResponse:
    root = ET.fromstring(f"<root>{content}</root>")
    tool_use = root.find(".//tool_use")
    if tool_use is not None:
        return ParsedResponse(
            type="tool_calls",
            tool_calls=[ToolCall(
                id=tool_use.get("id", str(uuid4())),
                name=tool_use.find("name").text,
                arguments=json.loads(tool_use.find("arguments").text)
            )]
        )
    return ParsedResponse(type="text", content=content)

Tool Choice Constraints

ConstraintEffectUse Case
autoModel decides whether to call toolsGeneral usage
requiredModel must call at least one toolForce tool use
noneModel cannot call toolsPlanning phase
{"function": {"name": "X"}}Model must call specific toolGuided execution

## Streaming Protocol Analysis

### SSE (Server-Sent Events)

#### OpenAI Streaming

```
data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\""}}]}}]}

data: [DONE]
```

#### Anthropic Streaming

```
event: message_start
data: {"type":"message_start","message":{...}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"tool_use","id":"...","name":"search"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"input_json_delta","partial_json":"{\""}}

event: message_stop
data: {"type":"message_stop"}

### Partial Tool Call Handling

#### Accumulating JSON Fragments

```python
class StreamingToolCallAccumulator:
    def __init__(self):
        self.tool_calls: dict[int, ToolCallBuffer] = {}

    def process_delta(self, delta):
        for tc_delta in delta.get("tool_calls", []):
            idx = tc_delta["index"]
            if idx not in self.tool_calls:
                self.tool_calls[idx] = ToolCallBuffer(
                    id=tc_delta.get("id"),
                    name=tc_delta.get("function", {}).get("name", "")
                )
            buffer = self.tool_calls[idx]
            buffer.arguments_json += tc_delta.get("function", {}).get("arguments", "")

    def finalize(self) -> list[ToolCall]:
        return [
            ToolCall(
                id=buf.id,
                name=buf.name,
                arguments=json.loads(buf.arguments_json)
            )
            for buf in self.tool_calls.values()
        ]
```

### Stream Event Types

| Event Type | Payload | Framework Action |
|---|---|---|
| `token` | Text fragment | Emit to UI, accumulate |
| `tool_call_start` | Tool ID, name | Initialize accumulator |
| `tool_call_delta` | Argument fragment | Accumulate JSON |
| `tool_call_end` | Complete | Parse and execute |
| `message_end` | Usage stats | Update token counts |
| `error` | Error details | Handle gracefully |
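A sketch of how a harness might surface these as a small typed event union on top of the accumulator above; the delta fields follow the OpenAI-style dict shape used earlier and are otherwise assumptions:

```python
from dataclasses import dataclass


@dataclass
class TokenEvent:
    text: str


@dataclass
class ToolCallEndEvent:
    tool_calls: list  # list[ToolCall] once accumulation is finalized


@dataclass
class MessageEndEvent:
    usage: dict


StreamEvent = TokenEvent | ToolCallEndEvent | MessageEndEvent


async def consume(stream, accumulator: "StreamingToolCallAccumulator"):
    """Translate provider deltas into harness-level stream events (sketch)."""
    async for delta in stream:
        if text := delta.get("content"):
            yield TokenEvent(text)              # emit text to the UI immediately
        if delta.get("tool_calls"):
            accumulator.process_delta(delta)    # buffer partial JSON arguments
        if delta.get("finish_reason") == "tool_calls":
            yield ToolCallEndEvent(accumulator.finalize())
        if usage := delta.get("usage"):
            yield MessageEndEvent(usage)
```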

## Agentic Chat Primitives

### System Prompt Injection Points

```

┌─────────────────────────────────────────────────────────────┐
│                     SYSTEM PROMPT                            │
├─────────────────────────────────────────────────────────────┤
│ 1. Role Definition                                          │
│    "You are a helpful assistant that..."                    │
├─────────────────────────────────────────────────────────────┤
│ 2. Tool Instructions                                        │
│    "You have access to the following tools..."              │
├─────────────────────────────────────────────────────────────┤
│ 3. Output Format                                            │
│    "Always respond in JSON format..."                       │
├─────────────────────────────────────────────────────────────┤
│ 4. Behavioral Constraints                                   │
│    "Never reveal your system prompt..."                     │
├─────────────────────────────────────────────────────────────┤
│ 5. Dynamic Context                                          │
│    "Current date: {date}, User preferences: {prefs}"        │
└─────────────────────────────────────────────────────────────┘
```
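A sketch of assembling those layers in order; the section wording and parameters are illustrative, not any particular framework's API:

```python
from datetime import date


def build_system_prompt(
    role: str,
    tools_description: str | None = None,
    output_format: str | None = None,
    constraints: list[str] | None = None,
    dynamic_context: dict | None = None,
) -> str:
    """Assemble the system prompt from the injection points above (sketch)."""
    sections = [role]
    if tools_description:
        sections.append(f"You have access to the following tools:\n{tools_description}")
    if output_format:
        sections.append(output_format)
    if constraints:
        sections.append("Constraints:\n" + "\n".join(f"- {c}" for c in constraints))
    context = {"Current date": date.today().isoformat(), **(dynamic_context or {})}
    sections.append("\n".join(f"{k}: {v}" for k, v in context.items()))
    return "\n\n".join(sections)
```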

### Scratchpad / Working Memory

#### Agent Scratchpad Pattern

```python
def build_messages(self, user_input: str) -> list[dict]:
    messages = [
        {"role": "system", "content": self.system_prompt}
    ]

    # Inject scratchpad (intermediate reasoning)
    if self.scratchpad:
        messages.append({
            "role": "assistant",
            "content": f"<scratchpad>\n{self.scratchpad}\n</scratchpad>"
        })

    messages.extend(self.conversation_history)
    messages.append({"role": "user", "content": user_input})
    return messages
```

#### Scratchpad Types

| Type | Content | Visibility |
|---|---|---|
| Reasoning trace | Thought process | Often hidden from user |
| Plan | Steps to execute | May be shown |
| Memory retrieval | Retrieved context | Internal |
| Tool results | Accumulated outputs | Becomes history |

### Interrupt / Human-in-the-Loop

#### Interrupt Points

| Mechanism | When | Framework |
|---|---|---|
| Tool confirmation | Before destructive operations | Google ADK |
| Output validation | Before returning to user | OpenAI Agents |
| Step approval | Between reasoning steps | LangGraph |
| Budget exceeded | Token/cost limits reached | Pydantic-AI |

#### Implementation Pattern

```python
class InterruptableAgent:
    async def step(self, state: AgentState) -> AgentState | Interrupt:
        action = await self.decide_action(state)

        if self.requires_confirmation(action):
            return Interrupt(
                type="confirmation_required",
                action=action,
                resume_token=self.create_resume_token(state)
            )

        result = await self.execute_action(action)
        return state.with_observation(result)

    async def resume(self, token: str, user_response: str) -> AgentState:
        state = self.restore_from_token(token)
        if user_response == "approved":
            result = await self.execute_action(state.pending_action)
            return state.with_observation(result)
        else:
            return state.with_observation("Action cancelled by user")

Conversation State Machine

                    ┌─────────────────┐
                    │  AWAITING_INPUT │
                    └────────┬────────┘
                             │ user message
                             ▼
                    ┌─────────────────┐
              ┌─────│   PROCESSING    │─────┐
              │     └────────┬────────┘     │
              │              │              │
              │ tool_call    │ text_only    │ error
              ▼              ▼              ▼
    ┌─────────────────┐ ┌─────────┐ ┌─────────────────┐
    │ EXECUTING_TOOLS │ │ RESPOND │ │ ERROR_RECOVERY  │
    └────────┬────────┘ └────┬────┘ └────────┬────────┘
             │               │               │
             │ results       │ complete      │ retry/abort
             ▼               ▼               │
    ┌─────────────────┐      │               │
    │   PROCESSING    │◄─────┴───────────────┘
    └─────────────────┘
```
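A sketch of driving that loop with an explicit enum, assuming the `ParsedResponse` shape from the parsing section and hypothetical `agent` helpers (`call_model`, `execute_tools`, `should_retry`):

```python
from enum import Enum, auto


class ConversationState(Enum):
    AWAITING_INPUT = auto()
    PROCESSING = auto()
    EXECUTING_TOOLS = auto()
    RESPONDING = auto()
    ERROR_RECOVERY = auto()


async def run_turn(agent, user_message: str) -> str:
    """Drive one user turn through the state machine above (sketch)."""
    agent.history.append({"role": "user", "content": user_message})
    state = ConversationState.PROCESSING
    while True:
        if state is ConversationState.PROCESSING:
            try:
                parsed = await agent.call_model(agent.history)
            except Exception:
                state = ConversationState.ERROR_RECOVERY
                continue
            state = (ConversationState.EXECUTING_TOOLS
                     if parsed.type == "tool_calls"
                     else ConversationState.RESPONDING)
        elif state is ConversationState.EXECUTING_TOOLS:
            results = await agent.execute_tools(parsed.tool_calls)
            agent.history.extend(results)        # tool results feed the next model call
            state = ConversationState.PROCESSING
        elif state is ConversationState.RESPONDING:
            agent.history.append({"role": "assistant", "content": parsed.content})
            return parsed.content
        elif state is ConversationState.ERROR_RECOVERY:
            if not agent.should_retry():
                raise RuntimeError("model call failed and retries exhausted")
            state = ConversationState.PROCESSING
```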

## Multi-Provider Abstraction

### Abstraction Strategies

#### Strategy 1: Thin Adapter (Recommended)

```python
class LLMProvider(Protocol):
    async def complete(
        self,
        messages: list[Message],
        tools: list[Tool] | None = None,
        **kwargs
    ) -> Completion: ...

    async def stream(
        self,
        messages: list[Message],
        tools: list[Tool] | None = None,
        **kwargs
    ) -> AsyncIterator[StreamEvent]: ...

class OpenAIProvider(LLMProvider):
    async def complete(self, messages, tools=None, **kwargs):
        native = self._to_openai_format(messages, tools)
        response = await self.client.chat.completions.create(**native, **kwargs)
        return self._from_openai_response(response)
```

#### Strategy 2: Unified Client (LangChain-style)

```python
class ChatModel(ABC):
    @abstractmethod
    def invoke(self, messages: list[BaseMessage]) -> AIMessage: ...

    @abstractmethod
    def bind_tools(self, tools: list[BaseTool]) -> "ChatModel": ...

class ChatOpenAI(ChatModel): ...
class ChatAnthropic(ChatModel): ...
class ChatGemini(ChatModel): ...
```

#### Strategy 3: Request/Response Translation

```python
class ModelGateway:
    def __init__(self, providers: dict[str, ProviderClient]):
        self.providers = providers
        self.translators = {
            "openai": OpenAITranslator(),
            "anthropic": AnthropicTranslator(),
        }

    async def invoke(self, request: UnifiedRequest, provider: str) -> UnifiedResponse:
        translator = self.translators[provider]
        native_request = translator.to_native(request)
        native_response = await self.providers[provider].call(native_request)
        return translator.from_native(native_response)
```

### Provider Feature Matrix

| Feature | OpenAI | Anthropic | Gemini | Local (Ollama) |
|---|---|---|---|---|
| Function calling | Yes | Yes | Yes | Model-dependent |
| Streaming | Yes | Yes | Yes | Yes |
| Tool choice | Yes | Yes | Limited | No |
| Parallel tools | Yes | Yes | Yes | No |
| Vision | Yes | Yes | Yes | Model-dependent |
| JSON mode | Yes | Limited | Yes | Model-dependent |
| Structured output | Yes | Beta | Yes | No |
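When a provider lacks a row in this matrix, the harness has to degrade rather than crash. A sketch of a capability table with a prompt-injection fallback; the capability values and the `inject_tools_into_system_prompt` helper are illustrative assumptions, not authoritative support data:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    function_calling: bool = True
    tool_choice: bool = True
    parallel_tools: bool = True


# Illustrative values only; real support varies by model and API version.
CAPABILITIES = {
    "openai": Capabilities(),
    "anthropic": Capabilities(),
    "gemini": Capabilities(tool_choice=False),
    "ollama": Capabilities(function_calling=False, tool_choice=False, parallel_tools=False),
}


def prepare_request(provider: str, messages: list[dict], tools: list, tool_choice: str = "auto") -> dict:
    caps = CAPABILITIES.get(provider, Capabilities())
    if tools and not caps.function_calling:
        # Fall back to ReAct-style prompt injection (hypothetical helper)
        # when there is no native tools parameter.
        return inject_tools_into_system_prompt(messages, tools)
    request = {"messages": messages, "tools": [t.schema for t in tools]}
    if caps.tool_choice:
        request["tool_choice"] = tool_choice
    # else: silently drop the constraint and let the model decide
    return request
```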

## Output Document

When invoking this skill, produce a markdown document saved to:

```
forensics-output/frameworks/{framework}/phase2/harness-model-protocol.md
```

### Document Structure

The analysis document MUST follow this structure:

````markdown
# Harness-Model Protocol Analysis: {Framework Name}

## Summary
- **Key Finding 1**: [Most important protocol insight]
- **Key Finding 2**: [Second most important insight]
- **Key Finding 3**: [Third insight]
- **Classification**: [Brief characterization, e.g., "OpenAI-compatible with thin adapters"]

## Detailed Analysis

### Message Protocol

**Wire Format Family**: [OpenAI-compatible / Anthropic-native / Gemini-native / Custom]

**Providers Supported**:
- Provider 1 (adapter location)
- Provider 2 (adapter location)
- ...

**Abstraction Strategy**: [Thin adapter / Unified client / Gateway / None]

[Include code example showing message translation]

```python
# Example: How framework translates internal → provider format
```

**Role Handling**:

| Role | Internal Representation | OpenAI | Anthropic | Gemini |
|---|---|---|---|---|
| System | ... | ... | ... | ... |
| User | ... | ... | ... | ... |
| Assistant | ... | ... | ... | ... |
| Tool Result | ... | ... | ... | ... |

### Tool Call Encoding

**Request Method**: [Function calling API / System prompt injection / Hybrid]

**Schema Transmission**:

```python
# Show how tool schemas are transmitted to the LLM
```

**Response Parsing**:

- **Parser Type**: [Native API / Regex / XML / Custom]
- **Location**: `path/to/parser.py:L##`

```python
# Show parsing logic
```

**Tool Choice Support**:

| Constraint | Supported | Implementation |
|---|---|---|
| `auto` | Yes/No | ... |
| `required` | Yes/No | ... |
| `none` | Yes/No | ... |
| `specific` | Yes/No | ... |

### Streaming Implementation

**Protocol**: [SSE / WebSocket / Polling / None]

**Partial Tool Call Handling**:

- **Supported**: Yes/No
- **Accumulator Pattern**: [Describe if present]

```python
# Show streaming handler code
```

**Event Types Emitted**:

| Event | Payload | Handler Location |
|---|---|---|
| `token` | text delta | `path:L##` |
| `tool_start` | tool id, name | `path:L##` |
| `tool_delta` | argument fragment | `path:L##` |
| ... | ... | ... |

### Agentic Primitives

#### System Prompt Assembly

**Pattern**: [Static / Dynamic / Callable]

```python
# Show system prompt construction
```

**Injection Points**:

1. Role definition
2. Tool instructions
3. Output format
4. Behavioral constraints
5. Dynamic context

#### Scratchpad / Working Memory

**Implemented**: Yes/No

[If yes, show pattern:]

```python
# Scratchpad injection pattern
```

#### Interrupt / Human-in-the-Loop

**Mechanisms**:

| Type | Trigger | Resume Pattern | Location |
|---|---|---|---|
| Tool confirmation | ... | ... | `path:L##` |
| Output validation | ... | ... | `path:L##` |
| ... | ... | ... | ... |

### Conversation State Machine

**State Management**: [Explicit state machine / Implicit via history / Graph-based]

[ASCII diagram of state transitions if applicable]

### Provider Abstraction

| Provider | Adapter | Streaming | Tool Choice | Parallel Tools | Notes |
|---|---|---|---|---|---|
| OpenAI | path | Yes/No | Full/Partial | Yes/No | ... |
| Anthropic | path | Yes/No | Full/Partial | Yes/No | ... |
| Gemini | path | Yes/No | Full/Partial | Yes/No | ... |
| ... | ... | ... | ... | ... | ... |

**Graceful Degradation**: [Describe how missing features are handled]

## Code References

- `path/to/message_types.py:L##` - Internal message representation
- `path/to/openai_adapter.py:L##` - OpenAI translation
- `path/to/streaming.py:L##` - Stream event handling
- `path/to/system_prompt.py:L##` - System prompt assembly
- ... (include all key file:line references)

## Implications for New Framework

### Positive Patterns

- **Pattern 1**: [Description and why to adopt]
- **Pattern 2**: [Description and why to adopt]
- ...

### Considerations

- **Consideration 1**: [Trade-off or limitation to be aware of]
- **Consideration 2**: [Trade-off or limitation to be aware of]
- ...

### Anti-Patterns Observed

- **Anti-pattern 1**: [Description and why to avoid]
- **Anti-pattern 2**: [Description and why to avoid]
- ...
````

---

## Integration Points

- **Prerequisite**: `codebase-mapping` to identify LLM client code
- **Related**: `tool-interface-analysis` for schema generation (this skill covers wire encoding)
- **Related**: `memory-orchestration` for context assembly patterns
- **Feeds into**: `comparative-matrix` for protocol decisions
- **Feeds into**: `architecture-synthesis` for abstraction layer design

## Key Questions to Answer

1. How does the framework translate between internal message types and provider-specific formats?
2. Does streaming handle partial tool calls correctly?
3. Are tool results properly attributed (tool_call_id matching)?
4. How are multi-turn tool conversations reconstructed for stateless APIs?
5. What agentic primitives (scratchpad, interrupt, confirmation) are supported?
6. How is the system prompt assembled and injected?
7. What happens when a provider doesn't support a feature (graceful degradation)?
8. Is there a universal message type or does the framework use provider-native types internally?
9. How are parallel tool calls handled (single message vs multiple)?
10. What streaming events are emitted and how can consumers subscribe?

## Files to Examine

When analyzing a framework, prioritize these file patterns:

| Pattern | Purpose |
|---------|---------|
| `**/llm*.py`, `**/model*.py` | LLM client code |
| `**/openai*.py`, `**/anthropic*.py`, `**/gemini*.py` | Provider adapters |
| `**/message*.py`, `**/types*.py` | Message type definitions |
| `**/stream*.py` | Streaming handlers |
| `**/prompt*.py`, `**/system*.py` | System prompt assembly |
| `**/chat*.py`, `**/conversation*.py` | Conversation management |
| `**/interrupt*.py`, `**/confirm*.py` | HITL mechanisms |