memory-orchestration
Analyze context management, memory systems, and state continuity in agent frameworks. Use when (1) understanding how prompts are assembled, (2) evaluating eviction policies for context overflow, (3) mapping memory tiers (short-term/long-term), (4) analyzing token budget management, or (5) comparing context strategies across frameworks.
Install

```shell
git clone https://github.com/Dowwie/agent_framework_study /tmp/agent_framework_study && \
  cp -r /tmp/agent_framework_study/.claude/skills/memory-orchestration ~/.claude/skills/agent_framework_study/
```
Memory Orchestration
Analyzes context management and memory systems.
Process
- Trace context assembly → How prompts are built from components
- Identify eviction policies → How context overflow is handled
- Map memory tiers → Short-term (RAM) to long-term (DB)
- Analyze token management → Counting, budgeting, truncation
Context Assembly Analysis
Standard Assembly Order
```
┌───────────────────────────────────────────┐
│ 1. System Prompt                          │
│    - Role definition                      │
│    - Behavioral guidelines                │
│    - Output format instructions           │
├───────────────────────────────────────────┤
│ 2. Retrieved Context / Memory             │
│    - Relevant past interactions           │
│    - Retrieved documents (RAG)            │
│    - User preferences                     │
├───────────────────────────────────────────┤
│ 3. Tool Definitions                       │
│    - Available tools and schemas          │
│    - Usage examples                       │
├───────────────────────────────────────────┤
│ 4. Conversation History                   │
│    - Previous turns (user/assistant)      │
│    - Prior tool calls and results         │
├───────────────────────────────────────────┤
│ 5. Current Input                          │
│    - User's current message               │
│    - Any attachments/context              │
├───────────────────────────────────────────┤
│ 6. Agent Scratchpad (Optional)            │
│    - Current thinking/planning            │
│    - Intermediate results                 │
└───────────────────────────────────────────┘
```
Assembly Patterns
Template-Based
```python
PROMPT_TEMPLATE = """
{system_prompt}
## Available Tools
{tool_descriptions}
## Conversation
{history}
## Current Request
{user_input}
"""

prompt = PROMPT_TEMPLATE.format(
    system_prompt=self.system_prompt,
    tool_descriptions=self._format_tools(),
    history=self._format_history(),
    user_input=message,
)
```
Message List (Chat API)
```python
messages = [
    {"role": "system", "content": system_prompt},
    *self._get_history_messages(),
    {"role": "user", "content": user_input},
]
```
Programmatic Assembly
```python
def build_prompt(self, input):
    builder = PromptBuilder()
    builder.add_system(self.system_prompt)
    builder.add_context(self.memory.retrieve(input))
    builder.add_tools(self.tools)
    builder.add_history(self.history, max_tokens=2000)
    builder.add_user(input)
    return builder.build()
```
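`PromptBuilder` above is framework-specific and not defined in this document; a minimal, self-contained sketch of what such a builder might look like (the method names match the snippet above, but the section labels and the chars/4 token estimate are illustrative assumptions):

```python
class PromptBuilder:
    """Illustrative prompt builder: collects labeled sections, then joins them."""

    def __init__(self):
        self.sections = []

    def add_system(self, text):
        self.sections.append(("System", text))

    def add_context(self, items):
        if items:
            self.sections.append(("Context", "\n".join(map(str, items))))

    def add_tools(self, tools):
        self.sections.append(("Tools", "\n".join(t["name"] for t in tools)))

    def add_history(self, history, max_tokens=2000):
        # Crude chars/4 token estimate; keep the most recent turns that fit.
        kept, used = [], 0
        for msg in reversed(history):
            cost = len(msg["content"]) // 4
            if used + cost > max_tokens:
                break
            kept.insert(0, msg)
            used += cost
        self.sections.append(
            ("Conversation",
             "\n".join(f"{m['role']}: {m['content']}" for m in kept)))

    def add_user(self, text):
        self.sections.append(("Current Request", text))

    def build(self):
        return "\n\n".join(f"## {label}\n{body}" for label, body in self.sections)
```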
Eviction Policies
FIFO (First In, First Out)
```python
def trim_history(self, max_messages: int):
    while len(self.history) > max_messages:
        self.history.pop(0)  # Remove oldest
```

Pros: Simple, predictable
Cons: May lose important early context
Sliding Window
```python
def get_context_window(self, max_tokens: int):
    window = []
    token_count = 0
    for msg in reversed(self.history):
        msg_tokens = count_tokens(msg)
        if token_count + msg_tokens > max_tokens:
            break
        window.insert(0, msg)
        token_count += msg_tokens
    return window
```

Pros: Token-aware, keeps recent
Cons: Still loses old context
Summarization
```python
def summarize_and_trim(self, max_tokens: int):
    if self.total_tokens < max_tokens:
        return
    # Summarize the oldest half of the history
    old_messages = self.history[:len(self.history) // 2]
    summary = self.llm.summarize(old_messages)
    # Replace it with a single summary message
    self.history = [
        {"role": "system", "content": f"Previous conversation summary: {summary}"},
        *self.history[len(self.history) // 2:],
    ]
```

Pros: Preserves context semantically
Cons: Expensive (LLM call), lossy
Vector Store Swapping
```python
def manage_context(self, current_input: str, max_tokens: int):
    # Move old messages to the vector store
    if self.total_tokens > max_tokens:
        to_archive = self.history[:-10]
        self.vector_store.add(to_archive)
        self.history = self.history[-10:]
    # Retrieve archived context relevant to the current input
    relevant = self.vector_store.search(current_input, k=5)
    return self._build_prompt(relevant, self.history)
```

Pros: Scalable, relevance-based
Cons: Complex, retrieval quality matters
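The `vector_store` above would typically be Chroma, Pinecone, or a similar embedding index. As a toy stand-in for experimentation, here is a store that ranks archived messages by word overlap with the query; this is an illustrative sketch, not real semantic search:

```python
class ToyVectorStore:
    """Toy stand-in for a vector store: ranks archived messages by
    bag-of-words overlap with the query instead of embedding similarity."""

    def __init__(self):
        self.docs = []

    def add(self, messages):
        self.docs.extend(messages)

    def search(self, query, k=5):
        qwords = set(query.lower().split())
        # Sort by number of shared words, highest overlap first
        ranked = sorted(
            self.docs,
            key=lambda m: len(qwords & set(m["content"].lower().split())),
            reverse=True,
        )
        return ranked[:k]
```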
Importance Scoring
```python
def score_and_trim(self, max_tokens: int):
    scored = []
    for msg in self.history:
        score = self._compute_importance(msg)
        scored.append((score, msg))
    # Keep highest scoring until budget is exhausted;
    # sort by score only (comparing message dicts on ties would raise TypeError)
    scored.sort(key=lambda pair: pair[0], reverse=True)
    kept = []
    tokens = 0
    for score, msg in scored:
        if tokens + count_tokens(msg) > max_tokens:
            break
        kept.append(msg)
        tokens += count_tokens(msg)
    # Restore chronological order
    self.history = sorted(kept, key=lambda m: m["timestamp"])
```

Pros: Keeps important context
Cons: Expensive to compute
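`_compute_importance` is left undefined above. One hypothetical heuristic is sketched below; the signals, weights, and field names are assumptions, not taken from any particular framework:

```python
def compute_importance(msg: dict) -> float:
    """Hypothetical importance score combining role, tool usage,
    and content length. Weights are illustrative, not tuned."""
    score = 0.0
    if msg.get("role") == "system":
        score += 2.0  # instructions are rarely safe to evict
    if msg.get("has_tool_result"):
        score += 1.0  # tool outputs are often referenced later
    # Longer messages tend to carry more content; contribution capped at 1.0
    score += min(len(msg.get("content", "")) / 1000, 1.0)
    return score
```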
Memory Tier Mapping
```
┌──────────────────────────────────────────────────────┐
│                     MEMORY TIERS                     │
├──────────────────────────────────────────────────────┤
│ Tier 1: Working Memory (In-Prompt)                   │
│   ├── Current conversation turns                     │
│   ├── Active tool results                            │
│   └── Immediate scratchpad                           │
│ Latency: 0ms | Capacity: Context window              │
├──────────────────────────────────────────────────────┤
│ Tier 2: Session Memory (RAM)                         │
│   ├── Full conversation history                      │
│   ├── Session state                                  │
│   └── Cached retrievals                              │
│ Latency: <1ms | Capacity: GB                         │
├──────────────────────────────────────────────────────┤
│ Tier 3: Persistent Memory (Database)                 │
│   ├── Vector store (semantic search)                 │
│   ├── SQL/Document store (structured)                │
│   └── User profiles and preferences                  │
│ Latency: 10-100ms | Capacity: TB+                    │
└──────────────────────────────────────────────────────┘
```
Tier Promotion/Demotion
```python
class MemoryManager:
    def on_turn_end(self, turn):
        # Tier 1 → Tier 2: Move from prompt to session
        self.session_memory.add(turn)
        # Tier 2 → Tier 3: Persist important turns
        if self.should_persist(turn):
            self.persistent_memory.add(turn)

    def on_session_end(self):
        # Tier 2 → Tier 3: Archive session
        summary = self.summarize_session()
        self.persistent_memory.add(summary)
```
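The `should_persist` predicate above is framework-specific. A minimal sketch of one possible heuristic follows; the signals and the length threshold are assumptions for illustration:

```python
def should_persist(turn: dict) -> bool:
    """Hypothetical persistence check: keep turns that carry durable
    signals (tool results, explicit memory requests, long content)."""
    content = turn.get("content", "")
    if turn.get("role") == "tool":        # tool results are often re-used later
        return True
    if "remember" in content.lower():     # explicit user request to persist
        return True
    return len(content) > 500             # long turns likely carry substance
```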
Token Management
Counting Strategies
| Method | Accuracy | Speed |
|---|---|---|
| `tiktoken` | Exact | Fast |
| `len(text) / 4` | Rough estimate | Instant |
| API response | Post-hoc | After call |
| Tokenizer model | Exact | Medium |
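The first two rows of the table can be combined into one helper: exact counting via `tiktoken` when it is installed, falling back to the chars/4 estimate otherwise. The encoding name `cl100k_base` is an assumption about the target model:

```python
def count_tokens(text: str) -> int:
    """Exact token count via tiktoken when available; otherwise the
    rough chars/4 estimate from the table above."""
    try:
        import tiktoken  # optional dependency
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except ImportError:
        return max(1, len(text) // 4)
```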
Budget Allocation
```python
class TokenBudget:
    def __init__(self, total: int = 8000):
        self.total = total
        self.allocations = {
            'system': 1000,
            'tools': 1500,
            'history': 4000,
            'input': 1000,
            'output_reserve': 500,
        }

    def remaining_for_history(self, used: dict) -> int:
        # Subtract everything already consumed plus the output reserve
        fixed = used.get('system', 0) + used.get('tools', 0) + used.get('input', 0)
        return self.total - fixed - self.allocations['output_reserve']
```
Output Template
## Memory Orchestration Analysis: [Framework Name]
### Context Assembly
- **Order**: [System → Memory → Tools → History → Input]
- **Method**: [Template/Message List/Programmatic]
- **Location**: `path/to/prompt_builder.py`
### Eviction Policy
- **Strategy**: [FIFO/Window/Summarization/Vector/Importance]
- **Trigger**: [Token count/Message count/Explicit]
- **Location**: `path/to/memory.py:L45`
### Memory Tiers
| Tier | Storage | Capacity | Retrieval |
|------|---------|----------|-----------|
| Working | In-prompt | ~4K tokens | Immediate |
| Session | Dict/List | Unlimited | Direct |
| Persistent | [Chroma/Pinecone/SQL] | Unlimited | Semantic |
### Token Management
- **Counting**: [tiktoken/estimate/API]
- **Budget Allocation**: [Description]
- **Overflow Handling**: [Truncate/Summarize/Error]
Integration
- Prerequisite: `codebase-mapping` to identify memory files
- Feeds into: `comparative-matrix` for context strategies
- Related: `control-loop-extraction` for scratchpad usage
Repository
https://github.com/Dowwie/agent_framework_study