decision-graph-analyzer
Query and analyze the AI Counsel decision graph to find past deliberations, identify patterns, and debug memory issues
Installation

```bash
git clone https://github.com/blueman82/ai-counsel /tmp/ai-counsel && cp -r /tmp/ai-counsel/.claude/skills/decision-graph-analyzer ~/.claude/skills/ai-counsel/
```

Tip: Run this command in your terminal to install the skill.
```yaml
name: decision-graph-analyzer
description: Query and analyze the AI Counsel decision graph to find past deliberations, identify patterns, and debug memory issues
when_to_use: >
  Use this skill when you need to explore the decision graph memory system, find similar past deliberations, identify contradictions or evolution patterns, debug context injection issues, or analyze cache performance.
```
Decision Graph Analyzer Skill
Overview
The decision graph module (decision_graph/) stores completed deliberations and provides semantic similarity-based retrieval for context injection. This skill teaches you how to query, analyze, and troubleshoot the decision graph effectively.
Core Components
Storage Layer (decision_graph/storage.py)
- DecisionGraphStorage: SQLite3 backend with CRUD operations
- Schema: decision_nodes, participant_stances, decision_similarities
- Indexes: Optimized for timestamp (recency), question (duplicates), similarity (retrieval)
- Connection: Use :memory: for testing, a file path for production
Integration Layer (decision_graph/integration.py)
- DecisionGraphIntegration: High-level API facade
- Methods:
  - store_deliberation(question, result): Save a completed deliberation
  - get_context_for_deliberation(question): Retrieve similar past decisions
  - get_graph_stats(): Get monitoring statistics
  - health_check(): Validate database integrity
Retrieval Layer (decision_graph/retrieval.py)
- DecisionRetriever: Finds relevant decisions and formats context
- Key Features:
- Two-tier caching (L1: query results, L2: embeddings; see the sketch after this list)
- Adaptive k (2-5 results based on database size)
- Noise floor filtering (0.40 minimum similarity)
- Tiered formatting (strong/moderate/brief)
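
To make the two-tier design concrete, here is a minimal sketch of the lookup order, assuming LRU eviction with the maxsize values noted later in this document (200 for L1, 500 for L2). The class and method names are illustrative, not the actual decision_graph/cache.py API.

```python
from collections import OrderedDict

class TwoTierCacheSketch:
    """Illustrative only - not the decision_graph/cache.py API.
    L1 maps query text to final scored results; L2 maps question text
    to its embedding, so only unseen questions need re-encoding."""

    def __init__(self, l1_max=200, l2_max=500):
        self.l1 = OrderedDict()  # query -> scored results
        self.l2 = OrderedDict()  # question -> embedding vector
        self.l1_max, self.l2_max = l1_max, l2_max

    def _put(self, cache, key, value, maxsize):
        cache[key] = value
        cache.move_to_end(key)
        if len(cache) > maxsize:
            cache.popitem(last=False)  # evict least-recently-used entry

    def get_results(self, query):
        if query in self.l1:
            self.l1.move_to_end(query)  # refresh LRU position
            return self.l1[query]       # L1 hit: skip all similarity work
        return None                     # L1 miss: caller computes, then stores

    def get_embedding(self, question, encode):
        if question in self.l2:
            self.l2.move_to_end(question)  # L2 hit: reuse the embedding
        else:
            self._put(self.l2, question, encode(question), self.l2_max)
        return self.l2[question]
```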
Maintenance Layer (decision_graph/maintenance.py)
- DecisionGraphMaintenance: Monitoring and health checks
- Methods:
  - get_database_stats(): Node/stance/similarity counts, DB size
  - analyze_growth(days): Growth rate and projections
  - health_check(): Validate data integrity
  - estimate_archival_benefit(): Space savings simulation
Common Query Patterns
1. Find Similar Decisions
When: You want to see what past deliberations are related to a new question.
```python
from decision_graph.integration import DecisionGraphIntegration
from decision_graph.storage import DecisionGraphStorage

# Initialize
storage = DecisionGraphStorage("decision_graph.db")
integration = DecisionGraphIntegration(storage)

# Get similar decisions with context
question = "Should we adopt TypeScript for the project?"
context = integration.get_context_for_deliberation(question)

if context:
    print("Found relevant past decisions:")
    print(context)
else:
    print("No similar past decisions found")
```
Direct retrieval access:
```python
from decision_graph.retrieval import DecisionRetriever

retriever = DecisionRetriever(storage)

# Get scored results as (DecisionNode, similarity_score) tuples
scored_decisions = retriever.find_relevant_decisions(
    query_question="Should we adopt TypeScript?",
    threshold=0.7,  # Deprecated but kept for compatibility
    max_results=3   # Deprecated - uses adaptive k instead
)

for decision, score in scored_decisions:
    print(f"Score: {score:.2f}")
    print(f"Question: {decision.question}")
    print(f"Consensus: {decision.consensus}")
    print(f"Participants: {', '.join(decision.participants)}")
    print("---")
```
2. Inspect Database Statistics
When: Monitoring growth, checking health, or debugging performance.
```python
# Get comprehensive stats
stats = integration.get_graph_stats()
print(f"Total decisions: {stats['total_decisions']}")
print(f"Total stances: {stats['total_stances']}")
print(f"Total similarities: {stats['total_similarities']}")
print(f"Database size: {stats['db_size_mb']} MB")

# Analyze growth rate
from decision_graph.maintenance import DecisionGraphMaintenance

maintenance = DecisionGraphMaintenance(storage)
growth = maintenance.analyze_growth(days=30)
print(f"Decisions in last 30 days: {growth['decisions_in_period']}")
print(f"Average per day: {growth['avg_decisions_per_day']}")
print(f"Projected next 30 days: {growth['projected_decisions_30d']}")
```
3. Validate Database Health
When: Debugging issues, after schema changes, or periodic maintenance.
```python
# Run comprehensive health check
health = integration.health_check()

if health['healthy']:
    print(f"Database is healthy ({health['checks_passed']} checks passed)")
else:
    print(f"Found {health['checks_failed']} issues:")
    for issue in health['issues']:
        print(f"  - {issue}")

# View detailed results
print("\nDetails:")
for check, result in health['details'].items():
    print(f"  {check}: {result}")
```
Common issues detected (the first two are also sketched in raw SQL after this list):
- Orphaned participant stances (decision_id doesn't exist)
- Orphaned similarities (source_id or target_id missing)
- Future timestamps (data corruption)
- Missing required fields (incomplete data)
- Invalid similarity scores (not in 0.0-1.0 range)
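
health_check() is the supported interface, but the first two checks can be reproduced with raw SQL when debugging. The sketch below assumes the documented column names (participant_stances.decision_id, decision_similarities.source_id/target_id) and an id primary key on decision_nodes; verify against decision_graph/schema.py before relying on it.

```python
import sqlite3

conn = sqlite3.connect("decision_graph.db")

# Stances whose parent decision no longer exists
orphan_stances = conn.execute("""
    SELECT COUNT(*) FROM participant_stances ps
    WHERE NOT EXISTS (
        SELECT 1 FROM decision_nodes dn WHERE dn.id = ps.decision_id
    )
""").fetchone()[0]

# Similarities pointing at a missing source or target decision
orphan_similarities = conn.execute("""
    SELECT COUNT(*) FROM decision_similarities ds
    WHERE NOT EXISTS (SELECT 1 FROM decision_nodes dn WHERE dn.id = ds.source_id)
       OR NOT EXISTS (SELECT 1 FROM decision_nodes dn WHERE dn.id = ds.target_id)
""").fetchone()[0]

print(f"Orphaned stances: {orphan_stances}")
print(f"Orphaned similarities: {orphan_similarities}")
```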
4. Analyze Cache Performance
When: Debugging slow queries or optimizing cache configuration.
```python
# Get cache statistics
retriever = DecisionRetriever(storage, enable_cache=True)

# Run some queries first to populate the cache
test_questions = ["Should we use TypeScript?", "What database should we choose?"]
for question in test_questions:
    retriever.find_relevant_decisions(question)

# Check cache stats
cache_stats = retriever.get_cache_stats()
print(f"L1 query cache: {cache_stats['query_cache_size']} entries")
print(f"L1 hit rate: {cache_stats['query_hit_rate']:.1%}")
print(f"L2 embedding cache: {cache_stats['embedding_cache_size']} entries")
print(f"L2 hit rate: {cache_stats['embedding_hit_rate']:.1%}")

# Invalidate cache after adding new decisions
retriever.invalidate_cache()
```
Expected performance:
- L1 cache hit: <2μs (instant)
- L1 cache miss: <100ms (compute similarities)
- L2 cache hit: ~50% after warmup
- Target: 60%+ L1 hit rate for production workloads
5. Retrieve Specific Decisions
When: Debugging, inspection, or building custom queries.
```python
# Get a specific decision by ID
decision = storage.get_decision_node(decision_id="uuid-here")
if decision:
    print(f"Question: {decision.question}")
    print(f"Timestamp: {decision.timestamp}")
    print(f"Consensus: {decision.consensus}")
    print(f"Status: {decision.convergence_status}")

    # Get participant stances
    stances = storage.get_participant_stances(decision.id)
    for stance in stances:
        print(f"{stance.participant}: {stance.vote_option} ({stance.confidence:.0%})")
        print(f"  Rationale: {stance.rationale}")

# Get all recent decisions
recent_decisions = storage.get_all_decisions(limit=10, offset=0)
for decision in recent_decisions:
    print(f"{decision.timestamp}: {decision.question[:50]}...")

# Find similar decisions to a known decision
similar = storage.get_similar_decisions(
    decision_id="uuid-here",
    threshold=0.7,
    limit=5
)
for decision, score in similar:
    print(f"Score: {score:.2f} - {decision.question}")
```
6. Manual Similarity Computation
When: Testing similarity detection, calibrating thresholds, or debugging retrieval.
```python
from decision_graph.similarity import QuestionSimilarityDetector

detector = QuestionSimilarityDetector()

# Check which backend is being used
print(f"Backend: {detector.backend.__class__.__name__}")
# Outputs: SentenceTransformerBackend, TFIDFBackend, or JaccardBackend

# Compute similarity between two questions
score = detector.compute_similarity(
    "Should we use TypeScript?",
    "Should we adopt TypeScript for our project?"
)
print(f"Similarity: {score:.3f}")

# Find similar questions from candidates
candidates = [
    ("id1", "Should we use React or Vue?"),
    ("id2", "What database should we choose?"),
    ("id3", "Should we migrate to TypeScript?")
]
matches = detector.find_similar(
    query="Should we adopt TypeScript?",
    candidates=candidates,
    threshold=0.7
)
for match in matches:
    print(f"{match['id']}: {match['score']:.2f}")
```
Similarity Score Interpretation
The decision graph uses semantic similarity scores (0.0-1.0) to determine relevance:
| Score Range | Tier | Meaning | Example |
|---|---|---|---|
| 0.90-1.00 | Duplicate | Near-identical questions | "Use TypeScript?" vs "Should we use TypeScript?" |
| 0.75-0.89 | Strong | Highly related topics | "Use TypeScript?" vs "Adopt TypeScript for backend?" |
| 0.60-0.74 | Moderate | Related but distinct | "Use TypeScript?" vs "What language for frontend?" |
| 0.40-0.59 | Brief | Tangentially related | "Use TypeScript?" vs "Choose a static analyzer" |
| 0.00-0.39 | Noise | Unrelated or spurious | "Use TypeScript?" vs "What database to use?" |
Thresholds in use:
- Noise floor (0.40): Minimum similarity to include in results
- Default threshold (0.70): Legacy retrieval threshold (deprecated)
- Strong tier (0.75): Full formatting with stances in context
- Moderate tier (0.60): Summary formatting without stances
Adaptive k (result count; sketched below):
- Small DB (<100 decisions): k=5 (exploration phase)
- Medium DB (100-999): k=3 (balanced phase)
- Large DB (≥1000): k=2 (precision phase)
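
The adaptive-k rule combines with the noise floor as follows. This is a sketch of the behavior described above; the exact logic in decision_graph/retrieval.py may differ in detail.

```python
def adaptive_k(total_decisions: int) -> int:
    if total_decisions < 100:
        return 5   # exploration phase: small DB, cast a wide net
    if total_decisions < 1000:
        return 3   # balanced phase
    return 2       # precision phase: large DB, only the best matches

def select_results(scored, total_decisions, noise_floor=0.40):
    """Drop everything below the noise floor, then keep the top k by score."""
    kept = [(d, s) for d, s in scored if s >= noise_floor]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:adaptive_k(total_decisions)]
```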
Tiered Context Formatting
The decision graph uses budget-aware tiered formatting to control token usage:
Strong Tier (≥0.75 similarity)
Format: Full details with participant stances (~500 tokens)
```
### Strong Match (similarity: 0.85): Should we use TypeScript?
**Date**: 2024-10-15T14:30:00
**Convergence Status**: converged
**Consensus**: Adopt TypeScript for type safety and tooling benefits
**Winning Option**: Option A: Adopt TypeScript
**Participants**: opus@claude, gpt-4@codex, gemini-pro@gemini
**Participant Positions**:
- **opus@claude**: Voted for 'Option A' (confidence: 90%) - Strong type system reduces bugs
- **gpt-4@codex**: Voted for 'Option A' (confidence: 85%) - Better IDE support
- **gemini-pro@gemini**: Voted for 'Option A' (confidence: 80%) - Easier refactoring
```
Moderate Tier (0.60-0.74 similarity)
Format: Summary without stances (~200 tokens)
```
### Moderate Match (similarity: 0.68): What language for frontend?
**Consensus**: Use TypeScript for better type safety
**Result**: TypeScript
```
Brief Tier (0.40-0.59 similarity)
Format: One-liner (~50 tokens)
```
- **Brief Match** (0.45): Choose static analysis tools → ESLint with TypeScript
```
Token budget (default: 2000 tokens; see the sketch after this list):
- Allows ~2-3 strong decisions, or
- ~5-7 moderate decisions, or
- ~20-40 brief decisions
- Formatting stops when budget reached
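
A minimal sketch of that budget-aware loop, using the approximate per-tier token costs listed in this section (the real formatter in decision_graph/retrieval.py also renders the context text):

```python
TIER_COST = {"strong": 500, "moderate": 200, "brief": 50}  # approximate

def tier_for(score: float):
    if score >= 0.75:
        return "strong"    # full details with stances
    if score >= 0.60:
        return "moderate"  # summary without stances
    if score >= 0.40:
        return "brief"     # one-liner
    return None            # below the noise floor

def plan_context(scored, budget=2000):
    """scored: (decision, score) pairs, assumed sorted by score descending."""
    plan, used = [], 0
    for decision, score in scored:
        tier = tier_for(score)
        if tier is None:
            continue
        if used + TIER_COST[tier] > budget:
            break  # formatting stops when the budget is reached
        plan.append((decision, tier))
        used += TIER_COST[tier]
    return plan, used
```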
Troubleshooting
Issue: No context retrieved for similar questions
Symptoms: get_context_for_deliberation() returns empty string
Diagnosis:
```python
# Check if decisions exist
stats = integration.get_graph_stats()
print(f"Total decisions: {stats['total_decisions']}")

# Try direct retrieval with a lower threshold
retriever = DecisionRetriever(storage)
scored = retriever.find_relevant_decisions(
    query_question="Your question here",
    threshold=0.0  # See all results
)
print(f"Found {len(scored)} candidates above noise floor (0.40)")
for decision, score in scored[:5]:
    print(f"  {score:.3f}: {decision.question[:50]}...")
```
Common causes:
- Database empty: No past deliberations stored
- Below noise floor: All similarities <0.40 (unrelated questions)
- Cache stale: Cache not invalidated after adding decisions
- Backend mismatch: Using Jaccard (weak) instead of SentenceTransformer (strong)
Fixes:
```python
# 1. Check the database
if stats['total_decisions'] == 0:
    print("No decisions in database - add some first")

# 2. Lower the threshold temporarily for testing
context = retriever.get_enriched_context(question, threshold=0.5)

# 3. Invalidate the cache
retriever.invalidate_cache()

# 4. Check the backend
detector = QuestionSimilarityDetector()
print(f"Using backend: {detector.backend.__class__.__name__}")
# If Jaccard: install sentence-transformers for better results
```
Issue: Slow queries (>1s latency)
Symptoms: find_relevant_decisions() takes >1 second
Diagnosis:
```python
import time

# Measure query latency
start = time.time()
scored = retriever.find_relevant_decisions("Test question")
latency_ms = (time.time() - start) * 1000
print(f"Query latency: {latency_ms:.1f}ms")

# Check cache stats
cache_stats = retriever.get_cache_stats()
print(f"L1 hit rate: {cache_stats['query_hit_rate']:.1%}")
print(f"L2 hit rate: {cache_stats['embedding_hit_rate']:.1%}")

# Check database size
stats = integration.get_graph_stats()
print(f"Total decisions: {stats['total_decisions']}")
```
Common causes:
- Cold cache: First query always slow (computes similarities)
- Large database: >1000 decisions increases compute time
- No cache: Caching disabled in retriever
- Slow backend: Jaccard or TF-IDF slower than SentenceTransformer
Performance targets:
- Cache hit: <2μs
- Cache miss (<100 decisions): <50ms
- Cache miss (100-999 decisions): <100ms
- Cache miss (≥1000 decisions): <200ms
Fixes:
```python
# 1. Warm up the cache (run the same query twice)
retriever.find_relevant_decisions(question)  # Cold (slow)
retriever.find_relevant_decisions(question)  # Warm (fast)

# 2. Enable caching if disabled
retriever = DecisionRetriever(storage, enable_cache=True)

# 3. Reduce the query limit for large databases
all_decisions = storage.get_all_decisions(limit=100)  # Not 10000

# 4. Upgrade to the SentenceTransformer backend
# pip install sentence-transformers
```
Issue: Memory usage growing
Symptoms: Process memory increases over time
Diagnosis:
```python
# Check cache sizes
cache_stats = retriever.get_cache_stats()
print(f"L1 entries: {cache_stats['query_cache_size']} (max: 200)")
print(f"L2 entries: {cache_stats['embedding_cache_size']} (max: 500)")

# Check database size
stats = integration.get_graph_stats()
print(f"Database: {stats['db_size_mb']} MB")

# Estimate memory usage:
# L1: ~5KB per entry = ~1MB for 200 entries
# L2: ~1KB per entry = ~500KB for 500 entries
# Total expected: ~1.5MB for cache + DB size
```
Common causes:
- Cache unbounded: Using custom cache without size limits
- Database growth: Normal, ~5KB per decision
- Embedding cache: SentenceTransformer embeddings (768 floats each)
Fixes:
```python
# 1. Use the bounded cache (default)
retriever = DecisionRetriever(storage, enable_cache=True)
# Auto-creates cache with maxsize=200 (L1) and maxsize=500 (L2)

# 2. Monitor database growth
maintenance = DecisionGraphMaintenance(storage)
growth = maintenance.analyze_growth(days=30)
print(f"Growth rate: {growth['avg_decisions_per_day']:.1f} decisions/day")

# 3. Consider archival at 5000+ decisions (Phase 2)
if stats['total_decisions'] > 5000:
    estimate = maintenance.estimate_archival_benefit()
    print(f"Archival would save ~{estimate['estimated_space_savings_mb']} MB")
```
Issue: Context not helping convergence
Symptoms: Injected context doesn't improve deliberation quality
Diagnosis:
```python
# Check what context was injected
context = integration.get_context_for_deliberation(question)
print(f"Context length: {len(context)} chars (~{len(context)//4} tokens)")
print(context)

# Check tier distribution in logs (look for MEASUREMENT lines)
# Example: tier_distribution=(strong:1, moderate:0, brief:2)

# Verify similarity scores
scored = retriever.find_relevant_decisions(question)
for decision, score in scored:
    print(f"Score {score:.2f}: {decision.question[:40]}...")
    if score < 0.70:
        print("  WARNING: Low similarity, may not be helpful")
```
Common causes:
- Low similarity: Scores 0.40-0.60 are tangentially related
- Brief tier dominance: Most context in brief format (no stances)
- Token budget exhausted: Only including 1-2 decisions
- Contradictory context: Past decisions conflict with current question
Calibration approach (Phase 1.5):
- Log MEASUREMENT lines: question, scored_results, tier_distribution, tokens, db_size (a parsing sketch follows this list)
- Analyze which tiers correlate with improved convergence
- Adjust tier boundaries in config (default: strong=0.75, moderate=0.60)
- Tune token budget (default: 2000)
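
A hedged sketch for the first step, aggregating tier distributions from MEASUREMENT lines. The log path and the exact line format are assumptions based on the example shown earlier (tier_distribution=(strong:1, moderate:0, brief:2)); adjust both to match your actual logging setup.

```python
import re
from collections import Counter

totals = Counter()
with open("counsel.log") as f:  # hypothetical log path
    for line in f:
        if "MEASUREMENT" not in line:
            continue
        match = re.search(r"tier_distribution=\(([^)]*)\)", line)
        if not match:
            continue
        for part in match.group(1).split(","):  # "strong:1", " moderate:0", ...
            tier, count = part.strip().split(":")
            totals[tier] += int(count)

print(dict(totals))  # e.g. {'strong': 12, 'moderate': 30, 'brief': 55}
```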
Configuration
Context injection can be configured in config.yaml:
```yaml
decision_graph:
  enabled: true
  db_path: "decision_graph.db"

  # Retrieval settings
  similarity_threshold: 0.7   # DEPRECATED - uses noise floor (0.40) instead
  max_context_decisions: 3    # DEPRECATED - uses adaptive k instead

  # Tiered formatting (NEW)
  tier_boundaries:
    strong: 0.75    # Full details with stances
    moderate: 0.60  # Summary without stances
    # brief: implicit (≥0.40 noise floor)
  context_token_budget: 2000  # Max tokens for context injection
```
Tuning recommendations:
- Start with defaults (strong=0.75, moderate=0.60, budget=2000)
- Collect MEASUREMENT logs over 50-100 deliberations
- Analyze tier distribution vs convergence improvement
- Adjust boundaries if needed (e.g., raise to 0.80/0.70 for stricter relevance, as in the snippet after this list)
- Increase budget if frequently hitting limit with strong matches
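
For example, the stricter setting from the last two recommendations might look like this (illustrative values, not a universal recommendation):

```yaml
decision_graph:
  tier_boundaries:
    strong: 0.80    # raised from 0.75: fewer, more relevant strong matches
    moderate: 0.70  # raised from 0.60: drops tangential summaries
  context_token_budget: 3000  # raise only if strong matches often hit the limit
```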
Testing Queries
```python
# Minimal test: Store and retrieve
from decision_graph.integration import DecisionGraphIntegration
from decision_graph.storage import DecisionGraphStorage
from models.schema import DeliberationResult, Summary, ConvergenceInfo

storage = DecisionGraphStorage(":memory:")
integration = DecisionGraphIntegration(storage)

# Create a mock result
result = DeliberationResult(
    participants=["opus@claude", "gpt-4@codex"],
    rounds_completed=2,
    summary=Summary(consensus="Test consensus"),
    convergence_info=ConvergenceInfo(status="converged"),
    full_debate=[],
    transcript_path="test.md"
)

# Store
decision_id = integration.store_deliberation("Should we use TypeScript?", result)
print(f"Stored: {decision_id}")

# Retrieve
context = integration.get_context_for_deliberation("Should we adopt TypeScript?")
print(f"Context retrieved: {len(context)} chars")
assert len(context) > 0, "Should find similar decision"
```
Key Files Reference
- Storage: decision_graph/storage.py - SQLite CRUD operations
- Schema: decision_graph/schema.py - DecisionNode, ParticipantStance, DecisionSimilarity
- Retrieval: decision_graph/retrieval.py - DecisionRetriever with caching
- Integration: decision_graph/integration.py - High-level API facade
- Similarity: decision_graph/similarity.py - Semantic similarity detection
- Cache: decision_graph/cache.py - Two-tier LRU caching
- Maintenance: decision_graph/maintenance.py - Stats and health checks
- Workers: decision_graph/workers.py - Async background processing
See Also
- CLAUDE.md: Decision Graph Memory Architecture section
- Tests: tests/unit/test_decision_graph*.py - Unit tests with examples
- Integration tests: tests/integration/test_*memory*.py - Full workflow tests
- Performance tests: tests/integration/test_performance.py - Latency benchmarks
Repository
https://github.com/blueman82/ai-counsel