codebase-context-extractor

This skill provides a comprehensive context extraction system for large codebases. It intelligently analyzes code structure, dependencies, and relationships to extract relevant context for understanding, debugging, or modifying code.

$ Installieren

git clone https://github.com/lofcz/LLMTornado /tmp/LLMTornado && cp -r /tmp/LLMTornado/src/LlmTornado.Demo/Static/Files/Skills/codebase-context-extractor ~/.claude/skills/LLMTornado

// tip: Run this command in your terminal to install the skill

SKILL.md

View on GitHub →

name: codebase-context-extractor description: This skill provides a comprehensive context extraction system for large codebases. It intelligently analyzes code structure, dependencies, and relationships to extract relevant context for understanding, debugging, or modifying code.

Codebase Context Extractor Skill

Overview

This skill provides a comprehensive context extraction system for large codebases. It intelligently analyzes code structure, dependencies, and relationships to extract relevant context for understanding, debugging, or modifying code.

Trigger Words

"extract context"
"codebase context"
"code context"
"analyze codebase"
"codebase analysis"
"code structure"
"dependency analysis"
"code relationships"
"understand codebase"
"map codebase"

When to Use This Skill

Use this skill when you need to:

Understand the structure and organization of a large codebase
Extract relevant context for a specific function, class, or module
Analyze dependencies and relationships between code components
Generate documentation or summaries of code sections
Prepare context for code modifications or debugging
Identify entry points and execution flows
Map out API surfaces and public interfaces
Understand data flow and state management

Instructions

When this skill is triggered, execute the context_extractor.py script with appropriate parameters.

Basic Usage

python /projects/workspace/codebase-context-extractor/context_extractor.py \
  --target-path <path_to_codebase> \
  --mode <extraction_mode> \
  --output <output_file>

Extraction Modes

full - Complete codebase analysis with all components
targeted - Focus on specific files, functions, or classes
dependency - Map dependencies and imports
flow - Trace execution flows and call chains
api - Extract public interfaces and API surfaces
data - Analyze data structures and models
hierarchy - Show class hierarchies and inheritance
summary - Generate high-level overview

Parameters

--target-path (required): Path to the codebase to analyze
--mode (required): Extraction mode (see above)
--output (optional): Output file path (default: stdout)
--focus (optional): Specific file, class, or function to focus on
--depth (optional): Maximum depth for traversal (default: unlimited)
--include-tests (optional): Include test files in analysis (default: false)
--language (optional): Programming language (auto-detected if not specified)
--format (optional): Output format (markdown, json, yaml, text) (default: markdown)
--exclude (optional): Patterns to exclude (comma-separated)

Examples

Full codebase analysis:

python context_extractor.py --target-path ./my-project --mode full --output context.md

Targeted analysis of a specific class:

python context_extractor.py --target-path ./my-project --mode targeted --focus "UserService" --output user_service_context.md

Dependency mapping:

python context_extractor.py --target-path ./my-project --mode dependency --format json --output dependencies.json

Execution flow analysis:

python context_extractor.py --target-path ./my-project --mode flow --focus "main" --depth 5

Output Structure

The extractor generates structured output including:

For Full/Targeted Mode

Project Overview: Language, structure, entry points
File Organization: Directory structure and file purposes
Key Components: Important classes, functions, modules
Dependencies: External and internal dependencies
Code Metrics: Lines of code, complexity estimates
Context Summary: High-level understanding

For Dependency Mode

Dependency Graph: Visual representation of dependencies
Import Analysis: All imports and their usage
Circular Dependencies: Detection and reporting
Unused Dependencies: Potential cleanup targets

For Flow Mode

Call Chains: Function call sequences
Entry Points: Main execution paths
Exit Points: Return and error handling
Branch Analysis: Conditional execution paths

For API Mode

Public Interfaces: Exported functions and classes
API Documentation: Signatures and docstrings
Usage Examples: How to use the API
Versioning Info: API version and compatibility

Advanced Features

Smart Context Window Management

The extractor automatically manages context size to fit within LLM token limits:

Prioritizes most relevant code sections
Provides summaries for less critical parts
Includes breadcrumb navigation for context

Multi-Language Support

Supports analysis of:

Python
JavaScript/TypeScript
Java
C#
Go
Rust
C/C++
Ruby
PHP
And more (extensible)

Intelligent Filtering

Excludes generated code, build artifacts, and vendor directories
Focuses on business logic and core functionality
Configurable exclusion patterns

Integration with Other Tools

The context extractor output can be used with:

Documentation generators
Code review tools
Refactoring assistants
Bug tracking systems
Development environments

Best Practices

Start with Summary Mode: Get a high-level overview before diving deep
Use Targeted Mode for Specific Tasks: Focus on relevant code sections
Combine with Dependency Analysis: Understand impact of changes
Leverage Flow Analysis for Debugging: Trace execution paths
Regular Updates: Re-run analysis as codebase evolves

Notes

Large codebases may take time to analyze
Consider using depth limits for very large projects
JSON output is best for programmatic processing
Markdown output is best for human reading
The tool respects .gitignore patterns by default