markdown-content-formatter

Format and validate markdown documents with auto-generated TOC, frontmatter, structure validation, and cross-reference linking. Export to GitHub/CommonMark/Jekyll/Hugo.

Install

Run this command in your terminal to install the skill:

git clone https://github.com/dkyazzentwatwa/chatgpt-skills /tmp/chatgpt-skills && cp -r /tmp/chatgpt-skills/markdown-content-formatter ~/.claude/skills/chatgpt-skills



Markdown Content Formatter

Structure, validate, and format long-form markdown content for documentation, blogs, and static site generators. Auto-generate tables of contents, add frontmatter, validate structure, and convert between markdown flavors.

Workflow

The markdown formatting process follows these steps:

  1. Load - Read markdown file or content
  2. Validate - Check heading hierarchy, broken links, structure issues
  3. Format - Apply formatting rules (spacing, code blocks, etc.)
  4. Generate - Add TOC, frontmatter, cross-references
  5. Export - Save in target markdown flavor

Quick Start

from scripts.markdown_formatter import MarkdownFormatter

# Load and format markdown
formatter = MarkdownFormatter(file_path='document.md')

# Generate table of contents
toc = formatter.generate_toc(max_depth=3)

# Validate structure
validation = formatter.validate_structure()
if not validation['valid']:
    print("Issues found:")
    for error in validation['errors']:
        print(f"  - {error['message']}")

# Add frontmatter
formatter.add_frontmatter({
    'title': 'My Document',
    'author': 'John Doe',
    'date': '2024-01-15'
})

# Export formatted version
formatter.export(
    output_path='formatted.md',
    include_toc=True,
    target_flavor='github'
)

Formatting Operations

1. Table of Contents Generation

Auto-generate TOC from document heading structure:

  • Customizable depth (H2, H3, etc.)
  • GitHub-style anchor links
  • Numbered or bulleted format
  • Smart indentation based on heading levels
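
To make the TOC behavior concrete, here is a minimal sketch of how headings could be turned into a nested, anchor-linked list. It is not the skill's implementation: `slugify` and `build_toc` are hypothetical names, the anchor rule only approximates GitHub's, and only ATX (`#`) headings outside fenced code are considered.

```python
import re

FENCE = "`" * 3  # fence marker, built this way to keep the example easy to embed
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def slugify(title: str) -> str:
    """Approximate a GitHub-style anchor (assumption, not the exact algorithm)."""
    slug = re.sub(r"[^\w\s-]", "", title.strip().lower())  # drop punctuation
    return re.sub(r"\s", "-", slug)                        # whitespace -> hyphen


def build_toc(markdown: str, start_level: int = 2, max_depth: int = 3) -> str:
    """Return a bulleted TOC for headings between start_level and max_depth."""
    entries = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence          # ignore '#' lines inside code blocks
            continue
        match = None if in_fence else HEADING.match(line)
        if not match:
            continue
        level, title = len(match.group(1)), match.group(2)
        if start_level <= level <= max_depth:
            indent = "  " * (level - start_level)  # two spaces per nesting level
            entries.append(f"{indent}- [{title}](#{slugify(title)})")
    return "\n".join(entries)


doc = "# Guide\n\n## Introduction\n\n## Getting Started\n\n### Installation\n\n#### Deep\n"
print(build_toc(doc))
```

Indenting relative to `start_level` keeps the first listed level flush left even when H1 is skipped.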

2. Frontmatter Management

Add YAML/TOML/JSON frontmatter for static site generators:

  • YAML (---) for Jekyll/Hugo
  • TOML (+++) for Hugo
  • JSON for custom parsers
  • Structured metadata (title, author, date, tags, etc.)
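
The three frontmatter formats differ mainly in their delimiters and key/value separators. The sketch below illustrates that with the standard library only; `render_frontmatter` is a hypothetical helper, not part of the skill. It relies on JSON-encoded strings, numbers, booleans, and flat lists also being valid YAML and TOML values, so it is only suitable for flat metadata (no nested mappings, no `None`).

```python
import json

# Opening/closing delimiter and key/value separator per format
STYLES = {
    "yaml": ("---", ": "),
    "toml": ("+++", " = "),
}


def render_frontmatter(metadata: dict, fmt: str = "yaml") -> str:
    """Render flat metadata as a frontmatter block (illustrative sketch)."""
    if fmt == "json":
        # JSON frontmatter is simply the object itself
        return json.dumps(metadata, indent=2) + "\n"
    if fmt not in STYLES:
        raise ValueError(f"Unsupported frontmatter format: {fmt}")
    delimiter, separator = STYLES[fmt]
    lines = [delimiter]
    for key, value in metadata.items():
        # json.dumps yields quoted strings, true/false, and [..] lists
        lines.append(f"{key}{separator}{json.dumps(value)}")
    lines.append(delimiter)
    return "\n".join(lines) + "\n"


print(render_frontmatter({"title": "Doc", "tags": ["a", "b"], "draft": False}, "toml"))
```

A real implementation would more likely use a YAML or TOML library so that nested values and dates are handled properly.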

3. Structure Validation

Check document structure for common issues:

  • Heading hierarchy - Detect skipped levels (H2 → H4)
  • Broken links - Find invalid internal (#anchors) and external links (external checks need network requests and are not performed by default)
  • Duplicate headings - Identify heading ID conflicts
  • Missing elements - Check for required sections
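
The heading checks can be sketched as a single pass over the document. This is an illustration under assumptions, not the skill's validator: `check_headings` is a hypothetical function, the issue dictionaries simply mirror the shape shown in the API reference, and only ATX headings outside fenced code are inspected.

```python
import re
from collections import Counter

FENCE = "`" * 3
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def check_headings(markdown: str) -> list[dict]:
    """Report skipped heading levels and repeated heading titles."""
    issues = []
    seen = Counter()
    previous_level = 0
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        match = None if in_fence else HEADING.match(line)
        if not match:
            continue
        level, title = len(match.group(1)), match.group(2)
        # A jump of more than one level downward (e.g. H2 -> H4) is a skip
        if previous_level and level > previous_level + 1:
            issues.append({
                "type": "heading_skip",
                "line": number,
                "message": f"Heading level jumps from H{previous_level} to H{level}",
            })
        seen[title.lower()] += 1
        if seen[title.lower()] == 2:  # report each duplicate title once
            issues.append({
                "type": "duplicate_heading",
                "line": number,
                "message": f'Heading "{title}" appears multiple times',
            })
        previous_level = level
    return issues


for issue in check_headings("# A\n\n## B\n\n#### C\n\n## B\n"):
    print(issue["line"], issue["message"])
```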

4. Code Block Formatting

Enhance code blocks with syntax highlighting markers:

  • Add language tags to fenced code blocks
  • Convert indented code to fenced blocks
  • Default language specification
  • Consistent formatting
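
As an illustration of the language-tag step, the sketch below adds a default language to opening fences that have none. `tag_bare_fences` is a hypothetical name, and the sketch assumes three-backtick fences only; tilde fences, longer fences, and indented code blocks are not handled.

```python
FENCE = "`" * 3  # the three-backtick fence marker


def tag_bare_fences(markdown: str, default_language: str = "text") -> str:
    """Append default_language to opening fences that carry no language tag."""
    out = []
    in_fence = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            if not in_fence and stripped == FENCE:
                # Opening fence without an info string: add the default tag
                line = line.rstrip() + default_language
            in_fence = not in_fence  # every fence line opens or closes a block
        out.append(line)
    return "\n".join(out)


sample = FENCE + "\ncode here\n" + FENCE
print(tag_bare_fences(sample))
```

Tracking whether the parser is inside a block matters: a closing fence is also a bare fence and must not be tagged.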

5. Cross-Reference Linking

Auto-link headings and create cross-references:

  • Generate unique heading IDs
  • Link section mentions (e.g., "see Introduction")
  • Create anchor links for internal navigation
  • Handle duplicate heading names
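
Duplicate heading names are the tricky part of ID generation. A common convention, used on GitHub, is to append `-1`, `-2`, and so on to repeated anchors. The sketch below assumes that convention; `slugify` and `unique_ids` are hypothetical names, and the slug rule is an approximation.

```python
import re


def slugify(title: str) -> str:
    """Lowercase, strip punctuation, and hyphenate spaces (approximation)."""
    cleaned = re.sub(r"[^\w\s-]", "", title.strip().lower())
    return cleaned.replace(" ", "-")


def unique_ids(titles: list[str]) -> list[str]:
    """Return one ID per title, suffixing repeats with -1, -2, ..."""
    counts: dict[str, int] = {}
    ids = []
    for title in titles:
        base = slugify(title)
        repeat = counts.get(base, 0)
        ids.append(base if repeat == 0 else f"{base}-{repeat}")
        counts[base] = repeat + 1
    return ids


print(unique_ids(["Introduction", "Setup", "Introduction", "Introduction"]))
```

The sketch does not guard against a heading whose own text already ends in a numeric suffix that collides with a generated ID.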

6. Spacing and Consistency

Apply consistent formatting rules:

  • Line breaks around headings
  • List formatting (bullets, numbers)
  • Code block spacing
  • Paragraph breaks
  • Horizontal rules

7. Flavor Conversion

Convert between markdown flavors:

  • GitHub Flavored Markdown - Task lists, tables, syntax highlighting
  • CommonMark - Standard specification
  • Jekyll - Liquid templates, includes
  • Hugo - Shortcodes, taxonomies

Validation Checks

The validator identifies these common issues:

| Issue Type | Description | Example |
| --- | --- | --- |
| Heading Skip | Level jumps (H2 → H4) | Missing H3 between H2 and H4 |
| Broken Link | Invalid internal/external link | [link](#missing-section) |
| Duplicate Heading | Same heading appears multiple times | Two "Introduction" headings |
| Missing ID | Heading lacks unique identifier | Anchor link fails |
| Invalid Structure | Incorrect nesting or formatting | List inside heading |
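
For the broken-link case, an internal link is broken when its `#anchor` matches no heading ID. A minimal sketch, assuming the set of valid anchors has already been collected; `find_broken_anchors` and the `broken_link` type name are hypothetical, and links inside code blocks are not excluded:

```python
import re

# Matches [text](#anchor) and captures the anchor without the leading '#'
INTERNAL_LINK = re.compile(r"\[[^\]]*\]\(#([^)\s]*)\)")


def find_broken_anchors(markdown: str, known_anchors: set[str]) -> list[dict]:
    """List internal links whose target is not among known_anchors."""
    issues = []
    for number, line in enumerate(markdown.splitlines(), start=1):
        for match in INTERNAL_LINK.finditer(line):
            target = match.group(1)
            if target not in known_anchors:
                issues.append({
                    "type": "broken_link",  # hypothetical type name
                    "line": number,
                    "message": f"Link target #{target} does not match any heading",
                })
    return issues


text = "See [intro](#introduction).\nSee [link](#missing-section).\n"
print(find_broken_anchors(text, {"introduction"}))
```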

API Reference

MarkdownFormatter

Initialization:

# Load from a file...
formatter = MarkdownFormatter(file_path='document.md')

# ...or pass the markdown text directly
formatter = MarkdownFormatter(content='# Markdown text...')

Parameters:

  • file_path (str): Path to markdown file (optional)
  • content (str): Direct markdown content (optional)

One of file_path or content must be provided.
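
The either/or rule for inputs could be enforced along these lines. This is a sketch, not the class's actual constructor: `load_markdown` is a hypothetical helper, and rejecting the case where both arguments are given is an assumption the source does not state.

```python
from pathlib import Path
from typing import Optional


def load_markdown(file_path: Optional[str] = None,
                  content: Optional[str] = None) -> str:
    """Return markdown text from exactly one of file_path or content."""
    # Assumption: passing both, or neither, is treated as an error
    if (file_path is None) == (content is None):
        raise ValueError("Provide exactly one of file_path or content")
    if content is not None:
        return content
    return Path(file_path).read_text(encoding="utf-8")


print(load_markdown(content="# Markdown text..."))
```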

Table of Contents

generate_toc()

toc = formatter.generate_toc(
    max_depth=3,        # Max heading level (1-6)
    start_level=2,      # Start from H2 (skip H1)
    style='github'      # 'github', 'numbered', 'bullets'
)

Returns: TOC markdown string

Styles:

  • github - Bulleted list with anchor links
  • numbered - Numbered outline
  • bullets - Simple bullet list

Example Output (github style):

## Table of Contents

- [Introduction](#introduction)
- [Getting Started](#getting-started)
  - [Installation](#installation)
  - [Configuration](#configuration)
- [Advanced Topics](#advanced-topics)

Frontmatter

add_frontmatter()

content = formatter.add_frontmatter(
    metadata={
        'title': 'Document Title',
        'author': 'John Doe',
        'date': '2024-01-15',
        'tags': ['markdown', 'documentation']
    },
    format='yaml'  # 'yaml', 'toml', or 'json'
)

Returns: Markdown content with frontmatter prepended

Example Output (YAML):

---
title: Document Title
author: John Doe
date: 2024-01-15
tags:
  - markdown
  - documentation
---

Validation

validate_structure()

result = formatter.validate_structure()

Returns: Dictionary with validation results

{
    'valid': bool,
    'errors': [
        {
            'type': 'heading_skip',
            'line': 45,
            'message': 'Heading level jumps from H2 to H4'
        }
    ],
    'warnings': [
        {
            'type': 'duplicate_heading',
            'line': 120,
            'message': 'Heading "Introduction" appears multiple times'
        }
    ]
}
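
A result of this shape is easy to turn into a readable report. The sketch below assumes only the keys shown above; `format_report` is a hypothetical helper, not part of the API.

```python
def format_report(result: dict) -> list[str]:
    """Merge errors and warnings into one list of lines, ordered by line number."""
    rows = [("ERROR", item) for item in result.get("errors", [])]
    rows += [("WARNING", item) for item in result.get("warnings", [])]
    rows.sort(key=lambda row: row[1]["line"])
    return [
        f"{label} line {item['line']}: {item['message']} [{item['type']}]"
        for label, item in rows
    ]


example = {
    "valid": False,
    "errors": [{"type": "heading_skip", "line": 45,
                "message": "Heading level jumps from H2 to H4"}],
    "warnings": [{"type": "duplicate_heading", "line": 120,
                  "message": 'Heading "Introduction" appears multiple times'}],
}
for line in format_report(example):
    print(line)
```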

Code Blocks

format_code_blocks()

content = formatter.format_code_blocks(
    add_language_tags=True,
    default_language='text'
)

Returns: Markdown with formatted code blocks

Converts:

    code here

To:

```text
code here
```

Cross-References

auto_link_headings()

content = formatter.auto_link_headings()

Returns: Markdown with heading IDs and cross-reference links

Generates GitHub-style anchors:

  • Adds an anchor to each heading: # Getting Started <a id="getting-started"></a>
  • Links "see Getting Started" → [Getting Started](#getting-started)

Spacing

fix_spacing()

content = formatter.fix_spacing()

Returns: Markdown with consistent spacing

Applies rules:

  • 2 blank lines before H1
  • 1 blank line before H2-H6
  • 1 blank line around code blocks
  • 1 blank line around lists
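
The two heading rules can be sketched as follows. This is an illustration, not the body of `fix_spacing()`: `space_headings` is a hypothetical name, it covers only the blank lines before headings, and it assumes no padding is wanted before a heading on the first line of the document.

```python
import re

FENCE = "`" * 3
HEADING = re.compile(r"^(#{1,6})\s")


def space_headings(markdown: str) -> str:
    """Put 2 blank lines before H1 and 1 blank line before H2-H6."""
    out: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            # Discard whatever blank lines currently precede the heading
            while out and out[-1].strip() == "":
                out.pop()
            if out:  # nothing to pad at the very top of the document
                blanks = 2 if len(match.group(1)) == 1 else 1
                out.extend([""] * blanks)
        out.append(line)
    return "\n".join(out)


print(space_headings("intro\n# Title\ntext\n\n\n\n## Sub\nmore"))
```

Removing the existing blank lines first makes the function idempotent: running it twice gives the same result as running it once.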

Flavor Conversion

convert_to_flavor()

content = formatter.convert_to_flavor(target='jekyll')

Parameters:

  • target (str): 'github', 'commonmark', 'jekyll', or 'hugo'

Returns: Converted markdown string

Export

export()

formatter.export(
    output_path='formatted.md',
    include_toc=True,
    include_frontmatter=True,
    target_flavor='github'
)

Parameters:

  • output_path (str): Output file path
  • include_toc (bool): Add TOC at beginning
  • include_frontmatter (bool): Preserve/add frontmatter
  • target_flavor (str): Target markdown flavor

CLI Usage

Generate TOC

python scripts/markdown_formatter.py \
    --input document.md \
    --toc \
    --toc-depth 3 \
    --toc-style github \
    --output formatted.md

Add Frontmatter

# From command line
python scripts/markdown_formatter.py \
    --input document.md \
    --frontmatter title="My Doc" author="John Doe" date="2024-01-15" \
    --output formatted.md

# From file
python scripts/markdown_formatter.py \
    --input document.md \
    --frontmatter-file metadata.yaml \
    --output formatted.md

Validate Structure

python scripts/markdown_formatter.py \
    --input document.md \
    --validate \
    --format json

Output:

{
  "valid": false,
  "errors": [
    {
      "type": "heading_skip",
      "line": 45,
      "message": "Heading level jumps from H2 to H4"
    }
  ],
  "warnings": []
}

Full Formatting

python scripts/markdown_formatter.py \
    --input document.md \
    --toc \
    --frontmatter title="My Doc" \
    --auto-link \
    --fix-spacing \
    --flavor github \
    --output formatted.md

Batch Processing

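# Format all markdown files in directory
mkdir -p formatted
for file in docs/*.md; do
    # $file includes the docs/ prefix, so keep only the file name for the output
    python scripts/markdown_formatter.py \
        --input "$file" \
        --toc \
        --fix-spacing \
        --output "formatted/$(basename "$file")"
done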

CLI Arguments

| Argument | Description | Default |
| --- | --- | --- |
| --input, -i | Input markdown file | Required |
| --output, -o | Output file path | stdout |
| --toc | Generate table of contents | False |
| --toc-depth | Max TOC depth (1-6) | 3 |
| --toc-style | TOC style (github/numbered/bullets) | github |
| --frontmatter | Key=value pairs for frontmatter | - |
| --frontmatter-file | YAML file with frontmatter | - |
| --auto-link | Auto-link headings | False |
| --fix-spacing | Fix spacing and formatting | False |
| --flavor | Target markdown flavor | github |
| --validate | Validate structure only | False |
| --format | Output format for validation (json/text) | text |
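
For readers who want to see how these flags and defaults fit together, here is how such a parser might be declared with `argparse`. It mirrors the table above but is only a sketch; the actual script may define its arguments differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags with their documented defaults."""
    parser = argparse.ArgumentParser(prog="markdown_formatter.py")
    parser.add_argument("--input", "-i", required=True, help="Input markdown file")
    parser.add_argument("--output", "-o", default=None,
                        help="Output file path (stdout when omitted)")
    parser.add_argument("--toc", action="store_true", help="Generate table of contents")
    parser.add_argument("--toc-depth", type=int, default=3, choices=range(1, 7))
    parser.add_argument("--toc-style", default="github",
                        choices=["github", "numbered", "bullets"])
    parser.add_argument("--frontmatter", nargs="*", default=[], metavar="KEY=VALUE")
    parser.add_argument("--frontmatter-file", help="YAML file with frontmatter")
    parser.add_argument("--auto-link", action="store_true")
    parser.add_argument("--fix-spacing", action="store_true")
    parser.add_argument("--flavor", default="github",
                        choices=["github", "commonmark", "jekyll", "hugo"])
    parser.add_argument("--validate", action="store_true")
    parser.add_argument("--format", default="text", choices=["json", "text"])
    return parser


args = build_parser().parse_args(
    ["-i", "document.md", "--toc", "--frontmatter", "title=My Doc", "author=John Doe"]
)
print(args.toc_depth, args.toc_style, args.frontmatter)
```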

Examples

Example 1: Auto-Generate TOC

formatter = MarkdownFormatter(file_path='guide.md')
toc = formatter.generate_toc(max_depth=3, style='github')

print(toc)
# ## Table of Contents
# - [Introduction](#introduction)
# - [Setup](#setup)
#   - [Installation](#installation)
#   - [Configuration](#configuration)

Example 2: Add Jekyll Frontmatter

formatter = MarkdownFormatter(file_path='post.md')

formatter.add_frontmatter({
    'layout': 'post',
    'title': 'Getting Started with Markdown',
    'date': '2024-01-15',
    'categories': ['tutorial', 'markdown'],
    'tags': ['beginner', 'documentation']
}, format='yaml')

formatter.export('_posts/2024-01-15-getting-started.md')

Example 3: Validate Document Structure

formatter = MarkdownFormatter(file_path='documentation.md')
result = formatter.validate_structure()

if result['valid']:
    print("Document structure is valid!")
else:
    print("Errors found:")
    for error in result['errors']:
        print(f"Line {error['line']}: {error['message']}")

# Warnings are listed separately from errors, so report them in either case
if result['warnings']:
    print("\nWarnings:")
    for warning in result['warnings']:
        print(f"Line {warning['line']}: {warning['message']}")

Example 4: Fix Common Issues

formatter = MarkdownFormatter(file_path='messy.md')

# Fix spacing issues
formatter.fix_spacing()

# Format code blocks
formatter.format_code_blocks(default_language='python')

# Add heading IDs
formatter.auto_link_headings()

# Export cleaned version
formatter.export('clean.md', target_flavor='github')

Example 5: Convert for Hugo Static Site

formatter = MarkdownFormatter(file_path='article.md')

# Add Hugo frontmatter
formatter.add_frontmatter({
    'title': 'My Article',
    'date': '2024-01-15T10:00:00Z',
    'draft': False,
    'tags': ['hugo', 'static-site'],
    'categories': ['web-development']
}, format='toml')

# Generate TOC
toc = formatter.generate_toc(max_depth=2)

# Convert to Hugo flavor
formatter.convert_to_flavor('hugo')

# Export
formatter.export(
    output_path='content/posts/my-article.md',
    include_toc=True,
    target_flavor='hugo'
)

Example 6: Batch Validation

# Validate all markdown files
# In bash, ** only recurses into subdirectories when globstar is enabled
shopt -s globstar
for file in docs/**/*.md; do
    echo "Validating $file..."
    python scripts/markdown_formatter.py \
        --input "$file" \
        --validate \
        --format json > "${file}.validation.json"
done

# Find files with errors
jq -r 'select(.valid == false) | input_filename' docs/**/*.validation.json
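
Where `jq` is not available, the same scan can be done in Python. The sketch below assumes the reports were written next to the source files with the `.validation.json` suffix used above; `files_with_errors` is a hypothetical helper.

```python
import json
from pathlib import Path


def files_with_errors(root: str) -> list[str]:
    """Return paths of validation reports under root whose 'valid' flag is false."""
    failing = []
    for report in sorted(Path(root).rglob("*.validation.json")):
        data = json.loads(report.read_text(encoding="utf-8"))
        if data.get("valid") is False:
            failing.append(str(report))
    return failing


if __name__ == "__main__":
    for path in files_with_errors("docs"):
        print(path)
```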

Dependencies

markdown>=3.5.0
pyyaml>=6.0.0
beautifulsoup4>=4.12.0
pandas>=2.0.0

Install dependencies:

pip install -r scripts/requirements.txt

Limitations

  • Link Validation: External link checking requires network requests (not performed by default)
  • Markdown Parsing: Uses the Python-Markdown library; some edge cases may be parsed differently than by other parsers
  • Flavor Differences: Not all flavor-specific features are converted (e.g., Hugo shortcodes)
  • Heading Anchors: Anchor generation follows GitHub's algorithm, so anchors may differ from those generated by other platforms
  • Code Language Detection: Automatic language detection is limited; manual tags recommended
  • Large Files: Very large files (>10MB) may be slow to process
  • Unicode: Some unicode characters in heading anchors may cause issues
  • Nested Lists: Complex nested list structures may not format perfectly
  • HTML in Markdown: Raw HTML blocks are preserved but not validated
  • Math Equations: LaTeX math equations are not parsed or validated

Markdown Flavor Notes

GitHub Flavored Markdown (GFM)

  • Task lists: - [ ] Task / - [x] Done
  • Tables with alignment
  • Strikethrough: ~~text~~
  • Automatic link detection

CommonMark

  • Strict specification adherence
  • No extensions (no task lists, no tables)
  • Predictable parsing

Jekyll

  • Liquid templating: {{ variable }}
  • Includes: {% include file.html %}
  • Frontmatter required

Hugo

  • Shortcodes: {{< shortcode >}}
  • TOML frontmatter preferred
  • Taxonomies (tags, categories)
  • Nested sections