Marketplace

data-extraction-patterns

Common patterns for extracting analytics data from GA4 and GSC with API handling

$ 설치

git clone https://github.com/MadAppGang/claude-code /tmp/claude-code && cp -r /tmp/claude-code/plugins/seo/skills/data-extraction-patterns ~/.claude/skills/claude-code

// tip: Run this command in your terminal to install the skill


name: data-extraction-patterns description: Common patterns for extracting analytics data from GA4 and GSC with API handling

Data Extraction Patterns

When to Use

  • Setting up analytics data pipelines
  • Combining data from multiple sources
  • Handling API rate limits and errors
  • Caching frequently accessed data
  • Building data collection workflows

API Reference

Google Analytics 4 (GA4)

MCP Server: mcp-server-google-analytics

Key Operations:

get_report({
  propertyId: "properties/123456789",
  dateRange: { startDate: "30daysAgo", endDate: "today" },
  dimensions: ["pagePath", "date"],
  metrics: ["screenPageViews", "averageSessionDuration", "bounceRate"]
})

Useful Metrics:

MetricDescriptionUse Case
screenPageViewsTotal page viewsTraffic volume
sessionsUser sessionsVisitor count
averageSessionDurationAvg time in sessionEngagement
bounceRateSingle-page visitsContent quality
engagementRateEngaged sessions %True engagement
scrolledUsersUsers who scrolledContent consumption

Useful Dimensions:

DimensionDescription
pagePathURL path
dateDate (for trending)
sessionSourceTraffic source
deviceCategoryDesktop/mobile/tablet

Google Search Console (GSC)

MCP Server: mcp-server-gsc

Key Operations:

search_analytics({
  siteUrl: "https://example.com",
  startDate: "2025-11-27",
  endDate: "2025-12-27",
  dimensions: ["query", "page"],
  rowLimit: 1000
})

get_url_inspection({
  siteUrl: "https://example.com",
  inspectionUrl: "https://example.com/page"
})

Available Metrics:

MetricDescriptionUse Case
clicksTotal clicks from searchTraffic from Google
impressionsTimes shown in resultsVisibility
ctrClick-through rateSnippet effectiveness
positionAverage rankingSEO success

Dimensions:

DimensionDescription
querySearch query
pageLanding page URL
countryUser country
deviceDesktop/mobile/tablet
dateDate (for trending)

Parallel Execution Pattern

Optimal Data Fetch (All Sources)

## Parallel Data Fetch Pattern

When fetching from multiple sources, issue all requests in a SINGLE message
for parallel execution:

┌─────────────────────────────────────────────────────────────────┐
│  MESSAGE 1: Parallel Data Requests                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  [MCP Call 1]: google-analytics.get_report(...)                 │
│  [MCP Call 2]: google-search-console.search_analytics(...)      │
│                                                                  │
│  → All execute simultaneously                                    │
│  → Results return when all complete                              │
│  → ~2x faster than sequential                                    │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Sequential (When Needed)

Some operations require sequential execution:

## Sequential Pattern (Dependencies)

When one request depends on another's result:

┌─────────────────────────────────────────────────────────────────┐
│  MESSAGE 1: Get list of pages                                   │
│  → Returns: ["/page1", "/page2", "/page3"]                      │
├─────────────────────────────────────────────────────────────────┤
│  MESSAGE 2: Get details for each page                           │
│  → Uses page list from Message 1                                │
│  → Can parallelize within this message                          │
└─────────────────────────────────────────────────────────────────┘

Rate Limiting

API Rate Limits

APILimitStrategy
GA410 QPS per propertyBatch dimensions
GSC1,200 requests/minPaginate large exports

Retry Pattern

#!/bin/bash
# Retry with exponential backoff

MAX_RETRIES=3
RETRY_DELAY=5

fetch_with_retry() {
    local url="$1"
    local attempt=1

    while [ $attempt -le $MAX_RETRIES ]; do
        response=$(curl -s -w "%{http_code}" -o /tmp/response.json "$url")
        http_code="${response: -3}"

        if [ "$http_code" = "200" ]; then
            cat /tmp/response.json
            return 0
        elif [ "$http_code" = "429" ]; then
            echo "Rate limited, waiting ${RETRY_DELAY}s..." >&2
            sleep $RETRY_DELAY
            RETRY_DELAY=$((RETRY_DELAY * 2))
        else
            echo "Error: HTTP $http_code" >&2
            return 1
        fi

        attempt=$((attempt + 1))
    done

    echo "Max retries exceeded" >&2
    return 1
}

Caching Pattern

Session-Based Cache

# Cache structure
SESSION_PATH="/tmp/seo-performance-20251227-143000-example"
CACHE_DIR="${SESSION_PATH}/cache"
CACHE_TTL=3600  # 1 hour in seconds

mkdir -p "$CACHE_DIR"

# Cache key generation
cache_key() {
    echo "$1" | md5sum | cut -d' ' -f1
}

# Check cache
get_cached() {
    local key=$(cache_key "$1")
    local cache_file="${CACHE_DIR}/${key}.json"

    if [ -f "$cache_file" ]; then
        local age=$(($(date +%s) - $(stat -f%m "$cache_file" 2>/dev/null || stat -c%Y "$cache_file")))
        if [ $age -lt $CACHE_TTL ]; then
            cat "$cache_file"
            return 0
        fi
    fi
    return 1
}

# Save to cache
save_cache() {
    local key=$(cache_key "$1")
    local cache_file="${CACHE_DIR}/${key}.json"
    cat > "$cache_file"
}

# Usage
CACHE_KEY="ga4_${URL}_${DATE_RANGE}"
if ! RESULT=$(get_cached "$CACHE_KEY"); then
    RESULT=$(fetch_from_api)
    echo "$RESULT" | save_cache "$CACHE_KEY"
fi

Date Range Standardization

Common Date Ranges

# Standard date range calculations
TODAY=$(date +%Y-%m-%d)

case "$RANGE" in
    "7d")
        START_DATE=$(date -v-7d +%Y-%m-%d 2>/dev/null || date -d "7 days ago" +%Y-%m-%d)
        ;;
    "30d")
        START_DATE=$(date -v-30d +%Y-%m-%d 2>/dev/null || date -d "30 days ago" +%Y-%m-%d)
        ;;
    "90d")
        START_DATE=$(date -v-90d +%Y-%m-%d 2>/dev/null || date -d "90 days ago" +%Y-%m-%d)
        ;;
    "mtd")
        START_DATE=$(date +%Y-%m-01)
        ;;
    "ytd")
        START_DATE=$(date +%Y-01-01)
        ;;
esac

END_DATE="$TODAY"

API-Specific Formats

APIFormatExample
GA4Relative or ISO"30daysAgo", "2025-12-01"
GSCISO 8601"2025-12-01"

Graceful Degradation

Data Source Fallback

## Fallback Strategy

When a data source is unavailable:

┌─────────────────────────────────────────────────────────────────┐
│  PRIMARY SOURCE      │  FALLBACK           │  LAST RESORT       │
├──────────────────────┼─────────────────────┼────────────────────┤
│  GA4 traffic data    │  GSC clicks         │  Estimate from GSC │
│  GSC search perf     │  Manual SERP check  │  WebSearch SERP    │
│  CWV (CrUX)          │  PageSpeed API      │  Lighthouse CLI    │
└──────────────────────┴─────────────────────┴────────────────────┘

Partial Data Output

## Analysis Report (Partial Data)

### Data Availability

| Source | Status | Impact |
|--------|--------|--------|
| GA4 | NOT CONFIGURED | Missing engagement metrics |
| GSC | AVAILABLE | Full search data |

### Analysis Notes

This analysis is based on limited data sources:
- Search performance metrics are complete (GSC)
- Engagement metrics unavailable (no GA4)

**Recommendation**: Configure GA4 for complete analysis.
Run `/setup-analytics` to add Google Analytics.

Unified Data Model

Combined Output Structure

{
  "metadata": {
    "url": "https://example.com/page",
    "fetchedAt": "2025-12-27T14:30:00Z",
    "dateRange": {
      "start": "2025-11-27",
      "end": "2025-12-27"
    }
  },
  "sources": {
    "ga4": {
      "available": true,
      "metrics": {
        "pageViews": 2450,
        "avgTimeOnPage": 222,
        "bounceRate": 38.2,
        "engagementRate": 64.5
      }
    },
    "gsc": {
      "available": true,
      "metrics": {
        "impressions": 15200,
        "clicks": 428,
        "ctr": 2.82,
        "avgPosition": 4.2
      },
      "topQueries": [
        {"query": "seo guide", "clicks": 156, "position": 4}
      ]
    }
  },
  "computed": {
    "healthScore": 72,
    "status": "GOOD"
  }
}

Error Handling

Common Errors

ErrorCauseResolution
401 UnauthorizedInvalid/expired credentialsRe-run /setup-analytics
403 ForbiddenMissing permissionsCheck API access in console
429 Too Many RequestsRate limitWait and retry with backoff
404 Not FoundInvalid property/siteVerify IDs in configuration
500 Server ErrorAPI issueRetry later, check status page

Error Output Pattern

## Data Fetch Error

**Source**: Google Analytics 4
**Error**: 403 Forbidden
**Message**: "User does not have sufficient permissions for this property"

### Troubleshooting Steps

1. Verify Service Account email in GA4 Admin
2. Ensure "Viewer" role is granted
3. Check Analytics Data API is enabled
4. Wait 5 minutes for permission propagation

### Workaround

Proceeding with available data sources (GSC).
GA4 engagement metrics will not be included in this analysis.