python-json-parsing
Python JSON parsing best practices covering performance optimization (orjson/msgspec), handling large files (streaming/JSONL), security (injection prevention), and advanced querying (JSONPath/JMESPath). Use when working with JSON data, parsing APIs, handling large JSON files, or optimizing JSON performance.
$ Instalar
git clone https://github.com/basher83/lunar-claude /tmp/lunar-claude && cp -r /tmp/lunar-claude/plugins/devops/python-tools/skills/python-json-parsing ~/.claude/skills/lunar-claude// tip: Run this command in your terminal to install the skill
name: python-json-parsing description: > Python JSON parsing best practices covering performance optimization (orjson/msgspec), handling large files (streaming/JSONL), security (injection prevention), and advanced querying (JSONPath/JMESPath). Use when working with JSON data, parsing APIs, handling large JSON files, or optimizing JSON performance.
Python JSON Parsing Best Practices
Comprehensive guide to JSON parsing in Python with focus on performance, security, and scalability.
Quick Start
Basic JSON Parsing
import json
# Parse JSON string
data = json.loads('{"name": "Alice", "age": 30}')
# Parse JSON file
with open("data.json", "r", encoding="utf-8") as f:
data = json.load(f)
# Write JSON file
with open("output.json", "w", encoding="utf-8") as f:
json.dump(data, f, indent=2)
Key Rule: Always specify encoding="utf-8" when reading/writing files.
When to Use This Skill
Use this skill when:
- Working with JSON APIs or data interchange
- Optimizing JSON performance in high-throughput applications
- Handling large JSON files (> 100MB)
- Securing applications against JSON injection
- Extracting data from complex nested JSON structures
Performance: Choose the Right Library
Library Comparison (10,000 records benchmark)
| Library | Serialize (s) | Deserialize (s) | Best For |
|---|---|---|---|
| orjson | 0.42 | 1.27 | FastAPI, web APIs (3.9x faster) |
| msgspec | 0.49 | 0.93 | Maximum performance (1.7x faster deserialization) |
| json (stdlib) | 1.62 | 1.62 | Universal compatibility |
| ujson | 1.41 | 1.85 | Drop-in replacement (2x faster) |
Recommendation:
- Use orjson for FastAPI/web APIs (native support, fastest serialization)
- Use msgspec for data pipelines (fastest overall, typed validation)
- Use json when compatibility is critical
Installation
# High-performance libraries
pip install orjson msgspec ujson
# Advanced querying
pip install jsonpath-ng jmespath
# Streaming large files
pip install ijson
# Schema validation
pip install jsonschema
Large Files: Streaming Strategies
For files > 100MB, avoid loading into memory.
Strategy 1: JSONL (JSON Lines)
Convert large JSON arrays to line-delimited format:
# Stream process JSONL
with open("large.jsonl", "r") as infile, open("output.jsonl", "w") as outfile:
for line in infile:
obj = json.loads(line)
obj["processed"] = True
outfile.write(json.dumps(obj) + "\n")
Strategy 2: Streaming with ijson
import ijson
# Process large JSON without loading into memory
with open("huge.json", "rb") as f:
for item in ijson.items(f, "products.item"):
process(item) # Handle one item at a time
See: patterns/streaming-large-json.md
Security: Prevent JSON Injection
Critical Rules:
- Always use
json.loads(), nevereval() - Validate input with
jsonschema - Sanitize user input before serialization
- Escape special characters (
"and\)
Vulnerable Code:
# NEVER DO THIS
username = request.GET['username'] # User input: admin", "role": "admin
json_string = f'{{"user":"{username}","role":"user"}}'
# Result: privilege escalation
Secure Code:
# Use json.dumps for serialization
data = {"user": username, "role": "user"}
json_string = json.dumps(data) # Properly escaped
See: anti-patterns/security-json-injection.md, anti-patterns/eval-usage.md
Advanced: JSONPath for Complex Queries
Extract data from nested JSON without complex loops:
import jsonpath_ng as jp
data = {
"products": [
{"name": "Apple", "price": 12.88},
{"name": "Peach", "price": 27.25}
]
}
# Filter by price
query = jp.parse("products[?price>20].name")
results = [match.value for match in query.find(data)]
# Output: ["Peach"]
Key Operators:
$- Root selector..- Recursive descendant*- Wildcard[?<predicate>]- Filter (e.g.,[?price > 20])[start:end:step]- Array slicing
See: patterns/jsonpath-querying.md
Custom Objects: Serialization
Handle datetime, UUID, Decimal, and custom classes:
from datetime import datetime
import json
class CustomEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, datetime):
return obj.isoformat()
if isinstance(obj, set):
return list(obj)
return super().default(obj)
# Usage
data = {"timestamp": datetime.now(), "tags": {"python", "json"}}
json_str = json.dumps(data, cls=CustomEncoder)
See: patterns/custom-object-serialization.md
Performance Checklist
- Use orjson/msgspec for high-throughput applications
- Specify UTF-8 encoding when reading/writing files
- Use streaming (ijson/JSONL) for files > 100MB
- Minify JSON for production (
separators=(',', ':')) - Pretty-print for development (
indent=2)
Security Checklist
- Never use
eval()for JSON parsing - Validate input with
jsonschema - Sanitize user input before serialization
- Use
json.dumps()to prevent injection - Escape special characters in user data
Reference Documentation
Performance:
reference/python-json-parsing-best-practices-2025.md- Comprehensive research with benchmarks
Patterns:
patterns/streaming-large-json.md- ijson and JSONL strategiespatterns/custom-object-serialization.md- Handle datetime, UUID, custom classespatterns/jsonpath-querying.md- Advanced nested data extraction
Security:
anti-patterns/security-json-injection.md- Prevent injection attacksanti-patterns/eval-usage.md- Why never to use eval()
Examples:
examples/high-performance-parsing.py- orjson and msgspec codeexamples/large-file-streaming.py- Streaming with ijsonexamples/secure-validation.py- jsonschema validation
Tools:
tools/json-performance-benchmark.py- Benchmark different libraries
Repository
