Python JSON
UTF-8 JSON file I/O utilities to avoid Windows encoding issues (CP-1252 vs UTF-8)
Install
git clone https://github.com/lawless-m/claude-skills /tmp/claude-skills && cp -r /tmp/claude-skills/.claude/skills/PythonJson ~/.claude/skills/claude-skills
Run this command in your terminal to install the skill.
Python on Windows defaults to the system encoding (usually CP-1252) when opening files, not UTF-8. This causes mojibake when reading JSON files containing non-ASCII characters like £, €, or accented letters.
Instructions
When helping users with JSON file operations in Python, follow these guidelines:
- Never use raw file operations for JSON: Always use a utility module that abstracts the encoding issue
- Always specify UTF-8 encoding: When opening files for JSON, use encoding='utf-8' explicitly
- Use ensure_ascii=False when writing: This preserves Unicode characters instead of escaping them to \uXXXX
- Create a json_io utility module: For any project that reads/writes JSON files, create a dedicated utility
- RFC 8259 compliance: JSON text must be encoded in UTF-8 per the JSON specification
Examples
Example 1: Project needs JSON file operations
User: Help me read and write JSON files in my Python project
Claude: I'll create a json_io utility module to abstract the Windows encoding issue:
- Create utils/json_io.py with load_json and save_json functions
- Both functions explicitly use UTF-8 encoding
- save_json uses ensure_ascii=False to preserve Unicode
[Provides implementation using the json_io pattern from reference material below]
Example 2: Mojibake in JSON data
User: My JSON file shows £ as Â£ when I read it
Claude: This is a UTF-8 file being decoded as CP-1252 on Windows (often called the double-encoding bug). The £ character (U+00A3) is stored as the two bytes C2 A3 in UTF-8; CP-1252 treats each byte as its own character, so they are read as Â followed by £.
Solution: Use explicit UTF-8 encoding when opening files:
- with open(path, 'r', encoding='utf-8') as f:
I'll create a json_io utility module to prevent this issue project-wide.
[Provides implementation using the json_io pattern from reference material below]
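The misread described in Example 2 can be reproduced on any platform by passing the wrong codec explicitly. The sketch below is illustrative only: the file name price.json and the sample payload are assumptions, and encoding="cp1252" stands in for the Windows default.

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "price.json"  # hypothetical file name
    # Write the file as UTF-8 bytes, as a compliant JSON producer would.
    path.write_bytes('{"price": "£99.99"}'.encode("utf-8"))

    # Simulate the Windows default by decoding as CP-1252.
    with open(path, "r", encoding="cp1252") as f:
        wrong = json.load(f)

    # Decode as UTF-8 explicitly.
    with open(path, "r", encoding="utf-8") as f:
        right = json.load(f)

assert wrong["price"] == "Â£99.99"   # mojibake: C2 -> Â, A3 -> £
assert right["price"] == "£99.99"    # intended value
```

Note that the misread does not raise an error: json.load() succeeds in both cases, which is why the problem often goes unnoticed until the data is displayed.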
Example 3: Writing JSON with special characters
User: When I save JSON, my £ symbols become \u00a3
Claude: Python's json.dump() defaults to ensure_ascii=True, which escapes non-ASCII characters. To preserve them:
- Use ensure_ascii=False in json.dump()
- Always specify encoding='utf-8' when opening the file
[Provides implementation using the json_io pattern from reference material below]
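As a small illustration of Example 3, the snippet below compares the two settings using json.dumps(), which takes the same ensure_ascii parameter as json.dump(). The sample dictionary is made up for the demonstration.

```python
import json

data = {"price": "£99.99"}  # sample data, not from a real project

escaped = json.dumps(data)                        # ensure_ascii=True is the default
preserved = json.dumps(data, ensure_ascii=False)  # keep non-ASCII characters as-is

assert escaped == '{"price": "\\u00a399.99"}'
assert preserved == '{"price": "£99.99"}'

# Both forms are valid JSON and parse back to the same value.
assert json.loads(escaped) == json.loads(preserved) == data
```

The escaped form is not wrong, only harder to read; the choice affects the file's appearance, not the data a parser recovers from it.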
Reference Implementation Details
The sections below contain proven working code that the examples above reference.
Reference Files in This Folder:
- json_io.py: Complete UTF-8 JSON I/O utility (copy to utils/)
json_io Pattern
Purpose: Abstract JSON file operations to always use UTF-8 encoding
Code Example
"""
JSON file I/O utilities with proper UTF-8 encoding.

This module abstracts JSON file operations to handle the well-known
issue of Python defaulting to system encoding on Windows instead of
UTF-8 (the encoding RFC 8259 requires for JSON exchanged between systems).
"""
import json
from pathlib import Path
from typing import Any, Union


def load_json(path: Union[str, Path]) -> Any:
    """
    Load JSON from a file with proper UTF-8 encoding.

    Args:
        path: Path to the JSON file

    Returns:
        Parsed JSON data
    """
    with open(path, 'r', encoding='utf-8') as f:
        return json.load(f)


def save_json(path: Union[str, Path], data: Any, indent: int = 2) -> None:
    """
    Save data to a JSON file with proper UTF-8 encoding.

    Args:
        path: Path to the JSON file
        data: Data to serialize
        indent: Indentation level (default: 2)
    """
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=indent, ensure_ascii=False)
Key Points:
- Always use encoding='utf-8' when opening files
- Use ensure_ascii=False to preserve Unicode characters in output
- Accept both str and Path objects for flexibility
- Default indent of 2 for human-readable output
Usage Pattern
from utils.json_io import load_json, save_json
# Reading
data = load_json('config.json')
# Writing
save_json('output.json', {'price': '£99.99'})
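
A round-trip check is one way to confirm the pattern behaves as intended. The sketch below repeats load_json and save_json inline (without docstrings) so it runs on its own, and writes to a temporary directory; the sample data is an assumption for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Union


def load_json(path: Union[str, Path]) -> Any:
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_json(path: Union[str, Path], data: Any, indent: int = 2) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=indent, ensure_ascii=False)


original = {"price": "£99.99"}  # sample data

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "output.json"
    save_json(path, original)
    raw = path.read_bytes()      # inspect what actually landed on disk
    restored = load_json(path)

assert restored == original      # value survives the round trip
assert b"\xc2\xa3" in raw        # £ is stored as UTF-8 bytes C2 A3
assert b"\\u00a3" not in raw     # and not as a \u00a3 escape
```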
The Problem Explained
Windows Default Encoding
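Unless UTF-8 mode is enabled, Python's open() function takes its default encoding from the locale: locale.getencoding() on Python 3.11+, locale.getpreferredencoding(False) on earlier versions. On Windows, this is typically CP-1252 (Windows-1252), not UTF-8.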
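To see which default applies on a particular machine, something like the following can be run. It is a diagnostic sketch, not part of the json_io module; the hasattr check is there because locale.getencoding() only exists on Python 3.11 and later.

```python
import locale
import os
import sys

# Encoding reported by the locale.
if hasattr(locale, "getencoding"):
    locale_encoding = locale.getencoding()            # Python 3.11+
else:
    locale_encoding = locale.getpreferredencoding(False)

# Encoding open() actually chooses when none is given.
with open(os.devnull, "r") as f:
    default_open_encoding = f.encoding

print("locale encoding:", locale_encoding)
print("open() default: ", default_open_encoding)
print("UTF-8 mode:     ", bool(sys.flags.utf8_mode))
```

On a typical Western-European Windows install the first two lines would be expected to report cp1252; on most Linux and macOS systems they report UTF-8.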
The Double-Encoding Bug
When a UTF-8 file is read with CP-1252 encoding:
| Character | UTF-8 bytes | CP-1252 interpretation |
|---|---|---|
| £ (U+00A3) | C2 A3 | Â£ |
| € (U+20AC) | E2 82 AC | â‚¬ |
| é (U+00E9) | C3 A9 | Ã© |
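
The rows of this table can be checked directly by encoding each character as UTF-8 and decoding the resulting bytes as CP-1252. The helper name misdecode below is hypothetical, introduced only for this illustration.

```python
def misdecode(text: str) -> str:
    """Encode as UTF-8, then decode those bytes as CP-1252 (the wrong codec)."""
    return text.encode("utf-8").decode("cp1252")


def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated uppercase hex."""
    return " ".join(f"{byte:02X}" for byte in text.encode("utf-8"))


assert utf8_hex("£") == "C2 A3"
assert utf8_hex("€") == "E2 82 AC"
assert utf8_hex("é") == "C3 A9"

assert misdecode("£") == "Â£"
assert misdecode("€") == "â‚¬"   # middle character is U+201A, not a comma
assert misdecode("é") == "Ã©"
```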
RFC 8259 Requirement
Per RFC 8259 (The JSON Data Interchange Format):
JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8.
Troubleshooting
Mojibake Characters
Symptom: Characters like Â£, â‚¬, Ã© appear in your data
Cause: UTF-8 file read with CP-1252 encoding
Solution: Use the json_io utility module or add encoding='utf-8' to file open calls
\uXXXX Escapes in Output
Symptom: Non-ASCII characters appear as \u00a3 in JSON output
Cause: json.dump() defaults to ensure_ascii=True
Solution: Use ensure_ascii=False in json.dump()
Best Practices Summary
- Create a utils/json_io.py module in every Python project that handles JSON files
- Never use raw json.load(open(...)) without explicit UTF-8 encoding
- Use ensure_ascii=False when writing JSON to preserve readability
- Consider setting the PYTHONUTF8=1 environment variable for a system-wide UTF-8 default (Python 3.7+)
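
The effect of PYTHONUTF8=1 can be observed without changing any system settings by launching a child interpreter with the variable set. This is a sketch assuming Python 3.7 or later; the child script is written only for this check.

```python
import os
import subprocess
import sys

# Child script: report whether UTF-8 mode is on and what open() defaults to.
child_code = (
    "import os, sys\n"
    "with open(os.devnull) as f:\n"
    "    print(sys.flags.utf8_mode, f.encoding.lower())\n"
)

# Copy the current environment and enable UTF-8 mode for the child only.
env = dict(os.environ, PYTHONUTF8="1")
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env, capture_output=True, text=True, check=True,
)

utf8_mode, open_encoding = result.stdout.split()
assert utf8_mode == "1"
assert open_encoding == "utf-8"
```

Even with UTF-8 mode enabled, passing encoding='utf-8' explicitly remains the safer habit, since the code may later run on a machine where the variable is not set.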