csv-data-synthesizer
Generate realistic CSV data files with proper headers, data types, and configurable row counts for testing, demos, and development. Triggers on "create CSV data", "generate CSV file", "fake CSV for", "sample data CSV", "test data spreadsheet".
$ インストール
git clone https://github.com/majiayu000/claude-skill-registry /tmp/claude-skill-registry && cp -r /tmp/claude-skill-registry/skills/testing/csv-data-synthesizer ~/.claude/skills/claude-skill-registry// tip: Run this command in your terminal to install the skill
name: csv-data-synthesizer description: Generate realistic CSV data files with proper headers, data types, and configurable row counts for testing, demos, and development. Triggers on "create CSV data", "generate CSV file", "fake CSV for", "sample data CSV", "test data spreadsheet".
CSV Data Synthesizer
Generate production-realistic CSV data files with proper structure, realistic values, and consistent formatting.
Output Requirements
File Output: .csv files with valid CSV structure
Naming Convention: {dataset-name}-data.csv (e.g., customers-data.csv)
Encoding: UTF-8
Line Endings: Unix (LF)
When Invoked
Immediately generate a complete, valid CSV file. Default to 20 rows if count not specified.
CSV Formatting Rules
Headers
- First row is always headers
- Use snake_case:
first_name,order_total,created_at - No spaces in header names
- Descriptive but concise
Quoting
- Quote fields containing commas, quotes, or newlines
- Escape internal quotes by doubling:
"She said ""hello""" - Prefer quoting text fields for safety
Data Types
- Dates:
YYYY-MM-DDformat (2024-01-15) - Datetimes:
YYYY-MM-DD HH:MM:SSformat - Currency: Decimal without symbol (99.99)
- Booleans:
true/false(lowercase) - Empty values: Empty string, not NULL or N/A
Domain Templates
Customer/User Data
id,email,first_name,last_name,phone,company,created_at
1,john.doe@example.com,John,Doe,+1-555-0101,Acme Corp,2024-01-15
2,jane.smith@techstart.io,Jane,Smith,+1-555-0102,TechStart Inc,2024-01-16
Fields available: id, email, first_name, last_name, full_name, phone, company, job_title, department, address, city, state, country, postal_code, created_at, updated_at, status, tier
E-commerce Orders
order_id,customer_id,order_date,status,subtotal,tax,shipping,total,items_count
ORD-001,CUST-123,2024-01-15,completed,89.99,7.20,5.99,103.18,3
ORD-002,CUST-456,2024-01-16,processing,149.50,11.96,0.00,161.46,2
Fields available: order_id, customer_id, customer_email, order_date, status, subtotal, tax, discount, shipping, total, items_count, payment_method, shipping_method, tracking_number
Product Catalog
sku,name,category,price,cost,quantity,weight_kg,status
WBH-001,Wireless Headphones,Electronics,79.99,32.00,150,0.25,active
TPT-002,Thermal Printer,Office,299.99,145.00,45,2.10,active
Fields available: sku, product_id, name, description, category, subcategory, brand, price, cost, margin, quantity, reorder_point, weight_kg, dimensions, status, created_at
Employee/HR Data
employee_id,email,first_name,last_name,department,job_title,hire_date,salary,manager_id
EMP001,john.doe@company.com,John,Doe,Engineering,Software Engineer,2022-03-15,95000,EMP010
EMP002,jane.smith@company.com,Jane,Smith,Marketing,Marketing Manager,2021-08-01,85000,EMP015
Fields available: employee_id, email, first_name, last_name, department, job_title, hire_date, salary, bonus, manager_id, location, status, performance_rating
Financial Transactions
transaction_id,date,account_id,type,category,amount,balance,description
TXN-0001,2024-01-15,ACC-123,debit,utilities,-125.50,4874.50,Electric Company Payment
TXN-0002,2024-01-16,ACC-123,credit,salary,3500.00,8374.50,Monthly Salary Deposit
Fields available: transaction_id, date, account_id, type, category, amount, balance, description, merchant, reference_number, status
Time Series / Metrics
timestamp,metric_name,value,unit,source
2024-01-15 00:00:00,cpu_usage,45.2,percent,server-01
2024-01-15 00:05:00,cpu_usage,52.8,percent,server-01
2024-01-15 00:10:00,cpu_usage,38.1,percent,server-01
Fields available: timestamp, date, metric_name, value, unit, source, environment, tags, min, max, avg
Survey/Form Responses
response_id,submitted_at,q1_satisfaction,q2_recommend,q3_comments,nps_score
RSP-001,2024-01-15 14:30:00,5,yes,"Great service, very helpful",9
RSP-002,2024-01-15 15:45:00,4,yes,"Good but could be faster",7
Data Generation Rules
Realistic Distributions
- Don't make all values perfectly distributed
- Include some edge cases (empty optional fields, boundary values)
- Vary string lengths naturally
- Use realistic value ranges
Consistency
- Related fields should be consistent (city matches postal code region)
- Dates should be chronologically sensible
- IDs should be unique
- Foreign keys should reference valid parent IDs
Variation
- Mix case variations where appropriate
- Include international formats (phone, address) if specified
- Vary the "completeness" of records (some with all optional fields, some without)
Validation Checklist
Before outputting, verify:
- Header row present
- Consistent column count across all rows
- Proper quoting for fields with commas
- No trailing commas
- UTF-8 encoding compatible
- Dates in consistent format
- Numbers without currency symbols
- No formula injection risks (fields starting with =, +, -, @)
Example Invocations
Prompt: "Generate CSV with 50 customer records including address"
Output: Complete customers-data.csv with 50 rows, full address fields.
Prompt: "Create sample sales data CSV for Q1 2024"
Output: Complete sales-q1-2024-data.csv with daily sales records Jan-Mar.
Prompt: "CSV test data for employee database import"
Output: Complete employees-data.csv with realistic HR data structure.
Repository
