test-ambiguity-detector

Analyze test cases against specifications to find ambiguous assumptions. Use when tests might be making assumptions not explicitly defined in the spec. Invoke with /test-ambiguity-detector <problem> <checkpoint>.

Installer

git clone https://github.com/SprocketLab/slop-code-bench /tmp/slop-code-bench && cp -r /tmp/slop-code-bench/.claude/skills/test-ambiguity-detector ~/.claude/skills/slop-code-bench

// tip: Run this command in your terminal to install the skill


name: test-ambiguity-detector
description: Analyze test cases against specifications to find ambiguous assumptions. Use when tests might be making assumptions not explicitly defined in the spec. Invoke with /test-ambiguity-detector <problem> <checkpoint>.

Test Ambiguity Detector

Compare a test file against its specification to identify where tests assume behavior that the spec doesn't explicitly define.

Usage: /test-ambiguity-detector file_backup checkpoint_1

Purpose

Tests should verify spec-defined behavior, not invent requirements. This skill finds cases where:

  • Tests assert specific behavior the spec is silent about
  • Tests assume implementation details not mandated by spec
  • Tests expect particular error messages/codes not specified
  • Tests require specific ordering/formatting beyond spec requirements

Step 1: Gather Context

Read the spec and test files for the specified problem/checkpoint:

problems/{problem}/checkpoint_N.md
problems/{problem}/tests/test_checkpoint_N.py
problems/{problem}/tests/conftest.py        # Fixtures and test setup

Also check for test data:

problems/{problem}/tests/data/checkpoint_N/  # Test case data (if present)
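If you want to script this step, a minimal sketch of the file lookup could look like the following (gather_context is a hypothetical helper, and it assumes exactly the layout above):

from pathlib import Path

def gather_context(problem: str, checkpoint: str, root: Path = Path(".")) -> dict:
    """Collect the spec, tests, and optional fixtures/data for one checkpoint."""
    base = root / "problems" / problem
    n = checkpoint.rsplit("_", 1)[-1]  # "checkpoint_1" -> "1"
    candidates = {
        "spec": base / f"checkpoint_{n}.md",
        "tests": base / "tests" / f"test_checkpoint_{n}.py",
        "conftest": base / "tests" / "conftest.py",
        "data_dir": base / "tests" / "data" / f"checkpoint_{n}",
    }
    # Only keep what actually exists; conftest.py and data/ are optional
    return {name: path for name, path in candidates.items() if path.exists()}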

Step 2: Catalog Spec Requirements

For each section of the spec, extract:

  1. Explicit requirements - "MUST", "SHALL", specific values, exact formats
  2. Implicit requirements - Reasonable inferences from context
  3. Undefined behaviors - What the spec does NOT address

Create a mental checklist:

  • Input validation rules (if any specified)
  • Output format requirements (exact vs flexible)
  • Error handling expectations (if any specified)
  • Edge cases addressed (explicitly or by example)
  • Processing order requirements
  • Default values specified
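If you prefer to write the checklist down rather than keep it in your head, one possible shape for the catalog (the SpecRequirement class is purely illustrative) is:

from dataclasses import dataclass

@dataclass
class SpecRequirement:
    """One catalogued spec statement; 'kind' mirrors the three buckets above."""
    text: str           # quoted or paraphrased spec language
    kind: str           # "explicit" | "implicit" | "undefined"
    section: str = ""   # where in the spec it appears
    notes: str = ""     # exact values, formats, defaults, etc.

catalog = [
    SpecRequirement("Return non-zero exit code on invalid input", "explicit",
                    section="Error handling"),
    SpecRequirement("Ordering of output lines is never addressed", "undefined",
                    section="Output format"),
]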

Step 3: Analyze Each Test

For each test function or test case:

A. Identify the Assertion

What specific behavior does this test check?

  • Return values
  • Output format/content
  • Side effects
  • Error conditions
  • State changes
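A single test often mixes several of these. In a hypothetical test (the file_backup invocation and the expected strings are invented for illustration), the assertion kinds might appear together:

import subprocess

def test_backup_creates_archive(tmp_path):
    result = subprocess.run(["file_backup", "archive", str(tmp_path)],
                            capture_output=True, text=True)

    assert result.returncode == 0                  # return value
    assert "archived" in result.stdout             # output content
    assert (tmp_path / "backup.tar").exists()      # side effect / state change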

B. Find Spec Support

Can you point to a specific spec line that mandates this behavior?

SUPPORTED: Spec says "must return error code 1" → test checks returncode == 1
UNSUPPORTED: Spec says "must return non-zero on error" → test checks returncode == 1
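In test code, the same contrast reads as follows (fragments, assuming a subprocess-style result object):

# SUPPORTED: spec text says "must return error code 1"
assert result.returncode == 1

# UNSUPPORTED: spec text only says "must return non-zero on error";
# the spec-faithful assertion is the weaker one
assert result.returncode != 0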

C. Classify the Assumption

| Category | Example | Risk Level |
| --- | --- | --- |
| Explicit | Spec says X, test checks X | None |
| Reasonable Inference | Spec implies X, test checks X | Low |
| Implementation Choice | Spec silent, test assumes X | HIGH |
| Contradiction | Spec says X, test checks Y | CRITICAL |
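If findings are collected programmatically, the four categories map naturally onto a small enum (the names are illustrative, not part of the report format):

from enum import Enum

class Risk(Enum):
    NONE = "explicit"               # Spec says X, test checks X
    LOW = "reasonable inference"    # Spec implies X, test checks X
    HIGH = "implementation choice"  # Spec silent, test assumes X
    CRITICAL = "contradiction"      # Spec says X, test checks Y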

Step 4: Report Findings

Generate a report in this format:

# Ambiguity Report: {problem} {checkpoint}

## Summary
- Total tests analyzed: N
- Potentially ambiguous: N
- Likely spec issues: N

## Ambiguous Test Assumptions

### 1. `test_function_name` (line X)

**Assertion**: [What the test checks]

**Spec says**: [Quote or "NOTHING - spec is silent"]

**Risk**: [HIGH/MEDIUM/LOW]

**Analysis**: [Why this is problematic]

**Recommendation**:
- [ ] Clarify in spec: [suggested text]
- [ ] OR relax test: [how to make less specific]
- [ ] OR remove test: [if testing undefined behavior]

---

### 2. `test_another_function` (line Y)
...

## Spec Gaps Identified

List behaviors tested but not defined in spec:
1. [Behavior] - should spec define this?
2. ...

## Recommendations

### For Spec Authors
- [Suggested clarifications]

### For Test Authors
- [Suggested test modifications]

Common Ambiguity Patterns

Error Handling

  • Ambiguous: Spec says "return error" but test expects specific code/message
  • Check: Does spec define error codes? Error message format?

Ordering

  • Ambiguous: Spec says "process items" but test expects specific order
  • Check: Does spec mandate sorting? Stable ordering?

Whitespace/Formatting

  • Ambiguous: Spec shows example output, test requires exact formatting
  • Check: Does spec say "must match exactly" or just "must contain"?
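For example, when the spec merely shows sample output (the strings here are invented), containment is usually the safer assertion:

# Strict: breaks on trailing newlines, spacing, or extra log lines
assert result.stdout == "3 files processed\n"

# Lenient: only checks what the spec example actually demonstrates
assert "3 files processed" in result.stdout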

Empty/Null Inputs

  • Ambiguous: Test expects specific behavior for empty input
  • Check: Does spec address empty/null cases?

Boundary Values

  • Ambiguous: Test checks behavior at 0, max, or edge values
  • Check: Does spec define boundary behavior?

Field Ordering in JSON/YAML

  • Ambiguous: Test expects fields in specific order
  • Check: Does spec mandate field ordering?

Case Sensitivity

  • Ambiguous: Test expects case-sensitive/insensitive matching
  • Check: Does spec specify case handling?

Red Flags in Tests

Watch for these patterns that often indicate ambiguity:

# Specific error code not in spec
assert result.returncode == 42

# Exact error message not in spec
assert "invalid format" in result.stderr

# Specific field ordering
assert json.dumps(output) == '{"a":1,"b":2}'

# Implicit sorting expectations
assert results == sorted(results)

# Default value assumptions
assert config.get("timeout") == 30

# Specific exception types
with pytest.raises(ValueError):
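Each of these has a more lenient counterpart when the spec does not pin the detail down. One-for-one sketches (variable names mirror the snippets above; whether a given relaxation is correct still depends on the spec):

# Specific error code -> any non-zero code
assert result.returncode != 0

# Exact error message -> only require that some diagnostic is emitted
assert result.stderr != ""

# Exact serialized JSON -> compare parsed data, so field order is irrelevant
assert json.loads(output) == {"a": 1, "b": 2}

# Implicit sorting -> compare contents regardless of order
assert sorted(results) == sorted(expected)

# Assumed default value -> only require the setting to be present
assert "timeout" in config

# Specific exception type -> any failure
with pytest.raises(Exception):
    ...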

Example Analysis

Given spec text:

Return non-zero exit code on invalid input.

And test:

def test_invalid_input():
    result = subprocess.run(...)
    assert result.returncode == 1  # <-- AMBIGUOUS!

Finding: Test expects exit code 1, but spec only says "non-zero". Valid implementations could return 2, 127, or any non-zero value.

Recommendation: Either:

  • Clarify spec: "Return exit code 1 on invalid input"
  • Relax test: assert result.returncode != 0
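Relaxed, the test keeps the spec's own wording:

def test_invalid_input():
    result = subprocess.run(...)
    assert result.returncode != 0  # "non-zero" is all the spec guarantees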

Recommended Tools for Lenient Tests

Fix preference: Make tests lenient > Remove tests >> Update spec

Many ambiguity issues can be solved by using flexible comparison tools instead of exact matching.

deepdiff - Flexible Object Comparison

from deepdiff import DeepDiff

# Instead of exact matching:
assert actual == expected  # FAILS on order, float precision, extra fields

# Use deepdiff to ignore irrelevant differences:
diff = DeepDiff(expected, actual,
    ignore_order=True,           # List/dict order doesn't matter
    significant_digits=5,         # Float precision tolerance
    exclude_paths=["root['id']"]  # Ignore fields not in spec
)
assert not diff, f"Meaningful differences: {diff}"

jsonschema - Structure Validation

from jsonschema import validate

# Instead of checking exact values:
assert output == {"status": "ok", "count": 5}  # Too strict!

# Validate structure matches spec requirements:
schema = {
    "type": "object",
    "required": ["status", "count"],  # Only what spec requires
    "properties": {
        "status": {"enum": ["ok", "error"]},
        "count": {"type": "integer", "minimum": 0}
    }
}
validate(instance=output, schema=schema)  # Accepts any valid structure

Normalization - Handle Formatting Differences

import json

def normalize(obj):
    """Normalize for comparison: sort keys, strip whitespace, lowercase strings."""
    if isinstance(obj, dict):
        return {k: normalize(v) for k, v in sorted(obj.items())}
    if isinstance(obj, list):
        return sorted([normalize(x) for x in obj], key=lambda x: json.dumps(x, sort_keys=True))
    if isinstance(obj, str):
        return obj.strip().lower()
    return obj

# Instead of exact string/JSON comparison:
assert json.dumps(actual) == json.dumps(expected)  # Fails on key order!

# Normalize first:
assert normalize(actual) == normalize(expected)

Combined Approach

from deepdiff import DeepDiff  # normalize() comes from the snippet above

def flexible_assert(actual, expected, ignore_fields=None):
    """Compare with maximum leniency for spec compliance."""
    diff = DeepDiff(
        normalize(expected),
        normalize(actual),
        ignore_order=True,
        exclude_paths=ignore_fields or [],
        significant_digits=5,
        ignore_string_case=True
    )
    assert not diff, f"Spec-relevant differences: {diff}"

# Usage: Only fails on meaningful differences
flexible_assert(actual, expected, ignore_fields=["root['timestamp']"])

When to Use Each Tool

| Problem | Solution |
| --- | --- |
| Order doesn't match | deepdiff with ignore_order=True |
| Float precision differs | deepdiff with significant_digits=N |
| Extra fields in output | deepdiff with exclude_paths |
| Only structure matters | jsonschema validation |
| Whitespace/case differs | normalize() before comparing |
| Multiple issues | Combine all techniques |

After Analysis

Summarize findings to the user with:

  1. High-risk ambiguities that likely cause false failures
  2. Spec gaps that should be addressed
  3. Test modifications to reduce brittleness
  4. Questions for spec authors to clarify intent