test-driven-development

Use when writing tests or implementing code. Defines RED-GREEN-REFACTOR cycle and test execution workflow.

$ Install

git clone https://github.com/craigtkhill/atui-tools /tmp/atui-tools && cp -r /tmp/atui-tools/skills/test-driven-development ~/.claude/skills/test-driven-development

// tip: Run this command in your terminal to install the skill


name: test-driven-development
description: Use when writing tests or implementing code. Defines RED-GREEN-REFACTOR cycle and test execution workflow.

Test-Driven Development (TDD) Skill

This skill defines the proper TDD workflow: write test(s), run them (RED), implement minimal code (GREEN), run ALL tests (VERIFY), refactor if needed, repeat.

Language-specific details:

  • For Python projects: See test-driven-development/PYTHON.md for pytest patterns, running tests, and Python-specific examples
  • Other languages: Add new files to the test-driven-development/ directory as needed

TDD Cycle: RED → GREEN → REFACTOR

How Many Tests to Write at Once?

Write ONE test at a time when:

  • Design is uncertain or exploratory
  • Domain is complex and unfamiliar
  • You need to review and course correct after each step
  • Implementation approach is unclear

Write MULTIPLE tests upfront when:

  • Requirements are clear and well-understood
  • Domain is familiar
  • Design is straightforward
  • You're confident in the implementation approach

1. RED: Write Failing Test(s)

Write test(s) that fail.

Key principle: Each test must fail for the right reason (missing implementation, not a syntax error).

CRITICAL - Run the tests - they MUST fail:

  • โŒ DO NOT write implementation before running the tests
  • โŒ DO NOT skip the RED phase
  • โœ… DO run the tests to verify they fail
  • Verify they fail with expected error (import error, assertion failure, etc.)
  • Expected outcome: RED (tests fail)

Why RED is mandatory:

  • Confirms the tests actually test something
  • Prevents false positives (tests that always pass)
  • Validates test setup is correct
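
As an illustration of the RED step, here is a minimal pytest sketch. It assumes a hypothetical slugify function in a module named slugger; neither exists yet, and all names are invented for this example (see PYTHON.md for real project patterns).

# test_slugify.py (hypothetical). At this point no implementation exists,
# so even the import fails: that is the expected RED.
from slugger import slugify


def test_slugify_replaces_spaces_with_hyphens():
    """Illustrative spec reference: titles become URL-safe slugs."""
    assert slugify("Hello World") == "hello-world"

Running this test now fails with an import error, which is the right reason: the implementation is missing; the test itself is not broken.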

2. GREEN: Write Minimal Implementation

Write the MINIMUM code to make the tests pass.

Key principle: Don't add features not yet tested. Don't over-engineer.

Run the tests again:

  • Run the tests to verify they now pass
  • Expected outcome: GREEN (tests pass)
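
Continuing the hypothetical slugify example from the RED step, a minimal GREEN implementation covers exactly what the one test demands and nothing more:

# slugger.py (hypothetical): the MINIMUM code that makes the current test pass.
def slugify(title: str) -> str:
    return title.lower().replace(" ", "-")

Deliberately absent: punctuation stripping, Unicode handling, input validation. Each of those waits until a test demands it.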

3. VERIFY: Run ALL Tests

CRITICAL: After tests pass, run ALL tests to ensure nothing broke.

Key principle: Never break existing functionality.

Run complete test suite:

  • Run all tests in the project
  • Verify all tests pass (new and existing)
  • Expected outcome: All tests GREEN
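
In a pytest project, for example, running the full suite from the project root is typically a bare invocation (see PYTHON.md for project-specific commands):

pytest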

4. REFACTOR: Improve Code Quality

After tests pass, consider refactoring to improve code quality.

Use the refactor skill for detailed refactoring guidance.

Key principle: Refactor only when tests are GREEN. Tests protect you during refactoring.

When to refactor:

  • Code duplication exists
  • Code is unclear or hard to understand
  • Better patterns or abstractions are apparent
  • Performance can be improved
  • Before implementing new features that may duplicate existing patterns

Refactoring safety:

  • Run ALL tests after each refactoring change
  • If any test fails, revert and try a different approach
  • Keep refactoring changes small and incremental

Note: Refactoring is optional on each cycle. You can skip it if the code is already clean.
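
As a sketch of a small, behavior-preserving refactoring step, continuing the hypothetical slugify example (the extracted helper is invented for illustration):

# Extract the lowercasing into a named helper so the next feature does not
# duplicate it. Behavior is unchanged; run ALL tests immediately afterwards.
def _normalize(title: str) -> str:
    return title.lower()


def slugify(title: str) -> str:
    return _normalize(title).replace(" ", "-")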

5. REPEAT: Continue

After all tests pass, continue with next requirements and repeat the cycle.

TDD Rules

Rule 1: Update Spec as You Go

  • ✅ DO update SPEC.md to mark requirements as tested after tests pass
  • ✅ Keep spec in sync with implementation progress

Updating SPEC.md markers:

  • After writing a failing unit test: [O][O] → [U][O] (test exists, code pending)
  • After code passes the unit test: [U][O] → [U][X] (test and code complete)
  • After writing a failing acceptance test: [O][O] → [A][O] (acceptance test exists, code pending)
  • After code passes the acceptance test: [A][O] → [A][X] (acceptance test and code complete)
  • When an acceptance test passes, mark related unit-tested features as implemented: [U][O] → [U][X]
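
For example, a hypothetical SPEC.md excerpt midway through a cycle (the requirement text is invented; the marker format is the one described above):

[U][X] Titles are lowercased in slugs       (unit tested, implemented)
[U][O] Spaces become hyphens in slugs       (unit test written, code pending)
[O][O] Punctuation is stripped from slugs   (not yet tested or implemented)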

Rule 2: Always Run Tests

CRITICAL: Tests MUST go through the RED → GREEN → VERIFY cycle:

  1. RED: Run tests to see them FAIL (before implementing)
  2. GREEN: Run tests to see them PASS (after implementing)
  3. VERIFY: Run ALL tests in the project to verify nothing broke

โŒ NEVER skip the RED phase - Always see tests fail before implementing

Why this matters:

  • RED phase confirms tests actually test something (prevents false positives)
  • GREEN phase confirms implementation works
  • VERIFY phase ensures no regression in existing functionality

Rule 3: Minimal Implementation

  • Write ONLY enough code to make the current test pass
  • Don't add features not yet tested
  • Don't over-engineer solutions
  • Prefer simple over clever

Rule 4: Test One Thing

  • Each test validates ONE behavior
  • One assertion per test (unless assertions are intrinsically coupled)
  • Clear test names describe what's being tested
  • Multiple tests enable better parallelization
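
A short sketch of this rule using the hypothetical slugify tests: the coupled version hides which behavior broke, while the split versions name each one.

from slugger import slugify


# Too broad: two behaviors in one test, so a failure is ambiguous.
def test_slugify():
    assert slugify("HELLO") == "hello"
    assert slugify("Hello World") == "hello-world"


# Focused: each test validates ONE behavior and its name says which.
def test_slugify_lowercases_input():
    assert slugify("HELLO") == "hello"


def test_slugify_replaces_spaces_with_hyphens():
    assert slugify("Hello World") == "hello-world"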

Rule 5: Fail Fast

If any test fails:

  • Stop immediately
  • Fix the broken tests
  • Don't continue until all tests are GREEN

TDD Workflow Diagram

┌────────────────────────────────────────────────┐
│ Start: Pick requirements from spec             │
└────────────────┬───────────────────────────────┘
                 │
                 ▼
┌────────────────────────────────────────────────┐
│ 1. RED: Write test(s) for requirement(s)       │
└────────────────┬───────────────────────────────┘
                 │
                 ▼
┌────────────────────────────────────────────────┐
│ 2. Run tests                                   │
│    Expected: FAIL (RED)                        │
└────────────────┬───────────────────────────────┘
                 │
                 ▼
┌────────────────────────────────────────────────┐
│ 3. GREEN: Write minimal implementation         │
└────────────────┬───────────────────────────────┘
                 │
                 ▼
┌────────────────────────────────────────────────┐
│ 4. Run tests                                   │
│    Expected: PASS (GREEN)                      │
└────────────────┬───────────────────────────────┘
                 │
                 ▼
┌────────────────────────────────────────────────┐
│ 5. VERIFY: Run ALL tests                       │
│    Expected: All PASS                          │
└────────────────┬───────────────────────────────┘
                 │
            All pass? ──No──> Fix broken tests
                 │
                Yes
                 │
                 ▼
┌────────────────────────────────────────────────┐
│ 6. REFACTOR: Improve code                      │
│    Run ALL tests after each change             │
└────────────────┬───────────────────────────────┘
                 │
                 ▼
┌────────────────────────────────────────────────┐
│ 7. More requirements? ──Yes──> Repeat          │
│                        │                       │
│                       No                       │
│                        │                       │
│                        ▼                       │
│                     Done                       │
└────────────────────────────────────────────────┘

Test Organization

Unit Tests vs Acceptance Tests vs Integration Tests

Unit Tests:

  • Test individual functions/classes in isolation
  • Fast, focused, and run frequently
  • Located within the feature directory
  • Should cover the vast majority of your testing needs
  • Marked with [U] in SPEC.md

Acceptance Tests:

  • Test complete features from a user/business perspective
  • Verify requirements are met end-to-end within a feature
  • Located within the feature directory
  • Test scenarios from the spec (Given/When/Then)
  • May involve multiple units working together within the same feature
  • Marked with [A] in SPEC.md
  • When to use:
    • Testing complete user workflows that require external APIs
    • Verifying business requirements are satisfied
    • Testing features that span multiple units within the feature
    • End-to-end validation of a feature

Integration Tests:

  • Test interactions between multiple modules or external systems
  • Slower, more complex, expensive to maintain
  • Located in tests/ directory at project root
  • Use sparingly and only for tactical purposes:
    • When unit tests cannot adequately verify the behavior
    • Testing interactions with external dependencies (databases, APIs, LLMs)
    • End-to-end workflows that span multiple modules
    • Verifying third-party library integration (e.g., Pydantic AI model introspection)

Test Hierarchy:

  1. Default to unit tests - fast, isolated, cover individual behaviors
  2. Use acceptance tests - when you need end-to-end feature validation
  3. Use integration tests sparingly - only for tactical external integration needs
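
A hypothetical layout consistent with these placement rules (all directory and file names are invented for illustration):

project/
├── slugger/                          # a feature directory
│   ├── slugify.py
│   ├── test_slugify.py               # unit tests live with the feature
│   └── test_slugify_acceptance.py    # acceptance tests live with the feature
└── tests/
    └── test_db_integration.py        # integration tests at project root, used sparingly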

Test Grouping

Organize tests by requirement sections:

  • Group related tests together
  • Use clear section/class names
  • Mirror the spec structure

Test Naming

Follow project conventions for test names:

  • Descriptive names that explain what's being tested
  • Include requirement reference in documentation
  • Use consistent naming pattern

Test Documentation

Each test should have:

  • Clear name describing behavior
  • Documentation linking to spec requirement
  • Given-When-Then structure
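
Putting the three together in one hypothetical test (the spec reference and names are invented):

from slugger import slugify


def test_slugify_replaces_spaces_with_hyphens():
    """Spec 2.1 (illustrative): titles become URL-safe slugs.

    Given a title containing spaces
    When it is slugified
    Then each space becomes a hyphen
    """
    # Given
    title = "Hello World"
    # When
    result = slugify(title)
    # Then
    assert result == "hello-world"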

Integration with STDD Workflow

TDD is STEP 2 of the spec-test-driven development workflow:

  1. Write Spec (use spec skill)
  2. Write Tests โ† (use this tdd skill)
  3. Implement Code (minimal code to pass tests)

When writing tests:

  1. Reference spec requirements in test documentation
  2. Follow test organization from spec sections
  3. Write tests for requirements from the spec
  4. Decide whether to write one test at a time or multiple tests based on complexity and certainty (see "How Many Tests to Write at Once?" above)

Common Mistakes to Avoid

โŒ Not Running Tests Enough

  • Skipping the RED phase (not verifying test fails)
  • Skipping the GREEN phase (not verifying test passes)
  • Skipping the VERIFY phase (not running all tests)

โŒ Over-Implementing

  • Adding features not yet tested
  • Writing more code than needed to pass the test
  • Implementing based on assumptions rather than tests

โŒ Writing Bad Tests

  • Tests that don't fail when they should
  • Tests with multiple assertions (unless intrinsically coupled)
  • Tests that don't clearly document what they're testing
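
As one illustration of the first point, a classic false positive that the RED phase would catch (hypothetical names again):

from slugger import slugify


# BUG: the function is never called. A bare function object is truthy,
# so this test passes no matter what slugify does.
def test_slugify_always_passes():
    assert slugify  # should be: assert slugify("Hello World") == "hello-world"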

Checklist Before Moving Forward

Use this checklist for your TDD workflow in this exact order:

  1. Tests written with clear names and documentation
  2. Tests reference specific requirements from spec
  3. Tests run and FAILED (RED) ✓ ← DO NOT SKIP THIS
  4. Minimal implementation written (ONLY after seeing RED)
  5. Tests run and PASSED (GREEN) ✓
  6. ALL tests run and PASSED (VERIFY) ✓
  7. SPEC.md updated with test markers ([U][O] or [A][O]) ✓
  8. SPEC.md updated with implementation markers ([U][X] or [A][X]) if code complete ✓
  9. Code refactored if needed (optional)
  10. ALL tests still pass after refactoring
  11. No broken tests
  12. Ready to continue

Critical reminder: Steps 3-6 MUST happen in order:

  • RED (tests fail) → then implement → GREEN (tests pass) → then VERIFY (all tests pass)