adversarial-code-review

Review code through hostile perspectives to find bugs, security issues, and unintended consequences the author missed. Use when reviewing PRs, auditing codebases, or before critical deployments.

Install

git clone https://github.com/third774/dotfiles /tmp/dotfiles && cp -r /tmp/dotfiles/opencode/skill/adversarial-code-review ~/.claude/skills/dotfiles

Tip: Run this command in your terminal to install the skill.



Adversarial Code Review

Core Principle: Review as if you're trying to break the code. Deliberately adopt hostile perspectives—each reveals issues the others miss.

This is not about finding fault. It's about finding problems before users do.

Review Mode

| Mode | Trigger | Focus |
|---|---|---|
| Diff-Focused (default) | No explicit instruction, PR review | What changed? What could break? |
| Audit | "audit", "holistic", "codebase review" | Broader scope, systematic coverage |

When in doubt, use diff-focused mode. Audit mode requires explicit request.

The Six Adversarial Lenses

Review through each lens deliberately. Don't blend them—switching perspectives forces deeper analysis.

| Lens | Core Question | Reveals |
|---|---|---|
| Malicious User | "How would I exploit this?" | Input validation gaps, injection vectors, privilege escalation |
| Careless Colleague | "How would this break if used wrong?" | API misuse, unclear contracts, error handling gaps |
| Future Maintainer | "What will confuse me in 6 months?" | Implicit assumptions, missing context, temporal coupling |
| Ops/On-Call | "How will this fail at 3am?" | Observability gaps, recovery paths, failure modes |
| Data Integrity | "What happens to state?" | Race conditions, partial failures, consistency violations |
| Interaction Effects | "What does this change elsewhere?" | Unintended side effects, behavioral changes, contract breaks |

Lens Details

Malicious User

Assume the user is actively trying to break or exploit the system.

  • What inputs are trusted that shouldn't be?
  • Can I escalate privileges or access unauthorized data?
  • What happens if I send malformed/oversized/unexpected input?
  • Are there injection points (SQL, XSS, command, path traversal)?

For deep security reviews, see references/security-lens-detail.md.
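
For example, a hypothetical endpoint like the sketch below (not taken from any particular codebase; the `Db` interface is a stand-in) is exactly what this lens is built to catch: the query is assembled by string interpolation and the caller's identity is never checked.

```ts
// Hypothetical data-access layer; `query(text, params)` is assumed to support
// parameter placeholders, as node-postgres and most SQL drivers do.
interface Db {
  query(text: string, params?: unknown[]): Promise<{ rows: unknown[] }>;
}

// What the Malicious User lens flags:
// 1. `userId` is interpolated straight into SQL -- an injection vector.
// 2. Nothing checks that the caller may read this user's orders.
async function getOrdersUnsafe(db: Db, userId: string) {
  return db.query(`SELECT * FROM orders WHERE user_id = '${userId}'`);
}

// One way to close both gaps: parameterize the query and make the
// authorization decision explicit at the boundary.
async function getOrders(db: Db, callerId: string, userId: string) {
  if (callerId !== userId) {
    throw new Error("Forbidden: callers may only read their own orders");
  }
  return db.query("SELECT * FROM orders WHERE user_id = $1", [userId]);
}
```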

Careless Colleague

Assume another developer will use this code without reading documentation.

  • Is the API intuitive or are there "gotchas"?
  • What happens if methods are called in wrong order?
  • Are error messages helpful or cryptic?
  • Could someone misuse this and get silent wrong results?
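
A minimal sketch of the kind of gotcha this lens surfaces, using a made-up cache client: the first version silently returns nothing when `get()` is called before `connect()`, while the second fails fast with a clear message.

```ts
// Made-up cache client; the storage is irrelevant, the contract is the point.
class QuietCacheClient {
  private connected = false;
  private store = new Map<string, string>();

  connect(): void {
    this.connected = true;
  }

  // Gotcha the Careless Colleague lens flags: calling get() before connect()
  // does not fail, it just returns undefined -- a silent wrong result.
  get(key: string): string | undefined {
    if (!this.connected) return undefined;
    return this.store.get(key);
  }
}

// Same client with an explicit contract: misuse fails fast and loudly.
class StrictCacheClient {
  private connected = false;
  private store = new Map<string, string>();

  connect(): void {
    this.connected = true;
  }

  get(key: string): string | undefined {
    if (!this.connected) {
      throw new Error("StrictCacheClient.get() called before connect()");
    }
    return this.store.get(key);
  }
}
```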

Future Maintainer

Assume you'll revisit this code in 6 months with no memory of writing it.

  • Why does this code exist? Is that documented?
  • What assumptions are implicit that should be explicit?
  • Are there magic numbers/strings without explanation?
  • Would I understand the control flow on first read?
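
As an illustration (a hypothetical session-staleness check, invented for this sketch), the first version below buries its assumptions in magic numbers; the second spells them out so a future reader does not have to reverse-engineer them.

```ts
// Flagged by the Future Maintainer lens: what is 86400000, why 0.8, and what
// unit is `lastSeen` in? None of that is recoverable from the code alone.
function isStaleUnsafe(lastSeen: number): boolean {
  return Date.now() - lastSeen > 86400000 * 0.8;
}

// The same logic with the assumptions spelled out.
const MS_PER_DAY = 24 * 60 * 60 * 1000;
// Sessions expire server-side after 24h in this hypothetical system; treat
// them as stale a little earlier so clients refresh before being rejected.
const STALE_FRACTION_OF_EXPIRY = 0.8;

/** `lastSeenEpochMs` is a Unix timestamp in milliseconds. */
function isStale(lastSeenEpochMs: number): boolean {
  return Date.now() - lastSeenEpochMs > MS_PER_DAY * STALE_FRACTION_OF_EXPIRY;
}
```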

Ops/On-Call

Assume this will fail in production at the worst possible time.

  • How will I know when this fails? (Logging, metrics, alerts)
  • Can I diagnose the problem from logs alone?
  • Is there a recovery path? Can it be retried safely?
  • What's the blast radius if this fails?
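
A sketch of what this lens tends to catch, using stand-in `sendEmail` and `logger` helpers (not real project APIs): the first version swallows failures invisibly; the second logs enough context to diagnose from logs alone and tells the caller what can be retried.

```ts
// Stand-ins for the project's real mailer and logger.
const logger = {
  error: (msg: string, meta?: object) => console.error(msg, meta),
};
async function sendEmail(to: string, body: string): Promise<void> {
  console.log(`(stub) would send ${body.length} chars to ${to}`);
}

// Flagged by the Ops/On-Call lens: failures are swallowed, so nothing reaches
// logs or alerts, and there is no way to tell which recipients were missed.
async function notifyUnsafe(recipients: string[], body: string): Promise<void> {
  for (const to of recipients) {
    try {
      await sendEmail(to, body);
    } catch {
      // ignored -- invisible at 3am
    }
  }
}

// One improvement: log enough context to diagnose from logs alone, and return
// the failures so the caller can decide whether a retry is safe.
async function notify(recipients: string[], body: string): Promise<string[]> {
  const failed: string[] = [];
  for (const to of recipients) {
    try {
      await sendEmail(to, body);
    } catch (err) {
      logger.error("notify: sendEmail failed", { to, error: String(err) });
      failed.push(to);
    }
  }
  return failed; // callers can retry just the failed recipients
}
```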

Data Integrity

Assume multiple processes will try to modify the same state simultaneously.

  • What happens if this runs twice concurrently?
  • Are there partial failure states that leave data inconsistent?
  • Is there a transaction boundary? What if it fails mid-way?
  • Are reads and writes properly synchronized?
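
A minimal, self-contained illustration of the first question, using an in-memory map as a stand-in for real storage: two concurrent deposits both read the same balance, and one update is silently lost.

```ts
// In-memory stand-in for real storage; the point is the read-modify-write.
const balances = new Map<string, number>([["acct-1", 100]]);

// Flagged by the Data Integrity lens: two concurrent calls both read the same
// balance, so one deposit is silently lost (a classic lost update).
async function depositUnsafe(acct: string, amount: number): Promise<void> {
  const current = balances.get(acct) ?? 0; // read
  await Promise.resolve();                 // any await yields to other callers
  balances.set(acct, current + amount);    // write based on a stale read
}

async function demo(): Promise<void> {
  await Promise.all([depositUnsafe("acct-1", 50), depositUnsafe("acct-1", 50)]);
  console.log(balances.get("acct-1")); // 150, not 200 -- one update lost
}
demo();
```

In a real service the usual fixes are an atomic update (for example, `UPDATE accounts SET balance = balance + $1 WHERE id = $2`), a transaction with an appropriate isolation level, or a lock around the read-modify-write.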

Interaction Effects

Assume this change has consequences beyond its immediate scope.

  • What calls this? Will their expectations still hold?
  • What does this call? What assumptions are we making?
  • Does this subtly change behavior callers depend on?
  • Are there caches, indexes, or derived data that need updating?
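
A hypothetical example of the ripple effect this lens looks for: a PR "improves" a `slugify` helper, but slugs are also used elsewhere as URL paths and cache keys, so existing links and cached entries quietly stop matching.

```ts
// Helper as it exists today; callers elsewhere use the result as a URL path
// segment and as a cache key.
function slugify(title: string): string {
  return title.toLowerCase().replace(/\s+/g, "-");
}

// The change under review: also strip punctuation "for cleaner slugs".
function slugifyNew(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, "")
    .replace(/\s+/g, "-");
}

// Flagged by the Interaction Effects lens: the new behavior silently breaks
// existing links and invalidates cached entries keyed on the old slugs.
console.log(slugify("C++ Tips"));    // "c++-tips"
console.log(slugifyNew("C++ Tips")); // "c-tips"
```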

Review Workflow

Copy this checklist when starting a review:

Adversarial Review Progress:
- [ ] Step 1: Determine mode (diff-focused or audit)
- [ ] Step 2: Understand the change/code purpose
- [ ] Step 3: Apply lenses (prioritize by risk, ~5 min each):
  - [ ] Malicious User
  - [ ] Careless Colleague
  - [ ] Future Maintainer
  - [ ] Ops/On-Call Engineer
  - [ ] Data Integrity
  - [ ] Interaction Effects
- [ ] Step 4: Filter findings through Impact Filter
- [ ] Step 5: Classify severity (Must Fix / Should Fix / Consider)
- [ ] Step 6: Limit "Consider" items to max 2
- [ ] Step 7: Identify at least one positive
- [ ] Step 8: Format report

Lens Prioritization

Not all lenses are equally important for all code. Prioritize:

| Code Type | Priority Lenses |
|---|---|
| User input handling | Malicious User, Data Integrity |
| API/public interface | Careless Colleague, Interaction Effects |
| Background jobs | Ops/On-Call, Data Integrity |
| Business logic | Future Maintainer, Interaction Effects |
| Database operations | Data Integrity, Ops/On-Call |

The Four Iron Laws

  1. Severity matches actual risk, not theoretical worst-case

    • Bad: "CRITICAL: This string could theoretically be used for XSS" (in internal CLI)
    • Good: "LOW: Unescaped string—not currently risky but add escaping if this reaches browser"
  2. Every "Must Fix" requires demonstration or clear reasoning

    • Don't just assert the bug exists—show WHY it's a bug
  3. Alternative suggestions are optional, not mandated

    • Present options. Don't dictate implementation details.
  4. Acknowledge at least one thing done well

    • Adversarial doesn't mean hostile. Recognition builds trust.

Impact Filter

Every potential finding must pass this filter. Report it only if it scores 2 or more of the three criteria:

□ Likely to occur (probability)
□ Impactful if it occurs (severity)
□ Non-obvious to the author (added value)

If a finding scores 0-1, don't report it. You're adding noise, not value.

Severity Tiers

| Tier | Definition | Action | Examples |
|---|---|---|---|
| Must Fix | Breaks correctness, security, or data integrity | Block merge | SQL injection, race condition causing data loss, auth bypass |
| Should Fix | Likely problems but not immediately broken | Fix before or soon after merge | Missing error handling, unclear naming, no tests for edge case |
| Consider | Style, optimization, theoretical concerns | Max 2 per review | Could be more idiomatic, minor perf optimization |

The "Consider" Trap

Limit "Consider" comments to 2 maximum. More than that:

  • Dilutes important feedback
  • Feels like nitpicking
  • Reduces trust in your reviews

If you have many "Consider" items, pick the 2 most valuable and drop the rest.

What NOT to Flag

  • Style preferences covered by linter/formatter — Automation handles this
  • Alternative implementations of equal merit — "I would have done X" isn't a bug
  • Hypothetical futures — "What if we need to support Y someday..." isn't actionable
  • Things you'd do differently but aren't wrong — Preferences aren't defects

The Test: "Would a reasonable senior engineer disagree with me here?"

If yes → Probably not worth commenting.

Reporting Format

Structure findings clearly:

## Summary

[1-2 sentence overview of the review]

### What's Done Well

- [Specific positive observation]

### Must Fix

#### [Issue Title]

**Location:** `file.ts:45-52`
**Lens:** [Which lens found this]
**Issue:** [Clear description of the problem]
**Impact:** [What happens if not fixed]
**Suggestion:** [Optional - how to fix]

### Should Fix

[Same format as Must Fix]

### Consider

[Brief bullet points only - max 2 items]

When to Escalate

Stop the review and escalate when:

| Trigger | Action |
|---|---|
| Security-critical code (auth, crypto, payments) | See references/security-lens-detail.md, consider external review |
| 3+ "Must Fix" issues found | Stop reviewing. Escalate for fundamental redesign discussion. |
| You don't understand the code | Don't guess. Request walkthrough before reviewing. |
| Architectural concerns | Flag for design discussion, don't try to "fix" in review |

Audit Mode

When explicitly requested to audit (not just review changes):

Scope Definition

Before starting, clarify:

  • What areas/modules to focus on?
  • What's the primary concern? (Security? Performance? Maintainability?)
  • What's the time budget?

Sampling Strategy

For large codebases, don't review everything. Sample strategically:

  1. High-risk areas first — Auth, payments, user input handling
  2. Recently changed code: `git log --since="3 months ago" --name-only`
  3. Complex code — High cyclomatic complexity, many dependencies
  4. Code with no tests — Higher likelihood of hidden bugs

Audit Checklist Addition

Audit-Specific Steps:
- [ ] Define scope and primary concerns with requester
- [ ] Identify high-risk areas for focused review
- [ ] Sample strategically (don't boil the ocean)
- [ ] Track coverage (what was reviewed vs skipped)
- [ ] Note systemic patterns across multiple files

Edge Cases

For systematic edge case generation by input domain, see references/edge-case-domains.md.

Common Mistakes

Reviewing Without Understanding

Don't start reviewing until you understand:

  • What is this code supposed to do?
  • Why does this change exist?
  • What's the broader context?

Reviewing without understanding produces shallow, unhelpful feedback.

Lens Blending

Don't try to apply all lenses simultaneously. You'll miss things.

Do this:

  1. Apply Malicious User lens → Note findings
  2. Apply Careless Colleague lens → Note findings
  3. Continue through remaining lenses

Not this:

  • "Let me look at this code and find all the issues"

Severity Inflation

Not everything is critical. Reserve "Must Fix" for actual blockers.

If everything is urgent, nothing is urgent.

Missing the Forest for Trees

After applying all lenses, step back:

  • Are there systemic patterns in the findings?
  • Is there a deeper design issue causing multiple symptoms?
  • Should this be a redesign conversation instead of a review?

Key Principle

The goal isn't to find as many issues as possible. It's to find the issues that matter before they reach users.

Quality over quantity. Impact over volume. Trust over thoroughness.