
usability-testing

Design and plan usability tests - task creation, think-aloud protocols, moderator scripts, metrics definition, and analysis frameworks.


Installer

git clone https://github.com/melodic-software/claude-code-plugins /tmp/claude-code-plugins && cp -r /tmp/claude-code-plugins/plugins/ux-research/skills/usability-testing ~/.claude/skills/claude-code-plugins

Tip: Run this command in your terminal to install the skill.


name: usability-testing
description: Design and plan usability tests - task creation, think-aloud protocols, moderator scripts, metrics definition, and analysis frameworks.
allowed-tools: Read, Glob, Grep, Task

Usability Testing

Design and execute usability tests to evaluate how well users can accomplish tasks with your product.

When to Use This Skill

Use this skill when:

  • Usability testing tasks - Designing and planning usability tests: task creation, think-aloud protocols, moderator scripts, metrics definition, and analysis frameworks
  • Planning or design - You need guidance on usability testing approaches
  • Best practices - You want to follow established patterns and standards

MANDATORY: Methodology First

Before answering ANY usability testing question:

  1. Use established usability testing methodology sources
  2. Base all guidance on validated UX research practices

Usability Testing Types

Moderated vs Unmoderated

| Aspect | Moderated | Unmoderated |
|--------|-----------|-------------|
| Facilitator | Present, guides session | Absent, automated |
| Depth | Deep insights, probing | Surface-level, task-focused |
| Sample Size | 5-8 typical | 20-100+ typical |
| Cost | Higher (facilitator time) | Lower (scale) |
| Turnaround | Days-weeks | Hours-days |
| Best For | Complex flows, discovery | Validation, benchmarking |
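
The "5-8 typical" guideline for moderated sample sizes traces back to the problem-discovery model (Nielsen & Landauer): the share of usability problems found by n participants is 1 - (1 - λ)^n, where λ is the probability that a single participant encounters a given problem. A minimal sketch in C#; the commonly cited λ ≈ 0.31 is an average across studies, not a constant, so calibrate it to your own data:

// Problem-discovery model: expected share of usability problems
// surfaced by n participants, given per-participant discovery rate lambda
public static class ProblemDiscovery
{
    // lambda = 0.31 is the commonly cited average, not a universal constant
    public static double ProportionFound(int participants, double lambda = 0.31) =>
        1 - Math.Pow(1 - lambda, participants);
}

// Example: ProportionFound(5) ≈ 0.84, the source of the
// "five users find ~85% of problems" rule of thumb.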

Testing Formats

| Format | Description | When to Use |
|--------|-------------|-------------|
| In-Person Moderated | Face-to-face, controlled environment | High-fidelity prototypes, sensitive topics |
| Remote Moderated | Video call, screen share | Geographic diversity, convenience |
| Remote Unmoderated | Recorded tasks, no moderator | Scale, quick validation |
| Guerrilla Testing | Quick tests in public spaces | Early concepts, low budget |
| First-Click Testing | Where users click first | Navigation, labeling |
| 5-Second Test | First impressions | Visual hierarchy, messaging |
| Benchmark Testing | Repeated measurement | Tracking improvements |

Task Design

Task Writing Principles

Good tasks are:

  • Realistic: Based on actual user goals
  • Actionable: Have a clear completion state
  • Specific enough: Provide necessary context
  • Open enough: Don't dictate path
  • Measurable: Success can be defined

Task Structure Template

## Task [N]: [Short Name]

**Scenario:**
[Context setting - why user is doing this]

**Task:**
[What to accomplish - goal, not steps]

**Success Criteria:**
- [ ] [Observable completion indicator]
- [ ] [Secondary success measure if applicable]

**Metrics:**
- Task success (binary or graded)
- Time on task
- Errors/assists needed
- Satisfaction rating

**Probes (Moderated):**
- What are you thinking right now?
- What did you expect to happen?
- What would you do next?

Task Examples

Poor Task: "Click the hamburger menu, then click Settings, then Privacy, then change your notification preferences."

Good Task: "You've been receiving too many email notifications from this app. Find where you can change your notification settings."

Poor Task: "Test the checkout flow."

Good Task: "You've been shopping for a birthday gift and found a book you want to purchase. Complete your purchase using the credit card already saved in your account."

Task Difficulty Progression

Structure tasks from easy to difficult:

  1. Warm-up task - Simple, builds confidence
  2. Primary tasks - Core functionality being tested
  3. Secondary tasks - Related but less critical
  4. Stretch tasks - Edge cases, advanced features

Think-Aloud Protocol

Concurrent Think-Aloud (CTA)

Participant verbalizes thoughts while performing tasks.

Moderator Prompts:

  • "Keep talking..."
  • "What are you looking at?"
  • "What's going through your mind?"
  • "What do you expect to happen?"

Advantages:

  • Real-time insight into thought process
  • Natural capture of confusion points
  • Rich qualitative data

Disadvantages:

  • May slow task completion
  • Unnatural for some participants
  • Can affect behavior

Retrospective Think-Aloud (RTA)

Participant reviews recording and explains thoughts after.

Moderator Approach:

  • Play back recording
  • Pause at key moments
  • Ask "What were you thinking here?"

Advantages:

  • Doesn't interfere with task
  • More accurate time measurements
  • Good for complex/fast tasks

Disadvantages:

  • Memory decay
  • Post-hoc rationalization
  • Longer sessions

Hybrid Approach

  • Silent task completion
  • Brief in-task probes at natural pauses
  • Post-task debrief questions

Moderator Script

Session Structure

# Usability Test Script: [Product/Feature]

## Pre-Session Setup (10 min before)
- [ ] Test recording equipment
- [ ] Verify prototype/product works
- [ ] Review participant background
- [ ] Prepare task cards/materials
- [ ] Set up note-taking template

## Introduction (5 min)

"Hello [Name], thank you for helping us today. I'm [Researcher] and
I'll be guiding our session. [Observer] is taking notes.

We're testing [product], not you. There are no wrong answers or
mistakes—everything you do helps us improve the design.

I'll ask you to complete some tasks and think out loud as you go.
Please share whatever comes to mind—your thoughts, reactions,
questions, even frustrations. There's no need to be polite about
problems you encounter.

We're recording the session to help with our notes. The recording
is confidential and only for our team.

Do you have any questions before we begin?"

## Warm-up Questions (3 min)
- Tell me briefly about your role and typical day.
- How familiar are you with [product category]?
- What tools do you currently use for [relevant activity]?

## Task Introduction
"I'm going to give you a series of tasks. I'll read each task aloud
and you'll also have it written down. Please read it back to me so
I know we're on the same page.

Remember to think out loud. If you get stuck or would normally give
up, just let me know—we can move on. Ready?"

## Tasks

### Task 1: [Name] (Warm-up)
[Read task aloud, hand written card]

**Observer Notes:**
- Start time: ___
- End time: ___
- Success: [ ] Complete [ ] Partial [ ] Fail
- Errors: ___
- Assists: ___
- Path taken: ___

**Post-task Questions:**
- How difficult was that task? (1-7 scale)
- What, if anything, was confusing?

### Task 2: [Name] (Primary)
[Continue pattern...]

### Task 3: [Name] (Primary)
...

## Post-Test Questions (5 min)
- What stood out to you about this experience?
- What was the most frustrating part?
- What was the most satisfying part?
- How does this compare to [competitor/current solution]?
- Would you recommend this to a colleague? Why/why not?

## SUS Questionnaire (3 min)
[Administer System Usability Scale]

## Wrap-up (2 min)
"Thank you so much for your time and feedback. Your insights will
directly influence how we improve [product].

Do you have any final questions for me?

[Explain incentive process]"

## Post-Session
- [ ] Save recording
- [ ] Complete observer notes
- [ ] Note immediate impressions
- [ ] Highlight key quotes/moments
- [ ] Debrief with observers

Usability Metrics

Task-Level Metrics

| Metric | Definition | Measurement |
|--------|------------|-------------|
| Task Success | Completed successfully | Binary (0/1) or graded (0/0.5/1) |
| Time on Task | Duration to complete | Seconds/minutes |
| Errors | Mistakes made | Count |
| Assists | Help requests | Count |
| Lostness | Navigation efficiency | L = sqrt((N/S - 1)² + (R/N - 1)²), where N = distinct pages visited, S = total pages visited, R = minimum path length |
| First Click | Correct initial action | Binary |
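
A minimal sketch of the lostness computation, assuming you log each participant's click path (this pairs with the ClickPath list captured in TaskResult below):

public static class Lostness
{
    // N = distinct pages visited, S = total pages visited,
    // R = length of the optimal (minimum) path.
    // 0 means perfectly efficient; Smith's work treats scores
    // above ~0.5 as clearly lost.
    public static double Calculate(IReadOnlyList<string> clickPath, int optimalPathLength)
    {
        int s = clickPath.Count;
        int n = clickPath.Distinct().Count();
        int r = optimalPathLength;

        return Math.Sqrt(Math.Pow((double)n / s - 1, 2) + Math.Pow((double)r / n - 1, 2));
    }
}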

Study-Level Metrics

| Metric | Formula | Benchmark |
|--------|---------|-----------|
| Task Success Rate | Successes / Attempts | 78% average (Sauro) |
| Average Time | Sum(times) / n | Task-dependent |
| Error Rate | Errors / Tasks | Lower is better |
| SUS Score | Standardized formula (see below) | 68 = average |
| SEQ (Single Ease Question) | 7-point post-task rating | 5.5 = average |
| SUPR-Q | Website UX benchmark | Percentile rank |
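
With 5-8 participants, a task success rate is a noisy point estimate, so it is worth reporting a confidence interval alongside it. A minimal sketch using the adjusted-Wald interval, which Sauro and Lewis recommend for small-sample binary data; the 95% z-value is a default assumption:

public static class SuccessRate
{
    // Adjusted-Wald interval: add z²/2 successes and z² attempts,
    // then apply the normal approximation to the adjusted proportion
    public static (double Low, double High) AdjustedWald(
        int successes, int attempts, double z = 1.96)
    {
        double adjN = attempts + z * z;
        double adjP = (successes + z * z / 2) / adjN;
        double margin = z * Math.Sqrt(adjP * (1 - adjP) / adjN);

        return (Math.Max(0, adjP - margin), Math.Min(1, adjP + margin));
    }
}

// Example: 4 of 5 successes -> 80% point estimate,
// but a 95% interval of roughly 36%-98%.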

System Usability Scale (SUS)

// SUS score calculation: ten alternating-tone items, each answered on a
// 1-5 scale (Strongly Disagree to Strongly Agree), normalized to 0-100
public class SusCalculator
{
    // Item wording shown for reference; Calculate expects responses in this order
    private static readonly string[] Questions =
    [
        "I think that I would like to use this system frequently.",
        "I found the system unnecessarily complex.",
        "I thought the system was easy to use.",
        "I think that I would need the support of a technical person to be able to use this system.",
        "I found the various functions in this system were well integrated.",
        "I thought there was too much inconsistency in this system.",
        "I would imagine that most people would learn to use this system very quickly.",
        "I found the system very cumbersome to use.",
        "I felt very confident using the system.",
        "I needed to learn a lot of things before I could get going with this system."
    ];

    public decimal Calculate(int[] responses)
    {
        if (responses.Length != 10)
            throw new ArgumentException("SUS requires exactly 10 responses");

        if (responses.Any(r => r is < 1 or > 5))
            throw new ArgumentException("SUS responses must be on a 1-5 scale");

        // Responses are 1-5 (Strongly Disagree to Strongly Agree)
        decimal score = 0;

        for (int i = 0; i < 10; i++)
        {
            // Odd-numbered questions (1,3,5,7,9): score = response - 1
            // Even-numbered questions (2,4,6,8,10): score = 5 - response
            score += i % 2 == 0
                ? responses[i] - 1
                : 5 - responses[i];
        }

        // Multiply the 0-40 raw total by 2.5 to get the 0-100 scale
        return score * 2.5m;
    }

    public SusInterpretation Interpret(decimal score) => score switch
    {
        >= 85 => SusInterpretation.Excellent,    // Top 10%
        >= 72 => SusInterpretation.Good,         // Top 30%
        >= 68 => SusInterpretation.Average,      // Median
        >= 51 => SusInterpretation.BelowAverage,
        _ => SusInterpretation.Poor
    };
}

public enum SusInterpretation
{
    Excellent,
    Good,
    Average,
    BelowAverage,
    Poor
}
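
Usage, with a hypothetical response set:

var calculator = new SusCalculator();
decimal score = calculator.Calculate([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]); // 85.0
var grade = calculator.Interpret(score);                              // Excellent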

Task Success Grading

public enum TaskSuccessLevel
{
    Complete = 100,      // Completed without assistance
    PartialMinor = 75,   // Completed with minor struggle
    PartialMajor = 50,   // Completed with significant difficulty
    Assisted = 25,       // Required moderator hint
    Failure = 0          // Could not complete
}

public class TaskResult
{
    public required Guid TaskId { get; init; }
    public required Guid ParticipantId { get; init; }
    public required TaskSuccessLevel Success { get; init; }
    public required TimeSpan Duration { get; init; }
    public required int ErrorCount { get; init; }
    public required int AssistCount { get; init; }
    public required int SingleEaseQuestion { get; init; } // 1-7 scale
    public string? Notes { get; init; }
    public List<string> ClickPath { get; init; } = [];
}

Remote Unmoderated Testing

Platform Setup Considerations

// Configuration for an unmoderated test. Supporting types
// (ScreenerQuestion, PostTestQuestion, PostTaskQuestion, TestSettings)
// are elided for brevity.
public class UnmoderatedTestConfig
{
    public required string TestName { get; init; }
    public required string WelcomeMessage { get; init; }
    public required List<ScreenerQuestion> Screener { get; init; }
    public required List<UnmoderatedTask> Tasks { get; init; }
    public required List<PostTestQuestion> PostQuestions { get; init; }

    public TestSettings Settings { get; init; } = new()
    {
        RecordScreen = true,
        RecordAudio = true,
        RecordWebcam = false,
        RequireThinkAloud = true,
        MaxTestDuration = TimeSpan.FromMinutes(30),
        AllowTaskSkip = true
    };
}

public class UnmoderatedTask
{
    public required int Order { get; init; }
    public required string Scenario { get; init; }
    public required string TaskInstructions { get; init; }
    public required string PrototypeUrl { get; init; }
    public TimeSpan? TimeLimit { get; init; }
    public bool RequireRecording { get; init; } = true;
    public List<PostTaskQuestion> FollowUp { get; init; } = [];
}

Unmoderated Task Writing Tips

  • More specific than moderated (no moderator to clarify)
  • Include context in the scenario itself
  • Define clear endpoints (how do they know they're done?)
  • Provide an escape hatch ("If you can't complete this, click 'I'm stuck'") - the sketch below shows these tips applied
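
A sketch of these tips applied to the UnmoderatedTask model above. The scenario text and URL are illustrative, and the PostTaskQuestion constructor is assumed to take the question text:

var task = new UnmoderatedTask
{
    Order = 1,
    // Context lives in the scenario itself: no moderator can clarify
    Scenario = "You've been getting too many email notifications from this app "
             + "and want to keep only direct-message alerts.",
    // Clear endpoint so participants know when they're done,
    // plus the escape hatch for those who get stuck
    TaskInstructions = "Change your notification settings so you only receive "
                     + "direct-message emails. When you see the confirmation "
                     + "message, you're finished. If you can't complete this, "
                     + "click 'I'm stuck'.",
    PrototypeUrl = "https://example.com/prototype", // placeholder
    TimeLimit = TimeSpan.FromMinutes(5),
    FollowUp = [new PostTaskQuestion("How confident are you that your changes were saved?")]
};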

Analysis Framework

Session Analysis Template

# Session Analysis: P[N]

**Participant:** [ID/Code]
**Date:** [Date]
**Duration:** [Time]

## Task Performance

| Task | Success | Time | Errors | Assists | SEQ |
|------|---------|------|--------|---------|-----|
| T1 | ✓ | 1:23 | 0 | 0 | 6 |
| T2 | ~ | 3:45 | 2 | 1 | 4 |
| T3 | ✗ | 5:00+ | 3 | - | 2 |

## Key Observations

### Positive
- [What worked well]

### Issues Found
1. **[Issue Name]** - Severity: [Critical/Major/Minor]
   - Location: [Where in interface]
   - Behavior: [What user did]
   - Quote: "[Participant verbalization]"
   - Impact: [Effect on task]

### Notable Quotes
- "[Quote]" - Re: [Topic]

## Recommendations
- [Immediate action]
- [Design consideration]

Severity Rating Scale

| Severity | Definition | Action |
|----------|------------|--------|
| Critical (4) | Prevents task completion | Must fix before launch |
| Major (3) | Causes significant difficulty | Should fix before launch |
| Minor (2) | Causes slight hesitation | Fix if possible |
| Cosmetic (1) | Noted but didn't affect task | Consider for future |

Rainbow Spreadsheet

Track issues across participants:

| Issue | P1 | P2 | P3 | P4 | P5 | Count | Severity |
|-------|----|----|----|----|----|-------|----------|
| Can't find settings | X | X |   | X |   | 3/5 | Major |
| Confusing label | X | X | X | X |   | 4/5 | Major |
| Slow load time |   | X |   |   | X | 2/5 | Minor |
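
If issues are tracked in the UsabilityIssue model below, the rainbow matrix can be generated rather than maintained by hand. A minimal sketch, assuming a fixed participant ordering:

public static class RainbowSpreadsheet
{
    // Emits one markdown row per issue: an X per affected participant,
    // then a count cell such as "3/5" and the severity
    public static string ToMarkdownRow(UsabilityIssue issue, IReadOnlyList<Guid> participants)
    {
        var cells = participants.Select(p =>
            issue.AffectedParticipants.Contains(p) ? "X" : " ");

        return $"| {issue.Title} | {string.Join(" | ", cells)} "
             + $"| {issue.AffectedParticipants.Count}/{participants.Count} "
             + $"| {issue.Severity} |";
    }
}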

.NET Test Management Model

// Supporting types (UsabilityTask, TestSession, SessionStatus, TaskMetrics,
// UsabilityTestType, IssueSeverity, IssueStatus) are elided for brevity.
public class UsabilityTest
{
    public Guid Id { get; init; }
    public required string Name { get; init; }
    public required UsabilityTestType Type { get; init; }
    public required string ProductVersion { get; init; }
    public required List<UsabilityTask> Tasks { get; init; }
    public required int TargetParticipants { get; init; }

    public List<TestSession> Sessions { get; } = [];
    public List<UsabilityIssue> Issues { get; } = [];

    public UsabilityTestMetrics CalculateMetrics()
    {
        // Materialize once; the collection is enumerated several times below
        var completedSessions = Sessions
            .Where(s => s.Status == SessionStatus.Completed)
            .ToList();

        return new UsabilityTestMetrics
        {
            TotalParticipants = completedSessions.Count,
            OverallSuccessRate = CalculateOverallSuccess(completedSessions),
            AverageSusScore = CalculateAverageSus(completedSessions),
            TaskMetrics = Tasks.Select(t => CalculateTaskMetrics(t, completedSessions)).ToList(),
            IssuesBySeverity = Issues.GroupBy(i => i.Severity)
                .ToDictionary(g => g.Key, g => g.Count())
        };
    }
}

public class UsabilityIssue
{
    public Guid Id { get; init; }
    public required string Title { get; init; }
    public required string Description { get; init; }
    public required IssueSeverity Severity { get; init; }
    public required string Location { get; init; }
    public required List<Guid> AffectedParticipants { get; init; }
    public string? Recommendation { get; init; }
    public IssueStatus Status { get; set; } = IssueStatus.Open;
}

public record UsabilityTestMetrics
{
    public required int TotalParticipants { get; init; }
    public required decimal OverallSuccessRate { get; init; }
    public required decimal AverageSusScore { get; init; }
    public required List<TaskMetrics> TaskMetrics { get; init; }
    public required Dictionary<IssueSeverity, int> IssuesBySeverity { get; init; }
}
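
CalculateOverallSuccess and CalculateAverageSus are elided from UsabilityTest above. One plausible sketch, assuming TestSession exposes TaskResults and a 10-item SusResponses array (both property names are assumptions, not part of the model shown):

public static class UsabilityMetricsHelpers
{
    public static decimal CalculateOverallSuccess(IEnumerable<TestSession> sessions)
    {
        // Graded success: average the TaskSuccessLevel percentages (0-100)
        var results = sessions.SelectMany(s => s.TaskResults).ToList();
        return results.Count == 0 ? 0 : results.Average(r => (decimal)(int)r.Success);
    }

    public static decimal CalculateAverageSus(IEnumerable<TestSession> sessions)
    {
        var calculator = new SusCalculator();
        var scored = sessions.Where(s => s.SusResponses is { Length: 10 }).ToList();
        return scored.Count == 0 ? 0 : scored.Average(s => calculator.Calculate(s.SusResponses));
    }
}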

Checklist: Usability Test Planning

Design Phase

  • Test objectives defined
  • Tasks written and reviewed
  • Success criteria established
  • Metrics selected
  • Moderator script drafted

Preparation

  • Prototype/product ready
  • Recording tools tested
  • Participant schedule set
  • Pilot test completed
  • Script refined from pilot

Execution

  • Consistent protocol across sessions
  • Notes captured during sessions
  • Issues logged as encountered
  • Daily debriefs conducted

Analysis

  • All sessions reviewed
  • Metrics calculated
  • Issues compiled and rated
  • Findings synthesized
  • Recommendations prioritized

Related Skills

  • user-research-planning - Overall research planning
  • heuristic-evaluation - Expert review methods
  • accessibility-planning - Inclusive testing practices
  • prototype-strategy - Prototype fidelity for testing

Last Updated: 2025-12-27