ansible-error-handling

This skill should be used when implementing error handling in Ansible, using block/rescue/always patterns, creating retry logic with until/retries, handling expected failures gracefully, or providing clear error messages with assert and fail.

Installation

Run this command in your terminal to install the skill:

git clone https://github.com/basher83/lunar-claude /tmp/lunar-claude && cp -r /tmp/lunar-claude/plugins/infrastructure/ansible-workflows/skills/ansible-error-handling ~/.claude/skills/lunar-claude


Ansible Error Handling

Patterns for robust error handling in Ansible playbooks and roles.

Block/Rescue/Always Pattern

Handle errors and perform cleanup:

- name: Deploy application
  block:
    - name: Stop application
      ansible.builtin.systemd:
        name: myapp
        state: stopped

    - name: Deploy new version
      ansible.builtin.copy:
        src: myapp-v2.0
        dest: /usr/bin/myapp

    - name: Start application
      ansible.builtin.systemd:
        name: myapp
        state: started

  rescue:
    - name: Rollback to previous version
      ansible.builtin.copy:
        src: myapp-backup
        dest: /usr/bin/myapp

    - name: Start application (rollback)
      ansible.builtin.systemd:
        name: myapp
        state: started

    - name: Report failure
      ansible.builtin.fail:
        msg: "Deployment failed, rolled back to previous version"

  always:
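    # ansible.builtin.file does not expand globs, so list the matches first
    - name: Find temp files
      ansible.builtin.find:
        paths: /tmp
        patterns: "deploy-*"
        file_type: any
      register: deploy_temp_files

    - name: Cleanup temp files
      ansible.builtin.file:
        path: "{{ item.path }}"
        state: absent
      loop: "{{ deploy_temp_files.files }}"
      loop_control:
        label: "{{ item.path }}"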

Execution Flow

  • block: Main tasks execute sequentially
  • rescue: Runs if any task in block fails; if the rescue tasks succeed, the host is no longer treated as failed
  • always: Runs regardless of success or failure
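
To illustrate the flow above, here is a minimal sketch of a rescue section that reports which task failed. It relies on the ansible_failed_task and ansible_failed_result variables that Ansible sets inside rescue; the command path /usr/local/bin/might-fail is a hypothetical placeholder, not something from the original playbooks.

```yaml
- name: Run a step and report what failed
  block:
    - name: Step that may fail
      # Hypothetical command, used only to trigger the rescue path
      ansible.builtin.command: /usr/local/bin/might-fail
      changed_when: false

  rescue:
    - name: Show which task failed and why
      ansible.builtin.debug:
        msg: >-
          Task '{{ ansible_failed_task.name }}' failed:
          {{ ansible_failed_result.msg | default('no message') }}

  always:
    - name: Runs on both success and failure
      ansible.builtin.debug:
        msg: "Block finished"
```

Because the rescue section here ends without a fail task, the play continues for the host after the block. Add an ansible.builtin.fail task at the end of rescue if the failure should still stop the run.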

Retry with Until

Handle transient failures with retries:

- name: Wait for service to be ready
  ansible.builtin.uri:
    url: http://localhost:8080/health
    status_code: 200
  register: health_check
  until: health_check.status == 200
  retries: 30
  delay: 10
  # Up to 30 retries, 10s apart: about 5 minutes of waiting, plus request time

With Command Module

- name: Wait for cluster to stabilize
  ansible.builtin.command: pvecm status
  register: cluster_status
  until: "'Quorate: Yes' in cluster_status.stdout"
  retries: 12
  delay: 5
  changed_when: false

Retry Parameters

Parameter   Description
until       Condition that must be true to stop retrying
retries     Maximum number of retries after the first attempt (default: 3)
delay       Seconds to wait between attempts (default: 5)
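
One way to keep retries and delay in sync is to derive the retry count from a timeout. The following is a sketch, assuming two hypothetical task variables, wait_timeout and poll_interval (both in seconds), which are not part of the original examples.

```yaml
- name: Wait for service with a derived retry budget
  vars:
    wait_timeout: 300   # hypothetical: total seconds to keep polling
    poll_interval: 10   # hypothetical: seconds between polls
  ansible.builtin.uri:
    url: http://localhost:8080/health
    status_code: 200
  register: health_check
  until: health_check.status == 200
  # 300 / 10 gives 30 retries, matching the hand-written values above
  retries: "{{ (wait_timeout / poll_interval) | int }}"
  delay: "{{ poll_interval }}"
```

Changing wait_timeout alone then adjusts the retry count, so the two values cannot drift apart.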

Assert for Validation

Validate inputs with clear error messages:

- name: Validate required variables
  ansible.builtin.assert:
    that:
      - vm_name is defined
      - vm_name | length > 0
      - vm_memory >= 1024
      - vm_cores >= 1
    fail_msg: |
      Invalid VM configuration:
      - vm_name: {{ vm_name | default('NOT SET') }}
      - vm_memory: {{ vm_memory | default('NOT SET') }} (min: 1024)
      - vm_cores: {{ vm_cores | default('NOT SET') }} (min: 1)
    success_msg: "VM configuration validated"
    quiet: true

Common Assertions

# Variable defined and non-empty
- vm_name is defined and vm_name | trim | length > 0

# Numeric range
- vm_memory >= 1024 and vm_memory <= 65536

# Regex match
- vm_name is match('^[a-z0-9-]+$')

# List has items
- vm_networks | length > 0

# Value in allowed list
- vm_ostype in ['l26', 'win10', 'win11']
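
Values passed on the command line as -e key=value arrive as strings, so numeric comparisons like the ones above can raise a type error instead of producing a clean assertion failure. A minimal sketch that guards against this, reusing the vm_memory and vm_cores variables from the earlier example:

```yaml
- name: Validate numeric inputs that may arrive as strings
  ansible.builtin.assert:
    that:
      # Check "is defined" first so a missing variable fails cleanly
      - vm_memory is defined
      - vm_memory | int >= 1024
      - vm_cores is defined
      - vm_cores | int >= 1
    fail_msg: "vm_memory must be >= 1024 and vm_cores must be >= 1"
```

The int filter turns non-numeric strings into 0, so a value such as "abc" fails the minimum check rather than aborting the task with a template error.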

Fail with Context

Provide actionable error messages:

- name: Check prerequisites
  ansible.builtin.command: which docker
  register: docker_check
  changed_when: false
  failed_when: false

- name: Fail if Docker not installed
  ansible.builtin.fail:
    msg: |
      Docker is not installed on {{ inventory_hostname }}.

      To install Docker:
        sudo apt update
        sudo apt install docker.io

      Or use the docker role:
        ansible-playbook playbooks/install-docker.yml
  when: docker_check.rc != 0

Graceful Failure Handling

Allow expected "failures":

- name: Try to stop service
  ansible.builtin.systemd:
    name: myservice
    state: stopped
  register: stop_result
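  failed_when:
    - stop_result.failed
    - "'Could not find the requested service' not in stop_result.msg | default('')"
  # Only fail if the error is NOT "service not found".
  # The matched text is the message ansible.builtin.systemd reports for a
  # missing unit; verify it against your Ansible version.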

Multiple Acceptable Conditions

- name: Join cluster
  ansible.builtin.command: pvecm add {{ primary_node }}
  register: cluster_join
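  # List items are combined with AND: the task fails only when rc is
  # non-zero and neither known message appears in stderr.
  failed_when:
    - cluster_join.rc != 0
    - "'already in a cluster' not in cluster_join.stderr"
    - "'cannot join' not in cluster_join.stderr"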
  changed_when: cluster_join.rc == 0
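
A failed_when list combines its items with AND. When any one of several conditions should fail the task, write a single expression with or instead. A minimal sketch, assuming a hypothetical migrate-db command that prints ERROR and applied markers to stdout:

```yaml
- name: Run migration
  # Hypothetical command and output markers, for illustration only
  ansible.builtin.command: /usr/local/bin/migrate-db
  register: migrate
  # Fail on a non-zero exit code OR when the output reports an error
  failed_when: migrate.rc != 0 or 'ERROR' in migrate.stdout
  changed_when: "'applied' in migrate.stdout"
```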

Check Before Fail

Separate checking from failing for better control:

- name: Check if resource exists
  ansible.builtin.command: check-resource {{ resource_id }}
  register: resource_check
  changed_when: false
  failed_when: false  # Don't fail here

- name: Fail with context if missing
  ansible.builtin.fail:
    msg: |
      Resource {{ resource_id }} not found.
      Command output: {{ resource_check.stderr }}
      Hint: Ensure resource was created first.
  when: resource_check.rc != 0
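
The same check-then-fail split works with modules that never fail on a missing resource, which removes the need for failed_when: false. A sketch using ansible.builtin.stat, assuming a hypothetical config path /etc/myapp/config.yml:

```yaml
- name: Check for required config file
  ansible.builtin.stat:
    path: /etc/myapp/config.yml   # hypothetical path
  register: config_file

- name: Fail with context if config is missing
  ansible.builtin.fail:
    msg: |
      Required file /etc/myapp/config.yml not found on {{ inventory_hostname }}.
      Hint: Deploy the configuration before running this play.
  when: not config_file.stat.exists
```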

Error Recovery Pattern

Attempt operation, handle specific errors:

- name: Attempt primary approach
  block:
    - name: Connect via primary endpoint
      ansible.builtin.uri:
        url: "https://{{ primary_host }}:8006/api2/json"
        validate_certs: true
      register: primary_result

  rescue:
    - name: Log primary failure
      ansible.builtin.debug:
        msg: "Primary endpoint failed: {{ primary_result.msg | default('unknown error') }}"

    - name: Try fallback endpoint
      ansible.builtin.uri:
        url: "https://{{ fallback_host }}:8006/api2/json"
        validate_certs: false
      register: fallback_result
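
After the block, later tasks usually need whichever response succeeded. One possible follow-up task is sketched below; api_response is a hypothetical fact name chosen for illustration.

```yaml
- name: Use whichever endpoint answered
  ansible.builtin.set_fact:
    # fallback_result is only registered when rescue ran, so it is
    # referenced only on the branch where the primary request failed
    api_response: "{{ primary_result if primary_result is succeeded else fallback_result }}"
```

This task is only reached when one of the two requests succeeded, because a failure in the rescue section fails the host.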

Delegate Error Handling

Run checks from the controller for better error context:

- name: Verify API endpoint from controller
  ansible.builtin.uri:
    url: "https://{{ inventory_hostname }}:8006/api2/json/version"
    validate_certs: false
  delegate_to: localhost
  register: api_check
  failed_when: false

- name: Report API status
  ansible.builtin.fail:
    msg: |
      Cannot reach Proxmox API on {{ inventory_hostname }}
      Status: {{ api_check.status | default('connection failed') }}
      Check: Network connectivity, firewall rules, pveproxy service
  when: api_check.status | default(0) != 200

Ignore Errors (Use Sparingly)

- name: Remove optional backup
  ansible.builtin.file:
    path: /backup/old-backup.tar.gz
    state: absent
  ignore_errors: true
  register: cleanup_result

- name: Report cleanup status
  ansible.builtin.debug:
    msg: "Cleanup {{ 'successful' if not cleanup_result.failed else 'skipped' }}"

When ignore_errors is Acceptable

  • Non-critical cleanup tasks
  • Optional operations that shouldn't block the playbook
  • When the result is immediately checked anyway

Prefer failed_when

# BETTER than ignore_errors
- name: Remove backup
  ansible.builtin.file:
    path: /backup/old-backup.tar.gz
    state: absent
  register: cleanup_result
  failed_when:
    - cleanup_result.failed
    - "'does not exist' not in cleanup_result.msg | default('')"

Complete Example

---
- name: Deploy with comprehensive error handling
  hosts: app_servers
  become: true

  tasks:
    - name: Validate configuration
      ansible.builtin.assert:
        that:
          - app_version is defined
          - app_version is match('^\d+\.\d+\.\d+$')
        fail_msg: "Invalid app_version: {{ app_version | default('NOT SET') }}"

    - name: Deploy application
      block:
        - name: Download release
          ansible.builtin.get_url:
            url: "https://releases.example.com/{{ app_version }}.tar.gz"
            dest: /tmp/app.tar.gz
          register: download
          until: download is succeeded
          retries: 3
          delay: 5

        - name: Stop current version
          ansible.builtin.systemd:
            name: myapp
            state: stopped

        # The rescue section restores from /opt/myapp-backup/, so create it first
        - name: Back up current version for rollback
          ansible.builtin.copy:
            src: /opt/myapp/
            dest: /opt/myapp-backup/
            remote_src: true

        - name: Extract release
          ansible.builtin.unarchive:
            src: /tmp/app.tar.gz
            dest: /opt/myapp
            remote_src: true

        - name: Start new version
          ansible.builtin.systemd:
            name: myapp
            state: started

        - name: Verify health
          ansible.builtin.uri:
            url: http://localhost:8080/health
          register: health
          until: health.status == 200
          retries: 6
          delay: 10

      rescue:
        - name: Restore previous version
          ansible.builtin.copy:
            src: /opt/myapp-backup/
            dest: /opt/myapp/
            remote_src: true

        - name: Start previous version
          ansible.builtin.systemd:
            name: myapp
            state: started

        - name: Report deployment failure
          ansible.builtin.fail:
            msg: |
              Deployment of {{ app_version }} failed.
              Previous version restored.
              Check logs: journalctl -u myapp

      always:
        - name: Cleanup download
          ansible.builtin.file:
            path: /tmp/app.tar.gz
            state: absent

Additional Resources

For detailed error handling patterns and techniques, consult:

  • references/error-handling.md - Comprehensive error handling patterns, block/rescue/always examples, retry strategies

Related Skills

  • ansible-idempotency - changed_when/failed_when patterns
  • ansible-fundamentals - Core Ansible concepts

Repository

basher83/lunar-claude/plugins/infrastructure/ansible-workflows/skills/ansible-error-handling
Author: basher83
Stars: 10
Forks: 1
Updated 2d ago
Added 5d ago