Disaster Recovery 101: Architecting Auto-Failovers for DynamoDB and WooCommerce Deployments on Linode

Establishing Cross-Region Replication for DynamoDB

For critical applications like WooCommerce, data must stay durable and available across geographically dispersed regions. DynamoDB’s Global Tables feature provides a multi-active, multi-region database for this. This isn’t just about read replicas; it’s active-active replication, where a write accepted in one region is propagated asynchronously to every other replica, usually within about a second. Because replication is asynchronous, concurrent writes to the same item in different regions are reconciled with a last-writer-wins rule.

The primary mechanism for achieving this is enabling Global Tables on your existing DynamoDB table by adding replica regions to it. With the current Global Tables version (2019.11.21), DynamoDB creates and populates the replica tables itself. The legacy version (2017.11.29) instead required you to create empty tables in each region and then link them into a Global Table.

Enabling Global Tables via AWS CLI

While the AWS Management Console offers a visual way to set this up, programmatic configuration via the AWS CLI is essential for automation and Infrastructure as Code (IaC) practices. We’ll assume you have a DynamoDB table named woocommerce_products in your primary region (e.g., us-east-1).

With the current version of Global Tables (2019.11.21), you do not create the replica table yourself. DynamoDB creates it in the secondary region (e.g., eu-west-1) with the same key schema and indexes when you add that region as a replica, and a pre-existing table with the same name in that region will cause the request to fail. What the source table does need is DynamoDB Streams enabled with the NEW_AND_OLD_IMAGES view type. Read capacity can be tuned per region, but write capacity should match across replicas, because every replica has to absorb every replicated write; on-demand mode or auto scaling on write capacity is the usual way to keep it in step.

Enable the stream on the source table first (skip this step if the stream is already enabled, since the call is rejected in that case):

aws dynamodb update-table \
    --table-name woocommerce_products \
    --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES \
    --region us-east-1

Once the stream is enabled and the table is ACTIVE again, add the replica region. This command asks DynamoDB to create the replica in eu-west-1 and join it to the Global Table originating from us-east-1.

aws dynamodb update-table \
    --table-name woocommerce_products \
    --replica-updates '[{"Create": {"RegionName": "eu-west-1"}}]' \
    --region us-east-1

Repeat this process for any additional regions you wish to include in your Global Table. DynamoDB will then automatically handle the replication of data changes between all associated replica tables.
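Replica creation is not instantaneous, so automation usually has to wait before it relies on the new region. The following is a minimal sketch of such a wait, assuming the AWS CLI is installed and configured; wait_for_replica is a hypothetical helper name, and the attempt count and pause are arbitrary defaults. It polls describe-table on the source table and reads the status that DynamoDB reports for the replica region.

```shell
#!/bin/sh
# Sketch: poll until a Global Table replica reports ACTIVE.
# wait_for_replica is a hypothetical helper, not part of the AWS CLI.

wait_for_replica() {
    local table="$1" source_region="$2" replica_region="$3"
    local max_attempts="${4:-30}" pause="${5:-10}"
    local attempt=1 status

    while [ "$attempt" -le "$max_attempts" ]; do
        # describe-table lists the table's replicas under Table.Replicas.
        status=$(aws dynamodb describe-table \
            --table-name "$table" \
            --region "$source_region" \
            --query "Table.Replicas[?RegionName=='${replica_region}'].ReplicaStatus | [0]" \
            --output text)
        if [ "$status" = "ACTIVE" ]; then
            echo "Replica in $replica_region is ACTIVE"
            return 0
        fi
        echo "Attempt $attempt/$max_attempts: replica status is ${status:-unknown}" >&2
        attempt=$((attempt + 1))
        sleep "$pause"
    done
    return 1   # replica never became ACTIVE within the allowed attempts
}

# Example:
# wait_for_replica woocommerce_products us-east-1 eu-west-1
```

A non-zero return lets a provisioning pipeline stop before it points application servers at a replica that is still being created.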

Architecting Auto-Failover for WooCommerce Application Servers

For the application layer, particularly a WooCommerce deployment running on Linode, an automated failover strategy typically involves a load balancer in front of multiple instances, within one region and ideally duplicated in a second region. Linode’s NodeBalancers are a key component here. Note that a NodeBalancer only balances across backends in its own data center, so each region needs its own NodeBalancer.

Leveraging Linode NodeBalancers and Health Checks

A robust auto-failover setup requires health checks to be configured on the NodeBalancer. These checks periodically probe your application servers to determine their availability and responsiveness. If a server fails its health check, the NodeBalancer will automatically stop sending traffic to it.

Consider a scenario with two Linode instances (app-01 and app-02) in the same region as a Linode NodeBalancer, which reaches its backends over their private IPv4 addresses. The NodeBalancer listens on ports 80 and 443 and forwards traffic to the instances on their application port (e.g., 8080).

A typical health check configuration for a WooCommerce application might involve checking a specific endpoint that returns a 200 OK status code if the application is healthy. This endpoint should ideally perform a minimal check, such as verifying database connectivity or a core application function.
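On a NodeBalancer, the health check is part of the per-port configuration rather than of each backend. As a rough sketch, the request body for such a configuration could be assembled as shown here. healthcheck_config_json is a hypothetical helper, the interval, timeout, and attempt values are illustrative assumptions, and NODEBALANCER_ID and LINODE_API_TOKEN in the commented usage are placeholders.

```shell
#!/bin/sh
# Sketch: build the JSON body for a NodeBalancer config with an HTTP health check.
# healthcheck_config_json is a hypothetical helper; the defaults are illustrative.

healthcheck_config_json() {
    local port="$1" check_path="$2"
    local interval="${3:-15}" timeout="${4:-5}" attempts="${5:-3}"
    # check_timeout should stay below check_interval so probes do not overlap.
    printf '{"port": %s, "protocol": "http", "check": "http", "check_path": "%s", "check_interval": %s, "check_timeout": %s, "check_attempts": %s}\n' \
        "$port" "$check_path" "$interval" "$timeout" "$attempts"
}

# Example usage (needs a real token and NodeBalancer ID):
# curl -sf -X POST "https://api.linode.com/v4/nodebalancers/$NODEBALANCER_ID/configs" \
#     -H "Authorization: Bearer $LINODE_API_TOKEN" \
#     -H "Content-Type: application/json" \
#     -d "$(healthcheck_config_json 80 /healthcheck.php)"
```

With these values, a backend is probed every 15 seconds and taken out of rotation after three consecutive failed probes.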

Example Health Check Endpoint (PHP)

<?php
// healthcheck.php

// Basic check: Ensure database connection can be established
$db_host = getenv('DB_HOST');
$db_name = getenv('DB_NAME');
$db_user = getenv('DB_USER');
$db_pass = getenv('DB_PASS');

try {
    $pdo = new PDO("mysql:host=$db_host;dbname=$db_name", $db_user, $db_pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_TIMEOUT => 2 // 2-second timeout
    ]);
    // Optional: Perform a simple query
    $stmt = $pdo->query("SELECT 1");
    if ($stmt === false) {
        throw new Exception("Database query failed.");
    }
    http_response_code(200);
    echo "OK";
} catch (PDOException $e) {
    http_response_code(503); // Service Unavailable
    error_log("Database health check failed: " . $e->getMessage());
    echo "Database Error";
} catch (Exception $e) {
    http_response_code(503);
    error_log("General health check failed: " . $e->getMessage());
    echo "Application Error";
}
?>

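This script should be placed in a web-accessible but non-critical location within your WooCommerce installation (e.g., /var/www/html/your-site/healthcheck.php). The NodeBalancer would then be configured with the check path /healthcheck.php and would probe each backend at its configured address, e.g., http://your-server-private-ip:8080/healthcheck.php. Because the endpoint is reachable by anyone who can reach the server, it deliberately returns only generic messages and writes the details to the error log.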

Automated Instance Provisioning and Configuration

For true automation, especially in a multi-region failover scenario, you’ll need a mechanism to provision new instances in a secondary region if the primary region becomes unavailable. This can be achieved using tools like Terraform or Ansible, orchestrated by a monitoring system or a CI/CD pipeline.

When a catastrophic failure is detected (e.g., entire Linode region outage), a pre-defined script or playbook can be triggered. This script would:

1. Initiate the creation of new Linode instances in a healthy region.
2. Deploy the WooCommerce application code (e.g., from a Git repository).
3. Configure the application to use the regional DynamoDB endpoint of the replica nearest the new instances.
4. Register the new instances with the Linode NodeBalancer in the target region.

Example Ansible Playbook Snippet for Instance Provisioning

---
- name: Provision WooCommerce instance in secondary region
  hosts: localhost
  connection: local
  gather_facts: no
  vars:
    linode_token: "{{ lookup('env', 'LINODE_ACCESS_TOKEN') }}" # Linode API v4 token
    region: "eu-west" # Linode region ID (London); AWS-style names such as eu-west-1 are not valid here
    instance_type: "g6-nanode-1" # Example instance type
    image: "linode/ubuntu22.04"
    ssh_public_key: "your_ssh_public_key" # Replace with the public key text (linode_v4 expects keys, not key IDs)
    nodebalancer_id: "your_nodebalancer_id" # Replace with your NodeBalancer ID
    nodebalancer_config_id: "your_nodebalancer_config_id" # Port config that carries the health check settings
    app_domain: "your-woocommerce-domain.com"

  tasks:
    - name: Create Linode instance
      community.general.linode_v4:
        access_token: "{{ linode_token }}"
        state: present
        region: "{{ region }}"
        type: "{{ instance_type }}"
        image: "{{ image }}"
        label: "woocommerce-app-{{ lookup('pipe', 'date +%Y%m%d%H%M%S') }}"
        private_ip: true # NodeBalancers reach backends over private IPv4
        authorized_keys:
          - "{{ ssh_public_key }}"
      register: new_instance

    # Assumes the returned ipv4 list holds one public and one private (192.168.x.x) address
    - name: Separate public and private addresses
      set_fact:
        public_ip: "{{ new_instance.instance.ipv4 | reject('match', '^192[.]168[.]') | first }}"
        backend_private_ip: "{{ new_instance.instance.ipv4 | select('match', '^192[.]168[.]') | first }}"

    - name: Wait for instance to be ready
      wait_for:
        host: "{{ public_ip }}"
        port: 22
        state: started
        delay: 30
        timeout: 300

    - name: Add new instance to inventory
      add_host:
        hostname: "{{ public_ip }}"
        groupname: "new_woocommerce_servers"
        ansible_user: "root" # Or your deployment user

    - name: Deploy WooCommerce application
      include_role:
        name: deploy_woocommerce
      vars:
        target_host: "{{ public_ip }}"
        db_host: "dynamodb.eu-west-1.amazonaws.com" # Regional DynamoDB endpoint of the nearest replica
        db_name: "woocommerce_products" # Or your DynamoDB table name
        # Other necessary variables for deployment

    # Health check settings (check, check_path, interval, timeout) live on the
    # NodeBalancer config, so only the backend itself is registered here.
    - name: Add new instance to NodeBalancer backend
      uri:
        url: "https://api.linode.com/v4/nodebalancers/{{ nodebalancer_id }}/configs/{{ nodebalancer_config_id }}/nodes"
        method: POST
        headers:
          Authorization: "Bearer {{ linode_token }}"
        body_format: json
        body:
          label: "{{ new_instance.instance.label }}"
          address: "{{ backend_private_ip }}:8080" # Private IP plus application port
          weight: 100
          mode: "accept"
        status_code: 200

This playbook snippet demonstrates creating a Linode instance, waiting for SSH access, adding it to a temporary Ansible group, deploying the application (assuming a role named deploy_woocommerce exists), and finally registering it with a NodeBalancer. Note that DynamoDB Global Tables expose no single global endpoint: db_host should be the regional endpoint of the replica nearest the new instance (for example, dynamodb.eu-west-1.amazonaws.com), and the application reads from and writes to that replica.

Implementing DNS-Based Failover with Linode DNS Manager

While NodeBalancers handle load balancing and health checks within a region, a full disaster recovery strategy often requires a mechanism to redirect traffic at the DNS level if an entire region becomes unavailable. Linode DNS Manager can be configured to support this.

DNS Records and API-Driven Failover

Create an A record for your primary domain (e.g., shop.yourcompany.com) pointing to the IP address of the NodeBalancer in the active region. Linode DNS Manager does not offer geo-targeted or health-checked records. If you publish several A records under the same name, resolvers simply receive all of them, so a failed region keeps getting a share of the traffic. Failover therefore has to be driven from outside, by rewriting the record.

For automated failover, you can leverage Linode’s API to update DNS records programmatically. A monitoring service (external to your Linode infrastructure) can periodically check the health of your primary region. If the primary region is unresponsive, the monitoring service can trigger an API call to update the DNS records to point to the secondary region’s NodeBalancer IP.

Example DNS Update Script (Bash with Linode API v4)

#!/bin/bash

# Configuration
LINODE_API_TOKEN="YOUR_LINODE_API_TOKEN" # Replace with your actual token (prefer reading it from the environment)
DOMAIN_ID="YOUR_DOMAIN_ID"             # Find this via API or DNS Manager
RECORD_ID_PRIMARY="YOUR_PRIMARY_RECORD_ID" # A record for the primary region
RECORD_ID_SECONDARY="YOUR_SECONDARY_RECORD_ID" # A record for the secondary region
PRIMARY_NODEBALANCER_IP="PRIMARY_NODEBALANCER_IP_ADDRESS"
SECONDARY_NODEBALANCER_IP="SECONDARY_NODEBALANCER_IP_ADDRESS"
HEALTH_CHECK_URL="http://your-primary-region-health-endpoint.com/status" # External health check

# Check health of the primary region; curl reports 000 when it cannot connect
HTTP_STATUS=$(curl -s -o /dev/null --max-time 10 -w "%{http_code}" "$HEALTH_CHECK_URL")

if [ "$HTTP_STATUS" != "200" ]; then
    echo "Primary region is unhealthy ($HTTP_STATUS). Initiating failover..."

    # Update DNS to point to secondary NodeBalancer.
    # Linode API v4 carries a record's address in the "target" field.
    # -f makes curl exit non-zero on an HTTP error so a failed update is not reported as success.
    if curl -sf -X PUT "https://api.linode.com/v4/domains/$DOMAIN_ID/records/$RECORD_ID_PRIMARY" \
        -H "Authorization: Bearer $LINODE_API_TOKEN" \
        -H "Content-Type: application/json" \
        -d "{\"target\": \"$SECONDARY_NODEBALANCER_IP\", \"name\": \"shop\", \"ttl_sec\": 300}" \
        > /dev/null; then
        echo "DNS records updated. Traffic should now be routed to the secondary region."
    else
        echo "DNS update failed. Traffic is still routed to the primary region." >&2
        exit 1
    fi

    # Optionally, update the secondary record to point to primary if it's not already
    # This is useful if you want to failback automatically
    # curl -sf -X PUT "https://api.linode.com/v4/domains/$DOMAIN_ID/records/$RECORD_ID_SECONDARY" \
    #     -H "Authorization: Bearer $LINODE_API_TOKEN" \
    #     -H "Content-Type: application/json" \
    #     -d "{\"target\": \"$PRIMARY_NODEBALANCER_IP\", \"name\": \"shop-backup\", \"ttl_sec\": 300}"
else
    echo "Primary region is healthy. No failover needed."
fi

This script uses curl to interact with the Linode API v4. It checks an external health endpoint and, if the endpoint returns anything other than 200 (including curl’s 000 for a connection failure), updates the primary DNS A record to point to the secondary NodeBalancer’s IP address. The name field in the API call should match the subdomain (e.g., “shop” for shop.yourcompany.com). The ttl_sec value bounds how long resolvers may keep serving the old address: at 300 seconds, clients can take up to about five minutes to follow a failover, while a longer TTL reduces query load at the cost of slower failover.
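After a record update, it can help to confirm what the authoritative nameserver is handing out, since a local resolver may still hold a cached answer. The sketch below assumes dig is installed and queries ns1.linode.com, one of Linode's authoritative nameservers; dns_points_to is a hypothetical helper and SECONDARY_NODEBALANCER_IP in the usage line is a placeholder.

```shell
#!/bin/sh
# Sketch: check that the authoritative nameserver returns the expected A record.
# dns_points_to is a hypothetical helper.

dns_points_to() {
    local name="$1" expected_ip="$2" nameserver="${3:-ns1.linode.com}"
    local answers
    # Asking the authoritative server directly bypasses resolver caches.
    answers=$(dig +short A "$name" "@$nameserver")
    # -F: literal match, -x: whole line, -q: exit status only
    printf '%s\n' "$answers" | grep -Fxq "$expected_ip"
}

# Example:
# dns_points_to shop.yourcompany.com "$SECONDARY_NODEBALANCER_IP" && echo "Failover record is live"
```

A successful check only shows that the zone has changed. Clients that resolved the name earlier may keep the old address until their cached TTL expires.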

Orchestrating the Complete Failover Workflow

A comprehensive disaster recovery strategy for a WooCommerce deployment on Linode involves integrating these components: DynamoDB Global Tables for data, Linode NodeBalancers with health checks for application availability within a region, automated instance provisioning for recovery, and DNS-based failover for global traffic redirection.

The workflow during a disaster event would look like this:

1. Monitoring: An external monitoring service (e.g., UptimeRobot, Pingdom, or a custom solution) continuously checks the health of the primary region’s public endpoint and critical services.
2. Detection: If the primary region stays unresponsive for a defined period, rather than failing a single probe, the monitoring service triggers an alert.
3. Automated DNS Update: The alert triggers the DNS failover script (like the Bash example above), which updates Linode DNS records to direct traffic to the secondary region’s NodeBalancer.
4. Automated Instance Provisioning (if needed): Concurrently or subsequently, an orchestration tool (Terraform/Ansible) is invoked to provision new application instances in the secondary region, ensuring capacity.
5. Data Consistency: DynamoDB Global Tables replicate writes made in the secondary region to the other replicas asynchronously. Changes that cannot reach the primary region during the outage are propagated once it recovers, and concurrent writes to the same item are resolved with last-writer-wins.
6. Failback: A similar process, potentially manual or automated, is used to switch traffic back to the primary region once it’s confirmed to be healthy and stable.
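The detection step is where false positives are cheapest to prevent: a single failed probe is often a network blip rather than an outage. One way to require several consecutive failures before acting is sketched below. probe_ok and region_is_down are hypothetical helpers, and the failure count, pause, and timeout are assumed values to tune for your own tolerance.

```shell
#!/bin/sh
# Sketch: declare the primary region down only after N consecutive failed probes.
# probe_ok and region_is_down are hypothetical helpers.

probe_ok() {
    local url="$1" status
    # curl prints 000 and exits non-zero when it cannot connect at all.
    status=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url") || status="000"
    [ "$status" = "200" ]
}

region_is_down() {
    local url="$1" required_failures="${2:-3}" pause="${3:-10}"
    local failures=0
    while [ "$failures" -lt "$required_failures" ]; do
        if probe_ok "$url"; then
            return 1   # a single success cancels the failover decision
        fi
        failures=$((failures + 1))
        # Pause between probes, but not after the final one.
        if [ "$failures" -lt "$required_failures" ]; then
            sleep "$pause"
        fi
    done
    return 0   # every probe failed
}

# Example:
# if region_is_down "http://your-primary-region-health-endpoint.com/status"; then
#     ./dns-failover.sh   # hypothetical name for the DNS update script
# fi
```

With the defaults, failover starts roughly 20 to 35 seconds after the first failed probe, a delay traded for fewer spurious DNS switches.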

This layered approach keeps both data and the application available through a regional outage. The remaining downtime is bounded mainly by how quickly the failure is detected and by the DNS TTL, so both are worth testing in regular failover drills.