Disaster Recovery 101: Architecting Auto-Failovers for DynamoDB and PHP Deployments on Linode

Establishing Multi-Region DynamoDB Replication

Automated failover for critical applications depends on a resilient data store. For DynamoDB, that means using the built-in global tables feature. Global tables are more than backups: they provide active-active replication across distinct AWS regions, so every replica accepts reads and writes. Changes propagate to the other regions asynchronously, typically within about a second, which means cross-region reads are eventually consistent and concurrent writes to the same item are reconciled with last-writer-wins. This is the foundation for the failover strategy. Setup can be scripted with the AWS CLI or the SDKs; we’ll use the CLI because it is easy to script.

First, ensure your DynamoDB table exists in your primary region. Let’s assume a table named user_profiles with a partition key user_id.

Creating the Global Table

To create a global table, you first need to enable DynamoDB Streams on your existing table. This stream captures item-level modifications. Then, you can create the global table, specifying the regions you want to replicate to. For this example, we’ll replicate from us-east-1 to eu-west-1.

Step 1: Enable DynamoDB Streams

aws dynamodb update-table --table-name user_profiles --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES --region us-east-1

Step 2: Create the Global Table Replica in a New Region

aws dynamodb update-table --table-name user_profiles --replica-updates '[{"Create": {"RegionName": "eu-west-1"}}]' --region us-east-1

This uses the current global tables version (2019.11.21), in which a replica is added by updating the existing table rather than through a separate global-table command. Run the command against the primary region (--region us-east-1) and name the new replica region in --replica-updates. DynamoDB then provisions the replica table in eu-west-1 and begins replicating data. You can monitor progress with aws dynamodb describe-table --table-name user_profiles --region us-east-1: the replica is listed under Replicas, and its ReplicaStatus changes from CREATING to ACTIVE once it is ready.
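If you would rather wait for the replica from a script than re-run the CLI by hand, the status check can be sketched in Python. This is a minimal sketch, not part of the original setup: the helper names are hypothetical, boto3 is assumed to be installed for the live call, and the sample response is trimmed to the fields the helpers read.

```python
def replica_statuses(describe_table_response):
    """Map each replica region to its status from a DescribeTable response."""
    table = describe_table_response.get("Table", {})
    return {
        replica["RegionName"]: replica.get("ReplicaStatus", "UNKNOWN")
        for replica in table.get("Replicas", [])
    }


def all_replicas_active(describe_table_response, expected_regions):
    """Return True when every expected region has a replica in ACTIVE state."""
    statuses = replica_statuses(describe_table_response)
    return all(statuses.get(region) == "ACTIVE" for region in expected_regions)


def fetch_table_description(table_name="user_profiles", region="us-east-1"):
    """Call DescribeTable. boto3 is imported lazily so the helpers above
    can be used and tested without AWS access."""
    import boto3  # third-party dependency: pip install boto3

    client = boto3.client("dynamodb", region_name=region)
    return client.describe_table(TableName=table_name)


# Illustrative response, trimmed to the relevant fields (values are made up).
sample = {
    "Table": {
        "TableName": "user_profiles",
        "TableStatus": "ACTIVE",
        "Replicas": [{"RegionName": "eu-west-1", "ReplicaStatus": "CREATING"}],
    }
}

print(replica_statuses(sample))
print(all_replicas_active(sample, ["eu-west-1"]))
```

In a provisioning script you would call all_replicas_active(fetch_table_description(), ["eu-west-1"]) in a loop with a sleep between attempts, and only continue with the application rollout once it returns True.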

Architecting PHP Application Failover on Linode

For our PHP application deployed on Linode, we’ll employ a multi-region strategy. This involves deploying identical application stacks in at least two Linode regions. The core of the failover mechanism will be a DNS-based approach, leveraging Linode’s DNS Manager and potentially a health check service.

Infrastructure Setup

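Assume we have two identical Linode instances, one in us-east (e.g., Newark) and another in eu-central (e.g., Frankfurt). Each instance runs a standard LAMP/LEMP stack, with PHP connecting to its *nearest* DynamoDB replica: us-east-1 for the Newark instance and eu-west-1 for the Frankfurt instance. DynamoDB runs only inside AWS, so there is no replica within Linode itself; choosing the closest AWS region keeps latency low during normal operation.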

Application Configuration:

Your PHP application’s database configuration must be dynamic. Instead of hardcoding endpoint URLs, use environment variables or a configuration file that can be updated during a failover event. For DynamoDB, the endpoint is region-specific. The AWS SDK for PHP handles this automatically if the region is correctly configured.

<?php
// config/database.php

return [
    'dynamodb' => [
        'region' => getenv('AWS_REGION') ?: 'us-east-1', // Default to primary region
        'version' => 'latest',
        'credentials' => [
            'key'    => getenv('AWS_ACCESS_KEY_ID'),
            'secret' => getenv('AWS_SECRET_ACCESS_KEY'),
        ],
    ],
];
?>

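The application server’s environment variables (AWS_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) determine which DynamoDB endpoint it connects to. In normal operation each server points at its nearest replica (us-east-1 for the primary stack, eu-west-1 for the secondary), so redirecting traffic between Linode regions requires no database change by itself. AWS_REGION only has to change when a DynamoDB region becomes unreachable: the affected server is then pointed at the other replica’s region, and switched back once the original region recovers.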
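The region selection described above can be made explicit with a small lookup. The following is only a sketch: the mapping from Linode regions to DynamoDB replica regions and the function name are assumptions for illustration, and the override mirrors the AWS_REGION variable used in the PHP configuration.

```python
import os

# Hypothetical mapping: Linode region -> DynamoDB replica regions, nearest first.
REPLICA_PREFERENCE = {
    "us-east": ["us-east-1", "eu-west-1"],
    "eu-central": ["eu-west-1", "us-east-1"],
}


def resolve_aws_region(linode_region, unavailable=(), env=None):
    """Pick the DynamoDB region a server should use.

    An explicit AWS_REGION wins unless that region is marked unavailable;
    otherwise the first available region from the preference list is used.
    """
    env = os.environ if env is None else env
    override = env.get("AWS_REGION")
    if override and override not in unavailable:
        return override
    for region in REPLICA_PREFERENCE.get(linode_region, []):
        if region not in unavailable:
            return region
    raise RuntimeError(f"No DynamoDB replica available for {linode_region}")


# Normal operation: the Frankfurt stack uses the Ireland replica.
print(resolve_aws_region("eu-central", env={}))
# DynamoDB in eu-west-1 is unreachable: fall back to the other replica.
print(resolve_aws_region("eu-central", unavailable={"eu-west-1"}, env={}))
```

Keeping the preference order in one place means the value written to AWS_REGION during a failover is derived rather than typed by hand, which avoids pointing a server at a region where the table has no replica.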

DNS Failover Strategy

We’ll use Linode’s DNS Manager to manage the primary A record for our application (e.g., app.yourdomain.com). This record will initially point to the IP address of the Linode instance in the primary region (us-east).

Step 1: Configure DNS Records in Linode DNS Manager

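Create an A record for app.yourdomain.com pointing to the IP of your us-east Linode, and give it a low TTL so that resolvers pick up a changed target quickly. Create a second A record for a health check subdomain, e.g., health.app.yourdomain.com, pointing to the IP of your eu-central Linode, so the standby stack stays reachable by name while it receives no production traffic. This is a common pattern for active-passive DNS failover.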
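The same records can be created through the Linode API instead of the DNS Manager UI. The sketch below uses only the Python standard library; the record names, the documentation-range IPs, and the 300-second TTL are assumptions for illustration, and the helper names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.linode.com/v4"


def build_a_record(name, target_ip, ttl_sec=300):
    """Build the JSON body for an A record. `name` is relative to the zone,
    e.g. "app" for app.yourdomain.com."""
    return {"type": "A", "name": name, "target": target_ip, "ttl_sec": ttl_sec}


def create_record(domain_id, record, token):
    """POST the record to the zone and return the created record as a dict."""
    request = urllib.request.Request(
        f"{API_BASE}/domains/{domain_id}/records",
        data=json.dumps(record).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Placeholder addresses from the documentation ranges.
app_record = build_a_record("app", "192.0.2.1")
health_record = build_a_record("health.app", "198.51.100.1")
print(json.dumps(app_record))
```

Calling create_record(zone_id, app_record, token) returns the new record, including the id that the failover script later needs in order to update the target.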

Step 2: Implement Health Checks

On each Linode instance, expose an HTTP endpoint that responds with a 200 OK status code if the application is healthy, and a non-200 status code (e.g., 503 Service Unavailable) if it’s unhealthy. The endpoint should verify connectivity to the DynamoDB replica the instance is configured to use, so that a database outage also marks the instance as unhealthy.

Example PHP health check script (/var/www/html/health.php):

<?php
require __DIR__ . '/../vendor/autoload.php'; // Composer autoloader; adjust the path to your project layout

use Aws\DynamoDb\DynamoDbClient;
use Aws\Exception\AwsException;

// Load configuration
$config = require __DIR__ . '/../config/database.php';
$dbConfig = $config['dynamodb'];

// Set region from environment variable or default
$region = getenv('AWS_REGION') ?: $dbConfig['region'];

try {
    $dynamoDb = new DynamoDbClient([
        'region' => $region,
        'version' => $dbConfig['version'],
        'credentials' => $dbConfig['credentials'],
    ]);

    // Attempt a simple DynamoDB operation to check connectivity
    // e.g., DescribeTable for the user_profiles table
    $dynamoDb->describeTable(['TableName' => 'user_profiles']);

    // If no exception, the connection is good
    http_response_code(200);
    echo "OK";
} catch (AwsException $e) {
    // Log the error for debugging
    error_log("DynamoDB Health Check Failed: " . $e->getMessage());
    http_response_code(503);
    echo "Service Unavailable";
} catch (Exception $e) {
    error_log("General Health Check Error: " . $e->getMessage());
    http_response_code(503);
    echo "Service Unavailable";
}
?>

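Ensure your web server (Nginx/Apache) is configured to serve this script and that the AWS_REGION environment variable is correctly set for the PHP process on each server. With PHP-FPM, note that the pool clears the environment by default (clear_env = yes), so either define the variables in the pool configuration (e.g., env[AWS_REGION] = us-east-1) or set clear_env = no.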

Automating DNS Updates

The crucial part is automating the DNS record update when a failure is detected. This can be achieved using a monitoring service or a custom script that periodically checks the health endpoints and updates DNS via the Linode API.

Step 1: Obtain Linode API Credentials

Generate an API token from your Linode Cloud Manager account with sufficient permissions to manage DNS records.

Step 2: Create a Monitoring Script (Python Example)

This script will run on a separate, highly available monitoring server (or even a scheduled cron job on one of the Linode instances, though less ideal for true disaster recovery). It checks the health of both regions and updates the DNS A record accordingly.

import requests
import os
import json
import time

# --- Configuration ---
LINODE_API_TOKEN = os.environ.get("LINODE_API_TOKEN")
PRIMARY_REGION_IP = "YOUR_PRIMARY_LINODE_IP"  # e.g., 192.0.2.1
SECONDARY_REGION_IP = "YOUR_SECONDARY_LINODE_IP" # e.g., 198.51.100.1
PRIMARY_HEALTH_URL = f"http://{PRIMARY_REGION_IP}/health.php"
SECONDARY_HEALTH_URL = f"http://{SECONDARY_REGION_IP}/health.php"
DOMAIN_NAME = "app.yourdomain.com"
LINODE_ZONE_ID = "YOUR_LINODE_DNS_ZONE_ID" # Found in Linode DNS Manager URL or via API
RECORD_ID = "YOUR_APP_A_RECORD_ID" # The ID of the A record for app.yourdomain.com
CHECK_INTERVAL_SECONDS = 60
REQUEST_TIMEOUT = 5
# --- End Configuration ---

HEADERS = {
    "Authorization": f"Bearer {LINODE_API_TOKEN}",
    "Content-Type": "application/json"
}

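def get_dns_record_id(domain, zone_id):
    """Fetches the ID of the A record for the given domain."""
    url = f"https://api.linode.com/v4/domains/{zone_id}/records"
    # Linode typically stores the record name relative to the zone ("app"),
    # so accept either that label or the full hostname. This assumes a
    # single-label subdomain; only the first page of results is checked.
    candidate_names = {domain, domain.split(".")[0]}
    try:
        response = requests.get(url, headers=HEADERS, timeout=REQUEST_TIMEOUT)
        response.raise_for_status()
        data = response.json()
        for record in data.get("data", []):
            if record.get("type") == "A" and record.get("name") in candidate_names:
                return record.get("id")
        print(f"Error: A record for {domain} not found in zone {zone_id}.")
        return None
    except requests.exceptions.RequestException as e:
        print(f"Error fetching DNS records: {e}")
        return None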

def update_dns_record(zone_id, record_id, target_ip):
    """Updates a DNS A record with a new IP address."""
    url = f"https://api.linode.com/v4/domains/{zone_id}/records/{record_id}"
    payload = {
        "target": target_ip
    }
    try:
        # A timeout keeps a stalled API call from blocking the monitoring loop
        response = requests.put(url, headers=HEADERS, data=json.dumps(payload),
                                timeout=REQUEST_TIMEOUT)
        response.raise_for_status()
        print(f"Successfully updated DNS record {record_id} to {target_ip}")
        return True
    except requests.exceptions.RequestException as e:
        print(f"Error updating DNS record {record_id}: {e}")
        return False

def check_health(url):
    """Checks the health endpoint of a given URL."""
    try:
        response = requests.get(url, timeout=REQUEST_TIMEOUT)
        return response.status_code == 200
    except requests.exceptions.RequestException:
        return False

def main():
    global RECORD_ID
    if not LINODE_API_TOKEN:
        print("Error: LINODE_API_TOKEN environment variable not set.")
        return
    if not LINODE_ZONE_ID:
        print("Error: LINODE_ZONE_ID not configured.")
        return

    # Dynamically fetch RECORD_ID if not hardcoded
    if not RECORD_ID:
        RECORD_ID = get_dns_record_id(DOMAIN_NAME, LINODE_ZONE_ID)
        if not RECORD_ID:
            return # Error message already printed by get_dns_record_id

    print(f"Starting health checks. Interval: {CHECK_INTERVAL_SECONDS}s")
    while True:
        primary_healthy = check_health(PRIMARY_HEALTH_URL)
        secondary_healthy = check_health(SECONDARY_HEALTH_URL)

        try:
            # Fetch current DNS record to determine current state
            record_url = f"https://api.linode.com/v4/domains/{LINODE_ZONE_ID}/records/{RECORD_ID}"
            response = requests.get(record_url, headers=HEADERS, timeout=REQUEST_TIMEOUT)
            response.raise_for_status()
            # A single-record response is the record object itself, not wrapped in "data"
            current_target_ip = response.json().get("target")
        except requests.exceptions.RequestException as e:
            print(f"Could not fetch current DNS record: {e}")
            # Without the current state, skip this cycle rather than risk a wrong update
            time.sleep(CHECK_INTERVAL_SECONDS)
            continue

        if primary_healthy and current_target_ip != PRIMARY_REGION_IP:
            # Covers both the initial state and failing back after a recovery
            print("Primary region is healthy, but DNS points elsewhere. Failing back to primary.")
            update_dns_record(LINODE_ZONE_ID, RECORD_ID, PRIMARY_REGION_IP)
        elif not primary_healthy and secondary_healthy and current_target_ip != SECONDARY_REGION_IP:
            print("Primary region is unhealthy, secondary is healthy. Failing over to secondary.")
            update_dns_record(LINODE_ZONE_ID, RECORD_ID, SECONDARY_REGION_IP)
        elif not primary_healthy and not secondary_healthy:
            print("Both regions are unhealthy. No DNS change made.")
        else:
            print("System is stable. No changes needed.")

        time.sleep(CHECK_INTERVAL_SECONDS)

if __name__ == "__main__":
    main()
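The loop above reacts to a single failed probe, so one dropped packet can move production traffic. A common refinement is to require several consecutive failures before a region counts as down. The sketch below shows one way to do that; the class name and the threshold of three are assumptions, not part of the original script.

```python
class FailureDebouncer:
    """Treat a target as unhealthy only after `threshold` consecutive failures."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, healthy):
        """Record one probe result and return the debounced health state."""
        if healthy:
            # Any successful probe resets the failure streak.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures < self.threshold


# Simulated probe results: a short blip followed by a real outage.
probes = [True, False, False, True, False, False, False]
debouncer = FailureDebouncer(threshold=3)
states = [debouncer.record(result) for result in probes]
print(states)
```

In the monitoring loop you would keep one debouncer per region and pass the raw result through it, for example primary_healthy = primary_debouncer.record(check_health(PRIMARY_HEALTH_URL)). With a 60-second interval and a threshold of three, failover then starts roughly three minutes after the first failed probe, so the interval and threshold should be tuned together.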

Deployment:

Install Python and the requests library on your monitoring server: pip install requests.
Set the LINODE_API_TOKEN environment variable.
Fill in the configuration variables (IP addresses, domain, zone ID, record ID). You can find the LINODE_ZONE_ID in the URL when you view your domain in Linode DNS Manager (e.g., /dns/manage/12345, where 12345 is the ID). The RECORD_ID can be found by inspecting the network requests in your browser’s developer tools when viewing the DNS records. Alternatively, set RECORD_ID to None and the script will look it up at startup with get_dns_record_id; the placeholder string must be removed for that lookup to run.
Run the script: python your_monitor_script.py. For production, run it using a process manager like systemd or supervisor.
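The zone ID can also be looked up through the API rather than read from the browser URL. This is a minimal sketch using only the standard library; the helper names are hypothetical, the sample response is trimmed and illustrative, and only the first page of results is read.

```python
import json
import urllib.request

API_BASE = "https://api.linode.com/v4"


def fetch_domains(token):
    """Fetch the first page of domains visible to the API token."""
    request = urllib.request.Request(
        f"{API_BASE}/domains",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def find_domain_id(domains_response, domain):
    """Return the ID of the zone named `domain`, or None if it is not listed."""
    for entry in domains_response.get("data", []):
        if entry.get("domain") == domain:
            return entry.get("id")
    return None


# Illustrative list response, trimmed to the relevant fields.
sample_response = {
    "data": [{"id": 12345, "domain": "yourdomain.com", "type": "master"}],
    "page": 1,
    "pages": 1,
    "results": 1,
}
print(find_domain_id(sample_response, "yourdomain.com"))
```

With a real token, find_domain_id(fetch_domains(token), "yourdomain.com") returns the value to use for LINODE_ZONE_ID. Note that the zone is the registered domain (yourdomain.com), not the application hostname.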

Application-Level Failover Considerations

While DNS failover handles traffic redirection, your PHP application needs to be aware of its operational region. If the application relies on region-specific services (e.g., S3 buckets, SQS queues), its configuration must be updated to reflect the new active region. This can be achieved by:

Updating environment variables on the newly active Linode instance (e.g., AWS_REGION). This can be done via SSH commands executed by the monitoring script after DNS update, or through a configuration management tool like Ansible.
Restarting the PHP-FPM service or web server to pick up the new environment variables.
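Whichever tool delivers the change, the update to the environment file should be idempotent, so that repeated failovers and failbacks leave exactly one AWS_REGION line behind. The function below is a small sketch of that idea operating on the file contents as a string; the function name and the .env format (one KEY=value per line) are assumptions.

```python
def set_env_var(env_text, key, value):
    """Return env_text with exactly one `key=value` line.

    The first existing assignment is replaced in place, any further
    duplicates are dropped, and the line is appended if the key is absent.
    """
    new_line = f"{key}={value}"
    prefix = f"{key}="
    lines = []
    replaced = False
    for line in env_text.splitlines():
        if line.startswith(prefix):
            if not replaced:
                lines.append(new_line)
                replaced = True
            continue  # skip duplicate assignments left by earlier appends
        lines.append(line)
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


before = "APP_ENV=prod\nAWS_REGION=us-east-1\nAWS_REGION=eu-west-1\n"
after = set_env_var(before, "AWS_REGION", "eu-west-1")
print(after, end="")
```

A script installed on each server could read the file, pass its contents through set_env_var, and write the result back before PHP-FPM is restarted. Because applying the function twice gives the same result, it is safe to run on every monitoring cycle.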

Example of updating environment variables and restarting PHP-FPM via SSH (to be added to the Python script):

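import re

import paramiko

def update_remote_env_and_restart(hostname, username, password, region_var_value):
    """Sets AWS_REGION in the app's env file on a remote host, then restarts PHP-FPM."""
    # Reject anything that is not a plain region name before it reaches a shell command
    if not re.fullmatch(r"[a-z0-9-]+", region_var_value):
        print(f"Refusing to use suspicious region value: {region_var_value!r}")
        return False

    client = paramiko.SSHClient()
    # Verify the server against known_hosts instead of trusting any key (AutoAddPolicy)
    client.load_system_host_keys()
    client.set_missing_host_key_policy(paramiko.RejectPolicy())
    try:
        client.connect(hostname, username=username, password=password, timeout=10)

        # Replace the existing AWS_REGION line, or append one if the file has none.
        # Appending on every failover would leave conflicting entries behind.
        # This is a simplified example; a robust solution might involve updating
        # a systemd service file or using a configuration management tool.
        env_file_path = "/opt/your_app/.env"
        env_line = f"AWS_REGION={region_var_value}"
        update_command = (
            f"grep -q '^AWS_REGION=' {env_file_path} "
            f"&& sed -i 's/^AWS_REGION=.*/{env_line}/' {env_file_path} "
            f"|| echo '{env_line}' >> {env_file_path}"
        )
        # Restart PHP-FPM (adjust service name if necessary)
        restart_command = "sudo systemctl restart php8.1-fpm"  # Example for PHP 8.1

        for command in (update_command, restart_command):
            stdin, stdout, stderr = client.exec_command(command)
            print(f"STDOUT: {stdout.read().decode()}")
            print(f"STDERR: {stderr.read().decode()}")
            # exec_command does not raise on failure, so check the exit status
            if stdout.channel.recv_exit_status() != 0:
                print(f"Command failed on {hostname}: {command}")
                return False

        print(f"Successfully updated environment and restarted PHP-FPM on {hostname}")
        return True
    except Exception as e:
        print(f"Error connecting to {hostname} or executing commands: {e}")
        return False
    finally:
        client.close()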

# In the main loop, after updating DNS:
# if update_dns_record(...):
#     if current_target_ip == SECONDARY_REGION_IP: # Failing back to primary
#         update_remote_env_and_restart(PRIMARY_LINODE_HOSTNAME, 'root', 'YOUR_SSH_PASSWORD', 'us-east-1')
#     else: # Failing over to secondary
#         update_remote_env_and_restart(SECONDARY_LINODE_HOSTNAME, 'root', 'YOUR_SSH_PASSWORD', 'eu-west-1')  # region of the DynamoDB replica created earlier

Note: Storing SSH passwords directly in scripts is insecure. Use SSH keys for authentication and consider a secrets management solution.

Testing and Validation

Thorough testing is paramount. Simulate failures by:

Stopping the web server or PHP-FPM on the primary Linode instance.
Simulating network partitions.
Manually triggering the health check script to return an error.

Monitor the DNS propagation time and verify that traffic is correctly routed to the secondary region. Check application logs on both regions to ensure data consistency and proper operation. Perform a failback test to ensure the primary region can resume its role seamlessly.
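To put a number on propagation time during a failover test, you can poll the hostname until it resolves to the expected address. The sketch below is one way to do this with the standard library; the function name and defaults are assumptions, and the result reflects only what the local resolver (and its cache) sees, not every client on the internet.

```python
import socket
import time


def wait_for_dns(hostname, expected_ip, timeout=600, interval=5,
                 resolve=socket.gethostbyname, clock=time.monotonic,
                 sleep=time.sleep):
    """Poll until `hostname` resolves to `expected_ip`.

    Returns the elapsed seconds on success, or None if `timeout` passes first.
    The resolver, clock, and sleep functions are injectable for testing.
    """
    start = clock()
    while True:
        try:
            if resolve(hostname) == expected_ip:
                return clock() - start
        except OSError:
            pass  # a failed lookup counts as "not switched yet"
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

During a test you would stop the web server on the primary, then call wait_for_dns("app.yourdomain.com", SECONDARY_IP) from a machine outside both regions and record the returned duration. Running the same measurement for the failback gives you both halves of the recovery time.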
