Disaster Recovery 101: Architecting Auto-Failovers for DynamoDB and PHP Deployments on OVH
Establishing Multi-Region DynamoDB Replication
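For robust disaster recovery, multi-region replication is paramount. DynamoDB’s Global Tables offer a managed solution for this. Because DynamoDB is an AWS service, its replicas live in AWS regions, so we’ll configure active-active replication between two AWS regions, each paired with the OVH Public Cloud region that hosts the PHP application: eu-west-3 for ‘Gravelines’ (GRA) and eu-west-1 for ‘Roubaix’ (RBX).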
The process involves creating a DynamoDB table in the primary region and then enabling global replication to the secondary region. This is typically done via the AWS CLI or SDKs. Assuming AWS credentials are configured on the machine that runs these commands, the CLI commands are as follows:
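First, create the table in the primary region (e.g., eu-west-3, paired with GRA). Global Tables rely on DynamoDB Streams, so the stream is enabled at creation time, and on-demand billing is used here because a provisioned table generally needs write auto scaling configured before replicas can be added:
aws dynamodb create-table \
  --region eu-west-3 \
  --table-name MyApplicationTable \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES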
Next, enable global replication to the secondary region (e.g., eu-west-1, paired with RBX). With the current Global Tables version, DynamoDB creates the replica table in the target region itself, so it does not need to exist beforehand. Replicas are added with the `update-table` command and its `--replica-updates` parameter, run against the region that already holds the table:
aws dynamodb update-table \
  --region eu-west-3 \
  --table-name MyApplicationTable \
  --replica-updates '[{"Create": {"RegionName": "eu-west-1"}}]'
Note that `update-table` with `--replica-updates` is not idempotent for creating replicas: requesting a replica in a region that already has one is rejected with a validation error, so automation should check the existing replicas first. Replica creation is also asynchronous; the command returns while the replica is still being created.
To verify replication status, you can use:
aws dynamodb describe-table --region eu-west-3 --table-name MyApplicationTable
Look for the Replicas section in the output. It should list the replica in eu-west-1 (RBX) with a ReplicaStatus of ACTIVE; while the replica is still being built, the status is CREATING.
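For deployment scripts it can help to turn that visual check into a small function. The following is a minimal sketch, not part of the AWS SDK: `pendingReplicaRegions` is a hypothetical helper that inspects an array shaped like a DescribeTable response and reports which expected regions do not yet hold an ACTIVE replica. The sample data is made up for illustration.

```php
<?php
/**
 * Hypothetical helper: list the expected regions whose replica is missing
 * or not yet ACTIVE.
 *
 * @param array    $description     DescribeTable response as a plain array
 * @param string[] $expectedRegions regions that must hold an ACTIVE replica
 * @return string[] regions still pending
 */
function pendingReplicaRegions(array $description, array $expectedRegions): array
{
    // Index the reported replicas by region for easy lookup.
    $statusByRegion = [];
    foreach ($description['Table']['Replicas'] ?? [] as $replica) {
        $statusByRegion[$replica['RegionName']] = $replica['ReplicaStatus'] ?? 'UNKNOWN';
    }

    $pending = [];
    foreach ($expectedRegions as $region) {
        if (($statusByRegion[$region] ?? 'MISSING') !== 'ACTIVE') {
            $pending[] = $region;
        }
    }
    return $pending;
}

// Illustrative sample only. With the SDK you would pass something like
// $client->describeTable(['TableName' => 'MyApplicationTable'])->toArray().
$sample = [
    'Table' => [
        'TableName' => 'MyApplicationTable',
        'Replicas'  => [
            ['RegionName' => 'eu-west-1', 'ReplicaStatus' => 'CREATING'],
        ],
    ],
];

$pending = pendingReplicaRegions($sample, ['eu-west-1']);
echo $pending
    ? 'Waiting on: ' . implode(', ', $pending) . "\n"
    : "All replicas ACTIVE\n";
```

A deployment script could call this in a loop with a short sleep and only route traffic to the second region once the list comes back empty.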
PHP Application Configuration for Multi-Region Awareness
Your PHP application needs to be aware of the multi-region setup. This typically involves configuring your database client to connect to the nearest or most appropriate region. The AWS SDK for PHP binds each DynamoDB client to the single region passed to its constructor, so choosing that region is the application’s job.
The key is to dynamically set the region based on the deployment environment. We can achieve this using environment variables or configuration files.
Here’s a PHP snippet demonstrating how to initialize the DynamoDB client, prioritizing the local region:
<?php
require 'vendor/autoload.php'; // Assuming you use Composer

use Aws\DynamoDb\DynamoDbClient;
use Aws\Exception\AwsException;

// Determine the current region. This could be from an environment variable,
// an EC2 instance metadata service (if applicable), or a configuration file.
// For OVH Public Cloud, you might rely on environment variables set during deployment.
$currentRegion = getenv('AWS_REGION') ?: 'eu-west-3'; // Default to GRA if not set

try {
    $dynamoDbClient = new DynamoDbClient([
        'region' => $currentRegion,
        'version' => 'latest',
        // Add credentials if not using IAM roles or environment variables
        // 'credentials' => [
        //     'key' => 'YOUR_ACCESS_KEY_ID',
        //     'secret' => 'YOUR_SECRET_ACCESS_KEY',
        // ],
    ]);

    // Example: Get an item
    $result = $dynamoDbClient->getItem([
        'TableName' => 'MyApplicationTable',
        'Key' => [
            'id' => ['S' => 'some-item-id'],
        ],
    ]);

    if (isset($result['Item'])) {
        // Process item
        print_r($result['Item']);
    } else {
        echo "Item not found.\n";
    }
} catch (AwsException $e) {
    // Handle exceptions, potentially logging and triggering alerts
    error_log("DynamoDB Error: " . $e->getMessage());
    // In a failover scenario, you might want to try connecting to the other region
    // or return a specific error to the user.
    echo "An error occurred. Please try again later.\n";
}
?>
In a real-world scenario, the application would be deployed in both OVH regions (GRA and RBX). Each deployment would be configured to use its local DynamoDB endpoint. DynamoDB Global Tables automatically handle data synchronization between regions.
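The snippet above mentions retrying against the other region when the local one fails. Here is one way that could look, as a minimal sketch: `withRegionFallback` is a hypothetical helper (not an SDK feature) that runs an operation against each region in preference order and returns the first success. The operation is passed as a callable so the example runs without AWS access; in the application it would build a `DynamoDbClient` for the given region and call `getItem`.

```php
<?php
/**
 * Hypothetical helper: run $operation against each region in order and
 * return the first successful result.
 *
 * @param string[]      $regions   preferred (local) region first
 * @param callable      $operation function (string $region): mixed, throws on failure
 * @param callable|null $onFailure optional hook: function (string $region, Throwable $e): void
 * @return mixed whatever $operation returned
 * @throws RuntimeException when every region failed
 */
function withRegionFallback(array $regions, callable $operation, ?callable $onFailure = null)
{
    $lastError = null;
    foreach ($regions as $region) {
        try {
            return $operation($region);
        } catch (Throwable $e) {
            // Remember the failure and move on to the next region.
            $lastError = $e;
            if ($onFailure !== null) {
                $onFailure($region, $e);
            }
        }
    }
    throw new RuntimeException('All regions failed', 0, $lastError);
}

// Simulated usage: the local region is "down", the remote one answers.
$item = withRegionFallback(
    ['eu-west-3', 'eu-west-1'],
    function (string $region) {
        if ($region === 'eu-west-3') {
            throw new RuntimeException('simulated regional outage');
        }
        // A real implementation would call $client->getItem([...]) here.
        return ['id' => ['S' => 'some-item-id'], 'servedBy' => ['S' => $region]];
    },
    function (string $region, Throwable $e) {
        error_log("DynamoDB call failed in {$region}: " . $e->getMessage());
    }
);

echo 'Served by ', $item['servedBy']['S'], "\n";
```

This sketch retries on any exception, which is deliberately simple. A production version would probably fall back only on connection or server-side errors, since a validation error will fail identically in every region. Keep in mind that replication is asynchronous, so a read served by the other region can briefly lag behind a recent write.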
Automating Failover with Load Balancers and Health Checks
To achieve automatic failover for your PHP application instances, you’ll need a load balancing solution that supports health checks and can redirect traffic between regions. OVH Public Cloud Load Balancer is a suitable choice.
The strategy involves deploying identical PHP application stacks in both OVH regions. Each stack will be configured to connect to its local DynamoDB replica.
Step 1: Deploy Application Stacks
- Deploy your PHP application (e.g., using Docker containers orchestrated by Kubernetes or a simpler setup with systemd services) in OVH region GRA. Ensure it’s configured to use the DynamoDB endpoint paired with GRA (eu-west-3).
- Deploy the exact same application stack in OVH region RBX. Ensure it’s configured to use the DynamoDB endpoint paired with RBX (eu-west-1).
Step 2: Configure OVH Load Balancer
There are two common topologies. If the OVH Load Balancer can target backend servers in multiple regions, a single instance can front both stacks. If it is strictly regional, you’d run one load balancer per region and put a global DNS service with health checks (like OVH’s DNS, AWS Route 53, or Cloudflare) in front of them. For simplicity, the configuration below assumes the first topology; the DNS section later in this article covers the second.
OVH Load Balancer Configuration (Conceptual):
# This is a conceptual representation in HAProxy-style syntax.
# Actual OVH LB configuration is via their API/UI.

# Frontend Configuration
frontend http_frontend
    bind *:80
    mode http
    default_backend all_web_servers

# Backend Pool 1: GRA Region
backend web_servers_gra
    mode http
    balance roundrobin
    option httpchk GET /healthz
    server app_gra_1 192.168.1.10:80 check port 80
    server app_gra_2 192.168.1.11:80 check port 80

# Backend Pool 2: RBX Region
backend web_servers_rbx
    mode http
    balance roundrobin
    option httpchk GET /healthz
    server app_rbx_1 192.168.2.10:80 check port 80
    server app_rbx_2 192.168.2.11:80 check port 80

# Combined Backend (if LB supports multi-region targets directly)
# Or, a global DNS solution would point to web_servers_gra and web_servers_rbx
# based on health checks of the regional load balancers.
backend all_web_servers
    mode http
    balance roundrobin
    # Health check on the backend servers themselves
    option httpchk GET /healthz
    # Servers in GRA
    server app_gra_1 192.168.1.10:80 check port 80
    server app_gra_2 192.168.1.11:80 check port 80
    # Servers in RBX
    server app_rbx_1 192.168.2.10:80 check port 80
    server app_rbx_2 192.168.2.11:80 check port 80
The critical part is the option httpchk. This tells the load balancer to periodically send an HTTP request (e.g., to a /healthz endpoint on your PHP application) and consider a server unhealthy if it doesn’t respond successfully (e.g., with a 2xx or 3xx status code) within a timeout period.
Step 3: Implement Health Check Endpoint in PHP
Your PHP application needs a simple endpoint that returns a 200 OK status if the application is healthy and can connect to its local DynamoDB instance. This endpoint should *not* perform heavy operations.
<?php
// healthz.php
require 'vendor/autoload.php'; // Assuming you use Composer

use Aws\DynamoDb\DynamoDbClient;
use Aws\DynamoDb\Exception\DynamoDbException;
use Aws\Exception\AwsException;

// In a real app the client would be injected or shared (singleton, DI container).
// It is built here only to keep the example self-contained.
header('Content-Type: application/json');

// Resolved before the try block so every catch branch can report it.
$currentRegion = getenv('AWS_REGION') ?: 'eu-west-3';
$response = ['status' => 'unhealthy', 'message' => 'Unknown error'];
$statusCode = 503; // Service Unavailable

try {
    $dynamoDbClient = new DynamoDbClient([
        'region' => $currentRegion,
        'version' => 'latest',
    ]);

    // A light check: describe the table. This confirms connectivity and auth.
    // If this is too slow, consider a GetItem on a known-good, low-traffic item.
    $dynamoDbClient->describeTable(['TableName' => 'MyApplicationTable']);

    // If describeTable succeeds, the service is reachable and authenticated.
    $response = ['status' => 'healthy', 'region' => $currentRegion];
    $statusCode = 200; // OK
} catch (DynamoDbException $e) {
    // SDK v3 reports DynamoDB errors through DynamoDbException;
    // the AWS error code tells a missing table apart from other failures.
    $message = $e->getAwsErrorCode() === 'ResourceNotFoundException'
        ? 'DynamoDB table not found'
        : 'DynamoDB connection error';
    $response = ['status' => 'unhealthy', 'message' => $message, 'region' => $currentRegion];
    error_log("Health Check Error (DynamoDB): " . $e->getMessage());
} catch (AwsException $e) {
    // Other AWS errors not specific to DynamoDB
    $response = ['status' => 'unhealthy', 'message' => 'AWS error', 'region' => $currentRegion];
    error_log("Health Check Error (AWS Exception): " . $e->getMessage());
} catch (Exception $e) {
    // General PHP errors (including missing credentials)
    $response = ['status' => 'unhealthy', 'message' => 'Application error', 'region' => $currentRegion];
    error_log("Health Check Error (General Exception): " . $e->getMessage());
}

http_response_code($statusCode);
echo json_encode($response);
exit;
?>
When the health check fails for all instances in one region, the load balancer will automatically stop sending traffic to that region and direct all requests to the healthy instances in the other region. Once the unhealthy region recovers, the load balancer will start sending traffic to it again.
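Because the load balancer probes every backend at a fixed interval, the health endpoint can end up calling DynamoDB far more often than necessary. One option is to cache the probe result for a second or two. The following is a minimal sketch under that assumption: `cachedHealth` and the cache file location are hypothetical, and the probe is passed as a callable so the DescribeTable call from the endpoint can be plugged in.

```php
<?php
/**
 * Hypothetical helper: return the cached probe result if it is younger than
 * $ttlSeconds, otherwise run $probe and store what it returns.
 *
 * @param string   $cacheFile  path of the JSON cache file
 * @param int      $ttlSeconds how long a result stays valid
 * @param callable $probe      function (): array, the real health probe
 * @param int|null $now        current Unix time (injectable for testing)
 */
function cachedHealth(string $cacheFile, int $ttlSeconds, callable $probe, ?int $now = null): array
{
    $now = $now ?? time();

    if (is_file($cacheFile)) {
        $cached = json_decode((string) file_get_contents($cacheFile), true);
        if (is_array($cached)
            && isset($cached['checkedAt'], $cached['result'])
            && $now - $cached['checkedAt'] < $ttlSeconds) {
            return $cached['result'];
        }
    }

    $result = $probe();
    // LOCK_EX keeps concurrent PHP workers from interleaving their writes.
    file_put_contents(
        $cacheFile,
        json_encode(['checkedAt' => $now, 'result' => $result]),
        LOCK_EX
    );
    return $result;
}

// Usage sketch: the closure stands in for the DescribeTable probe.
$cacheFile = sys_get_temp_dir() . '/healthz-cache.json'; // hypothetical location
$result = cachedHealth($cacheFile, 2, function (): array {
    return ['status' => 'healthy', 'code' => 200];
});
echo json_encode($result), "\n";
```

The TTL should stay well below the load balancer's check interval multiplied by its failure threshold, otherwise a stale "healthy" answer delays failover.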
DNS-Level Failover for Global Reachability
For true global disaster recovery, especially if your OVH Load Balancers are regional or if you need a single global entry point, DNS-level failover is essential. This involves using a managed DNS service that supports health checks and weighted routing or failover routing.
Scenario: Using OVH DNS with Health Checks
OVH provides DNS services. If they offer health check capabilities for DNS records (e.g., pointing to the IP addresses of your regional load balancers or directly to your application servers if they have static IPs), you can configure failover.
The setup would look like this:
- Configure your application in GRA to be accessible via a regional load balancer (e.g., lb-gra.yourdomain.com).
- Configure your application in RBX to be accessible via a regional load balancer (e.g., lb-rbx.yourdomain.com).
- In OVH DNS (or your chosen DNS provider), create two A records for your main domain (e.g., app.yourdomain.com):
  - Record 1: Points to the IP of lb-gra.yourdomain.com. Configure a health check that monitors the health of lb-gra.yourdomain.com (or a specific endpoint on it). Set this as the primary record.
  - Record 2: Points to the IP of lb-rbx.yourdomain.com. Configure a health check for this IP/endpoint. Set this as the secondary (failover) record.
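When the health check for the primary record fails, the DNS service will automatically start resolving app.yourdomain.com to the IP address of the secondary record. Resolvers and clients that cached the old answer keep using it until the record’s TTL expires, so keep the TTL short (for example 60 seconds) to bound how long users are sent to the failed region.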
Example using a hypothetical DNS provider with health checks:
# Hypothetical DNS Configuration Snippet
# This is NOT actual OVH DNS syntax, but illustrates the concept.
domain: yourdomain.com
records:
  - name: app
    type: A
    priority: 10 # Lower number = higher priority
    value: <IP_ADDRESS_OF_GRA_LB>
    health_check:
      protocol: HTTP
      port: 80
      path: /healthz
      interval: 30s
      timeout: 5s
      failures_before_failover: 3
  - name: app
    type: A
    priority: 20
    value: <IP_ADDRESS_OF_RBX_LB>
    health_check:
      protocol: HTTP
      port: 80
      path: /healthz
      interval: 30s
      timeout: 5s
      failures_before_failover: 3
The priority field determines the primary/secondary relationship. The health_check configuration defines how the DNS provider monitors the availability of the IP address. If the primary IP becomes unresponsive, DNS resolution will switch to the secondary IP.
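The selection rule implied by `priority` and `failures_before_failover` can be written down in a few lines. This is a minimal sketch of the decision logic only, not how any particular DNS provider implements it: `selectActiveRecord` is a hypothetical function, and the IP addresses come from the reserved documentation ranges.

```php
<?php
/**
 * Hypothetical model of failover routing: answer with the record that has
 * the lowest priority number and fewer consecutive health check failures
 * than the threshold.
 *
 * @param array $records list of ['value' => string, 'priority' => int, 'failures' => int]
 * @param int   $failuresBeforeFailover consecutive failures that mark a record as down
 * @return string|null the value to serve, or null if every record is down
 */
function selectActiveRecord(array $records, int $failuresBeforeFailover): ?string
{
    // Sort a copy so that the preferred (lowest priority number) record comes first.
    usort($records, function (array $a, array $b): int {
        return $a['priority'] <=> $b['priority'];
    });

    foreach ($records as $record) {
        if ($record['failures'] < $failuresBeforeFailover) {
            return $record['value'];
        }
    }
    return null; // every target is currently failing its health check
}

// GRA has failed three checks in a row, so RBX is served instead.
$records = [
    ['value' => '192.0.2.10',    'priority' => 10, 'failures' => 3], // GRA LB (example IP)
    ['value' => '198.51.100.10', 'priority' => 20, 'failures' => 0], // RBX LB (example IP)
];
echo selectActiveRecord($records, 3) ?? 'no healthy record', "\n";
```

With a 30 second interval and a threshold of 3, this model needs roughly 90 seconds of continuous failure before the answer changes, which is worth keeping in mind when setting recovery time objectives.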
Monitoring and Alerting for Proactive Recovery
Automated failover is only effective if you are aware when it happens and if it’s working correctly. Comprehensive monitoring and alerting are crucial.
Key Metrics to Monitor:
- Application Health Checks: Monitor the success/failure rate of your /healthz endpoint in both regions.
- Load Balancer Metrics: Track active connections, request rates, error rates (5xx), and backend health status for each region.
- DynamoDB Metrics: Monitor ConsumedReadCapacityUnits, ConsumedWriteCapacityUnits, ThrottledRequests, and SystemErrors for tables in both regions. Pay close attention to replication lag, which Global Tables expose as the ReplicationLatency metric.
- Network Latency: Monitor latency between regions and from end-users to your load balancers.
- Resource Utilization: CPU, memory, and network I/O for your application servers.
Alerting Strategy:
- Critical Alerts: Trigger alerts for sustained health check failures in a region, high error rates on load balancers, or significant DynamoDB throttling. These should notify your on-call engineers immediately.
- Warning Alerts: Notify for increased latency, approaching capacity limits, or minor increases in throttled requests.
- Failover Confirmation: Set up alerts that trigger when the DNS or Load Balancer health checks change state (e.g., from healthy to unhealthy, or when traffic shifts between regions). This confirms that the automated failover mechanism has engaged.
Tools like Prometheus with Alertmanager, Datadog, or OVH’s own monitoring solutions can be integrated to collect these metrics and trigger alerts via email, Slack, PagerDuty, etc.
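Failover confirmation alerts are about transitions rather than steady state. As a minimal sketch of that idea, the hypothetical `detectTransitions` function below compares two health snapshots and reports only what changed; a real setup would normally let the monitoring tool do this, but the logic is the same.

```php
<?php
/**
 * Hypothetical helper: compare two health snapshots and describe every
 * state change, so an alert fires on the transition itself.
 *
 * @param array $previous region => state from the last poll
 * @param array $current  region => state from this poll
 * @return string[] human-readable transition messages
 */
function detectTransitions(array $previous, array $current): array
{
    $events = [];
    foreach ($current as $region => $state) {
        $before = $previous[$region] ?? 'unknown';
        if ($before !== $state) {
            $events[] = sprintf('%s: %s -> %s', $region, $before, $state);
        }
    }
    return $events;
}

// Example poll: GRA just went down, RBX is unchanged.
$previous = ['GRA' => 'healthy', 'RBX' => 'healthy'];
$current  = ['GRA' => 'unhealthy', 'RBX' => 'healthy'];

foreach (detectTransitions($previous, $current) as $event) {
    // Replace error_log with a Slack, PagerDuty, or email notification.
    error_log("Failover state change: {$event}");
}
```

Alerting on the change, and again on the recovery, avoids paging repeatedly for an outage that is already known.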
By combining multi-region DynamoDB replication, resilient PHP application deployments, intelligent load balancing with health checks, and robust DNS failover, you can architect a highly available system on OVH Public Cloud that automatically recovers from regional outages.