Automating Multi-Region Redundancy for PHP Architectures on OVH

Establishing Multi-Region Redundancy with OVHcloud: A Deep Dive for PHP Architectures

Achieving robust disaster recovery for critical PHP applications necessitates a multi-region strategy. This post outlines a practical, production-grade approach leveraging OVHcloud’s infrastructure, focusing on automated failover and data synchronization for a resilient architecture.

Core Components: Load Balancers, Databases, and Application Servers

Our strategy hinges on three primary pillars:

Global Load Balancer (GLB): Directs traffic to the active region, performing health checks and facilitating failover. OVHcloud’s Load Balancer service can fill this role.
Database Replication: Ensures data consistency across regions. We’ll focus on PostgreSQL replication, a robust and widely supported option.
Application Server Deployment: Identical, stateless application instances deployed in each region, managed via an orchestration tool like Ansible.

Database Replication Strategy: PostgreSQL Streaming Replication

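For multi-region redundancy, PostgreSQL streaming replication keeps a standby in another region continuously up to date. Streaming replication is asynchronous by default, which is the usual choice across regions: synchronous replication makes every commit wait for a cross-region round trip, while asynchronous replication can lose the most recent transactions on failover. We’ll configure a primary-replica setup where the primary database resides in Region A and a replica in Region B. For disaster recovery, the replica in Region B will be promoted to primary upon failure of Region A.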

Configuring PostgreSQL Primary (Region A)

On the primary PostgreSQL server in Region A, modify postgresql.conf and pg_hba.conf.

`postgresql.conf` (Primary)

Ensure the following parameters are set:

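wal_level = replica
max_wal_senders = 5
wal_keep_size = 1GB        # PostgreSQL 13+; on 12 and earlier use wal_keep_segments = 64
archive_mode = on
archive_command = 'cp %p /path/to/wal_archive/%f'

wal_level must be set to replica or higher. max_wal_senders defines the maximum number of concurrent replication connections. wal_keep_size (wal_keep_segments before PostgreSQL 13, where 64 segments of 16 MB amount to the same 1 GB) keeps recent WAL on the primary so that a replica that falls behind can still catch up over streaming. archive_command is crucial for point-in-time recovery and serves as a fallback when the replica needs WAL that the primary has already recycled, though restoring from the archive adds latency.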
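How much WAL to retain depends on how long the replica may be disconnected and how fast the primary writes WAL. The following is a minimal sketch of that arithmetic, assuming the default 16 MB segment size. The function names and the 20 MB/min write rate are hypothetical and only illustrate the calculation; measure your own rate before choosing a value.

```shell
#!/usr/bin/env bash
# Rough WAL retention arithmetic. All inputs are illustrative assumptions.

# wal_retention_mb SEGMENTS [SEGMENT_MB]
# Size in MB of SEGMENTS WAL segments (default segment size: 16 MB).
wal_retention_mb() {
    local segments=$1 segment_mb=${2:-16}
    echo $(( segments * segment_mb ))
}

# outage_coverage_minutes RETENTION_MB WAL_RATE_MB_PER_MIN
# Whole minutes of replica downtime the retained WAL can absorb.
outage_coverage_minutes() {
    local retention_mb=$1 rate_mb_per_min=$2
    echo $(( retention_mb / rate_mb_per_min ))
}

retention=$(wal_retention_mb 64)
echo "64 segments retain ${retention} MB of WAL"
echo "At 20 MB/min this covers about $(outage_coverage_minutes "$retention" 20) minutes"
```

If the expected outage exceeds what the retained WAL covers, the replica has to fall back to the WAL archive or be rebuilt from a fresh base backup.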

`pg_hba.conf` (Primary)

Allow replication connections from the replica server in Region B. Replace <replica_ip_region_b> with the actual IP address of your PostgreSQL replica in Region B.

host    replication     replicator      <replica_ip_region_b>/32        md5

Create a replication user:

CREATE USER replicator WITH REPLICATION LOGIN PASSWORD 'your_replication_password';

Restart PostgreSQL on the primary server.

Configuring PostgreSQL Replica (Region B)

On the replica PostgreSQL server in Region B, stop the PostgreSQL service, ensure the data directory is empty, and then initialize it from the primary.

sudo systemctl stop postgresql
# Ensure data directory is empty (e.g., /var/lib/postgresql/13/main)
sudo rm -rf /var/lib/postgresql/13/main/*
# -R writes the standby configuration: recovery.conf on PostgreSQL 11 and earlier,
# standby.signal plus primary_conninfo in postgresql.auto.conf on PostgreSQL 12+
sudo -u postgres pg_basebackup -h <primary_ip_region_a> -D /var/lib/postgresql/13/main -U replicator -P -v -W -R
# PostgreSQL 12+ (including 13, used here): recovery.conf is no longer read, and the
# server refuses to start if that file exists. standby_mode was removed as well;
# the standby.signal file created by -R replaces it.
# Check primary_conninfo in postgresql.auto.conf, then add the archive fallback
# to the replica's postgresql.conf:
# restore_command = 'cp /path/to/wal_archive/%f %p'
# PostgreSQL 11 and earlier: recovery.conf in the data directory should contain:
# standby_mode = 'on'
# primary_conninfo = 'host=<primary_ip_region_a> port=5432 user=replicator password=your_replication_password'
# restore_command = 'cp /path/to/wal_archive/%f %p'
sudo chown -R postgres:postgres /var/lib/postgresql/13/main
sudo systemctl start postgresql

pg_basebackup performs a base backup from the primary. The standby settings (recovery.conf on PostgreSQL 11 and earlier; postgresql.conf plus a standby.signal file on PostgreSQL 12 and later) tell the replica how to connect to the primary and where to find archived WAL files for recovery. Ensure the restore_command points to a location accessible by the replica server, ideally an object storage bucket or a shared network mount replicated across regions.

Monitoring Replication Status

On the replica, check the logs for successful connection and streaming. You can also query:

SELECT pg_is_in_recovery(); -- Should return 't' (true)
SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(); -- Compare these LSNs to the primary's pg_current_wal_lsn()
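An LSN is printed as two hexadecimal numbers separated by a slash, so comparing them by eye is error-prone. Inside the database, pg_wal_lsn_diff() returns the difference in bytes. When the two values come from different servers, a small helper can do the same conversion; the sketch below uses hypothetical function names and illustrative LSN values.

```shell
#!/usr/bin/env bash
# Convert PostgreSQL LSNs ("hi/lo", both hexadecimal) to byte positions
# and compute how far a replica is behind the primary.

# lsn_to_bytes LSN -> absolute byte position (hi * 2^32 + lo)
lsn_to_bytes() {
    local hi=${1%%/*} lo=${1##*/}
    echo $(( (16#$hi << 32) + 16#$lo ))
}

# lsn_lag_bytes PRIMARY_LSN REPLICA_LSN -> bytes the replica is behind
lsn_lag_bytes() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Illustrative values: the primary's pg_current_wal_lsn() and the
# replica's pg_last_wal_replay_lsn().
lsn_lag_bytes "0/3000060" "0/3000000"   # prints 96
```

A lag that keeps growing, rather than its absolute size at one moment, is usually the signal worth alerting on.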

Application Server Deployment and Orchestration

We’ll use Ansible for idempotent deployment of our PHP application across multiple OVHcloud regions. Each region will host a set of identical application servers.

Ansible Playbook Structure

Your Ansible inventory should define hosts per region:

[region_a_app_servers]
appserver1.region_a.example.com ansible_host=192.0.2.1
appserver2.region_a.example.com ansible_host=192.0.2.2

[region_b_app_servers]
appserver1.region_b.example.com ansible_host=192.0.2.10
appserver2.region_b.example.com ansible_host=192.0.2.11

[all_app_servers:children]
region_a_app_servers
region_b_app_servers

A simplified playbook for deploying the application:

---
- name: Deploy PHP Application
  hosts: all_app_servers
  become: yes
  vars:
    app_repo: "git@<your_git_host>:your_org/your_php_app.git"
    app_deploy_path: "/var/www/html/your_app"
    php_version: "8.1"
    db_host: "{{ hostvars[groups['region_a_app_servers'][0]]['ansible_host'] }}" # Default to Region A DB
    db_user: "appuser"
    db_password: "your_db_password"
    db_name: "appdb"

  tasks:
    - name: Ensure PHP and web server are installed
      apt:
        name:
          - php{{ php_version }}
          - libapache2-mod-php{{ php_version }}
          - php{{ php_version }}-mysql
          - php{{ php_version }}-pgsql
          - apache2
        state: present
        update_cache: yes

    - name: Configure Apache VirtualHost
      template:
        src: templates/vhost.conf.j2
        dest: /etc/apache2/sites-available/your_app.conf
      notify: restart apache

    - name: Enable site and rewrite module
      apache2_module:
        name: "{{ item }}"
        state: present
      loop:
        - rewrite
        - php{{ php_version }}
      notify: restart apache

    - name: Enable virtual host
      command: a2ensite your_app.conf
      args:
        creates: /etc/apache2/sites-enabled/your_app.conf
      notify: restart apache

    - name: Deploy application code from Git
      git:
        repo: "{{ app_repo }}"
        dest: "{{ app_deploy_path }}"
        version: main # Or a specific tag/branch
        force: yes

    - name: Install Composer dependencies
      composer:
        working_dir: "{{ app_deploy_path }}"
      environment:
        COMPOSER_HOME: "/root/.composer" # Or appropriate user

    - name: Create .env file from template
      template:
        src: templates/env.j2
        dest: "{{ app_deploy_path }}/.env"
        owner: www-data
        group: www-data
        mode: '0644'
      notify: restart apache

  handlers:
    - name: restart apache
      service:
        name: apache2
        state: restarted

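The env.j2 template should dynamically set the database connection details. Crucially, for Region B servers, the db_host variable needs to be overridden to point to the PostgreSQL replica in Region B during deployment or via a separate Ansible task. Keep in mind that the replica is read-only until it is promoted, so Region B can serve reads but cannot accept writes before failover.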

Dynamic Database Host Configuration for Region B

You can achieve this by passing extra variables or using Ansible facts. A common approach is to use a conditional in your playbook or a separate role.

# In your playbook, within the tasks for region_b_app_servers
    - name: Set Region B DB host
      set_fact:
        db_host: "{{ hostvars[groups['region_b_app_servers'][0]]['ansible_host'] }}" # Assuming first host in group has DB IP fact
      when: inventory_hostname in groups['region_b_app_servers']

    # Then use {{ db_host }} in the env.j2 template

Alternatively, you can define a separate inventory group for Region B's database and reference it.

OVHcloud Global Load Balancer (GLB) Configuration

The OVHcloud GLB is the entry point for your application. It will distribute traffic between your active region and your passive (DR) region. We'll configure it with health checks to automatically detect failures.

Setting up the GLB

Navigate to the OVHcloud Control Panel -> Network -> Load Balancer. Create a new Load Balancer instance.

Frontend Configuration

Define your frontend listener (e.g., TCP port 80 or 443). For HTTPS, you'll need to manage SSL certificates, either on the GLB itself or passed through to the backend servers.

Backend Pool Configuration

Create two backend pools:

Pool A (Primary Region): Add the IP addresses of your application servers in Region A.
Pool B (DR Region): Add the IP addresses of your application servers in Region B.

Health Checks

This is critical for failover. Configure health checks for each pool:

Protocol: HTTP
Port: 80 (or your application's port)
URI: A dedicated health check endpoint in your PHP application (e.g., /healthz). This endpoint should return a 200 OK status code if the application is healthy and connected to its database.
Interval: 10 seconds
Timeout: 5 seconds
Unhealthy Threshold: 3 consecutive failures
Healthy Threshold: 2 consecutive successes
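These settings determine how quickly a failed pool is taken out of rotation. The sketch below estimates a rough upper bound, assuming checks start once per interval and a failing check takes the full timeout to fail. The function names are hypothetical, and the exact timing depends on how the load balancer schedules its probes.

```shell
#!/usr/bin/env bash
# Rough failover and recovery timing from the health check settings.

# detection_seconds INTERVAL TIMEOUT UNHEALTHY_THRESHOLD
# Approximate worst case before a pool is marked unhealthy.
detection_seconds() {
    local interval=$1 timeout=$2 threshold=$3
    echo $(( interval * threshold + timeout ))
}

# recovery_seconds INTERVAL HEALTHY_THRESHOLD
# Approximate time before a recovered pool is marked healthy again.
recovery_seconds() {
    local interval=$1 threshold=$2
    echo $(( interval * threshold ))
}

echo "Marked unhealthy after up to $(detection_seconds 10 5 3) seconds"
echo "Marked healthy again after about $(recovery_seconds 10 2) seconds"
```

With the values above, users in the failed region may see errors for roughly half a minute before traffic moves. Shortening the interval reduces that window at the cost of more probe traffic and a higher chance of flapping.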

The health check endpoint (/healthz) should perform a basic database query to ensure connectivity. If the database is unreachable, it should return a non-2xx status code.

<?php
// public/healthz.php
header('Content-Type: application/json');

$db_host = getenv('DB_HOST');
$db_user = getenv('DB_USER');
$db_pass = getenv('DB_PASS');
$db_name = getenv('DB_NAME');

try {
    $dsn = "pgsql:host={$db_host};port=5432;dbname={$db_name}";
    $options = [
        PDO::ATTR_ERRMODE            => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC,
        PDO::ATTR_EMULATE_PREPARES   => false,
        PDO::ATTR_TIMEOUT            => 2, // 2-second connection timeout
    ];
    $pdo = new PDO($dsn, $db_user, $db_pass, $options);
    // Simple query to check connectivity; query() executes immediately,
    // so no separate execute() call is needed
    $pdo->query('SELECT 1')->fetchColumn();

    http_response_code(200); // Set the status before sending any output
    echo json_encode(['status' => 'ok', 'message' => 'Database connected']);
} catch (\PDOException $e) {
    // Log details server-side; do not expose them on a public endpoint
    error_log('Health check failed: ' . $e->getMessage());
    http_response_code(503); // Service Unavailable
    echo json_encode(['status' => 'error', 'message' => 'Database connection failed']);
}
?>

Load Balancing Algorithm and Failover Logic

Configure the GLB to use a load balancing algorithm (e.g., Round Robin). The key is the failover behavior. OVHcloud's GLB typically prioritizes the primary pool. If all servers in the primary pool become unhealthy, it will automatically switch traffic to the secondary pool.

Automating Failover and Failback

Manual failover is prone to error and delay. Automation is essential.

Database Failover (Manual Promotion)

When Region A's primary database fails, you need to promote the replica in Region B. This is typically a manual step, but can be scripted.

# On the PostgreSQL replica in Region B (Debian/Ubuntu layout, where pg_ctl is not on the default PATH):
sudo -u postgres /usr/lib/postgresql/13/bin/pg_ctl promote -D /var/lib/postgresql/13/main

# Verify promotion
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"  # Should return 'f' (false)
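If you script the promotion, it helps to guard it so that a single failed probe does not trigger it. The following is a minimal sketch, not a complete failover tool. The variable names, the --promote flag, and the probe defaults are assumptions made for illustration. It relies on pg_isready, which exits 0 only when the server accepts connections, and on pg_promote(), which is available in PostgreSQL 12 and later.

```shell
#!/usr/bin/env bash
# Guarded promotion sketch: promote the Region B replica only after the
# primary has failed several consecutive probes.
# PRIMARY_HOST, ATTEMPTS, SLEEP_SECONDS and PROBE_CMD are illustrative names.

PRIMARY_HOST=${PRIMARY_HOST:-"<primary_ip_region_a>"}
ATTEMPTS=${ATTEMPTS:-3}
SLEEP_SECONDS=${SLEEP_SECONDS:-10}
PROBE_CMD=${PROBE_CMD:-"pg_isready -h $PRIMARY_HOST -p 5432 -t 5"}

# Returns 0 if every probe failed, 1 as soon as one probe succeeds.
primary_is_down() {
    local i
    for (( i = 1; i <= ATTEMPTS; i++ )); do
        if $PROBE_CMD >/dev/null 2>&1; then
            return 1
        fi
        # Wait between probes, but not after the last one.
        if (( i < ATTEMPTS )); then
            sleep "$SLEEP_SECONDS"
        fi
    done
    return 0
}

# Promote the local replica (run on the Region B database server).
promote_replica() {
    sudo -u postgres psql -Atc "SELECT pg_promote();"
}

# Act only when called with --promote, so sourcing this file is harmless.
if [ "${1:-}" = "--promote" ]; then
    if primary_is_down; then
        echo "Primary unreachable after $ATTEMPTS probes; promoting replica."
        promote_replica
    else
        echo "Primary still answers; not promoting."
    fi
fi
```

A primary that is unreachable from Region B is not necessarily down: a network partition looks the same from the replica's side. Before promoting, make sure the old primary cannot accept writes (for example by keeping it stopped or firewalled), otherwise both regions may diverge.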

After promotion, you must update the application servers in Region B to point to the new primary database in Region B. This can be done via Ansible.

# Run this playbook targeting only region_b_app_servers
---
- name: Update DB config for Region B after failover
  hosts: region_b_app_servers
  become: yes
  vars:
    db_host: "{{ hostvars[groups['region_b_app_servers'][0]]['ansible_host'] }}" # IP of the newly promoted primary in Region B
    db_user: "appuser"
    db_password: "your_db_password"
    db_name: "appdb"

  tasks:
    - name: Create/Update .env file for new primary DB
      template:
        src: templates/env.j2
        dest: "/var/www/html/your_app/.env"
        owner: www-data
        group: www-data
        mode: '0644'
      notify: restart apache
  handlers:
    - name: restart apache
      service:
        name: apache2
        state: restarted

GLB Failover

The OVHcloud GLB handles automatic failover between pools based on health checks. When the primary pool (Region A) becomes unhealthy, traffic will automatically be routed to the secondary pool (Region B). This is the "automatic" part of the failover.

Failback Strategy

Failback is often more complex than failover. Once the issue in Region A is resolved:

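Database: The old primary in Region A cannot simply rejoin, because its timeline has diverged from the promoted server. Rebuild it as a replica of the current primary (Region B), for example with pg_basebackup, or with pg_rewind where its prerequisites are met, and let it catch up. If Region A should become primary again, perform a controlled switchover once replication lag is zero.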
Application Servers: Update application servers in Region A to point to the primary database in Region A.
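GLB: Once Region A's application servers are healthy and pointing to the correct database, the GLB will automatically start sending traffic back to Pool A as its health checks pass. Because this happens automatically, keep Region A's servers out of rotation (or failing their health check) until the database work is complete, so that traffic never reaches a stale database.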

A robust failback procedure often involves a maintenance window and careful orchestration to avoid data loss or inconsistencies.

Monitoring and Alerting

Comprehensive monitoring is non-negotiable. Implement:

OVHcloud Monitoring: Utilize OVHcloud's built-in monitoring for GLB health, server status, and resource utilization.
Application-Level Monitoring: Tools like Prometheus/Grafana, Datadog, or New Relic to monitor application performance, error rates, and database connection status.
Database Replication Lag: Monitor PostgreSQL replication lag by querying the pg_stat_replication view on the primary and by checking LSNs on the replica.
Alerting: Configure alerts for critical events: GLB pool failure, database replication errors, high replication lag, and health check failures. Integrate with PagerDuty, Opsgenie, or Slack.

Security Considerations

Ensure secure communication between components:

Database Connections: Use SSL/TLS for PostgreSQL replication and application connections.
Ansible: Use SSH keys and encrypt sensitive variables (e.g., database passwords) using Ansible Vault.
Network Security: Configure OVHcloud Security Groups and Firewall rules to restrict access to only necessary ports and IP ranges.
GLB: If using HTTPS, ensure proper SSL certificate management.

Conclusion

Implementing multi-region redundancy requires careful planning and execution. By combining OVHcloud's GLB, PostgreSQL streaming replication, and an orchestration tool like Ansible, you can build a highly available and resilient PHP architecture capable of withstanding regional outages. Continuous monitoring and well-defined failover/failback procedures are key to maintaining operational integrity.